Add tree label-synchronous beam-search algorithm #129


Open

larissakl wants to merge 68 commits into master from tree_labelsync_search

Contributor

larissakl commented May 30, 2025

This adds a label-synchronous search algorithm on a search tree built by the AedTreeBuilder (#127).
It is derived from the lexicon-free label-synchronous beam-search algorithm (#126). As in the time-synchronous tree search (#113), word-end and within-word hypotheses in the set of active hypotheses can be pruned either globally or separately, and a language model score is added at word ends.
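To illustrate the difference between global and separate pruning, here is a minimal sketch. It is not the code from this PR: the `Hyp` struct and the functions `keepBest`, `pruneGlobally`, `pruneSeparately` and `extendedScore` are hypothetical names. Scores are assumed to be negative log scores (lower is better), and pruning is reduced to a plain beam-size limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified hypothesis (not the data structure used in the PR).
struct Hyp {
    double score;      // assumed: accumulated negative log score, lower is better
    bool   isWordEnd;  // true if the hypothesis sits at a word end of the tree
};

// Keep at most maxSize hypotheses with the lowest scores (order is unspecified).
inline void keepBest(std::vector<Hyp>& hyps, std::size_t maxSize) {
    if (hyps.size() <= maxSize) {
        return;
    }
    auto nth = hyps.begin() + static_cast<std::ptrdiff_t>(maxSize);
    std::nth_element(hyps.begin(), nth, hyps.end(),
                     [](const Hyp& a, const Hyp& b) { return a.score < b.score; });
    hyps.erase(nth, hyps.end());
}

// Global pruning: one size limit shared by all active hypotheses.
inline std::vector<Hyp> pruneGlobally(std::vector<Hyp> beam, std::size_t maxBeamSize) {
    keepBest(beam, maxBeamSize);
    return beam;
}

// Separate pruning: within-word and word-end hypotheses each get their own limit.
inline std::vector<Hyp> pruneSeparately(const std::vector<Hyp>& beam,
                                        std::size_t             maxWithinWord,
                                        std::size_t             maxWordEnd) {
    std::vector<Hyp> withinWord, wordEnd;
    for (const Hyp& h : beam) {
        (h.isWordEnd ? wordEnd : withinWord).push_back(h);
    }
    keepBest(withinWord, maxWithinWord);
    keepBest(wordEnd, maxWordEnd);
    withinWord.insert(withinWord.end(), wordEnd.begin(), wordEnd.end());
    return withinWord;
}

// The language model score only contributes when the extension reaches a word end.
inline double extendedScore(double baseScore, double labelScore, bool reachesWordEnd, double lmScore) {
    return baseScore + labelScore + (reachesWordEnd ? lmScore : 0.0);
}
```

With separate limits, a few word-end hypotheses survive even when many within-word hypotheses have better scores, which a single global limit does not guarantee.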

Simon Berger and others added 30 commits

February 19, 2025 19:10


          Implement simple lexiconfree time-sync beam search

1e7035e


          Add some comments

bf0a8ce


          Add createSearchAlgorithm to Search::Module

d6689b4


          Fix compilation

664945c


          Refactor traceback/lattice building and construct proper (nonlinear) …

488fb0e

…lattice from beam


          Factor out time statistics into new Core::StopWatch class


          Don't copy sibling from predecessor

9a60916


          Better handling of blank index

8e96423


          Apply suggestions from code review

536ac82


          Implement StopWatch class

f21935e


          Use TIMER_START and TIMER_STOP macros instead

5f82460


          Simplify AdvancedTreeSearch PerformanceCounter by inheriting from Sto…

4779dd5

…pWatch


          Small fixes in StopWatch class

f5a3182


          Make StopWatch a member of PerformanceCounter instead of inheriting

97e5bd7


          Implement LatticeTrace class

b77cf23


          Make predecessor and sibling public members

5fcfff7


          Look for initial trace instead of associating empty trace with initia…

…l state


          Remove redundant includes

0b676f9


          Add assertions for assumptions in lattice building

159fbd8


          Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree…

f2f4cf7

…_beam_search


          Remove wrong assertion

0577e79


          Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree…

04b6ac4

…_beam_search


          Remove initial item in performTraceback

b454e39


          Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree…

b3d5f02

…_beam_search


          Fix arc scores

d393c7e


          Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree…

b1ed20e

…_beam_search


          Merge remote-tracking branch 'origin/stopwatch' into lexiconfree_beam…

f112113

…_search


          Update traceback/lattice building logic

d67cf45


          Make elapsed functions const

54535e6


          Merge branch 'stopwatch' into lexiconfree_beam_search

f0832f8

SimBe195 and others added 17 commits

April 9, 2025 16:29


          Add cache-cleanup functionality to LabelScorer

815338d


          Introduce function to get input at correct timestep in BufferedLabelS…

dcf65e5

…corer


          Extract sub-ScoringContexts from CombineScoringContexts

2e5bc56


          Merge branch 'master' into cleanup_inactive_contexts

4dbe0bf


          Formatting and include fixes

b7696e7


          Add handling of otherRootStates in PersistentStateTree and StaticSear…

f22812b

…chAutomaton (#125)


          Add new LexiconfreeLabelsyncBeamSearch search algorithm

c9e6498


          Revert Makefile changes that were produced by configure script

556d5d5


          Merge branch 'master' into cleanup_inactive_contexts

02b7e7a


          Merge branch 'cleanup_inactive_contexts' into lexiconfree_labelsync_s…

…earch


          Improve docstrings

6bf2d0a


          Bugfix: Do pruning on newBeam_ instead of beam_

62c7b8c


          Fix comment string

cfe3624


          Apply suggestions from code review + more verbose step logging

872b0ff


          Add missing include

fad9928


          Fix typo in year

b756a82


          Add TreeLabelsyncBeamSearch

b35b967

larissakl requested review from SimBe195 and curufinwe and removed request for curufinwe

May 30, 2025 14:59

larissakl and others added 7 commits

May 30, 2025 17:03


          Formatting

8712b59


          Merge branch 'master' into lexiconfree_labelsync_search

245755a


          Fix pruning order and logging

311eb8f


          Remove unnecessary include

a71e729


          Some synchronization with updates from master

8fc3d7a


          Merge branch 'lexiconfree_labelsync_search' into tree_labelsync_search

e9286d0

# Conflicts:
#	apptainer/2022-10-21_tensorflow-1.15_arm_v1/makefiles/Modules.make
#	apptainer/2022-10-21_tensorflow-1.15_v1/makefiles/Modules.make
#	apptainer/2023-05-08_tensorflow-2.8_v1/makefiles/Modules.make
#	apptainer/2023-08-09_tensorflow-2.8_onnx-1.15_v1/makefiles/Modules.make
#	apptainer/2023-11-08_tensorflow-2.14_v1/makefiles/Modules.make
#	apptainer/2025-04-23_tensorflow-2.17_onnx-1.20_v1/makefiles/Modules.make
#	src/Search/Makefile
#	src/Search/Module.cc
#	src/Search/Module.hh


          Correct two more Modules.make

hannah220 requested changes


src/Search/TreeLabelsyncBeamSearch/TreeLabelsyncBeamSearch.cc

+                        currentToken(extension.nextToken),
+                        currentState(extension.state),
+                        lmHistory(extension.lmHistory),
+                        length(base.length + 1),

Contributor

hannah220 Jan 7, 2026

As in LexiconfreeLabelsyncBeamSearch, shouldn't length be incremented depending on the transitionType?
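A possible shape for such a rule, as a sketch only: the enum below is a local stand-in for `Nn::LabelScorer::TransitionType`. Only `LABEL_TO_LABEL` and `LABEL_LOOP` are named in this thread; the blank-related enumerators and the choice of which transitions count towards the length are assumptions, not taken from `LexiconfreeLabelsyncBeamSearch`.

```cpp
#include <cstddef>

// Local stand-in for the real transition type enum (enumerators partly assumed).
enum class TransitionType {
    LABEL_TO_LABEL,
    LABEL_LOOP,
    LABEL_TO_BLANK,  // assumption
    BLANK_TO_LABEL,  // assumption
    BLANK_LOOP       // assumption
};

// Assumed rule: only transitions that emit a new label increase the length.
inline std::size_t lengthIncrement(TransitionType type) {
    switch (type) {
        case TransitionType::LABEL_TO_LABEL:
        case TransitionType::BLANK_TO_LABEL:
            return 1;
        case TransitionType::LABEL_LOOP:
        case TransitionType::LABEL_TO_BLANK:
        case TransitionType::BLANK_LOOP:
            return 0;
    }
    return 0;  // unreachable for valid enum values
}
```

The constructor line would then read something like `length(base.length + lengthIncrement(extension.transitionType))`, assuming the extension carries its transition type.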

src/Search/TreeLabelsyncBeamSearch/TreeLabelsyncBeamSearch.cc

+                                  extension.timeframe + 1,
+                                  {extension.score - extension.lmScore, extension.lmScore},
+                                  {}));
+                          break;

Contributor

hannah220 Jan 7, 2026

Maybe it's better without an empty line between the cases.

src/Search/TreeLabelsyncBeamSearch/TreeLabelsyncBeamSearch.cc

+                          }
+                          break;
+                      default:

Contributor

hannah220 Jan 7, 2026

What about LABEL_LOOP?
LexiconfreeLabelsyncBeamSearch also handles blank; should this include it as well?

src/Search/TreeLabelsyncBeamSearch/TreeLabelsyncBeamSearch.cc

+                      sentenceEndLemma = lexicon_->specialLemma("sentence-boundary");
+                  }
+                  sentenceEndLabelIndex_ = sentenceEndLemma->id();
+                  log() << "Use sentence-end index " << sentenceEndLabelIndex_ << " inferred from lexicon";

Contributor

hannah220 Jan 7, 2026 (edited)

It would also be better to have an option to read the index from paramSentenceEndLabelIndex.

src/Search/TreeLabelsyncBeamSearch/TreeLabelsyncBeamSearch.cc

+                          Nn::LabelIndex tokenIdx = network_->structure.state(successorState).stateDesc.acousticModel;
+                          auto transitionType = Nn::LabelScorer::TransitionType::LABEL_TO_LABEL;
+                          if (hyp.currentToken == Core::Type<Nn::LabelIndex>::max) {

Contributor

hannah220 Jan 7, 2026

Suggested change

-                          if (hyp.currentToken == Core::Type<Nn::LabelIndex>::max) {
+                          if (hyp.currentToken == Nn::invalidLabelIndex) {

src/Search/TreeLabelsyncBeamSearch/TreeLabelsyncBeamSearch.cc

+                      }
+                  };
+                  recombinedHypotheses_.clear();

Contributor

hannah220 Jan 7, 2026

    // Reserve capacity because future reallocations would break the raw pointer we are storing later
    recombinedHypotheses_.reserve(newBeam_.size());
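
For context, a self-contained sketch of why the `reserve` matters. The names `Hyp` and `collectPointers` are hypothetical and only mirror the pattern of storing raw pointers into a vector that is still growing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a recombined hypothesis.
struct Hyp {
    double score;
};

// Fills `storage` and returns raw pointers to its elements.
// Without the reserve, a push_back could reallocate the buffer and leave
// the pointers collected so far dangling.
inline std::vector<const Hyp*> collectPointers(std::vector<Hyp>& storage, const std::vector<double>& scores) {
    storage.clear();
    storage.reserve(scores.size());  // capacity is now sufficient, so no reallocation below

    std::vector<const Hyp*> pointers;
    pointers.reserve(scores.size());
    for (double s : scores) {
        storage.push_back(Hyp{s});
        pointers.push_back(&storage.back());
    }
    return pointers;
}
```

The pointers stay valid only as long as `storage` never grows beyond the reserved capacity afterwards.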

src/Search/TreeLabelsyncBeamSearch/TreeLabelsyncBeamSearch.cc

+                          break;
+                      default:
+                          defect();  // Unexpected transition type which can not be produced by `inferTransitionType`

Contributor

hannah220 Jan 7, 2026

Also, why isn't the inferTransitionType function defined in this class?

Base automatically changed from lexiconfree_labelsync_search to master

January 16, 2026 13:01
