cam.sgnmt.decoding package¶
Submodules¶
cam.sgnmt.decoding.astar module¶
Implementation of the A* search strategy
-
class cam.sgnmt.decoding.astar.AstarDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
This decoder implements A*. For heuristics, see the decoding.core module for interfaces and the general handling of heuristics, and the decoding.heuristics package for heuristic implementations. This A* implementation does not have a ‘closed set’, i.e. we do not keep track of already visited states. Make sure that your search space is acyclic (normally it is, unless you decode on cyclic lattices with the fst predictor).
Creates a new A* decoder instance. The following values are fetched from decoder_args:
- beam (int): Maximum number of active hypotheses.
- pure_heuristic_scores (bool): For standard A* set this to false. If set to true, partial hypo scores are ignored when scoring hypotheses.
- early_stopping (bool): If this is true, partial hypotheses with scores worse than the current best complete score are not expanded. This applies when nbest is larger than one and inadmissible heuristics are used.
- nbest (int): If this is set to a positive value, we do not stop decoding at the first complete path, but continue searching until we have collected this many complete hypotheses. With an admissible heuristic, this will yield an exact n-best list.
Parameters: decoder_args (object) – Decoder configuration passed through from the configuration API.
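The search loop described above can be sketched as a generic best-first search without a closed set. This is an illustrative sketch, not SGNMT's actual code: the helper names (`expand`, `heuristic`, `is_goal`) and the cost convention (non-negative step costs, lower is better) are assumptions.

```python
import heapq
import itertools

def astar_search(expand, heuristic, start, is_goal, nbest=1):
    """Best-first search without a closed set.

    expand(state)    -> list of (next_state, step_cost) pairs
    heuristic(state) -> estimate of the remaining cost (exact n-best list
                        only if the heuristic is admissible)
    """
    counter = itertools.count()  # tie-breaker so states are never compared
    open_set = [(heuristic(start), next(counter), 0.0, start)]
    results = []
    while open_set and len(results) < nbest:
        _, _, cost, state = heapq.heappop(open_set)
        if is_goal(state):
            # Without a closed set, goals pop in cost order if admissible.
            results.append((cost, state))
            continue
        for next_state, step_cost in expand(state):
            g = cost + step_cost
            heapq.heappush(open_set,
                           (g + heuristic(next_state), next(counter), g, next_state))
    return results
```

Since there is no closed set, a cyclic search space would make this loop run forever, which is why the docstring above requires an acyclic space.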
cam.sgnmt.decoding.beam module¶
Implementation of the beam search strategy
-
class cam.sgnmt.decoding.beam.BeamDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
This decoder implements standard beam search and several variants of it such as diversity promoting beam search and beam search with heuristic future cost estimates. This implementation supports risk-free pruning and hypotheses recombination.
Creates a new beam decoder instance. The following values are fetched from decoder_args:
- hypo_recombination (bool): Activates hypo recombination.
- beam (int): Absolute beam size. A beam of 12 means that we keep track of 12 active hypotheses.
- sub_beam (int): Number of children per hypothesis. Set to beam size if zero.
- pure_heuristic_scores (bool): Hypotheses to keep in the beam are normally selected according to the sum of partial hypo score and future cost estimates. If set to true, partial hypo scores are ignored.
- diversity_factor (float): If this is set to a positive value, we add diversity promoting penalization terms to the partial hypothesis scores following Li and Jurafsky, 2016.
- early_stopping (bool): If true, we stop when the best scoring hypothesis ends with </S>. If false, we stop when all hypotheses end with </S>. Enable if you are only interested in the single best decoding result. If you want to create full 12-best lists, disable.
Parameters: decoder_args (object) – Decoder configuration passed through from the configuration API.
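A minimal sketch of one time step of standard beam search, without recombination, heuristics, or early stopping. The `expand` callable is an assumed stand-in for scoring continuations with the predictors; it is not part of SGNMT's API.

```python
def beam_step(hypos, expand, beam):
    """One beam search time step.

    hypos:  list of (score, prefix) partial hypotheses (log scores,
            higher is better)
    expand: prefix -> list of (word, word_score) continuations
    Returns the `beam` best expanded hypotheses.
    """
    candidates = []
    for score, prefix in hypos:
        for word, word_score in expand(prefix):
            candidates.append((score + word_score, prefix + [word]))
    # Keep only the `beam` highest scoring candidates.
    candidates.sort(key=lambda h: h[0], reverse=True)
    return candidates[:beam]
```

The sub_beam option above corresponds to truncating each hypothesis' continuation list before pooling the candidates.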
cam.sgnmt.decoding.bigramgreedy module¶
Implementation of the bigram greedy search strategy
-
class cam.sgnmt.decoding.bigramgreedy.BigramGreedyDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
The bigram greedy decoder collects bigram statistics at each node expansion. After each decoding pass, it constructs a new hypothesis to rescore by greedily selecting bigrams and gluing them together. Afterwards, the new hypothesis is rescored and new bigram statistics are collected.
Note that this decoder does not support the max_length parameter as it is designed for fixed length decoding problems.
Also note that this decoder works only for bag-of-words problems. Do not use the bow predictor in combination with this decoder as it will hide the EOS scores which are important to estimate bigram scores.
Creates a new bigram greedy decoder. Do not use this decoder in combination with the bow predictor as it inherently already satisfies the bag-of-words constraints. The following values are fetched from decoder_args:
- trg_test (string): Path to a plain text file which defines the bag of words.
- max_node_expansions (int): Maximum number of node expansions for inadmissible pruning.
- early_stopping (boolean): Activates admissible pruning.
Parameters: decoder_args (object) – Decoder configuration passed through from the configuration API.
cam.sgnmt.decoding.bucket module¶
Implementation of the bucket search strategy
-
class cam.sgnmt.decoding.bucket.BucketDecoder(decoder_args, hypo_recombination, max_expansions=0, low_memory_mode=True, beam=1, pure_heuristic_scores=False, diversity_factor=-1.0, early_stopping=True, stochastic=False, bucket_selector='maxscore', bucket_score_strategy='difference', collect_stats_strategy='best')[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
The bucket decoder maintains separate buckets for each sentence length. The buckets contain partial hypotheses. In each iteration, the decoder selects a bucket, and expands the best hypothesis in this bucket by one token. The core of the bucket decoder is the bucket selection strategy. The following strategies are available:
- ‘iter’: Puts all buckets in a big loop and iterates through it. With this strategy, the number of hypothesis expansions is equally distributed over the buckets.
- ‘random’: (with stochastic=true and bucket_selector != ‘difference’) Randomly select a non-empty bucket.
- ‘difference’: Similar to the heuristic used by the restarting decoder. Select the bucket in which the difference between the best and second best hypothesis is minimal.
- ‘maxdiff’: Like ‘iter’, but filters out buckets in which the difference between the first and second hypo is larger than epsilon. If no such buckets exist, increase epsilon.
Create a new bucket decoder
Parameters: - decoder_args (object) – Decoder configuration passed through from the configuration API.
- hypo_recombination (boolean) – Activates hypothesis recombination. Hypos are tested only within a bucket
- max_expansions (int) – Maximum number of node expansions for inadmissible pruning.
- low_memory_mode (bool) – Switch on low memory mode at cost of some computational overhead. Limits the number of hypotheses in each bucket to the number of remaining node expansions
- beam (int) – Number of hypotheses which get expanded at once after selecting a bucket.
- pure_heuristic_scores (bool) – If false, hypos are scored with partial score plus future cost estimates. If true, only the future cost estimates are used
- diversity_factor (float) – If this is set to a positive value, we reorder hypos in a bucket by adding a term which counts how many hypos with the same parent have been expanded already
- early_stopping (boolean) – Admissible pruning (works only if scores are non-positive)
- stochastic (boolean) – Stochastic bucket selection. If the bucket selector is not ‘difference’, this results in random bucket selection. If bucket_selector is set to ‘difference’, buckets are randomly selected with probability according to the bucket score.
- bucket_selector (string) – Bucket selection strategy: ‘iter’, ‘maxscore’, ‘score’. See the class docstring for more info.
- bucket_score_strategy (string) – Determines the way the buckets are scored. ‘difference’: difference between the current best word score and the best hypo in the bucket; ‘absolute’: hypo score; ‘heap’: score of the top scoring hypo in the bucket; ‘constant’: constant score of 0.0.
- collect_stats_strategy (string) – best, full, or all. Defines how unigram estimates are collected for the heuristic.
cam.sgnmt.decoding.combibeam module¶
Implementation of beam search which applies combination_scheme at each time step.
-
class cam.sgnmt.decoding.combibeam.CombiBeamDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.beam.BeamDecoder
This beam search implementation is a modification to the hypo expansion strategy. Rather than selecting hypotheses based on the sum of the previous hypo scores and the current one, we apply combination_scheme in each time step. This makes it possible to use schemes like Bayesian combination on the word rather than the full sentence level.
Creates a new beam decoder instance. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:
combination_scheme (string): breakdown2score strategy
-
class cam.sgnmt.decoding.combibeam.CombiStatePartialHypo(initial_states=None)[source]¶
Bases: cam.sgnmt.decoding.core.PartialHypothesis
Identical to PartialHypothesis, but tracks the last-score-but-one for score combination.
cam.sgnmt.decoding.combination module¶
This module contains strategies to convert a score breakdown to the total score. This is commonly specified via the --combination_scheme parameter.
TODO: The breakdown2score interface is not very elegant, and has some overlap with the interpolation_strategy implementations.
-
cam.sgnmt.decoding.combination.breakdown2score_bayesian(working_score, score_breakdown, full=False, prev_score=None)[source]¶
This realizes score combination following the Bayesian LM interpolation scheme from (Allauzen and Riley, 2011): Bayesian Language Model Interpolation for Mobile Speech Input.
By setting K=T we define the predictor weights according to the scores the predictors give to the current partial hypothesis. The initial predictor weights are used as priors. TODO: could make more efficient use of working_score.
Parameters: - working_score (float) – Working combined score, which is the weighted sum of the scores in score_breakdown. Not used.
- score_breakdown (list) – Breakdown of the combined score into predictor scores
- full (bool) – If True, reevaluate all time steps. If False, assume that this function has been called in the previous time step.
Returns: float. Bayesian interpolated predictor scores
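The Bayesian interpolation idea can be illustrated with a simplified standalone sketch: the predictor weights at each step are the posterior over predictors given the prefix, with the initial weights as priors. This is an assumption-laden toy reimplementation (predictors emit log probabilities), not the module's actual code.

```python
import math

def bayesian_interpolation(score_breakdown, priors):
    """Toy Bayesian score combination.

    score_breakdown: per time step, a list of per-predictor log scores.
    priors: initial predictor weights, used as the prior P(k).
    """
    # acc[k] tracks log(prior_k * P_k(prefix)).
    acc = [math.log(p) for p in priors]
    total = 0.0
    for step_scores in score_breakdown:
        # Posterior predictor weights P(k | prefix), renormalized each step.
        norm = math.log(sum(math.exp(a) for a in acc))
        weights = [math.exp(a - norm) for a in acc]
        # Linearly interpolate the predictor distributions with these weights.
        total += math.log(sum(w * math.exp(s)
                              for w, s in zip(weights, step_scores)))
        acc = [a + s for a, s in zip(acc, step_scores)]
    return total
```

With a single predictor the weights stay at 1 and the result reduces to the plain sum of its log scores.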
-
cam.sgnmt.decoding.combination.breakdown2score_bayesian_loglin(working_score, score_breakdown, full=False, prev_score=None)[source]¶
Like the Bayesian combination scheme, but uses log-linear model combination rather than linear interpolation weights.
TODO: Implement incremental version of it, write weights into breakdowns.
-
cam.sgnmt.decoding.combination.breakdown2score_bayesian_state_dependent(working_score, score_breakdown, full=False, prev_score=None, lambdas=None)[source]¶
This realizes score combination following the Bayesian LM interpolation scheme from (Allauzen and Riley, 2011): Bayesian Language Model Interpolation for Mobile Speech Input.
By setting K=T we define the predictor weights according to the scores the predictors give to the current partial hypothesis. The initial predictor weights are used as priors.
Unlike breakdown2score_bayesian, this defines state-independent weights which affect how much the state-dependent mixture weights (alphas) are affected by scores from the other model.
Makes more efficient use of working_score and calculated priors when used incrementally.
Parameters: - working_score (float) – Working combined score, which is the weighted sum of the scores in score_breakdown. Not used.
- score_breakdown (list) – Breakdown of the combined score into predictor scores
- full (bool) – If True, reevaluate all time steps. If False, assume that this function has been called in the previous time step.
- prev_score – score of hypothesis without final step
- lambdas – np array of domain-task weights
Returns: float. Bayesian interpolated predictor scores
-
cam.sgnmt.decoding.combination.breakdown2score_length_norm(working_score, score_breakdown, full=False)[source]¶
Implements the combination scheme ‘length_norm’ by normalizing the sum of the predictor scores by the length of the current sequence (i.e. the length of score_breakdown). TODO: could make more efficient use of working_score.
Parameters: - working_score (float) – Working combined score, which is the weighted sum of the scores in score_breakdown. Not used.
- score_breakdown (list) – Breakdown of the combined score into predictor scores
- full (bool) – If True, reevaluate all time steps. If False, assume that this function has been called in the previous time step (not used).
Returns: float. Returns a length normalized working_score
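For illustration, ‘length_norm’ reduces to dividing the weighted score sum by the number of time steps. A minimal sketch, assuming each time step in score_breakdown holds (unweighted_score, predictor_weight) tuples as described under apply_predictors:

```python
def length_norm_score(score_breakdown):
    """Weighted sum of all predictor scores, divided by sequence length
    (the number of time steps in score_breakdown)."""
    working_score = sum(s * w for step in score_breakdown for s, w in step)
    return working_score / len(score_breakdown)
```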
-
cam.sgnmt.decoding.combination.breakdown2score_sum(working_score, score_breakdown, full=False)[source]¶
Implements the combination scheme ‘sum’ by always returning working_score.
Parameters: - working_score (float) – Working combined score, which is the weighted sum of the scores in score_breakdown
- score_breakdown (list) – Breakdown of the combined score into predictor scores (not used).
- full (bool) – If True, reevaluate all time steps. If False, assume that this function has been called in the previous time step (not used).
Returns: float. Returns working_score
cam.sgnmt.decoding.core module¶
Contains all the basic interfaces and abstract classes for decoders.
The Decoder class provides common functionality for all decoders. The Hypothesis class represents complete hypotheses, which are returned by decoders. PartialHypothesis is a helper class which can be used by predictors to represent translation prefixes.
-
cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_EXACT = 2¶
Exact – Normalize by 1 plus the number of words outside the vocabulary to make it a valid distribution again.
-
cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_NONE = 1¶
None – Do not apply any normalization.
-
cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_NON_ZERO = 5¶
Apply no normalization, but ensure the posterior contains only tokens with scores strictly < 0.0.
-
cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_REDUCED = 3¶
Reduced – Always normalize the closed vocabulary scores to the vocabulary which is defined by the open vocabulary predictors at each time step.
-
cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_RESCALE_UNK = 4¶
Rescale UNK – Divide the UNK scores by the number of words outside the vocabulary. Results in a valid distribution if predictor scores are stochastic.
-
class cam.sgnmt.decoding.core.Decoder(decoder_args)[source]¶
Bases: cam.sgnmt.utils.Observable
A Decoder instance represents a particular search strategy such as A*, beam search, greedy search etc. Decisions are made based on the outputs of one or many predictors, which are maintained by the Decoder instance.
Decoders are observable. They fire notifications after apply_predictors has been called. All heuristics are observing the decoder by default.
Initializes the decoder instance with no predictors or heuristics.
-
add_full_hypo(hypo)[source]¶
Adds a new full hypothesis to full_hypos. This can be used by implementing subclasses to add a new hypothesis to the result set. This method also notifies observers.
Parameters: hypo (Hypothesis) – New complete hypothesis
-
add_heuristic(heuristic)[source]¶
Adds a heuristic to the decoder. For future cost estimates, the sum of the estimates from all heuristics added so far will be used. The predictors used in this heuristic have to be set beforehand via set_heuristic_predictors().
Parameters: heuristic (Heuristic) – A heuristic to use for future cost estimates
-
add_predictor(name, predictor, weight=1.0)[source]¶
Adds a predictor to the decoder. This means that this predictor is going to be used to predict the next target word (see predict_next).
Parameters: - name (string) – Predictor name like ‘nmt’ or ‘fst’
- predictor (Predictor) – Predictor instance
- weight (float) – Predictor weight
-
apply_interpolation_strategy(pred_weights, non_zero_words, posteriors, unk_probs)[source]¶
Applies the interpolation strategies to find the predictor weights for this apply_predictors() call.
Parameters: - pred_weights (list) – a priori predictor weights
- non_zero_words (set) – All words with positive probability
- posteriors – Predictor posterior distributions calculated with predict_next()
- unk_probs – UNK probabilities of the predictors, calculated with get_unk_probability
Returns: A list of predictor weights.
-
apply_predictors(top_n=0)[source]¶
Get the distribution over the next word by combining the predictor scores.
Parameters: top_n (int) – If positive, return only the best n words.
Returns: Two dicts. combined maps target word ids to the combined score, score_breakdown contains the scores for each predictor separately, represented as tuples (unweighted_score, predictor_weight).
Return type: combined, score_breakdown
-
are_equal_predictor_states(states1, states2)[source]¶
This method applies is_equal on all predictors. It returns true if all predictor states are equal.
Parameters: - states1 (list) – First predictor states as returned by get_predictor_states
- states2 (list) – Second predictor states as returned by get_predictor_states
Returns: boolean. True if all predictor states are equal, False otherwise
-
static combi_arithmetic_unnormalized(x)[source]¶
Calculates the weighted sum (equivalently, the log of a weighted geometric mean if the values are log scores). Do not use with empty lists.
Parameters: x (list) – List of tuples [(out1, weight1), ...]
Returns: float. Weighted sum out1*weight1+out2*weight2...
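The documented behavior corresponds to a one-line weighted sum over (score, weight) tuples:

```python
def combi_arithmetic_unnormalized(x):
    """Weighted sum out1*weight1 + out2*weight2 + ... over (score, weight)
    tuples. Undefined (returns 0) for empty lists, hence the warning above."""
    return sum(out * weight for out, weight in x)
```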
-
decode(src_sentence)[source]¶
Decodes a single source sentence. This method has to be implemented by subclasses. It contains the core of the implemented search strategy. src_sentence is a list of source word ids representing the source sentence without <S> or </S> symbols. This method returns a list of hypotheses, ordered descending by score such that the first entry is the best decoding result. Implementations should delegate the scoring of hypotheses to the predictors via apply_predictors(), and organize predictor states with the methods consume(), get_predictor_states() and set_predictor_states(). In this way, the decoder is decoupled from the scoring modules.
Parameters: src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
Returns: list. A list of Hypothesis instances ordered by their score.
Raises: NotImplementedError – if the method is not implemented
-
estimate_future_cost(hypo)[source]¶
Uses all heuristics which have been added with add_heuristic to estimate the future cost for a given partial hypothesis. The estimates are used in heuristic based searches like A*. This function returns the future log cost (i.e. the lower the better), assuming that the last word in the partial hypothesis hypo is consumed next.
Parameters: hypo (PartialHypothesis) – Hypothesis for which to estimate the future cost given the current predictor state
Returns: float. Future cost
-
get_full_hypos_sorted()[source]¶
Returns full_hypos sorted by total score. Can be used by implementing subclasses as the return value of decode.
Returns: list. full_hypos sorted by total_score.
-
get_lower_score_bound()[source]¶
Intended to be called by implementing subclasses. Returns a lower bound on the best score of the current sentence. This is either read from the lower bounds file (if provided) or set to negative infinity.
Returns: float. Lower bound on the best score for the current sentence
-
get_max_expansions(max_expansions_param, src_sentence)[source]¶
This is a helper for decoders which support the max_node_expansions parameter. It returns the maximum number of node expansions for the given sentence.
Parameters: - max_expansions_param (int) – max_node_expansions parameter passed through from the config
- src_sentence (list) – Current source sentence
Returns: int. Maximum number of node expansions for this decoding task.
-
initialize_predictors(src_sentence)[source]¶
First, increases the sentence id counter and calls initialize() on all predictors. Then, initialize() is called for all heuristics.
Parameters: src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
-
set_heuristic_predictors(heuristic_predictors)[source]¶
Defines the list of predictors used by heuristics. This needs to be called before adding heuristics with add_heuristic().
Parameters: heuristic_predictors (list) – Predictors and their weights to be used with heuristics. Should be in the same form as Decoder.predictors, i.e. a list of (predictor, weight) tuples
-
class cam.sgnmt.decoding.core.Heuristic[source]¶
Bases: cam.sgnmt.utils.Observer
A Heuristic instance can be used to estimate the future costs for a given word in a given state. See the heuristics module for implementations.
Creates a heuristic without predictors.
-
estimate_future_cost(hypo)[source]¶
Estimates the future cost (i.e. negative score) given the states of the predictors set by set_predictors for a partial hypothesis hypo. Note that this function is not supposed to change predictor states. If (e.g. for the greedy heuristic) this is not possible, the predictor states must be changed back after execution by the implementing method.
Parameters: hypo (PartialHypo) – Hypothesis for which to estimate the future cost
Returns: float. The future cost estimate for this heuristic
-
initialize(src_sentence)[source]¶
Initializes the heuristic with the given source sentence. This is not passed through to the heuristic predictors automatically but handles initialization outside the predictors.
Parameters: src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
-
notify(message, message_type=1)[source]¶
This is the notification method from the Observer super class. We implement it with an empty method here, but implementing subclasses can override this method to get notifications from the decoder instance about generated posterior distributions.
Parameters: message (object) – The posterior sent by the decoder
-
class cam.sgnmt.decoding.core.Hypothesis(trgt_sentence, total_score, score_breakdown=[])[source]¶
Complete translation hypotheses are represented by an instance of this class. We store the produced sentence, the combined score, and a score breakdown into the separate predictor scores.
Creates a new full hypothesis.
Parameters: - trgt_sentence (list) – List of target word ids without <S> or </S> which make up the target sentence
- total_score (float) – combined total score of this hypo
- score_breakdown (list) – Predictor score breakdown for each target token in trgt_sentence
-
class cam.sgnmt.decoding.core.PartialHypothesis(initial_states=None)[source]¶
Bases: object
Represents a partial hypothesis in various decoders.
Creates a new partial hypothesis with zero score and an empty translation prefix.
Parameters: initial_states – Initial predictor states
-
cheap_expand(word, score, score_breakdown)[source]¶
Creates a new partial hypothesis, adding a new word to the translation prefix with the given probability. Does NOT update the predictor states but adds a flag which signals that the last word in this hypothesis has not been consumed yet by the predictors. This can save memory because we can reuse the current state for many hypotheses. It also saves computation as we do not consume words which are then discarded anyway by the search procedure.
Parameters: - word (int) – New word to add to the translation prefix
- score (float) – Word log probability which is to be added to the total hypothesis score
- score_breakdown (list) – Predictor score breakdown for the new word
-
expand(word, new_states, score, score_breakdown)[source]¶
Creates a new partial hypothesis, adding a new word to the translation prefix with the given probability, and updates the stored predictor states.
Parameters: - word (int) – New word to add to the translation prefix
- new_states (object) – Predictor states after consuming word
- score (float) – Word log probability which is to be added to the total hypothesis score
- score_breakdown (list) – Predictor score breakdown for the new word
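The difference between expand() and cheap_expand() can be illustrated with a toy class. This is a simplified sketch mirroring the documented semantics (the real PartialHypothesis stores more, e.g. the score breakdown):

```python
class PartialHypo:
    """Toy partial hypothesis with eager vs. lazy state updates."""

    def __init__(self, states=None):
        self.trgt_sentence = []
        self.score = 0.0
        self.predictor_states = states
        self.word_to_consume = None  # set by cheap_expand

    def expand(self, word, new_states, score):
        # Eager: the new hypothesis carries updated predictor states.
        hypo = PartialHypo(new_states)
        hypo.trgt_sentence = self.trgt_sentence + [word]
        hypo.score = self.score + score
        return hypo

    def cheap_expand(self, word, score):
        # Lazy: reuse the parent's states, flag the unconsumed word.
        hypo = PartialHypo(self.predictor_states)
        hypo.trgt_sentence = self.trgt_sentence + [word]
        hypo.score = self.score + score
        hypo.word_to_consume = word
        return hypo
```

A decoder would call consume(word_to_consume) only if the cheaply expanded hypothesis actually survives pruning.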
-
cam.sgnmt.decoding.dfs module¶
Implementation of the dfs search strategy
-
class cam.sgnmt.decoding.dfs.DFSDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
This decoder implements depth first search without using heuristics. This is the most efficient search algorithm for complete enumeration of the search space as it minimizes the number of get_state() and set_state() calls. Note that this DFS implementation has no cycle detection, i.e. if the search space has cycles this decoder may run into an infinite loop.
Creates a new DFS decoder instance. The following values are fetched from decoder_args:
- early_stopping (bool): Enable safe (admissible) branch pruning if the accumulated score is already worse than the currently best complete score. Do not use if scores can be positive.
- max_expansions (int): Maximum number of node expansions for inadmissible pruning.
Parameters: decoder_args (object) – Decoder configuration passed through from the configuration API.
-
decode(src_sentence)[source]¶
Decodes a single source sentence using depth first search. If max_expansions equals 0, this corresponds to exhaustive search for the globally best scoring hypothesis. Note that with early_stopping enabled, the returned set of hypotheses is not necessarily the global n-best list. To create an exact n-best list, disable both max_expansions and early_stopping in the constructor.
Parameters: src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
Returns: list. A list of Hypothesis instances ordered by their score. If max_expansions equals 0, the first element holds the global best scoring hypothesis.
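The admissible early_stopping pruning can be sketched with a toy recursion. This is an illustration of the principle, not DFSDecoder's code; it assumes all step scores are non-positive log scores, which is exactly why the docs warn against positive scores:

```python
def dfs_best(expand, state, score, best=float("-inf")):
    """Exhaustive DFS for the best complete hypothesis score.

    expand(state) -> list of (next_state, step_score); an empty list
    marks a complete hypothesis.
    """
    children = expand(state)
    if not children:
        return max(best, score)
    for next_state, step_score in children:
        # Admissible pruning: with non-positive step scores, the branch
        # can never beat the current best complete score.
        if score + step_score <= best:
            continue
        best = dfs_best(expand, next_state, score + step_score, best)
    return best
```

If a step score could be positive, a pruned branch might still recover, which would make the pruning unsafe.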
-
class cam.sgnmt.decoding.dfs.SimpleDFSDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
This is a stripped down version of the DFS decoder which is designed to explore the entire search space. SimpleDFS is intended to be used with a score_lower_bounds_file from a previous beam search run which already contains good lower bounds. SimpleDFS verifies whether the lower bound is actually the global best score.
SimpleDFS can only be used with a single predictor.
SimpleDFS does not support max_expansions or max_len_factor. early_stopping cannot be disabled.
Creates a new SimpleDFS decoder instance.
Parameters: decoder_args (object) – Decoder configuration passed through from the configuration API.
-
class cam.sgnmt.decoding.dfs.SimpleLengthDFSDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
This is a length dependent version of SimpleDFS. This decoder finds the global best scores for certain hypo lengths. The simplelendfs_lower_bounds_file contains lines of the form
<length1>:<lower-bound1> ... <lengthN>:<lower-boundN>
that specify length dependent score lower bounds. The decoder will search for the optimal model scores for the specified lengths.
SimpleDFS can only be used with a single predictor.
SimpleDFS does not support max_expansions or max_len_factor. early_stopping cannot be disabled.
Creates a new SimpleDFS decoder instance.
Parameters: decoder_args (object) – Decoder configuration passed through from the configuration API.
-
decode(src_sentence)[source]¶
Decodes a single source sentence exhaustively using depth first search under length constraints.
Parameters: src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
Returns: list. A list of Hypothesis instances ordered by their score.
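One plausible way to read a line in the lower bounds format shown above; the helper name is hypothetical, not SGNMT's actual loader:

```python
def parse_length_bounds(line):
    """Parse one line of '<length1>:<lower-bound1> ... <lengthN>:<lower-boundN>'
    into a {length: lower_bound} dict."""
    bounds = {}
    for entry in line.split():
        length, bound = entry.split(":")
        bounds[int(length)] = float(bound)
    return bounds
```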
-
cam.sgnmt.decoding.flip module¶
Implementation of the flip search strategy
-
class cam.sgnmt.decoding.flip.FlipCandidate(trgt_sentence, scores, bigram_scores, max_score)[source]¶
Bases: object
Helper class for FlipDecoder. Represents a full but not yet scored hypothesis which differs from an explored hypo by one flip or move operation.
Creates a new candidate hypothesis.
Parameters: - trgt_sentence (list) – Full target sentence
- scores (list) – Word level scores. Same length as trgt_sentence
- bigram_scores (dict) – Bigram scores collected along the parent hypothesis
- max_score (float) – Maximum possible score this candidate can achieve
-
class cam.sgnmt.decoding.flip.FlipDecoder(decoder_args, always_greedy=False)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
The flip decoder explores the search space by permuting already explored hypotheses with a single permutation operation. We support two operations: ‘flip’ flips the positions of two target tokens, and ‘move’ moves one target token to another location in the sentence.
Note that this decoder does not support the max_length parameter as it is designed for fixed length decoding problems.
Also note that this decoder works only for bag-of-words problems. Do not use the bow predictor in combination with this decoder as it will hide the EOS scores which are important to estimate bigram scores.
Creates a new flip decoder. Do not use this decoder in combination with the bow predictor as it inherently already satisfies the bag-of-words constraints. The following values are fetched from decoder_args:
- trg_test (string): Path to a plain text file which defines the bag of words.
- max_node_expansions (int): Maximum number of node expansions for inadmissible pruning.
- early_stopping (boolean): Activates admissible pruning.
Parameters: - decoder_args (object) – Decoder configuration passed through from the configuration API.
- always_greedy (boolean) – Per default, the flip decoder does forced decoding along the complete candidate sentence. Set to True to do greedy decoding from the backtraced node instead
cam.sgnmt.decoding.fstbeam module¶
Implementation of a beam search which uses an FST for synchronization.
-
class cam.sgnmt.decoding.fstbeam.FSTBeamDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.beam.BeamDecoder
This beam search variant synchronizes hypotheses if they share the same node in an FST. This is similar to the syncbeam strategy, but rather than using a dedicated synchronization symbol, we keep track of the state ID of each hypothesis in an FST. Hypotheses are expanded until all of them arrive at the same state id, and are then compared with each other to select the set of active hypotheses in the next time step.
Creates a new beam decoder instance with FST-based synchronization. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:
- max_word_len (int): Maximum length of a single word
- fst_path (string): Path to the FST.
cam.sgnmt.decoding.greedy module¶
Implementation of the greedy search strategy
-
class cam.sgnmt.decoding.greedy.GreedyDecoder(decoder_args)[source]¶
Bases: cam.sgnmt.decoding.core.Decoder
The greedy decoder does not revise decisions and therefore does not have to maintain predictor states. This implementation is thus particularly simple and can be used as a template for more complex decoders. The greedy decoder can be imitated with the BeamDecoder with beam size 1.
Initializes the greedy decoder.
-
decode(src_sentence)[source]¶
Decodes a single source sentence in a greedy way: always take the highest scoring word as the next word and proceed to the next position. This makes it possible to decode without using the predictors’ get_state() and set_state() methods, as we do not have to keep track of predictor states.
Parameters: src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
Returns: list. A list containing the single best Hypothesis instance.
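The greedy loop can be sketched against stubbed scoring and state-update callables. Here apply_predictors and consume are stand-ins for the decoder/predictor interface described in the core module, not the real methods:

```python
def decode_greedy(apply_predictors, consume, eos_id, max_len):
    """Greedy decoding: repeatedly take the argmax of the next-word
    distribution until EOS or the length limit.

    apply_predictors() -> dict {word_id: combined_score}
    consume(word_id)      updates the (stubbed) predictor state
    """
    sentence, total = [], 0.0
    for _ in range(max_len):
        posterior = apply_predictors()
        word = max(posterior, key=posterior.get)  # highest scoring word
        total += posterior[word]
        if word == eos_id:
            break
        sentence.append(word)
        consume(word)
    return sentence, total
```

Because only the current state is ever needed, no get_state()/set_state() bookkeeping appears anywhere in the loop.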
-
cam.sgnmt.decoding.heuristics module¶
Heuristics are used during A* decoding and are called to compose the estimated look ahead costs. The Heuristic super class is defined in the core module.
-
class cam.sgnmt.decoding.heuristics.GreedyHeuristic(decoder_args, cache_estimates=True)[source]¶
Bases: cam.sgnmt.decoding.core.Heuristic
This heuristic performs greedy decoding to get future cost estimates. This is expensive but can lead to very close estimates.
Creates a new GreedyHeuristic instance. The greedy heuristic performs full greedy decoding from the current state to get accurate cost estimates. However, this can be very expensive.
Parameters: - decoder_args (object) – Decoder configuration passed through from the configuration API.
- cache_estimates (bool) – Set to true to enable a cache for predictor states which have been visited during the greedy decoding.
-
class
cam.sgnmt.decoding.heuristics.
LastTokenHeuristic
[source]¶ Bases:
cam.sgnmt.decoding.core.Heuristic
This heuristic reflects the score of the last token in the translation prefix only, i.e. not the accumulated score. Using this with pure_heuristic_estimates leads to expanding the partial hypothesis whose last token has the best individual score. This can be useful in search spaces in which bad translation prefixes imply low individual scores later.
Creates a heuristic without predictors.
-
class
cam.sgnmt.decoding.heuristics.
PredictorHeuristic
[source]¶ Bases:
cam.sgnmt.decoding.core.Heuristic
The predictor heuristic relies on the
estimate_future_costs()
implementation of the predictors. Use this heuristic to access predictor-specific future cost functions, e.g. shortest path for the fst predictor. Creates a heuristic without predictors.
-
class
cam.sgnmt.decoding.heuristics.
ScorePerWordHeuristic
[source]¶ Bases:
cam.sgnmt.decoding.core.Heuristic
Using this heuristic results in length-normalized scores instead of the pure sum of predictor scores for a partial hypothesis. Therefore, it is not a heuristic in the classical A* sense. Instead, using the A* decoder with this heuristic simulates beam search, which always keeps the hypotheses with the best per-word scores.
Creates a heuristic without predictors.
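The length-normalization trick can be sketched as a tiny helper: the heuristic estimate is chosen so that, added to the accumulated score, it yields the per-word score. This is an illustration of the idea only, not the actual SGNMT implementation:

```python
def score_per_word_estimate(total_score, num_words):
    """Sketch: return an 'estimate' h such that total_score + h
    equals the length-normalized score total_score / num_words.
    Illustrative only; the real heuristic's interface differs."""
    if num_words == 0:
        return 0.0
    return total_score / num_words - total_score
```

With this estimate, A* ranks partial hypotheses by their per-word score, mimicking length-normalized beam search.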
-
class
cam.sgnmt.decoding.heuristics.
StatsHeuristic
(heuristic_scores_file='', collect_stats_strategy='best')[source]¶ Bases:
cam.sgnmt.decoding.core.Heuristic
This heuristic is based on the sum of unigram costs of consumed words. Unigram statistics are collected via a
UnigramTable
.Creates a new
StatsHeuristic
instance. The constructor initializes the unigram table.
Parameters: - heuristic_scores_file (string) – Path to the unigram scores which are used if this predictor estimates future costs
- collect_stats_strategy (string) – best, full, or all. Defines how unigram estimates are collected for heuristic
-
estimate_future_cost
(hypo)[source]¶ Returns the sum of heuristic unigram estimates of the words in the translation prefix of
hypo
. Combined with the hypo score, this leads to using the ratio between actual hypo score and an idealistic score (product of unigrams) to discriminate partial hypotheses.
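The unigram-sum estimate can be sketched as follows. The `unigram_scores` mapping and the `default` fallback for unseen words are hypothetical, standing in for the UnigramTable:

```python
def unigram_future_cost(prefix, unigram_scores, default=-10.0):
    """Sketch: sum per-word unigram log-scores over a translation
    prefix. `unigram_scores` maps word IDs to scores; `default` is
    a hypothetical fallback for words without an entry."""
    return sum(unigram_scores.get(w, default) for w in prefix)
```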
cam.sgnmt.decoding.interpolation module¶
This module contains interpolation strategies. The strategy is commonly specified via the --interpolation_strategy parameter.
-
class
cam.sgnmt.decoding.interpolation.
EntropyInterpolationStrategy
(vocab_size, cross_entropy)[source]¶ Bases:
cam.sgnmt.decoding.interpolation.InterpolationStrategy
The entropy interpolation strategy assigns weights to predictors according to the entropy of their posteriors relative to the other posteriors. We first build an n x n square matrix of (cross-)entropies between all predictors, and then weight according to the row sums.
We assume that predictor weights are log probabilities.
Constructor.
Parameters: - vocab_size (int) – Vocabulary size
- cross_entropy (bool) – If true, use cross entropy to other predictors. Otherwise, just use predictor distribution entropy.
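The matrix construction might be sketched as follows, assuming posteriors arrive as log-probability dicts over a shared vocabulary. How SGNMT turns the row sums into weights is not shown here; this sketch simply normalizes them to sum to one:

```python
import math

def entropy_weights(posteriors):
    """Sketch: build an n x n matrix of cross-entropies
    H(p_i, p_j) between predictor posteriors (log-probability
    dicts over a shared vocabulary) and derive weights from the
    row sums. Normalization details are assumptions."""
    n = len(posteriors)
    row_sums = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            # H(p_i, p_j) = -sum_w p_i(w) * log p_j(w)
            s += -sum(math.exp(lp_i) * posteriors[j][w]
                      for w, lp_i in posteriors[i].items())
        row_sums.append(s)
    total = sum(row_sums)
    return [s / total for s in row_sums]
```

With cross_entropy disabled, only the diagonal entries H(p_i, p_i), i.e. the plain entropies, would be used.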
-
class
cam.sgnmt.decoding.interpolation.
FixedInterpolationStrategy
[source]¶ Bases:
cam.sgnmt.decoding.interpolation.InterpolationStrategy
Null-object (GoF design pattern) implementation.
-
class
cam.sgnmt.decoding.interpolation.
InterpolationStrategy
[source]¶ Bases:
object
Base class for interpolation strategies.
-
find_weights
(pred_weights, non_zero_words, posteriors, unk_probs)[source]¶ Find interpolation weights for the current prediction.
Parameters: - pred_weights (list) – A priori predictor weights
- non_zero_words (set) – All words with positive probability
- posteriors – Predictor posterior distributions calculated
with
predict_next()
- unk_probs – UNK probabilities of the predictors, calculated
with
get_unk_probability
Returns: list of floats. The predictor weights for this prediction.
Raises: NotImplementedError
– if the method is not implemented
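To illustrate the find_weights() interface, here is a hypothetical minimal strategy that ignores the posteriors and returns uniform weights. A real strategy would subclass InterpolationStrategy; the base class is elided here to keep the sketch self-contained:

```python
class UniformInterpolationStrategy:
    """Hypothetical minimal strategy: return uniform weights
    regardless of the posteriors. Illustrates the find_weights()
    signature only; not part of SGNMT."""

    def find_weights(self, pred_weights, non_zero_words, posteriors,
                     unk_probs):
        # One equal weight per predictor, summing to 1.
        n = len(pred_weights)
        return [1.0 / n] * n
```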
-
-
class
cam.sgnmt.decoding.interpolation.
MoEInterpolationStrategy
(num_experts, args)[source]¶ Bases:
cam.sgnmt.decoding.interpolation.InterpolationStrategy
This class implements a predictor-level Mixture of Experts (MoE) model. In this scenario, we have a neural model which predicts predictor weights from the predictor outputs. See the sgnmt_moe project on how to train this gating network with TensorFlow.
Creates the computation graph of the MoE network and loads the checkpoint file. The following fields are fetched from
args
moe_config (string): Comma-separated <key>=<value> pairs specifying the MoE network. See the command line arguments of sgnmt_moe for a full description. Available keys: vocab_size, embed_size, activation, hidden_layer_size, preprocessing.
moe_checkpoint_dir (string): Checkpoint directory
n_cpu_threads (int): Number of CPU threads for TensorFlow
Parameters: - num_experts (int) – Number of predictors under the MoE model
- args (object) – SGNMT configuration object
-
find_weights
(pred_weights, non_zero_words, posteriors, unk_probs)[source]¶ Runs the MoE model to find interpolation weights.
Parameters: - pred_weights (list) – A prior predictor weights
- non_zero_words (set) – All words with positive probability
- posteriors – Predictor posterior distributions calculated
with
predict_next()
- unk_probs – UNK probabilities of the predictors, calculated
with
get_unk_probability
Returns: list of floats. The predictor weights for this prediction.
Raises: NotImplementedError
– if the method is not implemented
cam.sgnmt.decoding.lenbeam module¶
Implementation of the lenbeam search strategy
-
class
cam.sgnmt.decoding.lenbeam.
LengthBeamDecoder
(decoder_args)[source]¶ Bases:
cam.sgnmt.decoding.core.Decoder
This beam decoder variant finds hypotheses for all lengths up to the maximum hypo length. At each time step, all EOS extensions are added to the result set.
Creates a new beam decoder instance. The following values are fetched from decoder_args:
beam (int): Absolute beam size. A beam of 12 means that we keep track of 12 active hypotheses.
cam.sgnmt.decoding.mbrbeam module¶
This beam search uses an MBR-based criterion at each time step.
-
class
cam.sgnmt.decoding.mbrbeam.
MBRBeamDecoder
(decoder_args)[source]¶ Bases:
cam.sgnmt.decoding.beam.BeamDecoder
The MBR-based beam decoder does not select the n most likely hypotheses in each time step. Instead, it tries to find the translation with the best expected BLEU. Two strategies control the behavior of mbrbeam: the evidence_strategy and the selection_strategy. Available evidence strategies:
‘renorm’: Only makes use of the n-best expansions of the hypos in the current beam. It renormalizes the scores, and counts n-grams in the n^2 hypos.
‘maxent’: Applies the MaxEnt rule to the evidence space. It makes use of all partial hypos seen so far and updates its belief about the probability of an n-gram based on that. Following MaxEnt, we assume that translations outside the explored space are uniformly distributed.
Available selection strategies:
‘bleu’: Select the n hypotheses with the best expected BLEU.
‘oracle_bleu’: Select a subset with n elements which maximizes the expected maximum BLEU of one of the selected hypos. In other words, we optimize the oracle BLEU of the n-best list at each time step, where the n-best list consists of the active hypotheses in the beam.
Creates a new MBR beam decoder instance. We explicitly set early stopping to False since risk-free pruning is not supported by the MBR-based beam decoder. The MBR-based decoder fetches the following fields from
decoder_args
:
min_ngram_order (int): Minimum n-gram order
max_ngram_order (int): Maximum n-gram order
mbrbeam_smooth_factor (float): Smoothing factor for the evidence space
mbrbeam_evidence_strategy (string): Evidence strategy
mbrbeam_selection_strategy (string): Selection strategy
Parameters: decoder_args (object) – Decoder configuration passed through from the configuration API.
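The ‘renorm’ evidence strategy can be sketched under the assumption that hypotheses arrive as (token list, log score) pairs: renormalize the scores into a distribution over the n^2 expansions and accumulate expected n-gram counts. This illustrates the idea only, not SGNMT's implementation:

```python
import math
from collections import defaultdict

def renorm_ngram_posteriors(hypos, max_order=2):
    """Sketch of the 'renorm' idea: turn hypothesis log scores
    into a normalized distribution, then accumulate expected
    n-gram counts weighted by hypothesis probability. `hypos` is
    a list of (token_list, log_score) pairs; names illustrative."""
    log_total = math.log(sum(math.exp(s) for _, s in hypos))
    counts = defaultdict(float)
    for tokens, score in hypos:
        p = math.exp(score - log_total)  # renormalized probability
        for order in range(1, max_order + 1):
            for i in range(len(tokens) - order + 1):
                counts[tuple(tokens[i:i + order])] += p
    return counts
```

Expected n-gram counts of this kind are the statistics an expected-BLEU selection criterion would consume.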
cam.sgnmt.decoding.multisegbeam module¶
Implementation of beam search for predictors with multiple tokenizations.
-
class
cam.sgnmt.decoding.multisegbeam.
Continuation
(parent_hypo, pred_stubs, key='')[source]¶ Bases:
object
A continuation is a partial hypothesis plus the next word. A continuation can be incomplete if predictors use finer-grained tokenization and the score is not final yet.
Create a new continuation.
Parameters: - parent_hypo (PartialHypothesis) – hypo object encoding the state at the last word boundary
- pred_stubs (list) – List of
PredictorStub
objects, one for each predictor - key (string) – The lead key for this continuation. All stubs must be consistent with this key
-
calculate_score
(pred_weights, defaults=[])[source]¶ Calculates the full word score for this continuation using the predictor stub scores.
Parameters: - pred_weights (list) – Predictor weights. Length of this list must match the number of stubs
- defaults (list) – Score which should be used if a predictor stub is set to None
Returns: float. Full score of this continuation, or an optimistic estimate if the continuation is not complete.
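The scoring rule can be sketched as a weighted sum with per-predictor defaults for missing stubs. The names and signature here are illustrative, not the actual method:

```python
def calculate_score(stub_scores, pred_weights, defaults=None):
    """Sketch of the full-word score: each predictor stub
    contributes its accumulated score; a missing stub (None)
    falls back to the corresponding default score."""
    total = 0.0
    for i, (score, weight) in enumerate(zip(stub_scores, pred_weights)):
        if score is None:
            # Optimistic fallback for predictors without a stub yet.
            score = defaults[i] if defaults else 0.0
        total += weight * score
    return total
```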
-
generate_expanded_hypo
(decoder)[source]¶ This can be used to create a new
PartialHypothesis
which reflects the state after this continuation. This involves expanding the history by word
, updating the score and score_breakdown, and consuming the last tokens in the stub to save the final predictor states. If the continuation is complete, this will result in a new word-level hypothesis. If not, the generated hypo will indicate an incomplete word at the last position by using the word ID -1.
-
class
cam.sgnmt.decoding.multisegbeam.
EOWTokenizer
(path)[source]¶ Bases:
cam.sgnmt.decoding.multisegbeam.Tokenizer
This tokenizer reads word maps with explicit </w> endings. This can be used for subword unit based tokenizers.
-
class
cam.sgnmt.decoding.multisegbeam.
MixedTokenizer
(path)[source]¶ Bases:
cam.sgnmt.decoding.multisegbeam.Tokenizer
This tokenizer allows mixing word- and character-level tokenizations as proposed by Wu et al. (2016). Words with <b>, <m>, and <e> prefixes are treated as character-level tokens; all others are complete word-level tokens.
-
class
cam.sgnmt.decoding.multisegbeam.
MultisegBeamDecoder
(decoder_args, hypo_recombination, beam_size, tokenizations, early_stopping=True, max_word_len=25)[source]¶ Bases:
cam.sgnmt.decoding.core.Decoder
This is a version of beam search which can handle predictors with differing tokenizations. We assume that all tokenizations are consistent with words, i.e. no token crosses word boundaries. The search simulates beam search on the word level. At each time step, we keep the n best hypotheses on the word level. Predictor scores on finer-grained tokens are collapsed into a single score s.t. they can be combined with scores from other predictors. This decoder can produce words without entry in the word map. In this case, words are added to
io.trg_wmap
. Consider using theoutput_chars
option to avoid dealing with the updated word map in the output. Creates a new beam decoder instance for predictors with multiple tokenizations.
Parameters: - decoder_args (object) – Decoder configuration passed through from the configuration API.
- hypo_recombination (bool) – Activates hypo recombination
- beam_size (int) – Absolute beam size. A beam of 12 means that we keep track of 12 active hypotheses
- tokenizations (string) – Comma separated strings describing the predictor tokenizations
- early_stopping (bool) – If true, we stop when the best scoring hypothesis ends with </S>. If false, we stop when all hypotheses end with </S>. Enable if you are only interested in the single best decoding result. If you want to create full 12-best lists, disable
- max_word_len (int) – Maximum length of a single word
-
class
cam.sgnmt.decoding.multisegbeam.
PredictorStub
(tokens, pred_state)[source]¶ Bases:
object
A predictor stub models the state of a predictor given a continuation.
Creates a new stub for a certain predictor.
Parameters: - tokens (list) – List of token IDs which correspond to the key
- pred_state (object) – Predictor state before consuming
the last token in
tokens
-
expand
(token, token_score, pred_state)[source]¶ Creates a new predictor stub by adding a (scored) token.
Parameters: - token (int) – Token ID to add
- token_score (float) – Token score of the added token
- pred_state (object) – predictor state before consuming the added token
-
class
cam.sgnmt.decoding.multisegbeam.
Tokenizer
[source]¶ Bases:
object
A tokenizer translates between token sequences and string keys. It is mainly responsible for matching token sequences from different predictors together.
-
class
cam.sgnmt.decoding.multisegbeam.
WordMapper
[source]¶ Bases:
object
This class is responsible for the mapping between keys and word IDs. The multiseg beam search can produce words which are not in the original word map. This mapper adds these words to
io.trg_wmap
. This class implements the GoF singleton design pattern. Creates a new mapper instance and synchronizes it with
io.trg_wmap
.-
static
get_singleton
()[source]¶ Get singleton instance of the word mapper. This method implements lazy initialization.
Returns: WordMapper. Singleton WordMapper
instance.
-
get_word_id
(key)[source]¶ Finds a word ID for the given key. If no such key is in the current word map, create a new entry in
io.trg_wmap
.
Parameters: key (string) – The key to look up
Returns: int. Word ID corresponding to key. A new word ID is added if the key cannot be found in io.trg_wmap
-
singleton
= None¶ Singleton instance. Access via
get_singleton()
.
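The lazy singleton pattern described above can be sketched as follows. The synchronization with io.trg_wmap is omitted, and the internal fields are assumptions made for illustration:

```python
class WordMapper:
    """Sketch of the lazy-initialized singleton mapper. The real
    class also synchronizes with io.trg_wmap; the key-to-ID store
    here is a hypothetical simplification."""

    singleton = None  # Singleton instance; access via get_singleton()

    def __init__(self):
        self.key_to_id = {}
        self.next_id = 0

    @staticmethod
    def get_singleton():
        # Lazy initialization: construct on first access only.
        if WordMapper.singleton is None:
            WordMapper.singleton = WordMapper()
        return WordMapper.singleton

    def get_word_id(self, key):
        # Assign a fresh word ID for previously unseen keys.
        if key not in self.key_to_id:
            self.key_to_id[key] = self.next_id
            self.next_id += 1
        return self.key_to_id[key]
```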
-
class
cam.sgnmt.decoding.multisegbeam.
WordTokenizer
(path)[source]¶ Bases:
cam.sgnmt.decoding.multisegbeam.Tokenizer
This tokenizer implements a purely word-level tokenization. Keys are generated according to a standard word map.
-
cam.sgnmt.decoding.multisegbeam.
is_key_complete
(key)[source]¶ Returns true if the key is complete. Complete keys are marked with a blank symbol at the end of the string. A complete key corresponds to a full word, incomplete keys cannot be mapped to word IDs.
Parameters: key (string) – The key Returns: bool. Return true if the last character in key
is blank.
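A sketch matching the documented behavior (complete keys are marked by a trailing blank):

```python
def is_key_complete(key):
    """Sketch: a key is complete iff it ends with a blank (space)
    character, i.e. it corresponds to a full word."""
    return bool(key) and key[-1] == ' '
```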
cam.sgnmt.decoding.predlimitbeam module¶
Implementation of beam search with explicit limits on cumulative predictor scores at each node expansion.
-
class
cam.sgnmt.decoding.predlimitbeam.
PredLimitBeamDecoder
(decoder_args)[source]¶ Bases:
cam.sgnmt.decoding.beam.BeamDecoder
Beam search variant with explicit limits on the cumulative predictor scores at each node expansion.
Creates a new beam decoder with cumulative predictor score limits. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:
pred_limits (string): Comma-separated list of predictor score limits.
cam.sgnmt.decoding.restarting module¶
Implementation of the restarting search strategy
-
class
cam.sgnmt.decoding.restarting.
RestartingChild
(word, score, score_breakdown)[source]¶ Bases:
object
Helper class for
RestartingDecoder
representing a child object in the search tree. Creates a new child instance.
-
class
cam.sgnmt.decoding.restarting.
RestartingDecoder
(decoder_args, hypo_recombination, max_expansions=0, low_memory_mode=True, node_cost_strategy='difference', stochastic=False, always_single_step=False)[source]¶ Bases:
cam.sgnmt.decoding.core.Decoder
This decoder first creates a path to the final node greedily. Then, it looks for the node on this path with the smallest difference between best and second best child, and restarts greedy decoding from this point. In order to do so, it maintains a priority queue of all visited nodes, which is ordered by the difference between the worst expanded child and the best unexpanded one. If this queue is empty, we have visited the best path. This algorithm is similar to DFS but does not backtrack to the last call of the recursive function but to the one which is most promising.
Note that this algorithm is exact. It tries to exploit the problem characteristics of NMT search: reloading predictor states can be expensive, node expansion is even more expensive, but expanding already visited nodes is free, and there is no good admissible heuristic.
Note 2: This decoder does not work properly if predictor scores can be positive, because of admissible pruning.
Creates new Restarting decoder instance.
Parameters: - decoder_args (object) – Decoder configuration passed through from the configuration API.
- hypo_recombination (bool) – Activates hypo recombination
- max_expansions (int) – Maximum number of node expansions for inadmissible pruning.
- low_memory_mode (bool) – Switch on low memory mode at cost of some computational overhead as the set of open nodes is reduced after each decoding pass
- node_cost_strategy (string) – How to decide which node to restart from next
- stochastic (bool) – If true, select the next node to restart from randomly. If false, take the one with the best node score
- always_single_step (bool) – If false, do greedy decoding when restarting. If true, expand the hypothesis only by a single token
cam.sgnmt.decoding.sepbeam module¶
Implementation of beam search which does not combine all predictor scores but keeps only one predictor alive for each hypo in the beam. Good for approximate and efficient ensembling.
-
class
cam.sgnmt.decoding.sepbeam.
SepBeamDecoder
(decoder_args)[source]¶ Bases:
cam.sgnmt.decoding.beam.BeamDecoder
This beam search implementation breaks with the predictor abstraction via the
Decoder.apply_predictors()
andDecoder.consume()
interfaces. We do not use combined scores of all predictors, but link single predictors to hypotheses in the beam. On hypo expansion, we callpredict_next()
only on this predictor. This is suitable for approximate ensembling as it reduces the runtime to nearly that of a single system run. Note that
PartialHypothesis.predictor_states
holds a list withNone
objects except for one position. Also note that predictor weights are ignored for this decoding strategy.
Creates a new beam decoder instance for system level combination. See the docstring of the BeamDecoder constructor for a description of which arguments are fetched from decoder_args.
cam.sgnmt.decoding.syncbeam module¶
Implementation of beam search with explicit synchronization symbol
-
class
cam.sgnmt.decoding.syncbeam.
SyncBeamDecoder
(decoder_args)[source]¶ Bases:
cam.sgnmt.decoding.beam.BeamDecoder
This beam search implementation is a two level approach. Hypotheses are not compared after each iteration, but after consuming an explicit synchronization symbol. This is useful when SGNMT runs on the character level, but it makes more sense to compare hypos with same lengths in terms of number of words and not characters. The end-of-word symbol </w> can be used as synchronization symbol.
Creates a new beam decoder instance with explicit synchronization symbol. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:
sync_symb (int): Synchronization symbol. If negative, consider a hypothesis as closed when it ends with a terminal symbol (see syntax_min_terminal_id and syntax_max_terminal_id)
max_word_len (int): Maximum length of a single word
cam.sgnmt.decoding.syntaxbeam module¶
The syntax beam decoding strategy ensures diversity in the terminals.
-
class
cam.sgnmt.decoding.syntaxbeam.
SyntaxBeamDecoder
(decoder_args)[source]¶ Bases:
cam.sgnmt.decoding.beam.BeamDecoder
The syntax beam search strategy is an extension of beam search which ensures diversity amongst the terminals in the active hypotheses. The decoder clusters hypotheses by their terminal history. Each cluster cannot have more than beam_size hypos, and the number of clusters is capped at beam_size. This means that the effective beam size varies between beam_size and beam_size^2, and there are always beam_size different terminal histories in the active beam.
Creates a new syntax beam decoder instance. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:
syntax_min_terminal_id (int): All IDs smaller than this are NTs
syntax_max_terminal_id (int): All IDs larger than this are NTs
Module contents¶
This package contains the central interfaces for the decoder (in the
core
module), and the implementations of search strategies
(Decoder
).