Decoders

Decoders are search strategies for traversing the search space which is spanned by the predictors. The search strategy is selected with the --decoder argument.

Available decoders

  • greedy: Greedy decoding (similar to beam=1)
  • beam: Beam search as described in Bahdanau et al., 2015.
  • sepbeam: Associates predictors with hypos in beam search and applies only one predictor instead of all for hypo expansion.
  • syncbeam: Beam search which compares after consuming a special synchronization symbol instead of after each iteration.
  • syntaxbeam: Beam search which ensures diversity amongst terminal symbol histories.
  • mbrbeam: Diversity encouraging beam search which maximizes the expected BLEU.
  • combibeam: Beam search which applies --combination_scheme at each time step.
  • multisegbeam: Beam search with multiple segmentations.
  • dfs: Depth-first search. Use this for exact decoding or the complete enumeration of the search space. It cannot be used if the search space is too large (as in unrestricted NMT) since it performs exhaustive search. If predictor scores can be positive, set --early_stopping to false.
  • restarting: Like DFS but with better admissible pruning behavior.
  • astar: A* search. The heuristic function is configured using the --heuristics options.
  • bucket: Works best for bag problems. Maintains buckets for each hypo length and extends a hypo in a bucket by one before selecting the next bucket.
  • flip: This decoder works only for bag problems. It traverses the search space by switching two words in the hypothesis. Do not use the bow predictor with this search strategy.
  • bow: Restarting decoder optimized for bag-of-words problems.
  • bigramgreedy: Works best for bag problems. Collects bigram statistics and constructs hypos to score by greedily selecting high scoring bigrams. Do not use bow predictor with this search strategy.
  • vanilla: Original Blocks beam decoder. This bypasses the predictor framework and directly performs pure NMT beam decoding on the GPU. Use this when you do pure NMT decoding as this is usually faster than using a single nmt predictor, since the search can be parallelized on the GPU.

Detailed descriptions are available below in the modules.

Relevant modules

cam.sgnmt.decoding.astar module

Implementation of the A* search strategy

class cam.sgnmt.decoding.astar.AstarDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.core.Decoder

This decoder implements A*. For heuristics, see the decoding.core module for interfaces and the general handling of heuristics, and the decoding.heuristics package for heuristic implementations. This A* implementation does not have a ‘closed set’, i.e. we do not keep track of already visited states. Make sure that your search space is acyclic (normally it is, unless you decode on cyclic lattices with the fst predictor).

Creates a new A* decoder instance. The following values are fetched from decoder_args:

beam (int): Maximum number of active hypotheses.
pure_heuristic_scores (bool): For standard A* set this to
    false. If set to true, partial hypo scores are ignored when
    scoring hypotheses.
early_stopping (bool): If this is true, partial hypotheses
    with a score worse than the current best complete score are
    not expanded. This applies when nbest is larger than one and
    inadmissible heuristics are used.
nbest (int): If this is set to a positive value, we do not
    stop decoding at the first complete path, but continue search
    until we have collected this many complete hypotheses. With an
    admissible heuristic, this yields an exact n-best list.
Parameters:decoder_args (object) – Decoder configuration passed through from the configuration API.
decode(src_sentence)[source]

Decodes a single source sentence using A* search.
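
The A* expansion loop described above can be sketched as a standalone toy in plain Python. This is not the real AstarDecoder API: the scores(prefix) function (returning per-token log probabilities) and heuristic(prefix) (an optimistic future score estimate) are assumed interfaces invented for illustration.

```python
import heapq

EOS = 0  # assumed end-of-sentence id for this sketch

def astar_decode(scores, heuristic, nbest=1, beam=4):
    """Best-first search: pop the hypothesis with the best
    score + heuristic, expand it, and stop once nbest complete
    hypotheses have been collected."""
    # heap entries are (-priority, partial score, prefix)
    open_set = [(-heuristic(()), 0.0, ())]
    results = []
    while open_set and len(results) < nbest:
        _, score, prefix = heapq.heappop(open_set)
        if prefix and prefix[-1] == EOS:
            results.append((score, prefix))  # complete hypothesis
            continue
        posterior = scores(prefix)
        # expand only the beam best continuations
        for word, logprob in sorted(posterior.items(),
                                    key=lambda kv: -kv[1])[:beam]:
            new_prefix = prefix + (word,)
            new_score = score + logprob
            heapq.heappush(open_set,
                           (-(new_score + heuristic(new_prefix)),
                            new_score, new_prefix))
    return results
```

With an admissible heuristic (one that never underestimates the remaining score), the first popped complete hypothesis is globally optimal, which is why an exact n-best list is possible.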

cam.sgnmt.decoding.beam module

Implementation of the beam search strategy

class cam.sgnmt.decoding.beam.BeamDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.core.Decoder

This decoder implements standard beam search and several variants of it such as diversity promoting beam search and beam search with heuristic future cost estimates. This implementation supports risk-free pruning and hypotheses recombination.

Creates a new beam decoder instance. The following values are fetched from decoder_args:

hypo_recombination (bool): Activates hypo recombination.
beam (int): Absolute beam size. A beam of 12 means
    that we keep track of 12 active hypotheses.
sub_beam (int): Number of children per hypothesis. Set to
    beam size if zero.
pure_heuristic_scores (bool): Hypotheses to keep in the beam
    are normally selected according to the sum of the partial hypo
    score and the future cost estimate. If set to true, partial
    hypo scores are ignored.
diversity_factor (float): If this is set to a positive
    value, we add diversity promoting penalization terms to the
    partial hypothesis scores following Li and Jurafsky, 2016.
early_stopping (bool): If true, we stop when the best
    scoring hypothesis ends with </S>. If false, we stop when all
    hypotheses end with </S>. Enable if you are only interested in
    the single best decoding result. If you want to create full
    12-best lists, disable.
Parameters:decoder_args (object) – Decoder configuration passed through from the configuration API.
decode(src_sentence)[source]

Decodes a single source sentence using beam search.
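
The core iteration of standard beam search, as described above, can be illustrated with a self-contained toy (not the real BeamDecoder; the scores(prefix) interface and EOS id are assumptions for this sketch):

```python
EOS = 0  # assumed end-of-sentence id for this sketch

def beam_decode(scores, beam=2, max_len=10):
    """Keep the `beam` best partial hypotheses per time step;
    finished hypotheses are carried over unchanged."""
    hypos = [(0.0, ())]  # (score, prefix)
    for _ in range(max_len):
        if all(p and p[-1] == EOS for _, p in hypos):
            break  # all hypotheses have ended with EOS
        expanded = []
        for score, prefix in hypos:
            if prefix and prefix[-1] == EOS:
                expanded.append((score, prefix))  # keep finished hypo
                continue
            for word, logprob in scores(prefix).items():
                expanded.append((score + logprob, prefix + (word,)))
        # prune back down to the beam best hypotheses
        hypos = sorted(expanded, key=lambda h: -h[0])[:beam]
    return sorted(hypos, key=lambda h: -h[0])
```

Stopping when all hypotheses end with </S> corresponds to early_stopping=false; breaking as soon as the best hypothesis is finished would correspond to early_stopping=true.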

cam.sgnmt.decoding.bigramgreedy module

Implementation of the bigram greedy search strategy

class cam.sgnmt.decoding.bigramgreedy.BigramGreedyDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.core.Decoder

The bigram greedy decoder collects bigram statistics at each node expansion. After each decoding pass, it constructs a new hypothesis to rescore by greedily selecting bigrams and gluing them together. Afterwards, the new hypothesis is rescored and new bigram statistics are collected.

Note that this decoder does not support the max_length parameter as it is designed for fixed length decoding problems.

Also note that this decoder works only for bag-of-words problems. Do not use the bow predictor in combination with this decoder as it will hide the EOS scores which are important to estimate bigram scores.

Creates a new bigram greedy decoder. Do not use this decoder in combination with the bow predictor as it inherently already satisfies the bag-of-words constraints. The following values are fetched from decoder_args:

trg_test (string): Path to a plain text file which
    defines the bag of words.
max_node_expansions (int): Maximum number of node expansions
    for inadmissible pruning.
early_stopping (boolean): Activates admissible pruning.

Parameters:decoder_args (object) – Decoder configuration passed through from the configuration API.
decode(src_sentence)[source]

Decodes a single source sentence with the bigram greedy decoder

cam.sgnmt.decoding.bow module

Implementation of the bow search strategy

class cam.sgnmt.decoding.bow.BOWDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.core.Decoder

This decoder is designed to work well for bag-of-words decoding experiments. It is similar to the restarting decoder as it backtraces to a node in the search tree and does greedy decoding from that point. However, the strategy for selecting the node from which to restart differs: we select the node with the highest expected total score, which is estimated by taking the score of a full hypothesis with the same prefix plus the score in the posterior of the corresponding decoding run for the first deviating token. This heuristic assumes that the score of the full hypothesis will be the same as the previous one, just with the better score at the deviating node in its place. This assumption is more likely to hold in fixed length decoding problems, e.g. bag-of-words tasks.

Note that this decoder does not support the max_length parameter as it is designed for fixed length decoding problems.

Also note that even though this decoder is designed for bag problems, it does not by itself rule out variable length hypotheses. Therefore, you still need to use the bow predictor to enforce the bag-of-words constraint.

Creates a new bag decoder. The following values are fetched from decoder_args:

hypo_recombination (bool): Activates hypo recombination.
max_node_expansions (int): Maximum number of node expansions
    for inadmissible pruning.
stochastic_decoder (bool): If true, select the next node to
    restart from randomly. If false, take the one with the best
    node score.
early_stopping (bool): Activates inadmissible pruning. Do
    not use if you have positive scores.
decode_always_single_step (bool): If true, we do only a single
    expansion from the backtraced node and then select a new node.
    If false, we perform greedy decoding from that node until we
    reach a final node.
Parameters:decoder_args (object) – Decoder configuration passed through from the configuration API.
create_initial_node()[source]

Create the root node for the search tree.

decode(src_sentence)[source]

Decodes a single source sentence using BOW search.

greedy_decode(node, word, single_step)[source]

Helper function for greedy decoding from a certain point in the search tree.

class cam.sgnmt.decoding.bow.BOWNode(hypo, posterior, score_breakdown, prev_nodes)[source]

Bases: object

Helper class for BOWDecoder representing a child object in the search tree.

hypo

PartialHypothesis – Hypothesis corresponding to this node

posterior

dict – Scores on outgoing arcs

score_breakdown

dict – Score breakdowns for outgoing arcs

prev_nodes

list – Path from the root to this node

active_arcs

dict – Dictionary of unexplored outgoing arcs

Creates a new search tree node. All outgoing arcs are set to active by default.

Parameters:
  • hypo (PartialHypothesis) – Hypothesis corresponding to this node
  • posterior (dict) – Scores on outgoing arcs
  • score_breakdown (dict) – Score breakdowns for outgoing arcs
  • prev_nodes (list) – Path from the root to this node

cam.sgnmt.decoding.bucket module

Implementation of the bucket search strategy

class cam.sgnmt.decoding.bucket.BucketDecoder(decoder_args, hypo_recombination, max_expansions=0, low_memory_mode=True, beam=1, pure_heuristic_scores=False, diversity_factor=-1.0, early_stopping=True, stochastic=False, bucket_selector='maxscore', bucket_score_strategy='difference', collect_stats_strategy='best')[source]

Bases: cam.sgnmt.decoding.core.Decoder

The bucket decoder maintains separate buckets for each sentence length. The buckets contain partial hypotheses. In each iteration, the decoder selects a bucket, and expands the best hypothesis in this bucket by one token. The core of the bucket decoder is the bucket selection strategy. The following strategies are available:

  • ‘iter’: Puts all buckets in a big loop and iterates through it.

    With this strategy, the number of hypothesis expansions is equally distributed over the buckets

  • ‘random’: (with stochastic=true and bucket_selector != ‘difference’)

    Randomly select a non-empty bucket

  • ‘difference’: Similar to the heuristic used by the restarting

    decoder. Select the bucket in which the difference between best and second best hypothesis is minimal

  • ‘maxdiff’: Like ‘iter’, but filters buckets in which the

    difference between first and second hypo is larger than epsilon. If no such buckets exist, increase epsilon
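
As a toy illustration of the ‘difference’ strategy above, the following standalone sketch (not the real BucketDecoder API; the buckets-as-dict representation is an assumption) picks the bucket in which the gap between the best and second best hypothesis score is smallest:

```python
def select_bucket_difference(buckets):
    """buckets maps hypothesis length -> list of hypothesis
    scores (log probabilities). Returns the length of the bucket
    with the minimal best/second-best gap, or None."""
    best_bucket, best_gap = None, float("inf")
    for length, scores in buckets.items():
        if len(scores) < 2:
            continue  # need at least two hypos to compute a gap
        top = sorted(scores, reverse=True)
        gap = top[0] - top[1]
        if gap < best_gap:
            best_bucket, best_gap = length, gap
    return best_bucket
```

The intuition is that a small gap means the current best hypothesis in that bucket is not yet clearly dominant, so further expansion there is most informative.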

Create a new bucket decoder

Parameters:
  • decoder_args (object) – Decoder configuration passed through from the configuration API.
  • hypo_recombination (boolean) – Activates hypothesis recombination. Hypos are tested only within a bucket
  • max_expansions (int) – Maximum number of node expansions for inadmissible pruning.
  • low_memory_mode (bool) – Switch on low memory mode at cost of some computational overhead. Limits the number of hypotheses in each bucket to the number of remaining node expansions
  • beam (int) – Number of hypotheses which get expanded at once after selecting a bucket.
  • pure_heuristic_scores (bool) – If false, hypos are scored with partial score plus future cost estimates. If true, only the future cost estimates are used
  • diversity_factor (float) – If this is set to a positive value, we reorder hypos in a bucket by adding a term which counts how many hypos with the same parent have been expanded already
  • early_stopping (boolean) – Admissible pruning (works only if scores are non-positive)
  • stochastic (boolean) – Stochastic bucket selection. If the bucket selector is not ‘difference’, this results in random bucket selection. If bucket_selector is set to ‘difference’, buckets are randomly selected with probability according to the bucket score
  • bucket_selector (string) – Bucket selection strategy: ‘iter’, ‘maxscore’, ‘score’. See the class docstring for more info
  • bucket_score_strategy (string) – Determines the way the buckets are scored. ‘difference’ between current best word score and best hypo in bucket, ‘absolute’ hypo score, ‘heap’ score of top scoring hypo in bucket, ‘constant’ score of 0.0.
  • collect_stats_strategy (string) – best, full, or all. Defines how unigram estimates are collected for heuristic
decode(src_sentence)[source]

Decodes a single source sentence.

cam.sgnmt.decoding.combibeam module

Implementation of beam search which applies combination_scheme at each time step.

class cam.sgnmt.decoding.combibeam.CombiBeamDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.beam.BeamDecoder

This beam search implementation is a modification to the hypo expansion strategy. Rather than selecting hypotheses based on the sum of the previous hypo scores and the current one, we apply combination_scheme in each time step. This makes it possible to use schemes like Bayesian combination on the word rather than the full sentence level.

Creates a new beam decoder instance. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:

combination_scheme (string): breakdown2score strategy

cam.sgnmt.decoding.core module

Contains all the basic interfaces and abstract classes for decoders. The Decoder class provides common functionality for all decoders. The Hypothesis class represents complete hypotheses, which are returned by decoders. PartialHypothesis is a helper class which can be used by predictors to represent translation prefixes.

cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_EXACT = 2

Exact – Normalize by 1 plus the number of words outside the vocabulary to make it a valid distribution again

cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_NONE = 1

None – Do not apply any normalization.

cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_NON_ZERO = 5

Apply no normalization, but ensure posterior contains only tokens with scores strictly < 0.0.

cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_REDUCED = 3

Reduced – Always normalize the closed vocabulary scores to the vocabulary which is defined by the open vocabulary predictors at each time step.

cam.sgnmt.decoding.core.CLOSED_VOCAB_SCORE_NORM_RESCALE_UNK = 4

Rescale UNK – Divide the UNK scores by the number of words outside the vocabulary. Results in a valid distribution if predictor scores are stochastic.
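
The ‘Rescale UNK’ normalization above amounts to one line of arithmetic in log space. The following sketch is an illustration of that operation only, not the real implementation in decoding.core; the function name and signature are assumptions:

```python
import math

def rescale_unk(unk_score, n_oov):
    """Divide the UNK probability by the number of words outside
    the closed vocabulary (a subtraction in log space), spreading
    the UNK mass evenly over all OOV words."""
    return unk_score - math.log(n_oov)
```

If the closed vocabulary predictor is stochastic, the rescaled scores still sum to one over the full vocabulary, which is why this mode yields a valid distribution.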

class cam.sgnmt.decoding.core.Decoder(decoder_args)[source]

Bases: cam.sgnmt.utils.Observable

A Decoder instance represents a particular search strategy such as A*, beam search, greedy search etc. Decisions are made based on the outputs of one or many predictors, which are maintained by the Decoder instance.

Decoders are observable. They fire notifications after apply_predictors has been called. All heuristics are observing the decoder by default.

Initializes the decoder instance with no predictors or heuristics.

Parameters:
  • closed_vocabulary_normalization (string) – Defines the normalization behavior for closed vocabulary predictor scores. See the documentation to the CLOSED_VOCAB_SCORE_NORM_* variables for more information
  • max_len_factor (int) – Hypotheses are not longer than source sentence length times this. Needs to be supported by the search strategy implementation
  • lower_bounds_file (string) – Path to a file with lower bounds on hypothesis scores. If empty, all lower bounds are set to NEG_INF.
add_full_hypo(hypo)[source]

Adds a new full hypothesis to full_hypos. This can be used by implementing subclasses to add a new hypothesis to the result set. This method also notifies observers.

Parameters:hypo (Hypothesis) – New complete hypothesis
add_heuristic(heuristic)[source]

Add a heuristic to the decoder. For future cost estimates, the sum of the estimates from all heuristics added so far will be used. The predictors used in this heuristic have to be set before via set_heuristic_predictors()

Parameters:heuristic (Heuristic) – A heuristic to use for future cost estimates
add_predictor(name, predictor, weight=1.0)[source]

Adds a predictor to the decoder. This means that this predictor is going to be used to predict the next target word (see predict_next)

Parameters:
  • name (string) – Predictor name like ‘nmt’ or ‘fst’
  • predictor (Predictor) – Predictor instance
  • weight (float) – Predictor weight
apply_interpolation_strategy(pred_weights, non_zero_words, posteriors, unk_probs)[source]

Applies the interpolation strategies to find the predictor weights for this apply_predictors() call.

Parameters:
  • pred_weights (list) – a priori predictor weights
  • non_zero_words (set) – All words with positive probability
  • posteriors – Predictor posterior distributions calculated with predict_next()
  • unk_probs – UNK probabilities of the predictors, calculated with get_unk_probability
Returns:

A list of predictor weights.

apply_predictors(top_n=0)[source]

Get the distribution over the next word by combining the predictor scores.

Parameters:top_n (int) – If positive, return only the best n words.
Returns:Two dicts. combined maps target word ids to the combined score, score_breakdown contains the scores for each predictor separately, represented as tuples (unweighted_score, predictor_weight)
Return type:(combined, score_breakdown)
are_equal_predictor_states(states1, states2)[source]

This method applies is_equal on all predictors. It returns true if all predictor states are equal.

Parameters:
  • states1 (list) – First predictor states as returned by get_predictor_states
  • states2 (list) – Second predictor states as returned by get_predictor_states
Returns:

boolean. True if all predictor states are equal, False otherwise

static combi_arithmetic_unnormalized(x)[source]

Calculates the weighted sum (which corresponds to a weighted geometric mean if the values are log probabilities). Do not use with empty lists.

Parameters:x (list) – List of tuples [(out1, weight1), ...]
Returns:float. Weighted sum out1*weight1+out2*weight2...
static combi_geometric_unnormalized(x)[source]

Calculates the weighted geometric mean. Do not use with empty lists.

Parameters:x (list) – List of tuples [(out1, weight1), ...]
Returns:out1^weight1*out2^weight2...
Return type:float. Weighted geo. mean
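
The two combination methods documented above can be sketched as plain functions (the real ones are static methods on Decoder; these standalone versions are for illustration):

```python
from functools import reduce

def combi_arithmetic_unnormalized(x):
    """Weighted sum out1*weight1 + out2*weight2 + ... This is a
    log-linear combination when the outputs are log scores."""
    return sum(out * weight for out, weight in x)

def combi_geometric_unnormalized(x):
    """Weighted product out1^weight1 * out2^weight2 * ..."""
    return reduce(lambda acc, ow: acc * ow[0] ** ow[1], x, 1.0)
```

Both expect a non-empty list of (output, weight) tuples, matching the format of score_breakdown entries.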
consume(word)[source]

Calls consume() on all predictors.

decode(src_sentence)[source]

Decodes a single source sentence. This method has to be implemented by subclasses. It contains the core of the implemented search strategy. src_sentence is a list of source word ids representing the source sentence without <S> or </S> symbols. This method returns a list of hypotheses, ordered descending by score such that the first entry is the best decoding result. Implementations should delegate the scoring of hypotheses to the predictors via apply_predictors(), and organize predictor states with the methods consume(), get_predictor_states() and set_predictor_states(). In this way, the decoder is decoupled from the scoring modules.

Parameters:src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
Returns:list. A list of Hypothesis instances ordered by their score.
Raises:NotImplementedError – if the method is not implemented
estimate_future_cost(hypo)[source]

Uses all heuristics which have been added with add_heuristic to estimate the future cost for a given partial hypothesis. The estimates are used in heuristic based searches like A*. This function returns the future log cost (i.e. the lower the better), assuming that the last word in the partial hypothesis hypo is consumed next.

Parameters:hypo (PartialHypothesis) – Hypothesis for which to estimate the future cost given the current predictor state
Returns:float. Future cost
get_full_hypos_sorted()[source]

Returns full_hypos sorted by the total score. Can be used by implementing subclasses as return value of decode

Returns:list. full_hypos sorted by total_score.
get_lower_score_bound()[source]

Intended to be called by implementing subclasses. Returns a lower bound on the best score of the current sentence. This is either read from the lower bounds file (if provided) or set to negative infinity.

Returns:float. Lower bound on the best score for current sentence
get_max_expansions(max_expansions_param, src_sentence)[source]

This is a helper for decoders which support the max_node_expansions parameter. It returns the maximum number of node expansions for the given sentence.

Parameters:
  • max_expansions_param (int) – max_node_expansions parameter passed through from the config
  • src_sentence (list) – Current source sentence
Returns:

int. Maximum number of node expansions for this decoding task.

get_predictor_states()[source]

Calls get_state() on all predictors.

has_predictors()[source]

Returns true if predictors have been added to the decoder.

initialize_predictors(src_sentence)[source]

First, increases the sentence id counter and calls initialize() on all predictors. Then, initialize() is called for all heuristics.

Parameters:src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
remove_predictors()[source]

Removes all predictors of this decoder.

set_current_sen_id(sen_id)[source]
set_heuristic_predictors(heuristic_predictors)[source]

Define the list of predictors used by heuristics. This needs to be called before adding heuristics with add_heuristic()

Parameters:heuristic_predictors (list) – Predictors and their weights to be used with heuristics. Should be in the same form as Decoder.predictors, i.e. a list of (predictor, weight) tuples
set_predictor_combi_method(method)[source]

Defines how to accumulate scores over the sequence. Should be one of the combi_ methods defined below

Parameters:method (function) – A function which accepts a list of tuples [(out1, weight1), ...] and calculates a combined score, e.g. one of the combi_* methods
set_predictor_states(states)[source]

Calls set_state() on all predictors.

class cam.sgnmt.decoding.core.Heuristic[source]

Bases: cam.sgnmt.utils.Observer

A Heuristic instance can be used to estimate the future costs for a given word in a given state. See the heuristics module for implementations.

Creates a heuristic without predictors.

estimate_future_cost(hypo)[source]

Estimate the future cost (i.e. negative score) given the states of the predictors set by set_predictors for a partial hypothesis hypo. Note that this function is not supposed to change predictor states. If (e.g. for the greedy heuristic) this is not possible, the predictor states must be changed back after execution by the implementing method.

Parameters:hypo (PartialHypo) – Hypothesis for which to estimate the future cost
Returns:float. The future cost estimate for this heuristic
initialize(src_sentence)[source]

Initialize the heuristic with the given source sentence. This is not passed through to the heuristic predictors automatically but handles initialization outside the predictors.

Parameters:src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
notify(message, message_type=1)[source]

This is the notification method from the Observer super class. We implement it with an empty method here, but implementing subclasses can override it to receive notifications from the decoder instance about generated posterior distributions.

Parameters:message (object) – The posterior sent by the decoder
set_predictors(predictors)[source]

Set the predictors used by this heuristic.

Parameters:predictors (list) – Predictors and their weights to be used with this heuristic. Should be in the same form as Decoder.predictors, i.e. a list of (predictor, weight) tuples
class cam.sgnmt.decoding.core.Hypothesis(trgt_sentence, total_score, score_breakdown=[])[source]

Complete translation hypotheses are represented by an instance of this class. We store the produced sentence, the combined score, and a score breakdown to the separate predictor scores.

Creates a new full hypothesis.

Parameters:
  • trgt_sentence (list) – List of target word ids without <S> or </S> which make up the target sentence
  • total_score (float) – combined total score of this hypo
  • score_breakdown (list) – Predictor score breakdown for each target token in trgt_sentence
convert_to_char_level(cmap)[source]

Creates a new hypothesis on the character level from a hypothesis on the word level. Both objects will have the same total score, but the word tokens in trgt_sentence are replaced by characters and score_breakdown adds word scores on the first character of the word. The mapping from word ID to character ID sequence is realized by using utils.trg_wmap and the char-to-id map cmap.

Parameters:cmap (dict) – Mapping from character to character ID
Returns:Hypothesis. New hypo which corresponds to this hypo but is tokenized on the character instead of the word level.
class cam.sgnmt.decoding.core.PartialHypothesis(initial_states=None)[source]

Represents a partial hypothesis in various decoders.

Creates a new partial hypothesis with zero score and empty translation prefix.

Parameters:initial_states – Initial predictor states
cheap_expand(word, score, score_breakdown)[source]

Creates a new partial hypothesis adding a new word to the translation prefix with given probability. Does NOT update the predictor states but adds a flag which signals that the last word in this hypothesis has not been consumed yet by the predictors. This can save memory because we can reuse the current state for many hypotheses. It also saves computation as we do not consume words which are then discarded anyway by the search procedure.

Parameters:
  • word (int) – New word to add to the translation prefix
  • score (float) – Word log probability which is to be added to the total hypothesis score
  • score_breakdown (list) – Predictor score breakdown for the new word
expand(word, new_states, score, score_breakdown)[source]

Creates a new partial hypothesis adding a new word to the translation prefix with given probability and updates the stored predictor states.

Parameters:
  • word (int) – New word to add to the translation prefix
  • new_states (object) – Predictor states after consuming word
  • score (float) – Word log probability which is to be added to the total hypothesis score
  • score_breakdown (list) – Predictor score breakdown for the new word
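
The difference between expand() and cheap_expand() can be mirrored in a toy class. This is not the real PartialHypothesis; in particular, the attribute name word_to_consume is an assumption made for this sketch:

```python
class TinyHypo:
    """Toy partial hypothesis illustrating eager vs. lazy expansion."""

    def __init__(self, prefix=(), score=0.0, states=None):
        self.prefix, self.score, self.states = prefix, score, states
        self.word_to_consume = None  # set when predictors lag behind

    def expand(self, word, new_states, score):
        """Eager expansion: predictor states already consumed `word`."""
        return TinyHypo(self.prefix + (word,), self.score + score,
                        new_states)

    def cheap_expand(self, word, score):
        """Lazy expansion: reuse the parent's states and flag the
        pending word; predictors consume it later, and only if this
        hypothesis survives pruning."""
        child = TinyHypo(self.prefix + (word,), self.score + score,
                         self.states)
        child.word_to_consume = word
        return child
```

The saving comes from sharing one predictor state among all children of a hypothesis until the search decides which children to keep.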
generate_full_hypothesis()[source]

Create a Hypothesis instance from this hypothesis.

get_last_word()[source]

Get the last word in the translation prefix.

cam.sgnmt.decoding.decoder module

cam.sgnmt.decoding.dfs module

Implementation of the dfs search strategy

class cam.sgnmt.decoding.dfs.DFSDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.core.Decoder

This decoder implements depth first search without using heuristics. This is the most efficient search algorithm for complete enumeration of the search space as it minimizes the number of get_state() and set_state() calls. Note that this DFS implementation has no cycle detection, i.e. if the search space has cycles this decoder may run into an infinite loop.

Creates new DFS decoder instance. The following values are fetched from decoder_args:

decoder_args (object): Decoder configuration passed through
    from the configuration API.
early_stopping (bool): Enable safe (admissible) branch
    pruning if the accumulated score is already worse than the
    currently best complete score. Do not use if scores can be
    positive.
max_expansions (int): Maximum number of node expansions for
    inadmissible pruning.
Parameters:decoder_args (object) – Decoder configuration passed through from the configuration API.
decode(src_sentence)[source]

Decodes a single source sentence using depth first search. If max_expansions equals 0, this corresponds to exhaustive search for the globally best scoring hypothesis. Note that with early_stopping enabled, the returned set of hypotheses is not necessarily the global n-best list. To create an exact n-best list, disable both max_expansions and early_stopping in the constructor.

Parameters:src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
Returns:list. A list of Hypothesis instances ordered by their score. If max_expansions equals 0, the first element holds the global best scoring hypothesis
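
The DFS strategy with admissible early-stopping pruning can be sketched as a standalone toy (not the real DFSDecoder; scores(prefix) is an assumed interface, and the sketch assumes all scores are <= 0 so that a partial score already below the best complete score can never recover):

```python
EOS = 0  # assumed end-of-sentence id for this sketch

def dfs_decode(scores, max_len):
    """Exhaustive depth-first enumeration with admissible pruning."""
    best = {"score": float("-inf"), "sent": None}

    def explore(prefix, score):
        if score <= best["score"]:
            return  # admissible branch pruning (early stopping)
        if prefix and prefix[-1] == EOS:
            best["score"], best["sent"] = score, prefix
            return
        if len(prefix) >= max_len:
            return
        for word, logprob in scores(prefix).items():
            explore(prefix + (word,), score + logprob)

    explore((), 0.0)
    return best["sent"], best["score"]
```

Because the search backtracks along the tree, the real implementation can restore predictor states with a minimal number of get_state()/set_state() calls, which is the efficiency advantage mentioned above.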

cam.sgnmt.decoding.flip module

Implementation of the flip search strategy

class cam.sgnmt.decoding.flip.FlipCandidate(trgt_sentence, scores, bigram_scores, max_score)[source]

Bases: object

Helper class for FlipDecoder. Represents a full but not yet scored hypothesis which differs from an explored hypo by one flip or move operation.

Creates a new candidate hypothesis.

Parameters:
  • trgt_sentence (list) – Full target sentence
  • scores (list) – Word level scores. Same length as trgt_sentence
  • bigram_scores (dict) – Bigram scores collected along the parent hypothesis
  • max_score (float) – Maximum possible score this candidate can achieve
class cam.sgnmt.decoding.flip.FlipDecoder(decoder_args, always_greedy=False)[source]

Bases: cam.sgnmt.decoding.core.Decoder

The flip decoder explores the search space by permuting already explored hypotheses with a single permutation operation. We support two operations: ‘flip’ flips the position of two target tokens. ‘move’ moves one target token to another location in the sentence.

Note that this decoder does not support the max_length parameter as it is designed for fixed length decoding problems.

Also note that this decoder works only for bag-of-words problems. Do not use the bow predictor in combination with this decoder as it will hide the EOS scores which are important to estimate bigram scores.

Creates a new flip decoder. Do not use this decoder in combination with the bow predictor as it inherently already satisfies the bag-of-words constraints. The following values are fetched from decoder_args:

trg_test (string): Path to a plain text file which
    defines the bag of words.
max_node_expansions (int): Maximum number of node expansions
    for inadmissible pruning.
early_stopping (boolean): Activates admissible pruning.

Parameters:
  • decoder_args (object) – Decoder configuration passed through from the configuration API.
  • always_greedy (boolean) – By default, the flip decoder does forced decoding along the complete candidate sentence. Set to True to do greedy decoding from the backtraced node instead
decode(src_sentence)[source]

Decodes a single source sentence with the flip decoder

cam.sgnmt.decoding.greedy module

Implementation of the greedy search strategy

class cam.sgnmt.decoding.greedy.GreedyDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.core.Decoder

The greedy decoder does not revise decisions and therefore does not have to maintain predictor states. This makes the implementation particularly simple, so it can serve as a template for more complex decoders. The greedy decoder can be imitated with BeamDecoder using a beam size of 1.

Initialize the greedy decoder.

decode(src_sentence)[source]

Decode a single source sentence in a greedy way: Always take the highest scoring word as next word and proceed to the next position. This makes it possible to decode without using the predictors get_state() and set_state() methods as we do not have to keep track of predictor states.

Parameters:src_sentence (list) – List of source word ids without <S> or </S> which make up the source sentence
Returns:list. A list of a single best Hypothesis instance.
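The greedy loop described above can be sketched as follows. This is a minimal illustration, not the SGNMT implementation: `predict_next` and `consume` are hypothetical stand-ins for the predictor interface, and the EOS word ID is an assumption.

```python
EOS = 2  # assumed end-of-sentence word ID

def greedy_decode(predict_next, consume, max_len=100):
    """Always take the highest scoring word and move to the next position."""
    hypo, total = [], 0.0
    for _ in range(max_len):
        posterior = predict_next()  # dict: word ID -> log probability
        word = max(posterior, key=posterior.get)
        total += posterior[word]
        hypo.append(word)
        if word == EOS:
            break
        consume(word)
    return hypo, total
```

Since no alternative is ever reconsidered, no predictor state needs to be saved or restored.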

cam.sgnmt.decoding.heuristics module

Heuristics are used during A* decoding and are called to compose the estimated look ahead costs. The Heuristic super class is defined in the core module.

class cam.sgnmt.decoding.heuristics.GreedyHeuristic(decoder_args, cache_estimates=True)[source]

Bases: cam.sgnmt.decoding.core.Heuristic

This heuristic performs greedy decoding to get future cost estimates. This is expensive but can lead to very close estimates.

Creates a new GreedyHeuristic instance. The greedy heuristic performs full greedy decoding from the current state to get accurate cost estimates. However, this can be very expensive.

Parameters:
  • decoder_args (object) – Decoder configuration passed through from the configuration API.
  • cache_estimates (bool) – Set to true to enable a cache for predictor states which have been visited during the greedy decoding.
estimate_future_cost(hypo)[source]

Estimate the future cost by full greedy decoding. If self.cache_estimates is enabled, check cache first

estimate_future_cost_with_cache(hypo)[source]

Implementation of the future cost estimate with the cache enabled.

estimate_future_cost_without_cache(hypo)[source]

Implementation of the future cost estimate with the cache disabled.

initialize(src_sentence)[source]

Initialize the cache.

set_predictors(predictors)[source]

Override Decoder.set_predictors to redirect the predictors to self.decoder

class cam.sgnmt.decoding.heuristics.LastTokenHeuristic[source]

Bases: cam.sgnmt.decoding.core.Heuristic

This heuristic reflects the score of the last token in the translation prefix only, i.e. not the accumulated score. Using this with pure_heuristic_estimates leads to expanding the partial hypothesis whose last token has the best individual score. This can be useful in search spaces in which bad translation prefixes imply low individual scores later.

Creates a heuristic without predictors.

estimate_future_cost(hypo)[source]

Returns the negative score of the last token in hypo.

initialize(src_sentence)[source]

Empty method.

class cam.sgnmt.decoding.heuristics.PredictorHeuristic[source]

Bases: cam.sgnmt.decoding.core.Heuristic

The predictor heuristic relies on the estimate_future_costs() implementation of the predictors. Use this heuristic to access predictor specific future cost functions, e.g. shortest path for the fst predictor.

Creates a heuristic without predictors.

estimate_future_cost(hypo)[source]

Returns the weighted sum of predictor estimates.

initialize(src_sentence)[source]

Calls initialize_heuristic() on all predictors.

notify(message, message_type=1)[source]

This heuristic passes through notifications to the predictors.

class cam.sgnmt.decoding.heuristics.ScorePerWordHeuristic[source]

Bases: cam.sgnmt.decoding.core.Heuristic

Using this heuristic results in length normalized scores instead of the pure sum of predictor scores for a partial hypothesis. Therefore, it is not a heuristic like in the classical A* sense. Instead, using the A* decoder with this heuristic simulates beam search which always keeps the hypotheses with the best per word scores.

Creates a heuristic without predictors.

estimate_future_cost(hypo)[source]

A* puts cost - score on the heap. In order to simulate length-normalized beam search, we want to use -score/length as the partial hypothesis score. Therefore, this method returns -score/length + score.
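To see why this simulates per-word scoring: with a heap keyed on cost - score, substituting the returned estimate makes the effective ordering criterion -score/length. A small sketch under that assumption:

```python
def estimate_future_cost(score, length):
    # With heap key (estimate - score), the effective ordering criterion
    # becomes -score/length, i.e. the length-normalized score.
    return -score / length + score

score, length = -6.0, 3
heap_key = estimate_future_cost(score, length) - score
assert heap_key == -score / length  # per-word score
```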

initialize(src_sentence)[source]

Empty method.

class cam.sgnmt.decoding.heuristics.StatsHeuristic(heuristic_scores_file='', collect_stats_strategy='best')[source]

Bases: cam.sgnmt.decoding.core.Heuristic

This heuristic is based on the sum of unigram costs of consumed words. Unigram statistics are collected via a UnigramTable.

Creates a new StatsHeuristic instance. The constructor initializes the unigram table.

Parameters:
  • heuristic_scores_file (string) – Path to the unigram scores which are used if this predictor estimates future costs
  • collect_stats_strategy (string) – best, full, or all. Defines how unigram estimates are collected for the heuristic
estimate_future_cost(hypo)[source]

Returns the sum of heuristic unigram estimates of the words in the translation prefix of hypo. Combined with the hypo score, this leads to using the ratio between actual hypo score and an idealistic score (product of unigrams) to discriminate partial hypotheses.
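A minimal sketch of this estimate, assuming the translation prefix is a plain list of word IDs and the unigram table behaves like a dict (the real implementation uses a UnigramTable):

```python
def unigram_future_cost(prefix, unigram_estimates, unk_score=0.0):
    """Sum of unigram estimates over the words consumed so far."""
    return sum(unigram_estimates.get(w, unk_score) for w in prefix)
```

Comparing this sum against the actual hypothesis score then measures how far the hypothesis falls short of the idealistic product of unigrams.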

initialize(src_sentence)[source]

Calls reset to reset collected statistics from previous sentences

Parameters:src_sentence (list) – Not used
notify(message, message_type=1)[source]

Passing through to the unigram table self.estimates.

cam.sgnmt.decoding.mbrbeam module

This beam search uses an MBR-based criterion at each time step.

class cam.sgnmt.decoding.mbrbeam.MBRBeamDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.beam.BeamDecoder

The MBR-based beam decoder does not select the n most likely hypotheses in each timestep. Instead, it tries to find the translation with the best expected BLEU. Two strategies control the behavior of mbrbeam: the evidence_strategy and the selection_strategy. Available evidence strategies:

  • ‘renorm’: Only makes use of the n-best expansions of the hypos in the current beam. It renormalizes the scores and counts n-grams in the n^2 hypos.
  • ‘maxent’: Applies the MaxEnt rule to the evidence space. It makes use of all partial hypos seen so far and updates its belief about the probability of an n-gram based on that. Following MaxEnt, we assume that translations outside the explored space are uniformly distributed.

Available selection strategies:

  • ‘bleu’: Select the n hypotheses with the best expected BLEU.
  • ‘oracle_bleu’: Select a subset with n elements which maximizes the expected maximum BLEU of one of the selected hypos. In other words, we optimize the oracle BLEU of the n-best list at each time step, where the n-best list consists of the active hypotheses in the beam.

Creates a new MBR beam decoder instance. We explicitly set early stopping to False since risk-free pruning is not supported by the MBR-based beam decoder. The MBR-based decoder fetches the following fields from decoder_args:

min_ngram_order (int): Minimum n-gram order

max_ngram_order (int): Maximum n-gram order

mbrbeam_smooth_factor (float): Smoothing factor for evidence space

mbrbeam_evidence_strategy (string): Evidence strategy

mbrbeam_selection_strategy (string): Selection strategy
Parameters:decoder_args (object) – Decoder configuration passed through from the configuration API.
decode(src_sentence)[source]

Decodes a single source sentence using beam search.

cam.sgnmt.decoding.mbrbeam.is_sublist(needle, haystack)[source]

True if needle is a sublist of haystack, False otherwise. Could be more efficient with Boyer-Moore-Horspool but our needles are usually ngrams (ie. short), so this is a O(nm) implementation.
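A straightforward O(nm) implementation consistent with this description could look like:

```python
def is_sublist(needle, haystack):
    """True if needle occurs as a contiguous sublist of haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))
```

Since the needles are short n-grams, the naive scan is fast enough in practice and avoids the preprocessing overhead of Boyer-Moore-Horspool.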

cam.sgnmt.decoding.multisegbeam module

Implementation of beam search for predictors with multiple tokenizations.

class cam.sgnmt.decoding.multisegbeam.Continuation(parent_hypo, pred_stubs, key='')[source]

Bases: object

A continuation is a partial hypothesis plus the next word. A continuation can be incomplete if predictors use finer grained tokenization and the score is not final yet.

Create a new continuation.

Parameters:
  • parent_hypo (PartialHypothesis) – hypo object encoding the state at the last word boundary
  • pred_stubs (list) – List of PredictorStub objects, one for each predictor
  • key (string) – The lead key for this continuation. All stubs must be consistent with this key
calculate_score(pred_weights, defaults=[])[source]

Calculates the full word score for this continuation using the predictor stub scores.

Parameters:
  • pred_weights (list) – Predictor weights. Length of this list must match the number of stubs
  • defaults (list) – Score which should be used if a predictor stub is set to None
Returns:

float. Full score of this continuation, or an optimistic estimate if the continuation is not complete.
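The score combination can be sketched like this; for simplicity the stub scores are given as plain floats here, with None marking a predictor stub that has no final score yet (the real code operates on PredictorStub objects):

```python
def calculate_score(stub_scores, pred_weights, defaults=None):
    """Weighted sum of per-predictor scores. A None entry falls back to
    the corresponding default score, which yields an optimistic estimate
    for incomplete continuations."""
    defaults = defaults or []
    return sum(w * (s if s is not None else defaults[i])
               for i, (w, s) in enumerate(zip(pred_weights, stub_scores)))
```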

expand(decoder)[source]
generate_expanded_hypo(decoder)[source]

This can be used to create a new PartialHypothesis which reflects the state after this continuation. This involves expanding the history by one word, updating score and score_breakdown, and consuming the last tokens in the stub to save the final predictor states. If the continuation is complete, this will result in a new word-level hypothesis. If not, the generated hypo will indicate an incomplete word at the last position by using the word ID -1.

is_complete()[source]

Returns true if all predictor stubs are completed, i.e. the continuation can be mapped unambiguously to a word and the score is final.

class cam.sgnmt.decoding.multisegbeam.EOWTokenizer(path)[source]

Bases: cam.sgnmt.decoding.multisegbeam.Tokenizer

This tokenizer reads word maps with explicit </w> endings. This can be used for subword unit based tokenizers.

is_word_begin_token(token)[source]
key2tokens(key)[source]
tokens2key(tokens)[source]
class cam.sgnmt.decoding.multisegbeam.FSTTokenizer(path)[source]

Bases: cam.sgnmt.decoding.multisegbeam.Tokenizer

This tokenizer reads in an FST which transduces a sequence of subword units to a sequence of characters which constitute the key. The characters must use the global target cmap.

Loads subword->char FST, determinizes and minimizes it.

Parameters:path (string) – Path to an FST from subword unit to char sequence
EPS_ID = 0

OpenFST’s reserved ID for epsilon arcs.

is_word_begin_token(token)[source]

Returns true if there is an arc labeled with token from the start state in the token2char FST.

Parameters:token (int) – token ID
Returns:bool. True if a word can start with token
key2tokens(key)[source]
tokens2key(tokens)[source]
class cam.sgnmt.decoding.multisegbeam.MixedTokenizer(path)[source]

Bases: cam.sgnmt.decoding.multisegbeam.Tokenizer

This tokenizer allows mixing word- and character-level tokenizations as proposed by Wu et al. (2016). Words with <b>, <m>, and <e> prefixes are treated as character-level tokens; all others are complete word-level tokens.

is_word_begin_token(token)[source]
key2tokens(key)[source]
tokens2key(tokens)[source]
class cam.sgnmt.decoding.multisegbeam.MultisegBeamDecoder(decoder_args, hypo_recombination, beam_size, tokenizations, early_stopping=True, max_word_len=25)[source]

Bases: cam.sgnmt.decoding.core.Decoder

This is a version of beam search which can handle predictors with differing tokenizations. We assume that all tokenizations are consistent with words, i.e. no token crosses word boundaries. The search simulates beam search on the word level. At each time step, we keep the n best hypotheses on the word level. Predictor scores on finer-grained tokens are collapsed into a single score s.t. they can be combined with scores from other predictors. This decoder can produce words without entry in the word map. In this case, words are added to utils.trg_wmap. Consider using the output_chars option to avoid dealing with the updated word map in the output.

Creates a new beam decoder instance for predictors with multiple tokenizations.

Parameters:
  • decoder_args (object) – Decoder configuration passed through from the configuration API.
  • hypo_recombination (bool) – Activates hypo recombination
  • beam_size (int) – Absolute beam size. A beam of 12 means that we keep track of 12 active hypotheses
  • tokenizations (string) – Comma separated strings describing the predictor tokenizations
  • early_stopping (bool) – If true, we stop when the best scoring hypothesis ends with </S>. If false, we stop when all hypotheses end with </S>. Enable if you are only interested in the single best decoding result. If you want to create full 12-best lists, disable
  • max_word_len (int) – Maximum length of a single word
decode(src_sentence)[source]

Decodes a single source sentence using beam search.

class cam.sgnmt.decoding.multisegbeam.PredictorStub(tokens, pred_state)[source]

Bases: object

A predictor stub models the state of a predictor given a continuation.

Creates a new stub for a certain predictor.

Parameters:
  • tokens (list) – List of token IDs which correspond to the key
  • pred_state (object) – Predictor state before consuming the last token in tokens
expand(token, token_score, pred_state)[source]

Creates a new predictor stub by adding a (scored) token.

Parameters:
  • token (int) – Token ID to add
  • token_score (float) – Token score of the added token
  • pred_state (object) – predictor state before consuming the added token
has_full_score()[source]

Returns true if the full token sequence has been scored with the predictor, i.e. self.score is the final predictor score.

score_next(token_score)[source]

Can be called when the continuation is expanded and the score of the next token is available

Parameters:token_score (float) – Predictor score of self.tokens[self.score_pos]
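The interplay of score_next() and has_full_score() can be sketched as follows; the attribute names are assumptions inferred from the descriptions above, not the verified implementation:

```python
class PredictorStub:
    """Sketch: tracks how many tokens of the key have been scored."""

    def __init__(self, tokens, pred_state):
        self.tokens = tokens          # token IDs corresponding to the key
        self.pred_state = pred_state  # state before the last consumed token
        self.score = 0.0              # accumulated predictor score
        self.score_pos = 0            # next token position awaiting a score

    def score_next(self, token_score):
        """Add the predictor score of self.tokens[self.score_pos]."""
        self.score += token_score
        self.score_pos += 1

    def has_full_score(self):
        """True if every token in the sequence has been scored."""
        return self.score_pos >= len(self.tokens)
```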
class cam.sgnmt.decoding.multisegbeam.Tokenizer[source]

Bases: object

A tokenizer translates between token sequences and string keys. It is mainly responsible for matching token sequences from different predictors together.

is_word_begin_token(token)[source]

Returns true if token is only allowed at word begins.

key2tokens(key)[source]

Convert a key to a sequence of tokens. If this mapping is ambiguous, return one of the shortest mappings. Use UNK to match any (sub)string without token correspondence.

Parameters:key (string) – key to look up
Returns:list. List of token IDs
tokens2key(tokens)[source]

Convert a token sequence to a string key.

Parameters:tokens (list) – List of token IDs
Returns:String. The key for the token sequence
class cam.sgnmt.decoding.multisegbeam.WordMapper[source]

Bases: object

This class is responsible for the mapping between keys and word IDs. The multiseg beam search can produce words which are not in the original word map. This mapper adds these words to utils.trg_wmap.

This class implements the singleton design pattern (GoF).

Creates a new mapper instance and synchronizes it with utils.trg_wmap.

static get_singleton()[source]

Get singleton instance of the word mapper. This method implements lazy initialization.

Returns:WordMapper. Singleton WordMapper instance.
get_word_id(key)[source]

Finds a word ID for the given key. If no such key is in the current word map, create a new entry in utils.trg_wmap.

Parameters:key (string) – key to look up
Returns:int. Word ID corresponding to key. Add new word ID if the key cannot be found in utils.trg_wmap
singleton = None

Singleton instance. Access via get_singleton().

synchronize()[source]

Synchronizes the internal state of this mapper with utils.trg_wmap. This includes updating the reverse lookup table and finding the lowest free word ID which can be assigned to new words.
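The lazy singleton access can be sketched as follows; this is a minimal sketch of the pattern only and omits the synchronization with utils.trg_wmap:

```python
class WordMapper:
    """Sketch of the lazily initialized singleton."""

    singleton = None  # access only via get_singleton()

    @staticmethod
    def get_singleton():
        """Create the single WordMapper instance on first use."""
        if WordMapper.singleton is None:
            WordMapper.singleton = WordMapper()
        return WordMapper.singleton
```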

class cam.sgnmt.decoding.multisegbeam.WordTokenizer(path)[source]

Bases: cam.sgnmt.decoding.multisegbeam.Tokenizer

This tokenizer implements a purely word-level tokenization. Keys are generated according to a standard word map.

is_word_begin_token(token)[source]
key2tokens(key)[source]
tokens2key(tokens)[source]
cam.sgnmt.decoding.multisegbeam.is_key_complete(key)[source]

Returns true if the key is complete. Complete keys are marked with a blank symbol at the end of the string. A complete key corresponds to a full word, incomplete keys cannot be mapped to word IDs.

Parameters:key (string) – The key
Returns:bool. Return true if the last character in key is blank.
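Assuming the blank marker is an ordinary space character, the check reduces to:

```python
def is_key_complete(key):
    """A key maps to a full word iff it ends with the blank symbol."""
    return key.endswith(' ')
```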

cam.sgnmt.decoding.restarting module

Implementation of the restarting search strategy

class cam.sgnmt.decoding.restarting.RestartingChild(word, score, score_breakdown)[source]

Bases: object

Helper class for RestartingDecoder representing a child object in the search tree.

Creates a new child instance

class cam.sgnmt.decoding.restarting.RestartingDecoder(decoder_args, hypo_recombination, max_expansions=0, low_memory_mode=True, node_cost_strategy='difference', stochastic=False, always_single_step=False)[source]

Bases: cam.sgnmt.decoding.core.Decoder

This decoder first creates a path to the final node greedily. Then, it looks for the node on this path with the smallest difference between best and second best child, and restarts greedy decoding from this point. In order to do so, it maintains a priority queue of all visited nodes, which is ordered by the difference between the worst expanded child and the best unexpanded one. If this queue is empty, we have visited the best path. This algorithm is similar to DFS but does not backtrace to the last call of the recursive function but to the one which is most promising.

Note that this algorithm is exact. It tries to exploit the problem characteristics of NMT search: reloading predictor states can be expensive, node expansion is even more expensive but free for nodes which have already been visited, and there is no good admissible heuristic.

Note 2: This decoder does not work properly if predictor scores can be positive because of admissible pruning.

Creates new Restarting decoder instance.

Parameters:
  • decoder_args (object) – Decoder configuration passed through from the configuration API.
  • hypo_recombination (bool) – Activates hypo recombination
  • max_expansions (int) – Maximum number of node expansions for inadmissible pruning.
  • low_memory_mode (bool) – Switch on low memory mode at cost of some computational overhead as the set of open nodes is reduced after each decoding pass
  • node_cost_strategy (string) – How to decide which node to restart from next
  • stochastic (bool) – If true, select the next node to restart from randomly. If false, take the one with the best node score
  • always_single_step (bool) – If false, do greedy decoding when restarting. If true, expand the hypothesis only by a single token
create_initial_node()[source]

Create the root node for the search tree.

decode(src_sentence)[source]

Decodes a single source sentence using Restarting search.

greedy_decode(hypo)[source]

Helper function for greedy decoding from a certain point in the search tree.

class cam.sgnmt.decoding.restarting.RestartingNode(hypo, children)[source]

Bases: object

Helper class for RestartingDecoder representing a node in the search tree.

Creates a new node instance

cam.sgnmt.decoding.sepbeam module

Implementation of beam search which does not combine all predictor scores but keeps only one predictor alive for each hypo in the beam. Good for approximate and efficient ensembling.

class cam.sgnmt.decoding.sepbeam.SepBeamDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.beam.BeamDecoder

This beam search implementation breaks with the predictor abstraction via the Decoder.apply_predictors() and Decoder.consume() interfaces. We do not use combined scores of all predictors, but link single predictors to hypotheses in the beam. On hypo expansion, we call predict_next() only on this predictor. This is suitable for approximated ensembling as it reduces the runtime nearly to a single system run.

Note that PartialHypothesis.predictor_states holds a list with None objects except for one position.

Also note that predictor weights are ignored for this decoding strategy.

Creates a new beam decoder instance for system level combination. See the docstring of the BeamDecoder constructor for a description of which arguments are fetched from decoder_args.

cam.sgnmt.decoding.syncbeam module

Implementation of beam search with explicit synchronization symbol

class cam.sgnmt.decoding.syncbeam.SyncBeamDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.beam.BeamDecoder

This beam search implementation is a two level approach. Hypotheses are not compared after each iteration, but after consuming an explicit synchronization symbol. This is useful when SGNMT runs on the character level, but it makes more sense to compare hypos with same lengths in terms of number of words and not characters. The end-of-word symbol </w> can be used as synchronization symbol.

Creates a new beam decoder instance with explicit synchronization symbol. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:

sync_symb (int): Synchronization symbol. If negative, consider a hypothesis as closed when it ends with a terminal symbol (see syntax_min_terminal_id and syntax_max_terminal_id)

max_word_len (int): Maximum length of a single word

cam.sgnmt.decoding.syntaxbeam module

The syntax beam decoding strategy ensures diversity in the terminals.

class cam.sgnmt.decoding.syntaxbeam.SyntaxBeamDecoder(decoder_args)[source]

Bases: cam.sgnmt.decoding.beam.BeamDecoder

The syntax beam search strategy is an extension of beam search which ensures diversity amongst the terminals in the active hypotheses. The decoder clusters hypotheses by their terminal history. Each cluster cannot have more than beam_size hypos, and the number of clusters is topped by beam_size. This means that the effective beam size varies between beam_size and beam_size^2, and there are always beam_size different terminal histories in the active beam.

Creates a new beam decoder instance for system level combination. In addition to the constructor of BeamDecoder, the following values are fetched from decoder_args:

syntax_min_terminal_id (int): All IDs smaller than this are NTs

syntax_max_terminal_id (int): All IDs larger than this are NTs
decode(src_sentence)[source]

Decodes a single source sentence using beam search.

is_terminal(tok)[source]
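Given the two thresholds above, terminal membership is presumably a simple range check. A sketch with hypothetical default IDs:

```python
def is_terminal(tok, min_terminal_id=3, max_terminal_id=30002):
    """True iff tok falls into the assumed terminal ID range."""
    return min_terminal_id <= tok <= max_terminal_id
```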

Module contents

This package contains the central interfaces for the decoder (in the core module), and the implementations of search strategies (Decoder).