Command-line reference

SGNMT provides decode.py and batch_decode.py for decoding and train.py for NMT training. The neural word alignment script align.py is only available for the Blocks implementation. All scripts can be configured via the command line or a configuration file. For a quick overview of the available parameters, use --help:

python decode.py --help
python batch_decode.py --help
python train.py --help
python align.py --help
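
For example, a simple beam decoding run can be configured entirely on the command line using the options documented below (model and data paths here are placeholders):

python decode.py --decoder beam --beam 12 --predictors nmt --nmt_path ./models/nmt1 --src_test test.ids.en --outputs text,nbest --output_path sgnmt-out.%s

The same settings can be collected in a configuration file and passed via --config_file. The following is only a minimal sketch which assumes that the keys mirror the long option names without the leading dashes; the file name config.ini is arbitrary:

decoder: beam
beam: 12
predictors: nmt
nmt_path: ./models/nmt1
src_test: test.ids.en

python decode.py --config_file config.ini --outputs text,nbest --output_path sgnmt-out.%s

Note that, as stated below, settings in the configuration file override conflicting command line arguments.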

The complete and detailed list of parameters is provided below.

Decoding

usage: decode.py [-h] [--config_file CONFIG_FILE]
                 [--verbosity {debug,info,warn,error}] [--min_score MIN_SCORE]
                 [--range RANGE] [--src_test SRC_TEST] [--en_test EN_TEST]
                 [--indexing_scheme {blocks,tf,t2t}]
                 [--legacy_indexing LEGACY_INDEXING]
                 [--input_method {dummy,file,shell,stdin}]
                 [--log_sum {tropical,log}]
                 [--single_cpu_thread SINGLE_CPU_THREAD] [--beam BEAM]
                 [--decoder {greedy,beam,multisegbeam,syncbeam,sepbeam,syntaxbeam,dfs,restarting,bow,flip,bucket,bigramgreedy,astar,vanilla}]
                 [--hypo_recombination HYPO_RECOMBINATION]
                 [--allow_unk_in_output ALLOW_UNK_IN_OUTPUT]
                 [--max_node_expansions MAX_NODE_EXPANSIONS]
                 [--max_len_factor MAX_LEN_FACTOR]
                 [--early_stopping EARLY_STOPPING] [--heuristics HEURISTICS]
                 [--heuristic_predictors HEURISTIC_PREDICTORS]
                 [--multiseg_tokenizations MULTISEG_TOKENIZATIONS]
                 [--cache_heuristic_estimates CACHE_HEURISTIC_ESTIMATES]
                 [--pure_heuristic_scores PURE_HEURISTIC_SCORES]
                 [--restarting_node_score {difference,absolute,constant,expansions}]
                 [--low_decoder_memory LOW_DECODER_MEMORY]
                 [--stochastic_decoder STOCHASTIC_DECODER]
                 [--decode_always_single_step DECODE_ALWAYS_SINGLE_STEP]
                 [--flip_strategy {move,flip}]
                 [--bucket_selector BUCKET_SELECTOR]
                 [--bucket_score_strategy {difference,heap,absolute,constant}]
                 [--collect_statistics {best,full,all}]
                 [--heuristic_scores_file HEURISTIC_SCORES_FILE]
                 [--score_lower_bounds_file SCORE_LOWER_BOUNDS_FILE]
                 [--decoder_diversity_factor DECODER_DIVERSITY_FACTOR]
                 [--sync_symbol SYNC_SYMBOL] [--max_word_len MAX_WORD_LEN]
                 [--nbest NBEST] [--output_fst_unk_id OUTPUT_FST_UNK_ID]
                 [--fst_unk_id FST_UNK_ID] [--output_path OUTPUT_PATH]
                 [--outputs OUTPUTS] [--remove_eos REMOVE_EOS]
                 [--src_wmap SRC_WMAP] [--trg_wmap TRG_WMAP]
                 [--trg_cmap TRG_CMAP] [--predictors PREDICTORS]
                 [--predictor_weights PREDICTOR_WEIGHTS]
                 [--closed_vocabulary_normalization {none,exact,reduced,rescale_unk}]
                 [--combination_scheme {sum,length_norm,bayesian}]
                 [--apply_combination_scheme_to_partial_hypos APPLY_COMBINATION_SCHEME_TO_PARTIAL_HYPOS]
                 [--pred_src_vocab_size PRED_SRC_VOCAB_SIZE]
                 [--pred_trg_vocab_size PRED_TRG_VOCAB_SIZE]
                 [--length_normalization LENGTH_NORMALIZATION]
                 [--nmt_config NMT_CONFIG] [--nmt_path NMT_PATH]
                 [--nmt_engine {none,blocks,tensorflow}]
                 [--nmt_model_selector {params,bleu,time}]
                 [--cache_nmt_posteriors CACHE_NMT_POSTERIORS]
                 [--gnmt_beta GNMT_BETA]
                 [--layerbylayer_terminal_strategy {none,force,skip}]
                 [--syntax_max_depth SYNTAX_MAX_DEPTH]
                 [--syntax_root_id SYNTAX_ROOT_ID]
                 [--syntax_pop_id SYNTAX_POP_ID]
                 [--syntax_max_terminal_id SYNTAX_MAX_TERMINAL_ID]
                 [--syntax_terminal_list SYNTAX_TERMINAL_LIST]
                 [--t2t_usr_dir T2T_USR_DIR] [--t2t_model T2T_MODEL]
                 [--t2t_problem T2T_PROBLEM]
                 [--t2t_hparams_set T2T_HPARAMS_SET]
                 [--t2t_checkpoint_dir T2T_CHECKPOINT_DIR]
                 [--t2t_src_vocab_size T2T_SRC_VOCAB_SIZE]
                 [--t2t_trg_vocab_size T2T_TRG_VOCAB_SIZE]
                 [--nizza_model NIZZA_MODEL]
                 [--nizza_hparams_set NIZZA_HPARAMS_SET]
                 [--nizza_checkpoint_dir NIZZA_CHECKPOINT_DIR]
                 [--src_test_raw SRC_TEST_RAW]
                 [--length_model_weights LENGTH_MODEL_WEIGHTS]
                 [--use_length_point_probs USE_LENGTH_POINT_PROBS]
                 [--length_model_offset LENGTH_MODEL_OFFSET]
                 [--extlength_path EXTLENGTH_PATH]
                 [--unk_count_lambdas UNK_COUNT_LAMBDAS] [--wc_word WC_WORD]
                 [--ngramc_path NGRAMC_PATH] [--ngramc_order NGRAMC_ORDER]
                 [--ngramize_min_order NGRAMIZE_MIN_ORDER]
                 [--ngramize_max_order NGRAMIZE_MAX_ORDER]
                 [--ngramc_discount_factor NGRAMC_DISCOUNT_FACTOR]
                 [--skipvocab_max_id SKIPVOCAB_MAX_ID]
                 [--skipvocab_stop_size SKIPVOCAB_STOP_SIZE]
                 [--trg_test TRG_TEST] [--fr_test FR_TEST]
                 [--forcedlst_sparse_feat FORCEDLST_SPARSE_FEAT]
                 [--use_nbest_weights USE_NBEST_WEIGHTS]
                 [--bow_heuristic_strategies BOW_HEURISTIC_STRATEGIES]
                 [--bow_accept_subsets BOW_ACCEPT_SUBSETS]
                 [--bow_accept_duplicates BOW_ACCEPT_DUPLICATES]
                 [--bow_diversity_heuristic_factor BOW_DIVERSITY_HEURISTIC_FACTOR]
                 [--src_idxmap SRC_IDXMAP] [--en_idxmap EN_IDXMAP]
                 [--trg_idxmap TRG_IDXMAP] [--fr_idxmap FR_IDXMAP]
                 [--altsrc_test ALTSRC_TEST] [--word2char_map WORD2CHAR_MAP]
                 [--fsttok_path FSTTOK_PATH]
                 [--fsttok_max_pending_score FSTTOK_MAX_PENDING_SCORE]
                 [--rules_path RULES_PATH]
                 [--use_grammar_weights USE_GRAMMAR_WEIGHTS]
                 [--grammar_feature_weights GRAMMAR_FEATURE_WEIGHTS]
                 [--srilm_path SRILM_PATH]
                 [--srilm_convert_to_ln SRILM_CONVERT_TO_LN]
                 [--nplm_path NPLM_PATH] [--rnnlm_path RNNLM_PATH]
                 [--rnnlm_config RNNLM_CONFIG] [--srilm_order SRILM_ORDER]
                 [--normalize_nplm_probs NORMALIZE_NPLM_PROBS]
                 [--fst_path FST_PATH] [--rtn_path RTN_PATH]
                 [--fst_skip_bos_weight FST_SKIP_BOS_WEIGHT]
                 [--fst_to_log FST_TO_LOG] [--use_fst_weights USE_FST_WEIGHTS]
                 [--use_rtn_weights USE_RTN_WEIGHTS]
                 [--minimize_rtns MINIMIZE_RTNS]
                 [--remove_epsilon_in_rtns REMOVE_EPSILON_IN_RTNS]
                 [--normalize_fst_weights NORMALIZE_FST_WEIGHTS]
                 [--normalize_rtn_weights NORMALIZE_RTN_WEIGHTS]
                 [--nmt_config2 NMT_CONFIG2] [--nmt_path2 NMT_PATH2]
                 [--nmt_engine2 NMT_ENGINE2] [--t2t_model2 T2T_MODEL2]
                 [--t2t_problem2 T2T_PROBLEM2]
                 [--t2t_hparams_set2 T2T_HPARAMS_SET2]
                 [--t2t_checkpoint_dir2 T2T_CHECKPOINT_DIR2]
                 [--pred_src_vocab_size2 PRED_SRC_VOCAB_SIZE2]
                 [--pred_trg_vocab_size2 PRED_TRG_VOCAB_SIZE2]
                 [--rnnlm_config2 RNNLM_CONFIG2] [--rnnlm_path2 RNNLM_PATH2]
                 [--src_test2 SRC_TEST2] [--altsrc_test2 ALTSRC_TEST2]
                 [--word2char_map2 WORD2CHAR_MAP2]
                 [--fsttok_path2 FSTTOK_PATH2] [--src_idxmap2 SRC_IDXMAP2]
                 [--trg_idxmap2 TRG_IDXMAP2] [--fst_path2 FST_PATH2]
                 [--forcedlst_sparse_feat2 FORCEDLST_SPARSE_FEAT2]
                 [--ngramc_path2 NGRAMC_PATH2] [--ngramc_order2 NGRAMC_ORDER2]
                 [--nmt_config3 NMT_CONFIG3] [--nmt_path3 NMT_PATH3]
                 [--nmt_engine3 NMT_ENGINE3] [--t2t_model3 T2T_MODEL3]
                 [--t2t_problem3 T2T_PROBLEM3]
                 [--t2t_hparams_set3 T2T_HPARAMS_SET3]
                 [--t2t_checkpoint_dir3 T2T_CHECKPOINT_DIR3]
                 [--pred_src_vocab_size3 PRED_SRC_VOCAB_SIZE3]
                 [--pred_trg_vocab_size3 PRED_TRG_VOCAB_SIZE3]
                 [--rnnlm_config3 RNNLM_CONFIG3] [--rnnlm_path3 RNNLM_PATH3]
                 [--src_test3 SRC_TEST3] [--altsrc_test3 ALTSRC_TEST3]
                 [--word2char_map3 WORD2CHAR_MAP3]
                 [--fsttok_path3 FSTTOK_PATH3] [--src_idxmap3 SRC_IDXMAP3]
                 [--trg_idxmap3 TRG_IDXMAP3] [--fst_path3 FST_PATH3]
                 [--forcedlst_sparse_feat3 FORCEDLST_SPARSE_FEAT3]
                 [--ngramc_path3 NGRAMC_PATH3] [--ngramc_order3 NGRAMC_ORDER3]
                 [--nmt_config4 NMT_CONFIG4] [--nmt_path4 NMT_PATH4]
                 [--nmt_engine4 NMT_ENGINE4] [--t2t_model4 T2T_MODEL4]
                 [--t2t_problem4 T2T_PROBLEM4]
                 [--t2t_hparams_set4 T2T_HPARAMS_SET4]
                 [--t2t_checkpoint_dir4 T2T_CHECKPOINT_DIR4]
                 [--pred_src_vocab_size4 PRED_SRC_VOCAB_SIZE4]
                 [--pred_trg_vocab_size4 PRED_TRG_VOCAB_SIZE4]
                 [--rnnlm_config4 RNNLM_CONFIG4] [--rnnlm_path4 RNNLM_PATH4]
                 [--src_test4 SRC_TEST4] [--altsrc_test4 ALTSRC_TEST4]
                 [--word2char_map4 WORD2CHAR_MAP4]
                 [--fsttok_path4 FSTTOK_PATH4] [--src_idxmap4 SRC_IDXMAP4]
                 [--trg_idxmap4 TRG_IDXMAP4] [--fst_path4 FST_PATH4]
                 [--forcedlst_sparse_feat4 FORCEDLST_SPARSE_FEAT4]
                 [--ngramc_path4 NGRAMC_PATH4] [--ngramc_order4 NGRAMC_ORDER4]
                 [--nmt_config5 NMT_CONFIG5] [--nmt_path5 NMT_PATH5]
                 [--nmt_engine5 NMT_ENGINE5] [--t2t_model5 T2T_MODEL5]
                 [--t2t_problem5 T2T_PROBLEM5]
                 [--t2t_hparams_set5 T2T_HPARAMS_SET5]
                 [--t2t_checkpoint_dir5 T2T_CHECKPOINT_DIR5]
                 [--pred_src_vocab_size5 PRED_SRC_VOCAB_SIZE5]
                 [--pred_trg_vocab_size5 PRED_TRG_VOCAB_SIZE5]
                 [--rnnlm_config5 RNNLM_CONFIG5] [--rnnlm_path5 RNNLM_PATH5]
                 [--src_test5 SRC_TEST5] [--altsrc_test5 ALTSRC_TEST5]
                 [--word2char_map5 WORD2CHAR_MAP5]
                 [--fsttok_path5 FSTTOK_PATH5] [--src_idxmap5 SRC_IDXMAP5]
                 [--trg_idxmap5 TRG_IDXMAP5] [--fst_path5 FST_PATH5]
                 [--forcedlst_sparse_feat5 FORCEDLST_SPARSE_FEAT5]
                 [--ngramc_path5 NGRAMC_PATH5] [--ngramc_order5 NGRAMC_ORDER5]
                 [--nmt_config6 NMT_CONFIG6] [--nmt_path6 NMT_PATH6]
                 [--nmt_engine6 NMT_ENGINE6] [--t2t_model6 T2T_MODEL6]
                 [--t2t_problem6 T2T_PROBLEM6]
                 [--t2t_hparams_set6 T2T_HPARAMS_SET6]
                 [--t2t_checkpoint_dir6 T2T_CHECKPOINT_DIR6]
                 [--pred_src_vocab_size6 PRED_SRC_VOCAB_SIZE6]
                 [--pred_trg_vocab_size6 PRED_TRG_VOCAB_SIZE6]
                 [--rnnlm_config6 RNNLM_CONFIG6] [--rnnlm_path6 RNNLM_PATH6]
                 [--src_test6 SRC_TEST6] [--altsrc_test6 ALTSRC_TEST6]
                 [--word2char_map6 WORD2CHAR_MAP6]
                 [--fsttok_path6 FSTTOK_PATH6] [--src_idxmap6 SRC_IDXMAP6]
                 [--trg_idxmap6 TRG_IDXMAP6] [--fst_path6 FST_PATH6]
                 [--forcedlst_sparse_feat6 FORCEDLST_SPARSE_FEAT6]
                 [--ngramc_path6 NGRAMC_PATH6] [--ngramc_order6 NGRAMC_ORDER6]
                 [--nmt_config7 NMT_CONFIG7] [--nmt_path7 NMT_PATH7]
                 [--nmt_engine7 NMT_ENGINE7] [--t2t_model7 T2T_MODEL7]
                 [--t2t_problem7 T2T_PROBLEM7]
                 [--t2t_hparams_set7 T2T_HPARAMS_SET7]
                 [--t2t_checkpoint_dir7 T2T_CHECKPOINT_DIR7]
                 [--pred_src_vocab_size7 PRED_SRC_VOCAB_SIZE7]
                 [--pred_trg_vocab_size7 PRED_TRG_VOCAB_SIZE7]
                 [--rnnlm_config7 RNNLM_CONFIG7] [--rnnlm_path7 RNNLM_PATH7]
                 [--src_test7 SRC_TEST7] [--altsrc_test7 ALTSRC_TEST7]
                 [--word2char_map7 WORD2CHAR_MAP7]
                 [--fsttok_path7 FSTTOK_PATH7] [--src_idxmap7 SRC_IDXMAP7]
                 [--trg_idxmap7 TRG_IDXMAP7] [--fst_path7 FST_PATH7]
                 [--forcedlst_sparse_feat7 FORCEDLST_SPARSE_FEAT7]
                 [--ngramc_path7 NGRAMC_PATH7] [--ngramc_order7 NGRAMC_ORDER7]
                 [--nmt_config8 NMT_CONFIG8] [--nmt_path8 NMT_PATH8]
                 [--nmt_engine8 NMT_ENGINE8] [--t2t_model8 T2T_MODEL8]
                 [--t2t_problem8 T2T_PROBLEM8]
                 [--t2t_hparams_set8 T2T_HPARAMS_SET8]
                 [--t2t_checkpoint_dir8 T2T_CHECKPOINT_DIR8]
                 [--pred_src_vocab_size8 PRED_SRC_VOCAB_SIZE8]
                 [--pred_trg_vocab_size8 PRED_TRG_VOCAB_SIZE8]
                 [--rnnlm_config8 RNNLM_CONFIG8] [--rnnlm_path8 RNNLM_PATH8]
                 [--src_test8 SRC_TEST8] [--altsrc_test8 ALTSRC_TEST8]
                 [--word2char_map8 WORD2CHAR_MAP8]
                 [--fsttok_path8 FSTTOK_PATH8] [--src_idxmap8 SRC_IDXMAP8]
                 [--trg_idxmap8 TRG_IDXMAP8] [--fst_path8 FST_PATH8]
                 [--forcedlst_sparse_feat8 FORCEDLST_SPARSE_FEAT8]
                 [--ngramc_path8 NGRAMC_PATH8] [--ngramc_order8 NGRAMC_ORDER8]
                 [--nmt_config9 NMT_CONFIG9] [--nmt_path9 NMT_PATH9]
                 [--nmt_engine9 NMT_ENGINE9] [--t2t_model9 T2T_MODEL9]
                 [--t2t_problem9 T2T_PROBLEM9]
                 [--t2t_hparams_set9 T2T_HPARAMS_SET9]
                 [--t2t_checkpoint_dir9 T2T_CHECKPOINT_DIR9]
                 [--pred_src_vocab_size9 PRED_SRC_VOCAB_SIZE9]
                 [--pred_trg_vocab_size9 PRED_TRG_VOCAB_SIZE9]
                 [--rnnlm_config9 RNNLM_CONFIG9] [--rnnlm_path9 RNNLM_PATH9]
                 [--src_test9 SRC_TEST9] [--altsrc_test9 ALTSRC_TEST9]
                 [--word2char_map9 WORD2CHAR_MAP9]
                 [--fsttok_path9 FSTTOK_PATH9] [--src_idxmap9 SRC_IDXMAP9]
                 [--trg_idxmap9 TRG_IDXMAP9] [--fst_path9 FST_PATH9]
                 [--forcedlst_sparse_feat9 FORCEDLST_SPARSE_FEAT9]
                 [--ngramc_path9 NGRAMC_PATH9] [--ngramc_order9 NGRAMC_ORDER9]
                 [--nmt_config10 NMT_CONFIG10] [--nmt_path10 NMT_PATH10]
                 [--nmt_engine10 NMT_ENGINE10] [--t2t_model10 T2T_MODEL10]
                 [--t2t_problem10 T2T_PROBLEM10]
                 [--t2t_hparams_set10 T2T_HPARAMS_SET10]
                 [--t2t_checkpoint_dir10 T2T_CHECKPOINT_DIR10]
                 [--pred_src_vocab_size10 PRED_SRC_VOCAB_SIZE10]
                 [--pred_trg_vocab_size10 PRED_TRG_VOCAB_SIZE10]
                 [--rnnlm_config10 RNNLM_CONFIG10]
                 [--rnnlm_path10 RNNLM_PATH10] [--src_test10 SRC_TEST10]
                 [--altsrc_test10 ALTSRC_TEST10]
                 [--word2char_map10 WORD2CHAR_MAP10]
                 [--fsttok_path10 FSTTOK_PATH10] [--src_idxmap10 SRC_IDXMAP10]
                 [--trg_idxmap10 TRG_IDXMAP10] [--fst_path10 FST_PATH10]
                 [--forcedlst_sparse_feat10 FORCEDLST_SPARSE_FEAT10]
                 [--ngramc_path10 NGRAMC_PATH10]
                 [--ngramc_order10 NGRAMC_ORDER10]
                 [--nmt_config11 NMT_CONFIG11] [--nmt_path11 NMT_PATH11]
                 [--nmt_engine11 NMT_ENGINE11] [--t2t_model11 T2T_MODEL11]
                 [--t2t_problem11 T2T_PROBLEM11]
                 [--t2t_hparams_set11 T2T_HPARAMS_SET11]
                 [--t2t_checkpoint_dir11 T2T_CHECKPOINT_DIR11]
                 [--pred_src_vocab_size11 PRED_SRC_VOCAB_SIZE11]
                 [--pred_trg_vocab_size11 PRED_TRG_VOCAB_SIZE11]
                 [--rnnlm_config11 RNNLM_CONFIG11]
                 [--rnnlm_path11 RNNLM_PATH11] [--src_test11 SRC_TEST11]
                 [--altsrc_test11 ALTSRC_TEST11]
                 [--word2char_map11 WORD2CHAR_MAP11]
                 [--fsttok_path11 FSTTOK_PATH11] [--src_idxmap11 SRC_IDXMAP11]
                 [--trg_idxmap11 TRG_IDXMAP11] [--fst_path11 FST_PATH11]
                 [--forcedlst_sparse_feat11 FORCEDLST_SPARSE_FEAT11]
                 [--ngramc_path11 NGRAMC_PATH11]
                 [--ngramc_order11 NGRAMC_ORDER11]
                 [--nmt_config12 NMT_CONFIG12] [--nmt_path12 NMT_PATH12]
                 [--nmt_engine12 NMT_ENGINE12] [--t2t_model12 T2T_MODEL12]
                 [--t2t_problem12 T2T_PROBLEM12]
                 [--t2t_hparams_set12 T2T_HPARAMS_SET12]
                 [--t2t_checkpoint_dir12 T2T_CHECKPOINT_DIR12]
                 [--pred_src_vocab_size12 PRED_SRC_VOCAB_SIZE12]
                 [--pred_trg_vocab_size12 PRED_TRG_VOCAB_SIZE12]
                 [--rnnlm_config12 RNNLM_CONFIG12]
                 [--rnnlm_path12 RNNLM_PATH12] [--src_test12 SRC_TEST12]
                 [--altsrc_test12 ALTSRC_TEST12]
                 [--word2char_map12 WORD2CHAR_MAP12]
                 [--fsttok_path12 FSTTOK_PATH12] [--src_idxmap12 SRC_IDXMAP12]
                 [--trg_idxmap12 TRG_IDXMAP12] [--fst_path12 FST_PATH12]
                 [--forcedlst_sparse_feat12 FORCEDLST_SPARSE_FEAT12]
                 [--ngramc_path12 NGRAMC_PATH12]
                 [--ngramc_order12 NGRAMC_ORDER12]
General options
--config_file Configuration file in standard .ini format. NOTE: Configuration file overrides command line arguments
--verbosity=info
 

Log level: debug,info,warn,error

Possible choices: debug, info, warn, error

--min_score=-1000000.0
 Delete all complete hypotheses with total scores smaller than this value
--range= Defines the range of sentences to be processed. The syntax is the same as HiFST's printstrings and lmert's idxrange parameter: <start-idx>:<end-idx> (both inclusive, starting with 1). E.g. 2:5 means: skip the first sentence, process the next 4 sentences
--src_test=test_en
 Path to source test set. This is expected to be a plain text file with one source sentence in each line. Words need to be indexed, i.e. use word IDs instead of their string representations.
--en_test= DEPRECATED: Old name for –src_test
--indexing_scheme=blocks
 

This parameter defines the reserved IDs.
* 'blocks': eps,unk: 0, <s>: 1, </s>: 2.
* 'tf': unk: 3, <s>: 1, </s>: 2.
* 't2t': unk: 3, <s>: 2, </s>: 1.

Possible choices: blocks, tf, t2t

--legacy_indexing=False
 DEPRECATED: Use –indexing_scheme=tf instead
--input_method=file
 

This parameter controls how the input to SGNMT is provided. SGNMT supports four modes:
* 'dummy': Use dummy source sentences.
* 'file': Read test sentences from a plain text file specified by --src_test.
* 'shell': Start SGNMT in an interactive shell.
* 'stdin': Test sentences are read from stdin.
In shell and stdin mode you can change SGNMT options on the fly: beginning a line with the string '!sgnmt ' signals SGNMT directives instead of sentences to translate. E.g. '!sgnmt config predictor_weights 0.2,0.8' changes the current predictor weights, and '!sgnmt help' lists all available directives. Using SGNMT directives is particularly useful in combination with MERT to avoid start-up times between evaluations. Note that input sentences still have to be written using word IDs in all cases (see the sketch below).

Possible choices: dummy, file, shell, stdin
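
As an illustration of the stdin and shell modes, the following sketch (placeholder model paths, arbitrary word IDs) pipes a single word-ID sentence into the decoder:

echo "416 23 5169" | python decode.py --input_method stdin --predictors nmt --nmt_path ./models/nmt1

In shell mode, a session might mix directives and sentences; lines starting with '!sgnmt ' are interpreted as directives, everything else as a source sentence in word-ID form:

python decode.py --input_method shell --predictors nmt,fst --nmt_path ./models/nmt1 --fst_path fst/%d.fst
!sgnmt config predictor_weights 0.2,0.8
416 23 5169
!sgnmt help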

--log_sum=log

Controls how to compute the sum in the log space, i.e. how to compute log(exp(l1)+exp(l2)) for log values l1,l2.
* 'tropical': Approximate with max(l1,l2)
* 'log': Use logsumexp in scipy

Possible choices: tropical, log

--single_cpu_thread=False
 If true, try to prevent libraries like Theano or TensorFlow from doing internal multithreading. Also, see the OMP_NUM_THREADS environment variable.
Decoding options
--beam=12 Size of beam. Only used if –decoder is set to ‘beam’ or ‘astar’. For ‘astar’ it limits the capacity of the queue. Use –beam 0 for unlimited capacity.
--decoder=beam

Strategy for traversing the search space which is spanned by the predictors.
* 'greedy': Greedy decoding (similar to beam=1)
* 'beam': Beam search like in Bahdanau et al., 2015
* 'dfs': Depth-first search. This should be used for exact decoding or the complete enumeration of the search space, but it cannot be used if the search space is too large (like for unrestricted NMT) as it performs exhaustive search. If some of your predictor scores can be positive, set --early_stopping to false (see the sketch below).
* 'restarting': Like DFS but with better admissible pruning behavior.
* 'multisegbeam': Beam search for predictors with multiple tokenizations ([sub]word/char-levels).
* 'syncbeam': Beam search which compares after consuming a special synchronization symbol instead of after each iteration.
* 'syntaxbeam': Beam search which ensures terminal symbol diversity.
* 'sepbeam': Associates predictors with hypos in beam search and applies only one predictor instead of all for hypo expansion.
* 'bow': Restarting decoder optimized for bag-of-words problems.
* 'flip': This decoder works only for bag problems. It traverses the search space by switching two words in the hypothesis. Do not use the bow predictor.
* 'bucket': Works best for bag problems. Maintains buckets for each hypo length and extends a hypo in a bucket by one before selecting the next bucket.
* 'bigramgreedy': Works best for bag problems. Collects bigram statistics and constructs hypos to score by greedily selecting high scoring bigrams. Do not use the bow predictor with this search strategy.
* 'astar': A* search. The heuristic function is configured using the --heuristics options.
* 'vanilla': Original Blocks beam decoder. This bypasses the predictor framework and directly performs pure NMT beam decoding on the GPU. Use this when you do pure NMT decoding, as it is usually faster than using a single nmt predictor because the search can be parallelized on the GPU.

Possible choices: greedy, beam, multisegbeam, syncbeam, sepbeam, syntaxbeam, dfs, restarting, bow, flip, bucket, bigramgreedy, astar, vanilla
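
For example, exact decoding with the dfs decoder over a search space constrained by translation lattices could be sketched as follows (placeholder paths; the lower bounds file, e.g. produced by a previous beam search run, tightens the admissible pruning as described under --score_lower_bounds_file):

python decode.py --decoder dfs --early_stopping true --predictors fst,nmt --fst_path fst/%d.fst --nmt_path ./models/nmt1 --score_lower_bounds_file lower_bounds.txt

Set --early_stopping false instead if any predictor can produce positive scores.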

--hypo_recombination=False
 Activates hypothesis recombination. Has to be supported by the decoder. Applicable to beam, restarting, bow, bucket
--allow_unk_in_output=True
 If false, remove all UNKs in the final posteriors. Predictor distributions can still produce UNKs, but they have to be replaced by other words by other predictors
--max_node_expansions=0
 This parameter allows you to limit the total number of search space expansions for a single sentence. If this is 0, we allow an unlimited number of expansions. If it is negative, the maximum number of expansions is its absolute value times the length of the source sentence. Supporting decoders: bigramgreedy, bow, bucket, dfs, flip, restarting
--max_len_factor=2
 Limits the length of hypotheses to avoid infinite loops in search strategies for unbounded search spaces. The length of any translation is limited to max_len_factor times the length of the source sentence.
--early_stopping=True
 Use this parameter if you are only interested in the first best decoding result. This option has a different effect depending on the used –decoder. For the beam decoder, it means stopping decoding when the best active hypothesis ends with </s>. If false, do not stop until all hypotheses end with EOS. For the dfs and restarting decoders, early stopping enables admissible pruning of branches when the accumulated score already exceeded the currently best score. DO NOT USE early stopping in combination with the dfs or restarting decoder when your predictors can produce positive scores!
--heuristics= Comma-separated list of heuristics to use in heuristic based search like A*.
* 'predictor': Predictor specific heuristics. Some predictors come with their own heuristics - e.g. the fst predictor uses the shortest path to the final state. Using 'predictor' combines the specific heuristics of all selected predictors.
* 'greedy': Do greedy decoding to get the heuristic costs. This is expensive but accurate.
* 'lasttoken': Use the single score of the last token.
* 'stats': Collect unigram statistics during decoding and compare actual hypothesis scores with the product of unigram scores of the used words.
* 'scoreperword': Using this heuristic normalizes the previously accumulated costs by its length. It can be used for beam search with normalized scores, using a capacity (--beam), no other heuristic, and setting --decoder to astar.
Note that all heuristics are inadmissible, i.e. A* is not guaranteed to find the globally best path.
--heuristic_predictors=all
 Comma separated list of indices of predictors considered by the heuristic. For example, if –predictors is set to nmt,length,fst then setting –heuristic_predictors to 0,2 results in using nmt and fst in the heuristics. Use ‘all’ to use all predictors in the heuristics
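
A heuristic search setup following the two options above might look like this (a sketch with placeholder paths): A* search with unlimited queue capacity, where only the second predictor (fst, index 1) contributes its predictor-specific heuristic:

python decode.py --decoder astar --beam 0 --predictors nmt,fst --nmt_path ./models/nmt1 --fst_path fst/%d.fst --heuristics predictor --heuristic_predictors 1 --cache_heuristic_estimates true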
--multiseg_tokenizations=
 This argument must be used when the multisegbeam decoder is activated. For each predictor, it defines the tokenizations used for it (comma separated). If a path to a word map file is provided, the corresponding predictor is operating on the pure word level. The 'mixed:' prefix activates mixed word/character models according to Wu et al. (2016). The 'eow:' prefix assumes explicit </w> specifiers in the word maps which mark the end of words. This is suitable for subword units, e.g. BPE.
--cache_heuristic_estimates=True
 Whether to cache heuristic future cost estimates. This is especially useful with the greedy heuristic.
--pure_heuristic_scores=False
 If this is set to false, heuristic decoders such as A* score hypotheses with the sum of the partial hypo score plus the heuristic estimates (like in standard A*). Set to true to use the heuristic estimates only
--restarting_node_score=difference
 

This parameter defines how the restarting decoder decides from which node to restart.
* 'difference': Restart where the difference between 1-best and 2-best is smallest
* 'absolute': Restart from the unexplored node with the best absolute score globally.
* 'constant': Constant node score. Simulates FILO or uniform distribution with restarting_stochastic.
* 'expansions': Inverse of the number of expansions on the node. Discourages expanding arcs on the same node repeatedly.

Possible choices: difference, absolute, constant, expansions

--low_decoder_memory=True
 Some decoding strategies support modes which do not change the decoding logic, but make use of the inadmissible pruning parameters like max_expansions to reduce memory consumption. This usually requires some computational overhead for cleaning up data structures. Applicable to restarting and bucket decoders.
--stochastic_decoder=False
 Activates stochastic decoders. Applicable to the decoders restarting, bow, bucket
--decode_always_single_step=False
 If this is set to true, heuristic depth first search decoders like restarting or bow always perform a single decoding step instead of greedy decoding. Handle with care...
--flip_strategy=move
 

Defines the hypothesis transition in the flip decoder. ‘flip’ flips two words, ‘move’ moves a word to a different position

Possible choices: move, flip

--bucket_selector=maxscore
 Defines the bucket selection strategy for the bucket decoder.
* 'iter': Rotate through all lengths
* 'iter-n': Rotate through all lengths n times
* 'maxscore': Like iter, but filters buckets with hypos worse than a threshold. The threshold is increased if no bucket is found
* 'score': Select the bucket with the highest bucket score. The bucket score is determined by the bucket_score_strategy
* 'score-end': Start with the bucket with the highest bucket score, and iterate through all subsequent buckets.
--bucket_score_strategy=difference
 

Defines how buckets are scored for the bucket decoder. Usually, the best hypo in the bucket is compared to the global best score of that length according to --collect_statistics.
* 'difference': Difference between both hypos
* 'heap': Use best score on bucket heap directly
* 'absolute': Use best hypo score in bucket directly
* 'constant': Uniform bucket scores.

Possible choices: difference, heap, absolute, constant

--collect_statistics=best
 

Determines over which hypotheses statistics are collected.
* 'best': Collect statistics from the current best full hypothesis
* 'full': Collect statistics from all full hypos
* 'all': Collect statistics also from partial hypos
Applicable to the bucket decoder, the heuristic of the bow predictor, and the heuristic 'stats'.

Possible choices: best, full, all

--heuristic_scores_file=
 The bow predictor heuristic and the stats heuristic sum up the unigram scores of words as heuristic estimate. This option should point to a mapping file from word ID to (unigram) score. If this is empty, the unigram scores are collected during decoding for each sentence separately according to --collect_statistics.
--score_lower_bounds_file=
 Admissible pruning in some decoding strategies can be improved by providing lower bounds on complete hypothesis scores. This is useful to improve the efficiency of exhaustive search, with lower bounds found by e.g. beam search. The expected file format is just a text file with line separated scores for each sentence. Supported by the following decoders: astar, bigramgreedy, bow, bucket, dfs, flip, restarting
--decoder_diversity_factor=-1.0
 If this is greater than zero, promote diversity between active hypotheses during decoding. The exact way of doing this depends on –decoder: * The ‘beam’ decoder roughly follows the approach in Li and Jurafsky, 2016 * The ‘bucket’ decoder reorders the hypotheses in a bucket by penalizing hypotheses with the number of expanded hypotheses from the same parent.
--sync_symbol=-1
 Used for the syncbeam decoder. Synchronization symbol for hypothesis comparison. If negative, use the </w> entry in --trg_cmap.
--max_word_len=25
 Maximum length of a single word. Only applicable to the decoders multisegbeam and syncbeam.
Output options
--nbest=0 Maximum number of hypotheses in the output files. Set to 0 to output all hypotheses found by the decoder. If you use the beam or astar decoder, this option is limited by the beam size.
--output_fst_unk_id=0
 DEPRECATED: Old name for –fst_unk_id
--fst_unk_id=999999998
 SGNMT uses the ID 0 for UNK. However, this clashes with OpenFST when writing FSTs as OpenFST reserves 0 for epsilon arcs. Therefore, we use this ID for UNK instead. Note that this only applies to output FSTs created by the fst or sfst output handler, or FSTs used by the fsttok wrapper. Apart from that, UNK is still represented by the ID 0.
--output_path=sgnmt-out.%s
 Path to the output files generated by SGNMT. You can use the placeholder %s for the format specifier
--outputs= Comma separated list of output formats:
* 'text': First best translations in plain text format
* 'nbest': Moses' n-best format with separate scores for each predictor.
* 'fst': Translation lattices in OpenFST format with sparse tuple arcs.
* 'sfst': Translation lattices in OpenFST format with standard arcs (i.e. combined scores).
The path to the output files can be specified with --output_path
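
For example, the following sketch (placeholder paths) writes the single best translations, a 10-best list, and translation lattices to sgnmt-out.text, sgnmt-out.nbest, and sgnmt-out.fst, respectively:

python decode.py --predictors nmt --nmt_path ./models/nmt1 --src_test test.ids.en --nbest 10 --outputs text,nbest,fst --output_path sgnmt-out.%s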
--remove_eos=True
 Whether to remove </S> symbol on output.
--src_wmap= Path to the source side word map (Format: <word> <id>). This is used to map the words in –src_test to their word IDs. If empty, SGNMT expects the input words to be in integer representation.
--trg_wmap= Path to the target side word map (Format: <word> <id>). This is used to generate log output and the output formats text and nbest. If empty, we directly write word IDs.
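
The word map files are plain text with one '<word> <id>' entry per line. A purely hypothetical fragment (IDs chosen arbitrarily; the reserved IDs depend on --indexing_scheme):

the 4
house 5
is 6

If --src_wmap and --trg_wmap are set, --src_test may contain plain words and the text/nbest outputs contain words rather than IDs; otherwise SGNMT reads and writes integer IDs directly.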
--trg_cmap= Path to the target side char map (Format: <char> <id>). If this is not empty, all output files are converted to character-level. The mapping from word to character sequence is read from –trg_wmap. The char map must contain an entry for </w> which points to the word boundary ID.
General predictor options
--predictors=nmt
 Comma separated list of predictors. Predictors are scoring modules which define a distribution over target words given the history and some side information like the source sentence. If vocabulary sizes differ among predictors, we fill in gaps with predictor UNK scores.
* 'nmt': Neural machine translation predictor. Options: nmt_config, nmt_path, gnmt_beta, nmt_model_selector, cache_nmt_posteriors
* 't2t': Tensor2Tensor predictor. Options: t2t_usr_dir, t2t_model, t2t_problem, t2t_hparams_set, t2t_checkpoint_dir, pred_src_vocab_size, pred_trg_vocab_size
* 'nizza': Nizza alignment models. Options: nizza_model, nizza_hparams_set, nizza_checkpoint_dir, pred_src_vocab_size, pred_trg_vocab_size
* 'bfslayerbylayer': Layerbylayer models (BFS). Options: t2t_usr_dir, t2t_model, t2t_problem, t2t_hparams_set, t2t_checkpoint_dir, syntax_root_id, syntax_max_terminal_id, syntax_terminal_list, syntax_pop_id, layerbylayer_terminal_strategy, syntax_max_depth, pred_src_vocab_size, pred_trg_vocab_size
* 'dfslayerbylayer': Layerbylayer models (DFS). Options: t2t_usr_dir, t2t_model, t2t_problem, t2t_hparams_set, t2t_checkpoint_dir, syntax_root_id, syntax_max_terminal_id, syntax_terminal_list, syntax_pop_id, layerbylayer_terminal_strategy, syntax_max_depth, pred_src_vocab_size, pred_trg_vocab_size
* 'bracket': Well-formed bracketing. Options: syntax_max_terminal_id, syntax_pop_id, syntax_max_depth, extlength_path
* 'srilm': n-gram language model. Options: srilm_path, srilm_order
* 'nplm': Neural n-gram language model (NPLM). Options: nplm_path, normalize_nplm_probs
* 'rnnlm': RNN language model based on TensorFlow. Options: rnnlm_config, rnnlm_path
* 'forced': Forced decoding with one reference. Options: trg_test
* 'forcedlst': Forced decoding with a Moses n-best list (n-best list rescoring). Options: trg_test, forcedlst_sparse_feat, use_nbest_weights
* 'bow': Forced decoding with one bag-of-words reference. Options: trg_test, heuristic_scores_file, bow_heuristic_strategies, bow_accept_subsets, bow_accept_duplicates, pred_trg_vocab_size
* 'bowsearch': Forced decoding with one bag-of-words reference. Options: hypo_recombination, trg_test, heuristic_scores_file, bow_heuristic_strategies, bow_accept_subsets, bow_accept_duplicates, pred_trg_vocab_size
* 'fst': Deterministic translation lattices. Options: fst_path, use_fst_weights, normalize_fst_weights, fst_to_log, fst_skip_bos_weight
* 'nfst': Non-deterministic translation lattices. Options: fst_path, use_fst_weights, normalize_fst_weights, fst_to_log, fst_skip_bos_weight
* 'rtn': Recurrent transition networks as created by HiFST with late expansion. Options: rtn_path, use_rtn_weights, minimize_rtns, remove_epsilon_in_rtns, normalize_rtn_weights
* 'lrhiero': Direct Hiero (left-to-right Hiero). This is an EXPERIMENTAL implementation of LRHiero. Options: rules_path, grammar_feature_weights, use_grammar_weights
* 'wc': Number of words feature. Options: wc_word
* 'unkc': Poisson model for number of UNKs. Options: unk_count_lambdas, pred_src_vocab_size
* 'ngramc': Number of ngrams feature. Options: ngramc_path, ngramc_order
* 'length': Target sentence length model. Options: src_test_raw, length_model_weights, use_length_point_probs
* 'extlength': External target sentence lengths. Options: extlength_path
All predictors can be combined with one or more wrapper predictors by adding the wrapper name separated by a _ symbol. The following wrappers are available:
* 'idxmap': Add this wrapper to predictors which use an alternative word map. Options: src_idxmap, trg_idxmap
* 'altsrc': This wrapper loads source sentences from an alternative source. Options: altsrc_test
* 'unkvocab': This wrapper explicitly excludes matching word indices higher than pred_trg_vocab_size with UNK scores. Options: pred_trg_vocab_size
* 'fsttok': Uses an FST to transduce SGNMT tokens to predictor tokens. Options: fsttok_path, fsttok_max_pending_score, fst_unk_id
* 'word2char': Wraps word-level predictors when SGNMT is running on character level. Options: word2char_map
* 'skipvocab': Skip a subset of the predictor vocabulary. Options: skipvocab_max_id, skipvocab_stop_size
* 'ngramize': Extracts n-gram posteriors from predictors without token-level history. Options: ngramize_min_order, ngramize_max_order, max_len_factor
Note that you can use multiple instances of the same predictor. For example, 'nmt,nmt,nmt' can be used for ensembling three NMT systems. You can often override parts of the predictor configurations for subsequent predictors by adding the predictor number (e.g. see --nmt_config2 or --fst_path2)
--predictor_weights=
 Predictor weights. Have to be specified consistently with --predictors, e.g. if --predictors is 'bla_fst,nmt' then set their weights with --predictor_weights bla-weight_fst-weight,nmt-weight, e.g. '--predictor_weights 0.1_0.3,0.6'. Default (empty string) means that each predictor gets assigned the weight 1.
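
For instance, a weighted combination of an NMT system, a translation lattice, and a word count feature could be set up as follows (a sketch with placeholder paths; for wrapped predictors, the weights of wrapper and wrapped predictor are joined with '_' as in the example above):

python decode.py --predictors nmt,fst,wc --nmt_path ./models/nmt1 --fst_path fst/%d.fst --use_fst_weights true --predictor_weights 0.6,0.3,0.1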
--closed_vocabulary_normalization=none
 

This parameter specifies the way closed vocabulary predictors (e.g. NMT) are normalized. Closed vocabulary means that they have a predefined vocabulary. Open vocabulary predictors (e.g. fst) can potentially produce any word, or have a very large vocabulary.
* 'none': Use unmodified scores for closed vocabulary predictors
* 'exact': Renormalize scores depending on the probability mass which they distribute to words outside the vocabulary via the UNK probability.
* 'rescale_unk': Rescale UNK probabilities and leave all other scores unmodified. Results in a distribution if predictor scores are stochastic.
* 'reduced': Normalize to vocabulary defined by the open vocabulary predictors at each time step.

Possible choices: none, exact, reduced, rescale_unk

--combination_scheme=sum
 

This parameter controls how the combined hypothesis score is calculated from the predictor scores and weights.
* 'sum': The combined score is the weighted sum of all predictor scores
* 'length_norm': Renormalize scores by the length of hypotheses.
* 'bayesian': Apply the Bayesian LM interpolation scheme from Allauzen and Riley to interpolate the predictor scores

Possible choices: sum, length_norm, bayesian

--apply_combination_scheme_to_partial_hypos=False
 If true, apply the combination scheme specified with –combination_scheme after each node expansion. If false, apply it only to complete hypotheses at the end of decoding
--pred_src_vocab_size=30000
 Predictor source vocabulary size. Used by the bow, bowsearch, t2t, nizza, unkc predictors.
--pred_trg_vocab_size=30000
 Predictor target vocabulary size. Used by the bow, bowsearch, t2t, nizza, unkc predictors.
Neural predictor options
--length_normalization=False
 DEPRECATED. Synonym for –combination_scheme length_norm. Normalize n-best hypotheses by sentence length. Normally improves pure NMT decoding, but degrades performance when combined with predictors like fst or multiple NMT systems.
--nmt_config= Defines the configuration of the NMT model. This can either point to a configuration file, or it can directly contain the parameters (e.g. ‘src_vocab_size=1234,trg_vocab_size=2345’). Use ‘config_file=’ in the parameter string to use configuration files with the second method.
--nmt_path= Defines the path to the NMT model. If empty, the model is loaded from the default location which depends on the NMT engine
--nmt_engine=blocks
 

NMT implementation which should be used. Use ‘none’ to disable NMT support.

Possible choices: none, blocks, tensorflow

--nmt_model_selector=bleu
 

NMT training normally creates several files in the ./train/ directory from which we can load the NMT model. Possible options:
* 'params': Load parameters from params.npz. This is usually the most recent model.
* 'bleu': Load from the best_bleu_params_* file with the best BLEU score.
* 'time': Load from the most recent best_bleu_params_* file.

Possible choices: params, bleu, time

--cache_nmt_posteriors=False
 This enables the cache in the [F]NMT predictor. Normally, the search procedure is responsible for avoiding the application of predictors to the same history twice. However, due to the limited NMT vocabulary, two different histories might be the same from the NMT perspective, e.g. if they are the same up to words which are outside the NMT vocabulary. If this parameter is set to true, we cache posteriors with histories containing UNK and reload them when needed
--gnmt_beta=0.0
 If this is greater than zero, add a coverage penalization term following Google’s NMT (Wu et al., 2016) to the NMT score.
--layerbylayer_terminal_strategy=force
 

Strategy for dealing with terminals as parents in layerbylayer predictors with POP attention.
* 'none': Treat terminal parents like any other token.
* 'force': Force the output to the terminal parent label.
* 'skip': Like 'force', but with log(1)=0 scores. This is usually faster, and must be used if the model is trained with use_loss_mask.

Possible choices: none, force, skip

--syntax_max_depth=30
 Maximum depth of generated trees. After this depth is reached, only terminals and POP are allowed on the next layer.
--syntax_root_id=-1
 Must be set for the layerbylayer predictor. ID of the initial target root label.
--syntax_pop_id=-1
 ID of the closing bracket in output syntax trees. layerbylayer and t2t predictors support single integer values. The bracket predictor can take a comma-separated list of integers.
--syntax_max_terminal_id=30003
 All token IDs larger than this are considered to be non-terminal symbols except the ones specified by –syntax_terminal_list
--syntax_terminal_list=
 List of IDs which are explicitly treated as terminals, in addition to all IDs lower than or equal to --syntax_max_terminal_id. This can be used to exclude the POP symbol from the list of non-terminals even though it has an ID higher than max_terminal_id.
--t2t_usr_dir= Available for the t2t predictor. See the –t2t_usr_dir argument in tensor2tensor.
--t2t_model=transformer
 Available for the t2t predictor. Name of the tensor2tensor model.
--t2t_problem=translate_ende_wmt32k
 Available for the t2t predictor. Name of the tensor2tensor problem.
--t2t_hparams_set=transformer_base_single_gpu
 Available for the t2t predictor. Name of the tensor2tensor hparams set.
--t2t_checkpoint_dir=
 Available for the t2t predictor. Path to the tensor2tensor checkpoint directory. Same as –output_dir in t2t_trainer.
--t2t_src_vocab_size=0
 DEPRECATED! Use –pred_src_vocab_size
--t2t_trg_vocab_size=0
 DEPRECATED! Use –pred_trg_vocab_size
--nizza_model=model1
 Available for the nizza predictor. Name of the nizza model.
--nizza_hparams_set=model1_default
 Available for the nizza predictor. Name of the nizza hparams set.
--nizza_checkpoint_dir=
 Available for the nizza predictor. Path to the nizza checkpoint directory. Same as –model_dir in nizza_trainer.
Length predictor options
--src_test_raw=
 Only required for the ‘length’ predictor. Path to original source test set WITHOUT word indices. This is used to extract features for target sentence length predictions
--length_model_weights=
 Only required for length predictor. String of length model parameters.
--use_length_point_probs=False
 If this is true, the length predictor outputs probability 1 for all tokens except </S>. For </S> it uses the point probability given by the length model. If this is set to false, we normalize the predictive score by comparing P(l=x) and P(l<x)
--length_model_offset=0
 The target sentence length model is applied to the hypothesis length minus length_model_offset
--extlength_path=
 Only required for the ‘extlength’ predictor. This is the path to the file which specifies the length distributions for each sentence. Each line consists of blank separated ‘<length>:<logprob>’ pairs.
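
A hypothetical line of such a file, assigning log-probabilities to the lengths 8, 9 and 10 for one sentence, could look like this (values chosen arbitrarily):

8:-1.2 9:-0.7 10:-2.3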
Count predictor options
--unk_count_lambdas=1.0
 Model parameters for the UNK count model: comma-separated list of lambdas for Poisson distributions. The first float specifies the Poisson distribution over the number of UNKs in the hypotheses given that the number of UNKs on the source side is 0. The last lambda specifies the distribution given >=n-1 UNKs in the source sentence.
--wc_word=-1 If negative, the wc predictor counts all words. Otherwise, count only the specific word
--ngramc_path=ngramc/%d.txt
 Only required for the ngramc predictor. The ngramc predictor counts the number of ngrams and multiplies them with the factors defined in the files. The format is one ngram per line '<ngram> : <score>'. You can use the placeholder %d for the sentence index.
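
A hypothetical ngramc file for one sentence (word IDs and scores chosen arbitrarily) might look like this, with one weighted n-gram per line:

12 6 : 0.5
6 431 77 : -0.2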
--ngramc_order=0
 If positive, count only ngrams of the specified order. Otherwise, count all ngrams
--ngramize_min_order=1
 Minimum ngram order for ngramize wrapper
--ngramize_max_order=4
 Maximum ngram order for ngramize wrapper
--ngramc_discount_factor=-1.0
 If this is non-negative, discount ngram counts by this factor each time the ngram is consumed
--skipvocab_max_id=30003
 All tokens above this threshold are skipped by the skipvocab predictor wrapper.
--skipvocab_stop_size=1
 The internal beam search of the skipvocab predictor wrapper stops if the best stop_size scores are for in-vocabulary words (i.e. with index lower than or equal to skipvocab_max_id).
Forced decoding predictor options
--trg_test=test_fr
 Path to target test set (with integer tokens). This is only required for the predictors ‘forced’ and ‘forcedlst’. For ‘forcedlst’ this needs to point to an n-best list in Moses format.
--fr_test= DEPRECATED. Old name for –trg_test
--forcedlst_sparse_feat=
 By default, the forcedlst predictor uses the combined score in the Moses nbest list. Alternatively, for nbest lists in sparse feature format, you can specify the name of the features which should be used instead.
--use_nbest_weights=False
 Only required for forcedlst predictor. Whether to use the scores in n-best lists.
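
Combining these options, n-best list rescoring with an NMT model could be sketched as follows (placeholder paths): the forcedlst predictor restricts decoding to the hypotheses in the Moses-format n-best list and contributes their original scores, while the nmt predictor rescores them:

python decode.py --predictors nmt,forcedlst --nmt_path ./models/nmt1 --trg_test nbest.moses --use_nbest_weights true --outputs nbest --output_path rescored.%s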
--bow_heuristic_strategies=remaining
 Defines the form of heuristic estimates of the bow predictor. Comma-separated list of the following values:
* remaining: Sum up unigram estimates for all words in the bag which haven't been consumed
* consumed: Use the difference between the actual hypothesis score and the sum of unigram estimates of consumed words as score
--bow_accept_subsets=False
 If this is set to false, the bow predictor enforces exact correspondence between bag and words in complete hypotheses. If true, it ensures that hypotheses are consistent with the bag (i.e. do not contain words outside the bag) but do not necessarily have all words in the bag
--bow_accept_duplicates=False
 If this is set to true, the bow predictor allows a word in the bag to appear multiple times, i.e. the exact count of the word is not enforced. Can only be used in conjunction with bow_accept_subsets
--bow_diversity_heuristic_factor=-1.0
 If this is greater than zero, promote diversity between bags via the bow predictor heuristic. Bags which correspond to partial bags of full hypotheses are penalized by this factor.
Wrapper predictor options
--src_idxmap=idxmap.en
 Only required for idxmap wrapper predictor. Path to the source side mapping file. The format is ‘<index> <alternative_index>’. The mapping must be complete and should be a bijection.
--en_idxmap= DEPRECATED. Old name for –src_idxmap
--trg_idxmap=idxmap.fr
 Only required for idxmap wrapper predictor. Path to the target side mapping file. The format is ‘<index> <alternative_index>’. The mapping must be complete and should be a bijection.
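
The idxmap mapping files contain one '<index> <alternative_index>' pair per line and must define a complete bijection over the vocabulary. A purely hypothetical fragment (indices chosen arbitrarily):

0 0
1 1
2 2
3 487
4 3

Here the wrapped predictor would see the index 487 wherever SGNMT uses 3, and so on.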
--fr_idxmap= DEPRECATED. Old name for –trg_idxmap
--altsrc_test=test_en.alt
 Only required for altsrc wrapper predictor. Path to the alternative source sentences.
--word2char_map=word2char.map
 Only required for word2char wrapper predictor. Path to a mapping file from word ID to sequence of character IDs (format: <word-id> <char-id1> <char-id2>...). All character IDs which do not occur in this mapping are treated as word boundary symbols.
--fsttok_path=tok.fst
 For the fsttok wrapper. Defines the path to the FST which transduces sequences of SGNMT tokens (e.g. characters) to predictor tokens (e.g. BPE units). The FST may be non-deterministic and contain epsilons.
--fsttok_max_pending_score=5.0
 Applicable if an FST used by the fsttok wrapper is non-deterministic. In this case, one predictor state may correspond to multiple nodes in the FST. We prune nodes which are this much worse than the best scoring node with the same history.
Hiero predictor options
--rules_path=rules/rules
 Only required for predictor lrhiero. Path to the ruleXtract rules file.
--use_grammar_weights=False
 Whether to use weights in the synchronous grammar for the lrhiero predictor. If set to false, use uniform grammar scores.
--grammar_feature_weights=
 If rules_path points to a factorized rules file (i.e. containing rules associated with a number of features, not only one score) SGNMT uses a weighted sum for them. You can specify the weights for this summation here (comma-separated) or leave it blank to sum them up equally weighted.
(Neural) LM predictor options
--srilm_path=lm/ngram.lm.gz
 Path to the ngram LM file in SRILM format
--srilm_convert_to_ln=False
 Whether to convert SRILM scores from log10 to natural log.
--nplm_path=nplm/nplm.gz
 Path to the NPLM language model
--rnnlm_path=rnnlm/rnn.ckpt
 Path to the RNNLM language model
--rnnlm_config=rnnlm.ini
 Defines the configuration of the RNNLM model. This can either point to a configuration file, or it can directly contain the parameters (e.g. ‘src_vocab_size=1234,trg_vocab_size=2345’). Use ‘config_file=’ in the parameter string to use configuration files with the second method. Use ‘model_name=X’ in the parameter string to use one of the predefined models.
--srilm_order=5
 Order of ngram for srilm predictor
--normalize_nplm_probs=False
 Whether to normalize nplm probabilities over the current unbounded predictor vocabulary.
FST and RTN predictor options
--fst_path=fst/%d.fst
 Only required for the fst and nfst predictors. Sets the path to the OpenFST translation lattices. You can use the placeholder %d for the sentence index.
--rtn_path=rtn/
 Only required for rtn predictor. Sets the path to the RTN directory as created by HiFST
--fst_skip_bos_weight=True
 This option applies to the fst and nfst predictors. Lattices produced by HiFST contain the <S> symbol and often have scores on the corresponding arc. However, SGNMT skips <S>, so this score would otherwise be ignored. Set this option to true to add the <S> scores. This ensures that the complete path scores for the [n]fst and rtn predictors match the corresponding path weights in the original FST as obtained with fstshortestpath.
--fst_to_log=True
 Multiply weights in the FST by -1 to transform them from tropical semiring into logprobs.
--use_fst_weights=False
 Whether to use weights in FSTs for the nfst and fst predictors.
--use_rtn_weights=False
 Whether to use weights in RTNs.
--minimize_rtns=True
 Whether to do determinization, epsilon removal, and minimization after each RTN expansion.
--remove_epsilon_in_rtns=True
 Whether to remove epsilons after RTN expansion.
--normalize_fst_weights=False
 Whether to normalize weights in FSTs. This forces the weights on outgoing edges to sum up to 1. Applicable to fst and nfst predictor.
--normalize_rtn_weights=False
 Whether to normalize weights in RTNs. This forces the weights on outgoing edges to sum up to 1. Applicable to rtn predictor.
Override options
--nmt_config2= If the –predictors string contains more than one nmt predictor, you can specify the configuration for the second one with this parameter. The second nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path2= Overrides –nmt_path for the second nmt
--nmt_engine2= Overrides –nmt_engine for the second nmt
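
As an example of these override options, two different NMT systems can be ensembled by listing the nmt predictor twice and pointing the second instance to its own model (a sketch with placeholder paths):

python decode.py --predictors nmt,nmt --nmt_path ./models/ende_big --nmt_path2 ./models/ende_small --predictor_weights 0.5,0.5

The second instance inherits all other nmt settings (e.g. --nmt_config, --nmt_engine) unless they are overridden with the corresponding *2 options.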
--t2t_model2= Overrides –t2t_model for the second t2t predictor
--t2t_problem2=
 Overrides –t2t_problem for the second t2t predictor
--t2t_hparams_set2=
 Overrides –t2t_hparams_set for the second t2t predictor
--t2t_checkpoint_dir2=
 Overrides –t2t_checkpoint_dir for the second t2t predictor
--pred_src_vocab_size2=0
 Overrides –pred_src_vocab_size for the second t2t predictor
--pred_trg_vocab_size2=0
 Overrides –pred_trg_vocab_size for the second t2t predictor
--rnnlm_config2=
 If the –predictors string contains more than one rnnlm predictor, you can specify the configuration for the second one with this parameter. The second rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path2= Overrides --rnnlm_path for the second rnnlm
--src_test2= Overrides –src_test for the second src
--altsrc_test2=
 Overrides –altsrc_test for the second altsrc
--word2char_map2=
 Overrides –word2char_map for the second word2char
--fsttok_path2=
 Overrides –fsttok_path for the second fsttok
--src_idxmap2= Overrides –src_idxmap for the second indexmap
--trg_idxmap2= Overrides –trg_idxmap for the second indexmap
--fst_path2= Overrides –fst_path for the second fst predictor
--forcedlst_sparse_feat2=
 Overrides –forcedlst_sparse_feat for the second forcedlst predictor
--ngramc_path2=
 Overrides –ngramc_path for the second ngramc
--ngramc_order2=0
 Overrides –ngramc_order for the second ngramc
--nmt_config3= If the –predictors string contains more than one nmt predictor, you can specify the configuration for the third one with this parameter. The third nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path3= Overrides –nmt_path for the third nmt
--nmt_engine3= Overrides –nmt_engine for the third nmt
--t2t_model3= Overrides –t2t_model for the third t2t predictor
--t2t_problem3=
 Overrides –t2t_problem for the third t2t predictor
--t2t_hparams_set3=
 Overrides –t2t_hparams_set for the third t2t predictor
--t2t_checkpoint_dir3=
 Overrides –t2t_checkpoint_dir for the third t2t predictor
--pred_src_vocab_size3=0
 Overrides –pred_src_vocab_size for the third t2t predictor
--pred_trg_vocab_size3=0
 Overrides –pred_trg_vocab_size for the third t2t predictor
--rnnlm_config3=
 If the –predictors string contains more than one rnnlm predictor, you can specify the configuration for the third one with this parameter. The third rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path3= Overrides --rnnlm_path for the third rnnlm
--src_test3= Overrides –src_test for the third src
--altsrc_test3=
 Overrides –altsrc_test for the third altsrc
--word2char_map3=
 Overrides –word2char_map for the third word2char
--fsttok_path3=
 Overrides –fsttok_path for the third fsttok
--src_idxmap3= Overrides –src_idxmap for the third indexmap
--trg_idxmap3= Overrides –trg_idxmap for the third indexmap
--fst_path3= Overrides –fst_path for the third fst predictor
--forcedlst_sparse_feat3=
 Overrides –forcedlst_sparse_feat for the third forcedlst predictor
--ngramc_path3=
 Overrides –ngramc_path for the third ngramc
--ngramc_order3=0
 Overrides –ngramc_order for the third ngramc
--nmt_config4= If the –predictors string contains more than one nmt predictor, you can specify the configuration for the 4-th one with this parameter. The 4-th nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path4= Overrides –nmt_path for the 4-th nmt
--nmt_engine4= Overrides –nmt_engine for the 4-th nmt
--t2t_model4= Overrides –t2t_model for the 4-th t2t predictor
--t2t_problem4=
 Overrides –t2t_problem for the 4-th t2t predictor
--t2t_hparams_set4=
 Overrides –t2t_hparams_set for the 4-th t2t predictor
--t2t_checkpoint_dir4=
 Overrides –t2t_checkpoint_dir for the 4-th t2t predictor
--pred_src_vocab_size4=0
 Overrides –pred_src_vocab_size for the 4-th t2t predictor
--pred_trg_vocab_size4=0
 Overrides –pred_trg_vocab_size for the 4-th t2t predictor
--rnnlm_config4=
 If the –predictors string contains more than one rnnlm predictor, you can specify the configuration for the 4-th one with this parameter. The 4-th rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path4= Overrides --rnnlm_path for the 4-th rnnlm
--src_test4= Overrides –src_test for the 4-th src
--altsrc_test4=
 Overrides –altsrc_test for the 4-th altsrc
--word2char_map4=
 Overrides –word2char_map for the 4-th word2char
--fsttok_path4=
 Overrides –fsttok_path for the 4-th fsttok
--src_idxmap4= Overrides –src_idxmap for the 4-th indexmap
--trg_idxmap4= Overrides –trg_idxmap for the 4-th indexmap
--fst_path4= Overrides –fst_path for the 4-th fst predictor
--forcedlst_sparse_feat4=
 Overrides –forcedlst_sparse_feat for the 4-th forcedlst predictor
--ngramc_path4=
 Overrides –ngramc_path for the 4-th ngramc
--ngramc_order4=0
 Overrides –ngramc_order for the 4-th ngramc
--nmt_config5= If the --predictors string contains more than four nmt predictors, you can specify the configuration for the fifth one with this parameter. The fifth nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path5= Overrides --nmt_path for the fifth nmt
--nmt_engine5= Overrides --nmt_engine for the fifth nmt
--t2t_model5= Overrides --t2t_model for the fifth t2t predictor
--t2t_problem5=
 Overrides --t2t_problem for the fifth t2t predictor
--t2t_hparams_set5=
 Overrides --t2t_hparams_set for the fifth t2t predictor
--t2t_checkpoint_dir5=
 Overrides --t2t_checkpoint_dir for the fifth t2t predictor
--pred_src_vocab_size5=0
 Overrides --pred_src_vocab_size for the fifth t2t predictor
--pred_trg_vocab_size5=0
 Overrides --pred_trg_vocab_size for the fifth t2t predictor
--rnnlm_config5=
 If the --predictors string contains more than four rnnlm predictors, you can specify the configuration for the fifth one with this parameter. The fifth rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path5= Overrides --rnnlm_path for the fifth rnnlm
--src_test5= Overrides --src_test for the fifth src
--altsrc_test5=
 Overrides --altsrc_test for the fifth altsrc
--word2char_map5=
 Overrides --word2char_map for the fifth word2char
--fsttok_path5=
 Overrides --fsttok_path for the fifth fsttok
--src_idxmap5= Overrides --src_idxmap for the fifth indexmap
--trg_idxmap5= Overrides --trg_idxmap for the fifth indexmap
--fst_path5= Overrides --fst_path for the fifth fst predictor
--forcedlst_sparse_feat5=
 Overrides --forcedlst_sparse_feat for the fifth forcedlst predictor
--ngramc_path5=
 Overrides --ngramc_path for the fifth ngramc
--ngramc_order5=0
 Overrides --ngramc_order for the fifth ngramc
--nmt_config6= If the --predictors string contains more than five nmt predictors, you can specify the configuration for the sixth one with this parameter. The sixth nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path6= Overrides --nmt_path for the sixth nmt
--nmt_engine6= Overrides --nmt_engine for the sixth nmt
--t2t_model6= Overrides --t2t_model for the sixth t2t predictor
--t2t_problem6=
 Overrides --t2t_problem for the sixth t2t predictor
--t2t_hparams_set6=
 Overrides --t2t_hparams_set for the sixth t2t predictor
--t2t_checkpoint_dir6=
 Overrides --t2t_checkpoint_dir for the sixth t2t predictor
--pred_src_vocab_size6=0
 Overrides --pred_src_vocab_size for the sixth t2t predictor
--pred_trg_vocab_size6=0
 Overrides --pred_trg_vocab_size for the sixth t2t predictor
--rnnlm_config6=
 If the --predictors string contains more than five rnnlm predictors, you can specify the configuration for the sixth one with this parameter. The sixth rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path6= Overrides --rnnlm_path for the sixth rnnlm
--src_test6= Overrides --src_test for the sixth src
--altsrc_test6=
 Overrides --altsrc_test for the sixth altsrc
--word2char_map6=
 Overrides --word2char_map for the sixth word2char
--fsttok_path6=
 Overrides --fsttok_path for the sixth fsttok
--src_idxmap6= Overrides --src_idxmap for the sixth indexmap
--trg_idxmap6= Overrides --trg_idxmap for the sixth indexmap
--fst_path6= Overrides --fst_path for the sixth fst predictor
--forcedlst_sparse_feat6=
 Overrides --forcedlst_sparse_feat for the sixth forcedlst predictor
--ngramc_path6=
 Overrides --ngramc_path for the sixth ngramc
--ngramc_order6=0
 Overrides --ngramc_order for the sixth ngramc
--nmt_config7= If the --predictors string contains more than six nmt predictors, you can specify the configuration for the seventh one with this parameter. The seventh nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path7= Overrides --nmt_path for the seventh nmt
--nmt_engine7= Overrides --nmt_engine for the seventh nmt
--t2t_model7= Overrides --t2t_model for the seventh t2t predictor
--t2t_problem7=
 Overrides --t2t_problem for the seventh t2t predictor
--t2t_hparams_set7=
 Overrides --t2t_hparams_set for the seventh t2t predictor
--t2t_checkpoint_dir7=
 Overrides --t2t_checkpoint_dir for the seventh t2t predictor
--pred_src_vocab_size7=0
 Overrides --pred_src_vocab_size for the seventh t2t predictor
--pred_trg_vocab_size7=0
 Overrides --pred_trg_vocab_size for the seventh t2t predictor
--rnnlm_config7=
 If the --predictors string contains more than six rnnlm predictors, you can specify the configuration for the seventh one with this parameter. The seventh rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path7= Overrides --rnnlm_path for the seventh rnnlm
--src_test7= Overrides --src_test for the seventh src
--altsrc_test7=
 Overrides --altsrc_test for the seventh altsrc
--word2char_map7=
 Overrides --word2char_map for the seventh word2char
--fsttok_path7=
 Overrides --fsttok_path for the seventh fsttok
--src_idxmap7= Overrides --src_idxmap for the seventh indexmap
--trg_idxmap7= Overrides --trg_idxmap for the seventh indexmap
--fst_path7= Overrides --fst_path for the seventh fst predictor
--forcedlst_sparse_feat7=
 Overrides --forcedlst_sparse_feat for the seventh forcedlst predictor
--ngramc_path7=
 Overrides --ngramc_path for the seventh ngramc
--ngramc_order7=0
 Overrides --ngramc_order for the seventh ngramc
--nmt_config8= If the --predictors string contains more than seven nmt predictors, you can specify the configuration for the eighth one with this parameter. The eighth nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path8= Overrides --nmt_path for the eighth nmt
--nmt_engine8= Overrides --nmt_engine for the eighth nmt
--t2t_model8= Overrides --t2t_model for the eighth t2t predictor
--t2t_problem8=
 Overrides --t2t_problem for the eighth t2t predictor
--t2t_hparams_set8=
 Overrides --t2t_hparams_set for the eighth t2t predictor
--t2t_checkpoint_dir8=
 Overrides --t2t_checkpoint_dir for the eighth t2t predictor
--pred_src_vocab_size8=0
 Overrides --pred_src_vocab_size for the eighth t2t predictor
--pred_trg_vocab_size8=0
 Overrides --pred_trg_vocab_size for the eighth t2t predictor
--rnnlm_config8=
 If the --predictors string contains more than seven rnnlm predictors, you can specify the configuration for the eighth one with this parameter. The eighth rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path8= Overrides --rnnlm_path for the eighth rnnlm
--src_test8= Overrides --src_test for the eighth src
--altsrc_test8=
 Overrides --altsrc_test for the eighth altsrc
--word2char_map8=
 Overrides --word2char_map for the eighth word2char
--fsttok_path8=
 Overrides --fsttok_path for the eighth fsttok
--src_idxmap8= Overrides --src_idxmap for the eighth indexmap
--trg_idxmap8= Overrides --trg_idxmap for the eighth indexmap
--fst_path8= Overrides --fst_path for the eighth fst predictor
--forcedlst_sparse_feat8=
 Overrides --forcedlst_sparse_feat for the eighth forcedlst predictor
--ngramc_path8=
 Overrides --ngramc_path for the eighth ngramc
--ngramc_order8=0
 Overrides --ngramc_order for the eighth ngramc
--nmt_config9= If the --predictors string contains more than eight nmt predictors, you can specify the configuration for the ninth one with this parameter. The ninth nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path9= Overrides --nmt_path for the ninth nmt
--nmt_engine9= Overrides --nmt_engine for the ninth nmt
--t2t_model9= Overrides --t2t_model for the ninth t2t predictor
--t2t_problem9=
 Overrides --t2t_problem for the ninth t2t predictor
--t2t_hparams_set9=
 Overrides --t2t_hparams_set for the ninth t2t predictor
--t2t_checkpoint_dir9=
 Overrides --t2t_checkpoint_dir for the ninth t2t predictor
--pred_src_vocab_size9=0
 Overrides --pred_src_vocab_size for the ninth t2t predictor
--pred_trg_vocab_size9=0
 Overrides --pred_trg_vocab_size for the ninth t2t predictor
--rnnlm_config9=
 If the --predictors string contains more than eight rnnlm predictors, you can specify the configuration for the ninth one with this parameter. The ninth rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path9= Overrides --rnnlm_path for the ninth rnnlm
--src_test9= Overrides --src_test for the ninth src
--altsrc_test9=
 Overrides --altsrc_test for the ninth altsrc
--word2char_map9=
 Overrides --word2char_map for the ninth word2char
--fsttok_path9=
 Overrides --fsttok_path for the ninth fsttok
--src_idxmap9= Overrides --src_idxmap for the ninth indexmap
--trg_idxmap9= Overrides --trg_idxmap for the ninth indexmap
--fst_path9= Overrides --fst_path for the ninth fst predictor
--forcedlst_sparse_feat9=
 Overrides --forcedlst_sparse_feat for the ninth forcedlst predictor
--ngramc_path9=
 Overrides --ngramc_path for the ninth ngramc
--ngramc_order9=0
 Overrides --ngramc_order for the ninth ngramc
--nmt_config10=
 If the --predictors string contains more than nine nmt predictors, you can specify the configuration for the tenth one with this parameter. The tenth nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path10= Overrides --nmt_path for the tenth nmt
--nmt_engine10=
 Overrides --nmt_engine for the tenth nmt
--t2t_model10= Overrides --t2t_model for the tenth t2t predictor
--t2t_problem10=
 Overrides --t2t_problem for the tenth t2t predictor
--t2t_hparams_set10=
 Overrides --t2t_hparams_set for the tenth t2t predictor
--t2t_checkpoint_dir10=
 Overrides --t2t_checkpoint_dir for the tenth t2t predictor
--pred_src_vocab_size10=0
 Overrides --pred_src_vocab_size for the tenth t2t predictor
--pred_trg_vocab_size10=0
 Overrides --pred_trg_vocab_size for the tenth t2t predictor
--rnnlm_config10=
 If the --predictors string contains more than nine rnnlm predictors, you can specify the configuration for the tenth one with this parameter. The tenth rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path10=
 Overrides --rnnlm_path for the tenth rnnlm
--src_test10= Overrides --src_test for the tenth src
--altsrc_test10=
 Overrides --altsrc_test for the tenth altsrc
--word2char_map10=
 Overrides --word2char_map for the tenth word2char
--fsttok_path10=
 Overrides --fsttok_path for the tenth fsttok
--src_idxmap10=
 Overrides --src_idxmap for the tenth indexmap
--trg_idxmap10=
 Overrides --trg_idxmap for the tenth indexmap
--fst_path10= Overrides --fst_path for the tenth fst predictor
--forcedlst_sparse_feat10=
 Overrides --forcedlst_sparse_feat for the tenth forcedlst predictor
--ngramc_path10=
 Overrides --ngramc_path for the tenth ngramc
--ngramc_order10=0
 Overrides --ngramc_order for the tenth ngramc
--nmt_config11=
 If the --predictors string contains more than ten nmt predictors, you can specify the configuration for the eleventh one with this parameter. The eleventh nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path11= Overrides --nmt_path for the eleventh nmt
--nmt_engine11=
 Overrides --nmt_engine for the eleventh nmt
--t2t_model11= Overrides --t2t_model for the eleventh t2t predictor
--t2t_problem11=
 Overrides --t2t_problem for the eleventh t2t predictor
--t2t_hparams_set11=
 Overrides --t2t_hparams_set for the eleventh t2t predictor
--t2t_checkpoint_dir11=
 Overrides --t2t_checkpoint_dir for the eleventh t2t predictor
--pred_src_vocab_size11=0
 Overrides --pred_src_vocab_size for the eleventh t2t predictor
--pred_trg_vocab_size11=0
 Overrides --pred_trg_vocab_size for the eleventh t2t predictor
--rnnlm_config11=
 If the --predictors string contains more than ten rnnlm predictors, you can specify the configuration for the eleventh one with this parameter. The eleventh rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path11=
 Overrides --rnnlm_path for the eleventh rnnlm
--src_test11= Overrides --src_test for the eleventh src
--altsrc_test11=
 Overrides --altsrc_test for the eleventh altsrc
--word2char_map11=
 Overrides --word2char_map for the eleventh word2char
--fsttok_path11=
 Overrides --fsttok_path for the eleventh fsttok
--src_idxmap11=
 Overrides --src_idxmap for the eleventh indexmap
--trg_idxmap11=
 Overrides --trg_idxmap for the eleventh indexmap
--fst_path11= Overrides --fst_path for the eleventh fst predictor
--forcedlst_sparse_feat11=
 Overrides --forcedlst_sparse_feat for the eleventh forcedlst predictor
--ngramc_path11=
 Overrides --ngramc_path for the eleventh ngramc
--ngramc_order11=0
 Overrides --ngramc_order for the eleventh ngramc
--nmt_config12=
 If the --predictors string contains more than eleven nmt predictors, you can specify the configuration for the twelfth one with this parameter. The twelfth nmt predictor inherits all previous settings except for the ones in this parameter.
--nmt_path12= Overrides --nmt_path for the twelfth nmt
--nmt_engine12=
 Overrides --nmt_engine for the twelfth nmt
--t2t_model12= Overrides --t2t_model for the twelfth t2t predictor
--t2t_problem12=
 Overrides --t2t_problem for the twelfth t2t predictor
--t2t_hparams_set12=
 Overrides --t2t_hparams_set for the twelfth t2t predictor
--t2t_checkpoint_dir12=
 Overrides --t2t_checkpoint_dir for the twelfth t2t predictor
--pred_src_vocab_size12=0
 Overrides --pred_src_vocab_size for the twelfth t2t predictor
--pred_trg_vocab_size12=0
 Overrides --pred_trg_vocab_size for the twelfth t2t predictor
--rnnlm_config12=
 If the --predictors string contains more than eleven rnnlm predictors, you can specify the configuration for the twelfth one with this parameter. The twelfth rnnlm predictor inherits all previous settings except for the ones in this parameter.
--rnnlm_path12=
 Overrides --rnnlm_path for the twelfth rnnlm
--src_test12= Overrides --src_test for the twelfth src
--altsrc_test12=
 Overrides --altsrc_test for the twelfth altsrc
--word2char_map12=
 Overrides --word2char_map for the twelfth word2char
--fsttok_path12=
 Overrides --fsttok_path for the twelfth fsttok
--src_idxmap12=
 Overrides --src_idxmap for the twelfth indexmap
--trg_idxmap12=
 Overrides --trg_idxmap for the twelfth indexmap
--fst_path12= Overrides --fst_path for the twelfth fst predictor
--forcedlst_sparse_feat12=
 Overrides --forcedlst_sparse_feat for the twelfth forcedlst predictor
--ngramc_path12=
 Overrides --ngramc_path for the twelfth ngramc
--ngramc_order12=0
 Overrides --ngramc_order for the twelfth ngramc
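
As an illustrative sketch of these per-predictor overrides (all paths and weights below are placeholders), an ensemble of three t2t predictors can be configured by listing the predictor three times and overriding the checkpoint directory for the second and third instance:

python decode.py --predictors t2t,t2t,t2t \
                 --t2t_checkpoint_dir /path/to/model1 \
                 --t2t_checkpoint_dir2 /path/to/model2 \
                 --t2t_checkpoint_dir3 /path/to/model3 \
                 --predictor_weights 0.4,0.3,0.3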

Batch Decoding (Blocks only)

This is a fast decoder for pure NMT which does not process sentences sequentially. It is optimized for GPU decoding. For maximum decoding speed we recommend the Theano flags lib.cnmem=1,allow_gc=False and the most recent versions of cuDNN and Theano. This decoder is not implemented for the TensorFlow engine.
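
For example, a GPU batch decoding run might be invoked along the following lines (the Theano flag string and the test set path are only illustrative):

THEANO_FLAGS="device=gpu,floatX=float32,lib.cnmem=1,allow_gc=False" \
    python batch_decode.py --src_test test.ids.en --beam 5 --enc_max_words 5000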

usage: batch_decode.py [-h] [--src_test SRC_TEST] [--range RANGE]
                       [--enc_max_words ENC_MAX_WORDS] [--min_jobs MIN_JOBS]
                       [--max_tasks_per_job MAX_TASKS_PER_JOB]
                       [--max_tasks_per_state_update_job MAX_TASKS_PER_STATE_UPDATE_JOB]
                       [--max_rows_per_job MAX_ROWS_PER_JOB]
                       [--min_tasks_per_bucket MIN_TASKS_PER_BUCKET]
                       [--min_bucket_tolerance MIN_BUCKET_TOLERANCE]
                       [--beam BEAM] [--enc_nhids ENC_NHIDS]
                       [--enc_layers ENC_LAYERS]
                       [--store_full_main_loop STORE_FULL_MAIN_LOOP]
                       [--dec_share_weights DEC_SHARE_WEIGHTS]
                       [--src_mono_data SRC_MONO_DATA]
                       [--weight_scale WEIGHT_SCALE]
                       [--memory_size MEMORY_SIZE]
                       [--sort_k_batches SORT_K_BATCHES] [--val_set VAL_SET]
                       [--dec_init DEC_INIT]
                       [--weight_noise_rec WEIGHT_NOISE_REC]
                       [--src_vocab_size SRC_VOCAB_SIZE]
                       [--save_freq SAVE_FREQ]
                       [--fix_embeddings FIX_EMBEDDINGS]
                       [--val_set_grndtruth VAL_SET_GRNDTRUTH]
                       [--enc_embed ENC_EMBED] [--dec_nhids DEC_NHIDS]
                       [--finish_after FINISH_AFTER] [--saveto SAVETO]
                       [--att_nhids ATT_NHIDS] [--memory MEMORY]
                       [--trg_data TRG_DATA]
                       [--dec_attention_sources DEC_ATTENTION_SOURCES]
                       [--step_clipping STEP_CLIPPING]
                       [--annotations ANNOTATIONS]
                       [--enc_skip_connections ENC_SKIP_CONNECTIONS]
                       [--val_set_out VAL_SET_OUT] [--val_burn_in VAL_BURN_IN]
                       [--step_rule STEP_RULE] [--attention ATTENTION]
                       [--trg_vocab_size TRG_VOCAB_SIZE]
                       [--batch_size BATCH_SIZE] [--dec_layers DEC_LAYERS]
                       [--src_sparse_feat_map SRC_SPARSE_FEAT_MAP]
                       [--enc_share_weights ENC_SHARE_WEIGHTS]
                       [--output_val_set OUTPUT_VAL_SET]
                       [--dec_readout_sources DEC_READOUT_SOURCES]
                       [--src_data SRC_DATA]
                       [--trg_sparse_feat_map TRG_SPARSE_FEAT_MAP]
                       [--dropout DROPOUT] [--bleu_val_freq BLEU_VAL_FREQ]
                       [--maxout_nhids MAXOUT_NHIDS]
                       [--trg_mono_data TRG_MONO_DATA]
                       [--normalized_bleu NORMALIZED_BLEU] [--reload RELOAD]
                       [--beam_size BEAM_SIZE] [--dec_embed DEC_EMBED]
                       [--bleu_script BLEU_SCRIPT] [--seq_len SEQ_LEN]
                       [--weight_noise_ff WEIGHT_NOISE_FF]
optional arguments
--src_test=test_en
 Path to source test set. This is expected to be a plain text file with one source sentence in each line. Words need to be indexed, i.e. use word IDs instead of their string representations.
--range= Defines the range of sentences to be processed. The syntax is equal to HiFST's printstrings and lmert's idxrange parameter: <start-idx>:<end-idx> (both inclusive, starting with 1). E.g. 2:5 means: skip the first sentence and process the next 4 sentences
--enc_max_words=5000
 Maximum number of words in an encoder batch. These batches compute source side annotations. Encoder batches are clustered by source sentence length, so smaller batches are possible.
--min_jobs=2 The CPU scheduler starts to construct small jobs when the total number of jobs in the pipelines is below this threshold. This prevents the computation thread from being idle, at the cost of smaller batches
--max_tasks_per_job=450
 The maximum number of tasks in a single decoder batch. Larger batches can exploit GPU parallelism more efficiently, but limit the flexibility of the CPU scheduler
--max_tasks_per_state_update_job=100
 Maximum number of tasks in a state update batch. Larger batches are more efficient to compute on the GPU, but delaying state updates for too long may lead to smaller forward pass jobs.
--max_rows_per_job=20
 Maximum number of entries in a forward pass batch. Note that each task in the batch gets at least one entry, so this parameter applies only if fewer tasks than this threshold are left.
--min_tasks_per_bucket=100
 Minimum number of tasks in a bucket. Large buckets give the CPU scheduler more flexibility, but more padding may be required on the source side, leading to more wasted computation.
--min_bucket_tolerance=8
 Minimum padding width in a bucket. Increasing this leads to larger buckets, more flexible scheduling, and larger batches, but potentially more wasteful state update computation due to padding.
--beam=5 Size of the beam.
--enc_nhids=1000
 Number of hidden units in encoder GRU
--enc_layers=1 Number of encoder layers
--store_full_main_loop=False
 Old style archives (not recommended)
--dec_share_weights=True
 Whether to share weights in deep decoders
--src_mono_data=./data/mono.ids.en
 Source language monolingual data (for use see --mono_data_integration)
--weight_scale=0.01
 Std of weight initialization
--memory_size=500
 Size of external memory structure
--sort_k_batches=12
 This many batches will be read ahead and sorted
--val_set=./data/dev.ids.en
 Validation set source file
--dec_init=last
 Decoder state initialisation: last, average, constant
--weight_noise_rec=False
 Weight noise flag for recurrent layers
--src_vocab_size=30003
 Source vocab size, including special tokens
--save_freq=750
 Save model after this many updates
--fix_embeddings=False
 Fix embeddings during training
--val_set_grndtruth=./data/dev.ids.fr
 Validation set gold file
--enc_embed=620
 Dimension of the word embedding matrix in encoder
--dec_nhids=1000
 Number of hidden units in decoder GRU
--finish_after=1000000
 Maximum number of updates
--saveto=./train
 Where to save model, same as ‘prefix’ in groundhog
--att_nhids=-1 Dimensionality of attention match vector (-1 to use dec_nhids)
--memory=none External memory: none, stack
--trg_data=./data/train.ids.shuf.fr
 Target dataset
--dec_attention_sources=s
 Sources used by attention: f for feedback, s for decoder states
--step_clipping=1.0
 Gradient clipping threshold
--annotations=direct
 Annotation strategy (comma-separated): direct, hierarchical
--enc_skip_connections=False
 Add skip connection in deep encoders
--val_set_out=./train/validation_out.txt
 Validation output file
--val_burn_in=80000
 Start bleu validation after this many updates
--step_rule=AdaDelta
 Optimization step rule
--attention=content
 Attention mechanism: none, content, nbest-<n>, coverage-<n>, tree, content-<n>
--trg_vocab_size=30003
 Target vocab size, including special tokens
--batch_size=80
 Batch size
--dec_layers=1 Number of decoder layers (NOT IMPLEMENTED for != 1)
--src_sparse_feat_map=
 Mapping files for using sparse feature word representations on the source side
--enc_share_weights=True
 Whether to share weights in deep encoders
--output_val_set=True
 Print validation output to file
--dec_readout_sources=sfa
 Sources used by readout network: f for feedback, s for decoder states, a for attention (context vector)
--src_data=./data/train.ids.shuf.en
 Source dataset
--trg_sparse_feat_map=
 Mapping files for using sparse feature word representations on the target side
--dropout=1.0 Dropout ratio, applied only after readout maxout
--bleu_val_freq=6000
 Validate bleu after this many updates
--maxout_nhids=-1
 Dimensionality of maxout output layer (-1 to use dec_nhids)
--trg_mono_data=./data/mono.ids.fr
 Target language monolingual data (for use see --mono_data_integration)
--normalized_bleu=True
 Length normalization IN TRAINING
--reload=True Reload the model from files if they exist
--beam_size=12 Beam-size for decoding DURING TRAINING
--dec_embed=620
 Dimension of the word embedding matrix in decoder
--bleu_script=perl ../scripts/multi-bleu.perl %s <
 BLEU script used during training for model selection
--seq_len=50 Sequences longer than this will be discarded
--weight_noise_ff=0.0
 Weight noise flag for feed forward layers

Training (Blocks only)

The training script follows the NMT training example in Blocks, but adds options for reshuffling the training data between epochs and for fixing the word embeddings, which can be useful in later training stages.
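
A later training stage that reshuffles the data and keeps the word embeddings fixed might be started roughly as follows (a sketch; the data paths simply repeat the defaults listed below):

python train.py --reshuffle \
                --fix_embeddings True \
                --src_data ./data/train.ids.shuf.en \
                --trg_data ./data/train.ids.shuf.fr \
                --saveto ./train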

usage: train.py [-h] [--bokeh] [--reshuffle] [--slim_iteration_state]
                [--reset_epoch] [--mono_data_integration {none}]
                [--loss {default,gleu}]
                [--add_mono_dummy_data ADD_MONO_DUMMY_DATA]
                [--backtrans_nmt_config BACKTRANS_NMT_CONFIG]
                [--backtrans_reload_frequency BACKTRANS_RELOAD_FREQUENCY]
                [--backtrans_store BACKTRANS_STORE]
                [--backtrans_max_same_word BACKTRANS_MAX_SAME_WORD]
                [--learning_rate LEARNING_RATE] [--prune_every PRUNE_EVERY]
                [--prune_reset_every PRUNE_RESET_EVERY]
                [--prune_n_steps PRUNE_N_STEPS] [--prune_layers PRUNE_LAYERS]
                [--prune_layout_path PRUNE_LAYOUT_PATH]
                [--sampling_freq SAMPLING_FREQ] [--hook_samples HOOK_SAMPLES]
                [--enc_nhids ENC_NHIDS] [--enc_layers ENC_LAYERS]
                [--store_full_main_loop STORE_FULL_MAIN_LOOP]
                [--dec_share_weights DEC_SHARE_WEIGHTS]
                [--src_mono_data SRC_MONO_DATA] [--weight_scale WEIGHT_SCALE]
                [--memory_size MEMORY_SIZE] [--sort_k_batches SORT_K_BATCHES]
                [--val_set VAL_SET] [--dec_init DEC_INIT]
                [--weight_noise_rec WEIGHT_NOISE_REC]
                [--src_vocab_size SRC_VOCAB_SIZE] [--save_freq SAVE_FREQ]
                [--fix_embeddings FIX_EMBEDDINGS]
                [--val_set_grndtruth VAL_SET_GRNDTRUTH]
                [--enc_embed ENC_EMBED] [--dec_nhids DEC_NHIDS]
                [--finish_after FINISH_AFTER] [--saveto SAVETO]
                [--att_nhids ATT_NHIDS] [--memory MEMORY]
                [--trg_data TRG_DATA]
                [--dec_attention_sources DEC_ATTENTION_SOURCES]
                [--step_clipping STEP_CLIPPING] [--annotations ANNOTATIONS]
                [--enc_skip_connections ENC_SKIP_CONNECTIONS]
                [--val_set_out VAL_SET_OUT] [--val_burn_in VAL_BURN_IN]
                [--step_rule STEP_RULE] [--attention ATTENTION]
                [--trg_vocab_size TRG_VOCAB_SIZE] [--batch_size BATCH_SIZE]
                [--dec_layers DEC_LAYERS]
                [--src_sparse_feat_map SRC_SPARSE_FEAT_MAP]
                [--enc_share_weights ENC_SHARE_WEIGHTS]
                [--output_val_set OUTPUT_VAL_SET]
                [--dec_readout_sources DEC_READOUT_SOURCES]
                [--src_data SRC_DATA]
                [--trg_sparse_feat_map TRG_SPARSE_FEAT_MAP]
                [--dropout DROPOUT] [--bleu_val_freq BLEU_VAL_FREQ]
                [--maxout_nhids MAXOUT_NHIDS] [--trg_mono_data TRG_MONO_DATA]
                [--normalized_bleu NORMALIZED_BLEU] [--reload RELOAD]
                [--beam_size BEAM_SIZE] [--dec_embed DEC_EMBED]
                [--bleu_script BLEU_SCRIPT] [--seq_len SEQ_LEN]
                [--weight_noise_ff WEIGHT_NOISE_FF]
optional arguments
--bokeh=False Use bokeh server for plotting
--reshuffle=False
 Reshuffle before each epoch
--slim_iteration_state=False
 By default, the iteration state stores the data stream and the main loop epoch iterator. Enabling this option stores only the epoch iterator. This results in a much smaller iteration state, but the data stream is reset after reloading. Normally, you can use slim iteration states if your data stream does reshuffling.
--reset_epoch=False
 Set epoch_started in the main loop status to false. This is sometimes required if you change training parameters such as --mono_data_integration.
--mono_data_integration=none
 This parameter specifies how to use monolingual data. Currently, we only support using the target data.
 * ‘none’: Do not use monolingual data
 Possible choices: none
--loss=default
 Training loss function.
 * ‘default’: Standard loss function: squared error with target feature maps, otherwise cross entropy
 * ‘gleu’: Reinforcement learning objective function as proposed by Wu et al., 2016 (Google’s NMT)
 Possible choices: default, gleu
--add_mono_dummy_data=True
 If the method specified with mono_data_integration uses monolingual data, it usually combines synthetic and dummy source sentences. Set this to false to disable dummy source sentences.
--backtrans_nmt_config=
 A string describing the configuration of the back-translating NMT system. Syntax is equal to nmt_config2 in decode.py: Comma separated list of name-value pairs, where name is one of the NMT configuration parameters. E.g. saveto=train.back,src_vocab_size=50000,trg_vocab_size=50000
--backtrans_reload_frequency=0
 The back-translating NMT model is reloaded every n updates. This is useful if the back-translating NMT system is currently trained by itself with the same policy. This enables us to train two NMT systems in opposite translation directions and benefit from gains in the other system immediately. Set to 0 to disable reloading
--backtrans_store=True
 Write the back-translated sentences to the file system.
--backtrans_max_same_word=0.3
 Used for sanity check of the backtranslation. If the most frequent word in the backtranslated sentence has relative frequency higher than this, discard this sentence pair
--learning_rate=0.002
 Learning rate for AdaGrad and Adam
--prune_every=-1
 Prune model every n iterations. Pruning is disabled if this is < 1
--prune_reset_every=-1
 Reset pruning statistics every n iterations. If set to -1, use --prune_every
--prune_n_steps=10
 Number of pruning steps until the target layer sizes should be reached
--prune_layers=encfwdgru:1000,encbwdgru:1000,decgru:1000
 A comma-separated list of <layer>:<size> pairs. <layer> is one of ‘encfwdgru’, ‘encbwdgru’, ‘decgru’, ‘decmaxout’ and names a layer which should be shrunk to <size> during training. Pruned neurons are marked by setting all in- and output connections to zero. See the example invocation at the end of this options list.
--prune_layout_path=prune.layout
 Points to a file which defines which weight matrices are connected to which prunable layers. The rows/columns of these matrices are set to zero for all removed neurons. The format of this file is <layer> <in|out> <mat_name> <dim> <start-idx>=0.0, where <layer> is one of the layer names specified via --prune_layers. Set <start-idx> to 0.5 to add an offset of half the matrix dimension to the indices.
--sampling_freq=13
 NOT USED, just to prevent old code from breaking
--hook_samples=0
 NOT USED, just to prevent old code from breaking
--enc_nhids=1000
 Number of hidden units in encoder GRU
--enc_layers=1 Number of encoder layers
--store_full_main_loop=False
 Old style archives (not recommended)
--dec_share_weights=True
 Whether to share weights in deep decoders
--src_mono_data=./data/mono.ids.en
 Source language monolingual data (for use see --mono_data_integration)
--weight_scale=0.01
 Std of weight initialization
--memory_size=500
 Size of external memory structure
--sort_k_batches=12
 This many batches will be read ahead and sorted
--val_set=./data/dev.ids.en
 Validation set source file
--dec_init=last
 Decoder state initialisation: last, average, constant
--weight_noise_rec=False
 Weight noise flag for recurrent layers
--src_vocab_size=30003
 Source vocab size, including special tokens
--save_freq=750
 Save model after this many updates
--fix_embeddings=False
 Fix embeddings during training
--val_set_grndtruth=./data/dev.ids.fr
 Validation set gold file
--enc_embed=620
 Dimension of the word embedding matrix in encoder
--dec_nhids=1000
 Number of hidden units in decoder GRU
--finish_after=1000000
 Maximum number of updates
--saveto=./train
 Where to save model, same as ‘prefix’ in groundhog
--att_nhids=-1 Dimensionality of attention match vector (-1 to use dec_nhids)
--memory=none External memory: none, stack
--trg_data=./data/train.ids.shuf.fr
 Target dataset
--dec_attention_sources=s
 Sources used by attention: f for feedback, s for decoder states
--step_clipping=1.0
 Gradient clipping threshold
--annotations=direct
 Annotation strategy (comma-separated): direct, hierarchical
--enc_skip_connections=False
 Add skip connection in deep encoders
--val_set_out=./train/validation_out.txt
 Validation output file
--val_burn_in=80000
 Start bleu validation after this many updates
--step_rule=AdaDelta
 Optimization step rule
--attention=content
 Attention mechanism: none, content, nbest-<n>, coverage-<n>, tree, content-<n>
--trg_vocab_size=30003
 Target vocab size, including special tokens
--batch_size=80
 Batch size
--dec_layers=1 Number of decoder layers (NOT IMPLEMENTED for != 1)
--src_sparse_feat_map=
 Mapping files for using sparse feature word representations on the source side
--enc_share_weights=True
 Whether to share weights in deep encoders
--output_val_set=True
 Print validation output to file
--dec_readout_sources=sfa
 Sources used by readout network: f for feedback, s for decoder states, a for attention (context vector)
--src_data=./data/train.ids.shuf.en
 Source dataset
--trg_sparse_feat_map=
 Mapping files for using sparse feature word representations on the target side
--dropout=1.0 Dropout ratio, applied only after readout maxout
--bleu_val_freq=6000
 Validate bleu after this many updates
--maxout_nhids=-1
 Dimensionality of maxout output layer (-1 to use dec_nhids)
--trg_mono_data=./data/mono.ids.fr
 Target language monolingual data (for use see --mono_data_integration)
--normalized_bleu=True
 Length normalization IN TRAINING
--reload=True Reload the model from files if they exist
--beam_size=12 Beam-size for decoding DURING TRAINING
--dec_embed=620
 Dimension of the word embedding matrix in decoder
--bleu_script=perl ../scripts/multi-bleu.perl %s <
 BLEU script used during training for model selection
--seq_len=50 Sequences longer than this will be discarded
--weight_noise_ff=0.0
 Weight noise flag for feed forward layers
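
As referenced under --prune_layers above, a pruning schedule could be configured roughly as follows (the target layer sizes and iteration counts are purely illustrative):

python train.py --prune_every 10000 \
                --prune_n_steps 10 \
                --prune_layers encfwdgru:800,encbwdgru:800,decgru:800 \
                --prune_layout_path prune.layout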

Alignment (Blocks only)

Only available for the Blocks (Theano) NMT engine. align.py supports two different neural word alignment models, both of which utilize the concept of attention in NMT.
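
For example, alignments from the neural alignment model could be written in both plain text and numpy format with an invocation like the following (the output path is a placeholder):

python align.py --alignment_model nam \
                --iterations 50 \
                --outputs csv,npy \
                --output_path align-out.%s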

usage: align.py [-h] [--iterations ITERATIONS]
                [--nmt_model_selector {params,bleu,time}]
                [--alignment_model {nam,nmt}] [--output_path OUTPUT_PATH]
                [--outputs OUTPUTS] [--enc_nhids ENC_NHIDS]
                [--enc_layers ENC_LAYERS]
                [--store_full_main_loop STORE_FULL_MAIN_LOOP]
                [--dec_share_weights DEC_SHARE_WEIGHTS]
                [--src_mono_data SRC_MONO_DATA] [--weight_scale WEIGHT_SCALE]
                [--memory_size MEMORY_SIZE] [--sort_k_batches SORT_K_BATCHES]
                [--val_set VAL_SET] [--dec_init DEC_INIT]
                [--weight_noise_rec WEIGHT_NOISE_REC]
                [--src_vocab_size SRC_VOCAB_SIZE] [--save_freq SAVE_FREQ]
                [--fix_embeddings FIX_EMBEDDINGS]
                [--val_set_grndtruth VAL_SET_GRNDTRUTH]
                [--enc_embed ENC_EMBED] [--dec_nhids DEC_NHIDS]
                [--finish_after FINISH_AFTER] [--saveto SAVETO]
                [--att_nhids ATT_NHIDS] [--memory MEMORY]
                [--trg_data TRG_DATA]
                [--dec_attention_sources DEC_ATTENTION_SOURCES]
                [--step_clipping STEP_CLIPPING] [--annotations ANNOTATIONS]
                [--enc_skip_connections ENC_SKIP_CONNECTIONS]
                [--val_set_out VAL_SET_OUT] [--val_burn_in VAL_BURN_IN]
                [--step_rule STEP_RULE] [--attention ATTENTION]
                [--trg_vocab_size TRG_VOCAB_SIZE] [--batch_size BATCH_SIZE]
                [--dec_layers DEC_LAYERS]
                [--src_sparse_feat_map SRC_SPARSE_FEAT_MAP]
                [--enc_share_weights ENC_SHARE_WEIGHTS]
                [--output_val_set OUTPUT_VAL_SET]
                [--dec_readout_sources DEC_READOUT_SOURCES]
                [--src_data SRC_DATA]
                [--trg_sparse_feat_map TRG_SPARSE_FEAT_MAP]
                [--dropout DROPOUT] [--bleu_val_freq BLEU_VAL_FREQ]
                [--maxout_nhids MAXOUT_NHIDS] [--trg_mono_data TRG_MONO_DATA]
                [--normalized_bleu NORMALIZED_BLEU] [--reload RELOAD]
                [--beam_size BEAM_SIZE] [--dec_embed DEC_EMBED]
                [--bleu_script BLEU_SCRIPT] [--seq_len SEQ_LEN]
                [--weight_noise_ff WEIGHT_NOISE_FF]
optional arguments
--iterations=50
 Number of optimization iterations for each token
--nmt_model_selector=bleu
 NMT training normally creates several files in the ./train/ directory from which we can load the NMT model. Possible options:
 * ‘params’: Load parameters from params.npz. This is usually the most recent model.
 * ‘bleu’: Load from the best_bleu_params_* file with the best BLEU score.
 * ‘time’: Load from the most recent best_bleu_params_* file.
 Possible choices: params, bleu, time
--alignment_model=nam
 Defines the alignment model.
 * ‘nam’: Neural alignment model. Similar to NMT, but trains the alignment weights explicitly for each sentence pair instead of using the NMT attention model.
 * ‘nmt’: Standard NMT attention model following Bahdanau et al., 2015.
 Possible choices: nam, nmt
--output_path=sgnmt-out.%s
 Path to the output files generated by SGNMT. You can use the placeholder %s for the format specifier.
--outputs= Comma-separated list of output formats:
 * ‘csv’: Plain text file with alignment matrix
 * ‘npy’: Alignment matrices in numpy’s npy format
 * ‘align’: Usual (Pharaoh) alignment format.
--enc_nhids=1000
 Number of hidden units in encoder GRU
--enc_layers=1 Number of encoder layers
--store_full_main_loop=False
 Old style archives (not recommended)
--dec_share_weights=True
 Whether to share weights in deep decoders
--src_mono_data=./data/mono.ids.en
 Source language monolingual data (for use see --mono_data_integration)
--weight_scale=0.01
 Std of weight initialization
--memory_size=500
 Size of external memory structure
--sort_k_batches=12
 This many batches will be read ahead and sorted
--val_set=./data/dev.ids.en
 Validation set source file
--dec_init=last
 Decoder state initialisation: last, average, constant
--weight_noise_rec=False
 Weight noise flag for recurrent layers
--src_vocab_size=30003
 Source vocab size, including special tokens
--save_freq=750
 Save model after this many updates
--fix_embeddings=False
 Fix embeddings during training
--val_set_grndtruth=./data/dev.ids.fr
 Validation set gold file
--enc_embed=620
 Dimension of the word embedding matrix in encoder
--dec_nhids=1000
 Number of hidden units in decoder GRU
--finish_after=1000000
 Maximum number of updates
--saveto=./train
 Where to save model, same as ‘prefix’ in groundhog
--att_nhids=-1 Dimensionality of attention match vector (-1 to use dec_nhids)
--memory=none External memory: none, stack
--trg_data=./data/train.ids.shuf.fr
 Target dataset
--dec_attention_sources=s
 Sources used by attention: f for feedback, s for decoder states
--step_clipping=1.0
 Gradient clipping threshold
--annotations=direct
 Annotation strategy (comma-separated): direct, hierarchical
--enc_skip_connections=False
 Add skip connection in deep encoders
--val_set_out=./train/validation_out.txt
 Validation output file
--val_burn_in=80000
 Start bleu validation after this many updates
--step_rule=AdaDelta
 Optimization step rule
--attention=content
 Attention mechanism: none, content, nbest-<n>, coverage-<n>, tree, content-<n>
--trg_vocab_size=30003
 Target vocab size, including special tokens
--batch_size=80
 Batch size
--dec_layers=1 Number of decoder layers (NOT IMPLEMENTED for != 1)
--src_sparse_feat_map=
 Mapping files for using sparse feature word representations on the source side
--enc_share_weights=True
 Whether to share weights in deep encoders
--output_val_set=True
 Print validation output to file
--dec_readout_sources=sfa
 Sources used by readout network: f for feedback, s for decoder states, a for attention (context vector)
--src_data=./data/train.ids.shuf.en
 Source dataset
--trg_sparse_feat_map=
 Mapping files for using sparse feature word representations on the target side
--dropout=1.0 Dropout ratio, applied only after readout maxout
--bleu_val_freq=6000
 Validate bleu after this many updates
--maxout_nhids=-1
 Dimensionality of maxout output layer (-1 to use dec_nhids)
--trg_mono_data=./data/mono.ids.fr
 Target language monolingual data (for use see --mono_data_integration)
--normalized_bleu=True
 Length normalization IN TRAINING
--reload=True Reload the model from files if they exist
--beam_size=12 Beam-size for decoding DURING TRAINING
--dec_embed=620
 Dimension of the word embedding matrix in decoder
--bleu_script=perl ../scripts/multi-bleu.perl %s <
 BLEU script used during training for model selection
--seq_len=50 Sequences longer than this will be discarded
--weight_noise_ff=0.0
 Weight noise flag for feed forward layers