Examples

SGNMT supports many different options, and small changes in the decoder configuration sometimes lead to very different results. This page contains a list of SGNMT configuration files that have been used in our group. Details such as paths and vocabulary sizes are only exemplary, as we do not provide the model files. Some of the examples are not compatible with SGNMT 1.0 and higher as they rely on the Theano/Blocks or NPLM backends, which are no longer supported. However, we hope that this list is still useful as a blueprint for your own experiments.
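Each of the listings below is a complete ini-file. Assuming a checkout of the SGNMT repository, such a file can be passed to the decoder via the --config_file argument; the file name wmt18_ende.ini is just a placeholder:

python decode.py --config_file wmt18_ende.ini

All options can alternatively be set directly on the command line (e.g. --beam 8).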

University of Cambridge submission to WMT18 (Tensor2Tensor)

These are the ini-files for the final decoding passes. Please write to fs439@cam.ac.uk for access to the trained models.

English-German:

predictors: t2t,t2t,ngramc,ngramc,ngramc,ngramc,wc
predictor_weights: 5.7,4.3,0.1,0.1,1.0,0.75,1.5
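# Predictor scores are combined log-linearly: the k-th weight above
# scales the score of the k-th predictor (two t2t models, four ngramc
# predictors, and the word count predictor wc).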
src_test: /data/mifs_scratch/fs439/data/wmt18/test/indexed_bpe/test18ts.ende.bpe.en
beam: 8
early_stopping: false

pred_src_vocab_size: 35627
pred_trg_vocab_size: 35627
indexing_scheme: t2t
t2t_problem: translate_ende_wmt18

# Relative transformer model
t2t_checkpoint_dir: t2t_train/translate_ende_wmt18/transformer-transformer_relative_big_large_batch4/average.1000k/
t2t_model: transformer
t2t_hparams_set: transformer_relative_big

# Transformer model
t2t_checkpoint_dir2: t2t_train/translate_ende_wmt18/transformer-transformer_big_large_batch4/average.1000k
t2t_model2: transformer
t2t_hparams_set2: transformer_big

# slicenet ngram posteriors
ngramc_path: supplementary/ende2/ss.ngramc_test18ts/%d.txt
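# (%d in ngramc_path and similar options is replaced by the sentence
# index, i.e. one file per source sentence)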

# rnn ngram posteriors
ngramc_path2: supplementary/ende2/rr.ngramc_test18ts/%d.txt

# r2l ngram posteriors
ngramc_path3: supplementary/ende2/ender.ngramc_test18ts/%d.txt

# SDL posteriors
ngramc_path4: supplementary/ende/sdl.ngramc_test18ts/%d.txt

outputs: nbest,text,ngram
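# nbest/text/ngram: write n-best lists, plain text translations, and
# n-gram posteriors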

German-English:

(same as English-German, but with the following predictor weights)
predictor_weights: 4.2,3.8,0.1,0.125,0.75,1.5,-1.2
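# Note the negative weight on the word count predictor: longer
# hypotheses are penalized rather than rewarded here.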

Chinese-English:

verbosity: debug
predictors: t2t,t2t,ngramc,ngramc,ngramc,ngramc,wc
predictor_weights: 6.5,5.5,0.375,0.375,0.375,0.5,-0.5
src_test: /data/mifs_scratch/fs439/data/wmt18/test/indexed_bpe/test18.enzh.bpe.zh
beam: 8
early_stopping: false

pred_src_vocab_size: 43738
pred_trg_vocab_size: 34306
indexing_scheme: t2t
t2t_problem: translate_zhen_wmt18

# Relative transformer model
t2t_checkpoint_dir: t2t_train/translate_zhen_wmt18/transformer-transformer_relative_big_large_batch4/average.1000k/
t2t_model: transformer
t2t_hparams_set: transformer_relative_big

# Transformer model
t2t_checkpoint_dir2: t2t_train/translate_zhen_wmt18/transformer-transformer_big_large_batch4/average.1000k
t2t_model2: transformer
t2t_hparams_set2: transformer_big

# slicenet ngram posteriors
ngramc_path: supplementary/zhen2/ss.ngramc_test18/%d.txt

# rnn ngram posteriors
ngramc_path2: supplementary/zhen2/rr.ngramc_test18/%d.txt

# r2l ngram posteriors
ngramc_path3: supplementary/zhen2/zhenr.ngramc_test18/%d.txt

# SDL posteriors
ngramc_path4: supplementary/zhen/sdl.ngramc_test18/%d.txt

outputs: nbest,text,ngram

Joint Word/BPE decoding with multisegbeam (Blocks/Theano)

An ensemble of a word-to-word system (vocabulary size: 30003) and a BPE-to-BPE system using the multisegbeam decoding strategy, tested on the WAT ja-en task:

predictors: nmt,altsrc_nmt
src_test: data/dev.ids.ja
altsrc_test: data/dev.bpe.ja
decoder: multisegbeam
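# One tokenization per predictor: the word-level system uses the
# 30003-entry word map, the BPE system subword units with explicit
# end-of-word (eow) markers.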
multiseg_tokenizations: '30003:data/wmap.en,eow:data/wmap.bpe.en'

Lattice rescoring with three NMT systems (Blocks/Theano)

Rescoring SMT lattices with an ensemble of three NMT systems on the WAT ja-en task:

predictors: fst,nmt,nmt,nmt
predictor_weights: 0.5,0.5,0.5,0.5
src_test: data/test.bpe.ja
early_stopping: false
hypo_recombination: true

fst_path: lats.bpe_test/%d.fst
use_fst_weights: true

gnmt_beta: 0.01
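# A positive gnmt_beta adds a coverage penalty in the style of
# Google's NMT system (Wu et al., 2016).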
nmt_config: src_vocab_size=32081,trg_vocab_size=30123
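# Second and third NMT systems of the ensemble; the first nmt
# predictor uses the default model path.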
nmt_path2: ../jaen-wat-bpe2/train
nmt_path3: ../jaen-wat-bpe3/train

Iterative NMT beam search (Blocks/Theano)

Using the bucket decoding strategy for pure NMT decoding with multiple beam search passes:

src_test: data/test15.ids.en
decoder: bucket
bucket_selector: iter
max_node_expansions: 2000
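# The 'iter' bucket selector organizes node expansions into multiple
# beam search passes; max_node_expansions caps the total number of
# expansions per sentence.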
predictors: nmt
length_normalization: true
nmt_config: src_vocab_size=50003,trg_vocab_size=50003

MBR-based NMT with a 3-ensemble (Blocks/Theano)

Using MBR-style n-gram posteriors together with an ensemble of three NMT systems on the WAT ja-en task:

predictors: nmt,ngramc,wc,nmt,nmt
predictor_weights: 0.53125,0.46875,0.46875,0.53125,0.53125
src_test: data/test.bpe.ja
allow_unk_in_output: false
early_stopping: false
hypo_recombination: true

ngramc_path: lats.ngramc.smooth.bpe_test/%d.txt
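# Smoothed n-gram posteriors extracted from the SMT lattices, one file
# per sentence.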

gnmt_beta: 0.01
nmt_config: src_vocab_size=32081,trg_vocab_size=30123
nmt_path2: ../jaen-wat-bpe2/train
nmt_path3: ../jaen-wat-bpe3/train

MBR-based NMT with separately tuned Thetas (Blocks/Theano)

Tuning the n-gram posterior weights for each n-gram order separately on WMT15 en-de:

predictors: ngramc,ngramc,ngramc,ngramc,wc,nmt
predictor_weights: 1.0,0.603674,0.0950858,0.514882,0.713726,0.510412
src_test: data/test15.ids.en
beam: 2
hypo_recombination: true
allow_unk_in_output: false

ngramc_path: lats.mapped.ngramc.smooth_test15/%d.txt
ngramc_order: 1
ngramc_order2: 2
ngramc_order3: 3
ngramc_order4: 4
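# Each ngramc predictor is restricted to a single n-gram order (1-4),
# so the orders can be weighted separately via predictor_weights.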

gnmt_beta: 0.2
nmt_config: src_vocab_size=50003,trg_vocab_size=50003

Exhaustive n-best list rescoring with NMT (Blocks/Theano)

Rescoring a complete n-best list on WMT15 en-de. The n-best list and the NMT system use different word IDs:

src_test: data/test15.ids.en
decoder: dfs
early_stopping: false
predictors: idxmap_forcedlst,nmt
src_idxmap: data/idxmap.org.test15.en
trg_idxmap: data/idxmap.org.test15.de
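# The idxmap_ wrapper translates between the word IDs used in the
# n-best list and the NMT vocabulary using the mapping files above.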
use_nbest_weights: false
trg_test: ../ende-wmt15/results/hifst_test15.1000best
nmt_config: src_vocab_size=50003,trg_vocab_size=50003

NMT with length model and neural language model (Blocks/Theano)

Pairing NMT with a length model and a neural language model, and rescoring HiFST translation lattices with this combination. The lattices and the language model use word maps that differ from the NMT system:

src_test: data/test15.ids.en
predictors: idxmap_fst,length,idxmap_nplm,nmt
predictor_weights: 8.144021,0.579325,1.192874,4.347711
src_idxmap: data/idxmap.org.test15.en
trg_idxmap: data/idxmap.org.test15.de
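# The fst and nplm predictors are wrapped in idxmap_ because the
# lattices and the language model use a different word map than the
# NMT system.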
fst_path: lats_test15/%d.fst
nmt_config: src_vocab_size=50003,trg_vocab_size=50003
use_fst_weights: true
length_model_weights: 0.252503399924538,1.26556504208994,0.0476145832475248,0.507108282728234,0.0706249583462012,0.00156446527534046,-0.0114873442886072,0.00724551243039656,-0.108343582699869,-0.225865854796484,0.183585648431748,-0.367378141618226
src_test_raw: data/test15.en
nplm_path: nplm/news12-14.de.nnlm.news12-14.5gram-model.large_8.50000.0.10.24.3186.10