Cambridge SMT System
Introduction

This tutorial presents various tools and techniques developed in the Statistical Machine Translation group at the Cambridge University Engineering Department.

The tutorial is intended as a guide to using the tools; our research publications give the best descriptions of the algorithms and modelling techniques covered here. The most relevant publications are listed below (Relevant papers). In particular, this tutorial is based on the Russian-English SMT system developed for the WMT 2013 evaluation, and we suggest reading the system description [Pino2013] before starting the tutorial.

A complete list of our publications can be found at http://divf.eng.cam.ac.uk/smt/Main/SmtPapers.

HiFST grew out of the Ph.D. thesis work of Gonzalo Iglesias.

Contributors to this release are:

  • Graeme Blackwood
  • Bill Byrne
  • Adria de Gispert
  • Federico Flego
  • Gonzalo Iglesias
  • Juan Pino
  • Rory Waite
  • Tong Xiao

with thanks to Cyril Allauzen and Michael Riley.

Features Included in this Release

  • HiFST – Hierarchical phrase-based statistical machine translation system based on OpenFst
  • Direct production of translation lattices as Weighted Finite State Automata
  • Efficient WFSA rescoring procedures
  • OpenFst wrappers for direct inclusion of KenLM and ARPA language models as WFSAs
  • Lattice Minimum Bayes Risk decoding
  • Lattice Minimum Error Rate training
  • Tutorial for Hiero translation using Recursive Transition Networks and Pushdown Transducers
  • Client/Server mode
  • Shallow-N translation grammars
  • Source-sentence 'chopping' procedures
  • WFSA true-casing
  • Multi-dimensional parameter search for MERT
  • ...

Relevant papers

HiFST, HiPDT and Hierarchical Phrase-Based Decoding

[deGispert2010] Hierarchical phrase-based translation with weighted finite state transducers and Shallow-N grammars.
A. de Gispert, G. Iglesias, G. Blackwood, E. R. Banga, and W. Byrne. Computational Linguistics, 36(3). 2010.
http://aclweb.org/anthology/J/J10/J10-3008.pdf

[Allauzen2014] Pushdown automata in statistical machine translation.
C. Allauzen, W. Byrne, A. de Gispert, G. Iglesias, and M. Riley. Computational Linguistics. 2014.
http://www.aclweb.org/anthology/J/J14/J14-3008.pdf

[Iglesias2009a] Hierarchical phrase-based translation with weighted finite state transducers.
G. Iglesias, A. de Gispert, E. R. Banga, and W. Byrne. Proceedings of HLT. 2009.
http://aclweb.org/anthology//N/N09/N09-1049.pdf
http://mi.eng.cam.ac.uk/~wjb31/ppubs/naaclhlt2009presentation.pdf

[Iglesias2011] Hierarchical Phrase-based Translation Representations.
G. Iglesias, C. Allauzen, W. Byrne, A. de Gispert, M. Riley. Proceedings of EMNLP. 2011.
http://aclweb.org/anthology/D/D11/D11-1127.pdf

[Iglesias2009b] Rule filtering by pattern for efficient hierarchical translation.
G. Iglesias, A. de Gispert, E. R. Banga, and W. Byrne. Proceedings of EACL. 2009.
http://aclweb.org/anthology/E/E09/E09-1044.pdf

[Chiang2007] Hierarchical phrase-based translation.
D. Chiang. Computational Linguistics, 33(2). 2007.
http://aclweb.org/anthology/J07-2003.pdf

CUED SMT System Descriptions

[Pino2013] The University of Cambridge Russian-English System at WMT13.
J. Pino, A. Waite, T. Xiao, A. de Gispert, F. Flego, and W. Byrne. Proceedings of the Eighth Workshop on Statistical Machine Translation. 2013.
http://aclweb.org/anthology//W/W13/W13-2225.pdf

OpenFst and Related Modelling Techniques

[OpenFst] The OpenFst Toolkit
http://www.openfst.org/

[Roark2011] Lexicographic semirings for exact automata encoding of sequence models.
B. Roark, R. Sproat, and I. Shafran. Proceedings of ACL-HLT. 2011.
http://aclweb.org/anthology/P/P11/P11-2001.pdf

Lattice Minimum Bayes Risk Decoding using WFSAs

[BlackwoodPhD] Lattice rescoring methods for statistical machine translation.
G. Blackwood. Ph.D. Thesis. Cambridge University Engineering Department and Clare College. 2010.
http://mi.eng.cam.ac.uk/~gwb24/publications/phd.thesis.pdf

[Blackwood2010] Efficient path counting transducers for minimum Bayes-risk decoding of statistical machine translation lattices.
G. Blackwood, A. de Gispert, W. Byrne. Proceedings of ACL Short Papers. 2010.
http://aclweb.org/anthology//P/P10/P10-2006.pdf

[Allauzen2010] Expected Sequence Similarity Maximization.
C. Allauzen, S. Kumar, W. Macherey, M. Mohri, M. Riley. Proceedings of HLT-NAACL, 2010.
http://aclweb.org/anthology//N/N10/N10-1139.pdf

MERT

[Macherey2008] Lattice-based Minimum Error Rate Training for Statistical Machine Translation.
W. Macherey, F. Och, I. Thayer, J. Uszkoreit. Proceedings of EMNLP, 2008.
http://aclweb.org/anthology/D/D08/D08-1076.pdf

[Waite2015] The Geometry of Statistical Machine Translation.
A. Waite and W. Byrne. Proceedings of HLT. 2015. To appear.

[Waite2014] The Geometry of Statistical Machine Translation.
A. Waite. Ph.D. Thesis. Cambridge University Engineering Department and Girton College. 2014.

[Fukuda2004] From the zonotope construction to the Minkowski addition of convex polytopes.
K. Fukuda. Journal of Symbolic Computation, 38(4). 2004.

[Weibel2010] Implementation and parallelization of a reverse-search algorithm for Minkowski sums.
C. Weibel. Proceedings of ALENEX. 2010.
http://epubs.siam.org/doi/pdf/10.1137/1.9781611972900.4

[Waite2012] Lattice-based minimum error rate training using weighted finite-state transducers with tropical polynomial weights.
A. Waite, G. Blackwood, and W. Byrne. Proceedings of FSMNLP, 2012.
http://aclweb.org/anthology-new/W/W12/W12-6219.pdf

HiFST Rule Extraction

[Pino2012] Simple and Efficient Model Filtering in Statistical Machine Translation.
J. Pino, A. Waite, W. Byrne. The Prague Bulletin of Mathematical Linguistics (PBML), 98. 2012.
http://ufal.mff.cuni.cz/pbml/98/art-pino-waite-byrne.pdf

Non-functional FST disambiguation

[Iglesias2015] Transducer Disambiguation with Sparse Topological Features.
G. Iglesias, A. de Gispert, W. Byrne. Proceedings of EMNLP, 2015.
https://aclweb.org/anthology/D/D15/D15-1273.pdf

Rescoring with Bilingual Neural Network Models

[Devlin2014] Fast and Robust Neural Network Joint Models for Statistical Machine Translation.
J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz, and J. Makhoul. Proceedings of ACL. 2014.
http://acl2014.org/acl2014/P14-1/pdf/P14-1129.pdf

Neural Machine Translation

[Stahlberg2016] Syntactically guided neural machine translation.
F. Stahlberg, E. Hasler, A. Waite, and B. Byrne. Proceedings of ACL. 2016.

Language Modelling Toolkits and Other Tools

[SRILM] SRI Language Model Toolkit
http://www.speech.sri.com/projects/srilm/

[KenLM] The KenLM Toolkit
http://kheafield.com/code/kenlm/

[NPLM] Neural Probabilistic Language Model Toolkit
http://nlg.isi.edu/software/nplm/