Introduction
This open-source package contains the Cambridge SMT system, a set of tools for statistical machine translation built on OpenFst. You can download it here, or clone it directly with git:
git clone https://github.com/ucam-smt/ucam-smt.git
It includes the following features:
- HiFST -- Hierarchical phrase-based statistical machine translation system based on OpenFst
- Direct production of translation lattices as Weighted Finite State Automata
- Efficient WFSA rescoring procedures
- OpenFst wrappers for direct inclusion of KenLM and ARPA language models as WFSAs
- Lattice Minimum Bayes Risk decoding
- Lattice Minimum Error Rate training
- Client/Server mode
- WFSA true-casing
Tutorial
For this release, we have prepared an extensive tutorial that explains how to use these tools. It is available at: http://ucam-smt.github.io/tutorial
The tutorial is a guide to using the tools; the best descriptions of the underlying algorithms and modelling techniques are in our research publications. A complete list of our publications can be found at http://divf.eng.cam.ac.uk/smt/Main/SmtPapers
Authors and Contributors
This package grew out of the Ph.D. thesis work of Gonzalo Iglesias, in which he developed HiFST, a hierarchical phrase-based statistical machine translation system based on OpenFst.
Contributors to this release and the tutorial are:
- Graeme Blackwood
- Bill Byrne
- Adria de Gispert
- Federico Flego
- Gonzalo Iglesias
- Juan Pino
- Rory Waite
- Tong Xiao
With thanks to Cyril Allauzen and Michael Riley (OpenFst).
Support or Contact
Questions? Problems? Please leave a message at https://groups.google.com/forum/#!forum/ucam-smt