We recommend installing SGNMT inside an Anaconda environment. This guide contains step-by-step instructions to set up the SGNMT environment from scratch on an Ubuntu >=14.04 system, including installing Anaconda and various dependencies like TensorFlow, T2T, OpenFST, etc.
First, download the Anaconda installer. We used Anaconda 2019.03 (Python 3.7) in this guide. Accept the license and choose the Anaconda installation directory. We refer to this directory as <ANACONDA>.
Change to a new directory where you wish to install SGNMT. We refer to this directory as <SGNMT>. The latest SGNMT version is available on GitHub:
$ git clone https://github.com/ucam-smt/sgnmt.git
Activate your base Anaconda environment:
eval "$(<ANACONDA>/bin/conda shell.bash hook)"
And create a new environment for SGNMT:
conda create -n sgnmt_env pip python=3.6
In this guide we use Python 3.6 because TensorFlow 1.13 is not compatible with Ubuntu 14.04 under more recent Python versions. If you don’t need compatibility with Ubuntu 14.04 or you don’t use TensorFlow, select a more recent Python version.
You can activate this new environment with:
conda activate sgnmt_env
Install dependencies under the activated sgnmt_env environment:
conda install numpy pyyaml scipy
Then check that SGNMT starts:
python <SGNMT>/decode.py --help
We recommend using an activation script <ACTIVATE.SH> to set up the SGNMT environment and some of the environment variables for external libraries (see below). This is an example of an initial <ACTIVATE.SH> script:
# Activate Anaconda environment
eval "$(<ANACONDA>/bin/conda shell.bash hook)"
conda activate sgnmt_env
# SGNMT
export SGNMT=<SGNMT>
End your current session and log in again with a clean shell. Then, test it with:
source <ACTIVATE.SH>
python $SGNMT/decode.py --help
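Because every later command relies on the SGNMT variable, the activation script could also verify that it points at a checkout. The following is a minimal sketch under that assumption; `check_sgnmt_dir` is a hypothetical helper name and is not part of SGNMT.

```shell
# Hypothetical helper for <ACTIVATE.SH>; not part of SGNMT itself.
# Succeeds only if the given directory contains decode.py.
check_sgnmt_dir() {
    if [ -f "$1/decode.py" ]; then
        return 0
    fi
    echo "decode.py not found in '$1'; check the SGNMT variable" >&2
    return 1
}

# Possible use after 'export SGNMT=...' in the activation script:
# check_sgnmt_dir "$SGNMT" || return 1
```

Since the script is sourced rather than executed, a failed check would use `return` instead of `exit` so that the login shell stays open.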
Installing optional dependencies
A minimal version of SGNMT runs with the basic setup described above. However, some output formats and predictors depend on external libraries:
- Tensor2Tensor for a wide range of different sequence models in TensorFlow (>=1.7.0)
- fairseq for a wide range of different sequence models in PyTorch (>=0.7.0)
- KenLM for reading ARPA language model files with KenLM backend (latest)
- OpenFST for reading and writing FSTs (e.g. translation lattices) (>=1.5.4)
To print out available external libraries, use:
python $SGNMT/decode.py --run_diagnostics
Diagnostics are useful after installing optional dependencies to verify that the installation was successful.
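As a complement to the built-in diagnostics, a quick check from the shell can show which optional Python modules the active environment can locate. This is only a sketch: `has_module` and `report_optional_modules` are hypothetical helpers, and the module names (in particular `openfst_python` for the pip package and `pywrapfst` for a source build) are assumptions about what each package installs.

```shell
# Hypothetical helpers, independent of SGNMT's own diagnostics.
# has_module returns 0 if the interpreter can locate the given module.
has_module() {
    "${PYTHON:-python}" -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1" 2>/dev/null
}

# Module names below are assumptions about what each package installs.
report_optional_modules() {
    for module in tensor2tensor fairseq kenlm openfst_python pywrapfst; do
        if has_module "$module"; then
            echo "$module: found"
        else
            echo "$module: missing"
        fi
    done
}
```

Calling `report_optional_modules` after `conda activate sgnmt_env` prints one line per module. The check only locates modules and does not import them, so it cannot detect a broken installation.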
Tensor2Tensor is a TensorFlow-based library with support for various neural sequence models. SGNMT can access models trained with tensor2tensor via the t2t predictor. Follow the tensor2tensor installation instructions to install t2t and TensorFlow. Make sure that you have activated your Anaconda environment:
# Assumes tensorflow or tensorflow-gpu installed
pip install tensor2tensor
# Installs with tensorflow-gpu requirement
pip install tensor2tensor[tensorflow_gpu]
# Installs with tensorflow (cpu) requirement
pip install tensor2tensor[tensorflow]
If you need both the GPU and the CPU versions we recommend using two separate Anaconda environments.
Note that SGNMT supports the --t2t_usr_dir argument to extend the T2T registry with your custom directory.
The Tensor2Tensor code base still changes frequently, so SGNMT may not be compatible with the latest version.
Tested versions: Tensor2Tensor 1.7.0-1.13.4, TensorFlow 1.9-1.13.1
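To see whether an installed version falls inside a tested range, a small version comparison can help. The sketch below assumes GNU `sort -V` is available; `version_in_range` is a hypothetical helper, and reading the version from `pip show` output is likewise an assumption about your setup.

```shell
# Hypothetical helper: succeeds if version $1 lies in the closed range [$2, $3].
# Relies on 'sort -V' (version sort) from GNU coreutils.
version_in_range() {
    lowest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
    highest=$(printf '%s\n' "$1" "$3" | sort -V | tail -n 1)
    [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
}

# Possible use; parsing 'pip show' output this way is an assumption:
# t2t_version=$(pip show tensor2tensor | sed -n 's/^Version: //p')
# version_in_range "$t2t_version" 1.7.0 1.13.4 || echo "untested T2T version" >&2
```

A version outside the range is not necessarily broken; it only means the combination is not listed as tested.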
fairseq can be installed with pip in the activated Anaconda environment:
pip install fairseq
For more information on how to use fairseq models in SGNMT, see Tutorial: fairseq (PyTorch).
Tested versions: PyTorch 1.1, fairseq 0.7.1
Follow the instructions on the KenLM GitHub page to install KenLM:
pip install https://github.com/kpu/kenlm/archive/master.zip
Tested versions: latest
SGNMT supports two ways to install the Python interface to OpenFST. The easier option is to install the pre-compiled openfst-python package with pip:
pip install openfst-python
Alternatively, OpenFST can be built from source without relying on a third-party package. The issue at the time of writing is that the official OpenFST release 1.7.2 does not support Python 3 out of the box. For future OpenFST versions with Python 3 support, compile with:
$ ./configure --enable-far --enable-python
$ make
$ make install
and add the following lines to <ACTIVATE.SH>:
export LD_LIBRARY_PATH=/path/to/openfst/lib:$LD_LIBRARY_PATH
export PYTHONPATH=/path/to/openfst/lib/python<VERSION>/site-packages:$PYTHONPATH
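If the activation script is sourced more than once, the export lines above add the same directory repeatedly, and they leave a trailing empty entry when the variable was previously unset (for LD_LIBRARY_PATH, the dynamic loader typically treats an empty entry as the current directory). A minimal sketch of a guard follows; `prepend_path` is a hypothetical helper, not something SGNMT or OpenFST provides.

```shell
# Hypothetical helper for <ACTIVATE.SH>: prepend directory $2 to the
# colon-separated variable named $1, skipping duplicates and empty entries.
# Uses eval for indirect access so it also works in plain POSIX sh.
prepend_path() {
    eval "_current=\${$1:-}"
    case ":$_current:" in
        *":$2:"*) return 0 ;;   # already listed, leave the variable unchanged
    esac
    eval "export $1=\"\$2\${_current:+:\$_current}\""
}

# Possible use, with the same placeholder path as above:
# prepend_path LD_LIBRARY_PATH /path/to/openfst/lib
```

The helper expects a plain variable name as its first argument, since that name is passed to `eval`.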
If you wish to use SGNMT in combination with the hierarchical phrase-based SMT system HiFST, you can directly use the OpenFST installation under externals/ in the HiFST installation directory. This makes it possible to create translation lattices with tropicalsparsetuple arcs in SGNMT, which keeps the predictor scores separated (see fst output format).
Tested versions: OpenFST 1.5.4-1.7.2