.. _setup-label:

Installation
===========================

We recommend installing SGNMT inside an Anaconda environment. This guide contains step-by-step instructions for setting up the SGNMT environment from scratch on an Ubuntu >=14.04 system. It covers installing Anaconda and dependencies such as TensorFlow, T2T, and OpenFST.

First, `download the Anaconda installer `_. We used Anaconda 2019.03 (Python 3.7) in this guide. Accept the license and choose the Anaconda installation directory. We refer to this directory as ````.

Change to a new directory where you wish to install SGNMT. We refer to this directory as ````. The latest SGNMT version is available on GitHub::

    $ git clone https://github.com/ucam-smt/sgnmt.git

Activate your base Anaconda environment::

    eval "$(/bin/conda shell.bash hook)"

Then create a new environment for SGNMT::

    conda create -n sgnmt_env pip python=3.6

.. note::

   In this guide we use Python 3.6 because TensorFlow 1.13 is not compatible with Ubuntu 14.04 under more recent Python versions (more on this `here `_). If you don't need compatibility with Ubuntu 14.04, or if you don't use TensorFlow, select a more recent Python version.

You can activate this new environment with::

    conda activate sgnmt_env

Install the dependencies in the activated ``sgnmt_env`` environment::

    conda install numpy pyyaml scipy

Test SGNMT::

    python /decode.py --help

We recommend using an activation script ```` to set up the SGNMT environment and the environment variables needed by some external libraries (see below). This is an example of an initial ```` script::

    # Activate Anaconda environment
    eval "$(/bin/conda shell.bash hook)"
    conda activate sgnmt_env

    # SGNMT
    export SGNMT=

End your current session and log in again with a clean shell. Then, test the script with::

    source
    python $SGNMT/decode.py --help

Installing optional dependencies
----------------------------------

A minimal version of SGNMT runs with the basic setup described above.
However, some output formats and predictors depend on external libraries:

* `Tensor2Tensor `_ for a wide range of sequence models in TensorFlow (>=1.7.0)
* `fairseq `_ for a wide range of sequence models in PyTorch (>=0.7.0)
* `KenLM `_ for reading ARPA language model files with the KenLM backend (latest)
* `OpenFST `_ for reading and writing FSTs, e.g. translation lattices (>=1.5.4)

To print the available external libraries, use::

    python $SGNMT/decode.py --run_diagnostics

Running the diagnostics after installing an optional dependency is a quick way to verify that the installation was successful.

Installing Tensor2Tensor
************************

`Tensor2Tensor `_ is a TensorFlow-based library with support for various neural sequence models. SGNMT can access models trained with Tensor2Tensor via the *t2t* predictor. Follow the `tensor2tensor installation instructions `_ to install T2T and TensorFlow. Make sure that you have activated your Anaconda environment::

    # Assumes tensorflow or tensorflow-gpu is already installed
    pip install tensor2tensor

    # Installs with the tensorflow-gpu requirement
    pip install "tensor2tensor[tensorflow_gpu]"

    # Installs with the tensorflow (CPU) requirement
    pip install "tensor2tensor[tensorflow]"

If you need both the GPU and the CPU versions, we recommend using two separate Anaconda environments. Note that SGNMT supports the ``--t2t_usr_dir`` argument to extend the T2T registry with your custom directory.

.. note::

   The Tensor2Tensor code base is still changing constantly, so SGNMT may not be compatible with the latest version.

*Tested versions: Tensor2Tensor 1.7.0-1.13.4, TensorFlow 1.9-1.13.1*

Installing fairseq
************************

Follow the `installation instructions for PyTorch `_ in your activated Anaconda environment. Then, `install fairseq `_, for example using ``pip``::

    pip install fairseq

For more information on how to use fairseq models in SGNMT, see :ref:`tutorial_pytorch-label`.
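
The minimum versions in the list of optional dependencies (for example >=1.7.0 for Tensor2Tensor and >=0.7.0 for fairseq) can also be checked from the shell. The following is a minimal sketch that is not part of SGNMT. The helper names ``version_ge``, ``installed_version``, and ``check_min_version`` are hypothetical. The sketch assumes GNU ``sort -V`` (available on Ubuntu) and assumes that ``pip`` belongs to the activated ``sgnmt_env`` environment::

    # Hypothetical helper: succeeds if version $1 >= version $2,
    # using GNU sort's version ordering (sort -V).
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
    }

    # Hypothetical helper: prints the installed version of a pip package,
    # or nothing if the package (or pip itself) is not available.
    installed_version() {
        pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'
    }

    # Hypothetical helper: reports whether package $1 is at least version $2.
    check_min_version() {
        local pkg=$1 min=$2 ver
        ver=$(installed_version "$pkg")
        if [ -z "$ver" ]; then
            echo "$pkg: not installed"
        elif version_ge "$ver" "$min"; then
            echo "$pkg: $ver (ok, >= $min)"
        else
            echo "$pkg: $ver (older than $min)"
        fi
    }

    # Example usage with the minimum versions listed above
    check_min_version tensor2tensor 1.7.0
    check_min_version fairseq 0.7.0

This only inspects package metadata. Running ``python $SGNMT/decode.py --run_diagnostics`` is still the way to confirm that SGNMT itself can load the library.
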

*Tested versions: PyTorch 1.1, fairseq 0.7.1*

Installing KenLM
************************

Follow the instructions on the `KenLM Github page `_ to install KenLM::

    pip install https://github.com/kpu/kenlm/archive/master.zip

*Tested versions: latest*

Installing OpenFST
**********************

SGNMT supports two ways of installing the Python interface to OpenFST. The easier option is to install the pre-compiled `openfst-python `_ package with pip::

    pip install openfst-python

Alternatively, OpenFST can be built from source without relying on a third-party package. However, at the time of writing, the `official OpenFST release 1.7.2 `_ does not support Python 3 out of the box. For future OpenFST versions with Python 3 support, compile with::

    $ ./configure --enable-far --enable-python
    $ make
    $ make install

and add the following lines to ````::

    export LD_LIBRARY_PATH=/path/to/openfst/lib:$LD_LIBRARY_PATH
    export PYTHONPATH=/path/to/openfst/lib/python/site-packages:$PYTHONPATH

If you wish to use SGNMT in combination with the hierarchical phrase-based SMT system `HiFST `_, you can directly use the OpenFST installation under *externals/* in the HiFST installation directory. This makes it possible to use SGNMT to create translation lattices with ``tropicalsparsetuple`` arcs, which keep predictor scores separate (see the *fst* output format).

*Tested versions: OpenFST 1.5.4-1.7.2*
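
The two ``export`` lines for OpenFST prepend the library directories unconditionally. If ``LD_LIBRARY_PATH`` is initially empty, this leaves a trailing empty entry, which the dynamic loader on Linux interprets as the current working directory. Sourcing the activation script repeatedly also adds duplicate entries. The following is a minimal sketch of a more defensive alternative for the activation script. The ``prepend_path`` function is a hypothetical helper, not part of SGNMT or OpenFST. It requires bash, and ``/path/to/openfst`` remains a placeholder for your installation prefix::

    # Hypothetical helper (bash only): prepend directory $2 to the
    # colon-separated variable named $1, skipping it if it is already
    # listed and avoiding an empty entry when the variable is unset or empty.
    prepend_path() {
        local var=$1 dir=$2
        local current=${!var:-}
        case ":$current:" in
            *":$dir:"*) ;;  # already present: leave the variable unchanged
            *)
                if [ -z "$current" ]; then
                    export "$var=$dir"
                else
                    export "$var=$dir:$current"
                fi
                ;;
        esac
    }

    # Replaces the two export lines; adjust /path/to/openfst to your setup.
    prepend_path LD_LIBRARY_PATH /path/to/openfst/lib
    prepend_path PYTHONPATH /path/to/openfst/lib/python/site-packages

Because the function checks for an existing entry, both variables stay unchanged after the first time the activation script is sourced in the same shell.
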