Cambridge SMT System
ucam::util::WordMapper Class Reference

Efficiently loads a wordmap file and provides methods to map word-to-integer or integer-to-word. To limit the memory footprint, the wordmap entries are not hashed.

#include <wordmapper.hpp>

Public Member Functions

 WordMapper (const std::string &wordmapfile, bool reverse=false)
 Constructor.
 
 WordMapper (iszfstream &wordmapstream, bool reverse=false)
 
void operator() (const std::string &is, std::string *os, bool reverse=false)
 Perform search. Both directions allowed (int to string or string to int).
 
unsigned operator() (const std::string &is)
 Quick hack to get what is needed for lm.
 
 ~WordMapper ()
 Destructor.
 
unordered_map< std::size_t, std::string > & get_oovwmap ()
 Return oovwmap.
 
void set_oovwmap (unordered_map< std::size_t, std::string > &oovmap)
 
void reset_oov_id ()
 Resets oovid to lowest value.
 
std::size_t get_oov_id ()
 

Detailed Description

Efficiently loads a wordmap file and provides methods to map word-to-integer or integer-to-word. To limit the memory footprint, the wordmap entries are not hashed.

Remarks
For both directions, the file always takes the following format, one word-id pair per line:

  what 59
  report 60
  council 61

Also assumes:

  • Bijective relationship (word <-> integer id)
  • Sorted by id with no id missing (if 61 exists, 59 and 60 must exist in the file and appear in previous lines...)
  • First index is 0.
Remarks
OOV ids are also generated if the word does not exist in the file.

Definition at line 63 of file wordmapper.hpp.
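
The format assumptions listed above (ids start at 0, are sorted, and have no gaps) suggest why a hash table is unnecessary for integer-to-word lookups: the id can serve directly as an array index. The following is a minimal sketch of that idea, not the class's actual implementation; `load_wordmap` and `id_to_word` are hypothetical helper names introduced only for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of WordMapper: reads "word id" pairs and
// stores each word at the vector position equal to its id.
inline std::vector<std::string> load_wordmap(std::istream &in) {
  std::vector<std::string> words;
  std::string word;
  std::size_t id;
  while (in >> word >> id) {
    // Under the stated assumptions, the next id must equal the number of
    // entries read so far (0, 1, 2, ...).
    if (id != words.size())
      throw std::runtime_error("wordmap ids must start at 0 with no gaps");
    words.push_back(word);
  }
  return words;
}

// Integer-to-word lookup is a bounds-checked array access; no hashing.
inline const std::string &id_to_word(const std::vector<std::string> &words,
                                     std::size_t id) {
  return words.at(id);
}
```

With a stream containing made-up entries such as `a 0`, `b 1`, `c 2`, calling `id_to_word(words, 2)` returns `c`. A file that starts at any id other than 0 is rejected by this sketch.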

Constructor & Destructor Documentation

ucam::util::WordMapper::WordMapper ( const std::string &  wordmapfile,
bool  reverse = false 
)
inline

Constructor.

Remarks
Loads wordmap file. If mapping is to be performed from string-to-int, there is an additional id sorting step.
Parameters
wordmapfile  Wordmap file to load.
reverse      Perform string-to-integer (false) or integer-to-string (true).

Definition at line 91 of file wordmapper.hpp.
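
The additional sorting step mentioned in the remarks can be pictured as follows. Because the file is ordered by id rather than by word, a string-to-integer lookup without hashing needs the entries ordered by word so that a binary search can be used. This is only a sketch under that assumption; `sort_ids_by_word` and `word_to_id` are hypothetical names, and the real class may organise its data differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical sketch: returns the ids 0..n-1 reordered so that the words
// they refer to are in ascending lexicographic order.
inline std::vector<std::size_t> sort_ids_by_word(
    const std::vector<std::string> &words) {
  std::vector<std::size_t> order(words.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&words](std::size_t a, std::size_t b) {
              return words[a] < words[b];
            });
  return order;
}

// Binary search over the word-sorted ids. Returns true and writes the id
// on success; returns false if the word is not in the table.
inline bool word_to_id(const std::vector<std::string> &words,
                       const std::vector<std::size_t> &order,
                       const std::string &word, std::size_t *id) {
  auto it = std::lower_bound(
      order.begin(), order.end(), word,
      [&words](std::size_t i, const std::string &w) { return words[i] < w; });
  if (it == order.end() || words[*it] != word) return false;
  *id = *it;
  return true;
}
```

The sort costs O(n log n) once at load time, and each lookup is then O(log n) with no per-entry hash overhead.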

ucam::util::WordMapper::WordMapper ( iszfstream &  wordmapstream,
bool  reverse = false 
)
inline

Definition at line 104 of file wordmapper.hpp.

ucam::util::WordMapper::~WordMapper ( )
inline

Destructor.

Definition at line 138 of file wordmapper.hpp.

Member Function Documentation

std::size_t ucam::util::WordMapper::get_oov_id ( )
inline

Definition at line 159 of file wordmapper.hpp.

unordered_map<std::size_t, std::string>& ucam::util::WordMapper::get_oovwmap ( )
inline

Return oovwmap.

Definition at line 145 of file wordmapper.hpp.

void ucam::util::WordMapper::operator() ( const std::string &  is,
std::string *  os,
bool  reverse = false 
)
inline

Perform search. Both directions allowed (int to string or string to int).

Parameters
is       input string
os       output string
reverse  if true, triggers reverse search (string to int).

Definition at line 118 of file wordmapper.hpp.

unsigned ucam::util::WordMapper::operator() ( const std::string &  is)
inline

Quick hack to get what is needed for lm.

Remarks
Note: assumes the string contains only one number. Searches and returns an unsigned value; if not found, returns the maximum unsigned value.

Definition at line 131 of file wordmapper.hpp.
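
The "maximum unsigned value when not found" convention described in the remarks can be illustrated with a small stand-alone sketch. The `SortedWordmap` type and the `lookup_or_max` function are hypothetical and exist only for this example; they are not the class's internals.

```cpp
#include <algorithm>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lookup table: (word, id) pairs kept sorted
// by word so that binary search applies.
using SortedWordmap = std::vector<std::pair<std::string, unsigned>>;

// Returns the id of `word`, or the largest unsigned value as a
// "not found" sentinel.
inline unsigned lookup_or_max(const SortedWordmap &table,
                              const std::string &word) {
  auto it = std::lower_bound(
      table.begin(), table.end(), word,
      [](const std::pair<std::string, unsigned> &entry,
         const std::string &w) { return entry.first < w; });
  if (it == table.end() || it->first != word)
    return std::numeric_limits<unsigned>::max();
  return it->second;
}
```

A caller using this convention should compare the result against `std::numeric_limits<unsigned>::max()` before treating it as a valid id.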

void ucam::util::WordMapper::reset_oov_id ( )
inline

Resets oovid to lowest value.

Definition at line 154 of file wordmapper.hpp.
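
One way to picture the OOV bookkeeping behind `get_oovwmap`, `get_oov_id`, and `reset_oov_id` is a counter paired with an id-to-word map. The sketch below is an assumption-laden illustration, not the class's code: the base value `kOovBaseId`, the upward counting direction, and the choice to keep existing map entries on reset are all made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Assumed lowest OOV id; the real value is not given in this reference.
constexpr std::size_t kOovBaseId = 1000000;

// Hypothetical sketch of OOV bookkeeping.
struct OovTable {
  std::size_t next_id = kOovBaseId;
  std::unordered_map<std::size_t, std::string> id_to_word;

  // Assigns a fresh id to a word that was not found in the wordmap.
  std::size_t add(const std::string &word) {
    id_to_word[next_id] = word;
    return next_id++;
  }

  // Moves the counter back to the lowest OOV id. In this sketch the map
  // itself is left untouched.
  void reset() { next_id = kOovBaseId; }
};
```

In this sketch, ids handed out after a reset reuse the same range, so a caller that resets the counter would normally also replace or clear the map to avoid overwriting earlier entries.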

void ucam::util::WordMapper::set_oovwmap ( unordered_map< std::size_t, std::string > &  oovmap)
inline

Definition at line 150 of file wordmapper.hpp.


The documentation for this class was generated from the following file: