Cambridge SMT System
ucam::hifst Namespace Reference

Classes

class  AlilatsToSparseWeightLatsData
 Data class containing relevant variables. To be used as template for task classes using it. More...
 
class  CompareTool
 Class that provides basic string comparison between two const char *. More...
 
struct  conditions
 Struct containing the elements that trigger local pruning. More...
 
class  CYKbackpointers
 Functor that provides cyk backpointers. More...
 
struct  CYKdata
 Data structure containing all cyk-related information. More...
 
class  CYKgrid
 Functor that provides cykgrid access methods. More...
 
class  CYKParserTask
 Implements cyk+ parser. More...
 
class  DumpNbestFeaturesTask
 Task that dumps nbest and feature file. Templated on specific Data object and Fst Arc. More...
 
class  ExpandedNumStatesRTN
 Utility class that, given an RTN with root at (cc,x,y), estimates the number of states of an expanded FSA. The RTN is not explicitly passed to this class. Instead, it is updated sequentially as individual FSAs are created. More...
 
struct  GenerateTrivialFst
 
struct  GrammarData
 Struct containing grammar rules. More...
 
class  GrammarTask
 Task class that loads a grammar into memory. More...
 
class  HifstClientTaskData
 Data class containing relevant variables. To be used as template for task classes using it. More...
 
class  HifstServerTask
 Translation Server. More...
 
class  HifstStatsTask
 Reads StatsData and dumps all stats to (sentence-specific) file. Provides a special method for cyk data: dumps a grid in text format with relevant information per cell. More...
 
class  HiFSTTask
 Core of Hifst. Implements the lattice-building procedure for a cyk-parsed sentence. More...
 
class  HifstTaskData
 Data class containing relevant variables. To be used as template for task classes using it. More...
 
class  LoadSparseWeightFlowerLatticeTask
 Implements a class that loads the grammar sparseweight flower lattice and stores a pointer on the data object. More...
 
class  LoadSparseWeightsTask
 
class  LocalPruningConditions
 Convenience class that takes care of local pruning conditions. Conditions are indexed by 1000*cc+y, so you can search through all conditions and get to the closest set of conditions that apply. More...
 
struct  MakeWeightHifst
 
struct  MakeWeightHifst< fst::LexStdArc >
 
struct  MakeWeightHifst< TupleArc32 >
 
struct  MakeWeightHifstLocalLm
 
struct  MakeWeightHifstLocalLm< fst::LexStdArc >
 
struct  MakeWeightHifstLocalLm< TupleArc32 >
 
class  ManualReplaceFstByArc
 Creates FST replacement or not depending on conditions including program options. More...
 
class  MultiThreadedAliLatsToSparseVecLatsTask
 Multithreaded implementation of alilats2splats pipeline. More...
 
class  MultiThreadedCreateSentenceSpecificGrammarTask
 Full multi-threaded Translation system. More...
 
class  MultiThreadedHifstTask
 Full multi-threaded Translation system. More...
 
class  NonTerminalHierarchy
 This is a functor with additional methods to include relevant rules (i.e. identify SCFG rules, S -> X X, X -> V V ) and determine the hierarchy of non-terminals within the grammar. More...
 
class  OptimizeMachine
 
class  OptimizeMachineNoDetMin
 
class  PatternCompareTool
 Class that provides "pattern" comparison between two const char *. The "patterns" are an abstraction of any non-terminal A-Z. So for instance, consider non_terminals A and Z. A rule with source 3_A_5 and another one with source 3_Z_5 are equivalent and need to be listed together (with any other equivalent sources). This class can be used with PosIndexCompare as it inherits from CompareTool. More...
 
class  PatternsToInstancesTask
 Converts patterns to instanced patterns. More...
 
struct  posindex
 Struct containing rule positions and offsets. More...
 
class  PosIndexCompare
 Functor class that provides comparison across the posindex structure. This is typically used e.g. with a priority queue. It can use CompareTool or any inherited class. More...
 
class  PostProTask
 Task that writes translation to a text file. This translation might be recased, wordmapped and tokenized. More...
 
class  PreProTask
 Reads text file, performs tokenization and integer-mapping. More...
 
class  ReferenceFilterTask
 Generates a substring version of a reference translation lattice and associated vocabulary. This substring fst is typically used to guide translation towards a particular search space. The associated vocabulary can be used e.g. to restrict parsing algorithms. More...
 
class  ReplaceFstByArc
 Creates FST replacement or not depending on conditions. More...
 
class  RTN
 Convenience class that stores pointers to cell FSAs. These pointers are organized through a hash using a label built from (cc,x,y) as key. More...
 
class  RuleIdsToSparseWeightLatsData
 Data class containing relevant variables. To be used as template for task classes using it. More...
 
struct  RulesToWeightsMapperObject
 
struct  SentenceSpecificGrammarData
 Structure for sentence-specific grammar. Rules will be queried by cyk per position and number of elements in the right-hand side (source) of the rule. Therefore indices are stored in this fashion so queries can be done directly. Note: a more efficient implementation could be to store these rule indices in a structure much closer even to the cyk grid. For instance, if an instanced pattern has never been seen below span 5, there is no need for the cyk to query and reject it. More...
 
class  SentenceSpecificGrammarTask
 This class uses instantiated patterns to analyze the grammar and deliver two hashes providing candidate rules for a (cyk) parser to validate them. The first hash is for rules with only one element (e.g. A->word, A->B). The second hash is for rules with two or more elements. Both hashes require two keys: the word position in the sentence (x) and the first element of the rule (which can be either a word or non-terminal). More...
 
class  SingleThreadedAliLatsToSparseVecLatsTask
 Full single-threaded Alignment lattices to Sparse lattices. More...
 
class  SingleThreadedCreateSentenceSpecificGrammarTask
 Full single-threaded Translation system. More...
 
class  SingleThreadededRulesToWeightsSparseLatsTask
 Full single-threaded Alignment lattices to Sparse lattices. More...
 
class  SingleThreadedHifstClientTask
 Full single-threaded Translation system. More...
 
class  SingleThreadedHifstTask
 Full single-threaded Translation system. More...
 
class  SparseWeightVectorLatticesTask
 Creates lattices using tropical tuple weight semiring – each arc containing separate feature weight contributions. Note that the semiring is tropical under the dot product of all these features with their scales. More...
 

Typedefs

typedef std::basic_string< uint > cykparser_sentence_t
 
typedef std::basic_string< uint > cykparser_rulebpcoordinates_t
 
typedef std::vector< cykparser_rulebpcoordinates_t > cykparser_ruledependencies_t
 
typedef unordered_map< std::string, uint > grammar_categories_t
 
typedef unordered_map< uint, std::string > grammar_inversecategories_t
 
typedef std::basic_string< uint > ssgrammar_listofrules_t
 
typedef unordered_map< std::string, ssgrammar_listofrules_t > ssgrammar_firstelementmap_t
 
typedef unordered_map< uint, ssgrammar_firstelementmap_t > ssgrammar_rulesmap_t
 
typedef unordered_map< std::string, std::vector< pair< uint, uint > > > ssgrammar_instancemap_t
 
typedef boost::shared_ptr< tcp::socket > socket_ptr
 

Functions

const uint getSize (const std::string &rhs)
 A generic element counter that can be applied to any string. It is intended for use with either the source or the target side of a rule (the right-hand side parts of the synchronous rule). More...
 
bool isTerminal (const std::string &word)
 Determine if the element is a terminal (i.e. a word, represented by a number) or a non-terminal (i.e. matching ^[A-Z]+[0-9]?). Only the first position is checked. More...
 
void getFilteredNonTerminal (std::string &word)
 Return the filtered non-terminal name. For example, for the rule Z 3_XT2_5 XT2, getFilteredNonTerminal("XT2") should return XT. More...
 
void getRuleMappings (const std::vector< std::string > &source, const std::vector< std::string > &translation, unordered_map< uint, uint > *mappings)
 Given a source and translation of the same rule, sharing the same non-terminals in RHS, returns correspondences between source and target non-terminal indices. For example: X a_Y_Z_b c_Z_Y_d: mappings[0]=1; mappings[1]=0. More...
 
bool phraseIsTerminalWord (const std::string &phrase)
 
void setScales (const ucam::util::RegistryPO &rg, unsigned *offset, const std::string &lmscales=HifstConstants::kLmFeatureweights, const std::string &grammarscales=HifstConstants::kRuleflowerlatticeFeatureweights, const std::string &featureweights=HifstConstants::kFeatureweights, const std::string &lmload=HifstConstants::kLmLoad)
 Sets scales using environment parameter (see sparse tuple weight semiring file), or grammar scales and language model scales. If these are active, the environment parameter will not be used. More...
 
int getLocalLmIndex (ucam::util::RegistryPO const *rg=NULL)
 

Variables

const int max_length = 1024
 

Typedef Documentation

typedef std::basic_string<uint> ucam::hifst::cykparser_rulebpcoordinates_t

Definition at line 28 of file defs.cykparser.hpp.

typedef std::basic_string<uint> ucam::hifst::cykparser_sentence_t

Definition at line 27 of file defs.cykparser.hpp.

typedef unordered_map<std::string, uint> ucam::hifst::grammar_categories_t

Definition at line 27 of file defs.grammar.hpp.

typedef unordered_map< uint, std::string > ucam::hifst::grammar_inversecategories_t

Definition at line 28 of file defs.grammar.hpp.

typedef boost::shared_ptr< tcp::socket > ucam::hifst::socket_ptr

Definition at line 30 of file main-run.createssgrammar.hpp.

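typedef unordered_map<std::string, ssgrammar_listofrules_t> ucam::hifst::ssgrammar_firstelementmap_t

Definition at line 31 of file defs.ssgrammar.hpp.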

typedef unordered_map<std::string, std::vector< pair <uint, uint> > > ucam::hifst::ssgrammar_instancemap_t

Definition at line 34 of file defs.ssgrammar.hpp.

typedef std::basic_string<uint> ucam::hifst::ssgrammar_listofrules_t

Definition at line 29 of file defs.ssgrammar.hpp.

typedef unordered_map<uint, ssgrammar_firstelementmap_t> ucam::hifst::ssgrammar_rulesmap_t

Definition at line 32 of file defs.ssgrammar.hpp.

Function Documentation

void ucam::hifst::getFilteredNonTerminal ( std::string &  word)
inline

Return the filtered non-terminal name. For example, for the rule Z 3_XT2_5 XT2, getFilteredNonTerminal("XT2") should return XT.

Parameters
word  The non-terminal we want to filter.

Definition at line 56 of file data.grammar.utilities.hpp.
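
The filtering described above can be sketched as follows. This is a minimal illustration, not the code in data.grammar.utilities.hpp: the name filterNonTerminalSketch is hypothetical, and it assumes that the index is a single trailing digit, following the pattern given in the isTerminal documentation. Because the documented signature takes a non-const reference and returns void, the sketch modifies its argument in place.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for ucam::hifst::getFilteredNonTerminal.
// Assumption: the non-terminal index is one optional trailing digit,
// so "XT2" becomes "XT" and "X" is left unchanged.
inline void filterNonTerminalSketch(std::string &word) {
  if (!word.empty() &&
      std::isdigit(static_cast<unsigned char>(word.back()))) {
    word.pop_back();  // drop the index, keep the category name
  }
}
```

A caller would pass a named std::string (a string literal cannot bind to the non-const reference) and read the filtered name back from the same variable.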

int ucam::hifst::getLocalLmIndex ( ucam::util::RegistryPO const *  rg = NULL)
inline

Definition at line 144 of file task.hifst.makeweights.hpp.

void ucam::hifst::getRuleMappings ( const std::vector< std::string > &  source,
const std::vector< std::string > &  translation,
unordered_map< uint, uint > *  mappings 
)
inline

Given a source and translation of the same rule, sharing the same non-terminals in RHS, returns correspondences between source and target non-terminal indices. For example: X a_Y_Z_b c_Z_Y_d: mappings[0]=1; mappings[1]=0.

Parameters
source  RHS source of a rule
translation  RHS target of a rule
mappings  On return, contains correspondences between source and target non-terminal indices

Definition at line 73 of file data.grammar.utilities.hpp.
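
The documented example (X a_Y_Z_b c_Z_Y_d giving mappings[0]=1 and mappings[1]=0) can be reproduced with the sketch below. The names ruleMappingsSketch and startsWithCapital are hypothetical, and the sketch assumes that both sides are already split into elements, that an element is a non-terminal when it starts with A-Z, and that each non-terminal occurs once per side. It may differ from the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: a non-terminal starts with a capital letter A-Z.
inline bool startsWithCapital(const std::string &s) {
  return !s.empty() && s[0] >= 'A' && s[0] <= 'Z';
}

// Hypothetical sketch of the mapping between source and target
// non-terminal indices (indices count non-terminals only).
inline void ruleMappingsSketch(
    const std::vector<std::string> &source,
    const std::vector<std::string> &translation,
    std::unordered_map<unsigned, unsigned> *mappings) {
  // Collect the target non-terminals in order of appearance.
  std::vector<std::string> targetNts;
  for (const auto &t : translation) {
    if (startsWithCapital(t)) targetNts.push_back(t);
  }
  unsigned srcIdx = 0;
  for (const auto &s : source) {
    if (!startsWithCapital(s)) continue;  // skip terminals
    for (std::size_t j = 0; j < targetNts.size(); ++j) {
      if (targetNts[j] == s) {
        (*mappings)[srcIdx] = static_cast<unsigned>(j);
        break;
      }
    }
    ++srcIdx;
  }
}
```

With source {a, Y, Z, b} and translation {c, Z, Y, d}, Y is source non-terminal 0 and target non-terminal 1, while Z is source non-terminal 1 and target non-terminal 0.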

const uint ucam::hifst::getSize ( const std::string &  rhs)
inline

A generic element counter that can be applied to any string. It is intended for use with either the source or the target side of a rule (the right-hand side parts of the synchronous rule).

Parameters
rhs  source or target.

Definition at line 33 of file data.grammar.utilities.hpp.
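
A possible reading of the element counter is sketched below. The name elementCountSketch is hypothetical, and the sketch assumes that elements are separated by underscores, as in the rule sides quoted elsewhere on this page (for example 3_XT2_5). How the real function treats an empty string is not stated here, so returning 0 is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for ucam::hifst::getSize.
// Assumption: elements are separated by '_', so the element count is
// the number of separators plus one (and 0 for an empty string).
inline unsigned elementCountSketch(const std::string &rhs) {
  if (rhs.empty()) return 0;
  return static_cast<unsigned>(std::count(rhs.begin(), rhs.end(), '_')) + 1;
}
```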

bool ucam::hifst::isTerminal ( const std::string &  word)
inline

Determine if the element is a terminal (i.e. a word, represented by a number) or a non-terminal (i.e. matching ^[A-Z]+[0-9]?). Only the first position is checked.

Parameters
word  the element we are checking.

Definition at line 44 of file data.grammar.utilities.hpp.
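
Since only the first position is checked, the test can be sketched in a few lines. The name isTerminalSketch is hypothetical, and treating an empty string as a terminal is an assumption that the documentation does not confirm.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ucam::hifst::isTerminal.
// Only the first character is inspected: A-Z means non-terminal,
// anything else (e.g. a digit of an integer-mapped word) means terminal.
// Assumption: an empty string is reported as a terminal.
inline bool isTerminalSketch(const std::string &word) {
  return word.empty() || word[0] < 'A' || word[0] > 'Z';
}
```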

bool ucam::hifst::phraseIsTerminalWord ( const std::string &  phrase)
inline

Convenience function to detect whether a phrase is actually a single word. IMPORTANT: By convention, a non-terminal starts with a capital letter followed by any number of letters or digits. If the last character is a digit, it is treated as an index and is not part of the non-terminal name, e.g. X, X1 and X2 are all X.

Definition at line 234 of file data.ssgrammar.hpp.
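
Under the convention above, the check might look like the following sketch. The name singleTerminalWordSketch is hypothetical, and the sketch assumes that a phrase with several elements contains an underscore separator and that a leading capital letter marks a non-terminal. The implementation in data.ssgrammar.hpp may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ucam::hifst::phraseIsTerminalWord.
// Assumptions: elements of a phrase are separated by '_', and a
// non-terminal starts with a capital letter A-Z.
inline bool singleTerminalWordSketch(const std::string &phrase) {
  if (phrase.empty()) return false;                       // nothing to test
  if (phrase.find('_') != std::string::npos) return false;  // several elements
  return phrase[0] < 'A' || phrase[0] > 'Z';              // not a non-terminal
}
```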

void ucam::hifst::setScales ( const ucam::util::RegistryPO &  rg,
unsigned *  offset,
const std::string &  lmscales = HifstConstants::kLmFeatureweights,
const std::string &  grammarscales = HifstConstants::kRuleflowerlatticeFeatureweights,
const std::string &  featureweights = HifstConstants::kFeatureweights,
const std::string &  lmload = HifstConstants::kLmLoad 
)

Sets scales using environment parameter (see sparse tuple weight semiring file), or grammar scales and language model scales. If these are active, the environment parameter will not be used.

Parameters
offset  The number of language models will be stored here, as an offset for the grammar scales in the sparse tropical tuple weight semiring
lmscales  Key to access registry object with commandline parameter for language model scales
grammarscales  Key to access registry object with commandline parameter for grammar scales

Dump scales

Definition at line 37 of file main-run.alilats2splats.hpp.

Variable Documentation

const int ucam::hifst::max_length = 1024

Definition at line 29 of file main-run.createssgrammar.hpp.