Cambridge SMT System
Lowercasing, tokenization and detokenization are not available in the open-source release.
Namespaces
ucam
ucam::util
Functions
void ucam::util::tokenize (const std::string &is, std::string *os, const std::string languagespecific="")
    Not implemented; passes the input through unchanged.
void ucam::util::detokenize (const std::string &is, std::string *os, std::string languagespecific="")
    Not implemented; passes the input through unchanged.
void ucam::util::addSentenceMarkers (std::string &sentence)
    Adds the sentence markers <s> and </s> to a sentence.
void ucam::util::deleteSentenceMarkers (std::string &sentence)
    Deletes the sentence markers 1/2 or <s>/</s> from a sentence.
void ucam::util::capitalizeFirstWord (std::vector< std::string > &words)
    Simple function that capitalizes the first word of a sentence.
void ucam::util::capitalizeFirstWord (std::string &words)
    Alternative implementation that takes a string as input and output.
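The sentence-marker and capitalization functions listed above can be illustrated with a minimal sketch. This is not the Cambridge SMT System implementation: it lives in a hypothetical `sketch` namespace, uses a hypothetical `joinWords` helper, and assumes space-separated input, that the integer IDs 1 and 2 stand for <s> and </s>, that markers are removed only as a matching pair, and that a leading <s> is skipped when capitalizing (ASCII letters only).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, not ucam::util

// Hypothetical helper: joins words with single spaces.
inline std::string joinWords(const std::vector<std::string> &words) {
  std::string out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i > 0) out += ' ';
    out += words[i];
  }
  return out;
}

// Wraps a space-separated sentence as "<s> ... </s>".
inline void addSentenceMarkers(std::string &sentence) {
  sentence = "<s> " + sentence + " </s>";
}

// Removes a matching pair of markers: <s> ... </s>, or the integer IDs
// 1 ... 2 (assumed to encode <s> and </s>). A sentence without a matching
// pair is left untouched; otherwise runs of spaces are collapsed.
inline void deleteSentenceMarkers(std::string &sentence) {
  std::istringstream in(sentence);
  std::vector<std::string> words;
  for (std::string w; in >> w;) words.push_back(w);
  if (words.size() < 2) return;
  const bool textMarkers = words.front() == "<s>" && words.back() == "</s>";
  const bool idMarkers = words.front() == "1" && words.back() == "2";
  if (!textMarkers && !idMarkers) return;
  words.erase(words.begin());
  words.pop_back();
  sentence = joinWords(words);
}

// Upper-cases the first letter of the first word, skipping a leading <s>.
// std::toupper works on single bytes, so this is ASCII-only.
inline void capitalizeFirstWord(std::vector<std::string> &words) {
  std::size_t i = 0;
  if (i < words.size() && words[i] == "<s>") ++i;
  if (i < words.size() && !words[i].empty()) {
    const unsigned char c = static_cast<unsigned char>(words[i][0]);
    words[i][0] = static_cast<char>(std::toupper(c));
  }
}

// String overload: splits on whitespace, capitalizes, and rejoins
// (this also collapses runs of spaces).
inline void capitalizeFirstWord(std::string &words) {
  std::istringstream in(words);
  std::vector<std::string> tokens;
  for (std::string w; in >> w;) tokens.push_back(w);
  capitalizeFirstWord(tokens);
  words = joinWords(tokens);
}

}  // namespace sketch
```

For example, `sketch::addSentenceMarkers` turns "hello world" into "<s> hello world </s>", and `sketch::deleteSentenceMarkers` turns "1 45 67 2" into "45 67". Splitting on whitespace keeps the sketch short at the cost of normalizing spacing whenever a string is rewritten.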
Definition in file tokenizer.osr.hpp.
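Because tokenize and detokenize are documented as pass-through in this release, a stand-in with the same signatures only needs to copy its input to its output. The sketch below is hypothetical (it uses a `sketch` namespace rather than ucam::util); the null-pointer check on `os` is an added assumption, and `languagespecific` is accepted but ignored.

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical namespace, not ucam::util

// Pass-through: copies the input unchanged. languagespecific is ignored.
inline void tokenize(const std::string &is, std::string *os,
                     const std::string languagespecific = "") {
  (void)languagespecific;
  if (os != nullptr) *os = is;  // assumption: tolerate a null output pointer
}

// Pass-through: copies the input unchanged. languagespecific is ignored.
inline void detokenize(const std::string &is, std::string *os,
                       std::string languagespecific = "") {
  (void)languagespecific;
  if (os != nullptr) *os = is;  // assumption: tolerate a null output pointer
}

}  // namespace sketch
```

Keeping the signatures identical means callers written against a full tokenizer still compile; only the text transformation is missing.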