Cambridge SMT System
uk.ac.cam.eng.extraction.hadoop.util.ExtractorDataLoader Class Reference

Public Member Functions

void loadTrainingData2Hdfs (String sourceTextFile, String targetTextFile, String wordAlignmentFile, String provenanceFile, String hdfsName) throws FileNotFoundException, IOException
 

Static Public Member Functions

static void main (String[] args) throws FileNotFoundException, IOException
 

Detailed Description

Loads all the word-aligned parallel text onto HDFS, ready for rule extraction.

Author
Aurelien Waite
Juan Pino
Date
28 May 2014

Definition at line 44 of file ExtractorDataLoader.java.
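
The loader expects several gzipped, line-parallel input files. As a minimal sketch of checking that property before loading, the class below counts lines in gzipped files and compares them. It is not part of ExtractorDataLoader: the class name ParallelLineCheck, its methods, and the UTF-8 encoding are assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the Cambridge SMT System.
final class ParallelLineCheck {

    // Opens a gzipped text file for line-by-line reading (UTF-8 is assumed).
    static BufferedReader openGzip(Path file) throws IOException {
        return new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)),
                StandardCharsets.UTF_8));
    }

    // Counts the lines of a gzipped text file.
    static long countLines(Path file) throws IOException {
        try (BufferedReader reader = openGzip(file)) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    // Returns true if all the given gzipped files have the same number of lines.
    static boolean sameLineCount(Path... files) throws IOException {
        long expected = -1;
        for (Path file : files) {
            long count = countLines(file);
            if (expected < 0) {
                expected = count;
            } else if (count != expected) {
                return false;
            }
        }
        return true;
    }

    // Writes a small gzipped temporary file; used only by the demonstration.
    static Path writeGzip(String prefix, String content) throws IOException {
        Path file = Files.createTempFile(prefix, ".gz");
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)),
                StandardCharsets.UTF_8)) {
            writer.write(content);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path source = writeGzip("source", "a b\nc d\n");
        Path target = writeGzip("target", "x y\nz w\n");
        Path alignment = writeGzip("alignment", "0-0 1-1\n0-1 1-0\n");
        System.out.println(sameLineCount(source, target, alignment));
    }
}
```

Running such a check first makes a line-count mismatch visible before any data is written to HDFS.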

Member Function Documentation

void uk.ac.cam.eng.extraction.hadoop.util.ExtractorDataLoader.loadTrainingData2Hdfs ( String  sourceTextFile,
String  targetTextFile,
String  wordAlignmentFile,
String  provenanceFile,
String  hdfsName 
) throws FileNotFoundException, IOException
inline

Loads word-aligned parallel text to HDFS.

Parameters
sourceTextFile     The source text file, gzipped, with one sentence per line and the same number of lines as targetTextFile.
targetTextFile     The target text file, gzipped, with one sentence per line and the same number of lines as sourceTextFile.
wordAlignmentFile  The word alignment file, gzipped, with one alignment per line in Berkeley format ("0-0<SPACE>1-2", etc., with the zero-based source index on the left) and the same number of lines as sourceTextFile.
provenanceFile     The provenance file, gzipped, with one set of provenances per line in the format "prov1<SPACE>prov2", etc., and the same number of lines as sourceTextFile.
hdfsName
Exceptions
IOException

Definition at line 66 of file ExtractorDataLoader.java.
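
The line formats described for wordAlignmentFile and provenanceFile can be illustrated with a small parser. This is a sketch under stated assumptions, not the code in ExtractorDataLoader.java: the class AlignmentLineSketch and its methods are hypothetical, and treating an empty line as "no links" or "no provenances" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the input line formats; not the library's parser.
final class AlignmentLineSketch {

    // Parses a Berkeley-format line such as "0-0 1-2" into {source, target}
    // index pairs. Both indices are zero-based, with the source on the left.
    static List<int[]> parseAlignment(String line) {
        List<int[]> links = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return links; // assumption: an empty line means no alignment links
        }
        for (String token : trimmed.split("\\s+")) {
            int dash = token.indexOf('-');
            if (dash <= 0 || dash == token.length() - 1) {
                throw new IllegalArgumentException(
                        "Malformed alignment link: " + token);
            }
            int source = Integer.parseInt(token.substring(0, dash));
            int target = Integer.parseInt(token.substring(dash + 1));
            links.add(new int[] {source, target});
        }
        return links;
    }

    // Splits a provenance line such as "prov1 prov2" into its provenance names.
    static List<String> parseProvenances(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // assumption: empty line, no provenances
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) {
        for (int[] link : parseAlignment("0-0 1-2")) {
            System.out.println("source " + link[0] + " -> target " + link[1]);
        }
        System.out.println(parseProvenances("prov1 prov2"));
    }
}
```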


static void uk.ac.cam.eng.extraction.hadoop.util.ExtractorDataLoader.main ( String[]  args) throws FileNotFoundException, IOException
inline static

Definition at line 122 of file ExtractorDataLoader.java.



The documentation for this class was generated from the following file:
ExtractorDataLoader.java