tercpp - Documentation

1 - Description

tercpp is an open-source Translation Edit Rate (TER) scorer tool for Machine Translation.

It implements Snover's TER algorithm, available at http://www.cs.umd.edu/~snover/tercom
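
For orientation, the TER score as defined in the Snover et al. (2006) paper listed under References is the number of edits (insertions, deletions, substitutions of single words, and shifts of word sequences) needed to turn the hypothesis into a reference, normalized by the average reference length. The notation below is illustrative only; whether tercpp prints the score as a fraction or a percentage is not stated in this document.

```latex
\mathrm{TER} = \frac{\text{number of edits}}{\text{average number of reference words}}
```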

References:
Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla and John Makhoul, "A Study of Translation Edit Rate with Targeted Human Annotation," Proceedings of Association for Machine Translation in the Americas, 2006.
Matthew Snover, Bonnie J. Dorr, Richard Schwartz, John Makhoul, Linnea Micciulla and Ralph Weischedel, "A Study of Translation Error Rate with Targeted Human Annotation," LAMP-TR-126, CS-TR-4755, UMIACS-TR-2005-58, University of Maryland, College Park, MD, July 2005.

2 - Options

tercpp [--tercom] [--sgml] [--debugMode] [--noTxtIds] [--printAlignments] [-s|-c] [-P] -r ref[,ref2...] -h hyp

`--tercom`	use the tercom standard normalization
`--noTxtIds`	sentences do not need an id at the end of each line
`--sgml`	score SGML files (incompatible with --noTxtIds and plain text files)
`--debugMode`	print debug messages
`-s or -c`	case-sensitive scoring
`-P`	ignore punctuation
`--help`	print this help message
`--printAlignments`	print all the final alignments to a separate output file
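
When --noTxtIds is not given, each sentence is expected to end with an id. The exact id syntax is not described in this document; the sketch below ASSUMES the tercom-style convention of a parenthesized id at the end of each line, and the helper name `add_ids` and the `seg` prefix are hypothetical. Check the format against your tercpp build before relying on it.

```python
def add_ids(sentences, prefix="seg"):
    """Append a parenthesized id to each sentence.

    ASSUMPTION: ids follow the tercom-style "text (id)" layout.
    The prefix and numbering scheme are illustrative only.
    """
    return [
        f"{sentence.rstrip()} ({prefix}{index})"
        for index, sentence in enumerate(sentences, start=1)
    ]


if __name__ == "__main__":
    hypotheses = ["the cat sat on the mat", "hello world"]
    for line in add_ids(hypotheses):
        print(line)
```

The same ids would have to be used, in the same order, for the reference file so that each hypothesis can be matched with its reference.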

3 - Common usage examples

Simplest example (evaluating a hypothesis file against a reference file):
tercpp --noTxtIds -r ref.txt -h hyp.txt
In this example, the hypothesis file "hyp.txt" is evaluated against the reference file "ref.txt".
Each line of the hypothesis must correspond to the same line of the reference.
In other words, the hypothesis and reference files must have the same number of lines.
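
Since line-aligned input is required with --noTxtIds, it can help to verify the line counts before scoring. The following is a minimal sketch of such a check; it is not part of tercpp, and the function names `count_lines` and `check_parallel` are hypothetical. It assumes UTF-8 encoded text files.

```python
def count_lines(path):
    """Return the number of lines in a text file (assumed UTF-8)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(ref_path, hyp_path):
    """Return (same_length, ref_lines, hyp_lines) for the two files."""
    ref_lines = count_lines(ref_path)
    hyp_lines = count_lines(hyp_path)
    return ref_lines == hyp_lines, ref_lines, hyp_lines
```

For example, `check_parallel("ref.txt", "hyp.txt")` returns a tuple whose first element is False when the two files differ in length, in which case the files should be fixed before running tercpp.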

4 - Contact

christophe.servan@lium.univ-lemans.fr