Task 4: Machine translation

Task definition

Machine translation is the translation of text by a computer, with no human involvement. Pioneered in the 1950s, it is also referred to as automated, automatic, or instant translation.

There are currently three types of machine translation system, namely rules-based, statistical and neural:

  • Rules-based systems use a combination of language and grammar rules plus dictionaries for common words. Specialist dictionaries are created to focus on certain industries or disciplines. Rules-based systems typically deliver consistent translations with accurate terminology when equipped with specialist dictionaries.
  • Statistical systems have no knowledge of language rules. Instead they "learn" to translate by analysing large amounts of data for each language pair. They can be trained for specific industries or disciplines using additional data relevant to the sector needed. Typically, statistical systems deliver more fluent-sounding but less consistent translations.
  • Neural Machine Translation (NMT) is a new approach in which a machine learns to translate through one large neural network (many interconnected processing units loosely modelled on the brain). The approach has become increasingly popular amongst MT researchers and developers, as trained NMT systems have begun to show better translation performance than the phrase-based statistical approach in many language pairs.

The task is to train the best possible machine translation system, using any technology, with limited textual resources. The competition covers two language pairs: the more popular English-Polish pair (into Polish only) and the low-resource Russian-Polish pair (in both directions).

Training data

As the training data set, we have prepared a set of bilingual corpora aligned at the sentence level. The corpora are saved as plain text in UTF-8 encoding, one language per file. We divided the corpora into in-domain and out-of-domain data. Using any data not listed here is not permitted.
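Because the corpora are aligned at the sentence level with one language per file, line i of one file is the translation of line i of the other. A minimal Python sketch of loading such a pair of files follows. The function name `read_parallel` and the file names in the usage note are hypothetical, and the sketch assumes that each sentence occupies exactly one line.

```python
def read_parallel(src_path, tgt_path):
    """Return a list of (source, target) sentence pairs from two line-aligned UTF-8 files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # Iterate over the file objects instead of using str.splitlines(),
        # which would also split on Unicode separators such as U+2028.
        src_lines = [line.rstrip("\n") for line in src]
        tgt_lines = [line.rstrip("\n") for line in tgt]

    # Sentence-level alignment implies the same number of lines on both sides.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    return list(zip(src_lines, tgt_lines))
```

For example, `read_parallel("train.en", "train.pl")` (hypothetical file names) would return pairs such as `("Hello", "Cześć")`, and raises `ValueError` if the two files have drifted out of alignment.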

In-domain data – the transcriptions of lectures on different topics can be downloaded from here:

DOWNLOAD PL-EN – in-domain training and development data

DOWNLOAD PL-RU – in-domain training and development data

Out-of-domain data – any corpus from the OPUS project may be used


Test data

The test data is available here for DOWNLOAD.

Evaluation procedure

Participants are asked to translate the test files with their systems and submit the resulting translations. The translated files must be aligned at the sentence level with the input (test) files. Submissions that are not aligned will not be accepted. If your system needs any pre- or post-processing, it must be done automatically with scripts. Any kind of human input into the test files is strictly prohibited.
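Since unaligned submissions are rejected, it may be worth checking the output automatically before submitting. The following Python sketch shows one possible check. The function name `check_alignment` is hypothetical, and the rule that a non-empty source line should not yield an empty translation is our own assumption, not a stated requirement of the task.

```python
def check_alignment(source_lines, output_lines):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []

    # Sentence-level alignment requires one output line per input line.
    if len(source_lines) != len(output_lines):
        problems.append(
            f"line count differs: {len(source_lines)} source vs {len(output_lines)} output"
        )

    # Assumed extra check: a non-empty source sentence should not map to an empty line.
    for i, (src, out) in enumerate(zip(source_lines, output_lines), start=1):
        if src.strip() and not out.strip():
            problems.append(f"line {i}: empty translation for non-empty source")

    return problems
```

Running this as the last step of an automatic post-processing script keeps the check free of human input: a submission is written out only when the returned list is empty.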

Authors will be asked to accompany their submissions with short system descriptions.

Evaluation will be done with four main automatic metrics widely used in machine translation:

  • BLEU [1]
  • NIST [2]
  • TER [3]
  • METEOR [4]

An example evaluation script may be found here.
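To illustrate what the first of these metrics measures, here is a minimal Python sketch of corpus-level BLEU as defined in [1]: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It is not the evaluation script used in the competition. The sketch assumes a single reference per sentence and plain whitespace tokenization, so its scores should not be expected to match the official ones.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU for line-aligned hypotheses and single references."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # assumed: whitespace tokenization
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each n-gram count to its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, a zero precision at any order makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses that are shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

The statistics are accumulated over the whole test set before the precisions are computed, which is how BLEU is defined in [1]; averaging sentence-level scores would give a different number.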


[1] Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318). Association for Computational Linguistics.

[2] Doddington, G. (2002, March). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research (pp. 138-145). Morgan Kaufmann Publishers Inc.

[3] Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006, August). A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas (Vol. 200, No. 6).

[4] Banerjee, S., & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (pp. 65-72).