Shared Task Instructions

Getting the data

Print the licenses for the two treebanks, fill them out, sign them, and fax them to Sandra Kuebler at +1 (812) 855-5363.

The licenses can be downloaded from here:

For TueBa-D/Z:

Sandra will then send you a username and password for the web page from which you can download the data. The training data for the shared task are currently available from the following web directory:

There, you will find tarred and gzipped files for the dependency and the constituent versions. For each treebank, we provide a training set and a development set. Those of you who are familiar with TIGER will notice that we have restricted the TIGER training set to match the size of the TueBa set. The sets contain only sentences of length 40 or less; the same will hold for the test set.
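If you prefer not to unpack the archives by hand, Python's standard `tarfile` module can read tarred and gzipped files directly. This is a minimal sketch: the function name `read_archive` is hypothetical, and because these instructions do not state the file encoding, it is passed in explicitly rather than assumed.

```python
import tarfile
from typing import Dict


def read_archive(path: str, encoding: str) -> Dict[str, str]:
    """Return {member name: decoded text} for every regular file in a .tar.gz."""
    contents: Dict[str, str] = {}
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():  # skip directories, links, etc.
                handle = archive.extractfile(member)
                contents[member.name] = handle.read().decode(encoding)
    return contents
```

For example, `read_archive("train.tar.gz", "utf-8")` (a hypothetical file name and encoding) returns the text of each file in the archive without writing anything to disk.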

Data formats

The constituent data are in a bracketing format similar to the Penn Treebank format, except that we deleted all extra whitespace and newlines, so that each sentence occupies a single line in the file. Sentences are separated by empty lines. Constituent labels are separated from their grammatical functions by a minus sign (-). The annotation has been converted into a true tree format, and all trees are dominated by a VROOT node.
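A bracketed line of this kind can be read with a small recursive-descent parser. The sketch below is hypothetical (the names `parse_tree`, `split_label`, `leaves` and `read_trees` are ours) and assumes that tokens are separated by spaces or brackets, that brackets inside words or POS tags are escaped in the data, and that a category never itself contains a minus sign, so splitting a label at its first `-` yields the category and the grammatical function.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple, Union

# A tree is either a terminal (the word itself) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]

# Tokens are "(", ")" or any run of characters that are neither brackets nor spaces.
# Assumption: brackets inside words or POS tags are escaped in the data.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(line: str) -> Tree:
    """Parse one bracketed sentence (one line of the file) into nested tuples."""
    tokens = TOKEN_RE.findall(line)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            word = tokens[pos]  # a terminal: the word form
            pos += 1
            return word
        label = tokens[pos + 1]  # e.g. "VROOT", "NP-SB", "NN-NK"
        pos += 2
        children = []
        while tokens[pos] != ")":
            children.append(parse_node())
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError(f"unexpected material after the tree: {tokens[pos:]}")
    return tree


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Split "NP-SB" into ("NP", "SB"); a label without "-" has no function."""
    category, sep, function = label.partition("-")
    return category, (function if sep else None)


def leaves(tree: Tree) -> List[str]:
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def read_trees(lines: Iterable[str]) -> Iterator[Tree]:
    """Yield one tree per non-empty line; empty lines only separate sentences."""
    for line in lines:
        if line.strip():
            yield parse_tree(line)
```

For an invented line such as `(VROOT (S (NP-SB (ART-NK Der) (NN-NK Hund)) (VVFIN-HD bellt)))`, `split_label("NP-SB")` gives `("NP", "SB")` and `leaves(...)` gives the three words, whose count is the sentence length in tokens.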

The dependency data follow the CoNLL format. More information can be found at: The conversion was carried out by Yannick Versley (thanks a lot, Yannick); the resulting dependency annotation is similar to the Hamburg dependency format.
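A CoNLL file can be read sentence by sentence with a few lines of code. This sketch assumes the ten tab-separated columns of the CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with sentences separated by empty lines; the function names `read_conll` and `check_heads` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# Assumption: the ten tab-separated columns of the CoNLL-X format.
COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
           "feats", "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence (a list of token dicts) per blank-line-separated block."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # An empty line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}")
        sentence.append(dict(zip(COLUMNS, fields)))
    if sentence:  # the file may not end with an empty line
        yield sentence


def check_heads(sentence: List[Token]) -> None:
    """Raise ValueError unless every HEAD is 0 (root) or an existing token ID."""
    for token in sentence:
        head = int(token["head"])
        if not 0 <= head <= len(sentence):
            raise ValueError(f"token {token['id']} has out-of-range head {head}")
```

`check_heads` is meant for training data or parser output, where the HEAD column is filled in.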

The task definition

The shared task is to parse either the constituent versions or the dependency versions (or both). We will provide the test data sets on March 5 and expect the parsed data on March 10. We will announce the results of the evaluation to the participants on March 12.

The test data set will consist of sentences with gold-standard POS tags. The task is restricted to parsing: if your parser assigns POS tags automatically as part of the parsing process, those tags will not be evaluated.
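Since only the parses are evaluated, it can be worth confirming before submission that the parser output keeps the test sentences and their tokens unchanged. The helper below is a hypothetical sanity check, assuming that evaluation aligns output and gold data sentence by sentence and token by token; it compares word sequences only, so it works whether or not your parser re-tags the input.

```python
from typing import List, Sequence


def check_tokens_match(test_forms: Sequence[Sequence[str]],
                       parsed_forms: Sequence[Sequence[str]]) -> List[str]:
    """Return a list of problems; an empty list means the tokens line up."""
    problems: List[str] = []
    if len(test_forms) != len(parsed_forms):
        problems.append(
            f"sentence count: expected {len(test_forms)}, got {len(parsed_forms)}")
    # Compare the word sequences of corresponding sentences.
    for i, (gold, out) in enumerate(zip(test_forms, parsed_forms), start=1):
        if list(gold) != list(out):
            problems.append(f"sentence {i}: token sequence differs from the test input")
    return problems
```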