In the first task, we are interested in your preference between two methods for scoring dependencies; these methods are used to evaluate the dependencies produced by a statistical parser against the ground-truth dependencies for a given sentence.
Formally, a dependency is defined as a 4-tuple (i, j, s, c): i and j are indices identifying the endpoints of the dependency edge, such that the word at index i is the head and the word at index j is the dependent; s is the categorial slot number of the argument filled by the dependency; and c is a categorial label (e.g., NP, S\NP, N/N).
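As a concrete illustration (not part of the task itself), here is a minimal sketch in Python of how such a tuple might be represented; the sentence, indices, slot, and category below are invented for the example.

```python
from typing import NamedTuple

class Dependency(NamedTuple):
    """A dependency tuple (i, j, s, c) as defined above."""
    i: int  # index of the head word
    j: int  # index of the dependent word
    s: int  # categorial slot number of the argument filled
    c: str  # categorial label, e.g. "NP", "S\\NP", "N/N"

# Hypothetical example: in "John walks", the verb (index 1) heads a
# dependency whose argument slot 1 is filled by its subject (index 0).
dep = Dependency(i=1, j=0, s=1, c="S\\NP")
```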
In our context, a statistical parser predicts both lexical categories and dependencies; i.e., the parser is given only the input text and must both supertag the sentence and parse it to find the dependencies. When a statistical parser is evaluated, its generated lexical categories and dependencies are compared with the ground truth to find errors.
For each sentence below, a statistical parser was used to predict lexical categories and dependencies. We then used two different dependency scoring methods, referred to as A and B, to identify parser errors in the predicted output when evaluated against the ground truth for the sentence as specified by our corpus. The two scoring methods differ in their choice of label c for a dependency tuple (i, j, s, c), as well as in which dependencies they consider erroneous.
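To make the comparison concrete, here is a hypothetical sketch (NOT the actual methods A and B) in which each scoring method is modelled as a relabelling of the c field, and a predicted dependency counts as erroneous when its relabelled tuple is absent from the relabelled ground truth:

```python
from typing import Callable, Set, Tuple

Dep = Tuple[int, int, int, str]  # (i, j, s, c) as defined above

def find_errors(
    predicted: Set[Dep],
    gold: Set[Dep],
    relabel: Callable[[Dep], Dep],
) -> Set[Dep]:
    """Flag a predicted dependency as erroneous when its relabelled
    tuple does not appear in the relabelled ground-truth set."""
    gold_relabelled = {relabel(d) for d in gold}
    return {d for d in predicted if relabel(d) not in gold_relabelled}

# Hypothetical relabelling choices (invented for this example):
# keep the full category label, or discard it and score only (i, j, s).
def keep_label(d: Dep) -> Dep:
    return d

def drop_label(d: Dep) -> Dep:
    return (d[0], d[1], d[2], "")

gold = {(1, 0, 1, "S\\NP")}
predicted = {(1, 0, 1, "S[dcl]\\NP")}  # same edge and slot, finer label

print(find_errors(predicted, gold, keep_label))  # {(1, 0, 1, 'S[dcl]\\NP')}
print(find_errors(predicted, gold, drop_label))  # set()
```

In this invented example, the label-sensitive relabelling flags the dependency as an error while the label-insensitive one does not; disagreements of roughly this kind are what you are being asked to adjudicate.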
In the figures below:
For each sentence, you must choose between A and B. Select (click) the labelling that, in your professional opinion, better identifies the underlying error in the statistical parser's output, as judged against the ground truth. In making your judgements, consider the (likely) semantics of the generated dependencies and how close they are to those of the ground-truth dependencies. Ties are not permitted.
NB: We are not asking you to choose between different candidate parses, or to judge the ground truth; rather, we are asking you to choose between two dependency labelling/scoring methods for a single sentence, parser output, and ground truth, based on the extent to which you agree with each method's determination of which dependencies are erroneous.
At any point, you may leave or close this window and return to this URL later; each selection is automatically saved and will be restored upon returning. (The selections are associated with your email address, judge1@pilot.study.)
When you are done, click the Verify button at the bottom of the page to confirm that you have evaluated all sentences.
If you prefer, you may change the notation used for the lexical categories: