Evaluation Exercises on Semantic Evaluation - ACL SigLex event
#4 VP Ellipsis - Detection and Resolution
Verb Phrase Ellipsis (VPE) occurs in the English language when an
auxiliary or modal verb abbreviates an entire verb phrase recoverable
from the linguistic context, as in the following examples:
Both Dr. Mason and Dr. Sullivan [oppose federal funding for abortion],
as does President Bush, except in cases where a woman's life is threatened.
They also said that vendors were [delivering goods] more quickly
in October than they had for each of the five previous months.
He spends his days [sketching passers-by], or trying to.
Here occurrences of VPE are typeset in a bold face font. The antecedent is
marked by square brackets.
The proposed shared task consists of two subtasks: (1) automatically
detecting VPE in free text; and (2) selecting the textual antecedent
of each found VPE. Task 1 is reasonably difficult (Nielsen 2004
reports an F-score of 71% on Wall Street Journal data). Task 2 is
challenging. With a "head match" evaluation, Hardt 1997 reports a
success rate of 62% for a baseline system based on recency only, and
an accuracy of 84% for an improved system taking recency, clausal
relations, parallelism, and quotation into account. We will make the
task more realistic (but more difficult) by not using head match but
rather precision and recall over each token of the antecedent.
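
The token-level measure can be illustrated with a minimal sketch. It is
not the official scorer: the function name token_prf, the representation
of an antecedent as a set of token positions, and the example positions
are all assumptions made here for illustration.

```python
def token_prf(predicted, gold):
    """Token-level precision, recall and F-score for one antecedent.

    `predicted` and `gold` are collections of token positions on the
    same line (a hypothetical representation chosen for this sketch).
    """
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical positions: the gold antecedent covers five tokens (6..10),
# the system proposes only the first three of them (6..8).
gold = range(6, 11)
predicted = range(6, 9)
p, r, f = token_prf(predicted, gold)
```

Unlike head match, which would count this proposal as fully correct if
it contains the head verb, the token-level measure penalises the two
missing tokens through recall.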
We will provide texts where sentence boundaries are detected and each
sentence is tokenised and printed on a new line. An occurrence of VPE
is marked by a line number plus token positions of the auxiliary or
modal verb. Textual antecedents are assumed to be on one line, and are
marked by the line number plus begin/end token position.
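
One possible in-memory view of such stand-off annotation is sketched
below. The class names Span and VPEAnnotation, the helper tokens_of,
1-based numbering, inclusive end positions, and the tokenisation of the
example sentence are assumptions for illustration; the actual file
format is not fixed by this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    line: int   # line (sentence) number, assumed 1-based
    begin: int  # first token position, assumed 1-based
    end: int    # last token position, assumed inclusive


@dataclass(frozen=True)
class VPEAnnotation:
    trigger: Span     # the auxiliary or modal verb
    antecedent: Span  # the antecedent, on a single line


def tokens_of(span, lines):
    """Return the tokens a span covers, given one tokenised sentence per line."""
    tokens = lines[span.line - 1].split()
    return tokens[span.begin - 1:span.end]


# One tokenised sentence per line (tokenisation assumed for this example).
lines = ["He spends his days sketching passers-by , or trying to ."]
ann = VPEAnnotation(trigger=Span(1, 10, 10), antecedent=Span(1, 5, 6))
trigger_tokens = tokens_of(ann.trigger, lines)
antecedent_tokens = tokens_of(ann.antecedent, lines)
```

Keeping the annotation separate from the text in this way means the raw
sentences never need to be modified or redistributed with the labels.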
As development data we will provide the stand-off annotation of more
than 500 occurrences of manually annotated VPE in the Wall Street
Journal part (all 25 sections) of the Penn Treebank. We have made an
arrangement with the Linguistic Data Consortium that participants
without access to the Penn Treebank can use the raw texts for the
duration of the shared task.
We will also produce a script that calculates precision and recall of
detection and the average F-score and accuracy of antecedent selection
based on overlap with a gold standard antecedent.
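
The detection part of such a script might look like the following
sketch. It is an assumption-laden illustration, not the script itself:
the function name detection_scores and the (line, begin, end) key used
to identify a VPE are hypothetical, and how antecedent scores are
averaged over undetected or spurious VPEs is left open here because the
description does not specify it.

```python
def detection_scores(predicted, gold):
    """Precision and recall of VPE detection.

    Each VPE is identified by a (line, begin_token, end_token) tuple,
    a hypothetical key format assumed for this sketch.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up annotations: two of four system detections match the gold standard,
# and one of three gold VPEs is missed.
gold = {(1, 10, 10), (4, 7, 7), (9, 3, 3)}
predicted = {(1, 10, 10), (4, 7, 7), (12, 5, 5), (15, 2, 2)}
precision, recall = detection_scores(predicted, gold)
```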
The test data will be a further collection of newswire (or similar
genre) articles. The "gold" standard of the test data will be
determined by using the merged results of all task
participants. Additionally, these will be manually judged by the
Daniel Hardt (1997): An Empirical Approach to VP Ellipsis.
Leif A. Nielsen (2004): Verb phrase ellipsis detection using automatically
parsed text. Proceedings of the 20th International Conference on
Computational Linguistics (Geneva, Switzerland).