Evaluation Exercises on Semantic Evaluation - ACL SIGLEX event
#1 Coreference Resolution in Multiple Languages
Description

Using coreference information has been shown to be beneficial in a number of NLP applications, including Information Extraction, Text Summarization, Question Answering and Machine Translation. This task is concerned with intra-document coreference resolution in six different languages: Catalan, Dutch, English, German, Italian and Spanish. For each language, the task is divided into two subtasks:
Detection of full coreference chains, composed of named entities, pronouns, and full noun phrases.
Pronominal resolution, i.e., finding the antecedents of the pronouns in the text.
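The two subtasks can be illustrated with a toy example. The representation below is our own minimal sketch for illustration only, not the official task format:

```python
# Toy illustration of the two subtasks (our own minimal representation,
# not the official task data format).
text = "Mary saw John . She waved to him ."
tokens = text.split()

# Mentions as (start, end) token spans, end exclusive.
mentions = {"Mary": (0, 1), "John": (2, 3), "She": (4, 5), "him": (7, 8)}

# Subtask 1 (full chains): group every mention of the same entity,
# whether it is a named entity, a pronoun, or a full noun phrase.
chains = [["Mary", "She"], ["John", "him"]]

# Subtask 2 (pronominal resolution): map each pronoun to its antecedent only.
antecedents = {"She": "Mary", "him": "John"}
```

A full-chain system must produce the groupings in `chains`; a pronominal-resolution system only needs the pronoun-to-antecedent links in `antecedents`.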
Data is provided for both training and evaluation. The coreference chains are extracted from manually annotated corpora: the AnCora corpora for Catalan and Spanish, the OntoNotes corpus for English, the TüBa-D/Z corpus for German, the KNACK corpus for Dutch, and the LiveMemories corpus for Italian. These corpora are additionally enriched with morphological, syntactic and semantic information (such as gender, number, constituents, dependencies, predicates, etc.). Great effort has been devoted to providing participants with a common and relatively simple data representation across all languages.
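A common and simple way to encode coreference in tabular, CoNLL-style data is a per-token column with open/close parenthesis notation, e.g. "(1" opens a mention of entity 1, "1)" closes it, and "(2)" is a single-token mention. The sketch below assumes such a column; the exact column layout of the task data may differ, and `parse_coref_column` is our own helper name:

```python
import re

def parse_coref_column(coref_tags):
    """Parse a CoNLL-style coreference column (e.g. '(1', '1)', '(2)', '-')
    into chains of (start, end) token spans, end inclusive.

    Assumes parenthesis notation with '|' separating multiple mentions
    on one token; the official task format may differ in details."""
    open_spans = {}   # entity id -> stack of pending start positions
    chains = {}       # entity id -> list of completed (start, end) spans
    for i, tag in enumerate(coref_tags):
        if tag == "-":            # token belongs to no mention boundary
            continue
        for part in tag.split("|"):
            m = re.fullmatch(r"(\()?(\d+)(\))?", part)
            if not m:
                continue
            opened, eid, closed = m.groups()
            if opened:            # mention of entity `eid` starts here
                open_spans.setdefault(eid, []).append(i)
            if closed:            # mention of entity `eid` ends here
                start = open_spans[eid].pop()
                chains.setdefault(eid, []).append((start, i))
    return chains
```

For example, the column `["(1", "-", "1)", "(2)"]` yields entity 1 spanning tokens 0-2 and entity 2 on token 3.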
The main goal is to perform and evaluate coreference resolution for six different languages with the help of other layers of linguistic information and using different evaluation metrics (MUC, B-CUBED, CEAF and BLANC).
The multilingual context will make it possible to study the portability of coreference resolution systems across languages. To what extent is it possible to implement a general system that is portable to all six languages? How much language-specific tuning is necessary? Are there significant differences between Germanic and Romance languages? And between languages of the same family?
The additional layers of annotation will allow us to study how helpful morphology, syntax and semantics are in resolving coreference relations. How much preprocessing is needed? How much does the quality of the preprocessing modules (perfect linguistic input vs. noisy automatic input) affect the performance of state-of-the-art coreference resolution systems? Is morphology more helpful than syntax or semantics? Is syntax more helpful than semantics?
The use of four different evaluation metrics will allow us to compare the advantages and drawbacks of the widely used MUC, B-CUBED and CEAF measures, as well as the newly proposed BLANC measure. Do all of them produce the same ranking? Are they correlated? Can systems be optimized under all four metrics at the same time?
Two different scenarios will be considered for evaluation. In the first one, gold-standard annotation will be provided to participants (up to full syntax and possibly including also semantic role labeling). This input annotation will correctly identify all noun phrases that are part of the coreference chains. In the second scenario, we will use state-of-the-art automatic linguistic tools to generate the input annotation of the data; here the automatically generated structures will not necessarily match the true NPs participating in the chains. By defining these two experimental settings, we will be able to measure the effectiveness of state-of-the-art coreference resolution systems when working with perfect linguistic (syntactic/semantic) information, and the degradation in performance when moving to a realistic scenario.
In parallel, we will also differentiate between closed and open settings: in the closed setting, participants may use only the information contained in the training data; in the open setting, they may also make use of external resources and tools.
Organizers: Véronique Hoste, Lluís Màrquez, M. Antònia Martí, Massimo Poesio, Marta Recasens, Emili Sapena, Mariona Taulé, Yannick Versley
(Universitat de Barcelona, Universitat Politècnica de Catalunya, Hogeschool Gent, Università di Trento, Universität Tübingen)

Web Site: http://stel.ub.edu/semeval2010-coref/