Evaluation Exercises on Semantic Evaluation - ACL SigLex event
#8 Multi-Way Classification of Semantic Relations Between Pairs of Nominals
Description
Recently, the NLP community has shown a renewed interest in deeper semantic analysis, including the automatic recognition of semantic relations between pairs of words. This is an important task with many potential applications, including but not limited to Information Retrieval, Information Extraction, Text Summarization, Machine Translation, Question Answering, Paraphrasing, Recognizing Textual Entailment, Thesaurus Construction, Semantic Network Construction, Word Sense Disambiguation, and Language Modelling.
Despite this interest, progress has been slow, largely because incompatible classification schemes have made direct comparisons between approaches difficult. In addition, most datasets provided no context for the target relation, relying on the assumption that semantic relations are largely context-independent, which is often false. A notable exception is SemEval-2007 Task 4 (Girju et al., 2007), which for the first time provided a standard benchmark dataset for seven semantic relations in context. However, that dataset treated each relation separately, asking only for positive vs. negative classification decisions. While some subsequent publications tried to use it in a multi-way setup, it was not designed for that purpose.
We believe that having a freely available standard benchmark dataset for *multi-way* semantic relation classification *in context* is much needed for the overall advancement of the field. That is why we pose as our primary objective the task of preparing and releasing such a dataset to the research community.
We will use nine mutually exclusive relations from Nastase & Szpakowicz (2003). The dataset for the task will consist of sentences gathered from the Web and manually annotated with the target nominals and the relation that holds between them. We will provide 1000 examples per relation, a sizeable increase over SemEval-2007 Task 4, which had about 210 examples for each of its seven relations. There will also be a NONE relation, for which we will provide 1000 examples as well.
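To make the annotation concrete, here is a minimal sketch of what a marked-up instance could look like and how it might be parsed. The tag names (`<e1>`, `<e2>`), the relation label shown, and the file layout are illustrative assumptions, not the released format.

```python
import re

# Hypothetical instance layout (illustrative only): the two target nominals
# are wrapped in <e1>...</e1> and <e2>...</e2> tags within the sentence,
# followed by a line giving the relation and the argument order.
example = (
    '"The <e1>apples</e1> are kept in a wooden <e2>basket</e2>."\n'
    "Content-Container(e1,e2)"
)

def parse_instance(raw):
    """Split a raw instance into (sentence, nominal 1, nominal 2, relation)."""
    sentence, relation = raw.strip().split("\n")
    e1 = re.search(r"<e1>(.*?)</e1>", sentence).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", sentence).group(1)
    return sentence, e1, e2, relation

sentence, e1, e2, relation = parse_instance(example)
print(e1, e2, relation)  # apples basket Content-Container(e1,e2)
```

A parser along these lines would let participants load the annotated sentences into whatever feature-extraction pipeline they use.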
Using that dataset, we will set up a common evaluation task that will enable researchers to compare their algorithms. The official evaluation score will be average F1 over all relations, but we will also check whether some relations are more difficult to classify than others, and whether some algorithms are best suited for certain types of relations. Trial data and an automatic scorer will be made available well in advance (by June 2009). All data will be released under a Creative Commons license.
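The official score, average F1 over all relations, amounts to computing F1 per relation and then macro-averaging. Below is a minimal sketch of such a scorer; whether the NONE class enters the average is an assumption here (this sketch excludes it), the relation names are illustrative, and the official scorer may differ in detail.

```python
from collections import Counter

def macro_f1(gold, predicted, ignore=("NONE",)):
    """Macro-averaged F1: per-label precision/recall/F1, then the unweighted
    mean of F1 over all labels (optionally skipping labels such as NONE)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    labels = {lab for lab in set(gold) | set(predicted) if lab not in ignore}
    f1s = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0

# Toy example with hypothetical relation labels:
gold = ["Cause-Effect", "Cause-Effect", "Part-Whole", "NONE"]
pred = ["Cause-Effect", "Part-Whole", "Part-Whole", "NONE"]
print(round(macro_f1(gold, pred), 3))  # 0.667
```

Because the average is unweighted, a rare relation counts as much as a frequent one, which matches the stated goal of checking whether some relations are harder to classify than others.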