

Evaluation Exercises on Semantic Evaluation - ACL SigLex event




#14  Word Sense Induction 


This task is a continuation of the WSI task (Task 2) of SemEval-2007 (nlp.cs.swarthmore.edu/semeval/tasks/task02/summary.shtml), with some significant changes to the evaluation setting.

Word Sense Induction (WSI) is defined as the process of identifying the different senses (or uses) of a target word in a given text in an automatic, fully unsupervised manner. The goal of this task is to allow comparison of unsupervised sense induction and disambiguation systems. A secondary outcome of this task will be a comparison with current supervised and knowledge-based methods for sense disambiguation. The evaluation scheme consists of the following assessment methodologies:

  • Unsupervised Evaluation. The induced senses are evaluated as clusters of examples and compared to sets of examples that have been tagged with gold standard (GS) senses. The evaluation metric used, V-measure (Rosenberg & Hirschberg, 2007), measures both the homogeneity and the completeness of a clustering solution. Perfect homogeneity is achieved if every cluster of the solution contains only data points that are elements of a single GS class; conversely, perfect completeness is achieved if all the data points that are members of a given class are also elements of the same cluster. Homogeneity and completeness can be treated in similar fashion to precision and recall, where increasing the former often results in decreasing the latter (Rosenberg & Hirschberg, 2007).
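The unsupervised metric can be made concrete with a small sketch. The following is not the official task scorer, only a minimal implementation of the V-measure definitions: homogeneity is 1 - H(C|K)/H(C), completeness is 1 - H(K|C)/H(K), and the V-measure is their (weighted) harmonic mean.

```python
import math
from collections import Counter

def v_measure(gold, pred, beta=1.0):
    """Homogeneity, completeness and V-measure of a clustering `pred`
    against gold-standard classes `gold` (Rosenberg & Hirschberg, 2007).
    A minimal sketch, not the official SemEval scorer."""
    n = len(gold)
    joint = Counter(zip(pred, gold))   # n_kc: instances in cluster k with class c
    n_pred = Counter(pred)             # cluster sizes n_k
    n_gold = Counter(gold)             # class sizes n_c

    def entropy(counts):
        return -sum((m / n) * math.log(m / n) for m in counts.values())

    h_c = entropy(n_gold)              # H(C): entropy of gold classes
    h_k = entropy(n_pred)              # H(K): entropy of induced clusters
    # Conditional entropies H(C|K) and H(K|C) from the contingency counts.
    h_c_given_k = -sum((m / n) * math.log(m / n_pred[k])
                       for (k, c), m in joint.items())
    h_k_given_c = -sum((m / n) * math.log(m / n_gold[c])
                       for (k, c), m in joint.items())

    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = ((1 + beta) * homogeneity * completeness
         / (beta * homogeneity + completeness))
    return homogeneity, completeness, v
```

Note the precision/recall-like trade-off the text describes: assigning every instance its own cluster yields perfect homogeneity (1.0) but low completeness, so the V-measure penalizes such degenerate solutions.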

  • Supervised Evaluation. The second evaluation setting assesses WSI systems in a WSD task. A mapping is created between the induced sense clusters (from the unsupervised evaluation described above) and the actual GS senses. The mapping matrix is then used to tag each instance in the testing corpus with GS senses, and the usual recall/precision measures for WSD are applied. Supervised evaluation was also part of the SemEval-2007 WSI task (Agirre & Soroa, 2007).
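The supervised setting can be sketched as follows. For illustration only, this uses a simple majority mapping (each cluster is mapped to the GS sense it co-occurs with most often on a held-out mapping split); the official scorer may instead use a full probabilistic mapping matrix, so treat the function names and the mapping rule as assumptions.

```python
from collections import Counter, defaultdict

def map_clusters(clusters, senses):
    """Build a cluster -> sense mapping from a mapping split by assigning
    each induced cluster its most frequent co-occurring GS sense.
    (Simplified majority mapping, not the official mapping matrix.)"""
    by_cluster = defaultdict(Counter)
    for k, s in zip(clusters, senses):
        by_cluster[k][s] += 1
    return {k: counts.most_common(1)[0][0]
            for k, counts in by_cluster.items()}

def wsd_scores(test_clusters, test_senses, mapping):
    """Standard WSD precision (over attempted instances) and recall
    (over all instances) after tagging test instances via the mapping."""
    tagged = [(mapping.get(k), s) for k, s in zip(test_clusters, test_senses)]
    attempted = [t for t in tagged if t[0] is not None]   # clusters seen in mapping split
    correct = sum(pred == gold for pred, gold in attempted)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(tagged) if tagged else 0.0
    return precision, recall
```

A cluster never seen in the mapping split leaves its test instances untagged, which is why precision (over attempted instances) and recall (over all instances) can diverge, exactly as in standard WSD scoring.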

Andrew Rosenberg and Julia Hirschberg. V-Measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, June 2007. ACL.

Eneko Agirre and Aitor Soroa. SemEval-2007 Task 02: Evaluating word sense induction and discrimination systems. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pp. 7-12, Prague, Czech Republic, June 2007. ACL.

Organizers: Suresh Manandhar (University of York), Ioannis Klapaftis (University of York), and Dmitriy Dligach (University of Colorado)
Web Site: http://www.cs.york.ac.uk/semeval2010_WSI/


© 2008 FBK-irst