#14 Word Sense Induction 
Description
This task is a continuation of the WSI task (Task 2) of SemEval-2007
(nlp.cs.swarthmore.edu/semeval/tasks/task02/summary.shtml), with some
significant changes to the evaluation setting.
Word Sense Induction (WSI) is defined as the process of identifying the
different senses (or uses) of a target word in a given text in an
automatic and fully unsupervised manner.
The goal of this task is to allow comparison of unsupervised sense
induction and disambiguation systems. A secondary outcome of this task
will be to provide a comparison with current supervised and
knowledge-based methods for sense disambiguation.
The evaluation scheme consists of the following assessment
methodologies:
- Unsupervised Evaluation. The induced senses are evaluated as
clusters of examples and compared to sets of examples tagged with gold
standard (GS) senses. The evaluation metric used, V-measure (Rosenberg
& Hirschberg, 2007), measures both the homogeneity and the
completeness of a clustering solution. Perfect homogeneity is achieved
if every cluster of the solution contains only data points that are
members of a single GS class. Conversely, perfect completeness is
achieved if all the data points that are members of a given GS class
are also elements of the same cluster. Homogeneity and completeness
can be treated in similar fashion to precision and recall, where
increasing the former often results in decreasing the latter
(Rosenberg & Hirschberg, 2007).
- Supervised Evaluation. The second evaluation setting assesses WSI
systems in a WSD task. A mapping is created between the induced sense
clusters (from the unsupervised evaluation described above) and the
actual GS senses. The mapping matrix is then used to tag each instance
in the testing corpus with GS senses, and the usual recall/precision
measures for WSD are applied. Supervised evaluation was also part of
the SemEval-2007 WSI task (Agirre & Soroa, 2007).
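The V-measure used in the unsupervised setting is the harmonic mean of homogeneity and completeness, both defined via conditional entropies over the contingency table of gold classes and induced clusters. A minimal sketch (the function name and edge-case conventions are assumptions, following the definitions in Rosenberg & Hirschberg, 2007):

```python
from collections import Counter
from math import log

def v_measure(gold, clusters, beta=1.0):
    """Sketch of V-measure (Rosenberg & Hirschberg, 2007).

    gold, clusters: parallel lists of labels, one per instance.
    Returns (homogeneity, completeness, v_measure).
    """
    n = len(gold)
    joint = Counter(zip(gold, clusters))   # n(c, k): joint counts
    gold_counts = Counter(gold)            # n(c): gold class sizes
    clus_counts = Counter(clusters)        # n(k): cluster sizes

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values() if c)

    h_c = entropy(gold_counts)  # H(C)
    h_k = entropy(clus_counts)  # H(K)
    # Conditional entropies H(C|K) and H(K|C) from the joint counts.
    h_c_given_k = -sum((nck / n) * log(nck / clus_counts[k])
                       for (c, k), nck in joint.items())
    h_k_given_c = -sum((nck / n) * log(nck / gold_counts[c])
                       for (c, k), nck in joint.items())

    # Convention: a degenerate distribution yields a perfect score.
    hom = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    com = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    v = 0.0 if hom + com == 0 else (1 + beta) * hom * com / (beta * hom + com)
    return hom, com, v
```

For example, a clustering that exactly reproduces the gold classes scores 1.0 on all three measures, while lumping every instance into one cluster gives perfect completeness but zero homogeneity, and hence a V-measure of zero.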
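The supervised setting can be illustrated with a simplified mapping step: assign each induced cluster to its most frequent GS sense in a mapping split, then tag test instances and score them with WSD precision/recall. This majority mapping is an assumption for illustration; the actual task uses a probabilistic mapping matrix:

```python
from collections import Counter, defaultdict

def map_clusters(map_clusters_labels, map_gold):
    """Map each induced cluster to its most frequent gold sense in the
    mapping split (a majority-vote simplification, not the task's
    probabilistic matrix)."""
    votes = defaultdict(Counter)
    for k, g in zip(map_clusters_labels, map_gold):
        votes[k][g] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in votes.items()}

def wsd_score(test_clusters, test_gold, mapping):
    """Tag test instances via the mapping and compute the usual WSD
    precision/recall (instances whose cluster is unmapped are left
    unattempted, which lowers recall but not precision)."""
    attempted = correct = 0
    for k, g in zip(test_clusters, test_gold):
        if k in mapping:
            attempted += 1
            correct += (mapping[k] == g)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(test_gold) if test_gold else 0.0
    return precision, recall
```

For instance, if cluster 0 is mapped to sense 'x' and cluster 1 to sense 'y', a test instance in an unseen cluster 2 is unattempted, so precision can stay at 1.0 while recall drops.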
References
Andrew Rosenberg and Julia Hirschberg. V-Measure: A Conditional
Entropy-Based External Cluster Evaluation Measure. In Proceedings of
the 2007 Joint Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning (EMNLP-CoNLL),
Prague, Czech Republic, June 2007. ACL.
Eneko Agirre and Aitor Soroa. SemEval-2007 Task 02: Evaluating Word
Sense Induction and Discrimination Systems. In Proceedings of the
Fourth International Workshop on Semantic Evaluations, pp. 7-12,
Prague, Czech Republic, June 2007. ACL.
Organizers: Suresh Manandhar (University of York), Ioannis Klapaftis (University of York), and Dmitriy Dligach (University of Colorado)
Web Site: http://www.cs.york.ac.uk/semeval2010_WSI/