#3 Cross-Lingual Word Sense Disambiguation
Description There is a general feeling in the WSD community that WSD should not be considered an isolated research task, but should be integrated into real NLP applications such as machine translation or multilingual information retrieval. Using translations from a corpus instead of human-defined sense labels (e.g. WordNet) makes it easier to integrate WSD into multilingual applications, sidesteps the granularity problem (which may itself be task-dependent), is language-independent, and offers a valid alternative for languages that lack sufficient sense inventories and sense-tagged corpora.
We propose an unsupervised Word Sense Disambiguation task for English nouns by means of parallel corpora. The sense label is composed of translations in the different languages, and the sense inventory is built by three annotators from the Europarl parallel corpus using a concordance tool. All translations (above a predefined frequency threshold) of a polysemous word are grouped into clusters, the "senses" of that given word.
Languages: English - Dutch, French, German, Italian, Spanish
Subtasks:
1. Bilingual Evaluation (English - Language X)
Example:
[English] ... equivalent to giving fish to people living on the [bank] of the river ...
Sense Label = {oever/dijk} [Dutch]
Sense Label = {rives/rivage/bord/bords} [French]
Sense Label = {Ufer} [German]
Sense Label = {riva} [Italian]
Sense Label = {orilla} [Spanish]
2. Multilingual Evaluation (English - all target languages)
Example:
... living on the [bank] of the river ...
Sense Label = {oever/dijk, rives/rivage/bord/bords, Ufer, riva, orilla}
Resources
As the task is formulated as an unsupervised WSD task, we will not annotate any training material. Participants can use the Europarl corpus, which is freely available and which will be used for building the sense inventory.
For the test data, native speakers will decide on the correct translation cluster(s) for each test sentence and give their top-3 translations from the predefined list of Europarl translations; these votes are used to assign weights to the translations in the answer clusters for that test sentence.
Participants will receive manually annotated development and test data:
- Development/sample data: 5 polysemous English nouns, each with 20 example instances
- Test data: 20 polysemous English nouns (selected from the test data as used in the lexical substitution task), each with 50 test instances
Evaluation
The evaluation will be done using precision and recall. We will perform both a "best result" evaluation (scoring the first translation returned by a system) and a more relaxed evaluation of the "top ten" results (scoring the first ten translations returned by a system).
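As an illustration of this setup, here is a minimal sketch of how weighted scoring for the "best" and "top ten" modes might work; the vote-derived weights and the normalization below are assumptions for illustration, not the official scorer.

```python
# Hypothetical weighted scoring for the "best" / "top ten" modes;
# weights and normalization are assumptions, not the official scorer.

def score_instance(system_ranked, gold_weights, top_k=1):
    """system_ranked: translations ranked by the system;
    gold_weights: gold translations -> annotator-vote weights."""
    guesses = system_ranked[:top_k]
    if not guesses:
        return 0.0
    total = sum(gold_weights.values())
    credit = sum(gold_weights.get(g, 0.0) for g in guesses)
    # Dividing by the number of guesses discourages hedging over
    # many answers, a common convention in this style of scoring.
    return credit / (len(guesses) * total)

gold = {"oever": 3, "dijk": 1}                      # weights from top-3 votes
print(score_instance(["oever", "kant"], gold))      # best: 3/4 = 0.75
print(score_instance(["oever", "kant"], gold, 10))  # top ten: 3/(2*4) = 0.375
```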
Organizers: Els Lefever and Veronique Hoste (University College Ghent, Belgium) Web Site: http://webs.hogent.be/~elef464/lt3_SemEval.html
Timeline:
- Test data availability: 22 March - 25 March, 2010
- Result submission deadline: within 4 days after downloading the test data
#11 Event Detection in Chinese News Sentences
Description The goal of the task is to detect and analyze basic event contents in real-world Chinese news texts. It consists of finding the key verbs or verb phrases that describe these events in Chinese sentences after word segmentation and part-of-speech tagging, selecting suitable situation description formulas for them, and anchoring the different situation arguments to suitable syntactic chunks in the sentence. The three main sub-tasks are as follows:
- Target verb WSD: recognize whether there are key verbs or verb phrases describing the two focused event contents in the sentence, and select a suitable situation description formula for each recognized key verb (or verb phrase) from a situation network lexicon.
The input of this sub-task is a Chinese sentence annotated with correct word segmentation and POS tags. Its output is the sense selection or disambiguation tags of the target verbs in the sentence.
- Sentence SRL: anchor the different situation arguments to suitable syntactic chunks in the sentence, and annotate these arguments with suitable syntactic constituent and functional tags.
Its input is a Chinese sentence annotated with correct word segmentation, POS tags and the sense tags of the target verbs in the sentence. Its output is the syntactic chunk recognition and situation argument anchoring results.
- Event detection: detect and analyze the special event content through the interaction of target verb WSD and sentence SRL.
Its input is a Chinese sentence annotated with correct word segmentation and POS tags. Its output is a complete event description detected in the sentence (if it contains a focused target verb).
The following detailed example explains the above procedure. Consider this Chinese sentence after word segmentation and POS tagging:
今天/n(Today) 我/r(I) 在/p(at) 书店/n(bookstore) 买/v(buy) 了/u(-ed) 三/m(three) 本/q 新/a(new) 书/n(book) 。/w (Today, I bought three new books at the bookstore.)
After the first processing stage, target verb WSD, we find that there is a possession-transferring verb ‘买/v(buy)’ in the sentence and select the following situation description formula for it:
买/v(buy): DO(x, P(x,y)) CAUSE have(x,y) AND NOT have(z,y) [P=buy]
Then, we anchor four situation arguments with suitable syntactic chunks in the sentence and obtain the following sentence SRL result:
今天/n(Today) [S-np 我/r(I) ]x [D-pp 在/p(at) 书店/n(bookstore) ]z [P-vp 买/v(buy) 了/u(-ed) ]Tgt [O-np 三/m(three) 本/q 新/a(new) 书/n(book) ]y 。/w
Finally, we can get the following situation description for the sentence:
DO(x, P(x,y)) CAUSE have(x,y) AND NOT have(z,y) [x=我/r(I), y=三/m(three) 本/q 新/a(new) 书/n(book), z=书店/n(bookstore), P=买/v(buy)]
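For illustration, the resulting event description could be represented as a simple data structure like the following; the field names are hypothetical, not the task's official output format.

```python
# Hypothetical representation of the event description above;
# field names are illustrative, not the task's official format.
event = {
    "target": "买/v(buy)",
    "formula": "DO(x, P(x,y)) CAUSE have(x,y) AND NOT have(z,y)",
    "arguments": {
        "x": "我/r(I)",                               # agent (S-np chunk)
        "y": "三/m(three) 本/q 新/a(new) 书/n(book)",  # object (O-np chunk)
        "z": "书店/n(bookstore)",                      # source (D-pp chunk)
        "P": "买/v(buy)",                              # predicate
    },
}
```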
Organizers: Qiang Zhou (Tsinghua University, Beijing, China) Web Site: http://www.ncmmsc.org/SemEval-2010-Task/
#14 Word Sense Induction
Description
This task is a continuation of the WSI task (i.e. Task 2) of SemEval-2007 (nlp.cs.swarthmore.edu/semeval/tasks/task02/summary.shtml), with some significant changes to the evaluation setting.
Word Sense Induction (WSI) is defined as the process of identifying the different senses (or uses) of a target word in a given text in an automatic and fully unsupervised manner.
The goal of this task is to allow comparison of unsupervised sense induction and disambiguation systems. A secondary outcome of this task will be to provide a comparison with current supervised and knowledge-based methods for sense disambiguation.
The evaluation scheme consists of the following assessment methodologies:
- Unsupervised evaluation. The induced senses are evaluated as clusters of examples and compared to sets of examples that have been tagged with gold standard (GS) senses. The evaluation metric, V-measure (Rosenberg & Hirschberg, 2007), measures both the homogeneity and the completeness of a clustering solution. Perfect homogeneity is achieved if every cluster of a clustering solution contains only data points that are elements of a single GS class; perfect completeness is achieved if all the data points that are members of a given class are also elements of the same cluster. Homogeneity and completeness can be treated in a fashion similar to precision and recall, where increasing the former often decreases the latter (Rosenberg & Hirschberg, 2007). (See the first sketch after this list.)
- Supervised evaluation. The second evaluation setting assesses WSI systems in a WSD task. A mapping is created between the induced sense clusters (from the unsupervised evaluation described above) and the actual GS senses. The mapping matrix is then used to tag each instance in the test corpus with GS senses, and the usual recall/precision measures for WSD are applied. Supervised evaluation was also a part of the SemEval-2007 WSI task (Agirre & Soroa, 2007). (See the second sketch below.)
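The first sketch illustrates the homogeneity/completeness trade-off behind V-measure, using scikit-learn's off-the-shelf implementation rather than the task's official scorer; the toy labels are invented.

```python
# Toy illustration of V-measure (Rosenberg & Hirschberg, 2007) using
# scikit-learn; not the task's official evaluation script.
from sklearn.metrics import homogeneity_completeness_v_measure

gold    = [0, 0, 0, 1, 1, 1]  # gold standard sense per instance
induced = [0, 0, 1, 1, 2, 2]  # induced cluster id per instance

h, c, v = homogeneity_completeness_v_measure(gold, induced)
print(f"homogeneity={h:.2f} completeness={c:.2f} V-measure={v:.2f}")
```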
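The second sketch shows one common way such a cluster-to-sense mapping can be built, by assigning each induced cluster the gold sense it co-occurs with most often in a mapping split; this is an assumption in the spirit of the SemEval-2007 supervised evaluation, not the official procedure.

```python
# A minimal sketch of cluster-to-sense mapping: each induced cluster
# is mapped to its most frequent co-occurring gold sense. Details of
# the official mapping matrix may differ.
from collections import Counter, defaultdict

def map_clusters(pairs):
    """pairs: (cluster_id, gold_sense) tuples from the mapping split."""
    votes = defaultdict(Counter)
    for cluster, sense in pairs:
        votes[cluster][sense] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

mapping = map_clusters([("c1", "bank%1"), ("c1", "bank%1"), ("c2", "bank%2")])
# Test instances are then tagged with mapping[cluster] and scored with
# the usual WSD precision/recall.
print(mapping)  # {'c1': 'bank%1', 'c2': 'bank%2'}
```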
References
Andrew Rosenberg and Julia Hirschberg. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, June 2007. ACL.
Eneko Agirre and Aitor Soroa. SemEval-2007 Task 02: Evaluating Word Sense Induction and Discrimination Systems. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pp. 7-12, Prague, Czech Republic, June 2007. ACL.
Organizers: Suresh Manandhar (University of York), Ioannis Klapaftis (University of York), and Dmitriy Dligach, (University of Colorado) Web Site: http://www.cs.york.ac.uk/semeval2010_WSI/
#15 Infrequent Sense Identification for Mandarin Text to Speech Systems
Description There are seven cases of grapheme-to-phoneme (GTP) conversion in a text-to-speech (TTS) system (Yarowsky, 1997). Among them, the most difficult task is disambiguating homographs, words which have the same POS (part of speech) but different pronunciations. In this case, different pronunciations of the same word always correspond to different word senses. Once the word senses are disambiguated, the GTP problem is resolved.
This task differs slightly from traditional WSD (word sense disambiguation) in that two or more senses may correspond to one pronunciation; that is, the sense granularity is coarser than in WSD. For example, the preposition "为" has three senses: sense1 and sense2 share the pronunciation {wei 4}, while sense3 corresponds to {wei 2}. For each target word, not only the pronunciations but also the sense labels are provided for training; for testing, only the pronunciations are evaluated. The challenge of this task is the highly skewed distribution in real text: the most frequent pronunciation usually accounts for over 80% of instances.
We will provide a large volume of training data (at least 300 instances per homograph word) in accordance with the true distribution in real text. In the test data, we will provide at least 100 instances for each target word. In order to focus on performance in identifying the infrequent sense, we will intentionally split the test dataset half-and-half between infrequent and frequent pronunciation instances. Evaluation follows the usual precision and recall methodology.
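To make the rationale for the balanced test set concrete, here is a small hypothetical precision/recall computation for a single pronunciation; the labels and the majority-leaning system output are invented.

```python
# Hypothetical per-pronunciation precision/recall; shows how a
# majority-leaning system loses recall on the infrequent reading
# when the test set is balanced half-and-half.
def prf(gold, pred, label):
    tp = sum(g == p == label for g, p in zip(gold, pred))
    precision = tp / max(sum(p == label for p in pred), 1)
    recall = tp / max(sum(g == label for g in gold), 1)
    return precision, recall

gold = ["wei4"] * 5 + ["wei2"] * 5  # balanced test split
pred = ["wei4"] * 9 + ["wei2"]      # system biased to the frequent reading
print(prf(gold, pred, "wei2"))      # (1.0, 0.2): infrequent recall suffers
```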
All instances come from the People's Daily newspaper (the most popular newspaper in Mandarin). Annotation is carried out manually in a double-blind fashion, and a third annotator checks the annotations.
References:
Yarowsky, David (1997). "Homograph disambiguation in text-to-speech synthesis." In van Santen, Jan P. H.; Sproat, Richard; Olive, Joseph P.; and Hirschberg, Julia (eds.), Progress in Speech Synthesis. Springer-Verlag, New York, pp. 157-172.
Organizers: Peng Jin, Yunfang Wu and Shiwen Yu (Peking University Beijing, China) Web Site:
Timeline:
- Test data release: March 25, 2010
- Result submission deadline: March 29, 2010
- Organizers send the test results: April 2, 2010
#16 Japanese WSD
Description This task can be considered an extension of the SENSEVAL-2 Japanese lexical sample monolingual dictionary-based task. Word senses are defined according to the Iwanami Kokugo Jiten, a Japanese dictionary published by Iwanami Shoten. Please refer to that task for reference. We think that our task has the following two new characteristics:
1) All previous Japanese sense-tagged corpora were built from newspaper articles, whereas English sense-tagged corpora have been constructed on balanced corpora such as the Brown Corpus and the BNC. The first balanced corpus of contemporary written Japanese (the BCCWJ corpus) is now being constructed as part of a national project in Japan [Maekawa, 2008], and we are now constructing a sense-tagged corpus on top of it. The task will therefore use the first balanced Japanese sense-tagged corpus.
2) In previous WSD tasks, systems have been required to select a sense for a word in context (an instance) from a given set of senses in a dictionary. However, the set of senses in the dictionary is not always complete. New word senses sometimes appear after the dictionary has been compiled, so some instances might have a sense that cannot be found in the dictionary's set. The task will take into account not only instances having a sense in the given set but also instances having a sense that cannot be found in the set. In the latter case, systems should output that the instance has a sense that is not in the set (one possible approach is sketched below).
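As one possible, purely illustrative way to meet the new-sense requirement, a system could fall back to a special label whenever its classifier confidence over the dictionary senses is low; the threshold and label below are assumptions, not part of the task definition.

```python
# Illustrative fallback for senses missing from the dictionary: emit
# a special NEW_SENSE label when no dictionary sense is confident
# enough. Threshold and label are assumptions.
NEW_SENSE = "new_sense"

def predict_sense(probs, threshold=0.5):
    """probs: dictionary sense id -> classifier probability."""
    sense, p = max(probs.items(), key=lambda kv: kv[1])
    return sense if p >= threshold else NEW_SENSE

print(predict_sense({"1a": 0.42, "1b": 0.38, "2": 0.20}))  # "new_sense"
```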
Organizers: Manabu Okumura (Tokyo Institute of Technology), Kiyoaki Shirai (Japan Advanced Institute of Science and Technology) Web Site: http://lr-www.pi.titech.ac.jp/wsd.html
#17 All-words Word Sense Disambiguation on a Specific Domain (WSD-domain)
Description Domain adaptation is a hot issue in Natural Language Processing, including Word Sense Disambiguation. WSD systems trained on general corpora are known to perform worse when moved to specific domains. The WSD-domain task will offer a testbed for domain-specific WSD systems and will allow testing of domain-portability issues.
Texts from the ECNC and WWF will be used to build domain-specific test corpora (see the example below). The data will be available in a number of languages: English, Dutch and Italian, and possibly Basque and Chinese (confirmation pending). The sense inventories will be based on the wordnets of the respective languages.
The test data will comprise three documents (a 6,000-word chunk with approximately 2,000 target words) for each language. The test data will be annotated by hand using double-blind annotation plus adjudication, and inter-tagger agreement will be measured. No training data will be provided, but participants are free to use existing hand-tagged corpora and lexical resources. Traditional precision and recall measures will be used to evaluate the participating systems, as implemented in past Senseval and SemEval WSD tasks.
WSD-domain is being developed in the framework of the Kyoto project (http://www.kyoto-project.eu/).
Environment domain text example:
"Projections for 2100 suggest that temperature in Europe will have risen by between 2 to 6.3 °C above 1990 levels. The sea level is projected to rise, and a greater frequency and intensity of extreme weather events are expected. Even if emissions of greenhouse gases stop today, these changes would continue for many decades and in the case of sea level for centuries. This is due to the historical build up of the gases in the atmosphere and time lags in the response of climatic and oceanic systems to changes in the atmospheric concentration of the gases."
Organizers: Eneko Agirre and Oier Lopez de Lacalle (Basque Country University) Web Site: http://xmlgroup.iit.cnr.it/SemEval2010/
Timeline:
- Test data release: March 26
- Closing competition: April 2
#18 Disambiguating Sentiment Ambiguous Adjectives
Description Some adjectives are neutral in sentiment polarity out of context, but show positive, neutral or negative meaning within a specific context. Such words can be called dynamic sentiment ambiguous adjectives. For instance, "价格高|the price is high" indicates a negative meaning, while "质量高|the quality is high" has a positive connotation. Disambiguating sentiment ambiguous adjectives is an interesting task at the intersection of word sense disambiguation and sentiment analysis. However, in previous work sentiment ambiguous words have not been tackled in the field of WSD, and they have mostly been crudely discarded in research on sentiment analysis.
This task aims to create a benchmark dataset for disambiguating dynamic sentiment ambiguous adjectives. Sentiment ambiguous words are pervasive in many languages. In this task we concentrate on Chinese, but we believe the disambiguation techniques should be language-independent.
In total, 14 dynamic sentiment ambiguous adjectives are selected, all of which are high-frequency words in Mandarin Chinese: 大|big, 小|small, 多|many, 少|few, 高|high, 低|low, 厚|thick, 薄|thin, 深|deep, 浅|shallow, 重|heavy, 轻|light, 巨大|huge, 重大|grave.
The dataset contains two parts. Some sentences containing these target adjectives will be extracted from the Chinese Gigaword corpus (LDC catalog number LDC2005T14), and the other sentences will be gathered through search engines such as Google. First, these sentences will be automatically segmented and POS-tagged; then the ambiguous adjectives will be manually annotated with the correct sentiment polarity within the sentence context. Two human annotators will annotate the sentences double-blindly, and a third annotator will check the annotation.
This task will be carried out in an unsupervised setting, and consequently no training data will be provided. All the data, about 4,000 sentences, will be provided as the test set. Evaluation will be performed in terms of the usual precision, recall and F1 scores.
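Purely for illustration, a trivial unsupervised heuristic for such adjectives might decide polarity from the noun being modified; the tiny lexicons below are invented examples, not task resources or a proposed baseline.

```python
# Toy unsupervised heuristic: "high/big/many"-type adjectives inherit
# the desirability of the modified noun; "low/small/few"-type invert
# it. Lexicons here are invented for illustration.
DESIRABLE = {"质量": True, "价格": False}  # quality: yes, price: no
SCALE_UP = {"大", "多", "高", "厚", "深", "重", "巨大", "重大"}

def polarity(noun, adjective):
    desirable = DESIRABLE.get(noun)
    if desirable is None:
        return "neutral"  # unknown noun: no decision
    up = adjective in SCALE_UP
    return "positive" if desirable == up else "negative"

print(polarity("价格", "高"))  # negative: the price is high
print(polarity("质量", "高"))  # positive: the quality is high
```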
Organizers: Yunfang Wu, Peng Jin, Miaomiao Wen and Shiwen Yu (Peking University, Beijing, China) Web Site:
Timeline:
- Test data release: March 23, 2010
- Result submission deadline: postponed to March 27, 2010 (4 days after downloading the test data)
- Organizers send the test results: April 2, 2010