#3 Cross-Lingual Word Sense Disambiguation
Description
There is a general feeling in the WSD community that WSD should not be treated as an isolated research task, but should be integrated into real NLP applications such as machine translation or multilingual information retrieval. Using translations from a corpus instead of human-defined sense labels (e.g. from WordNet) makes it easier to integrate WSD into multilingual applications, sidesteps the sense-granularity problem (which may itself be task-dependent), is language-independent, and offers a valid alternative for languages that lack sufficient sense inventories and sense-tagged corpora.
We propose an Unsupervised Word Sense Disambiguation task for English nouns by
means of parallel corpora. The sense label is composed of translations in the
different languages and the sense inventory is built up by three annotators on
the basis of the Europarl parallel corpus by
means of a concordance tool. All translations (above a predefined frequency
threshold) of a polysemous word are grouped into clusters/"senses" of that given
word.
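The frequency-threshold step above can be sketched as follows. This is a minimal illustration, not the organizers' actual tooling: the function name, input format, and threshold value are assumptions, and the final grouping of the surviving translations into sense clusters is done manually by the annotators, not by this code.

```python
from collections import Counter

def filter_translations(aligned_translations, min_freq=2):
    """Keep, per target language, only those translations of a
    polysemous word that occur at least `min_freq` times in the
    word-aligned parallel data (illustrative names and threshold)."""
    surviving = {}
    for lang, translations in aligned_translations.items():
        counts = Counter(t.lower() for t in translations)
        surviving[lang] = sorted(
            t for t, freq in counts.items() if freq >= min_freq
        )
    return surviving

# Toy example for English "bank" aligned to Dutch in Europarl-like data;
# the one-off alignment "bank" falls below the threshold and is dropped.
obs = {"Dutch": ["oever", "dijk", "oever", "bank", "oever", "dijk"]}
print(filter_translations(obs))  # {'Dutch': ['dijk', 'oever']}
```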
Languages: English - Dutch, French, German, Italian, Spanish
Subtasks:
1. Bilingual Evaluation (English - Language X)
Example:
[English] ... equivalent to giving fish to people living on the [bank] of the river ...
Sense Label = {oever/dijk} [Dutch]
Sense Label = {rives/rivage/bord/bords} [French]
Sense Label = {Ufer} [German]
Sense Label = {riva} [Italian]
Sense Label = {orilla} [Spanish]
2. Multilingual Evaluation (English - all target languages)
Example:
... living on the [bank] of the river ...
Sense Label = {oever/dijk, rives/rivage/bord/bords, Ufer, riva, orilla}
Resources
As the task is formulated as an unsupervised WSD task, we will not annotate any training material.
Participants can use the freely available Europarl corpus, which will also be used to build the sense inventory.
For the test data, native speakers will decide on the correct translation cluster(s) for each test sentence and pick their top-3 translations from the predefined list of Europarl translations. These top-3 picks are used to assign weights to the translations in the answer clusters for that test sentence.
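One straightforward way to turn the annotators' top-3 picks into weights is to count votes per translation and normalize. This is an illustrative scheme only; the task's exact weighting formula is not specified here, and all names below are assumptions.

```python
from collections import Counter

def translation_weights(annotator_top3):
    """Derive per-translation weights for one test sentence from the
    annotators' top-3 translation picks: each translation's weight is
    its vote count divided by the total number of votes (an assumed
    normalization, not necessarily the official one)."""
    votes = Counter(t for top3 in annotator_top3 for t in top3)
    total = sum(votes.values())
    return {t: n / total for t, n in votes.items()}

# Three hypothetical annotators ranking Dutch translations of "bank":
picks = [["oever", "dijk", "oeverwal"],
         ["oever", "dijk", "kant"],
         ["oever", "kant", "dijk"]]
weights = translation_weights(picks)
print(weights["oever"])  # 3 votes out of 9, i.e. ~0.333
```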
Participants will receive manually annotated development and test data:
- Development/sample data: 5 polysemous English nouns, each with 20 example instances
- Test data: 20 polysemous English nouns (selected from the test data used in the lexical substitution task), each with 50 test instances
Evaluation
The evaluation will be done using precision and recall. We will perform both a "best result" evaluation (scoring the first translation returned by a system) and a more relaxed "top ten" evaluation (scoring the first ten translations returned by a system).
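A simplified sketch of how such weighted precision and recall could be computed for the "best result" setting is shown below. It assumes gold weights per translation (as produced by the annotators' top-3 picks) and follows a lexical-substitution-style scoring scheme; the official formula may differ, and all function names are illustrative.

```python
def best_score(system_guesses, gold_weights):
    """Score one test item for the "best" evaluation: the gold weight
    mass covered by the system's guesses, divided by the number of
    guesses so that hedging with many answers is penalized (an assumed
    reading of the scoring, not the official definition)."""
    if not system_guesses:
        return 0.0
    covered = sum(gold_weights.get(g, 0.0) for g in system_guesses)
    return covered / len(system_guesses)

def precision_recall(item_scores, n_attempted, n_total):
    """Precision averages item scores over the items the system
    attempted; recall averages them over all test items."""
    total = sum(item_scores)
    precision = total / n_attempted if n_attempted else 0.0
    recall = total / n_total if n_total else 0.0
    return precision, recall

# Hypothetical gold weights for one instance of "bank", and a system
# that answers only 1 of 2 test items:
gold = {"oever": 0.6, "dijk": 0.3, "kant": 0.1}
scores = [best_score(["oever"], gold)]
print(precision_recall(scores, n_attempted=1, n_total=2))  # (0.6, 0.3)
```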
Organizers: Els Lefever and Veronique Hoste (University College Ghent, Belgium)
Web site: http://webs.hogent.be/~elef464/lt3_SemEval.html
Timeline:
- Test data availability: 22-25 March 2010
- Result submission deadline: within 4 days of downloading the *test* data.