Evaluation Exercises on Semantic Evaluation - ACL SigLex event
#5 Automatic Keyphrase Extraction from Scientific Articles
Description Keyphrases are words or short phrases that capture the main topics of a document. As keyphrases represent the key ideas of a document, extracting good keyphrases benefits various natural language processing (NLP) applications, such as summarization, information retrieval (IR) and question answering (QA). In summarization, keyphrases can be used as semantic metadata. In search engines, keyphrases can supplement full-text indexing and assist users in formulating good queries. The quality of keyphrases therefore has a direct impact on the quality of downstream NLP applications.
Recently, several systems and techniques have been proposed for extracting keyphrases. We therefore propose a shared task that provides an opportunity to compare and benchmark such technologies.
In the shared task, participants will be provided with a set of scientific articles and asked to produce keyphrases for each article.
The organizers will provide trial, training and test data. The average length of the articles is between 6 and 8 pages, including tables and figures. We will provide two sets of answers: author-assigned keyphrases and reader-assigned keyphrases. All reader-assigned keyphrases are extracted from the papers, whereas some author-assigned keyphrases may not occur in the text.
The answer set contains lemmatized keyphrases. We also accept two alternations of a keyphrase: A of B -> B A (e.g. policy of school = school policy) and A's B -> A B (e.g. school's policy = school policy). However, if an alternation changes the semantics, we do not include it in the answer set.
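The two accepted alternations amount to a simple string normalization, which can be sketched as follows (a minimal illustration under our own assumptions, not the organizers' actual matching code; the lemmatization step is omitted):

```python
import re

def normalize_keyphrase(phrase: str) -> str:
    """Apply the two accepted alternations to a (lemmatized) keyphrase:
    'A of B' -> 'B A' and "A's B" -> 'A B'."""
    phrase = phrase.lower().strip()
    # "A's B" -> "A B"  (e.g. "school's policy" -> "school policy")
    phrase = re.sub(r"'s\b", "", phrase)
    # "A of B" -> "B A"  (e.g. "policy of school" -> "school policy")
    m = re.match(r"^(.+?) of (.+)$", phrase)
    if m:
        phrase = f"{m.group(2)} {m.group(1)}"
    return phrase
```

Both example alternations from the text then map to the same canonical form, "school policy".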
In this shared task, we follow the traditional evaluation metric: we match the keyphrases in the answer sets (i.e. author-assigned and reader-assigned keyphrases) against those the participants provide and calculate precision, recall and F-score. Finally, we rank the participants by F-score.
Test and training data release : Feb. 15th (Monday)
Closing competition : March 19th (5 weeks for competition) (Friday)
Results out : by March 31st
Submission of description papers: April 17, 2010
Notification of acceptance: May 6, 2010
Workshop: July 15-16, 2010 ACL Uppsala
#6 Classification of Semantic Relations between MeSH Entities in Swedish Medical Texts
There is a growing interest in, and consequently a growing volume of publications on, relation classification in the medical domain. Algorithms for classifying semantic relations have potential uses in many language technology applications, and interest has been renewed in recent years. If such semantic relations can be determined, systems and applications such as information retrieval and extraction, summarization and question answering can potentially obtain more accurate results, particularly since searching for mere co-occurrence of terms is unfocused and by no means guarantees that there is a relation between the identified terms of interest. For instance, knowing the relationship between a medication and a disease or symptom should be useful for searching free text and more easily obtaining answers to questions such as "What is the effect of treatment with substance X on disease Y?"
Our task "Classification of Semantic Relations between MeSH Entities in Swedish Medical Texts" deals with the classification of semantic relations between pairs of MeSH entities/annotations. We focus on three entity types: DISEASES/SYMPTOMS (category C in the MeSH hierarchy), CHEMICALS AND DRUGS, and ANALYTICAL, DIAGNOSTIC AND THERAPEUTIC TECHNIQUES AND EQUIPMENT (categories D and E in the MeSH hierarchy). The evaluation task is similar to SemEval-1 Task #4 by Girju et al.: Classification of Semantic Relations between Nominals. The evaluation methodology will therefore reuse the evaluation criteria already developed for that task.
The datasets for the task will consist of annotated sentences with relevant MeSH entities, including the surrounding context for the investigated entities and their relation within a window of one to two preceding and one to two following sentences. We plan to have about nine semantic relations, with approx. 100-200 training sentences and 50-100 test sentences per relation.
#8 Multi-Way Classification of Semantic Relations Between Pairs of Nominals
Description Recently, the NLP community has shown a renewed interest in deeper semantic analyses, among them automatic recognition of semantic relations between pairs of words. This is an important task with many potential applications including but not limited to Information Retrieval, Information Extraction, Text Summarization, Machine Translation, Question Answering, Paraphrasing, Recognizing Textual Entailment, Thesaurus Construction, Semantic Network Construction, Word Sense Disambiguation, and Language Modelling.
Despite the interest, progress has been slow due to incompatible classification schemes, which have made direct comparisons hard. In addition, most datasets provided no context for the target relation, thus relying on the assumption that semantic relations are largely context-independent, which is often false. A notable exception is SemEval-2007 Task 4 (Girju et al., 2007), which for the first time provided a standard benchmark dataset for seven semantic relations in context. However, that dataset treated each relation separately, asking for positive vs. negative classification decisions. While some subsequent publications tried to use the dataset in a multi-way setup, it was not designed to be used in that manner.
We believe that having a freely available standard benchmark dataset for *multi-way* semantic relation classification *in context* is much needed for the overall advancement of the field. That is why we pose as our primary objective the task of preparing and releasing such a dataset to the research community.
We will use nine mutually exclusive relations from Nastase & Szpakowicz (2003). The dataset for the task will consist of annotated sentences gathered from the Web and manually marked, with the nominals and relations indicated. We will provide 1000 examples for each relation, a sizeable increase over SemEval-2007 Task 4, which had about 210 examples for each of its seven relations. There will also be a NONE relation, for which we will likewise provide 1000 examples.
Using that dataset, we will set up a common evaluation task that will enable researchers to compare their algorithms. The official evaluation score will be average F1 over all relations, but we will also check whether some relations are more difficult to classify than others, and whether some algorithms are best suited for certain types of relations. Trial data and an automatic scorer will be made available well in advance (by June 2009). All data will be released under a Creative Commons license.
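The official score, average F1 over all relations, could be sketched as follows (a simplified stand-in for the actual scorer; the treatment of the NONE class and all function names are our own assumptions):

```python
from collections import defaultdict

def macro_f1(gold_labels, pred_labels, relations):
    """Average per-relation F1 over the given list of relations
    (NONE can simply be left out of `relations` if it is excluded
    from the average)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold_labels, pred_labels):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted p, but gold says otherwise
            fn[g] += 1   # gold g was missed
    f1s = []
    for rel in relations:
        prec = tp[rel] / (tp[rel] + fp[rel]) if tp[rel] + fp[rel] else 0.0
        rec = tp[rel] / (tp[rel] + fn[rel]) if tp[rel] + fn[rel] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Per-relation F1 values computed this way also support the planned analysis of which relations are harder to classify.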
Result submission deadline: within seven days after downloading the *test* data, but not later than April 2
Organizers send test results: April 10, 2010
#10 Linking Events and their Participants in Discourse
Semantic role labelling (SRL) has traditionally been viewed as a
sentence-internal problem. However, it is clear that there is an
interplay between local semantic argument structure and the
surrounding discourse. In this shared task, we would like to take SRL
of nominal and verbal predicates beyond the domain of isolated
sentences by linking local semantic argument structures to the wider
discourse context. In particular, we aim to find fillers for roles
which are left unfilled in the local context (null instantiations,
NIs). An example is given below, where the "charges" role ("arg2" in
PropBank) of "cleared" is left empty but can be linked to
"murder" in the previous sentence.
In a lengthy court case the defendant was tried for murder. In the
end, he was cleared.
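The example above could be represented roughly as follows (all field names, and any frame/sense and role labels beyond those mentioned in the text, are illustrative assumptions, not the task's actual data format):

```python
# Hypothetical representation of a null-instantiation (NI) link:
# the "charges" role ("arg2" in PropBank) of "cleared" has no local
# filler, so it is linked to "murder" in the preceding sentence.
annotation = {
    "predicate": "cleared",      # target predicate in the second sentence
    "sense": "clear.01",         # illustrative sense/frame label
    "roles": {
        "arg1": "he",            # locally filled role (illustrative label)
        "arg2": None,            # the charges: null instantiation
    },
    "ni_links": {
        # role left unfilled locally -> antecedent in the wider discourse
        "arg2": {"antecedent": "murder", "sentence_offset": -1},
    },
}
```

In the Full Task, systems would produce the role fillers as well as the NI links; in the NIs-only task, only the `ni_links` part would need to be found.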
There will be two tasks, which will be evaluated independently
(participants can choose to enter either or both):
For the Full Task the target predicates in the (test) data set
will be annotated with gold standard word senses (frames). The participants have to:
- find the semantic arguments of the predicate (role recognition)
- label them with the correct role (role labelling)
- find links between null instantiations and the wider context
For the NIs only task, participants will be supplied with a
test set which is already annotated with gold standard local semantic
argument structure; only the referents for null
instantiations have to be found.
We will prepare new training and test data consisting of running text from the
fiction domain. The data sets will be freely available.
The training set for both tasks will be annotated with gold
standard semantic argument structure (see, for example, the FrameNet
full-text annotation) and with linking information for null
instantiations. We aim to annotate the semantic argument structures
in both FrameNet and PropBank style; participants can choose which
one they prefer.