Evaluation Exercises on Semantic Evaluation - ACL SigLex event
#5 Automatic Keyphrase Extraction from Scientific Articles
Description
Keyphrases are words that capture the main topic of a document. Because keyphrases represent the key ideas of a document, extracting good keyphrases benefits various natural language processing (NLP) applications, such as summarization, information retrieval (IR) and question answering (QA). In summarization, keyphrases can be used as semantic metadata. In search engines, keyphrases can supplement full-text indexing and help users formulate good queries. The quality of keyphrases therefore has a direct impact on the quality of downstream NLP applications.
Recently, several systems and techniques have been proposed to extract keyphrases. We therefore propose a shared task that gives participants the chance to compete and to benchmark such technologies.
In the shared task, participants will be provided with a set of scientific articles and will be asked to produce the keyphrases for each article.
The organizers will provide trial, training and test data. The average length of the articles is between 6 and 8 pages, including tables and pictures. We will provide two sets of answers: author-assigned keyphrases and reader-assigned keyphrases. All reader-assigned keyphrases will be extracted from the papers, whereas some of the author-assigned keyphrases may not occur in the content.
The answer set contains lemmatized keyphrases. We also accept two alternations of a keyphrase: A of B -> B A (e.g. policy of school = school policy) and A's B -> A B (e.g. school's policy = school policy). However, where an alternation changes the semantics of the keyphrase, we do not include it in the answer set.
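The two accepted alternations can be illustrated with a minimal sketch. The function name `normalize_keyphrase` and the lowercasing and whitespace handling are assumptions made for illustration; this is not the organizers' tooling, and it performs no lemmatization. It also cannot detect the cases where an alternation changes the meaning, so those would still have to be excluded by hand.

```python
import re


def normalize_keyphrase(phrase: str) -> str:
    """Map 'A of B' and "A's B" variants onto one canonical form.

    Hypothetical helper: only the two alternations named in the task
    description are handled, and only the first ' of ' is considered.
    """
    # Lowercase and collapse repeated whitespace (an assumption, not
    # something the task description specifies).
    phrase = " ".join(phrase.lower().split())

    # "A of B" -> "B A", e.g. "policy of school" -> "school policy"
    match = re.fullmatch(r"(.+?) of (.+)", phrase)
    if match:
        return f"{match.group(2)} {match.group(1)}"

    # "A's B" -> "A B", e.g. "school's policy" -> "school policy"
    return re.sub(r"'s\b", "", phrase)
```

With this sketch, both "policy of school" and "school's policy" map to "school policy", so either form would match the same answer-set entry.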
In this shared task, we follow the traditional evaluation metric. That is, we match the keyphrases in the answer sets (i.e. author-assigned keyphrases and reader-assigned keyphrases) against those the participants provide, and calculate precision, recall and F-score. Finally, we will rank the participants by F-score.
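As a rough sketch of this kind of scoring, the function below computes precision, recall and F-score by exact string match for a single article. The name `score_keyphrases` is hypothetical. The sketch assumes the balanced F-score (harmonic mean of precision and recall) and assumes both lists are already lemmatized and normalized; how scores are aggregated across articles is not stated above and is left out.

```python
def score_keyphrases(predicted, gold):
    """Return (precision, recall, f_score) using exact matching.

    Assumptions: duplicates are ignored, and the F-score is the
    balanced harmonic mean of precision and recall.
    """
    predicted_set = set(predicted)
    gold_set = set(gold)
    matched = len(predicted_set & gold_set)

    precision = matched / len(predicted_set) if predicted_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Made-up example: 2 of 3 predicted keyphrases appear among 4 gold ones.
p, r, f = score_keyphrases(
    ["school policy", "keyphrase extraction", "neural network"],
    ["school policy", "keyphrase extraction",
     "information retrieval", "summarization"],
)
```

In this made-up example, precision is 2/3, recall is 1/2, and the F-score is 4/7.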