#9 Noun Compound Interpretation Using Paraphrasing Verbs


Noun compounds -- sequences of nouns acting as a single noun, e.g. colon cancer -- are abundant in English. Understanding their syntax and semantics is challenging but important for many NLP applications, including but not limited to Question Answering, Machine Translation, Information Retrieval and Information Extraction. For example, a question-answering system might need to determine whether protein acting as a tumor suppressor is a good paraphrase for tumor suppressor protein, and an information extraction system might need to decide whether neck vein thrombosis and neck thrombosis could possibly co-refer when used in the same document. Similarly, a machine translation system facing the unknown noun compound WTO Geneva headquarters might benefit from being able to paraphrase it as Geneva headquarters of the WTO or as WTO headquarters located in Geneva. Given a query like "migraine treatment", an information retrieval system could use paraphrasing verbs like relieve and prevent for page ranking and query refinement.

We will explore the idea of using paraphrasing verbs and prepositions for noun compound interpretation. For example, nut bread can be paraphrased using verbs like contain and include, prepositions like with, and verb+preposition combinations like be made from. Unlike traditional abstract relations such as CAUSE, CONTAINER, and LOCATION, verbs and prepositions are directly usable as paraphrases, and using several of them simultaneously yields an appealing fine-grained semantic representation.

As trial/development data, we will release paraphrasing verbs and prepositions for 250 compounds, each manually proposed by 25-30 human subjects. For example, for nut bread we have the following paraphrases (the number of subjects who proposed each paraphrase is in parentheses):

contain(21); include(10); be made with(9); have(8); be made from(5); use(3); be made using(3); feature(2); be filled with(2); taste like(2); be made of(2); come from(2); consist of(2); hold(1); be composed of(1); be blended with(1); be created out of(1); encapsulate(1); diffuse(1); be created with(1); be flavored with(1), ...
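The vote-annotated list above has a regular paraphrase(count) shape. The following Python sketch, assuming entries are separated by semicolons exactly as printed here (the actual release format may differ), parses such a line and orders the paraphrases by how many subjects proposed them, which yields the human ranking that systems are asked to approximate. The names ENTRY, parse_paraphrases and rank_by_votes are hypothetical, chosen for illustration only.

```python
import re

# Assumed format: "paraphrase(count); paraphrase(count); ..." as printed above.
# A paraphrase may contain spaces but no ';', ',' or parentheses; trailing text
# such as ", ..." that is not followed by "(count)" is ignored.
ENTRY = re.compile(r"([^;,()]+)\((\d+)\)")


def parse_paraphrases(line: str) -> list[tuple[str, int]]:
    """Return (paraphrase, vote count) pairs in the order they appear in `line`."""
    return [(m.group(1).strip(), int(m.group(2))) for m in ENTRY.finditer(line)]


def rank_by_votes(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sort by vote count, highest first; Python's sort is stable, so ties keep input order."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    nut_bread = "include(10); contain(21); have(8); be made with(9); be made from(5)"
    for paraphrase, votes in rank_by_votes(parse_paraphrases(nut_bread)):
        print(f"{paraphrase}: {votes}")
```

Keeping ties in their original order is a deliberate simplification; a scorer could equally treat tied paraphrases as sharing a rank.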

Given a compound and a set of paraphrasing verbs and prepositions, the participants must provide a ranking that is as close as possible to the one proposed by human raters. Trial data and an automatic scorer will be made available well in advance (by June 2009). All data will be released under a Creative Commons license.
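The description does not say how closeness to the human ranking will be measured. One plausible measure, assumed here and not necessarily what the official scorer computes, is Spearman's rank correlation between the human vote counts and a system's scores, with tied values given their average rank. The following Python sketch implements that assumption; average_ranks, pearson and spearman are hypothetical helper names.

```python
import math


def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores from highest (rank 1) to lowest; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next item has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return cov / math.sqrt(var_x * var_y)


def spearman(human_votes: dict[str, int], system_scores: dict[str, float]) -> float:
    """Tie-corrected Spearman's rho: Pearson correlation of the average ranks.

    Paraphrases the system did not score get -inf, so they are ranked last.
    """
    paraphrases = list(human_votes)
    human = average_ranks([float(human_votes[p]) for p in paraphrases])
    system = average_ranks([system_scores.get(p, float("-inf")) for p in paraphrases])
    return pearson(human, system)
```

For example, with human votes contain=21, include=10, be made with=9 and system scores contain=0.9, include=0.4, be made with=0.7, the system swaps the last two paraphrases and spearman returns 0.5; a perfect ordering would give 1.0.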

Organizers: Ioanacristina Butnariu, Su Nam Kim, Preslav Nakov, Diarmuid Ó Séaghdha, Stan Szpakowicz, Tony Veale. Contact: Preslav Nakov
Web Site: http://docs.google.com/View?docid=dfvxd49s_35hkprbcpt

  • Trial data release: August 30, 2009
  • Training data release: February 17, 2010
  • Test data release: March 18, 2010
  • Result submission deadline: within seven days after downloading the *test* data, but no later than April 2, 2010
  • Organizers send test results: April 10, 2010

