Evaluation Report

 

This report is for SemEval-2010 Task #11.

 

Task Name: Event detection in Chinese news sentences

Evaluation measures:

For the WSD subtask, we provide two evaluation measures for the target verbs in the sentences: WSD-Micro-Accuracy and WSD-Macro-Accuracy. The formulas are as follows:

- WSD-Micro-Accuracy = Number of correctly-analyzed target verbs / Number of all target verbs * 100%

- WSD-Macro-Accuracy = Σ Micro-Accuracy_i * w_i, where w_i = frequency of target verb i in the test set / total target verb frequency in the test set

A result is counted as correct when the selected situation description formula and natural explanation text of the target verb are the same as those in the gold standard.

We evaluated 27 multiple-sense target verbs in the test set.
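The two accuracy measures above can be sketched as follows; the per-verb counts are hypothetical, and the weight w_i follows the definition given above.

```python
# Hypothetical per-verb counts: verb -> (correctly-analyzed occurrences, total occurrences in test set)
counts = {
    "verb_a": (45, 50),
    "verb_b": (18, 30),
    "verb_c": (16, 20),
}

def wsd_micro_accuracy(counts):
    # Pool all target-verb instances and score them together
    correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return correct / total * 100

def wsd_macro_accuracy(counts):
    # Per-verb accuracy weighted by w_i = verb frequency / total target verb frequency
    total = sum(n for _, n in counts.values())
    return sum((c / n) * (n / total) for c, n in counts.values()) * 100
```

Note that with frequency-proportional weights the weighted sum arithmetically coincides with the micro value; an unweighted mean over the 27 verbs, by contrast, would treat rare and frequent verbs equally.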

 

For the SRL subtask, we provide three evaluation measures: Chunk-Precision, Chunk-Recall, and Chunk-F-measure. The formulas are as follows:

- Chunk-Precision = Number of correctly-analyzed chunks / Number of all recognized chunks * 100%

- Chunk-Recall = Number of correctly-analyzed chunks / Number of gold-standard chunks * 100%

- Chunk-F-measure = (2 * Chunk-P * Chunk-R) / (Chunk-P + Chunk-R)

A recognized chunk is counted as correct when it matches all the following conditions:

- It has the same boundaries as a gold-standard argument chunk of the key verb or verb phrase.

- It has the same syntactic constituent and functional tags as the gold-standard one.

- It has the same situation argument tag as the gold-standard one.

We only select the key argument chunks (with semantic tags x, y, z, L, or O) for evaluation.
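Since a chunk is correct only when its boundaries and all its tags match a gold chunk exactly, the three chunk measures can be computed by exact set intersection. A minimal sketch, with hypothetical chunk tuples:

```python
# A chunk here is (start, end, constituent_tag, functional_tag, situation_arg_tag);
# all of these values are hypothetical.
gold = {(0, 2, "NP", "SBJ", "x"), (3, 5, "NP", "OBJ", "y"), (6, 8, "PP", "LOC", "L")}
predicted = {(0, 2, "NP", "SBJ", "x"), (3, 5, "NP", "OBJ", "z"), (6, 8, "PP", "LOC", "L")}

def chunk_prf(predicted, gold):
    # Exact match on boundaries and all three tags counts as correct
    correct = len(predicted & gold)
    p = correct / len(predicted) * 100
    r = correct / len(gold) * 100
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = chunk_prf(predicted, gold)
```

Here the second predicted chunk carries the wrong situation argument tag, so only two of three chunks count as correct.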

 

For the event detection task, we provide two evaluation measures: Event-Micro-Accuracy and Event-Macro-Accuracy. The formulas are as follows:

- Event-Micro-Accuracy = Number of correctly-analyzed events for a target verb / Number of all events for that target verb in the test set * 100%

- Event-Macro-Accuracy = Σ Micro-Accuracy_i * w_i, where w_i = frequency of target verb i in the test set / total target verb frequency in the test set

An event is counted as correct when it matches all the following conditions:

- The event situation description formula and natural explanation text of the target verb are the same as the gold-standard ones.

- All the argument chunks of the event description are the same as the gold-standard ones.

- The number of recognized argument chunks is the same as in the gold standard.
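The three event-matching conditions above can be checked with a sketch like the following; the `Event` fields are assumptions about how a system's output might be represented, not the task's actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    # Hypothetical representation of one event description
    formula: str                      # situation description formula
    explanation: str                  # natural explanation text
    chunks: frozenset = frozenset()   # argument chunks of the event description

def event_correct(predicted, gold):
    # All three conditions: same formula and explanation text,
    # same argument chunks, and the same number of chunks
    return (predicted.formula == gold.formula
            and predicted.explanation == gold.explanation
            and predicted.chunks == gold.chunks
            and len(predicted.chunks) == len(gold.chunks))
```

The chunk-count condition is implied by chunk-set equality here, but it is kept as a separate check to mirror the three conditions as stated.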

 

We received 7 uploaded results for evaluation. The evaluation results are listed in the following table, ranked by Event-Macro-Accuracy.

 

System ID | WSD-Micro-A | WSD-Macro-A | Chunk-P | Chunk-R | Chunk-F | Event-Micro-A | Event-Macro-A | Rank
480a      | 89.59       | 87.54       | 80.91   | 77.91   | 79.38   | 53.76         | 52.12         | 1
480b      | 89.18       | 87.24       | 80.91   | 76.95   | 78.88   | 52.05         | 50.59         | 2
109       | 70.64       | 73.00       | 63.50   | 57.39   | 60.29   | 23.05         | 22.85         | 3
347       | 83.81       | 81.30       | 58.33   | 53.32   | 55.71   | 20.19         | 20.33         | 4
348       | 82.18       | 79.23       | 58.33   | 53.32   | 55.71   | 20.23         | 20.05         | 5
350       | 81.42       | 77.74       | 58.33   | 53.32   | 55.71   | 20.22         | 20.05         | 6
349       | 82.58       | 79.82       | 58.33   | 53.32   | 55.71   | 20.14         | 20.05         | 7