Task #2: Cross-lingual Lexical Substitution

The tables report two metrics, best and oot (out-of-ten), and systems are ranked by recall. Each table gives recall (R) and precision (P) together with their mode variants (Mode R, Mode P). The metrics and the mode variants are described in our documentation. The oot tables contain an additional column, dups, showing the number of duplicates used by each participant.
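The metrics are defined in the task documentation, not here. As a rough illustration, the sketch below assumes that best and oot follow the scoring of the SemEval-2007 English lexical substitution task. Under that scoring, each test item has a gold multiset of annotator substitutes. best credits the gold frequency of each answer, averaged over the number of answers given. oot credits up to ten answers without that averaging. Precision divides the total credit by the items a system attempted, and recall divides it by all test items. The function names and the toy gold data are hypothetical, and the mode variants are not sketched.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: scoring follows the SemEval-2007 lexical substitution definitions;
# the official task documentation is authoritative. All names are hypothetical.

def best_item_score(answers: List[str], gold: Counter) -> float:
    """'best' credit for one item: gold frequency of each answer, averaged over
    the number of answers, normalised by the total gold count (gold non-empty)."""
    if not answers:
        return 0.0
    total_gold = sum(gold.values())
    return sum(gold[a] for a in answers) / len(answers) / total_gold

def oot_item_score(answers: List[str], gold: Counter) -> float:
    """'oot' credit for one item: up to ten answers, no averaging, so a repeated
    answer is credited once per copy (gold must be non-empty)."""
    total_gold = sum(gold.values())
    return sum(gold[a] for a in answers[:10]) / total_gold

def precision_recall(item_scores: Dict[str, float], n_items: int) -> Tuple[float, float]:
    """item_scores holds credit for attempted items only. Returns (P, R) as
    percentages: P over attempted items, R over all n_items test items."""
    total = sum(item_scores.values())
    precision = 100 * total / len(item_scores) if item_scores else 0.0
    recall = 100 * total / n_items
    return precision, recall

# Hypothetical gold data: three annotators chose "coche", one chose "auto".
gold = Counter({"coche": 3, "auto": 1})
print(best_item_score(["coche"], gold))       # one answer, full weight
print(oot_item_score(["coche", "auto"], gold))
print(oot_item_score(["coche"] * 10, gold))   # duplicates credited ten times
```

Under this assumption, an oot answer list that repeats a frequent gold substitute is credited for every copy. That would be consistent with oot scores above 100 for systems that report many duplicates, though the sketch has not been checked against the official scorer.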

The rank order of systems changes depending on which measure is used. Note also that the system responses have not yet been analyzed to identify the relative strengths and weaknesses of the different systems. For example, IRST-1 and IRSTbs scored considerably higher on precision than on recall because they did not cover all test items.
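The gap between precision and recall can be read as a coverage figure. This assumes precision divides a system's total credit by the items it attempted and recall divides the same credit by all test items. In that case R / P equals the fraction of items attempted. The sketch below applies that ratio to rows of the BEST table. The helper name is hypothetical, and the result is only an estimate because the published scores are rounded to two decimals.

```python
from typing import Dict, Tuple

# Assumption: P = credit / attempted items and R = credit / all items,
# so R / P is the fraction of test items a system attempted.
BEST_SCORES: Dict[str, Tuple[float, float]] = {
    # system: (R, P), copied from the BEST table
    "IRST-1": (15.38, 22.16),
    "IRSTbs": (13.21, 22.51),
    "UBA-T": (27.15, 27.15),
}

def implied_coverage(recall: float, precision: float) -> float:
    """Hypothetical helper: fraction of test items attempted, inferred as R / P."""
    return recall / precision if precision else 0.0

for system, (r, p) in BEST_SCORES.items():
    print(f"{system}: about {100 * implied_coverage(r, p):.0f}% of items attempted")
```

Systems with equal R and P, such as UBA-T, come out at full coverage under this reading.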

As another example, note that UBA-W has the highest mode scores in oot (Mode R and Mode P of 83.54), although it ranks only fifth by oot recall.
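To see how much the rank order depends on the measure, the oot rows can be sorted once by recall and once by Mode R. The sketch below copies those two columns from the OOT table. The dictionary and helper names are illustrative only.

```python
from typing import Dict, List, Tuple

# (R, Mode R) for each system, copied from the OOT table.
OOT: Dict[str, Tuple[float, float]] = {
    "SWAT-E": (174.59, 66.94),
    "SWAT-S": (97.98, 79.01),
    "UvT-v": (58.91, 62.96),
    "UvT-g": (55.29, 73.94),
    "UBA-W": (52.75, 83.54),
    "WLVUSP": (48.48, 77.91),
    "UBA-T": (47.99, 81.07),
    "USPWLV": (47.60, 79.84),
    "ColSlm": (43.91, 65.98),
    "ColEur": (41.72, 67.35),
    "TYO": (34.54, 58.02),
    "IRST-1": (31.48, 55.42),
    "FCC-LS": (23.90, 31.96),
    "IRSTbs": (8.33, 19.89),
}

def rank_by(column: int) -> List[str]:
    """System names from best to worst on one column (0 = R, 1 = Mode R)."""
    return sorted(OOT, key=lambda s: OOT[s][column], reverse=True)

by_recall = rank_by(0)
by_mode = rank_by(1)
for system in ("UBA-W", "UBA-T", "SWAT-E"):
    # 1-based rank under each measure
    print(system, by_recall.index(system) + 1, by_mode.index(system) + 1)
```

The same comparison can be run on any pair of columns, for example on R and P to show the effect of partial coverage.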


BEST

System    R       P       Mode R  Mode P
UBA-T     27.15   27.15   57.20   57.20
USPWLV    26.81   26.81   58.85   58.85
ColSlm    25.99   27.59   56.24   59.16
WLVUSP    25.27   25.27   52.81   52.81
SWAT-E    21.46   21.46   43.21   43.21
UvT-v     21.09   21.09   43.76   43.76
CU-SMT    20.56   21.62   44.58   45.01
UBA-W     19.68   19.68   39.09   39.09
UvT-g     19.59   19.59   41.02   41.02
SWAT-S    18.87   18.87   36.63   36.63
ColEur    18.15   19.47   37.72   40.03
IRST-1    15.38   22.16   33.47   45.95
IRSTbs    13.21   22.51   28.26   45.27
TYO       8.39    8.62    14.95   15.31

BEST baselines

System    R       P       Mode R  Mode P
DICT      24.34   24.34   50.34   50.34
DICTCORP  15.09   15.09   29.22   29.22

OOT

System    R       P       Mode R  Mode P  dups
SWAT-E    174.59  174.59  66.94   66.94   968
SWAT-S    97.98   97.98   79.01   79.01   872
UvT-v     58.91   58.91   62.96   62.96   345
UvT-g     55.29   55.29   73.94   73.94   146
UBA-W     52.75   52.75   83.54   83.54   -
WLVUSP    48.48   48.48   77.91   77.91   64
UBA-T     47.99   47.99   81.07   81.07   -
USPWLV    47.60   47.60   79.84   79.84   30
ColSlm    43.91   46.61   65.98   69.41   509
ColEur    41.72   44.77   67.35   71.47   125
TYO       34.54   35.46   58.02   59.16   -
IRST-1    31.48   33.14   55.42   58.30   -
FCC-LS    23.90   23.90   31.96   31.96   308
IRSTbs    8.33    29.74   19.89   64.44   -

OOT baselines

System    R       P       Mode R  Mode P  dups
DICT      44.04   44.04   73.53   73.53   30
DICTCORP  42.65   42.65   71.60   71.60   -