September 6th, 2006

language

Quelle vitesse: NIST Machine Translation workshop

It's hard to believe, but another year has rolled past, and once again the machine translation arena is being dominated by a few giants. We aren't supposed to talk about this year's BLEU scores until they've been published at the start of next month, but for your reference, the top scorers in 2005 were:

  • Google (with 0.5131), ISI, IBM, UMD, JHU, and Edinburgh in the Arabic-to-English Large Data Track

  • Google (with 0.5137), SAKHR, and ARL in the Arabic-to-English Unlimited Data Track

  • Google (with 0.3531), ISI, UMD, RWTH, JHU, and IBM in the Chinese-to-English Large Data Track

  • Google (with 0.3516), ICT, and HIT in the Chinese-to-English Unlimited Data Track


Two of my grad students, Tejaswi Pydimarri (pnvtejaswi) and Waleed Al-Jandal, are at the annual NIST Machine Translation evaluation workshop. This is our first time attending an MT or computational linguistics meeting, so we're mainly there to learn. Teja is giving a short presentation on our first (nominally) functioning end-to-end translation system, but we did it primarily to get our feet wet. This being our first BLEU score ever, I have high hopes for the coming year. My goal is to "make the scoreboard" in earnest by October (with a BLEU score of 0.1 on the Chinese track) and get as close as we can to 0.2 by year's end. If you're interested in keeping tabs on our progress, look for my posts in comptranslation.

I can't believe they've been at the workshop a day now and are almost on their way home! Time zips by.

--
Banazir