This unique language model fills a gap in machine translation, which today mainly caters to speakers of high-resource languages. With few data sources available, the InstaDeep duo trained an unsupervised Neural Machine Translation model, essentially building a Pidgin-English catalogue of word pairs from scratch by scraping 56,695 Pidgin sentences and 32,925 unique words from only a handful of websites.
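To give a sense of the corpus-building step, here is a minimal sketch of how unique words might be counted from scraped sentences. The function, the toy sentences, and the tokenisation are illustrative assumptions, not taken from the team's actual pipeline.

```python
from collections import Counter

def build_vocabulary(sentences):
    """Tokenize scraped sentences and count word occurrences."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase and split on whitespace; a real pipeline would
        # also strip punctuation and normalise spelling variants.
        counts.update(sentence.lower().split())
    return counts

# Illustrative Pidgin sentences (not from the actual scraped corpus).
corpus = [
    "How you dey",
    "I dey fine",
    "Wetin dey happen",
]

vocab = build_vocabulary(corpus)
print(len(vocab))  # number of unique words in this toy corpus
```

In practice the same counting idea, applied at scale, is what yields figures like the 32,925 unique words reported above.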

Building on the same language work, the team wrote the paper “Towards Supervised and Unsupervised Neural Machine Translation Baselines for Nigerian Pidgin”, PidginUNMT for short. After peer review, it was selected for presentation at NeurIPS 2019; however, owing to dubious visa decisions by the Canadian embassy, the researchers were denied the honour of presenting at the world-renowned conference. The paper has also been accepted into the AfricanNLP workshop, to be held at ICLR in April 2020.

TechCabal picked up the news; read their story on the solution here.

Well done team!

The paper, PidginUNMT, by InstaDeep’s Orevaoghene Ahia and Kelechi Ogueji.