Language Translation Using Meta AI NLLB (No Language Left Behind) And SMS
The Meta AI NLLB project has open-sourced models capable of translating directly between 200 languages, and SMS is an avenue for democratising access to that translation.
Introduction
In Africa, and in other underdeveloped regions, access to information is impeded by two factors.
The first factor is language.
Information is not generally available in minority languages.
This is because the supporting technology has not been available: either language technology does not support the minority language, or there is no commercial incentive or cost justification to develop technology for it.
Cost is another element to factor in. These translation and access-to-information initiatives are often humanitarian in nature, with no way to offset the cost of software and hosting.
Added to this is the practical effort of translating large volumes of data, like Wikipedia. The ideal is to translate information automatically, in smaller volumes, as users request it. This removes the overhead of translating large volumes of data into various languages and then managing and maintaining those volumes.
The second factor is the access medium.
The available access mediums can also be an impediment. If a user interface is restricted to smartphones and apps and demands high bandwidth, access is severely impeded. Hence the argument for an interface like SMS.
In this story I want to consider:
- Meta AI’s No Language Left Behind (NLLB)
- SMS as an access medium to democratise access to language technology.
Meta AI NLLB
According to Meta AI, No Language Left Behind (NLLB) is a unique, AI breakthrough project…
The project has open-sourced models capable of delivering evaluated and high-quality translations between 200 languages.
These translations can be performed directly between any of the 200 languages, including languages like Afrikaans, Zulu, Sotho, Shona, etc.
NLLB affords users the opportunity to access web content in their native language. It allows people to access information in their own language and to communicate with anyone, anywhere.
Above you can see the NLLB Translator demo using Facebook’s NLLB models. This API to NLLB was developed by Narrativa. An Afrikaans sentence is first translated to English, and subsequently the same Afrikaans sentence is translated into Zulu.
One advantage of the NLLB model is that it can be used free of charge. Another is that translation can be performed directly between any two of the supported languages, so there is no need to pre-translate information or to add an intermediate step through a single go-between language.
The list of languages included in NLLB can be accessed here; this list also contains the language codes.
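To make direct translation between two language codes concrete, here is a minimal sketch. It assumes the Hugging Face transformers library and the facebook/nllb-200-distilled-600M checkpoint, which is my own choice for illustration and separate from the access routes covered in this story. The LANG_CODES table and the resolve_code and translate helpers are hypothetical names; the codes themselves follow the NLLB pattern of a language code plus a script code.

```python
# A few NLLB language codes (language + script), keyed by a friendly name.
# Illustrative subset only; the full list covers 200 languages.
LANG_CODES = {
    "english": "eng_Latn",
    "afrikaans": "afr_Latn",
    "zulu": "zul_Latn",
    "shona": "sna_Latn",
}

# Assumed checkpoint; larger NLLB checkpoints could be swapped in.
MODEL_NAME = "facebook/nllb-200-distilled-600M"


def resolve_code(language: str) -> str:
    """Map a friendly language name to its NLLB code."""
    key = language.strip().lower()
    if key not in LANG_CODES:
        raise KeyError(f"No NLLB code registered for {language!r}")
    return LANG_CODES[key]


def translate(text: str, source: str, target: str, max_length: int = 200) -> str:
    """Translate text directly from source to target, with no pivot language."""
    # Imported here so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    src_code, tgt_code = resolve_code(source), resolve_code(target)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang=src_code)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    inputs = tokenizer(text, return_tensors="pt")
    # Forcing the first generated token to the target code selects the output language.
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_length=max_length,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Under these assumptions, translate("Goeie more", "afrikaans", "zulu") would go straight from Afrikaans to Zulu with no English step in between. The first call downloads the model weights, so it is slow.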
Here are three ways of accessing NLLB:
1️⃣ The first is a self-contained Colab notebook; one example of such a notebook can be found here.
After the Colab routines have all been executed, the translate.sh script can be run with the source and target languages defined, together with the text to translate.
Here is the input, with the script executed…
Input:
! bash translate.sh /content/checkpoint.pt eng_Latn afr_Latn <<< 'The Africa physical geography, environment and resources, and human geography can be considered separately. '
And below the output…
Output:
H-0 -0.7240010499954224 ▁Die ▁fisi ese ▁geograf ie ▁van ▁Afrika , ▁om gewing ▁en ▁hulp br onne ▁en ▁menslike ▁geograf ie ▁kan ▁afs onder lik ▁oor weeg ▁word .
D-0 -0.7240010499954224 ▁Die ▁fisi ese ▁geograf ie ▁van ▁Afrika , ▁om gewing ▁en ▁hulp br onne ▁en ▁menslike ▁geograf ie ▁kan ▁afs onder lik ▁oor weeg ▁word .
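The output above is still split into subword pieces, with ▁ marking the start of each word. As a small sketch of how such a line could be turned back into readable text, the hypothetical parse_hypothesis helper below assumes each output line has the shape tag, score, pieces, as seen in the output above.

```python
WORD_BOUNDARY = "▁"  # marker that prefixes the first piece of every word


def parse_hypothesis(line: str):
    """Split one output line into (tag, score, readable text)."""
    tag, score, *pieces = line.split()
    # Glue the pieces together, then turn each word marker into a space.
    text = "".join(pieces).replace(WORD_BOUNDARY, " ").strip()
    return tag, float(score), text


sample = (
    "H-0 -0.7240010499954224 ▁Die ▁fisi ese ▁geograf ie ▁van ▁Afrika , "
    "▁om gewing ▁en ▁hulp br onne ▁en ▁menslike ▁geograf ie ▁kan "
    "▁afs onder lik ▁oor weeg ▁word ."
)
tag, score, text = parse_hypothesis(sample)
print(text)
# Die fisiese geografie van Afrika, omgewing en hulpbronne en menslike geografie kan afsonderlik oorweeg word.
```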
2️⃣ Access NLLB via the Narrativa 🤗HuggingFace space which can be accessed here.
An easy no-code avenue to access NLLB is the GUI made available in the Narrativa 🤗HuggingFace Space. The basic parameters can be set and the submit button can be clicked.
3️⃣ Lastly, access NLLB via the Narrativa API.
The Narrativa API can be accessed directly via a client like Postman, as seen below.
The input is defined in a simple JSON document, and the output contains the translated data and the duration.
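The same request can be made from code instead of Postman. The sketch below is only an outline under stated assumptions: API_URL is a placeholder and the JSON field names (text, source, target) are hypothetical, since the real endpoint and schema are not given here and must be taken from the Narrativa API itself.

```python
import json
import urllib.request

# Placeholder only: replace with the real endpoint of the translation API.
API_URL = "https://example.invalid/nllb/translate"


def build_request(text: str, source: str, target: str) -> urllib.request.Request:
    """Build a JSON POST request; the field names are assumed, not confirmed."""
    payload = {"text": text, "source": source, "target": target}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_via_api(text: str, source: str, target: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(text, source, target)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The returned dict would then hold the translated data and the duration, under whatever keys the API actually uses.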
SMS (Text Medium)
Why SMS? According to the GSMA report from September 2021, unique mobile subscriber penetration stands at a mere 46%.
Added to this impediment, smartphone adoption sits at 48%, with mobile internet users at only 28% and 4G access lagging at 12%.
Thus it is clear that any data-intensive, app-dependent user interface that demands access to a smartphone will not yield the access and democratisation desired.
Below is a simple prototype illustrating how a Twilio SMS gateway can easily be integrated with an NLLB API.
Here is a tutorial on how to poll Twilio for incoming SMS messages.
This notebook extract shows how the Narrativa NLLB API can be accessed and the translated sentence sent to a mobile number via an SMS message.
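The flow described above (receive an SMS, translate it, reply by SMS) can be sketched as follows. This is not the notebook's code but a minimal outline under stated assumptions. The message convention "<source_code> <target_code> <text>" and the helper names are hypothetical. The client argument is assumed to be a twilio.rest.Client created with your account SID and auth token, and translate can be any function that takes text, source and target.

```python
def parse_sms(body: str):
    """Split an SMS of the assumed form '<source_code> <target_code> <text>'."""
    parts = body.strip().split(maxsplit=2)
    if len(parts) < 3:
        raise ValueError("expected '<source_code> <target_code> <text>'")
    source, target, text = parts
    return source, target, text


def handle_sms(body: str, sender: str, translate, send) -> str:
    """Translate one incoming SMS and send the result back to the sender."""
    try:
        source, target, text = parse_sms(body)
        reply = translate(text, source, target)
    except ValueError as err:
        reply = f"Sorry, {err}"
    send(reply, sender)
    return reply


def fetch_inbound(client, number: str, limit: int = 20):
    """Poll Twilio for recent messages sent to our number; keep inbound ones."""
    messages = client.messages.list(to=number, limit=limit)
    return [(m.from_, m.body) for m in messages if m.direction == "inbound"]


def make_twilio_sender(client, from_number: str):
    """Return a send(body, to) function backed by the Twilio client."""
    def send(body: str, to: str):
        return client.messages.create(body=body, from_=from_number, to=to).sid
    return send
```

Passing translate and send in as arguments keeps the SMS handling separate from both the translation API and the gateway, so each can be swapped or tested on its own. A real poller would also need to remember which messages it has already answered.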
Conclusion
In the recent past, much focus has been placed on Large Language Models in terms of generation, embeddings and classification.
However, translation at scale that includes minority languages is of the utmost importance. NLLB will not only enable language translation, but can also open the way for a multitude of language tasks and functionality to be developed in the near future.