🤗 HuggingFace Is Making NLP More Accessible

And How AutoNLP Enables You To Train, Evaluate and Deploy NLP

Cobus Greyling
8 min read · Jul 10, 2021


Introduction

If you are new to Natural Language Processing (NLP), AutoNLP from 🤗 HuggingFace is a gentle introduction to basic functionality. It offers an easy way to train, evaluate and deploy state-of-the-art NLP models for various applications.

AutoNLP takes care of selecting the best model, fine-tuning and deployment.

If a company can lower the barrier to entry for AI in general, and Conversational AI in particular, there is bound to be interest. Ease of initial access needs to be twofold:

  • Technical and
  • Cost.

This must, of course, go hand in hand with a compelling value proposition.

Being able to access and experiment with software via Jupyter Notebooks at no cost, and without needing much technical knowledge, is important for creating critical mass in adoption.

Simple example of sentiment analysis on a sentence.

This is why 🤗 HuggingFace is thriving with its easily accessible, open source library for a number of natural language processing tasks.

There are striking similarities in the NLP functionality of GPT-3 and 🤗 HuggingFace, with the latter obviously leading in the areas of functionality, flexibility and fine-tuning.

Named Entity Recognition using the NER pipeline.

Pretrained models for Natural Language Understanding (NLU) tasks allow for rapid prototyping and instant functionality. Transfer learning is a technique to train a machine learning model for a task by using knowledge from another task.

🤗 HuggingFace is democratizing NLP by acting as a catalyst and making research-level work in NLP accessible to mere mortals.

It is important to understand that 🤗 HuggingFace is a Natural Language Processing problem-solving company, and not a chatbot development framework company per se.

Their pipelines and models can be used to augment a chatbot framework to perform various tasks, as you will see later in this article. But operational implementation and management of intents and entities, together with dialog development and management, are not part of their ambit.

Considerations For Conversational AI

There are a few general considerations when it comes to Conversational AI technology choices for any enterprise.

Cost, Hosting & Technical Barriers

The first tier includes Cost, Hosting and Technical Barriers. With 🤗 HuggingFace there is no initial cost impediment, and prototyping can be performed within Jupyter Notebooks.

Production environments can be hosted in the cloud or installed locally. The technical barriers to entry in terms of skills are relatively low; 🤗 HuggingFace has succeeded in democratizing NLP for the masses.

Three tiers of consideration when implementing Conversational AI

Complete Solution, NLU/P Services & Fine Tuning

Looking at the second tier, it is important to understand what the requirements are. For a chatbot, 🤗 HuggingFace is not a complete solution; it is lacking in the areas of:

  • Dialog state management and development
  • Operational Intent and Entity creation & management.

Natural Language Understanding and Processing are the mainstay of 🤗 HuggingFace, with extensive fine-tuning avenues. It needs to be noted that fine-tuning 🤗 HuggingFace models is quite a step up from the initial prototyping phase and can get technical.

🤗 HuggingFace is an NLP tool, and even though functionality like Natural Language Generation and entity extraction is available, it is not a perfect fit for day-to-day chatbot operation and scaling, as mentioned before. A comprehensive solution is required for dialog state management and granular intent and entity implementation and management.

Scaling, Specific Languages and ML

The third tier of considerations covers scaling, specific languages and Machine Learning. Scaling addresses the process of adding conversational aspects like disambiguation, digression, limiting fallback proliferation, forms and slot filling. Again, this is not the forte of 🤗 HuggingFace.

🤗 HuggingFace can assist in training a model for a new language. Machine learning can be incorporated to assist in chatbot development. 🤗 HuggingFace is ideal for a higher-order NLP first pass on user input.

Chatbots

There is a big difference between Natural Language Processing (NLP) tools and a chatbot development framework. Common NLP tools include Q&A, classification, summarization, keyword extraction, named entity extraction etc.

These tools can be implemented as the top tier of a chatbot technology stack, acting as a pre-processing layer for user input. This processing can include sentence boundary detection, language identification etc.

The next layer is Natural Language Understanding (NLU). A key requirement here is to define, on a very granular level, the intents (verbs) and entities (nouns) which should be detected.

Areas in the chatbot development framework where 🤗 HuggingFace can make a contribution.

You can read more about 🤗 HuggingFace entity extraction. As mentioned before, granular intent and entity extraction is required, which must be maintained daily with limited overhead by a team.

It might be possible via 🤗 HuggingFace, but in an operational environment the task of managing intents and entities is a continuous process and requires an interface which allows for easy management.

While most chatbot environments converge on the NLU portion, there is considerable divergence in how the dialog state is developed and managed. Regardless of the approach, this is a vital part of how the chatbot works and of how it is managed, and it is an area 🤗 HuggingFace does not play in.

The Dialog Responses can be augmented with Natural Language Generation (NLG). Even though this is still experimental and not mainstream in production systems, it is an area 🤗 HuggingFace excels in.
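The layering described in this section can be sketched in a few lines of Python. This is a toy illustration only, and every name in it (`Turn`, `nlp_preprocess`, `nlu`, `dialog_manager`, `run_stack`) is hypothetical. The NLP layer is a keyword stub standing in for a real model call, so the sketch runs without any library installed; in practice that layer is where a 🤗 HuggingFace pipeline could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """One user utterance as it moves through the stack (hypothetical type)."""
    text: str
    annotations: Dict[str, str] = field(default_factory=dict)
    intent: str = ""
    response: str = ""


Layer = Callable[[Turn], Turn]


def nlp_preprocess(turn: Turn) -> Turn:
    # Stand-in for an NLP model call (for example sentiment analysis).
    # A keyword check is used here purely so the sketch is self-contained.
    words = turn.text.lower().split()
    turn.annotations["sentiment"] = "NEGATIVE" if "not" in words else "POSITIVE"
    return turn


def nlu(turn: Turn) -> Turn:
    # Stand-in for granular intent detection in a chatbot framework.
    turn.intent = "complaint" if "service" in turn.text.lower() else "general"
    return turn


def dialog_manager(turn: Turn) -> Turn:
    # Dialog state management decides the response from intent and annotations.
    if turn.intent == "complaint" and turn.annotations.get("sentiment") == "NEGATIVE":
        turn.response = "I am sorry to hear that. Let me connect you to an agent."
    else:
        turn.response = "How can I help?"
    return turn


def run_stack(text: str, layers: List[Layer]) -> Turn:
    """Pass the utterance through each layer in order."""
    turn = Turn(text=text)
    for layer in layers:
        turn = layer(turn)
    return turn


turn = run_stack(
    "I am not impressed with their slow and unfriendly service.",
    [nlp_preprocess, nlu, dialog_manager],
)
print(turn.intent, "->", turn.response)
```

Keeping each layer as a plain function with the same signature means the stubbed NLP layer can be swapped for a real model without touching the NLU or dialog layers.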

Natural Language Processing

Here are a few practical examples of how 🤗 HuggingFace can be implemented within an existing chatbot development framework.

Sentiment Analysis

Classifying sequences according to positive or negative sentiments.

Input:

from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I am not impressed with their slow and unfriendly service.")

Output:

[{'label': 'NEGATIVE', 'score': 0.9987296462059021}]
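Within a chatbot, a result like this could drive a routing decision. The sketch below is a minimal illustration and not part of the 🤗 HuggingFace library: the function name `should_escalate` and the 0.9 threshold are assumptions chosen for the example. It takes the list of dictionaries shown in the output above and flags a confidently negative message for hand-over to a human agent.

```python
from typing import Dict, List, Union

# Output copied from the sentiment analysis example above.
result = [{'label': 'NEGATIVE', 'score': 0.9987296462059021}]


def should_escalate(predictions: List[Dict[str, Union[str, float]]],
                    threshold: float = 0.9) -> bool:
    """Return True when the first prediction is confidently negative.

    Hypothetical helper: the name and the threshold are illustrative choices.
    """
    if not predictions:
        return False
    top = predictions[0]
    return top["label"] == "NEGATIVE" and top["score"] >= threshold


print(should_escalate(result))
```

The threshold would need tuning against real conversations; a low-confidence negative label is probably better handled by the normal dialog flow.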

Question And Answer

Input:

q_a = pipeline("question-answering")
context = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System, being larger than only Mercury. In English, Mars carries the name of the Roman god of war and is often referred to as the Red Planet. The latter refers to the effect of the iron oxide prevalent on Mars's surface, which gives it a reddish appearance distinctive among the astronomical bodies visible to the naked eye.[18] Mars is a terrestrial planet with a thin atmosphere, with surface features reminiscent of the impact craters of the Moon and the valleys, deserts and polar ice caps of Earth."
question = "Who is the Roman God of war?"
q_a({"question": question, "context": context})

Output:

{'answer': 'Mars', 'end': 4, 'score': 0.4511910080909729, 'start': 0}
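The `start` and `end` values in this output are character offsets into the context string. As a small sketch, using only the opening of the context and the output shown above, slicing the context with those offsets recovers the answer text:

```python
# Opening of the context string and the output from the example above.
context = ("Mars is the fourth planet from the Sun and the second-smallest "
           "planet in the Solar System")
answer = {'answer': 'Mars', 'end': 4, 'score': 0.4511910080909729, 'start': 0}

# start and end index into the context, so slicing recovers the answer span.
span = context[answer["start"]:answer["end"]]
print(span)
```

Here the offsets 0 to 4 point at the very first word of the context, and the score of roughly 0.45 is modest. A chatbot could use the offsets to highlight the answer in the source text, and the score to decide whether to answer at all.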

Text Generation

Input:

text = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System."
text_generator = pipeline("text-generation")
text_generator(text)

Output:

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System.\n\nThe Sun takes about 70 percent of the solar wind energy that travels to the Sun and about 100 percent is stored in the Sun.\n\n'}]
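As the output shows, the generated text begins with the original prompt. If only the newly generated continuation is needed, for instance to append it to a chatbot response, the prompt can be stripped off. The following is a minimal sketch that works on the output shown above; the variable names are illustrative.

```python
prompt = ("Mars is the fourth planet from the Sun and the second-smallest "
          "planet in the Solar System.")

# Output copied from the text generation example above.
output = [{'generated_text': 'Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System.\n\nThe Sun takes about 70 percent of the solar wind energy that travels to the Sun and about 100 percent is stored in the Sun.\n\n'}]

generated = output[0]["generated_text"]

# Drop the echoed prompt (if present) and surrounding whitespace.
if generated.startswith(prompt):
    continuation = generated[len(prompt):].strip()
else:
    continuation = generated.strip()

print(continuation)
```

The continuation in this example is fluent but factually wrong, which is a reminder that generated text needs review before it reaches a user.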

Named Entity Recognition

Input:

ner = pipeline("ner")
text = "Johannesburg is located in South Africa in the contimnet of Africa"
ner(text)

Output:

[{'end': 12, 'entity': 'I-LOC', 'index': 1, 'score': 0.9987455, 'start': 0, 'word': 'Johannesburg'},
 {'end': 32, 'entity': 'I-LOC', 'index': 5, 'score': 0.99958795, 'start': 27, 'word': 'South'},
 {'end': 39, 'entity': 'I-LOC', 'index': 6, 'score': 0.9996102, 'start': 33, 'word': 'Africa'},
 {'end': 66, 'entity': 'I-LOC', 'index': 14, 'score': 0.6802336, 'start': 60, 'word': 'Africa'}]
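Note that "South" and "Africa" come back as two separate tokens, although a chatbot would want the single entity "South Africa". The sketch below shows one way to merge neighbouring tokens, assuming that tokens with the same label and consecutive `index` values belong together. The helper `merge_adjacent` is hypothetical, and joining words with a space is a simplification that does not handle sub-word pieces.

```python
from typing import Dict, List

# Output copied from the NER example above (scores omitted for brevity).
tokens = [
    {'end': 12, 'entity': 'I-LOC', 'index': 1, 'start': 0, 'word': 'Johannesburg'},
    {'end': 32, 'entity': 'I-LOC', 'index': 5, 'start': 27, 'word': 'South'},
    {'end': 39, 'entity': 'I-LOC', 'index': 6, 'start': 33, 'word': 'Africa'},
    {'end': 66, 'entity': 'I-LOC', 'index': 14, 'start': 60, 'word': 'Africa'},
]


def merge_adjacent(tokens: List[Dict]) -> List[Dict]:
    """Merge tokens that share a label and have consecutive token indices."""
    merged: List[Dict] = []
    for tok in tokens:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev["entity"] == tok["entity"]
                and tok["index"] == prev["last_index"] + 1):
            # Extend the previous entity with this token.
            prev["word"] += " " + tok["word"]
            prev["end"] = tok["end"]
            prev["last_index"] = tok["index"]
        else:
            # Start a new entity.
            merged.append({
                "entity": tok["entity"],
                "word": tok["word"],
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
            })
    return merged


entities = [e["word"] for e in merge_adjacent(tokens)]
print(entities)  # ['Johannesburg', 'South Africa', 'Africa']
```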

Translation

English to German:

translator = pipeline("translation_en_to_de")
text = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System."
translator(text)

Output:

[{'translation_text': 'Mars ist der vierte Planet der Sonne und der zweitkleinste Planet im Sonnensystem.'}]

English to French:

translator = pipeline("translation_en_to_fr")
text = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System."
translator(text)

Output:

[{'translation_text': 'Mars est la quatrième planète du Soleil et la deuxième plus petite planète du Système solaire.'}]

Conclusion

🤗 HuggingFace calls itself the AI community building the future.

Its vehicle is to build, train and deploy state of the art models powered by the reference open source in natural language processing.

The challenge is knowing which technology to use for which task, and combining technologies in such a way that scaling is not impeded, all whilst adding new functionality.

🤗 HuggingFace can help enhance any chatbot with supporting processes, but it needs to be implemented according to its intended purpose.

