Using A Large Language Model For Entity Extraction
Can LLMs Extract Entities Better Than Traditional NLP Methods?
Introduction To Entities
For the purposes of this demo, the Co:here Large Language Model was used.
Entities can be thought of as nouns in a sentence or user input. With conversation design, there are two approaches to entity extraction…
The first is a more rudimentary, sequential slot-filling process, where the chatbot prompts the user for each entity one after the other and the user needs to follow this highly structured approach.
For example, in the case of flight booking, the bot prompts the user in the following way to capture the entities.
A framework like AWS Lex V2 leans heavily towards this approach: the interface is structured rather than conversational, and the framework centres around slot filling.
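To make the sequential approach concrete, here is a minimal sketch of a slot-filling loop. The slot names and prompts are hypothetical and are not taken from AWS Lex V2 or any other framework; the `ask` callable simply stands in for one turn of conversation.

```python
# Hypothetical slot names and prompts for a flight-booking bot.
SLOT_PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "date": "On what date?",
}


def fill_slots_sequentially(ask):
    """Prompt for every slot in a fixed order.

    `ask` is any callable that takes a prompt and returns the user's reply.
    """
    return {slot: ask(prompt) for slot, prompt in SLOT_PROMPTS.items()}


# Scripted replies stand in for a live conversation.
replies = iter(["Johannesburg", "Cape Town", "next Friday"])
print(fill_slots_sequentially(lambda prompt: next(replies)))
```

Note that every slot is prompted for, even if the user had already supplied the value in an earlier message. That rigidity is what the second approach tries to avoid.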
The second, more sophisticated approach is to design for compound and contextual entity types. Microsoft LUIS, for example, is pioneering machine-learned nested entities; you can read more about nested entities here.
A hallmark of this approach is that the chatbot mines the user input thoroughly for entities. The chatbot does not re-prompt the user for any input already provided. The user is also not forced to adhere to a predefined structure or format for their input.
This approach is illustrated by the image below: the user input contains compound entities, which are extracted contextually from the user utterance.
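The contextual approach can be sketched as "mine the utterance first, then prompt only for what is still missing". The regular expressions below are a toy stand-in for a trained NLU model, and the slot names are hypothetical; a real system would use statistical extraction rather than patterns like these.

```python
import re

# Toy patterns standing in for a trained NLU model (an assumption for illustration).
PATTERNS = {
    "origin": re.compile(r"\bfrom ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "destination": re.compile(r"\bto ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "date": re.compile(r"\bon ((?:next )?[A-Z][a-z]+day)"),
}


def mine_entities(utterance):
    """Return every slot value that can be found in a single utterance."""
    found = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            found[slot] = match.group(1)
    return found


def missing_slots(found):
    """List the slots the bot still has to ask about."""
    return [slot for slot in PATTERNS if slot not in found]


found = mine_entities("Book me a flight from Johannesburg to Cape Town")
print(found)
print("Still to ask:", missing_slots(found))
```

Here the bot would only need to follow up on the date, because origin and destination were already supplied in the opening utterance.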
There is also a tendency amongst the Gartner leaders to have entities associated with specific intents. Hence once an intent is detected, the NLU has a smaller pool of expected possible entities which are associated with the identified intent.
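As a rough illustration of tying entities to intents, the sketch below keeps only the candidate entities whose type is associated with the detected intent. The intent names and entity types are hypothetical and do not come from any particular framework.

```python
# Hypothetical intents and the entity types each one expects.
INTENT_ENTITIES = {
    "book_flight": {"origin", "destination", "date"},
    "check_weather": {"city", "date"},
}


def filter_by_intent(intent, candidates):
    """Keep only candidate entities whose type is associated with the intent."""
    expected = INTENT_ENTITIES.get(intent, set())
    return {etype: value for etype, value in candidates.items() if etype in expected}


candidates = {"origin": "Johannesburg", "city": "Johannesburg", "date": "Friday"}
print(filter_by_intent("book_flight", candidates))
```

Once the intent is known, the smaller pool of expected entity types removes the ambiguity over whether "Johannesburg" is an origin or a generic city.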
Three Types Of Entities
One could argue that there are three approaches to entity extraction…
NLU Defined Entities
These entities are custom entities, predominantly defined within a chatbot development framework. Read more about the emergence of entity structures in chatbots, and why it is important for capturing unstructured data accurately and efficiently here.
In NLP, a named entity is a real-world object such as a person, place, company or product. Named entities do not require training or any defining process; in most cases NLP / NLU systems detect them automatically. The only impediment is the availability of named-entity functionality for a specific human language.
These named entities can be abstract or have a physical existence. Below are examples of named entities being detected by Riva NLU.
Jensen Huang is the CEO of NVIDIA Corporation, located in Santa Clara, California.
jensen huang (PER)
nvidia corporation (ORG)
santa clara (LOC)
spaCy also has a very efficient named entity detection system, which assigns labels to the entities it finds. The default model identifies a host of named and numeric entities, including places, companies, products and the like. Each detected entity is described by the following attributes:
- Text: The original entity text.
- Start: Index of start of entity in the doc
- End: Index of end of entity in the doc
- Label: Entity label, i.e. type
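To show what these four attributes look like in practice, here is a small sketch that produces them for the NVIDIA sentence used earlier. It does not use spaCy: a hand-written lookup table stands in for a trained model, and `EntitySpan`, `GAZETTEER` and `find_entities` are hypothetical names. In spaCy itself the equivalent values are exposed on each entity in `doc.ents` as `text`, `start_char`, `end_char` and `label_`.

```python
from dataclasses import dataclass


@dataclass
class EntitySpan:
    text: str   # the original entity text
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    label: str  # entity type


# Hand-written lookup standing in for a trained NER model.
GAZETTEER = {
    "Jensen Huang": "PER",
    "NVIDIA Corporation": "ORG",
    "Santa Clara": "LOC",
}


def find_entities(doc):
    """Return an EntitySpan for every gazetteer entry found in the text."""
    spans = []
    for name, label in GAZETTEER.items():
        start = doc.find(name)
        if start != -1:
            spans.append(EntitySpan(name, start, start + len(name), label))
    return sorted(spans, key=lambda span: span.start)


doc = "Jensen Huang is the CEO of NVIDIA Corporation, located in Santa Clara, California."
for span in find_entities(doc):
    print(span.text, span.start, span.end, span.label)
```

Slicing the original text with `doc[span.start:span.end]` recovers the entity text, which is what makes the start and end offsets useful for highlighting or downstream processing.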
Back To Large Language Models
Before we get to LLMs and entities: the functionality of LLMs can be divided into two broad implementations, Generation and Representation.
In this article you can read more on how a generative & representation model can be used to bootstrap a chatbot making use of semantic search, language generation and a concept I like to call intent-documents.
Entity Extraction With LLMs
For entity extraction we will be using Co:here’s Generation Language Model which can be used for Completion, Text Summarisation and Entity Extraction.
Some observations on how extracting entities with a large language model like Co:here differs from training a conventional entity model:
- A small amount of training data is required for a few-shot training approach.
- Accuracy on highly varied data was impressive: nine of the 10 test posts were handled correctly.
- Managing an environment with multiple training samples and multiple entities can become complex. A graphic management studio environment would be ideal to visually manage the entities via a no-code interface.
- I did not test entity extraction with compound entities, or with multiple entities per utterance or sentence. The system did well at detecting multi-word entities, something traditional entity extraction often fails at.
- The utterances from which the entities were extracted were in some instances quite long, which made the LLM's performance all the more impressive.
- This type of extraction is interesting because it doesn’t just blindly look at the text. The model has picked up on movie information during its pretraining process and that helps it understand the task from only a few examples.
Below is the training data used, as a Python list of (movie title, post text) pairs…
movie_examples = [
    ("Deadpool 2", "Deadpool 2 | Official HD Deadpool's \"Wet on Wet\" Teaser | 2018"),
    ("none", "Jordan Peele Just Became the First Black Writer-Director With a $100M Movie Debut"),
    ("Joker", "Joker Officially Rated “R”"),
    ("Free Guy", "Ryan Reynolds’ 'Free Guy' Receives July 3, 2020 Release Date - About a bank teller stuck in his routine that discovers he’s an NPC character in brutal open world game."),
    ("none", "James Cameron congratulates Kevin Feige and Marvel!"),
    ("Guardians of the Galaxy", "The Cast of Guardians of the Galaxy release statement on James Gunn"),
    ("Inception", "Inception is a movie about dreams and levels in dreams."),
]
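A few-shot prompt can be assembled from pairs like these by laying out each example as text followed by its answer, then leaving the final answer blank for the model to complete. The sketch below is one possible layout, not necessarily the one used in the notebook: the `Post:` / `Movie:` labels and the `--` separator are assumptions, and `generate` is a hypothetical callable that sends a prompt to whichever LLM you use and returns its completion. Only three of the examples are repeated here to keep the block short.

```python
# A subset of the training examples above: (expected title, post text).
movie_examples = [
    ("Joker", "Joker Officially Rated “R”"),
    ("none", "James Cameron congratulates Kevin Feige and Marvel!"),
    ("Inception", "Inception is a movie about dreams and levels in dreams."),
]

SEPARATOR = "\n--\n"  # assumed delimiter between examples


def build_prompt(examples, new_text):
    """Lay out the labelled examples, then the new post with the answer left blank."""
    shots = [f"Post: {text}\nMovie: {title}" for title, text in examples]
    shots.append(f"Post: {new_text}\nMovie:")
    return SEPARATOR.join(shots)


def extract_title(generate, examples, new_text):
    """Ask the model for a title; `generate` maps a prompt string to a completion string."""
    completion = generate(build_prompt(examples, new_text))
    # Keep only the first line in case the model runs on into another example.
    lines = completion.strip().splitlines()
    return lines[0].strip() if lines else "none"


print(build_prompt(movie_examples, "First poster for Pixar's Luca"))
```

Keeping the model call behind a plain callable means the prompt layout can be tested without network access, and the same code works with any completion endpoint.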
And next we get the data to analyze:
[
    'Hayao Miyazaki Got So Bored with Retirement He Started Directing Again ‘in Order to Live’',
    "First poster for Pixar's Luca",
    'New images from Space Jam: A New Legacy',
    'Official Poster for "Sonic the Hedgehog 2"',
    'Ng Man Tat, legendary HK actor and frequent collborator of Stephen Chow (Shaolin Soccer, God of Gambler) died at 70',
    'Zack Snyder’s Justice League has officially been Rated R for for violence and some language',
    'HBOMax and Disney+ NEED to improve their apps if they want to compete with Netflix.',
    'I want a sequel to Rat Race where John Cleese’s character dies and invites everyone from the first film to his funeral, BUT, he’s secretly set up a Rat Maze to trap them all in. A sort of post-mortem revenge on them for donating all his wealth to charity.',
    "'Trainspotting' at 25: How an Indie Film About Heroin Became a Feel-Good Classic",
    '‘Avatar: The Last Airbender’ Franchise To Expand With Launch Of Nickelodeon’s Avatar Studios, Animated Theatrical Film To Start Production Later This Year',
]
Here are the results:
- The model got nine out of 10 correct.
- Number four (4) in the set was missed.
- Experimentation is required to detect edge-cases along the way. For instance, what if someone mentions two movie titles? The more examples we can add to the prompt that address these cases, the more resilient the results will be.
A few observations from working through the notebook:
- The few-shot training approach is indeed a more flexible and exciting prospect for entity extraction.
- A chatbot can be bootstrapped to some degree, and entities can be added to the intent-document approach I discuss here.
- With only a few training examples, it does seem that a broader base of potential user utterances is covered.
- I see a use-case emerging where LLM entity extraction can be implemented within a chatbot as an extension, or as an avenue to bootstrap entity extraction. This is something I would like to explore in the near future.
- And lastly, there is a dire need for a no-code studio approach through which users can access LLM functionality, create and submit training data and build entity extraction functionality.
Cobus Greyling - City of Johannesburg, Gauteng, South Africa | Professional Profile | LinkedIn