Natural Language Processing For Dependencies and Named Entities
Start Discovering Meaning And Intent In User Utterances
This story takes you through the process of creating your own Natural Language Processing (NLP) interface. Natural language interfaces are actually very accessible in terms of technology, and also fairly easy to set up and program. Contrary to popular belief…
Here I will help you to demystify some of the basic elements of advanced NLP.
You might want to refer to this article where the tools and basics are discussed in detail.
You can use a virtual environment like Anaconda, which I prefer simply because it is a local install. But to get up and running very quickly, you can use a Jupyter Notebook.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and in our case, for Natural Language Understanding and Processing.
spaCy is an industrial-strength natural language processing framework which is open source and very accessible, even if you are just starting out with natural language processing.
spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It’s easy to install, and its API is simple and productive. We like to think of spaCy as the Ruby on Rails of Natural Language Processing.
So let’s get started and look at two use cases below: dependencies and entities.
Start a new basic Python Jupyter notebook, and install spaCy…
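As a minimal sketch of the setup step, the cell below loads an English pipeline and tokenizes a sentence. The model name `en_core_web_sm` is an assumption (the article only refers to the default model), and `load_english_pipeline` is a hypothetical helper name used here for illustration.

```python
# Install once from a notebook cell (the leading "!" runs a shell command):
#   !pip install spacy
#   !python -m spacy download en_core_web_sm


def load_english_pipeline(model_name="en_core_web_sm"):
    """Return a loaded spaCy pipeline, or None (with a hint) if unavailable."""
    try:
        import spacy

        return spacy.load(model_name)
    except ImportError:
        print("spaCy is not installed; run: pip install spacy")
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        print(f"Model {model_name!r} not found; run: python -m spacy download {model_name}")
    return None


nlp = load_english_pipeline()
if nlp is not None:
    doc = nlp("Mary rows the boat.")
    print([token.text for token in doc])
```

If the helper returns `None`, run the two install commands from the comment and restart the notebook kernel before trying again.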
Words can often have different meanings depending on how they are used within a sentence. Hence, analyzing how a sentence is constructed can help us determine how individual words relate to each other.
Take the sentence “Mary rows the boat.” It has two nouns, Mary and boat, and a single verb, rows. To understand the sentence correctly, word order matters: we cannot look only at the words and their parts of speech.
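To see why parts of speech alone are not enough, it can help to print each token next to its dependency label and its head. The sketch below assumes the `en_core_web_sm` model is installed; `token_rows` is a hypothetical helper name, and the exact labels depend on the model version.

```python
def token_rows(doc):
    """Return (text, part of speech, dependency label, head text) per token."""
    return [(t.text, t.pos_, t.dep_, t.head.text) for t in doc]


try:
    import spacy

    # Assumption: the small English model has been downloaded.
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError) as err:
    print(f"spaCy or the English model is unavailable: {err}")
else:
    for row in token_rows(nlp("Mary rows the boat.")):
        print(row)
```

The dependency label and head columns are what tell the two nouns apart: both are nouns by part of speech, but each relates to the verb differently.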
Working this out by hand would be an arduous task, but within spaCy we can use noun chunks. According to the spaCy documentation, you can think of noun chunks as a noun plus the words describing the noun, for example “the lavish green grass” or “the world’s largest tech fund”. To get the noun chunks in a document, simply iterate over Doc.noun_chunks.
For the sentence “Smart phones pose a problem for insurance companies in terms of fraudulent claims”, spaCy returns one row per noun chunk with the following fields:
- Text: The original noun chunk text.
- Root text: The original text of the word connecting the noun chunk to the rest of the phrase.
- Root dep: Dependency relation connecting the root to its head.
- Root head text: The text of the root token’s head.
Common dependency labels include:
- NSUBJ denotes a nominal subject.
- DOBJ denotes a direct object.
- POBJ denotes an object of a preposition.
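The four noun chunk fields described above map onto attributes of each chunk in `Doc.noun_chunks`. Here is a minimal sketch, assuming the `en_core_web_sm` model is installed; `noun_chunk_rows` is a hypothetical helper name, and no output is shown because it depends on the model version.

```python
def noun_chunk_rows(doc):
    """Return (text, root text, root dep, root head text) per noun chunk."""
    return [
        (chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)
        for chunk in doc.noun_chunks
    ]


try:
    import spacy

    # Assumption: the small English model has been downloaded.
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError) as err:
    print(f"spaCy or the English model is unavailable: {err}")
else:
    doc = nlp(
        "Smart phones pose a problem for insurance companies "
        "in terms of fraudulent claims"
    )
    for row in noun_chunk_rows(doc):
        print(row)
```

Note that `dep_` returns the label in lower case (for example `nsubj`), while the list above writes the labels in upper case.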
A visual representation of the dependencies can be generated with a single line of code.
from spacy import displacy
displacy.render(doc, style="dep", jupyter=True)
But first, what is an entity?
Entities are the information in the user input that is relevant to the user’s intentions.
Intents can be seen as verbs (the action a user wants to execute), while entities represent nouns (for example, the city, the date, the time, the brand, the product). Consider this: when the intent is to get a weather forecast, the relevant location and date entities are required before the application can return an accurate forecast.
Recognizing entities in the user’s input helps you to craft more useful, targeted responses. For example, you might have a #buy_something intent. When a user makes a request that triggers the #buy_something intent, the assistant’s response should reflect an understanding of what the customer wants to buy. You can add a product entity, and then use it to extract information from the user input about the product that the customer is interested in.
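The weather forecast case can be sketched as a simple check for entities that are still missing. Everything here is hypothetical and independent of any particular library: the intent name `get_weather`, the entity names, and the `missing_entities` helper are assumptions made for illustration only.

```python
# Hypothetical mapping from an intent name to the entity types it needs.
REQUIRED_ENTITIES = {"get_weather": {"location", "date"}}


def missing_entities(intent, found):
    """Return the entity types still needed before the intent can be fulfilled."""
    return REQUIRED_ENTITIES.get(intent, set()) - set(found)


# The user gave a location but no date, so the application should ask for one.
found = {"location": "Cape Town"}
print(sorted(missing_entities("get_weather", found)))  # ['date']
```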
spaCy has a very efficient entity detection system which also assigns labels. The default model identifies a host of named and numeric entities. This can include places, companies, products and the like.
- Text: The original entity text.
- Start: Index of start of entity in the doc
- End: Index of end of entity in the doc
- Label: Entity label, i.e. type
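These four fields can be read from each span in `doc.ents`. The sketch below assumes the `en_core_web_sm` model is installed and uses character offsets (`start_char`, `end_char`) for Start and End; token indices are available as `ent.start` and `ent.end`. The example sentence and the `entity_rows` helper name are made up for illustration.

```python
def entity_rows(doc):
    """Return (text, start offset, end offset, label) per detected entity."""
    return [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


try:
    import spacy

    # Assumption: the small English model has been downloaded.
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError) as err:
    print(f"spaCy or the English model is unavailable: {err}")
else:
    # Hypothetical user utterance; detected entities depend on the model.
    doc = nlp("What will the weather be in Cape Town on Friday?")
    for row in entity_rows(doc):
        print(row)
```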
The entity visualizer highlights named entities and their labels in a text.
With these simple examples making use of an open source environment, you will be able to make a start with Natural Language Processing and with creating structured data from unstructured conversational input.
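A minimal sketch of the entity visualizer follows, again assuming the `en_core_web_sm` model is installed. `entities_html` is a hypothetical helper that returns the markup as a string, which is convenient outside a notebook; the example sentence is made up.

```python
def entities_html(doc):
    """Return the highlighted-entity markup as an HTML string."""
    from spacy import displacy

    # jupyter=False makes displacy return the markup instead of displaying it.
    return displacy.render(doc, style="ent", jupyter=False)


try:
    import spacy

    # Assumption: the small English model has been downloaded.
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError) as err:
    print(f"spaCy or the English model is unavailable: {err}")
else:
    doc = nlp("What will the weather be in Cape Town on Friday?")
    html = entities_html(doc)
    print(len(html), "characters of HTML generated")
```

Inside a Jupyter notebook, calling `displacy.render(doc, style="ent", jupyter=True)` displays the highlighted text inline instead of returning a string.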