IBM Watson Natural Language Understanding API (Part 2 of 2)
This is a continuation of the initial Watson NLU API article. In this article I look at Semantic Roles, Sentiment and Syntax.
Syntax is still marked as Experimental within the IBM documentation.
However, I find this section especially interesting and powerful for processing a large dialog.
Semantic Roles parses a sentence into a subject, action and object form. The sentence “John is running a race for his club.” returns “John” as the subject of the sentence, with the verb normalized to “run” from the text “running” and the tense identified as present.
The semantic roles request can be enhanced by setting “keywords” to true; the default is false. Entities can likewise be included in the response by setting “entities” to true; again, the default is false.
Here is an example of an augmented request payload.
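A payload of this shape might look as follows. This is a minimal sketch of the JSON body sent to the NLU analyze endpoint; the field names reflect my reading of the API documentation, and the example text is the sentence from above.

```python
import json

# Sketch of an augmented /v1/analyze request body: the semantic_roles
# feature with the optional keywords and entities flags switched on
# (both default to false in the API).
payload = {
    "text": "John is running a race for his club.",
    "features": {
        "semantic_roles": {
            "keywords": True,
            "entities": True,
        }
    },
}

print(json.dumps(payload, indent=2))
```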
Keywords returned are: John, race. Entity listed is Person: John. The action verb is run and the tense is present.
Sentiment analyzes the general sentiment of the content, and also sentiment toward specific target phrases.
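Targeted sentiment is requested by listing the target phrases in the feature options. Again a hedged sketch: the `document` and `targets` fields are as I understand the API, and the example sentence and targets are my own.

```python
import json

# Sketch of a sentiment request: document-level sentiment plus
# sentiment toward two specific target phrases (illustrative text).
payload = {
    "text": "I love the new interface, but the login flow is frustrating.",
    "features": {
        "sentiment": {
            "document": True,
            "targets": ["interface", "login flow"],
        }
    },
}

print(json.dumps(payload, indent=2))
```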
Tokens: Tokenization is the process of parsing text into smaller units, referred to as tokens. Tokens are a sequence of characters that are semantically meaningful units.
Here’s an example sentence with its tokens.
Often tokenization is the initial task performed in an NLP pipeline. Tokens are useful for a range of downstream applications, for instance part-of-speech tagging, dependency parsing, lemmatization and more.
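To make the idea concrete, here is a deliberately naive tokenizer, not the one Watson uses, just a regex sketch that splits words from punctuation:

```python
import re

def tokenize(text):
    """Naive tokenizer: runs of word characters, or single punctuation
    marks. A stand-in for the language-aware tokenization Watson performs."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("John is running a race for his club."))
# ['John', 'is', 'running', 'a', 'race', 'for', 'his', 'club', '.']
```

Even this crude version shows why tokenizer quality matters: anything it gets wrong propagates into every feature built on top of it.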
The quality of the higher order feature you are building will ultimately depend on how good your tokenizer for the language is.
Lemma: The words you find in a dictionary are lemmas: the base form, or root form of words.
Lemmatization is commonly used in information retrieval systems and search engines while building indexes. Words like editing, commenting and programming can be converted to their root forms (lemmas) before being added to the search index. At query time the text is normalized in the same way and compared with the index.
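A toy illustration of that index-building step, using the words from above. Real lemmatizers rely on dictionaries and morphological analysis; this lookup table is purely for demonstration.

```python
# Toy lemma lookup table -- a real system derives these from
# morphological rules and a dictionary, not a hand-written map.
LEMMAS = {"editing": "edit", "commenting": "comment", "programming": "program"}

def lemmatize(token):
    """Return the lemma for a token, falling back to the lowercased token."""
    return LEMMAS.get(token.lower(), token.lower())

index_terms = [lemmatize(t) for t in ["Editing", "commenting", "programming"]]
print(index_terms)  # ['edit', 'comment', 'program']
```

Because the same normalization runs at query time, a search for “edits” or “editing” can land on the same index entry.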
Part of Speech tagging labels all the tokens in a text with a part of speech such as noun, verb or adjective. In almost all languages, certain words can mean different things depending on the context, and this is where part of speech tagging is very useful.
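In Watson NLU, part-of-speech tags and lemmas are requested through the syntax feature. A sketch of such a request body follows; the `sentences`, `tokens`, `lemma` and `part_of_speech` field names are as I read the documentation for the (still experimental) Syntax API.

```python
import json

# Sketch of a syntax feature request: sentence boundaries plus
# per-token lemmas and part-of-speech tags.
payload = {
    "text": "John is running a race for his club.",
    "features": {
        "syntax": {
            "sentences": True,
            "tokens": {"lemma": True, "part_of_speech": True},
        }
    },
}

print(json.dumps(payload, indent=2))
```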
Sentence Boundary Detection is an essential initial step in building higher order NLP/NLU features. For example, to determine the sentiment of a paragraph with multiple sentences, you first have to identify where individual sentences start and end.
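A minimal sketch of the idea, splitting on terminal punctuation. This is nowhere near production quality (abbreviations, decimals and quotes all break it), but it shows the step that must happen before per-sentence sentiment can be computed. The dialog text is my own example.

```python
import re

def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ?
    followed by whitespace. Real detectors handle abbreviations,
    decimals and quotes as well."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

dialog = "I need to change my flight. It leaves on Friday! Can you help?"
print(split_sentences(dialog))
# ['I need to change my flight.', 'It leaves on Friday!', 'Can you help?']
```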
The sentences were extracted from the dialog as per the example to the left.
This is extremely handy should users enter a longer dialog consisting of a few sentences, often with a whole host of intents and entities embedded in it. Any developer wanting to build a robust and resilient conversational interface needs to make provision for highly unstructured user input.
It makes sense to pass the dialog through a higher order NLP layer like this and establish some basic fundamentals about the dialog. Based on the output from the NLP layer, the NLU layer can more accurately assess the intents and entities embedded in the dialog.
The NLP layer can also be used for post-processing analysis, in the case of tone, sentiment and semantics.
Read more here:
Rasa Open Source Conversational AI:
New Syntax API in Watson Natural Language Understanding: