IBM Watson Assistant Fix Spelling
Why is conversational input a challenge? For starters, it is highly unstructured data, entered through an interface that is most likely a chatbot or digital assistant.
Some more background on unstructured data…
Unstructured data is information that either does not have a predefined data model or is not organized in a predefined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult for traditional programs, which do not use Natural Language Processing, to understand.
In 1998, Merrill Lynch cited a rule of thumb that somewhere around 80–90% of all potentially usable business information may originate in unstructured form. IBM had a very similar projection. Computerworld magazine states that unstructured information might account for more than 70–80% of all data in organizations.
Watson Assistant can correct user input…
You need to enable the Autocorrection beta feature to fix spelling mistakes that users make in the utterances they submit as input. When Autocorrection is enabled, misspelled words are corrected automatically, and it is the corrected words that are used to evaluate the input. Given more precise input, your assistant can more often recognize entity mentions and understand the user’s intent. There are boundaries, however. What the user types still has to resemble what they meant to say: an utterance can be cryptic and incomplete, but it cannot stray too far from a basic expression of the user’s intent.
This setting can be enabled for English-language dialog skills only.
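Besides the toggle in the tooling, spelling correction can apparently also be requested per message through the v1 API. The snippet below is only a sketch of what such a request body might look like. The field names `spelling_suggestions` and `spelling_auto_correct` are taken from the v1 `MessageInput` schema as best recalled, and the helper `build_message_payload` is hypothetical, so check both against the current API reference before relying on them.

```python
import json


def build_message_payload(text, autocorrect=True):
    """Build a v1-style /message request body with spelling options set.

    Hypothetical helper: it only assembles a dict and does not call the service.
    """
    return {
        "input": {
            "text": text,
            # Assumed v1 MessageInput fields; verify against the API reference.
            "spelling_suggestions": True,
            "spelling_auto_correct": autocorrect,
        }
    }


payload = build_message_payload("I want to chek my acount balance")
print(json.dumps(payload, indent=2))
```

The dict would be sent as the JSON body of a message request for an English-language dialog skill; the misspellings in the sample text are deliberate.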
According to IBM, when Watson Assistant evaluates whether to correct the spelling of a word, it does not rely on a simple dictionary lookup process. Instead, it uses a combination of Natural Language Processing and probabilistic models to assess whether a term is, in fact, misspelled and should be corrected.
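To make the contrast with a plain dictionary lookup concrete, here is a toy frequency-based corrector. It is not IBM's model, which is not public in this level of detail. It is a minimal sketch of the general idea that candidate corrections are ranked by how probable they are, with a tiny made-up frequency table (`WORD_COUNTS`) standing in for a real language model.

```python
from collections import Counter

# Hypothetical word frequencies standing in for a real language model.
WORD_COUNTS = Counter(
    {"weather": 50, "whether": 30, "forecast": 40, "the": 500, "for": 300}
)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in WORD_COUNTS:
        return word  # Known words are left alone.
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # Nothing plausible nearby, so do not guess.
    return max(candidates, key=lambda w: WORD_COUNTS[w])


print(correct("forcast"))
print(correct("wether"))
```

For "wether", both "weather" and "whether" are one edit away; the sketch picks whichever is more frequent in its table, whereas a real system would also weigh the surrounding words.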
IBM Watson does not correct the spelling of the following types of input:
- Capitalized words
- Location entities, such as states and street addresses
- Numbers and units of measurement or time
- Proper nouns, such as common first names or company names
- Text within quotation marks
- Words containing special characters, such as hyphens (-), asterisks (*), ampersands (&), or at signs (@), including those used in email addresses or URLs.
- Words that belong in this skill, meaning words that have implied significance because they occur in entity values, entity synonyms, or intent user examples.
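The exclusions listed above can be approximated with a small filter. This is a rough sketch under stated assumptions, not Watson's actual logic: `SKILL_VOCABULARY` is a hypothetical stand-in for the words in your entity values, synonyms, and intent examples, and location entities and proper nouns are only caught here when they happen to be capitalized, since detecting them properly needs entity recognition.

```python
import re

# Hypothetical skill vocabulary: entity values, synonyms, intent-example words.
SKILL_VOCABULARY = {"watson", "chatbot", "premium"}

# Characters that mark a token as off limits (also covers emails and URLs).
SPECIAL_CHARS = re.compile(r"[-*&@/:]")


def exempt_reason(token):
    """Return why a token would be skipped, or None if it may be corrected."""
    if SPECIAL_CHARS.search(token):
        return "special characters"
    if any(ch.isdigit() for ch in token):
        return "number or unit"
    if token[:1].isupper():
        return "capitalized word"
    if token.lower() in SKILL_VOCABULARY:
        return "skill vocabulary"
    return None


def correction_candidates(utterance):
    """List the tokens of an utterance that a spell checker may touch."""
    # Text inside double quotation marks is never corrected, so drop it first.
    unquoted = re.sub(r'"[^"]*"', " ", utterance)
    candidates = []
    for raw in unquoted.split():
        token = raw.strip(".,!?")  # Ignore trailing sentence punctuation.
        if token and exempt_reason(token) is None:
            candidates.append(token)
    return candidates


print(correction_candidates(
    'plese email bob@example.com about my Premium acount at 5pm "dont touch"'
))
```

Only the tokens that survive this filter would be handed to the corrector; everything else is passed through exactly as the user typed it.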
Autocorrection helps here: you do not have to make provision for an endless list of misspelled variations.
Mentions of a contextual entity can be corrected inadvertently. That’s because terms that function as contextual entity mentions are fluid; they cannot be predetermined and avoided by the spell checker function in the way a list of dictionary-based terms can be.
If, after testing, you find that mentions are being overcorrected for a certain contextual entity, consider using a dictionary-based entity in its place.
When you test in the “Try it out” window and enter misspelled words, they are corrected automatically and an icon is displayed. The corrected utterance is underlined.