IBM Watson Assistant: Fuzzy Matching
Chatbots using Stemming, Misspelling & Partial Match
First, Fuzzy Matching and Autocorrection
Fuzzy matching helps your chatbot recognize dictionary-based entity mentions in user input. It uses a dictionary lookup approach to match a word from the user input to an existing entity value or synonym in the skill’s training data.
For example, if the user enters the word “ruun”, and your training data contains an “exercise” entity with a “run” value, then fuzzy matching recognizes that the two terms (“ruun” and “run”) mean the same thing.
When you enable both autocorrection and fuzzy matching, fuzzy matching runs before autocorrection is triggered.
If fuzzy matching finds a term that it can match to an existing dictionary entity value or synonym, it adds the term to the list of protected words that belong to the skill, so autocorrection does not correct it.
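The sketch below illustrates the idea only; it does not reproduce Watson Assistant's internal algorithm. It assumes a plain Levenshtein edit-distance threshold over a small dictionary, and the names `ENTITY_DICTIONARY`, `fuzzy_lookup`, and the `max_distance` cutoff of one edit are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical dictionary: surface term -> (entity, value)
ENTITY_DICTIONARY = {
    "run": ("exercise", "run"),
    "swim": ("exercise", "swim"),
    "jog": ("exercise", "jog"),
}


def fuzzy_lookup(word: str, max_distance: int = 1):
    """Return (entity, value) of the closest term within max_distance edits, else None."""
    best, best_dist = None, max_distance + 1
    for term, target in ENTITY_DICTIONARY.items():
        d = edit_distance(word.lower(), term)
        if d < best_dist:
            best, best_dist = target, d
    return best


print(fuzzy_lookup("ruun"))  # ('exercise', 'run')
print(fuzzy_lookup("bake"))  # None
```

A single-edit threshold keeps “ruun” close to “run” while rejecting unrelated words; a real system would also need to weigh word length and ambiguity between several nearby terms.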
For example, if a user enters a sentence like “I wnt to go for a ruun”, fuzzy matching recognizes that “ruun” is a mention of your “exercise” entity (value “run”) and adds it to the protected words list. Your assistant then corrects the input to “I want to go for a ruun”. Notice that it corrects “wnt” but leaves the spelling of “ruun” unchanged. If you see this type of result while testing your dialog, you might think your assistant is misbehaving.
It is not. Thanks to fuzzy matching, it correctly identifies “ruun” as a mention of the “exercise” entity. And thanks to autocorrection revising “wnt” to “want”, your assistant is able to map the input to the correct intent. Each feature does its part to help your assistant understand the meaning of the user input.
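To make the ordering concrete, here is a minimal sketch of “fuzzy matching first, then autocorrection with protected words”. It is not IBM's implementation: the one-edit threshold, the tiny `VOCABULARY` used for autocorrection, and the helper names `fuzzy_match`, `autocorrect`, and `process` are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical entity dictionary: term -> (entity, value)
ENTITY_TERMS = {"run": ("exercise", "run")}

# Hypothetical autocorrection vocabulary (a real one would be far larger)
VOCABULARY = {"i", "want", "to", "go", "for", "a", "run"}


def fuzzy_match(token: str):
    """Step 1: map a token to an entity if it is within one edit of a dictionary term."""
    for term, target in ENTITY_TERMS.items():
        if edit_distance(token.lower(), term) <= 1:
            return target
    return None


def autocorrect(token: str) -> str:
    """Step 2: replace an unknown token with the closest vocabulary word (one edit at most)."""
    word = token.lower()
    if word in VOCABULARY:
        return token
    best = min(sorted(VOCABULARY), key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= 1 else token


def process(utterance: str):
    tokens = utterance.split()
    protected, entities = set(), []
    # Fuzzy matching runs first; every token it matches becomes protected
    for tok in tokens:
        target = fuzzy_match(tok)
        if target is not None:
            protected.add(tok)
            entities.append((tok, target))
    # Autocorrection runs afterwards and skips protected tokens
    corrected = [tok if tok in protected else autocorrect(tok) for tok in tokens]
    return " ".join(corrected), entities


text, found = process("I wnt to go for a ruun")
print(text)   # I want to go for a ruun
print(found)  # [('ruun', ('exercise', 'run'))]
```

Without the protected set, `autocorrect("ruun")` would return "run", so the entity mention would be rewritten rather than kept as the user typed it.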
According to IBM’s documentation, fuzzy matching has these components:
- Stemming: The feature recognizes the stem form of entity values that have several grammatical forms. For example, the stem of ‘bananas’ is ‘banana’, and the stem of ‘running’ is ‘run’.
- Misspelling: The feature can map user input to the correct entity despite misspellings or slight syntactic differences. For example, if you define giraffe as a synonym for an animal entity, and the user input contains the terms giraffes or girafe, fuzzy matching maps the term to the animal entity correctly.
- Partial match: With partial matching, the feature automatically suggests substring-based synonyms present in the user-defined entities, and assigns them a lower confidence score than an exact entity match.
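The three components can be illustrated with a toy matcher. This is a sketch under explicit assumptions, not Watson's stemmer or scoring: `naive_stem` is a crude suffix stripper, the misspelling check is a one-edit Levenshtein threshold, partial matching is simplified to “the input is one word of a multi-word term”, and the `CONFIDENCE` numbers are invented purely to show that weaker matches score lower.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical entity dictionary: term -> (entity, value)
DICTIONARY = {
    "banana": ("fruit", "banana"),
    "giraffe": ("animal", "giraffe"),
    "run": ("exercise", "run"),
    "bank of america": ("bank", "bank of america"),
}

# Hypothetical confidence levels, only meant to order the match types
CONFIDENCE = {"exact": 1.0, "stem": 0.9, "misspelling": 0.8, "partial": 0.5}


def naive_stem(word: str) -> str:
    """Very rough English suffix stripping (an assumption, not Watson's stemmer)."""
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # "running" -> "runn" -> "run": undo a doubled final consonant
        if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def match(word: str):
    """Return ((entity, value), match_type, confidence), or None if nothing matches."""
    w = word.lower()
    if w in DICTIONARY:
        return DICTIONARY[w], "exact", CONFIDENCE["exact"]
    stem = naive_stem(w)
    if stem in DICTIONARY:
        return DICTIONARY[stem], "stem", CONFIDENCE["stem"]
    # Misspelling: single-word terms within one edit of the stemmed input
    for term, target in DICTIONARY.items():
        if " " not in term and edit_distance(stem, term) <= 1:
            return target, "misspelling", CONFIDENCE["misspelling"]
    # Partial match: the input is one word of a multi-word term
    for term, target in DICTIONARY.items():
        words = term.split()
        if len(words) > 1 and len(w) >= 3 and w in words:
            return target, "partial", CONFIDENCE["partial"]
    return None


for w in ["bananas", "running", "girafe", "america", "zebra"]:
    print(w, match(w))
```

The checks run from most to least certain, so a word that is both a stem match and within one edit of a term is reported with the higher-confidence label.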
Note that, for English at least, if a word is defined as a value or synonym of an entity, then the defined word will always be matched to the entity it is defined in.
Fuzzy matching has no impact on synonym recommendations. So the order of precedence is: first, what you literally defined as examples and synonyms; then fuzzy matching; and then autocorrection.
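That precedence can be sketched as a three-step resolver. The data is hypothetical: “rum” is assumed to be a literal synonym of a `drink` entity while being one edit away from “run”, which shows why a literally defined word is matched to its own entity rather than fuzzily to a neighbour. The helper name `resolve` and the one-edit threshold are likewise assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical skill data: both terms are defined literally in the training data
DICTIONARY = {
    "run": ("exercise", "run"),
    "rum": ("drink", "rum"),  # one edit away from "run"
}
VOCABULARY = {"go", "want"}  # hypothetical autocorrection vocabulary


def resolve(word: str):
    w = word.lower()
    # 1. Literally defined values and synonyms always win
    if w in DICTIONARY:
        return "defined", DICTIONARY[w]
    # 2. Fuzzy matching against the entity dictionary
    for term, target in sorted(DICTIONARY.items()):
        if edit_distance(w, term) <= 1:
            return "fuzzy", target
    # 3. Autocorrection against the general vocabulary
    for candidate in sorted(VOCABULARY):
        if edit_distance(w, candidate) <= 1:
            return "autocorrected", candidate
    return "unchanged", word


print(resolve("rum"))   # ('defined', ('drink', 'rum'))
print(resolve("ruun"))  # ('fuzzy', ('exercise', 'run'))
print(resolve("wnt"))   # ('autocorrected', 'want')
```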