Using Pinecone For Question Answering with Similarity Search

And How To Create A Knowledge Base Chatbot


Knowledge Bases & Chatbots

Adding search to chatbots is something that has been in the works for quite a while.

IBM Watson Assistant has the option to incorporate Watson Discovery. Data can be uploaded in JSON format and then queried in natural language. The idea is that the input from the user can be passed directly to the knowledge base.

As seen above, the first input from the user goes to an intent based dialog skill. The second question cannot be fielded by the chatbot dialog skill and is sent to the knowledge base.
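The hand-off described above can be sketched as a simple confidence-based router. This is only an illustration of the idea, not how Watson Assistant is implemented: the `classify_intent` and `search_knowledge_base` callables, the 0.6 threshold and the toy data are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cut-off below which the dialog skill is treated as unable to answer.
CONFIDENCE_THRESHOLD = 0.6


def route(
    utterance: str,
    classify_intent: Callable[[str], Tuple[str, float]],
    dialog_responses: Dict[str, str],
    search_knowledge_base: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, answer): try the intent-based skill first, then fall back to search."""
    intent, confidence = classify_intent(utterance)
    if confidence >= CONFIDENCE_THRESHOLD and intent in dialog_responses:
        return "dialog", dialog_responses[intent]
    return "knowledge_base", search_knowledge_base(utterance)


# Toy stand-ins so the sketch runs on its own.
def toy_classifier(utterance: str) -> Tuple[str, float]:
    if "balance" in utterance.lower():
        return "check_balance", 0.92
    return "unknown", 0.10


def toy_search(utterance: str) -> str:
    return f"Top knowledge base passage for: {utterance!r}"


responses = {"check_balance": "Your balance is available under Accounts."}
first = route("What is my balance?", toy_classifier, responses, toy_search)
second = route("How do vector databases work?", toy_classifier, responses, toy_search)
print(first)
print(second)
```

The first utterance is handled by the dialog skill; the second has no confident intent, so it is passed to the knowledge base unchanged.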

The OpenAI language API (below) does an exceptional job at answering general questions, maintaining context across dialog turns, and generating natural responses (NLG). Fine-tuning can also be performed; read more about it here.

Rasa has knowledge base actions which are highly configurable. Lastly, Cisco MindMeld has a solution for question answering which by default uses Elasticsearch, a full-text search and analytics engine, for information retrieval.

Three Levels Of Question Answering

Level 1

The first level of defining questions and answers is to use traditional chatbot development affordances: intents, entities, dialog trees and response messages.


Advantages

  • The process and approach form part of the existing chatbot development process.
  • It integrates easily with existing chatbot journeys, and acts as an extension of current conversational functionality.
  • QnA experiences can be transformed into an integrated journey.


Disadvantages

  • Maintenance intensive in terms of NLU (intents & entities), dialog state management and dialog management.
  • Does not scale well with large amounts of dynamic data.
  • Semantic search is more adept at finding one or more matching answers.

Level 2

The second level is where a custom knowledge base is set up. This can be done via various means: Elasticsearch, Watson Discovery, Rasa knowledge base actions, the OpenAI Language API with fine-tuning, or Pinecone.

Pinecone takes a unique approach to search and is extremely flexible, but more detail on Pinecone later.

Level 2 knowledge bases are focussed on domain-specific data, and on loading that data and making it searchable.

A challenge with a level 2 knowledge base is having an effective message abstraction layer. Response messages should be flexible: a portion of a response might be more appropriate for a specific question, or two or more messages might need to be merged for a more accurate response.


Advantages

  • Scales well with large bodies of data which change continuously.
  • Lower maintenance, as the incorporated search options take care of data retrieval.
  • Benefits from advances in semantic search, vector databases and more.
  • Knowledge bases reduce chatbot fallback proliferation, as there is most probably a domain-related answer to the question.


Disadvantages

  • More demanding in terms of technical skills.
  • Cost might be a consideration.
  • An additional dimension is added to the Conversational AI landscape to manage.

Level 3

Level 3 could be seen as instances where general, non-domain-specific questions can be asked, and where a vast general knowledge base needs to be leveraged. This can be Wikipedia, GPT-3, etc.

OpenAI’s Language API does a good job of fielding general knowledge questions in a very natural way, with no dialog or messaging management. In the image below, random general questions are fielded in short, well-formed sentences.

NVIDIA Riva has a general Question and Answer chatbot where Wikipedia is leveraged, as seen from the Notebook example below.

Two Challenges with Conversational AI & Knowledge Bases

Natural Language Generation

As mentioned previously, NLG remains a challenge for knowledge bases. In many cases the response messages are defined and segmented within the Knowledge Base.

But what if you require only a portion of one message, or portions of multiple messages to be merged? This is where OpenAI works its magic.

Having an intelligent messaging abstraction layer for the knowledge base is crucial.
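To make the idea of a messaging abstraction layer concrete, here is a deliberately crude sketch. It is extractive rather than generative (it does not use OpenAI or any language model): it splits retrieved messages into sentences and keeps the ones that share the most words with the question. The function names and sample messages are hypothetical, and word overlap is only a stand-in for real semantic matching.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for real semantic matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(message: str) -> List[str]:
    """Split a stored response message into sentences on ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", message) if s.strip()]


def compose_answer(question: str, messages: List[str], max_sentences: int = 2) -> str:
    """Merge the sentences, across all messages, that overlap most with the question."""
    q_tokens = tokenize(question)
    scored = []
    for message in messages:
        for sentence in split_sentences(message):
            overlap = len(q_tokens & tokenize(sentence))
            if overlap > 0:
                # Keep insertion order as a tie-breaker.
                scored.append((overlap, len(scored), sentence))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return " ".join(sentence for _, _, sentence in scored[:max_sentences])


messages = [
    "Our store opens at 9am on weekdays. Parking is free for customers.",
    "Returns are accepted within 30 days. The store is closed on public holidays.",
]
answer = compose_answer("When does the store open and is it closed on holidays?", messages)
print(answer)
```

The answer here is assembled from one sentence of each stored message, which is the kind of partial selection and merging the section describes. A production layer would rank by meaning rather than by shared words.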

Semantic Search

Searching for matching words, keywords or phrases is not ideal. The goal should be an NLU-like search approach, in other words searching in natural language.

Semantic search is not a literal search for matching words or combinations of words; it is searching by meaning.

Semantic search improves the search process by understanding the intent and the contextual meaning of the search.

Pinecone’s Question Answering with Similarity Search

With Pinecone you can create vector databases and build high-speed vector search applications. The advantages are that it is:

  • Developer-friendly,
  • Fully managed, and
  • Easily scalable without infrastructure hassles.

Initially it might be a bit abstract to get your head around, so this article walks through the simplest possible application leveraging Pinecone.

Pinecone, as a vector database, needs a data source on the one side, and an application to query and search the vector embeddings on the other.

With Pinecone, you can write a question answering application in three steps:

  • Represent questions as vector embeddings. Semantically similar questions are in close proximity within the same vector space.
  • Index vectors using Pinecone.
  • Query the index to fetch similar questions.
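The three steps above can be sketched without any external service. This is a toy, in-memory stand-in and not Pinecone's client: `ToyIndex`, its `upsert` and `query` methods, and the word-count `embed` function are all hypothetical. A real application would use a sentence-embedding model for step one and a Pinecone index for steps two and three.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real application would use a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory stand-in for a vector index (hypothetical; not Pinecone's client)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Counter] = {}

    def upsert(self, items: List[Tuple[str, Counter]]) -> None:
        for item_id, vector in items:
            self._vectors[item_id] = vector

    def query(self, vector: Counter, top_k: int = 2) -> List[Tuple[str, float]]:
        scores = [(item_id, cosine(vector, v)) for item_id, v in self._vectors.items()]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


questions = {
    "q1": "How do I reset my password?",
    "q2": "What is the best way to learn Python?",
    "q3": "How can I change my account password?",
}

# Step 1: represent questions as vectors.
vectors = [(qid, embed(text)) for qid, text in questions.items()]

# Step 2: index the vectors.
index = ToyIndex()
index.upsert(vectors)

# Step 3: query the index to fetch similar questions.
matches = index.query(embed("I forgot my password, how do I reset it?"), top_k=2)
for qid, score in matches:
    print(qid, round(score, 3), questions[qid])
```

Because the toy embedding only counts words, it rewards shared vocabulary rather than shared meaning. Swapping in real embeddings is what turns the same three steps into semantic search.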

The tutorial to create the semantic similarity search can be found here. A few things to look out for:

  • You need to start with:
!pip install -qU pip pinecone-client
  • The API key can be generated within the Pinecone console:

On the free tier you are only entitled to one pod. If you have an existing one, and try to create an additional pod via the notebook, it will fail.

You will have to head back to the Pinecone console and delete the existing pod there.
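A notebook can guard against this failure by checking what already exists before creating anything. The sketch below assumes a client object exposing `list_indexes`, `create_index` and `delete_index`; that interface is a hypothetical one written for this example, so check the method names and arguments against the Pinecone client version you have installed. `FakeClient` exists only so the sketch runs without an account.

```python
from typing import List, Protocol


class IndexClient(Protocol):
    """Hypothetical minimal interface; adapt it to whatever your Pinecone client exposes."""

    def list_indexes(self) -> List[str]: ...

    def create_index(self, name: str, dimension: int) -> None: ...

    def delete_index(self, name: str) -> None: ...


def create_index_on_free_tier(
    client: IndexClient, name: str, dimension: int, replace_existing: bool = False
) -> bool:
    """Create `name` if possible. Returns True if created, False if it already existed."""
    existing = list(client.list_indexes())
    if name in existing:
        return False  # Reuse the index that is already there.
    if existing:
        if not replace_existing:
            raise RuntimeError(
                f"Free tier slot already used by {existing}; delete it first "
                "or pass replace_existing=True."
            )
        for old_name in existing:
            client.delete_index(old_name)  # Destructive: the old data is lost.
    client.create_index(name, dimension)
    return True


class FakeClient:
    """In-memory stand-in so the sketch runs without a Pinecone account."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)

    def list_indexes(self) -> List[str]:
        return list(self.names)

    def create_index(self, name: str, dimension: int) -> None:
        self.names.append(name)

    def delete_index(self, name: str) -> None:
        self.names.remove(name)


client = FakeClient(["old-index"])
created = create_index_on_free_tier(client, "question-answering", 300, replace_existing=True)
print(created, client.names)
```

Deleting by default would be risky, so the sketch raises an error unless `replace_existing=True` is passed explicitly.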

As seen above, the df.head() method is useful for quickly checking that your object has the right type of data in it.

It only takes a few minutes to run through the notebook and reach the desired results.

Above, the question asked is marked in red…

The results, scores and IDs are listed above.


The solution to a flexible chatbot does not lie in the extremes, but in a compromise: a more rigid dialog state environment for product- and service-specific conversations, a knowledge base for domain-specific searches, and lastly a search approach leveraging the web, Wikipedia, GPT-3, etc.

Haystack lets you write your search code once and choose the search engine you want it to run on. The quick configuration dials are reminiscent of the HumanFirst interface to some degree.


