Pinecone In The Simplest Terms
And How To Leverage Semantic Search
The idea behind Pinecone and its use cases may seem abstract to some. Here is an attempt to demystify the purpose of Pinecone and to illustrate an implementation in its simplest form.
Pinecone gives you access to powerful vector databases. You can upload data to these vector databases from various sources, then perform true semantic searches on the data that return highly accurate results.
As seen above, Pinecone has integrations with OpenAI, Haystack and co:here. Custom integrations are, of course, also possible.
Pinecone allows data to be uploaded into a vector database on which true semantic search can be performed.
Conversational data is not only highly unstructured, it can also be complex. Vector search and vector databases allow for similarity searches over such data.
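Conceptually, a similarity search is a nearest-neighbour lookup in vector space. A minimal brute-force sketch is shown below (the document ids and vectors are made up for illustration; Pinecone replaces this linear scan with an indexed, scalable service):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity: closeness of two vectors by angle, independent of magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k=3):
    # items: list of (id, vector) pairs; return the ids of the k most similar vectors.
    ranked = sorted(items, key=lambda item: cosine(query, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Toy data: "doc1" points the same way as the query, "doc3" is close, "doc2" is orthogonal.
items = [("doc1", [1.0, 0.0]), ("doc2", [0.0, 1.0]), ("doc3", [0.7, 0.7])]
print(top_k([1.0, 0.0], items, k=2))  # ['doc1', 'doc3']
```

A vector database does this kind of lookup at scale, over millions of embedding vectors, using approximate nearest-neighbour indexes rather than a linear scan.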
In the words of Pinecone…
Semantic search: Vector databases store and index vector embeddings from Natural Language Processing models to understand the meaning and context of strings of text, sentences, and whole documents, for more accurate and relevant search results.
Searching in natural language to find relevant results works much better than requiring users to know the specifics of the data.
With Pinecone, vector databases for vector-search applications can be built easily.
The Pinecone management console is minimalistic and effective, allowing users to manage their environment.
API keys, which are used in notebooks or applications, are managed here. A free tier is available for experimentation.
As seen below, Pods can be managed to some extent within the console.
The Simplest Pinecone Demo
This is the simplest Pinecone demo application; it is an excellent way to understand and grasp the basic concepts.
The Pinecone client is installed, the index is initialised, and a small set of vectors is created and upserted.
pip install pinecone-client

import pinecone
import os

pinecone.init(api_key="xxxxxxxx-xxxx-xxxx-xxx-xxxx", environment="us-west1-gcp")
index = pinecone.Index(index_name="pod1")

import pandas as pd

df = pd.DataFrame(
    data={
        "id": ["A", "B", "C", "D", "E"],
        "vector": [
            [1., 1., 1.],
            [1., 2., 3.],
            [2., 1., 1.],
            [3., 2., 1.],
            [2., 2., 2.]]
    })
df

index.upsert(vectors=zip(df.id, df.vector))
index.describe_index_stats()
index.query(
    queries=[[1., 2., 3.]],
    top_k=3,
    include_values=True)
The index is queried with the vector [1, 2, 3], and the top three matches are returned, each with a score and its values.
{'matches': [],
'namespace': '',
'results': [{'matches': [{'id': 'B',
'score': 0.99999994,
'values': [1.0, 2.0, 3.0]},
{'id': 'E',
'score': 0.925820112,
'values': [2.0, 2.0, 2.0]},
{'id': 'A',
'score': 0.925820112,
'values': [1.0, 1.0, 1.0]}],
'namespace': ''}]}
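The scores in the response are cosine similarities, and can be reproduced by hand for the demo vectors. A quick sketch (assuming the index uses Pinecone's default cosine metric; small deviations such as 0.99999994 for an exact match come from float32 precision on the server):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

query = [1.0, 2.0, 3.0]
vectors = {"A": [1.0, 1.0, 1.0], "B": [1.0, 2.0, 3.0], "C": [2.0, 1.0, 1.0],
           "D": [3.0, 2.0, 1.0], "E": [2.0, 2.0, 2.0]}

# Round to absorb floating-point noise when comparing scores.
scores = {vec_id: round(cosine(query, vec), 9) for vec_id, vec in vectors.items()}
print(scores["B"])                 # 1.0: B is identical to the query
print(scores["A"] == scores["E"])  # True: A and E tie, as in the response above
```

The ranking B first, with A and E tied behind it, matches the query response.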
Semantic search denotes search with meaning, as distinguished from lexical search where the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query.
Conclusion
Pinecone enables large bodies of data to be searched. A number of search options are available, but text search and question answering are the most closely related to NLP.
The example applications include semantic text search, question answering, video recommendations, audio similarity search, personalised article recommendation and more.
Pinecone has integrations with OpenAI, co:here and Haystack.
For example, the co:here demo utilises co:here to generate language embeddings, which can then be stored in Pinecone and used for semantic search.