sherpa_ai.connectors package#
Overview#
The connectors package provides interfaces for Sherpa AI to connect with external systems and databases. It includes specialized connectors for vector stores and other data-persistence mechanisms required for retrieval-augmented generation and knowledge storage.
Key Components#
- Base: Abstract interface for connector implementations
- ChromaVectorStore: Implementation for the Chroma vector database
- VectorStores: Generic interfaces for vector database interactions
Example Usage#
from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore

# Initialize a vector store
vector_store = ChromaVectorStore(
    collection_name="documents",
    embedding_function=embedding_fn,
)

# Store documents
documents = [
    "Sherpa AI is a framework for building intelligent agents.",
    "Vector databases store vector embeddings for semantic search.",
]
vector_store.add_texts(documents, metadatas=[{"source": "docs"} for _ in documents])

# Retrieve similar documents
results = vector_store.similarity_search("How do I build an agent?", k=2)
print(results)
Submodules#
Module | Description
---|---
sherpa_ai.connectors.base | Abstract base classes defining the connector interface.
sherpa_ai.connectors.chroma_vector_store | Implementation for the Chroma vector database with document storage.
sherpa_ai.connectors.vectorstores | Generic interfaces and utilities for vector database interactions.
sherpa_ai.connectors.base module#
- class sherpa_ai.connectors.base.BaseVectorDB(db)[source]#
Bases: ABC
Abstract base class for vector database connectors.
This class defines the interface that all vector database connectors must implement, providing methods for similarity search operations.
- db#
The underlying database connection or client.
Example
>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> # ChromaVectorStore implements BaseVectorDB
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("query", number_of_results=5)
- abstractmethod similarity_search(query, number_of_results, k, session_id=None)[source]#
Perform a similarity search in the vector database.
This method searches for documents that are semantically similar to the query.
- Parameters:
query (str) – The search query.
number_of_results (int) – The number of results to return.
k (int) – The number of nearest neighbors to consider.
session_id (str, optional) – Session ID to filter results. Defaults to None.
- Returns:
A list of documents that match the query.
- Return type:
List[Document]
Example
>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("What is machine learning?", number_of_results=5)
>>> for doc in results:
...     print(doc.page_content[:100])
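To make the abstract interface concrete, here is a minimal in-memory sketch of a `BaseVectorDB` subclass. It is hypothetical: the `InMemoryVectorDB` class, the `embed` callable, and the brute-force cosine search are illustrative stand-ins, not part of Sherpa AI, and the base class below only mirrors the interface documented here.

```python
import math
from abc import ABC, abstractmethod


class BaseVectorDB(ABC):
    """Stand-in mirroring sherpa_ai.connectors.base.BaseVectorDB (interface only)."""

    def __init__(self, db):
        self.db = db  # the underlying database connection or client

    @abstractmethod
    def similarity_search(self, query, number_of_results, k, session_id=None):
        ...


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorDB(BaseVectorDB):
    """Hypothetical connector: brute-force cosine search over stored vectors.

    Here `db` is a list of (text, vector, session_id) tuples and `embed`
    maps a string to a vector.
    """

    def __init__(self, db, embed):
        super().__init__(db)
        self.embed = embed

    def similarity_search(self, query, number_of_results, k, session_id=None):
        query_vec = self.embed(query)
        # Optionally restrict the search to one session's rows.
        rows = [r for r in self.db if session_id is None or r[2] == session_id]
        # Rank the k nearest neighbors, then return the requested number.
        ranked = sorted(rows, key=lambda r: cosine(query_vec, r[1]), reverse=True)[:k]
        return [text for text, _, _ in ranked[:number_of_results]]
```

A toy embedding (e.g. character-frequency counts) is enough to exercise the interface; a real connector delegates both embedding and search to its database client.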
sherpa_ai.connectors.chroma_vector_store module#
sherpa_ai.connectors.vectorstores module#
- class sherpa_ai.connectors.vectorstores.ConversationStore(namespace, db, embeddings, text_key)[source]#
Bases: VectorStore
A vector store for storing and retrieving conversation data.
This class provides methods to store conversation data in a vector database and retrieve similar conversations based on queries.
- db#
The underlying database connection.
- namespace#
The namespace for the vector store.
- Type:
str
- embeddings_func#
The embedding function to use.
- text_key#
The key used to store the text in metadata.
- Type:
str
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> store.add_text("This is a conversation", {"user": "user1"})
>>> results = store.similarity_search("conversation", top_k=5)
- classmethod from_index(namespace, openai_api_key, index_name, text_key='text')[source]#
Create a ConversationStore from a Pinecone index.
This method initializes a Pinecone client and creates a ConversationStore instance connected to the specified index.
- Parameters:
namespace (str) – The namespace for the vector store.
openai_api_key (str) – The OpenAI API key.
index_name (str) – The name of the Pinecone index.
text_key (str, optional) – The key used to store the text in metadata. Defaults to “text”.
- Returns:
A new ConversationStore instance.
- Return type:
ConversationStore
- Raises:
ImportError – If the pinecone-client package is not installed.
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
- add_text(text, metadata={})[source]#
Add a single text to the vector store.
This method embeds the text, adds it to the database with the provided metadata, and returns the ID of the added text.
- Parameters:
text (str) – The text to add.
metadata (dict, optional) – Metadata to associate with the text. Defaults to {}.
- Returns:
The ID of the added text.
- Return type:
str
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> id = store.add_text("This is a conversation", {"user": "user1"})
>>> print(id)
123e4567-e89b-12d3-a456-426614174000
- property embeddings: Embeddings | None#
Access the query embedding object if available.
- add_texts(texts, metadatas)[source]#
Add multiple texts to the vector store.
This method adds each text with its corresponding metadata to the vector store.
- Parameters:
texts (Iterable[str]) – The texts to add.
metadatas (List[dict]) – The metadata for each text.
- Return type:
List[str]
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> texts = ["Text 1", "Text 2"]
>>> metadatas = [{"user": "user1"}, {"user": "user2"}]
>>> store.add_texts(texts, metadatas)
- similarity_search(text, top_k=5, filter=None, threshold=0.7)[source]#
Perform a similarity search in the vector store.
This method searches for texts that are semantically similar to the query.
- Parameters:
text (str) – The search query.
top_k (int, optional) – The number of results to return. Defaults to 5.
filter (Optional[dict], optional) – Filter criteria for the search. Defaults to None.
threshold (float, optional) – The similarity threshold. Defaults to 0.7.
- Returns:
A list of documents that match the query.
- Return type:
list[Document]
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> results = store.similarity_search("What is machine learning?", top_k=5)
>>> for doc in results:
...     print(doc.page_content[:100])
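The `threshold` parameter drops matches whose similarity score falls below the cutoff before `top_k` is applied. A minimal sketch of that filtering step, assuming scores in [0, 1] (the scoring itself is illustrative, not the store's actual query logic):

```python
def filter_by_threshold(scored_docs, top_k=5, threshold=0.7):
    """Keep only matches scoring at or above the threshold, best first.

    scored_docs: list of (document, score) pairs with score in [0, 1].
    Returns at most top_k documents, ordered by descending score.
    """
    kept = [(doc, score) for doc, score in scored_docs if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:top_k]]
```

With the default `threshold=0.7`, a match scoring 0.5 is discarded even when fewer than `top_k` results remain, so callers should expect possibly short result lists.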
- classmethod delete(namespace, index_name)[source]#
Delete all vectors in a namespace.
This method deletes all vectors in the specified namespace of the Pinecone index.
- Parameters:
namespace (str) – The namespace to delete.
index_name (str) – The name of the Pinecone index.
- Returns:
The result of the delete operation.
- Raises:
ImportError – If the pinecone-client package is not installed.
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> ConversationStore.delete("my_namespace", "my_index")
- classmethod get_vector_retrieval(namespace, openai_api_key, index_name, search_type='similarity', search_kwargs={})[source]#
Create a vector store retriever.
This method creates a ConversationStore and returns a VectorStoreRetriever for it.
- Parameters:
namespace (str) – The namespace for the vector store.
openai_api_key (str) – The OpenAI API key.
index_name (str) – The name of the Pinecone index.
search_type (str, optional) – The type of search to perform. Defaults to “similarity”.
search_kwargs (dict, optional) – Additional keyword arguments for the search. Defaults to {}.
- Returns:
A retriever for the vector store.
- Return type:
VectorStoreRetriever
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> retriever = ConversationStore.get_vector_retrieval("my_namespace", "api_key", "my_index")
>>> results = retriever.get_relevant_documents("What is machine learning?")
- classmethod from_texts(texts, embedding, metadatas)[source]#
Create a ConversationStore from a list of texts.
This method is not implemented for ConversationStore.
- Parameters:
texts (List[str]) – The texts to add.
embedding (Embeddings) – The embedding function to use.
metadatas (list[dict]) – The metadata for each text.
- Raises:
NotImplementedError – This method is not implemented for ConversationStore.
- class sherpa_ai.connectors.vectorstores.LocalChromaStore(collection_name='langchain', embedding_function=None, persist_directory=None, client_settings=None, collection_metadata=None, client=None, relevance_score_fn=None)[source]#
Bases: Chroma
A local Chroma-based vector store.
This class extends the Chroma vector store to provide additional functionality for working with local files.
Example
>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)
- classmethod from_folder(file_path, openai_api_key, index_name='chroma')[source]#
Create a Chroma DB from a folder of files.
This method creates a ChromaDB from a folder of files, currently supporting PDFs and markdown files.
- Parameters:
file_path (str) – Path to the folder containing files.
openai_api_key (str) – The OpenAI API key.
index_name (str, optional) – Name of the index. Defaults to “chroma”.
- Returns:
A new LocalChromaStore instance.
- Return type:
LocalChromaStore
Example
>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)
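The folder-ingestion pattern behind `from_folder` can be pictured as a plain directory walk that reads supported files and splits them into chunks before embedding. The sketch below is a stand-in, not Sherpa AI's loader: it handles only markdown, uses naive fixed-size chunking, and makes no Chroma or OpenAI calls.

```python
import os


def collect_folder_texts(folder, chunk_size=500):
    """Walk a folder, read markdown files, and split them into fixed-size chunks.

    PDFs are skipped in this sketch; a real loader (as in from_folder)
    would parse them as well before adding the chunks to the store.
    """
    chunks = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if name.endswith(".md"):
                with open(os.path.join(root, name), encoding="utf-8") as fh:
                    text = fh.read()
                chunks += [text[i:i + chunk_size]
                           for i in range(0, len(text), chunk_size)]
    return chunks
```

The resulting chunk list is what would then be passed to something like `add_texts` together with per-chunk metadata (e.g. the source filename).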
- sherpa_ai.connectors.vectorstores.configure_chroma(host, port, index_name, openai_api_key)[source]#
Configure a ChromaDB instance.
This function creates a ChromaDB instance connected to a remote server.
- Parameters:
host (str) – The host of the ChromaDB server.
port (int) – The port of the ChromaDB server.
index_name (str) – The name of the index.
openai_api_key (str) – The OpenAI API key.
- Returns:
A configured ChromaDB instance.
- Return type:
Chroma
- Raises:
ImportError – If the chromadb package is not installed.
Example
>>> from sherpa_ai.connectors.vectorstores import configure_chroma
>>> chroma = configure_chroma("localhost", 8000, "my_index", "api_key")
>>> results = chroma.similarity_search("query", k=5)
- sherpa_ai.connectors.vectorstores.get_vectordb()[source]#
Get a vector database retriever based on configuration.
This function returns a vector database retriever based on the configuration in the config module. It supports Pinecone, Chroma, and local ChromaDB.
- Returns:
A retriever for the vector store.
- Return type:
VectorStoreRetriever
Example
>>> from sherpa_ai.connectors.vectorstores import get_vectordb
>>> retriever = get_vectordb()
>>> results = retriever.get_relevant_documents("What is machine learning?")
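The backend selection `get_vectordb` performs can be pictured as a small dispatch on configuration values. The key name `VECTORDB` and the settings dict below are placeholders, not Sherpa AI's actual config keys; they only illustrate the dispatch shape.

```python
import os


def pick_vectordb_backend(env=None):
    """Hypothetical sketch of config-driven backend selection.

    Reads a (placeholder) VECTORDB setting and returns a label for the
    backend that would be constructed.
    """
    env = os.environ if env is None else env
    backend = env.get("VECTORDB", "local").lower()
    if backend == "pinecone":
        # would call ConversationStore.get_vector_retrieval(...)
        return "pinecone"
    if backend == "chroma":
        # would call configure_chroma(host, port, index_name, api_key)
        return "chroma"
    # fallback: would build a LocalChromaStore from local files
    return "local_chroma"
```

The point of this pattern is that callers of `get_vectordb` never branch on the backend themselves; they always receive a retriever with the same interface.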