sherpa_ai.connectors package#

Overview#

The connectors package provides interfaces for Sherpa AI to connect with external systems and databases. It includes specialized connectors for vector stores and other data persistence mechanisms required for retrieval-augmented generation and knowledge storage.

Key Components

  • Base: Abstract interface for connector implementations

  • ChromaVectorStore: Implementation for the Chroma vector database

  • VectorStores: Generic interfaces for vector database interactions

Example Usage#

from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore

# Initialize a vector store (embedding_fn is an embedding function
# you have defined elsewhere, e.g. an OpenAI embeddings callable)
vector_store = ChromaVectorStore(
    collection_name="documents",
    embedding_function=embedding_fn
)

# Store documents
documents = [
    "Sherpa AI is a framework for building intelligent agents.",
    "Vector databases store vector embeddings for semantic search."
]
vector_store.add_texts(documents, metadatas=[{"source": "docs"} for _ in documents])

# Retrieve similar documents
results = vector_store.similarity_search("How do I build an agent?", k=2)
print(results)

Submodules#

  • sherpa_ai.connectors.base: Abstract base classes defining the connector interface.

  • sherpa_ai.connectors.chroma_vector_store: Implementation for the Chroma vector database with document storage.

  • sherpa_ai.connectors.vectorstores: Generic interfaces and utilities for vector database interactions.

sherpa_ai.connectors.base module#

class sherpa_ai.connectors.base.BaseVectorDB(db)[source]#

Bases: ABC

Abstract base class for vector database connectors.

This class defines the interface that all vector database connectors must implement, providing methods for similarity search operations.

db#

The underlying database connection or client.

Example

>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> # ChromaVectorStore implements BaseVectorDB
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("query", number_of_results=5)

abstract similarity_search(query, number_of_results, k, session_id=None)[source]#

Perform a similarity search in the vector database.

This method searches for documents that are semantically similar to the query.

Parameters:
  • query (str) – The search query.

  • number_of_results (int) – The number of results to return.

  • k (int) – The number of nearest neighbors to consider.

  • session_id (str, optional) – Session ID to filter results. Defaults to None.

Returns:

A list of documents that match the query.

Return type:

List[Document]

Example

>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("What is machine learning?", number_of_results=5)
>>> for doc in results:
...     print(doc.page_content[:100])

sherpa_ai.connectors.chroma_vector_store module#

sherpa_ai.connectors.vectorstores module#

class sherpa_ai.connectors.vectorstores.ConversationStore(namespace, db, embeddings, text_key)[source]#

Bases: VectorStore

A vector store for storing and retrieving conversation data.

This class provides methods to store conversation data in a vector database and retrieve similar conversations based on queries.

db#

The underlying database connection.

namespace#

The namespace for the vector store.

Type:

str

embeddings_func#

The embedding function to use.

text_key#

The key used to store the text in metadata.

Type:

str

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> store.add_text("This is a conversation", {"user": "user1"})
>>> results = store.similarity_search("conversation", top_k=5)
classmethod from_index(namespace, openai_api_key, index_name, text_key='text')[source]#

Create a ConversationStore from a Pinecone index.

This method initializes a Pinecone client and creates a ConversationStore instance connected to the specified index.

Parameters:
  • namespace (str) – The namespace for the vector store.

  • openai_api_key (str) – The OpenAI API key.

  • index_name (str) – The name of the Pinecone index.

  • text_key (str, optional) – The key used to store the text in metadata. Defaults to “text”.

Returns:

A new ConversationStore instance.

Return type:

ConversationStore

Raises:

ImportError – If the pinecone-client package is not installed.

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
add_text(text, metadata={})[source]#

Add a single text to the vector store.

This method embeds the text, adds it to the database with the provided metadata, and returns the ID of the added text.

Parameters:
  • text (str) – The text to add.

  • metadata (dict, optional) – Metadata to associate with the text. Defaults to {}.

Returns:

The ID of the added text.

Return type:

str

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> doc_id = store.add_text("This is a conversation", {"user": "user1"})
>>> doc_id
'123e4567-e89b-12d3-a456-426614174000'
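The embed-store-return flow described above can be sketched as follows; `index` and `embed` are hypothetical stand-ins for the Pinecone index and embedding function, not real sherpa_ai objects:

```python
import uuid

# Toy sketch of the documented add_text flow: embed the text, store the
# vector alongside metadata (raw text kept under text_key), return a new ID.
def add_text_sketch(index, embed, text, metadata=None, text_key="text"):
    vector = embed(text)
    doc_id = str(uuid.uuid4())
    record = dict(metadata or {})
    record[text_key] = text          # text stays recoverable from metadata
    index[doc_id] = (vector, record)
    return doc_id

index = {}
doc_id = add_text_sketch(index, lambda t: [float(len(t))], "hello", {"user": "user1"})
```

Storing the text under `text_key` is what lets similarity searches later return readable documents rather than bare vectors.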
property embeddings: Embeddings | None#

Access the query embedding object if available.

add_texts(texts, metadatas)[source]#

Add multiple texts to the vector store.

This method adds each text with its corresponding metadata to the vector store.

Parameters:
  • texts (Iterable[str]) – The texts to add.

  • metadatas (List[dict]) – The metadata for each text.

Returns:

The IDs of the added texts.

Return type:

List[str]

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> texts = ["Text 1", "Text 2"]
>>> metadatas = [{"user": "user1"}, {"user": "user2"}]
>>> store.add_texts(texts, metadatas)

similarity_search(text, top_k=5, filter=None, threshold=0.7)[source]#

Perform a similarity search in the vector store.

This method searches for texts that are semantically similar to the query.

Parameters:
  • text (str) – The search query.

  • top_k (int, optional) – The number of results to return. Defaults to 5.

  • filter (Optional[dict], optional) – Filter criteria for the search. Defaults to None.

  • threshold (float, optional) – The similarity threshold. Defaults to 0.7.

Returns:

A list of documents that match the query.

Return type:

list[Document]

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> results = store.similarity_search("What is machine learning?", top_k=5)
>>> for doc in results:
...     print(doc.page_content[:100])
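The threshold and filter parameters work together; the assumed semantics (drop matches scoring below the threshold, then keep only those whose metadata satisfies the filter) can be sketched as:

```python
# Assumed post-processing semantics for threshold and filter; the real
# method queries a vector index, but the filtering logic is the point here.
def filter_hits(hits, threshold=0.7, filter=None):
    # hits: (score, metadata) pairs as returned by the vector index
    kept = [(score, meta) for score, meta in hits if score >= threshold]
    if filter:
        kept = [(score, meta) for score, meta in kept
                if all(meta.get(key) == value for key, value in filter.items())]
    return kept

hits = [(0.9, {"user": "user1"}), (0.6, {"user": "user1"}), (0.8, {"user": "user2"})]
print(filter_hits(hits, threshold=0.7, filter={"user": "user1"}))
```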
classmethod delete(namespace, index_name)[source]#

Delete all vectors in a namespace.

This method deletes all vectors in the specified namespace of the Pinecone index.

Parameters:
  • namespace (str) – The namespace to delete.

  • index_name (str) – The name of the Pinecone index.

Returns:

The result of the delete operation.

Raises:

ImportError – If the pinecone-client package is not installed.

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> ConversationStore.delete("my_namespace", "my_index")
classmethod get_vector_retrieval(namespace, openai_api_key, index_name, search_type='similarity', search_kwargs={})[source]#

Create a vector store retriever.

This method creates a ConversationStore and returns a VectorStoreRetriever for it.

Parameters:
  • namespace (str) – The namespace for the vector store.

  • openai_api_key (str) – The OpenAI API key.

  • index_name (str) – The name of the Pinecone index.

  • search_type (str, optional) – The type of search to perform. Defaults to “similarity”.

  • search_kwargs (dict, optional) – Additional keyword arguments for the search. Defaults to {}.

Returns:

A retriever for the vector store.

Return type:

VectorStoreRetriever

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> retriever = ConversationStore.get_vector_retrieval("my_namespace", "api_key", "my_index")
>>> results = retriever.get_relevant_documents("What is machine learning?")
classmethod from_texts(texts, embedding, metadatas)[source]#

Create a ConversationStore from a list of texts.

This method is not implemented for ConversationStore.

Parameters:
  • texts (List[str]) – The texts to add.

  • embedding (Embeddings) – The embedding function to use.

  • metadatas (list[dict]) – The metadata for each text.

Raises:

NotImplementedError – This method is not implemented for ConversationStore.

class sherpa_ai.connectors.vectorstores.LocalChromaStore(collection_name='langchain', embedding_function=None, persist_directory=None, client_settings=None, collection_metadata=None, client=None, relevance_score_fn=None)[source]#

Bases: Chroma

A local Chroma-based vector store.

This class extends the Chroma vector store to provide additional functionality for working with local files.

Example

>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)
classmethod from_folder(file_path, openai_api_key, index_name='chroma')[source]#

Create a Chroma DB from a folder of files.

This method creates a ChromaDB from a folder of files, currently supporting PDFs and markdown files.

Parameters:
  • file_path (str) – Path to the folder containing files.

  • openai_api_key (str) – The OpenAI API key.

  • index_name (str, optional) – Name of the index. Defaults to “chroma”.

Returns:

A new LocalChromaStore instance.

Return type:

LocalChromaStore

Example

>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)
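The file-collection step behind from_folder can be sketched as a directory walk that keeps only the documented supported types; `collect_supported_files` is a hypothetical helper for illustration, not the library's internal name:

```python
import os

# Hypothetical sketch: walk the folder and keep only PDF and markdown
# files, the types from_folder is documented to support, for loading
# and chunking into the Chroma index.
def collect_supported_files(folder):
    supported = (".pdf", ".md")
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(supported):
                paths.append(os.path.join(root, name))
    return sorted(paths)
```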
sherpa_ai.connectors.vectorstores.configure_chroma(host, port, index_name, openai_api_key)[source]#

Configure a ChromaDB instance.

This function creates a ChromaDB instance connected to a remote server.

Parameters:
  • host (str) – The host of the ChromaDB server.

  • port (int) – The port of the ChromaDB server.

  • index_name (str) – The name of the index.

  • openai_api_key (str) – The OpenAI API key.

Returns:

A configured ChromaDB instance.

Return type:

Chroma

Raises:

ImportError – If the chromadb package is not installed.

Example

>>> from sherpa_ai.connectors.vectorstores import configure_chroma
>>> chroma = configure_chroma("localhost", 8000, "my_index", "api_key")
>>> results = chroma.similarity_search("query", k=5)
sherpa_ai.connectors.vectorstores.get_vectordb()[source]#

Get a vector database retriever based on configuration.

This function returns a vector database retriever based on the configuration in the config module. It supports Pinecone, Chroma, and local ChromaDB.

Returns:

A retriever for the vector store.

Return type:

VectorStoreRetriever

Example

>>> from sherpa_ai.connectors.vectorstores import get_vectordb
>>> retriever = get_vectordb()
>>> results = retriever.get_relevant_documents("What is machine learning?")
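The backend dispatch inside get_vectordb can be pictured as below; the configuration key name is hypothetical (the real attribute names live in sherpa_ai's config module), and the sketch returns call descriptions instead of retrievers so it stays self-contained:

```python
# Hypothetical dispatch mirroring the three documented backends:
# Pinecone, remote Chroma, and local ChromaDB.
def pick_backend(config):
    backend = config.get("VECTORDB", "").lower()  # hypothetical config key
    if backend == "pinecone":
        return "ConversationStore.get_vector_retrieval(...)"
    if backend == "chroma":
        return "configure_chroma(...).as_retriever()"
    return "LocalChromaStore.from_folder(...).as_retriever()"

print(pick_backend({"VECTORDB": "pinecone"}))
```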

Module contents#