sherpa_ai.models package#
Overview#
The models package provides interfaces and wrappers for language models in Sherpa AI, enabling seamless integration with various LLM providers while adding enhanced functionality like logging and error handling.
Key Components#
- SherpaBaseChatModel: Core interface for chat-based language models
- SherpaBaseModel: Base implementation for all model types
- ChatModelWithLogging: Chat model wrapper with integrated logging capabilities
Example Usage#
from sherpa_ai.models.sherpa_base_chat_model import SherpaBaseChatModel

# Initialize a chat model
model = SherpaBaseChatModel(
    model_name="gpt-4",
    temperature=0.7,
    max_tokens=1000
)

# Generate a response to a user query
response = model.generate(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)
print(response)
Submodules#
Module | Description
---|---
sherpa_ai.models.chat_model_with_logging | Provides a chat model wrapper with integrated logging capabilities.
sherpa_ai.models.sherpa_base_chat_model | Implements the core interface for chat-based language models.
sherpa_ai.models.sherpa_base_model | Contains the base implementation for all model types in the system.
sherpa_ai.models.chat_model_with_logging module#
Chat model with logging functionality for Sherpa AI.
This module provides a wrapper for chat models that adds detailed logging capabilities. It defines the ChatModelWithLogging class which logs all model interactions including inputs, outputs, and model information.
- class sherpa_ai.models.chat_model_with_logging.ChatModelWithLogging(*args, **kwargs)[source]#
Bases:
BaseChatModel
Chat model wrapper that adds detailed logging functionality.
This class wraps any chat model to add comprehensive logging of all interactions, including input messages, generated responses, and model information.
- llm#
The underlying chat model to wrap.
- Type:
BaseChatModel
- logger#
Logger instance for recording interactions.
- Type:
Logger
Example
>>> base_model = ChatOpenAI()
>>> model = ChatModelWithLogging(llm=base_model, logger=custom_logger)
>>> response = model.generate([Message("Hello")])
>>> # Logs will include input, output, and model info
- llm: BaseChatModel#
- logger: Logger#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'protected_namespaces': ()}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
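The following is a minimal sketch of wrapping an existing chat model with ChatModelWithLogging. It assumes the loguru logger (matching the Logger type above) and that ChatOpenAI is importable from the langchain_openai package; import paths and the model name may differ in your environment.

from loguru import logger
from langchain_openai import ChatOpenAI  # assumed import path; varies by LangChain version
from sherpa_ai.models.chat_model_with_logging import ChatModelWithLogging

# Wrap an existing chat model so every interaction is logged
base_model = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)
logging_model = ChatModelWithLogging(llm=base_model, logger=logger)

# Calls go through the wrapper; inputs, outputs, and model info are logged
result = logging_model.invoke("Summarize the Sherpa AI models package in one sentence.")
print(result.content)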
sherpa_ai.models.sherpa_base_chat_model module#
Chat model integration module for Sherpa AI.
This module provides chat model integration for the Sherpa AI system. It defines base and OpenAI-specific chat model classes with Sherpa enhancements like usage tracking and verbose logging.
- class sherpa_ai.models.sherpa_base_chat_model.SherpaBaseChatModel(*args, **kwargs)[source]#
Bases:
BaseChatModel
Base chat model with Sherpa-specific enhancements.
This class extends the base chat model to add Sherpa-specific functionality, including user-based token usage tracking and verbose logging capabilities.
- user_id#
ID of the user making model requests.
- Type:
Optional[str]
- verbose_logger#
Logger for detailed operation tracking.
- Type:
BaseVerboseLogger
Example
>>> model = SherpaBaseChatModel(user_id="user123")
>>> response = model.generate([Message("Hello")])
>>> print(response.generations[0].text)
'Hi there!'
- user_id: Optional[str]#
- verbose_logger: BaseVerboseLogger#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'protected_namespaces': ()}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class sherpa_ai.models.sherpa_base_chat_model.SherpaChatOpenAI(*args, **kwargs)[source]#
Bases:
ChatOpenAI
Enhanced OpenAI chat model with Sherpa-specific features.
This class extends the OpenAI chat model to add Sherpa-specific functionality, including user-based token usage tracking and verbose logging capabilities.
- user_id#
ID of the user making model requests.
- Type:
Optional[str]
- verbose_logger#
Logger for detailed operation tracking.
- Type:
BaseVerboseLogger
Example
>>> model = SherpaChatOpenAI(user_id="user123")
>>> response = model.generate([Message("Hello")])
>>> print(response.generations[0].text)
'Hi there!'
- user_id: Optional[str]#
- verbose_logger: BaseVerboseLogger#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
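A short usage sketch for SherpaChatOpenAI, expanding on the Example above. It assumes OPENAI_API_KEY is set in the environment and that LangChain message classes are importable from langchain_core; the model name and prompts are illustrative.

from langchain_core.messages import HumanMessage, SystemMessage
from sherpa_ai.models.sherpa_base_chat_model import SherpaChatOpenAI

# Construct the Sherpa-enhanced OpenAI chat model; user_id attributes token usage to a user
chat_model = SherpaChatOpenAI(user_id="user123", model_name="gpt-4o-mini", temperature=0.7)

# Invoke with standard LangChain chat messages
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Explain quantum computing in simple terms."),
]
response = chat_model.invoke(messages)
print(response.content)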
sherpa_ai.models.sherpa_base_model module#
Base OpenAI model integration for Sherpa AI.
This module provides the base OpenAI model integration for the Sherpa AI system. It defines the SherpaOpenAI class which extends OpenAI’s chat model with Sherpa-specific functionality like usage tracking.
- class sherpa_ai.models.sherpa_base_model.SherpaOpenAI(*args, **kwargs)[source]#
Bases:
ChatOpenAI
Enhanced OpenAI chat model with Sherpa-specific features.
This class extends the OpenAI chat model to add Sherpa-specific functionality, particularly user-based token usage tracking. It maintains compatibility with the base ChatOpenAI interface while adding user tracking capabilities.
- user_id#
ID of the user making model requests.
- Type:
Optional[str]
Example
>>> model = SherpaOpenAI(user_id="user123")
>>> response = model.generate("What is the weather?")
>>> print(response.generations[0].text)
'The weather is sunny.'
- user_id: Optional[str]#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
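As a rough sketch of the usage-tracking workflow, SherpaOpenAI is constructed per user and invoked like any ChatOpenAI model. Reading token counts from response_metadata is a general LangChain convention, shown here as an assumption rather than a Sherpa-specific API.

from sherpa_ai.models.sherpa_base_model import SherpaOpenAI

# One instance per user so token usage can be attributed to that user (id is illustrative)
model = SherpaOpenAI(user_id="user123", model_name="gpt-4o-mini")

reply = model.invoke("What is the weather?")
print(reply.content)

# Token counts reported by the OpenAI API, surfaced through LangChain response metadata
print(reply.response_metadata.get("token_usage"))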
Module contents#
Language model integration module for Sherpa AI.
This module provides language model integration for the Sherpa AI system. It exports the SherpaOpenAI and SherpaChatOpenAI classes which provide interfaces to OpenAI’s language models with Sherpa-specific enhancements.
Example
>>> from sherpa_ai.models import SherpaOpenAI, SherpaChatOpenAI
>>> model = SherpaOpenAI()
>>> chat_model = SherpaChatOpenAI()
>>> response = model.generate("Hello")
>>> chat_response = chat_model.chat("How are you?")
- class sherpa_ai.models.SherpaOpenAI(*args, **kwargs)[source]#
Bases:
ChatOpenAI
Enhanced OpenAI chat model with Sherpa-specific features.
This class extends the OpenAI chat model to add Sherpa-specific functionality, particularly user-based token usage tracking. It maintains compatibility with the base ChatOpenAI interface while adding user tracking capabilities.
- user_id#
ID of the user making model requests.
- Type:
Optional[str]
Example
>>> model = SherpaOpenAI(user_id="user123")
>>> response = model.generate("What is the weather?")
>>> print(response.generations[0].text)
'The weather is sunny.'
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- user_id: Optional[str]#
- stream_usage: bool#
Whether to include usage metadata in streaming output. If True, additional message chunks will be generated during the stream including usage metadata.
- max_tokens: Optional[int]#
Maximum number of tokens to generate.
- model_name: str#
Model name to use.
- temperature: float#
What sampling temperature to use.
- model_kwargs: Dict[str, Any]#
Holds any model parameters valid for create call not explicitly specified.
- openai_api_key: Optional[SecretStr]#
- openai_api_base: Optional[str]#
Base URL path for API requests, leave blank if not using a proxy or service emulator.
- openai_organization: Optional[str]#
Automatically inferred from env var OPENAI_ORG_ID if not provided.
- openai_proxy: Optional[str]#
- request_timeout: Union[float, Tuple[float, float], Any, None]#
Timeout for requests to OpenAI completion API. Can be float, httpx.Timeout or None.
- max_retries: int#
Maximum number of retries to make when generating.
- presence_penalty: Optional[float]#
Penalizes repeated tokens.
- frequency_penalty: Optional[float]#
Penalizes repeated tokens according to frequency.
- seed: Optional[int]#
Seed for generation.
- logprobs: Optional[bool]#
Whether to return logprobs.
- top_logprobs: Optional[int]#
Number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
- logit_bias: Optional[Dict[int, int]]#
Modify the likelihood of specified tokens appearing in the completion.
- streaming: bool#
Whether to stream the results or not.
- n: int#
Number of chat completions to generate for each prompt.
- top_p: Optional[float]#
Total probability mass of tokens to consider at each step.
- reasoning_effort: Optional[str]#
Constrains effort on reasoning for reasoning models.
o1 models only.
Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
Added in version 0.2.14.
- tiktoken_model_name: Optional[str]#
The model name to pass to tiktoken when using this class. Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit. By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.
- default_headers: Union[Mapping[str, str], None]#
- default_query: Union[Mapping[str, object], None]#
- http_client: Union[Any, None]#
Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you’d like a custom client for async invocations.
- http_async_client: Union[Any, None]#
Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you’d like a custom client for sync invocations.
- stop: Optional[Union[List[str], str]]#
Default stop sequences.
- extra_body: Optional[Mapping[str, Any]]#
Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM.
- include_response_headers: bool#
Whether to include response headers in the output message response_metadata.
- disabled_params: Optional[Dict[str, Any]]#
Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.
Should be specified as {"param": None | ['val1', 'val2']}, where the key is the parameter and the value is either None, meaning that parameter should never be used, or a list of disabled values for the parameter.
For example, older models may not support the 'parallel_tool_calls' parameter at all, in which case disabled_params={"parallel_tool_calls": None} can be passed in.
If a parameter is disabled, it will not be used by default in any methods, e.g. in with_structured_output(). However, this does not prevent a user from directly passing in the parameter during invocation.
A configuration sketch using this field appears after this attribute list.
- callback_manager: Optional[BaseCallbackManager]#
- rate_limiter: Optional[BaseRateLimiter]#
An optional rate limiter to use for limiting the number of requests.
- disable_streaming: Union[bool, Literal['tool_calling']]#
Whether to disable streaming for this model.
If streaming is bypassed, then stream()/astream()/astream_events() will defer to invoke()/ainvoke().
If True, will always bypass streaming case.
If "tool_calling", will bypass streaming case only when the model is called with a tools keyword argument.
If False (default), will always use streaming case if available.
- cache: Union[BaseCache, bool, None]#
Whether to cache the response.
If true, will use the global cache.
If false, will not use a cache.
If None, will use the global cache if it’s set, otherwise no cache.
If instance of BaseCache, will use the provided cache.
Caching is not currently supported for streaming methods of models.
- verbose: bool#
Whether to print out response text.
- callbacks: Callbacks#
Callbacks to add to the run trace.
- tags: Optional[list[str]]#
Tags to add to the run trace.
- metadata: Optional[dict[str, Any]]#
Metadata to add to the run trace.
- custom_get_token_ids: Optional[Callable[[str], list[int]]]#
Optional encoder to use for counting tokens.
- name: Optional[str]#
The name of the Runnable. Used for debugging and tracing.
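The inherited ChatOpenAI fields listed above can be set directly when constructing the Sherpa wrappers. The sketch below is illustrative only, assuming an OpenAI-compatible endpoint and API key in the environment; every value shown is an example, not a recommended default.

from sherpa_ai.models import SherpaOpenAI

model = SherpaOpenAI(
    user_id="user123",               # Sherpa-specific: attribute token usage to this user
    model_name="gpt-4o-mini",        # example model name
    temperature=0.2,
    max_tokens=512,
    max_retries=2,
    request_timeout=30,              # seconds; an httpx.Timeout is also accepted
    cache=False,                     # do not consult the global response cache
    disabled_params={"parallel_tool_calls": None},  # never send this parameter
)

response = model.invoke("Give one sentence on token usage tracking.")
print(response.content)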
- class sherpa_ai.models.SherpaChatOpenAI(*args, **kwargs)[source]#
Bases:
ChatOpenAI
Enhanced OpenAI chat model with Sherpa-specific features.
This class extends the OpenAI chat model to add Sherpa-specific functionality, including user-based token usage tracking and verbose logging capabilities.
- user_id#
ID of the user making model requests.
- Type:
Optional[str]
- verbose_logger#
Logger for detailed operation tracking.
- Type:
BaseVerboseLogger
Example
>>> model = SherpaChatOpenAI(user_id="user123")
>>> response = model.generate([Message("Hello")])
>>> print(response.generations[0].text)
'Hi there!'
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- user_id: Optional[str]#
- verbose_logger: BaseVerboseLogger#
- stream_usage: bool#
Whether to include usage metadata in streaming output. If True, additional message chunks will be generated during the stream including usage metadata.
- max_tokens: Optional[int]#
Maximum number of tokens to generate.
- model_name: str#
Model name to use.
- temperature: float#
What sampling temperature to use.
- model_kwargs: Dict[str, Any]#
Holds any model parameters valid for create call not explicitly specified.
- openai_api_key: Optional[SecretStr]#
- openai_api_base: Optional[str]#
Base URL path for API requests, leave blank if not using a proxy or service emulator.
- openai_organization: Optional[str]#
Automatically inferred from env var OPENAI_ORG_ID if not provided.
- openai_proxy: Optional[str]#
- request_timeout: Union[float, Tuple[float, float], Any, None]#
Timeout for requests to OpenAI completion API. Can be float, httpx.Timeout or None.
- max_retries: int#
Maximum number of retries to make when generating.
- presence_penalty: Optional[float]#
Penalizes repeated tokens.
- frequency_penalty: Optional[float]#
Penalizes repeated tokens according to frequency.
- seed: Optional[int]#
Seed for generation.
- logprobs: Optional[bool]#
Whether to return logprobs.
- top_logprobs: Optional[int]#
Number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
- logit_bias: Optional[Dict[int, int]]#
Modify the likelihood of specified tokens appearing in the completion.
- streaming: bool#
Whether to stream the results or not.
- n: int#
Number of chat completions to generate for each prompt.
- top_p: Optional[float]#
Total probability mass of tokens to consider at each step.
- reasoning_effort: Optional[str]#
Constrains effort on reasoning for reasoning models.
o1 models only.
Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
Added in version 0.2.14.
- tiktoken_model_name: Optional[str]#
The model name to pass to tiktoken when using this class. Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit. By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.
- default_headers: Union[Mapping[str, str], None]#
- default_query: Union[Mapping[str, object], None]#
- http_client: Union[Any, None]#
Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you’d like a custom client for async invocations.
- http_async_client: Union[Any, None]#
Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you’d like a custom client for sync invocations.
- stop: Optional[Union[List[str], str]]#
Default stop sequences.
- extra_body: Optional[Mapping[str, Any]]#
Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM.
- include_response_headers: bool#
Whether to include response headers in the output message response_metadata.
- disabled_params: Optional[Dict[str, Any]]#
Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.
Should be specified as {"param": None | ['val1', 'val2']}, where the key is the parameter and the value is either None, meaning that parameter should never be used, or a list of disabled values for the parameter.
For example, older models may not support the 'parallel_tool_calls' parameter at all, in which case disabled_params={"parallel_tool_calls": None} can be passed in.
If a parameter is disabled, it will not be used by default in any methods, e.g. in with_structured_output(). However, this does not prevent a user from directly passing in the parameter during invocation.
- callback_manager: Optional[BaseCallbackManager]#
- rate_limiter: Optional[BaseRateLimiter]#
An optional rate limiter to use for limiting the number of requests.
- disable_streaming: Union[bool, Literal['tool_calling']]#
Whether to disable streaming for this model.
If streaming is bypassed, then stream()/astream()/astream_events() will defer to invoke()/ainvoke().
If True, will always bypass streaming case.
If "tool_calling", will bypass streaming case only when the model is called with a tools keyword argument.
If False (default), will always use streaming case if available.
- cache: Union[BaseCache, bool, None]#
Whether to cache the response.
If true, will use the global cache.
If false, will not use a cache.
If None, will use the global cache if it’s set, otherwise no cache.
If instance of BaseCache, will use the provided cache.
Caching is not currently supported for streaming methods of models.
- verbose: bool#
Whether to print out response text.
- callbacks: Callbacks#
Callbacks to add to the run trace.
- tags: Optional[list[str]]#
Tags to add to the run trace.
- metadata: Optional[dict[str, Any]]#
Metadata to add to the run trace.
- custom_get_token_ids: Optional[Callable[[str], list[int]]]#
Optional encoder to use for counting tokens.
- name: Optional[str]#
The name of the Runnable. Used for debugging and tracing.