sherpa_ai.policies package#

Overview#

The policies package provides decision-making strategies that agents use to handle different scenarios and determine appropriate actions. It includes reactive and state machine-based policies for various interaction patterns and reasoning approaches.

Key Components

  • Base Policy: Abstract interface for all policy implementations

  • React Policy: Implementation of ReAct (Reasoning+Acting) pattern

  • State Machine Policies: Policies that use finite state machines for complex workflows

  • Agent Feedback: Policies for handling agent evaluation and improvement

Example Usage#

from sherpa_ai.policies.react_policy import ReactPolicy
from sherpa_ai.agents import QAAgent
from sherpa_ai.models import SherpaBaseChatModel

# Initialize a model
model = SherpaBaseChatModel(model_name="gpt-4")

# Create a ReactPolicy (configured with the fields documented below)
policy = ReactPolicy(
    role_description="Assistant that answers questions using reasoning and tools",
    output_instruction="Select the best next action and respond in JSON",
    llm=model
)

# Create an agent using this policy
agent = QAAgent(model=model, policy=policy)

# Process a query that requires reasoning and tool use
result = agent.get_response("What is the square root of the population of France?")
print(result)

Submodules#

  • sherpa_ai.policies.agent_feedback_policy – Policy for handling agent evaluation and improvement processes.

  • sherpa_ai.policies.base – Abstract base classes defining the policy interface.

  • sherpa_ai.policies.chat_sm_policy – Chat-based state machine policy for conversational workflows.

  • sherpa_ai.policies.exceptions – Custom exceptions for policy-related error handling.

  • sherpa_ai.policies.react_policy – Implementation of the ReAct (Reasoning+Acting) policy pattern.

  • sherpa_ai.policies.react_sm_policy – State machine-based implementation of the ReAct pattern.

  • sherpa_ai.policies.utils – Utility functions and helpers for policy implementations.

sherpa_ai.policies.agent_feedback_policy module#

Agent feedback-based policy module for Sherpa AI.

This module provides a policy implementation that selects actions based on feedback from an agent. It defines the AgentFeedbackPolicy class which generates questions for agents and uses their responses to guide action selection.

class sherpa_ai.policies.agent_feedback_policy.AgentFeedbackPolicy(**data)[source]#

Bases: ReactPolicy

Policy for selecting actions based on agent feedback.

This class extends ReactPolicy to incorporate feedback from an agent in the action selection process. It generates questions about possible actions, gets responses from the agent, and uses those responses to guide action selection.

prompt_template#

Template for generating questions.

Type:

PromptTemplate

agent#

Agent to provide feedback on actions.

Type:

BaseAgent

model_config#

Configuration allowing arbitrary types.

Type:

ConfigDict

Example

>>> agent = UserAgent()  # Agent that can provide feedback
>>> policy = AgentFeedbackPolicy(agent=agent)
>>> output = policy.select_action(belief)
>>> print(output.action.name)
'SearchAction'
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

prompt_template: PromptTemplate#
agent: BaseAgent#
async async_select_action(belief, **kwargs)[source]#

Select action based on agent feedback.

This method generates a question about possible actions, gets feedback from the agent, and uses that feedback to select the next action.

Parameters:
  • belief (Belief) – Current belief state containing available actions.

  • **kwargs – Additional arguments for action selection.

Returns:

Selected action and arguments based on feedback.

Return type:

PolicyOutput

Example

>>> policy = AgentFeedbackPolicy(agent=user_agent)
>>> belief.actions = [SearchAction(), AnalyzeAction()]
>>> output = policy.select_action(belief)
>>> print(output.action.name)  # Based on agent feedback
'SearchAction'
select_action(belief, **kwargs)[source]#

Select action based on agent feedback.

This method generates a question about possible actions, gets feedback from the agent, and uses that feedback to select the next action.

Parameters:
  • belief (Belief) – Current belief state containing available actions.

  • **kwargs – Additional arguments for action selection.

Returns:

Selected action and arguments based on feedback.

Return type:

PolicyOutput

sherpa_ai.policies.base module#

Base policy classes for Sherpa AI.

This module provides base classes for implementing agent policies. It defines the core interfaces that all policies must implement, including action selection and execution.

class sherpa_ai.policies.base.PolicyOutput(**data)[source]#

Bases: BaseModel

Output from policy action selection.

This class represents the result of a policy’s action selection process, including both the selected action and its arguments.

action#

The action to be executed. Currently Any type due to BaseAction not inheriting from BaseModel.

Type:

Any

args#

Arguments to pass to the selected action.

Type:

dict

Example

>>> output = PolicyOutput(
...     action=SearchAction(),
...     args={"query": "python programming"}
... )
>>> print(output.args["query"])
'python programming'
action: Any#
args: dict#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

class sherpa_ai.policies.base.BasePolicy(**data)[source]#

Bases: ABC, BaseModel

Abstract base class for agent policies.

This class defines the interface that all policies must implement. Policies are responsible for selecting actions based on the agent’s current belief state.

Example

>>> class MyPolicy(BasePolicy):
...     def select_action(self, belief):
...         return PolicyOutput(action=SearchAction(), args={})
>>> policy = MyPolicy()
>>> output = policy(belief)
>>> print(output.action)
SearchAction()
abstractmethod select_action(belief, **kwargs)[source]#

Select an action based on current belief state.

Parameters:
  • belief (Belief) – Agent’s current belief state.

  • **kwargs – Additional arguments for action selection.

Returns:

Selected action and arguments, or None

if no action should be taken.

Return type:

Optional[PolicyOutput]

Example

>>> policy = MyPolicy()
>>> output = policy.select_action(belief)
>>> if output:
...     print(output.action)
SearchAction()
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
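The abstract interface above can be illustrated with a minimal, self-contained sketch. The `FakeAction`, `FakeBelief`, and `PolicyOutput` classes below are simplified stand-ins for the real Sherpa types, not the library's implementations:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeAction:
    # Stand-in for BaseAction: just a name and an argument spec.
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class FakeBelief:
    # Stand-in for Belief: holds the available actions.
    actions: list = field(default_factory=list)


@dataclass
class PolicyOutput:
    # Mirrors the documented PolicyOutput: selected action plus its arguments.
    action: Any
    args: dict


class BasePolicy(ABC):
    @abstractmethod
    def select_action(self, belief, **kwargs) -> Optional[PolicyOutput]:
        """Return the next action to run, or None if nothing should be done."""


class FirstActionPolicy(BasePolicy):
    # Trivial concrete policy: always pick the first available action.
    def select_action(self, belief, **kwargs):
        if not belief.actions:
            return None
        return PolicyOutput(action=belief.actions[0], args={})


belief = FakeBelief(actions=[FakeAction("search")])
output = FirstActionPolicy().select_action(belief)
print(output.action.name)  # search
```

The real `BasePolicy` additionally inherits from pydantic's `BaseModel`, which this sketch omits for brevity.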

sherpa_ai.policies.chat_sm_policy module#

class sherpa_ai.policies.chat_sm_policy.ChatStateMachinePolicy(**data)[source]#

Bases: BasePolicy

The policy to select an action from the belief based on the ReAct framework.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

SYSTEM_PROMPT: ClassVar[str] = "You are an assistant that must parse the user's instructions and select the most appropriate action from the provided possibilities. Only respond with valid JSON as described, without any extra text. Comply strictly with the specified format.\n\n**Task Description**: {task_description}\n\nYou should only respond in JSON format as described below without any extra text.\nResponse Format:\n{response_format}\nEnsure the response can be parsed by Python json.loads"#
ACTION_SELECTION_PROMPT: ClassVar[str] = 'You have a state machine to help you with the action execution. You are currently in the {state} state.\n{state_description}\n\n## Possible Actions:\n{possible_actions}\n\nYou should only select the actions specified in **Possible Actions**\nFirst, reason about what to do next based on the information, then select the best action.'#
chat_template: ChatPromptTemplate#
llm: Optional[BaseChatModel]#
response_format: dict#
max_conversation_tokens: int#
get_prompt_data(belief, actions)[source]#

Create the prompt based on information from the belief state.

Return type:

dict

select_action(belief)[source]#

Synchronous wrapper around async_select_action.

Return type:

Optional[PolicyOutput]

async async_select_action(belief)[source]#

Select an action based on the current belief state.

Return type:

Optional[PolicyOutput]
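The class-level prompt templates shown above are plain Python format strings. A minimal sketch of how `SYSTEM_PROMPT` might be filled in; the task description and response format here are illustrative values, not the library's defaults:

```python
import json

# Copied from the documented ChatStateMachinePolicy.SYSTEM_PROMPT.
SYSTEM_PROMPT = (
    "You are an assistant that must parse the user's instructions and select "
    "the most appropriate action from the provided possibilities. Only respond "
    "with valid JSON as described, without any extra text. Comply strictly with "
    "the specified format.\n\n"
    "**Task Description**: {task_description}\n\n"
    "You should only respond in JSON format as described below without any extra text.\n"
    "Response Format:\n{response_format}\n"
    "Ensure the response can be parsed by Python json.loads"
)

# Illustrative values -- in the library these come from the belief and policy config.
response_format = {"command": {"name": "<action>", "args": {"<arg>": "<value>"}}}

prompt = SYSTEM_PROMPT.format(
    task_description="Summarize the latest search results",
    response_format=json.dumps(response_format, indent=2),
)
print(prompt)
```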

sherpa_ai.policies.exceptions module#

Policy exception classes for Sherpa AI.

This module provides exception classes for handling policy-related errors. It defines the SherpaPolicyException class which captures detailed error information including stack traces.

exception sherpa_ai.policies.exceptions.SherpaPolicyException(message)[source]#

Bases: Exception

Exception raised for policy-related errors.

This class extends the base Exception class to add stack trace capture and custom message handling for policy errors.

message#

Detailed error message.

Type:

str

stacktrace#

Stack trace at the point of error.

Type:

list[str]

Example

>>> try:
...     raise SherpaPolicyException("Invalid action")
... except SherpaPolicyException as e:
...     print(e.message)
...     print(len(e.stacktrace) > 0)  # Has stack trace
'Invalid action'
True
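The stack-trace capture described above can be approximated with the standard `traceback` module. A hedged sketch of how such an exception might be implemented, not the library's actual code:

```python
import traceback


class SherpaPolicyException(Exception):
    # Sketch: store the message and capture the call stack at construction time.
    def __init__(self, message):
        super().__init__(message)
        self.message = message
        self.stacktrace = traceback.format_stack()


try:
    raise SherpaPolicyException("Invalid action")
except SherpaPolicyException as e:
    print(e.message)              # Invalid action
    print(len(e.stacktrace) > 0)  # True
```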

sherpa_ai.policies.react_policy module#

ReAct framework policy implementation for Sherpa AI.

This module provides a policy implementation based on the ReAct framework (https://arxiv.org/abs/2210.03629). It defines the ReactPolicy class which selects actions by reasoning about the current state and available actions.

class sherpa_ai.policies.react_policy.ReactPolicy(**data)[source]#

Bases: BasePolicy

Policy implementation based on the ReAct framework.

This class implements action selection based on the ReAct framework, which combines reasoning and acting. It uses a language model to analyze the current state and available actions to select the most appropriate next action.

role_description#

Description of agent role for action selection.

Type:

str

output_instruction#

Instruction for JSON output format.

Type:

str

llm#

Language model for generating text (BaseLanguageModel).

Type:

Any

prompt_template#

Template for generating prompts.

Type:

PromptTemplate

response_format#

Expected JSON format for responses.

Type:

dict

model_config#

Configuration allowing arbitrary types.

Type:

ConfigDict

Example

>>> policy = ReactPolicy(
...     role_description="Assistant that helps with coding",
...     output_instruction="Choose the best action",
...     llm=language_model
... )
>>> output = policy.select_action(belief)
>>> print(output.action.name)
'SearchCode'
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

role_description: str#
output_instruction: str#
llm: Any#
prompt_template: PromptTemplate#
response_format: dict#
async async_select_action(belief)[source]#

Asynchronously select an action based on current belief state.

This method currently just calls the synchronous version. It exists to provide an async interface for future async implementations.

Parameters:

belief (Belief) – Current belief state of the agent.

Returns:

Selected action and arguments, or None.

Return type:

Optional[PolicyOutput]

Example

>>> policy = ReactPolicy(llm=language_model)
>>> output = await policy.async_select_action(belief)
>>> if output:
...     print(output.action.name)
'SearchCode'
select_action(belief)[source]#

Select an action based on current belief state.

This method analyzes the current state and available actions using the ReAct framework to select the most appropriate next action. For trivial cases (single action with no args), it skips the language model reasoning.

Parameters:

belief (Belief) – Current belief state of the agent.

Returns:

Selected action and arguments, or None

if selected action not found.

Return type:

Optional[PolicyOutput]

Raises:

SherpaPolicyException – If selected action not in available actions.

Example

>>> policy = ReactPolicy(llm=language_model)
>>> belief.actions = [SearchAction(), AnalyzeAction()]
>>> output = policy.select_action(belief)
>>> print(output.action.name)  # Based on reasoning
'SearchAction'
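The selection flow described above (prompt the model, parse its JSON reply, and match the named action against the available ones, skipping the model entirely in the trivial single-action case) can be sketched end to end with a stubbed language model. Everything here is a simplified illustration; the helper names and error class mirror the documented behavior but are not the library's code:

```python
import json


class SherpaPolicyException(Exception):
    """Stand-in for sherpa_ai.policies.exceptions.SherpaPolicyException."""


def select_action(belief_actions, llm):
    # Trivial case documented above: one action with no args skips the LLM.
    if len(belief_actions) == 1 and not belief_actions[0]["args"]:
        return belief_actions[0], {}

    reply = llm("Which action should run next? Respond in JSON.")
    command = json.loads(reply)["command"]
    name, args = command["name"], command.get("args", {})

    for action in belief_actions:
        if action["name"] == name:
            return action, args
    raise SherpaPolicyException(f"Action {name} not found in available actions")


# Stub LLM that always picks the search action.
stub_llm = lambda prompt: '{"command": {"name": "search", "args": {"query": "python"}}}'

actions = [{"name": "search", "args": {"query": None}},
           {"name": "analyze", "args": {"target": None}}]
action, args = select_action(actions, stub_llm)
print(action["name"], args["query"])  # search python
```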

sherpa_ai.policies.react_sm_policy module#

ReAct State Machine policy implementation for Sherpa AI.

This module provides a policy implementation that combines the ReAct framework with state machine information for action selection. It defines the ReactStateMachinePolicy class which uses both the current state and state machine transitions to guide action selection.

class sherpa_ai.policies.react_sm_policy.ReactStateMachinePolicy(**data)[source]#

Bases: BasePolicy

Policy implementation combining ReAct framework with state machine.

This class extends the ReAct framework by incorporating state machine information into the action selection process. It considers both the current state and possible state transitions when choosing actions.

role_description#

Description of agent role for action selection.

Type:

str

output_instruction#

Instruction for JSON output format.

Type:

str

llm#

Language model for generating text (BaseLanguageModel).

Type:

Any

prompt_template#

Template for generating prompts.

Type:

PromptTemplate

response_format#

Expected JSON format for responses.

Type:

dict

model_config#

Configuration allowing arbitrary types.

Type:

ConfigDict

Example

>>> policy = ReactStateMachinePolicy(
...     role_description="Assistant that helps with coding",
...     output_instruction="Choose the best action",
...     llm=language_model
... )
>>> output = policy.select_action(belief)
>>> print(output.action.name)
'SearchCode'
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

role_description: str#
output_instruction: str#
llm: Any#
prompt_template: PromptTemplate#
response_format: dict#
get_prompt(belief, actions)[source]#

Create a prompt for action selection using belief and state info.

This method generates a prompt that includes the current task, available actions, action history, current state, and state description to help guide action selection.

Parameters:
  • belief (Belief) – Current belief state of the agent.

  • actions (list[BaseAction]) – List of available actions.

Returns:

Formatted prompt for the language model.

Return type:

str

Example

>>> prompt = policy.get_prompt(belief, actions)
>>> print(prompt)  # Shows formatted prompt with state info
'You are an assistant that helps with coding...'
select_action(belief)[source]#

Select an action based on current belief state and state machine.

This method analyzes the current state, available actions, and state machine information to select the most appropriate next action. For trivial cases (single action with no args), it skips the language model reasoning.

Parameters:

belief (Belief) – Current belief state of the agent.

Returns:

Selected action and arguments, or None

if selected action not found.

Return type:

Optional[PolicyOutput]

Raises:

SherpaPolicyException – If selected action not in available actions.

Example

>>> policy = ReactStateMachinePolicy(llm=language_model)
>>> belief.actions = [SearchAction(), AnalyzeAction()]
>>> output = policy.select_action(belief)
>>> print(output.action.name)  # Based on state and reasoning
'SearchAction'
async async_select_action(belief)[source]#

Asynchronously select an action based on belief state and state machine.

This method provides an asynchronous version of action selection, considering the current state, available actions, and state machine information.

Parameters:

belief (Belief) – Current belief state of the agent.

Returns:

Selected action and arguments, or None

if selected action not found.

Return type:

Optional[PolicyOutput]

Raises:

SherpaPolicyException – If selected action not in available actions.

Example

>>> policy = ReactStateMachinePolicy(llm=language_model)
>>> output = await policy.async_select_action(belief)
>>> if output:
...     print(output.action.name)
'SearchAction'

sherpa_ai.policies.utils module#

Policy utility functions for Sherpa AI.

This module provides utility functions for policy implementations. It includes functions for parsing action outputs, checking action selection conditions, and constructing conversation histories from belief states.

sherpa_ai.policies.utils.transform_json_output(output_str)[source]#

Transform JSON-formatted output string into action and arguments.

This function extracts action name and arguments from a JSON string that follows the format: {“command”: {“name”: “action”, “args”: {…}}} or {“action”: {“name”: “action”, “args”: {…}}}.

Parameters:

output_str (str) – JSON-formatted string containing action details.

Returns:

Action name and arguments dictionary.

Return type:

Tuple[str, dict]

Raises:

SherpaPolicyException – If output lacks proper JSON format or command.

Example

>>> output = '{"command": {"name": "search", "args": {"query": "python"}}}'
>>> name, args = transform_json_output(output)
>>> print(name, args["query"])
'search' 'python'
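A minimal reimplementation of the parsing behavior described above, accepting either a "command" or an "action" key; this is an illustrative sketch, not the library's source:

```python
import json


class SherpaPolicyException(Exception):
    """Stand-in for the documented policy exception."""


def transform_json_output(output_str):
    # Sketch of the documented behavior: accept {"command": ...} or {"action": ...}.
    try:
        data = json.loads(output_str)
    except json.JSONDecodeError as e:
        raise SherpaPolicyException(f"Output is not valid JSON: {output_str}") from e

    command = data.get("command") or data.get("action")
    if command is None:
        raise SherpaPolicyException(f"No command or action found in output: {output_str}")
    return command["name"], command.get("args", {})


name, args = transform_json_output(
    '{"command": {"name": "search", "args": {"query": "python"}}}'
)
print(name, args["query"])  # search python
```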
sherpa_ai.policies.utils.is_selection_trivial(actions)[source]#

Check if action selection requires no deliberation.

This function determines if action selection is trivial by checking if there is only one action available and it takes no arguments.

Parameters:

actions (list[BaseAction]) – List of available actions.

Returns:

True if selection is trivial (one action, no args), False otherwise.

Return type:

bool

Example

>>> actions = [SimpleAction()]  # Action with no arguments
>>> print(is_selection_trivial(actions))
True
>>> actions.append(ComplexAction())  # Multiple actions
>>> print(is_selection_trivial(actions))
False
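The documented rule is simple enough to state in one line: selection is trivial exactly when there is a single action that takes no arguments. A sketch using a hypothetical `FakeAction` stand-in for `BaseAction`:

```python
from dataclasses import dataclass, field


@dataclass
class FakeAction:
    # Stand-in for BaseAction: only the argument spec matters here.
    name: str
    args: dict = field(default_factory=dict)


def is_selection_trivial(actions):
    # Documented rule: exactly one action that takes no arguments.
    return len(actions) == 1 and len(actions[0].args) == 0


print(is_selection_trivial([FakeAction("noop")]))                     # True
print(is_selection_trivial([FakeAction("search", {"query": None})]))  # False
```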
sherpa_ai.policies.utils.construct_conversation_from_belief(belief, token_counter=None, maximum_tokens=4000)[source]#

Construct conversation history from belief state.

This function extracts action starts and results from the belief state’s internal events to construct a conversation history, optionally limiting the total token count.

Parameters:
  • belief (Belief) – Current belief state containing event history.

  • token_counter (Optional[Callable[[str], int]]) – Function used to count tokens in a message.

  • maximum_tokens (int) – Maximum total tokens in the conversation. Defaults to 4000.

Returns:

List of (speaker, message) conversation turns.

Return type:

list[tuple[str, str]]

Example

>>> belief = Belief()
>>> belief.update_internal("action_start", "search", content="query")
>>> belief.update_internal("action_finish", "result", content="found")
>>> history = construct_conversation_from_belief(belief)
>>> print(history[1][1])  # First human message
'Action output: found'
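The token-limiting behavior can be illustrated in isolation: keep the most recent turns whose combined token count fits the budget. A simplified sketch with a whitespace-splitting token counter; the real function additionally extracts the turns from the belief state's internal events:

```python
def trim_conversation(turns, token_counter=None, maximum_tokens=4000):
    # turns: list of (speaker, message); keep the most recent ones that fit.
    if token_counter is None:
        token_counter = lambda text: len(text.split())
    kept, total = [], 0
    for speaker, message in reversed(turns):
        cost = token_counter(message)
        if total + cost > maximum_tokens:
            break
        kept.append((speaker, message))
        total += cost
    return list(reversed(kept))


turns = [("assistant", "searching the web"),
         ("human", "Action output: found three results"),
         ("assistant", "analyzing results")]
# With a 7-token budget, only the two most recent turns fit.
print(trim_conversation(turns, maximum_tokens=7))
```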

Module contents#

Agent policy module for Sherpa AI.

This module provides policy implementations for agent decision making. It exports various policy classes that define how agents select and execute actions based on their current state and observations.

Example

>>> from sherpa_ai.policies import ReactPolicy
>>> policy = ReactPolicy(
...     role_description="Task assistant",
...     output_instruction="Respond in JSON with the chosen action",
...     llm=language_model
... )
>>> output = policy.select_action(belief)
>>> print(output.action.name)
class sherpa_ai.policies.ReactPolicy(**data)[source]#

Bases: BasePolicy

Policy implementation based on the ReAct framework.

This class implements action selection based on the ReAct framework, which combines reasoning and acting. It uses a language model to analyze the current state and available actions to select the most appropriate next action.

role_description#

Description of agent role for action selection.

Type:

str

output_instruction#

Instruction for JSON output format.

Type:

str

llm#

Language model for generating text (BaseLanguageModel).

Type:

Any

prompt_template#

Template for generating prompts.

Type:

PromptTemplate

response_format#

Expected JSON format for responses.

Type:

dict

model_config#

Configuration allowing arbitrary types.

Type:

ConfigDict

Example

>>> policy = ReactPolicy(
...     role_description="Assistant that helps with coding",
...     output_instruction="Choose the best action",
...     llm=language_model
... )
>>> output = policy.select_action(belief)
>>> print(output.action.name)
'SearchCode'
async async_select_action(belief)[source]#

Asynchronously select an action based on current belief state.

This method currently just calls the synchronous version. It exists to provide an async interface for future async implementations.

Parameters:

belief (Belief) – Current belief state of the agent.

Returns:

Selected action and arguments, or None.

Return type:

Optional[PolicyOutput]

Example

>>> policy = ReactPolicy(llm=language_model)
>>> output = await policy.async_select_action(belief)
>>> if output:
...     print(output.action.name)
'SearchCode'
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

select_action(belief)[source]#

Select an action based on current belief state.

This method analyzes the current state and available actions using the ReAct framework to select the most appropriate next action. For trivial cases (single action with no args), it skips the language model reasoning.

Parameters:

belief (Belief) – Current belief state of the agent.

Returns:

Selected action and arguments, or None

if selected action not found.

Return type:

Optional[PolicyOutput]

Raises:

SherpaPolicyException – If selected action not in available actions.

Example

>>> policy = ReactPolicy(llm=language_model)
>>> belief.actions = [SearchAction(), AnalyzeAction()]
>>> output = policy.select_action(belief)
>>> print(output.action.name)  # Based on reasoning
'SearchAction'
role_description: str#
output_instruction: str#
llm: Any#
prompt_template: PromptTemplate#
response_format: dict#