1. Create a Blog Writer with Sherpa

In this tutorial we will create a simple blog writer using Sherpa. The blog writer reads the transcript of a presentation, creates an outline for a blog post, and then writes the post section by section using a “Writer” agent. For each section, the agent can gather information about the topic from the web by performing a Google Search and/or from the transcript itself by performing a Document Search.

1.1. Overview

The two main Python files are:

1. main.py: This file instantiates and configures a Sherpa agent with access to various “actions” (wrappers for “tools”) such as Google search (with citation validation) and document search (for retrieving relevant context from the transcript). The agent takes statements that comprise “evidence” for claims to be made in the blog post and extends and expands them into paragraphs. These pieces of “evidence” are part of the blog outline generated by the second component.

2. outliner.py: This file houses the Outliner component, which performs the following:

  • Preprocessing (chunking) the document. This is necessary due to GPT-3.5’s context window limit of 4096 tokens.

  • Analyzing the transcript. This consists of two steps: First, a short list of “key insights” is extracted from each chunk. In the next step, these lists are concatenated together and a blog post “blueprint” (analogous to an “essay outline”) is synthesized from the list of all key insights. This blueprint can be thought of as a tree with three levels of depth:

    1. Thesis Statement: A single statement that forms the core message of the blog post (essentially a topic and a claim made about the topic).

    2. Supporting Arguments: A list of statements that support the Thesis Statement at a high level.

    3. Evidence: Lists of statements that provide factual evidence for the Supporting Argument under which they appear.

The blueprint is output as a JSON string.
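
As a rough illustration of the Outliner steps above, the sketch below splits a transcript into chunks and assembles a three-level blueprint. It is not Sherpa's implementation: the function names, the JSON key names (thesis_statement, supporting_arguments, evidence), and the word-based chunk size are all assumptions made for this example, and a word count is only a crude stand-in for a real token count.

```python
import json


def chunk_text(text: str, max_words: int = 1500) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Assumption: a word budget is a crude stand-in for a model's token limit.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_blueprint(thesis: str, arguments: dict[str, list[str]]) -> str:
    """Serialize a thesis, its arguments, and their evidence as a JSON string.

    The key names below are hypothetical, not Sherpa's actual schema.
    """
    blueprint = {
        "thesis_statement": thesis,
        "supporting_arguments": [
            {"argument": argument, "evidence": evidence}
            for argument, evidence in arguments.items()
        ],
    }
    return json.dumps(blueprint, indent=2)


if __name__ == "__main__":
    # Five words with a budget of two words per chunk yield three chunks.
    print(chunk_text("one two three four five", max_words=2))
    print(build_blueprint("Thesis.", {"Argument A": ["Fact 1", "Fact 2"]}))
```

Each Supporting Argument carries its own Evidence list, which mirrors the tree described above: one thesis at the root, arguments below it, and evidence at the leaves.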

1.2. How to Install

Step 1. Install Python 3.9 using your preferred installation method.

Step 2. Create a folder for storing the blog writer code and input/output files:

cd <your development directory>
mkdir sherpa_blog_writer
cd sherpa_blog_writer

Step 3. You may wish to create a virtual environment to isolate the Python libraries used for this tutorial from your other Python code. This step is optional but highly recommended. An example of this (using venv) would be:

python -m venv bwvenv
source bwvenv/bin/activate

Step 4. Install the Sherpa library using pip.

pip install sherpa_ai

Step 5. Download all files from (Aggregate-Intellect/sherpa) into this directory.

Step 6. Install additional requirements with pip.

pip install -r requirements.txt

Step 7. Rename the file .env.sample to .env. Then open it in your favourite text editor and add your OpenAI and Serper API keys.

Step 8. Source the environment variables from the .env file with direnv if you use it, or alternatively using:

export $(grep -v '^#' .env | xargs)
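
If you would rather load the variables from inside Python than through the shell, the following minimal sketch does roughly what the export command above does: it skips comment lines and copies KEY=VALUE pairs into the process environment. The function name load_env_file is an assumption made for this example and is not part of Sherpa; check .env.sample for the variable names the project actually expects.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read KEY=VALUE lines from path and add them to os.environ."""
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


if __name__ == "__main__":
    if Path(".env").exists():
        for name in load_env_file():
            # Print only the names so that no API key is echoed to the console.
            print(f"loaded {name}")
```

Variables loaded this way are visible only to the current Python process and its children, so the call has to happen before any code that reads the API keys.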

1.3. How to Use

Step 1. The blog writer currently needs the transcript in both .txt and .pdf formats, so first make sure you have both files and copy them into the Transcripts subdirectory. Most text editors have an “Export to PDF” feature; alternatively, you can “print” the file as a PDF. Name the files projectname.txt and transcript.pdf.

Step 2. Create a custom blueprint for your blog writing strategy. You can either fill in the provided JSON template by hand (define a thesis statement, articulate the key arguments, and list the supporting evidence) or use a blueprint generator tool that builds the blueprint from input parameters such as topic and target audience. Put the blueprint in the Output folder, then review and adjust its content to fit your needs. An existing blueprint can also be modified by editing its thesis, arguments, or evidence; changes are saved to either local storage or cloud services. Once you have created a blueprint, specify it with the --blueprint argument followed by the path to your JSON file or the identifier of the automated blueprint.

An example blueprint is available in the Output folder as a placeholder. Edit it, or delete it before Step 3 if you prefer the blueprint to be auto-generated.

Step 3. Run:

python main.py --config agent_config.yml --transcript projectname.txt --blueprint blueprint_projectname.json

The blog writer prints verbose feedback to the console as it works through the transcript files. If no blueprint exists and you do not specify one, the generated blueprint is saved as blueprint_transcript.json and the final output (the blog post) as blog_transcript.md in the Output folder.

  • In the first step, key insights are extracted from each chunk and output to the console.

  • Next, the blueprint for the post is generated and the resulting JSON is output to the console and simultaneously saved as a file named blueprint.json in the current directory.

  • Finally, the blog post is generated from the outline. This step is interactive. For each “evidence” encountered, the Writer agent generates a paragraph and asks the user for feedback. The user can accept the paragraph as is (by typing “yes”, “y” or pressing Enter) or provide feedback to the Writer to modify or rewrite the paragraph. The final blog post is saved as the file blog.md in the current directory.
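
The accept-or-revise loop in the last step can be pictured roughly as follows. This is a sketch, not the code in main.py: is_accepted, review_paragraph, rewrite, and ask are hypothetical names, and rewrite stands in for whatever call sends the feedback back to the Writer agent.

```python
from typing import Callable


def is_accepted(response: str) -> bool:
    """Return True for "yes", "y", or an empty reply (the user pressed Enter).

    Ignoring case and surrounding whitespace is an assumption of this sketch.
    """
    return response.strip().lower() in {"", "y", "yes"}


def review_paragraph(
    paragraph: str,
    rewrite: Callable[[str, str], str],
    ask: Callable[[str], str] = input,
) -> str:
    """Show a paragraph and revise it until the user accepts it.

    rewrite(paragraph, feedback) is a hypothetical stand-in for the Writer agent.
    """
    while True:
        response = ask(f"{paragraph}\nAccept? (yes / y / Enter, or type feedback): ")
        if is_accepted(response):
            return paragraph
        # Anything else is treated as feedback and sent back for a rewrite.
        paragraph = rewrite(paragraph, response)
```

Passing ask as a parameter keeps the loop testable without a terminal: a test can supply canned replies instead of calling input.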

1.4. Revisions and Added Features

Date          Description
22-May-2024   Added Human in the Loop (User Agent)