LLM evaluative routing

letting the LLM decide what action to take
2026-01-12 17:30
// updated 2026-01-12 16:19

Evaluative routing, in the context of the LLM prompt-response chat completion, enables something similar to an if-else branching structure:

  • if the chat completion results in one decision, take action 1
  • else if it results in another decision, take action 2

Of course, more than two decisions can lead to more than two actions; for the sake of simplicity, we will just look at a two-pronged decision structure!
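
As a rough sketch of the idea (all of the names here are placeholders; the real evaluation and actions are built up in the sections below), the branching looks like this:

# conceptual sketch: route on the LLM's judgment of an input
def evaluate_with_llm(snippet):
  # placeholder for a real chat completion call (see the Setup section below)
  return "needs_work"

def take_action_1(snippet):
  print("✅ snippet accepted:", snippet)

def take_action_2(snippet):
  print("⚠️ snippet sent back for revision:", snippet)

decision = evaluate_with_llm("<button>Click me</button>")
if decision == "good":
  take_action_1("<button>Click me</button>")
else:
  take_action_2("<button>Click me</button>")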

(All of the code below can be combined into one long Python file. Enjoy!)

Setup

For more information about the following setup code, please refer to this post about connecting to an LLM:

import os
import json
from openai import OpenAI

# connect to the client
client = OpenAI(
  api_key=os.environ.get("GROQ_API_KEY"),
  base_url="https://api.groq.com/openai/v1"
)

system_role = "you are a helpful coding assistant who can introduce short snippets to students of web development"

# chat completion function
def get_completion(prompt, system_prompt=system_role, json_mode=False):
  # only request JSON output when json_mode is True
  kwargs = {"response_format": {"type": "json_object"}} if json_mode else {}
  response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": prompt}
    ],
    **kwargs
  )
  return response.choices[0].message.content
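
As a quick, optional sanity check (assuming GROQ_API_KEY is set in your environment), you can call the helper directly:

# optional sanity check of the connection and completion function
print(get_completion("give me a one-line HTML snippet for a button"))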

Reusable criteria

We will create a reusable set of criteria against which each snippet (and, if necessary, each revision of that snippet) will be evaluated:

# reusable criteria for snippet evaluation
# (the prompt functions below wrap this text in <criteria> tags, so the tags are not repeated here)
snippets_criteria = '''
  please evaluate the code snippet for the given task against these criteria:
  1. Completeness: is the task vague or well-defined?
  2. Simplicity: does the task involve a single feature?
  3. Clarity: are the instructions clear and easy to follow?
  4. Environment accuracy: are the techniques appropriate for the framework, library, or programming language?
  5. Feasibility: can a student of web development follow the snippet easily?
  6. Reusability: can a student use this snippet to build a wide variety of apps?
  7. Conciseness: does the snippet focus on the core task, free of unnecessary code?
'''
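
Since snippets_criteria is just plain text, it can be reused anywhere with an f-string; the prompt functions below wrap it in their own <criteria> tags, as in this preview:

# preview of how the criteria will appear inside a prompt
print(f"<criteria>\n{snippets_criteria}\n</criteria>")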

Initial prompt

Start with an initial prompt that determines whether a snippet is "good" or needs work. The prompt will:

  • provide a role for the LLM
  • reference the reusable criteria
  • reference the code snippets
  • provide instructions for the response
  • provide formatting rules for the response
# initial prompt
def get_snippets_evaluation_prompt(snippets, task):
  return f"""You are a professional web developer evaluating code snippet quality for {task}.

  <criteria>
  {snippets_criteria}
  </criteria>
  
  <snippets>
  {snippets}
  </snippets>
  
  <instructions>
  Respond with a JSON object containing:
  1. "decision": Either "good" (meets standards) or "needs_work" (requires revision)
  2. "reason": Brief explanation of your decision
  3. "feedback": Specific suggestions for improvement if needed. 
  (If the snippet is good, the "feedback" property should be an empty string.)
  </instructions>
  
  <format>
  {{
    "decision": "good or needs_work",
    "reason": "your reasoning here",
    "feedback": "your specific feedback here"
  }}
  </format>

  <output_rules>
  Output ONLY the JSON, nothing else.
  Do not include any markdown backticks in your response.
  </output_rules>
  """

Revision prompt

If a snippet "needs work", then we will create a revision prompt for the LLM that will:

  • provide instructions for the response
  • reference the code snippets
  • reference the reusable criteria
  • provide feedback from the last round of evaluation
# revision prompt
def get_snippets_revision_prompt(snippets, task, feedback):

  # ask the LLM to return only the improved snippet
  return f"""You are a professional web developer. Use the following feedback to improve the snippets for {task}.
  Return the revised snippets in the same format as the original snippets. Output ONLY the snippets with no other commentary. Ensure that any markdown backticks are closed properly. 
  
  <snippets>
  {snippets}
  </snippets>

  <criteria>
  {snippets_criteria}
  </criteria>

  <feedback>
  {feedback}
  </feedback>
  
  Your improved snippets:"""
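
Continuing the same hypothetical example, a single revision round would look like this (only meaningful when the decision above was "needs_work"):

# example: revise the made-up snippet using the evaluation feedback
sample_revision_prompt = get_snippets_revision_prompt(
  sample_snippet, sample_task, sample_evaluation["feedback"])
print(get_completion(sample_revision_prompt))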

Routing!

Now we have arrived at the part where the evaluative routing can finally take place:

  • it will evaluate the snippet
  • if the decision is "good", then the chat has completed and the snippet is returned
  • otherwise, keep revising the snippet recursively using the revision prompt from the last section
    • unless, of course, we have reached the maximum number of tries
# evaluative routing 
def route_snippets_request(test_snippet, task, max_tries=3, current_try=1): 
  
  # decision on the snippet
  evaluation_prompt = get_snippets_evaluation_prompt(test_snippet, task)
  evaluation_response = get_completion(evaluation_prompt, json_mode=True)    
  evaluation_data = json.loads(evaluation_response)
  decision = evaluation_data["decision"]  
  print(f"====================== 📋 evaluation decision: {decision} \n")
  
  # routing by evaluation 
  if decision == "good":    
    
    # if it meets standards then stop
    print("======== ✅ snippet meets quality standards! returning to user...\n")
    print(f"* reason: {evaluation_data['reason']}")
    print(f"\n\n enjoy coding! 🎉🎉🎉\n")
    return test_snippet
  
  else:
  
    # if it needs work
    print(f"======== ⚠️ snippet needs work! sending to revision prompt...\n")
    print(f"* reason: {evaluation_data['reason']}\n")
    print(f"* feedback: {evaluation_data['feedback']}\n")
        
    if current_try >= max_tries:
    
      # cap the number of revision rounds to avoid runaway API usage
      print("\n=== 🛑 error: maximum number of tries reached!\n")
      return test_snippet
    
    else:            
    
      # revise snippet
      revision_prompt = get_snippets_revision_prompt(test_snippet, task, evaluation_data['feedback'])
      revised_snippet = get_completion(revision_prompt)
      print(f"\n=== ❇️ revised snippet: {revised_snippet}\n")

      # recursively evaluate revised snippet
      return route_snippets_request(revised_snippet, task, max_tries, current_try+1)

Runtime

Lastly, we provide a sample "training" snippet, testsnippet, whose format the LLM will use as a model while deriving a correct snippet for the testtask!

Of course, the testsnippet does not need to have anything to do with the testtask; in fact, the mismatch demonstrates the LLM's ability to generate a snippet that ends up closer to testtask than to testsnippet:

# sample training snippet
testsnippet = """
HTML table
Requirements:
- browser
- text editor
- basic HTML knowledge
Instructions:
1. open your text editor and create a new file named "index.html"
2. add the basic structure at the top of the file:
  ```html
    <!DOCTYPE html>
    <html>
      <head>
        <title>My First Table</title>
      </head>
      <body>    
  ```
3. inside the <body> tags, add the <table> element:
  ```html
      <table border="1">
        <tr>
          <th>Header 1</th>
          <th>Header 2</th>
        </tr>
        <tr>
          <td>Row 1, Cell 1</td>
          <td>Row 1, Cell 2</td>
        </tr>
        <tr>
          <td>Row 2, Cell 1</td>
          <td>Row 2, Cell 2</td>
        </tr>
      </table>
  ```
4. close the <body> and <html> tags:
  ```html
      </body> 
    </html>
  ```
5. save the file and open it in your web browser to see your table
"""

# the request
testtask = "create an HTML carousel with vanilla JavaScript"
result = route_snippets_request(testsnippet, testtask)

So there we have it: an autonomous code agent that can revise existing code and/or provide new web development code snippets! We did this by leveraging the LLM's ability to make its own judgments about an input, then use those judgments as further input on the way to a correct final output!
