Executing code generated by prompts to guarantee correct calculations

Analyse a data set - perform calculations using JavaScript generated by an LLM and then compose a natural language answer with an explanation.

Mar 12, 2024

In this post we’re going to look at how to build an endpoint that takes a natural language question along with a CSV table and returns accurate calculations along with an explanation of how they were obtained. The calculations will be run in JavaScript generated by an LLM, and you’ll be able to interrogate each step of the chain as it executes.

Background

OpenAI’s code interpreter allows the model to choose to execute generated Python code in a secure environment. This is an incredible feature, and makes it possible to use an LLM to give verifiably correct answers for complex data analysis tasks. However, you don’t get full control over when and how this happens, and you may find instances where calculations are not performed and it’s unclear where the problem lies.

In PlayFetch things are slightly different. When you’re shipping a production system you need guarantees about what will happen and when, and when you’re developing these kinds of complex systems it helps to break down the work into smaller blocks.

Using our chains feature you can generate code using an LLM and then choose to execute it right there in a PlayFetch code block. This gives you the advantages of the correctness of code execution while allowing you to keep each prompt concise and focussed on a single task. You also have the added advantage of being able to use any model provider you like.

A simple example

For this endpoint we have two inputs:

A natural language question such as “what was the revenue increase between January and February?”
A CSV string of our data set.

Our chain will produce an output like this:

The revenue increase between January and February 2023 was $28,808.71. This value was calculated by subtracting the revenue of January ($68,727.01) from the revenue of February ($97,535.72).

To achieve this we will have three steps:

1. Use a prompt to take the input data and generate JavaScript that we can execute to calculate the answer.
2. Execute the JavaScript to calculate the value.
3. Pass the generated JavaScript along with the calculated answer, the question, and the original data set to a second prompt which will format our final output along with a description of how it was obtained.

The chain takes two inputs (you can see our three examples of test data on the right) and produces a plain text output.
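To make the shape of the chain concrete, here is a minimal sketch of the three steps as plain JavaScript. In PlayFetch the steps are wired up in the chain editor rather than in code, so the `runChain` function and the `generateCode`, `execute` and `composeAnswer` step names are purely hypothetical stand-ins for the two prompts and the code block.

```javascript
// Hypothetical orchestration of the chain: each step is passed in as a
// function, so any model provider or executor could sit behind it.
async function runChain({ question, dataSet }, steps) {
  // Step 1: a prompt turns the question and data set into JavaScript source.
  const code = await steps.generateCode({ question, dataSet });
  // Step 2: the generated source is executed and the result comes back as text.
  const answer = await steps.execute(code);
  // Step 3: a second prompt writes up the answer with an explanation.
  return steps.composeAnswer({ question, dataSet, code, answer });
}
```

Keeping each step behind its own function mirrors the point of the chain: any one step can be inspected or swapped out without touching the other two.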

Step 1 - generate the JavaScript

Our first prompt generates the JavaScript. We use a few prompting techniques here such as starting with a succinct description of the task up front before getting into more detail, clearly providing our desired output format, and finally giving an example of what an output might look like. Note that we declare the variables in double curly braces.

Given a data set and a question, you must generate a valid Javascript one-liner which when run will calculate the answer.

The goal is to answer the question directly, so if the question asks for a month the result of the javascript calculation should be a month, if it's for a dollar value the answer must be that value, and so on.

Only include the bits of data you need to get the correct answer.  Return only the javascript one liner with no surrounding characters, wrapped like this 

```javascript 
one-liner-here (not surrounded by quotes or backticks)
```

Do not summarise or abbreviate - you must return full valid javascript containing just the data you need.

Here's an example:
```javascript
return 88273.1282 * 1188273.1722 (note the lack of quotes or backticks surrounding this line)
```

Question: {{question}}
Data set: {{data set}}

We will get an output that looks like this (for brevity I’m skipping a step where I remove the backticks around the line using another code block):

parseFloat("97,535.72".replace(/,/g, '')) - parseFloat("68,727.01".replace(/,/g, ''))
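The skipped clean-up step could look something like the sketch below. This is not the exact code from our chain; `extractCode` is a hypothetical helper that assumes the model wraps its answer in a single fenced block, as the prompt asks, and falls back to the raw text if it doesn't.

```javascript
// Build the three-backtick fence marker programmatically so this snippet
// never contains a literal fence of its own.
const FENCE = "`".repeat(3);

// Hypothetical helper: pull the code out of a fenced block in the model output.
function extractCode(modelOutput) {
  const pattern = new RegExp(
    FENCE + "(?:javascript|js)?[ \\t]*\\r?\\n([\\s\\S]*?)" + FENCE
  );
  const match = modelOutput.match(pattern);
  // Fall back to the raw text if the model returned no fence at all.
  return (match ? match[1] : modelOutput).trim();
}

const modelOutput = FENCE + "javascript \n" +
  "parseFloat(\"97,535.72\".replace(/,/g, '')) - parseFloat(\"68,727.01\".replace(/,/g, ''))" +
  "\n" + FENCE;
console.log(extractCode(modelOutput));
```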

Step 2 - execute the JavaScript

Next we create a code block that can execute the code we’ve generated. We use a variable to take the generated code and pass it into our code block. The JavaScript code is executed in an isolated context. The output is text just like our prompt outputs.

We can also catch any errors in execution and either take them into account in our next step or branch on them to go down a different path, but I won’t cover that here.

This code block will return a string which is the calculated result.
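If you wanted to reproduce this step outside PlayFetch, a rough equivalent in Node.js might look like the sketch below. It is an assumption-laden illustration, not how PlayFetch code blocks are implemented: `runGeneratedCode` is a hypothetical name, and Node's built-in `vm` module gives you a separate context but is not a security sandbox, so untrusted code needs stronger isolation than this.

```javascript
const vm = require("node:vm");

// Hypothetical runner for the generated one-liner (not PlayFetch's implementation).
function runGeneratedCode(code) {
  // The example in our prompt starts with `return`, which is only legal inside
  // a function, so wrap such snippets in an immediately invoked arrow function.
  const source = /^\s*return\b/.test(code) ? `(() => { ${code} })()` : code;
  try {
    // Run in a fresh context with no injected globals; the timeout stops
    // runaway loops. Note: `vm` is NOT a hardened sandbox.
    const value = vm.runInNewContext(source, {}, { timeout: 1000 });
    // Like the prompt outputs, the result is handed on as text.
    return { ok: true, output: String(value) };
  } catch (error) {
    // Surface the failure so a later step can branch on it.
    const message = error && error.message ? error.message : error;
    return { ok: false, output: String(message) };
  }
}
```

Returning an `ok` flag alongside the text is one way to let the next step decide whether to compose an answer or take a different path.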

Step 3 - compose the answer

The final step is to take all of our inputs, the generated JavaScript, and the answer and compose our natural language response. The prompt looks like this:

Given a question and a numerical answer to that question, along with the original data set from which the answer was derived and the code used to derive it, please write a short summary of the answer.

Question: {{question}}
Answer: {{answer}}

Dataset: {{data set}}
Code used: {{code}}
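Stepping outside the prompt for a moment: the double curly brace variables in both prompts are filled in by PlayFetch when the chain runs. Conceptually that substitution is along the lines of the sketch below, where `fillTemplate` is a hypothetical helper written for illustration rather than PlayFetch's actual code.

```javascript
// Hypothetical helper: replace each {{name}} placeholder with its value.
function fillTemplate(template, variables) {
  return template.replace(/\{\{\s*([^{}]+?)\s*\}\}/g, (placeholder, name) =>
    // Leave unknown placeholders untouched so missing inputs are easy to spot.
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : placeholder
  );
}

const prompt = fillTemplate("Question: {{question}}\nAnswer: {{answer}}", {
  question: "what was the revenue increase between January and February?",
  answer: "28808.71",
});
console.log(prompt);
```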

Once you have the chain set up it’s simple to format the final response in the way you’d like by iterating on this last prompt in isolation. This means that any team member can tweak what your users are seeing without risking breaking changes to the calculation portions of your system.
