John Tan Chong
Using Structured Outputs to Chain LLM Pipelines
Why Use Structured Output?
If you want to use Large Language Models (LLMs) in production, extracting exactly what you need from their output can be tricky, especially when that output is verbose. One example is shown below:
Figure 1 Comparison between JSON and free text output
- Here, we simply want the meaning of life in a short answer. Asking the LLM to output in JSON gets us precisely that (left) – a number 42, which pays homage to the Hitchhiker’s Guide to the Galaxy.
- If we had not asked the LLM to output in JSON format (right), we might not get a concise answer, or worse, we might not be able to extract the answer in the right format at all.
Getting a concise JSON output
To get a JSON output, one may use the latest Structured Outputs API by OpenAI (https://openai.com/index/introducing-structured-outputs-in-the-api/). However, it uses a verbose JSON schema and only works for OpenAI models.
At Simbian AI, we use StrictJSON, which uses a more concise StrictJSON schema. It can generate JSON with models from most major LLM providers, such as Meta, OpenAI, Google, and Anthropic (Claude), using an iterative prompting approach that guides the LLM to produce JSON of the right structure and type.
Compared to the JSON schema used by OpenAI and Pydantic, the StrictJSON schema is much more concise, using around 50% of the tokens. The token savings grow with the number of fields in the JSON.
Figure 2 StrictJSON Schema is more concise than JSON Schema
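To make the comparison concrete, here is a minimal sketch that describes the same two fields both ways and compares their sizes. The field names and descriptions are made up for illustration, and the character count of the compact serialization is only a crude proxy for tokens (real counts depend on the tokenizer).

```python
import json

# A conventional JSON Schema for two fields (illustrative, not taken from the article).
json_schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["Positive", "Negative"],
            "description": "Sentiment of the review",
        },
        "rating": {
            "type": "integer",
            "description": "Rating from 1 to 5",
        },
    },
    "required": ["sentiment", "rating"],
    "additionalProperties": False,
}

# The same two fields as a StrictJSON-style output format:
# one key per field, with the description and type in a single string.
strict_format = {
    "sentiment": 'Sentiment of the review, type: Enum["Positive", "Negative"]',
    "rating": "Rating from 1 to 5, type: int",
}


def rough_size(obj) -> int:
    """Character length of the compact JSON serialization (a crude token proxy)."""
    return len(json.dumps(obj, separators=(",", ":")))


schema_size = rough_size(json_schema)
strict_size = rough_size(strict_format)
print(f"JSON Schema: {schema_size} chars, StrictJSON-style: {strict_size} chars")
```

Most of the difference comes from structural keywords such as "properties" and "required", which JSON Schema repeats for every field.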
Benefits of fewer tokens in StrictJSON schema
Figure 3 Longer context leads to poorer performance
Image from Fig. 5b of Effective Long-Context Scaling of Foundation Models. 2023. Xiong et al. https://arxiv.org/abs/2309.16039
Having fewer tokens in the StrictJSON schema not only reduces the cost of your LLM processing, it also improves reliability. A paper by Meta on long-context scaling shows that the longer the context length (Task length), the poorer the performance (here measured by the ROUGE-L metric, which determines how similar the answer is to the ground truth). As such, we want to strive towards having fewer tokens in the context, in order to get better performance at lower cost.
StrictJSON Usage
First, install TaskGen:
Figure 4 Installing TaskGen
Then, define your own LLM.
Figure 5 Defining your own LLM
You are now ready to use StrictJSON:
Figure 6 Using StrictJSON to extract entities
Figure 7 Using StrictJSON as a classifier
You only need three things: a system prompt containing your instruction to the LLM, a user prompt containing the input you give to the LLM, and an output format that specifies each key name along with the description and type of the value you expect. StrictJSON will prompt the LLM to generate the JSON in the exact format and with the types you specified in the output format.
StrictJSON currently supports the following data types: `int`, `float`, `str`, `dict`, `list`, `array`, `Dict[]`, `List[]`, `Array[]`, `Enum[]`, `bool`, `code`
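As a rough illustration of what "the exact format and type" means, the sketch below checks a parsed output against an output format. It assumes the type is written as `type: <name>` at the end of each field description and covers only a few of the types listed above. The names `TYPE_CHECKS`, `declared_type` and `check_output` are hypothetical helpers written for this example; they are not part of the StrictJSON API.

```python
import re
from typing import Optional

# Checks for a small subset of type labels (assumed for this sketch).
# bool is excluded from int/float because bool is a subclass of int in Python.
TYPE_CHECKS = {
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "str": lambda v: isinstance(v, str),
    "bool": lambda v: isinstance(v, bool),
    "list": lambda v: isinstance(v, list),
    "dict": lambda v: isinstance(v, dict),
}


def declared_type(description: str) -> Optional[str]:
    """Return the label that follows 'type:' in a field description, if any."""
    match = re.search(r"type:\s*(\w+)", description)
    return match.group(1) if match else None


def check_output(output: dict, output_format: dict) -> list:
    """Return a list of error messages; an empty list means the output conforms."""
    errors = []
    for key, description in output_format.items():
        if key not in output:
            errors.append(f"Missing field: {key}")
            continue
        label = declared_type(description)
        check = TYPE_CHECKS.get(label)  # unknown or absent labels are not checked
        if check is not None and not check(output[key]):
            errors.append(f"Field {key} should be of type {label}")
    return errors


output_format = {
    "Sentiment": "Type of sentiment, type: str",
    "Rating": "Rating from 1 to 5, type: int",
    "Is_spam": "Whether the review is spam, type: bool",
}

good = {"Sentiment": "Positive", "Rating": 4, "Is_spam": False}
bad = {"Sentiment": "Positive", "Rating": "four"}

print(check_output(good, output_format))
print(check_output(bad, output_format))
```

Error messages like these are the kind of feedback that can be passed back to the LLM so it can correct its output.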
Under the Hood
Figure 8 StrictJSON under the hood
- Under the hood, StrictJSON generates a system prompt that encloses each field key of the JSON in ### and each field value in <>. The LLM is then asked to output the JSON with the keys still enclosed in ###, replacing each <> placeholder with the actual value.
- The LLM's response is then parsed with a regex that splits on each ###`key`###. This is robust enough that incomplete quotation marks or brackets do not affect the parsing of the JSON.
- If there is any error in the generated JSON, be it missing field keys or values, or an incorrect data type, StrictJSON generates an error message and passes it back to the LLM to regenerate the JSON. This makes JSON generation more robust and enables the LLM to self-correct.
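The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not StrictJSON's actual implementation: the prompt wording, the function names (`build_system_prompt`, `parse_response`, `generate_json`) and the stub LLM are all hypothetical, values are kept as strings, and type checking is left out.

```python
import re
from typing import Callable


def build_system_prompt(instruction: str, output_format: dict) -> str:
    """Enclose each key in ### and each value description in <>."""
    template = ", ".join(
        f"'###{key}###': <{desc}>" for key, desc in output_format.items()
    )
    return (
        f"{instruction}\n"
        "Output in the following format, keeping the ###key### markers "
        "and replacing each <...> with the value:\n"
        "{" + template + "}"
    )


def parse_response(response: str, keys: list) -> dict:
    """Split on ###key### so that stray quotes or brackets do not break parsing."""
    # re.split with a capture group yields [prefix, key1, text1, key2, text2, ...]
    parts = re.split(r"###(.+?)###", response)
    found = {}
    for key, raw in zip(parts[1::2], parts[2::2]):
        # Trim the punctuation that surrounds a value (crude: a value that
        # legitimately starts or ends with these characters would be trimmed too).
        found[key] = raw.strip().strip("'\":,{}<> \n").strip()
    missing = [k for k in keys if k not in found]
    if missing:
        raise ValueError(f"Missing output fields: {missing}")
    return {k: found[k] for k in keys}


def generate_json(instruction: str, user_prompt: str, output_format: dict,
                  llm: Callable[[str, str], str], max_tries: int = 3) -> dict:
    """Ask the LLM, parse the reply, and feed any error back for another attempt."""
    system_prompt = build_system_prompt(instruction, output_format)
    error_note = ""
    for _ in range(max_tries):
        response = llm(system_prompt + error_note, user_prompt)
        try:
            return parse_response(response, list(output_format))
        except ValueError as err:
            error_note = (
                f"\nPrevious output: {response}\nError: {err}\nPlease fix the error."
            )
    raise RuntimeError("Could not generate valid output")


# Stub LLM for demonstration: the first reply omits a field,
# the second has an unclosed quotation mark.
replies = iter([
    "{'###Sentiment###': 'Positive'}",
    "{'###Sentiment###': 'Positive', '###Summary###': 'Great battery life}",
])
calls = []


def stub_llm(system_prompt: str, user_prompt: str) -> str:
    calls.append(system_prompt)
    return next(replies)


result = generate_json(
    "Classify the review.",
    "The battery lasts for days!",
    {"Sentiment": "Type of sentiment", "Summary": "One-line summary"},
    stub_llm,
)
print(result)
```

In this run the first reply is rejected because the Summary field is missing, the error is appended to the next system prompt, and the second reply parses correctly despite its unclosed quotation mark.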
How can I use it?
StrictJSON is used extensively in TaskGen (https://github.com/simbianai/taskgen) for Agent outputs. The original StrictJSON repo is at https://github.com/tanchongmin/strictjson
Do check it out and utilize it for your pipelines today!
Also, stay tuned for our next article, where we will share more about how to make LLM pipelines more robust using verifiers and ground truth checking.