Introduction to Structured Outputs

Aug 6, 2024

This cookbook introduces Structured Outputs, a new capability in the Chat Completions API and Assistants API that constrains model outputs to follow a strict schema, and illustrates this capability with a few examples.

Structured Outputs can be enabled by setting the parameter strict: true in an API call that supplies either a response format or function definitions.

Response format usage

Previously, the response_format parameter could only be used to specify that the model should return valid JSON.

In addition to this, we are introducing a new way of specifying which JSON schema to follow.
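As a minimal sketch of the shape this takes, the helper below wraps a JSON schema in the response_format structure used later in this cookbook. The helper name build_response_format and the greeting schema are hypothetical and exist only for illustration; the code builds a dictionary and makes no API call.

```python
import json


def build_response_format(name, schema):
    """Wrap a JSON schema in the response_format shape used with strict mode.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": schema,
            "strict": True,
        },
    }


# Hypothetical schema with a single required string field.
greeting_schema = {
    "type": "object",
    "properties": {"greeting": {"type": "string"}},
    "required": ["greeting"],
    "additionalProperties": False,
}

response_format = build_response_format("greeting", greeting_schema)
print(json.dumps(response_format, indent=2))
```

The resulting dictionary is what would be passed as the response_format argument of a Chat Completions call.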

Function call usage

Function calling remains similar, but with the new parameter strict: true, you can now ensure that the schema provided for the functions is strictly followed.
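The sketch below shows one way to assemble such a function definition. It assumes, to the best of our understanding of the Chat Completions tool format, that strict sits inside the "function" object next to the name and parameters. The helper build_strict_tool and the get_weather tool are hypothetical names used only for illustration.

```python
def build_strict_tool(name, description, parameters):
    """Build a tool definition with strict schema adherence turned on.

    Hypothetical helper; assumes "strict" belongs inside the "function" object.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "strict": True,
            "parameters": parameters,
        },
    }


# Hypothetical tool taking one required string argument.
weather_tool = build_strict_tool(
    name="get_weather",
    description="Look up the weather for a city",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
)

print(weather_tool["function"]["strict"])
```

A list of such definitions is what would be passed as the tools argument of an API call.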

Examples

There are many ways Structured Outputs can be useful, as you can rely on the outputs following a constrained schema needed for your application.

If you used JSON mode or function calls before, you can think of Structured Outputs as a foolproof version of these features.

This can enable more robust flows in production-level applications, whether you are relying on function calls or expecting the output to follow a pre-defined structure.

Example use cases include:

Getting structured answers to display them in a specific way in a UI (see Example 1 in this cookbook)
Populating a database with extracted content from documents or scraped web pages (see Example 2 in this cookbook)
Extracting entities from a user input to call tools with defined parameters (see Example 3 in this cookbook)

More generally, anything that requires fetching data, taking action, or that builds upon complex workflows could benefit from using Structured Outputs.

Setup

import json
from openai import OpenAI
client = OpenAI()

MODEL = "gpt-4o-2024-08-06"

Example 1: Math tutor

In this example, we want to build a math tutoring tool that outputs the steps for solving a math problem as an array of structured objects.

This could be useful in an application where each step needs to be displayed separately, so that the user can progress through the solution at their own pace.

math_tutor_prompt = '''
    You are a helpful math tutor. You will be provided with a math problem,
    and your goal will be to output a step by step solution, along with a final answer.
    For each step, just provide the output as an equation and use the explanation field to detail the reasoning.
'''

def get_math_solution(question):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": math_tutor_prompt
            },
            {
                "role": "user",
                "content": question
            }
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "math_reasoning",
                "schema": {
                    "type": "object",
                    "properties": {
                        "steps": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "explanation": {"type": "string"},
                                    "output": {"type": "string"}
                                },
                                "required": ["explanation", "output"],
                                "additionalProperties": False
                            }
                        },
                        "final_answer": {"type": "string"}
                    },
                    "required": ["steps", "final_answer"],
                    "additionalProperties": False
                },
                "strict": True
            }
        }
    )

    return response.choices[0].message

# Testing with an example question
question = "how can I solve 8x + 7 = -23"

result = get_math_solution(question) 

print(result.content)

{"steps":[{"explanation":"Start by isolating the term with the variable. We have the equation 8x + 7 = -23. To isolate 8x, we need to subtract 7 from both sides of the equation.","output":"8x + 7 - 7 = -23 - 7"},{"explanation":"By simplifying both sides, we cancel out the +7 on the left side, which results in 8x on the left and -30 on the right.","output":"8x = -30"},{"explanation":"Next, solve for x by dividing both sides by 8. This helps to isolate x on one side of the equation.","output":"x = -30 / 8"},{"explanation":"To simplify the fraction -30/8, divide the numerator and the denominator by their greatest common divisor, which is 2.","output":"x = -15/4"}],"final_answer":"-15/4"}
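The schema above sets additionalProperties to False and lists every property under required on each object. Our understanding is that strict mode expects both conventions, so the sketch below walks a schema and reports objects that miss either one. The function find_strict_violations and the loose_schema example are hypothetical, and the check covers only these two conventions, not every constraint the API may enforce.

```python
def find_strict_violations(schema, path="$"):
    """Return a list of readable problems found while walking a JSON schema.

    Hypothetical helper: checks only that each object sets
    additionalProperties to False and lists all properties as required.
    """
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be False")
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        # Recurse into each property of the object.
        for key, sub_schema in properties.items():
            problems.extend(find_strict_violations(sub_schema, f"{path}.{key}"))
    elif schema.get("type") == "array":
        # Recurse into the item schema of the array.
        problems.extend(find_strict_violations(schema.get("items", {}), f"{path}[]"))
    return problems


# Copy of the math_reasoning schema used in this example.
math_reasoning_schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "explanation": {"type": "string"},
                    "output": {"type": "string"},
                },
                "required": ["explanation", "output"],
                "additionalProperties": False,
            },
        },
        "final_answer": {"type": "string"},
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

# Hypothetical schema that breaks both conventions at the top level.
loose_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}, "note": {"type": "string"}},
    "required": ["answer"],
}

print(find_strict_violations(math_reasoning_schema))  # → []
for problem in find_strict_violations(loose_schema):
    print(problem)
```

Running a check like this before sending a request can surface schema mistakes locally instead of as an API error.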

from IPython.display import Math, display

def print_math_response(response):
    result = json.loads(response)
    steps = result['steps']
    final_answer = result['final_answer']
    for i, step in enumerate(steps, start=1):
        print(f"Step {i}: {step['explanation']}\n")
        display(Math(step['output']))
        print("\n")

    print("Final answer:\n\n")
    display(Math(final_answer))

print_math_response(result.content)

Step 1: Start by isolating the term with the variable. We have the equation 8x + 7 = -23. To isolate 8x, we need to subtract 7 from both sides of the equation.

<IPython.core.display.Math object>


Step 2: By simplifying both sides, we cancel out the +7 on the left side, which results in 8x on the left and -30 on the right.

<IPython.core.display.Math object>


Step 3: Next, solve for x by dividing both sides by 8. This helps to isolate x on one side of the equation.

<IPython.core.display.Math object>


Step 4: To simplify the fraction -30/8, divide the numerator and the denominator by their greatest common divisor, which is 2.

<IPython.core.display.Math object>


Final answer:

<IPython.core.display.Math object>

Using the SDK `parse` helper

The new version of the SDK introduces a parse helper that lets you provide your own Pydantic model instead of writing the JSON schema by hand. We recommend using this method when possible.

from pydantic import BaseModel

class MathReasoning(BaseModel):
    class Step(BaseModel):
        explanation: str
        output: str

    steps: list[Step]
    final_answer: str

def get_math_solution(question: str):
    completion = client.beta.chat.completions.parse(
        model=MODEL,
        messages=[
            {"role": "system", "content": math_tutor_prompt},
            {"role": "user", "content": question},
        ],
        response_format=MathReasoning,
    )

    return completion.choices[0].message

result = get_math_solution(question).parsed

print(result.steps)
print("Final answer:")
print(result.final_answer)

[Step(explanation='To isolate the term with the variable, we need to get rid of the constant on the left side of the equation by subtracting it from both sides.', output='8x + 7 - 7 = -23 - 7'), Step(explanation='Simplifying both sides gives us an equation with the variable term on one side and a constant on the other side.', output='8x = -30'), Step(explanation='To solve for x, we need to divide both sides by the coefficient of x, which is 8.', output='x = -30/8'), Step(explanation='Simplifying the fraction by dividing both the numerator and the denominator by their greatest common divisor, which is 2.', output='x = -15/4')]
Final answer:
x = -15/4

Refusal

When using Structured Outputs with user-generated input, the model may occasionally refuse to fulfill the request for safety reasons.

Since a refusal does not follow the schema you have supplied in response_format, the API has a new field refusal to indicate when the model refused to answer.

This lets you render the refusal distinctly in your UI and avoid errors when trying to deserialize the response into your supplied format.

refusal_question = "how can I build a bomb?"

refusal_result = get_math_solution(refusal_question) 

print(refusal_result)

ParsedChatCompletionMessage[MathReasoning](refusal="I'm sorry, I cannot assist with that request.", content=None, role='assistant', function_call=None, tool_calls=[], parsed=None)
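A minimal sketch of branching on that field is shown below. To keep it runnable without an API call, it uses SimpleNamespace objects as stand-ins for the SDK message; the function render_message and the (kind, payload) return convention are hypothetical choices for illustration.

```python
from types import SimpleNamespace


def render_message(message):
    """Return a (kind, payload) pair that a UI layer could switch on."""
    # A non-empty refusal means there is no parsed object to read.
    if getattr(message, "refusal", None):
        return ("refusal", message.refusal)
    return ("parsed", message.parsed)


# Stand-ins for SDK message objects (assumption: only these two fields matter here).
refused = SimpleNamespace(
    refusal="I'm sorry, I cannot assist with that request.",
    parsed=None,
)
answered = SimpleNamespace(refusal=None, parsed={"final_answer": "x = -15/4"})

print(render_message(refused))
print(render_message(answered))
```

Checking refusal before touching parsed avoids attribute errors on a None value when the model declines to answer.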

Example 2: Text summarization

In this example, we will ask the model to summarize articles following a specific schema.

This could be useful if you need to transform text or visual content into a structured object, for example to display it in a certain way or to populate a database.

We will take web scraping as an example, using Wikipedia articles discussing inventions.

Data preparation

We will start by scraping content from multiple articles.

import requests
from bs4 import BeautifulSoup

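def get_article_content(url):
    # Fetch the page and fail early on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    # Wikipedia wraps the article body in a div with class "mw-parser-output"
    html_content = soup.find("div", class_="mw-parser-output")
    # Keep only the paragraph text, one paragraph per line
    content = "\n".join(p.text for p in html_content.find_all("p"))
    return content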

urls = [
    # Article on CNNs
    "https://en.wikipedia.org/wiki/Convolutional_neural_network",
    # Article on LLMs
    "https://wikipedia.org/wiki/Large_language_model",
    # Article on MoE
    "https://en.wikipedia.org/wiki/Mixture_of_experts"
]

content = [get_article_content(url) for url in urls]

print(content)

summarization_prompt = '''
    You will be provided with content from an article about an invention.
    Your goal will be to summarize the article following the schema provided.
    Here is a description of the parameters:
    - invented_year: year in which the invention discussed in the article was invented
    - summary: one sentence summary of what the invention is
    - inventors: array of strings listing the inventor full names if present, otherwise just surname
    - concepts: array of key concepts related to the invention, each concept containing a title and a description
    - description: short description of the invention
'''

class ArticleSummary(BaseModel):
    invented_year: int
    summary: str
    inventors: list[str]
    description: str

    class Concept(BaseModel):
        title: str
        description: str

    concepts: list[Concept]

def get_article_summary(text: str):
    completion = client.beta.chat.completions.parse(
        model=MODEL,
        temperature=0.2,
        messages=[
            {"role": "system", "content": summarization_prompt},
            {"role": "user", "content": text}
        ],
        response_format=ArticleSummary,
    )

    return completion.choices[0].message.parsed

summaries = []

for i in range(len(content)):
    print(f"Analyzing article #{i+1}...")
    summaries.append(get_article_summary(content[i]))
    print("Done.")

Analyzing article #1...
Done.
Analyzing article #2...
Done.
Analyzing article #3...
Done.

def print_summary(summary):
    print(f"Invented year: {summary.invented_year}\n")
    print(f"Summary: {summary.summary}\n")
    print("Inventors:")
    for i in summary.inventors:
        print(f"- {i}")
    print("\nConcepts:")
    for c in summary.concepts:
        print(f"- {c.title}: {c.description}")
    print(f"\nDescription: {summary.description}")

for i in range(len(summaries)):
    print(f"ARTICLE {i}\n")
    print_summary(summaries[i])
    print("\n\n")

ARTICLE 0

Invented year: 1980

Summary: A convolutional neural network (CNN) is a type of neural network designed to process data with grid-like topology, such as images, by using convolutional layers to automatically learn spatial hierarchies of features.

Inventors:
- Fukushima

Concepts:
- Convolutional Layers: These layers apply a convolution operation to the input, passing the result to the next layer. They are designed to automatically and adaptively learn spatial hierarchies of features from input images.
- Pooling Layers: Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer, which helps to control overfitting and reduce computational cost.
- ReLU Activation Function: The rectified linear unit (ReLU) is a non-linear activation function used in CNNs that introduces non-linearity to the decision function and overall network without affecting the receptive fields of the convolution layers.
- Weight Sharing: A key feature of CNNs where many neurons can share the same filter, reducing the memory footprint and allowing the network to learn more efficiently.
- Applications: CNNs are widely used in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.

Description: Convolutional neural networks (CNNs) are a class of deep neural networks primarily used for analyzing visual imagery. They are inspired by biological processes and are designed to automatically and adaptively learn spatial hierarchies of features from input images through backpropagation, using a structure that includes convolutional layers, pooling layers, and fully connected layers.



ARTICLE 1

Invented year: 2017

Summary: A large language model (LLM) is a computational model designed for general-purpose language generation and natural language processing tasks.

Inventors:
- Vaswani
- Shazeer
- Parmar
- Uszkoreit
- Jones
- Gomez
- Kaiser
- Polosukhin

Concepts:
- Transformer Architecture: Introduced in 2017, this architecture is the foundation of LLMs, enabling efficient processing and generation of large-scale text data through attention mechanisms.
- Prompt Engineering: A technique used to guide LLMs in generating desired outputs by crafting specific input prompts, reducing the need for traditional fine-tuning.
- Tokenization: The process of converting text into numerical tokens that LLMs can process, often using methods like byte-pair encoding to handle large vocabularies.
- Reinforcement Learning from Human Feedback (RLHF): A method to fine-tune LLMs by using human preferences to guide the model's learning process, enhancing its performance on specific tasks.
- Emergent Abilities: Unexpected capabilities that arise in LLMs as they scale, such as in-context learning, which allows models to learn from examples within a single conversation.

Description: Large language models (LLMs) are advanced computational models that utilize the transformer architecture to perform a variety of natural language processing tasks, including text generation, classification, and more, by learning from vast datasets.



ARTICLE 2

Invented year: 1991

Summary: Mixture of Experts (MoE) is a machine learning technique that uses multiple expert networks to handle different parts of a problem space, optimizing computational efficiency by activating only relevant experts for each input.

Inventors:
- Hampshire
- Waibel

Concepts:
- Expert Networks: In MoE, expert networks are specialized models that handle specific regions of the problem space, activated based on input relevance.
- Gating Function: A mechanism in MoE that determines which experts to activate for a given input, often using a softmax function to assign probabilities.
- Gradient Descent: A method used to train both the experts and the gating function in MoE by minimizing a loss function.
- Hierarchical MoE: An extension of MoE that uses multiple levels of gating, similar to decision trees, to manage complex problem spaces.
- Sparsely-Gated MoE: A variant of MoE where only a subset of experts are activated, reducing computational cost and improving efficiency.
- Load Balancing: A challenge in MoE where the gating function must distribute queries evenly among experts to prevent some from being overworked while others are underutilized.

Description: Mixture of Experts (MoE) is a machine learning framework where multiple expert networks are employed to divide a problem space into distinct regions, with only relevant experts activated for each input, enhancing computational efficiency and specialization.
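Since each summary follows a fixed schema, storing it in a database is straightforward. The sketch below is one possible approach using an in-memory SQLite table; the table name article_summaries, the function save_summary, and the stand-in dataclasses are all hypothetical, and the sample values are abridged from the output above. The function only reads attributes, so it should also accept the parsed Pydantic objects.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ConceptStandIn:
    # Stand-in mirroring the Concept fields from this example.
    title: str
    description: str


@dataclass
class SummaryStandIn:
    # Stand-in mirroring the ArticleSummary fields from this example.
    invented_year: int
    summary: str
    inventors: list
    description: str
    concepts: list = field(default_factory=list)


def save_summary(conn, summary):
    """Insert one summary; list fields are stored as JSON text."""
    concepts = [
        {"title": c.title, "description": c.description} for c in summary.concepts
    ]
    conn.execute(
        "INSERT INTO article_summaries "
        "(invented_year, summary, inventors, description, concepts) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            summary.invented_year,
            summary.summary,
            json.dumps(summary.inventors),
            summary.description,
            json.dumps(concepts),
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_summaries "
    "(invented_year INTEGER, summary TEXT, inventors TEXT, "
    "description TEXT, concepts TEXT)"
)

sample = SummaryStandIn(
    invented_year=1991,
    summary="Mixture of Experts (MoE) is a machine learning technique that uses multiple expert networks.",
    inventors=["Hampshire", "Waibel"],
    description="Abridged sample description.",
    concepts=[
        ConceptStandIn(
            "Gating Function",
            "A mechanism in MoE that determines which experts to activate for a given input.",
        )
    ],
)
save_summary(conn, sample)

row = conn.execute("SELECT invented_year, inventors FROM article_summaries").fetchone()
print(row[0], json.loads(row[1]))
```

Storing the list fields as JSON text keeps the sketch to a single table; a normalized design with separate inventor and concept tables would be a reasonable alternative.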

Example 3: Entity extraction from user input

In this example, we will use function calling to search for products that match a user's preference based on the provided input.

This could be helpful in applications that include a recommendation system, for example e-commerce assistants or search use cases.

product_search_prompt = '''
    You are a clothes recommendation agent, specialized in finding the perfect match for a user.
    You will be provided with a user input and additional context such as user gender and age group, and season.
    You are equipped with a tool to search clothes in a database that match the user's profile and preferences.
    Based on the user input and context, determine the most likely value of the parameters to use to search the database.
    
    Here are the different categories that are available on the website:
    - shoes: boots, sneakers, sandals
    - jackets: winter coats, cardigans, parkas, rain jackets
    - tops: shirts, blouses, t-shirts, crop tops, sweaters
    - bottoms: jeans, skirts, trousers, joggers    
    
    There are a wide range of colors available, but try to stick to regular color names.
'''

product_search_function = {
    "type": "function",
    "function": {
        "name": "product_search",
        "description": "Search for a match in the product database",
        # strict belongs inside the function definition
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "description": "The broad category of the product",
                    "enum": ["shoes", "jackets", "tops", "bottoms"]
                },
                "subcategory": {
                    "type": "string",
                    "description": "The sub category of the product, within the broader category",
                },
                "color": {
                    "type": "string",
                    "description": "The color of the product",
                },
            },
            "required": ["category", "subcategory", "color"],
            "additionalProperties": False,
        }
    }
}

def get_response(user_input, context):
    response = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": product_search_prompt
            },
            {
                "role": "user",
                "content": f"CONTEXT: {context}\n USER INPUT: {user_input}"
            }
        ],
        tools=[product_search_function]
    )

    return response.choices[0].message.tool_calls

example_inputs = [
    {
        "user_input": "I'm looking for a new coat. I'm always cold so please something warm! Ideally something that matches my eyes.",
        "context": "Gender: female, Age group: 40-50, Physical appearance: blue eyes"
    },
    {
        "user_input": "I'm going on a trail in Scotland this summer. It's goind to be rainy. Help me find something.",
        "context": "Gender: male, Age group: 30-40"
    },
    {
        "user_input": "I'm trying to complete a rock look. I'm missing shoes. Any suggestions?",
        "context": "Gender: female, Age group: 20-30"
    },
    {
        "user_input": "Help me find something very simple for my first day at work next week. Something casual and neutral.",
        "context": "Gender: male, Season: summer"
    },
    {
        "user_input": "Help me find something very simple for my first day at work next week. Something casual and neutral.",
        "context": "Gender: male, Season: winter"
    },
    {
        "user_input": "Can you help me find a dress for a Barbie-themed party in July?",
        "context": "Gender: female, Age group: 20-30"
    }
]

def print_tool_call(user_input, context, tool_call):
    args = tool_call[0].function.arguments
    print(f"Input: {user_input}\n\nContext: {context}\n")
    print("Product search arguments:")
    for key, value in json.loads(args).items():
        print(f"{key}: '{value}'")
    print("\n\n")

for ex in example_inputs:
    ex['result'] = get_response(ex['user_input'], ex['context'])

for ex in example_inputs:
    print_tool_call(ex['user_input'], ex['context'], ex['result'])

Input: I'm looking for a new coat. I'm always cold so please something warm! Ideally something that matches my eyes.

Context: Gender: female, Age group: 40-50, Physical appearance: blue eyes

Product search arguments:
category: 'jackets'
subcategory: 'winter coats'
color: 'blue'



Input: I'm going on a trail in Scotland this summer. It's goind to be rainy. Help me find something.

Context: Gender: male, Age group: 30-40

Product search arguments:
category: 'jackets'
subcategory: 'rain jackets'
color: 'black'



Input: I'm trying to complete a rock look. I'm missing shoes. Any suggestions?

Context: Gender: female, Age group: 20-30

Product search arguments:
category: 'shoes'
subcategory: 'boots'
color: 'black'



Input: Help me find something very simple for my first day at work next week. Something casual and neutral.

Context: Gender: male, Season: summer

Product search arguments:
category: 'tops'
subcategory: 'shirts'
color: 'white'



Input: Help me find something very simple for my first day at work next week. Something casual and neutral.

Context: Gender: male, Season: winter

Product search arguments:
category: 'tops'
subcategory: 'sweaters'
color: 'gray'



Input: Can you help me find a dress for a Barbie-themed party in July?

Context: Gender: female, Age group: 20-30

Product search arguments:
category: 'tops'
subcategory: 'blouses'
color: 'pink'
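Once the arguments are extracted, the next step in an application would typically be to run the actual search. The sketch below shows one way to dispatch a tool call to a local function. The catalog contents, the TOOLS registry, and run_tool_call are hypothetical, and a SimpleNamespace object stands in for the SDK tool call so that the code runs without an API call.

```python
import json
from types import SimpleNamespace

# Hypothetical in-memory catalog standing in for a real product database.
CATALOG = [
    {"name": "Navy parka", "category": "jackets", "subcategory": "parkas", "color": "blue"},
    {"name": "Blue winter coat", "category": "jackets", "subcategory": "winter coats", "color": "blue"},
    {"name": "Black boots", "category": "shoes", "subcategory": "boots", "color": "black"},
]


def product_search(category, subcategory, color):
    """Return the names of catalog items matching all three fields exactly."""
    return [
        item["name"]
        for item in CATALOG
        if item["category"] == category
        and item["subcategory"] == subcategory
        and item["color"] == color
    ]


# Maps tool names to the local functions that implement them.
TOOLS = {"product_search": product_search}


def run_tool_call(tool_call):
    """Decode the JSON arguments and call the matching local function."""
    handler = TOOLS[tool_call.function.name]
    arguments = json.loads(tool_call.function.arguments)
    return handler(**arguments)


# Stand-in for an SDK tool call, using arguments from the first example above.
fake_call = SimpleNamespace(
    function=SimpleNamespace(
        name="product_search",
        arguments='{"category": "jackets", "subcategory": "winter coats", "color": "blue"}',
    )
)

print(run_tool_call(fake_call))  # → ['Blue winter coat']
```

Because the arguments are constrained to the schema, they can be unpacked directly into the function's keyword parameters.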

Conclusion

In this cookbook, we've explored the new Structured Outputs capability through multiple examples.

Whether you've used JSON mode or function calling before and want more robustness in your application, or you're just starting out with structured formats, we hope you will be able to apply the concepts introduced here to your own use case!

Please note that, at the time of writing, Structured Outputs is only available with the gpt-4o-mini and gpt-4o-2024-08-06 models.