API Documentation

Overview

The Text to JSON API provides a convenient interface for extracting structured data from unstructured text using large language models (LLMs). You can send text, images, or PDFs to the API and receive JSON data in the shape of your schema.

Defining a Schema

To let the LLM know what data you want in your JSON structure, you define a schema. The schema is a JSON object of the same shape you want the output to have. The value of every property is a string in angle brackets containing a type and a description. For example:

                    
{
  "customer": {
    "company_name": "<string: The name of the company>",
    "vat_id": "<string: The VAT (Value Added Tax) identification number of the company>",
    "contact": {
      "first_name": "<string: The first name of the contact person>",
      "last_name": "<string: The last name of the contact person>",
      "email": "<string: The email address of the contact person>"
    },
    "address": {
      "street": "<string: The street address of the company>",
      "town": "<string: The town or city where the company is located>",
      "zip_code": "<string: The postal code of the company's location>"
    }
  }
}
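The placeholder convention can be checked mechanically before a schema is saved. The following is a minimal sketch, not part of the Text to JSON API: the function name `find_invalid_leaves` and the regular expression are assumptions inferred from the example above, and the service may accept forms that this pattern rejects.

```python
import re

# Assumed placeholder form, inferred from the examples: "<type: description>".
PLACEHOLDER = re.compile(r"^<(?P<type>[A-Za-z_]+):\s*(?P<description>.+)>$", re.DOTALL)


def find_invalid_leaves(schema, path=""):
    """Return the paths of leaf values that are not '<type: description>' strings."""
    if isinstance(schema, dict):
        problems = []
        for key, value in schema.items():
            child = f"{path}.{key}" if path else key
            problems.extend(find_invalid_leaves(value, child))
        return problems
    if isinstance(schema, list):
        problems = []
        for index, item in enumerate(schema):
            problems.extend(find_invalid_leaves(item, f"{path}[{index}]"))
        return problems
    if isinstance(schema, str) and PLACEHOLDER.match(schema):
        return []
    return [path]


schema = {
    "customer": {
        "company_name": "<string: The name of the company>",
        "contact": {"email": "The email address"},  # angle brackets and type are missing
    }
}
print(find_invalid_leaves(schema))  # ['customer.contact.email']
```

An empty list means every leaf follows the assumed convention.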
You can specify data types, and Text to JSON will automatically enforce them in the response. The examples in this documentation use the types string, float, and date. Once the schema is saved, an API query link appears directly below the save button.

Testing the Schema

To test the schema, enter some text into the text field below the schema editor. The extracted data is displayed below the text field. If the extracted data is not what you expected, modify the schema and test again.

[Figure: Testing the schema. The example asks the model for the name of a US president and their time in office.]

To increase the accuracy of results, add fallback values and follow the other best practices. Fields that are not present in the text are omitted automatically unless a fallback value is specified. The following schema implements fallback values:
                    
[
    {
        "name": "<string: Name of the ingredient>",
        "amount": "<float: Amount of the ingredient if the amount is not given use 1>",
        "unit": "<string: Unit of the ingredient if the unit is given or 'piece' if not mentioned e.g. \"g\",\"kg\",\"piece\",\"lb\",\"tsp\",\"tbsp\",\"cup\",\"fl\",\"oz\",\"ml\">"
    }
]
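Descriptions that contain quoted example values are easy to mis-escape when written by hand. One option, sketched here under the assumption that you prepare the schema in Python before pasting it into the schema editor, is to build it as a data structure and let the `json` module handle the escaping. The variable names are illustrative only.

```python
import json

units = ["g", "kg", "piece", "lb", "tsp", "tbsp", "cup", "fl", "oz", "ml"]
# Produces "g","kg","piece",... with literal double quotes around each unit.
unit_examples = ",".join(f'"{unit}"' for unit in units)

schema = [
    {
        "name": "<string: Name of the ingredient>",
        "amount": "<float: Amount of the ingredient if the amount is not given use 1>",
        "unit": (
            "<string: Unit of the ingredient if the unit is given "
            f"or 'piece' if not mentioned e.g. {unit_examples}>"
        ),
    }
]

# json.dumps escapes the inner double quotes as \" automatically.
schema_json = json.dumps(schema, indent=4)
print(schema_json)
```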
                

Querying with the Text API

To query the model with your own input, you'll need to send a POST request to the `/api/v1/infer` endpoint along with specific query parameters.

POST /api/v1/infer

This endpoint allows you to input text to the model and receive the model's output in response.

Request URL

The request URL is shown on the model's page on the website after you save the model.
https://text-to-json.com/api/v1/infer?uuid=<YOUR_MODEL_UUID>&apiToken=<YOUR_API_TOKEN>
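As a rough sketch, the request URL can be assembled and wrapped in a POST request with Python's standard library. The query parameter names `uuid` and `apiToken` come from the URL above. Everything about the body is an assumption: this sketch sends JSON with a `Content-Type: application/json` header, and the payload key used in the example is a placeholder, so substitute the fields listed under "Body".

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://text-to-json.com/api/v1/infer"


def build_infer_request(model_uuid, api_token, payload):
    """Build (but do not send) a POST request for the infer endpoint."""
    query = urllib.parse.urlencode({"uuid": model_uuid, "apiToken": api_token})
    return urllib.request.Request(
        url=f"{BASE_URL}?{query}",
        # Assumption: the body is JSON; check the "Body" section for the real format.
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "input" is a placeholder key, not a documented field name.
request = build_infer_request("my-model-uuid", "my-api-token", {"input": "Some text"})
print(request.full_url)

# To send the request:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the query string with `urlencode` keeps tokens that contain special characters from breaking the URL.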

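Query Parameters

The request URL above carries two query parameters: `uuid`, the UUID of your model, and `apiToken`, your API token.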

Body

The body parameters are:

Response

The response is a JSON object structured as defined in the schema. Depending on the input, however, the structure can deviate from your schema, and the extracted values are not guaranteed to be correct.
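Because the response can deviate from the schema, it may be worth checking its shape before using it. The sketch below is a client-side helper, not part of the API: the name `missing_paths` is hypothetical, and it only reports schema keys that are absent from the response without validating the values themselves.

```python
def missing_paths(schema, response, path=""):
    """List schema paths that are absent from the response or have the wrong container type."""
    if isinstance(schema, dict):
        if not isinstance(response, dict):
            return [path or "<root>"]
        missing = []
        for key, sub_schema in schema.items():
            child = f"{path}.{key}" if path else key
            if key not in response:
                missing.append(child)
            else:
                missing.extend(missing_paths(sub_schema, response[key], child))
        return missing
    if isinstance(schema, list):
        if not isinstance(response, list):
            return [path or "<root>"]
        missing = []
        if schema:
            # A list schema holds one template item that every response item should follow.
            for index, item in enumerate(response):
                missing.extend(missing_paths(schema[0], item, f"{path}[{index}]"))
        return missing
    return []  # leaf placeholder: any value is accepted in this sketch


schema = {
    "customer": {
        "company_name": "<string: The name of the company>",
        "vat_id": "<string: The VAT identification number of the company>",
    }
}
response = {"customer": {"company_name": "Acme"}}
print(missing_paths(schema, response))  # ['customer.vat_id']
```

Since fields without a fallback value are omitted when they do not appear in the text, a missing path is not necessarily an error. It is a cue to handle the absent field explicitly.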

Querying with the Document API

Text to JSON also supports a document API. With this API you provide a document as a PDF or an image and receive the extracted data in JSON format. The document API uses a custom OCR toolchain to extract multiple text options from the provided document and then queries the language model with the extracted text.

POST /api/v1/inferDocument

This endpoint allows you to input a document to the model and receive the model's output in response.

Query Parameters

Request Body

Returns

The response will be a JSON object of the format:

Example Response:
                        
{
    "id": "b269263d-7276-46e3-bb52-85dbde6fcd35"
}
                    
The id can be used to query the status of the document extraction using the /api/v1/document endpoint.

GET /api/v1/document

This endpoint allows you to query the status of the document extraction and get the extracted data.

Query Parameters

Returns

The response will be a JSON object containing the fields:

Example Response:
                        
{
    "result": {
        "name": "John Doe",
        "date_of_birth": "1990-01-01",
        "address": "1234 Main St, Springfield, IL 62701",
        "license_number": "1234567890",
        "license_types": "A, B, C"
    },
    "id": "b269263d-7276-46e3-bb52-85dbde6fcd35",
    "user": "e2309358-bd33-44e9-a98d-b27c44b8aeaa",
    "token": "61838840-7595-40d3-bc57-d38da00a929a",
    "status": "finished",
    "retrycount": 0,
    "started": "2024-02-28T10:27:17.971Z",
    "finished": "2024-02-28T10:29:17.971Z",
    "error": null,
    "parsingsteps": [
        {
            "name": "gpt-3.5-turbo",
            "maxcount": 3,
            "type": "raw"
        },
        {
            "name": "gpt-3.5-turbo",
            "maxcount": 3,
            "type": "padded"
        }
    ],
    "schema": {
        "surname": "<string: the surname of the licensee (1)>",
        "firstname": "<string: the first name of the licensee (2)>",
        "date_of_birth": "<date: the date of birth of the licensee (3)>",
        "date_issued": "<date: the date the driver's license was issued (4a)>",
        "date_expiry": "<date: the date the driver's license expires (4b)>",
        "issuer": "<string: the issuing authority (4c)>",
        "license_number": "<string: the ID of the license (5)>",
        "license_types": "<string: the types of licenses (9)>"
    },
    "timezone": null,
    "returnprobabilities": false
}
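Since extraction runs asynchronously, a client typically polls this endpoint until the job is done. The following is a minimal polling sketch under stated assumptions: `fetch_status` is a hypothetical callable that performs the GET request and returns the parsed JSON, and the only fields relied on are `status`, `error`, and `result` from the example response above. The intermediate status value in the stub is a placeholder, not a documented value.

```python
import time


def wait_for_document(fetch_status, document_id, interval_seconds=2.0, max_attempts=30):
    """Poll until the extraction is finished, reports an error, or attempts run out."""
    for _ in range(max_attempts):
        document = fetch_status(document_id)
        if document.get("error") is not None:
            raise RuntimeError(f"Extraction failed: {document['error']}")
        if document.get("status") == "finished":
            return document["result"]
        time.sleep(interval_seconds)
    raise TimeoutError(f"Document {document_id} not finished after {max_attempts} attempts")


# Stub standing in for the real HTTP call; "pending" is a made-up status value.
canned = iter([
    {"status": "pending", "error": None},
    {"status": "finished", "error": None, "result": {"name": "John Doe"}},
])
result = wait_for_document(
    lambda _id: next(canned),
    "b269263d-7276-46e3-bb52-85dbde6fcd35",
    interval_seconds=0,
)
print(result)  # {'name': 'John Doe'}
```

Passing the fetch function in as an argument keeps the loop independent of any HTTP library and makes it straightforward to test.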
                    

GET /api/v1/documentFile

This endpoint allows you to download the originally uploaded document and, if the document was an image, a cropped version of it. The cropped image contains only the area of the image that shows the document.

Query Parameters

Returns

The response will be a file of the specified type.

Best Practices for Inferencing

The descriptions in the schema are important. Here are some best practices to keep in mind when using the Text to JSON API.

Client Libraries

We provide client libraries to make it easier to interact with the Text to JSON API.

If you have any questions or need help with the API, feel free to email us.