Advanced Type Annotations

When writing JavaScript, it can be hard to keep track of the signatures of functions. TypeScript is supposed to help with this problem, but sometimes TypeScript can get in the way of what you’re trying to do.

It is often convenient not to have to go through a compile step to run your code. Especially in Node.js, avoiding code compilation makes troubleshooting easier because your stack traces match your source code without relying on source maps.

When we wrote the ttj-client library, we wanted the best type annotations possible, but we did not want to depend on a TypeScript compile step.

Enter JSDoc Type Annotations

Luckily, TypeScript supports JSDoc type annotations. We can write regular JavaScript and just add the type definitions in JSDoc comments. Here’s a simple example:

/**
 * @param {number} a
 * @param {number} b
 * @returns {number}
 */
function add(a, b) {
  return a + b;
}

That’s simple enough. However, things get trickier when you need more advanced types such as generics, or when you want to export a type from one file and import it in another.

Importing Types from Other Files

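To import types from other files, you can use an import() type inside a JSDoc comment: import('../db/models/ocrjobs') refers to the module at that path (resolved relative to the current file), and .OcrJob selects the type exported from it. Here’s an example: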

/**
 * @param {import('../db/models/ocrjobs').OcrJob} job 
 */
function processJob(job) {
  // do something with the job
}
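If the same imported type appears in several annotations, repeating the full import() expression gets noisy. A common pattern is to alias it once with a @typedef at the top of the file. The sketch below assumes the same ../db/models/ocrjobs module as above; failedJobIds is a hypothetical function used only for illustration.

```javascript
// Alias the imported type once; the rest of the file can then just write OcrJob.
/** @typedef {import('../db/models/ocrjobs').OcrJob} OcrJob */

/**
 * Hypothetical helper: returns the ids of all jobs that failed.
 * @param {OcrJob[]} jobs
 * @returns {string[]}
 */
function failedJobIds(jobs) {
  return jobs.filter((job) => job.status === 'failed').map((job) => job.id);
}
```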

Exporting Types

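You don’t need an export statement to make your types available to other files. A @typedef at the top level of a file is exported automatically, as long as TypeScript treats that file as a module (for example, because it contains an import, an export, or a module.exports assignment). Here’s an example: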

/**
 * @typedef {{
 *   id: string,
 *   status: 'pending' | 'processing' | 'finished' | 'failed',
 *   retrycount: number,
 *   started?: Date,
 *   finished?: Date,
 *   error?: string,
 *   parsingsteps?: { name: string, maxcount?: number, type: string, skip?: number }[],
 *   schema: any,
 *   result?: any,
 *   timezone?: string,
 *   returnprobabilities?: boolean,
 *   codeData?: { type: string, decoded: string, raw: string, position: number[][] }[]
 * }} OcrJob
 */
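Because OcrJob is an ordinary object type, the type checker verifies every value annotated with it. The sketch below uses a hypothetical createOcrJob factory (not part of ttj-client) and inlines a shortened copy of the typedef so that it stands alone; in a real project you would import the type as shown earlier.

```javascript
/**
 * Shortened copy of the OcrJob typedef above, so this snippet is self-contained.
 * @typedef {{
 *   id: string,
 *   status: 'pending' | 'processing' | 'finished' | 'failed',
 *   retrycount: number,
 *   started?: Date,
 *   schema: any
 * }} OcrJob
 */

/**
 * Hypothetical factory: creates a new job in the 'pending' state.
 * Leaving out a required property such as retrycount, or using a status
 * outside the union, would be reported by the type checker.
 * @param {string} id
 * @param {any} schema
 * @returns {OcrJob}
 */
function createOcrJob(id, schema) {
  return { id, status: 'pending', retrycount: 0, schema };
}
```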

Advanced Generics

Sometimes, the type of one value depends on the value of another. For example, the return value of the TTJClient#inferDocumentBySchema method in the ttj-client library depends on the returnprobabilities parameter. If returnprobabilities is true, we return an array of the form { value: any, probability: number }[] for every property of the schema. If it is false, we return the value directly (which has type any).

Because we support nested objects and arrays, we first need to define a recursive @callback function type for both cases.

First, we deal with the case where returnprobabilities is set to false:

/**
 * @template T
 * @callback TTJParsingFunction
 * @param {T} schema
 * @returns {{ [K in keyof T]?: T[K] extends {} ? ReturnType<TTJParsingFunction<T[K]>> : T[K] }}
 * */

This is essentially a recursive Partial<T> type. We first declare a type parameter T using the @template tag. Then we define a @callback function type TTJParsingFunction that takes a parameter of type T. Its return type is an object with the same keys as T, but every key is optional. If the value of a key matches {} (which is true for objects and arrays, but also for strings and any other non-nullish value), we apply the TTJParsingFunction type to it recursively. For a string, the mapped type simply resolves back to string, which ends the recursion.
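To see what the type produces, you can annotate a value with ReturnType<TTJParsingFunction<typeof schema>>. In this sketch, the schema and the parsed values are made up for illustration, and the callback type is repeated so the snippet stands alone.

```javascript
/**
 * @template T
 * @callback TTJParsingFunction
 * @param {T} schema
 * @returns {{ [K in keyof T]?: T[K] extends {} ? ReturnType<TTJParsingFunction<T[K]>> : T[K] }}
 */

// Hypothetical schema: every leaf is a string describing the field to extract.
const schema = {
  company: 'The name of the company',
  address: { city: 'The city', zip: 'The postal code' },
};

/** @type {ReturnType<TTJParsingFunction<typeof schema>>} */
const parsed = {
  company: 'ACME Inc.',
  // address.zip may be omitted because every key is optional at every level.
  address: { city: 'Vienna' },
};
```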

Next up, we deal with the case where returnprobabilities is set to true:

/**
 * @template T
 * @callback ReturnProbabilitiesFunction
 * @param {T} schema
 * @returns {T extends string ? { value: any, probability: number }[] : T extends (infer U)[] ? ReturnType<ReturnProbabilitiesFunction<U>>[] : T extends {} ? { [K in keyof T]?: ReturnType<ReturnProbabilitiesFunction<T[K]>> } : { value: any, probability: number }[]}
 * */

Here, we again declare a type parameter T with the @template tag and define a @callback function type ReturnProbabilitiesFunction that takes a parameter of type T. Because every leaf property of our schema is defined as a string, we first use a conditional type (T extends string ? … : …) to check whether T is a string and, if so, return the { value: any, probability: number }[] type. Otherwise, we check whether T is an array: the infer keyword declares a new type variable U for the element type, which we use to apply ReturnProbabilitiesFunction recursively. If T is an object, we return, as with the TTJParsingFunction type, an object with the same keys as T in which every key is optional.
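Code that consumes these results at runtime has to follow the same three cases. The hypothetical helper below (not part of ttj-client) turns a probability result back into plain values by keeping the most probable candidate for each leaf. It assumes, as an illustration, that an array in the schema contains a single element describing every item.

```javascript
/** @typedef {{ value: any, probability: number }[]} Candidates */

/**
 * Hypothetical helper (not part of ttj-client): collapses a result produced with
 * returnprobabilities = true into plain values. It follows the same cases as the
 * ReturnProbabilitiesFunction type: string leaf, array, object.
 * Assumption: an array in the schema has one element that describes every item.
 * @param {any} schema
 * @param {any} result
 * @returns {any}
 */
function pickMostProbable(schema, result) {
  // Every key is optional, so a missing value simply stays missing.
  if (result === undefined || result === null) return undefined;

  if (typeof schema === 'string') {
    // Leaf: the result is a list of candidates; keep the most probable one.
    /** @type {Candidates} */
    const candidates = result;
    if (candidates.length === 0) return undefined;
    return candidates.reduce((best, c) => (c.probability > best.probability ? c : best)).value;
  }

  if (Array.isArray(schema)) {
    // Array: apply the element schema to every item of the result.
    return result.map((item) => pickMostProbable(schema[0], item));
  }

  // Object: recurse into every schema key that is present in the result.
  /** @type {Record<string, any>} */
  const out = {};
  for (const key of Object.keys(schema)) {
    if (key in result) out[key] = pickMostProbable(schema[key], result[key]);
  }
  return out;
}
```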

Finally, we define the inferDocumentBySchema function:

class TTJClient {
  /**
   * @template S
   * @template {boolean} R
   * @param {Buffer|Uint8Array|string} data A PDF, PNG, or JPEG file as a buffer, Uint8Array, or data URL
   * @param {'application/pdf'|'image/png'|'image/jpeg'} mimetype The mimetype of the data
   * @param {S} schema
   * @param {(TextParsingStep|ImageParsingStep)[]=} parsingsteps
   * @param {R=} returnprobabilities
   * @returns {Promise<{
   *      results: R extends true ? ReturnType<ReturnProbabilitiesFunction<S>> : ReturnType<TTJParsingFunction<S>>,
   *      ...
   * }>}
   */
  async inferDocumentBySchema(data, mimetype, schema, parsingsteps, returnprobabilities) {
    //...
  }
}

Here, we declare two type parameters, S and R. S is the schema type, and R is a boolean that determines whether we return probabilities. Because R is constrained to boolean, TypeScript infers the literal type true or false from the argument passed for returnprobabilities. The conditional type R extends true then selects ReturnType<ReturnProbabilitiesFunction<S>> when R is true, and ReturnType<TTJParsingFunction<S>> otherwise.
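The same pattern works for any function whose return type depends on a boolean flag. The stripped-down sketch below uses a hypothetical readValue function, not the ttj-client API. It also shows a detail the snippet above leaves out: TypeScript does not narrow a conditional return type inside the function body, so the implementation needs a JSDoc type cast to any.

```javascript
/**
 * Hypothetical, simplified example (not the ttj-client API).
 * Because R is constrained to boolean, the literal true or false passed by the
 * caller selects a branch of the conditional return type.
 * @template {boolean} R
 * @param {string} value
 * @param {R} returnprobabilities
 * @returns {R extends true ? { value: string, probability: number }[] : string}
 */
function readValue(value, returnprobabilities) {
  // The `?:` on returnprobabilities does not narrow R, so the result is cast
  // through any (JSDoc casts require the parentheses around the expression).
  return /** @type {any} */ (returnprobabilities ? [{ value, probability: 1 }] : value);
}

const withProbabilities = readValue('ACME', true); // typed as { value: string, probability: number }[]
const plain = readValue('ACME', false); // typed as string
```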

Complex Object Types

In cases like the parsingsteps parameter, we need to distinguish between two lists of large language models that can handle different types of data. For example, gemini-pro-vision can handle image data, while gpt-3.5-turbo cannot. We only want to allow the vision model in steps that involve image data, so a ParsingStep with type: "raw" or type: "padded" can’t have name: "vertex/gemini-1.0-pro-vision-001". To enforce this, we can introduce a union type:

/**
 * @typedef {'openai/gpt-3.5-turbo'|'openai/gpt-4'|'azure/gpt-35-turbo'|'vertex/text-bison@001'|'ollama/mixtral'|'ollama/llama2'|'ollama/llama2:13b'|'ollama/gemma'} SupportedLanguageModel
 * @typedef {SupportedLanguageModel | 'vertex/gemini-1.0-pro-vision-001'} SupportedVisionModel
 */

/**
 * @typedef {{
 *   type: 'raw' | 'padded',
 *   name: SupportedLanguageModel,
 *   maxcount?: number
 * }} TextParsingStep
 */

/**
 * @typedef {{
 *   type: 'image',
 *   name: SupportedVisionModel,
 *   maxcount?: number
 * }} ImageParsingStep
 */

/**
 * ...
 * @param {(TextParsingStep|ImageParsingStep)[]=} parsingsteps
 * ...
 */

This is called a discriminated union type. If the type property is set to raw or padded, the name property can only be of type SupportedLanguageModel. If the type property is set to image, the name property can only be of type SupportedVisionModel. In this case, SupportedVisionModel is a superset of SupportedLanguageModel, and therefore, we effectively just allow more options for the name property if the type is set to image.
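Checking the type property also narrows the union in your own code. In this sketch, describeStep and the steps array are hypothetical, and the model lists are shortened copies of the typedefs above so that the snippet stands alone.

```javascript
/**
 * Shortened copies of the typedefs above so this snippet stands alone.
 * @typedef {'openai/gpt-3.5-turbo'|'openai/gpt-4'} SupportedLanguageModel
 * @typedef {SupportedLanguageModel | 'vertex/gemini-1.0-pro-vision-001'} SupportedVisionModel
 * @typedef {{ type: 'raw' | 'padded', name: SupportedLanguageModel, maxcount?: number }} TextParsingStep
 * @typedef {{ type: 'image', name: SupportedVisionModel, maxcount?: number }} ImageParsingStep
 */

/**
 * Hypothetical helper: comparing step.type narrows the union, so inside each
 * branch the type checker knows which set of model names is allowed.
 * @param {TextParsingStep | ImageParsingStep} step
 * @returns {string}
 */
function describeStep(step) {
  if (step.type === 'image') {
    // step is an ImageParsingStep here, so step.name may be the Gemini vision model.
    return `image step using ${step.name}`;
  }
  // step is a TextParsingStep here, so step.name is a SupportedLanguageModel.
  return `${step.type} text step using ${step.name}`;
}

/** @type {(TextParsingStep | ImageParsingStep)[]} */
const steps = [
  { type: 'raw', name: 'openai/gpt-4' },
  { type: 'image', name: 'vertex/gemini-1.0-pro-vision-001', maxcount: 2 },
  // { type: 'padded', name: 'vertex/gemini-1.0-pro-vision-001' }, // would be a type error
];
```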

Enabling JSDoc Type Checking in VSCode

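VSCode has built-in support for JSDoc type annotations. Autocompletion works out of the box, but type checking of JavaScript files is turned off by default.

To enable it for your whole project, add a jsconfig.json file to your project root with the following content: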

{
    "compilerOptions": {
        "allowJs": true,
        "checkJs": true
    },
    "exclude": [
        "node_modules"
    ]
}

You may want to add more entries to exclude for folders you don’t need checked (for example, build output), since checking them can slow down your editor.
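If you only want to check some files, TypeScript also supports per-file comments: // @ts-check at the top of a JavaScript file enables type checking for that file even without "checkJs": true, and // @ts-nocheck turns checking off for a single file. A minimal sketch, reusing the add function from the beginning of this post:

```javascript
// @ts-check
// With the comment above, VSCode type-checks this file even if checkJs is not enabled.

/**
 * @param {number} a
 * @param {number} b
 * @returns {number}
 */
function add(a, b) {
  return a + b;
}

// Uncommenting the next line would produce:
// Argument of type 'string' is not assignable to parameter of type 'number'.
// add(1, '2');
```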

Published on 4/10/2024 by Stefan Gussner
