
Tokens vs Characters for TypeScript Devs

9/1/2025

Tokens, not characters, are the currency of large language models. Costs, latency, truncation, and even model behavior are governed by token counts. This guide equips TypeScript devs to measure, reason about, and control tokens in day‑to‑day engineering work.

What is a token?

A token is a unit of text the model reads or writes—roughly a word piece. Tokenization depends on the model family (e.g., the cl100k_base encoding for many OpenAI models; Llama 3 ships its own BPE vocabulary). Two strings with the same character length can produce very different token counts.

Key implications:

  • Billing is per token (input + output), not characters.
  • Context limits are in tokens. Exceed them and your prompt is truncated.
  • Latency scales with tokens seen/produced and any tool‑use turnarounds.
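When an exact tokenizer isn't handy, a rough rule of thumb—about four characters per token for English prose—is enough for back-of-envelope budgeting. This is a heuristic sketch, not a substitute for a real tokenizer; code and non-English text often tokenize denser:

```typescript
// Rough token estimate for quick budgeting. English prose averages
// ~4 characters per token; treat the result as an estimate, not a
// guarantee, and use a real tokenizer before enforcing hard limits.
export function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}
```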

Count tokens in TypeScript

Use a tokenizer that matches your target model. For OpenAI‑style encodings, tiktoken works well in Node.

// token-counter.ts
import { encoding_for_model } from "tiktoken";

type Model = "gpt-4o-mini" | "gpt-4o" | "gpt-3.5-turbo"; // model names, not raw encodings

export function countTokens(text: string, model: Model = "gpt-4o-mini"): number {
  const enc = encoding_for_model(model);
  try {
    const tokens = enc.encode(text);
    return tokens.length;
  } finally {
    enc.free();
  }
}

if (require.main === module) {
  const sample = "Summarize this document into three bullets.";
  console.log(countTokens(sample));
}

For Llama 3, use a tokenizer that matches its vocabulary. If you’re calling it through an OpenAI‑compatible API, the provider often exposes a token‑counting endpoint or returns token usage with each response.

Estimate costs programmatically

Maintain a simple price sheet and a helper that multiplies tokens by price per million.

// cost.ts
type Price = { input: number; output: number }; // USD per 1M tokens
const PRICES: Record<string, Price> = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 5, output: 15 },
};

export function estimateCost({
  model,
  promptTokens,
  completionTokens,
}: {
  model: keyof typeof PRICES;
  promptTokens: number;
  completionTokens: number;
}) {
  const p = PRICES[model];
  return (
    (promptTokens / 1_000_000) * p.input + (completionTokens / 1_000_000) * p.output
  );
}

Example: if your prompt is ~2,000 tokens and you expect 500‑token completions, estimateCost({ model: "gpt-4o-mini", promptTokens: 2000, completionTokens: 500 }) yields $0.0006—well under a tenth of a cent per call.

Budget prompts and tool calls

Complex apps with tools (function calling) spend tokens across multiple turns. Track per‑turn budgets to stay within limits and keep the cost of retries in check.

// budget.ts
export type TokenBudget = {
  maxPrompt: number;       // per turn
  maxCompletion: number;   // per turn
  maxConversation: number; // running total
};

export class BudgetTracker {
  private used = 0;
  constructor(private readonly b: TokenBudget) {}
  canSpend(n: number) { return this.used + n <= this.b.maxConversation; }
  spend(n: number) { this.used += n; }
}

Combine with a counter to enforce guardrails:

import { countTokens } from "./token-counter";

export function assertWithinBudget(
  user: string,
  system: string,
  toolSchemaJson: string,
  budget: TokenBudget,
  tracker: BudgetTracker
) {
  const promptTokens = countTokens(system + "\n" + toolSchemaJson + "\n" + user);
  if (promptTokens > budget.maxPrompt) throw new Error("prompt too large");
  if (!tracker.canSpend(promptTokens)) throw new Error("conversation budget exceeded");
  tracker.spend(promptTokens);
}
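Putting the pieces together, here is a minimal sketch of a conversation loop that enforces a running budget. It uses a crude chars/4 estimate in place of a real tokenizer so the sketch stays dependency‑free; runTurns and its shape are illustrative, not part of the modules above:

```typescript
// Dependency-free stand-in for a real tokenizer (~4 chars per token).
const estimate = (s: string) => Math.ceil(s.length / 4);

// Walk a list of prompts, sending each only while the running total
// stays under maxConversation; anything that would overflow is rejected.
export function runTurns(
  turns: string[],
  maxConversation: number
): { sent: string[]; rejected: string[] } {
  let used = 0;
  const sent: string[] = [];
  const rejected: string[] = [];
  for (const t of turns) {
    const n = estimate(t);
    if (used + n > maxConversation) {
      rejected.push(t); // conversation budget exceeded; skip this turn
      continue;
    }
    used += n;
    sent.push(t);
  }
  return { sent, rejected };
}
```

In a real app, swap estimate for countTokens and surface the rejections to the caller instead of silently dropping them.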

Practical optimization tactics

  • Trim context. Summaries beat raw logs. Teach your agent to ask for missing context instead of over‑stuffing prompts.
  • Prefer structured inputs. A concise JSON schema costs fewer tokens than long prose.
  • Cache expensive system prompts. Hash the prompt template; skip recomputation.
  • Stream results. Don’t wait for the full completion to show progress.
  • Set max_tokens defensively to avoid accidental 10k‑token rambles.
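The caching tactic above can be sketched as a small memo table keyed by a hash of the prompt template. cachedTokenCount is a hypothetical helper, not a library API; pass your real tokenizer as the count callback:

```typescript
import { createHash } from "node:crypto";

// Memoize token counts (or any derived value) by a hash of the prompt
// template, so identical system prompts aren't re-tokenized every turn.
const cache = new Map<string, number>();

function hashPrompt(template: string): string {
  return createHash("sha256").update(template).digest("hex");
}

export function cachedTokenCount(
  template: string,
  count: (s: string) => number
): number {
  const key = hashPrompt(template);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: skip recomputation
  const n = count(template);
  cache.set(key, n);
  return n;
}
```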

Common token traps (and fixes)

  • Long examples in the prompt blow up token counts. Use short exemplars or tool descriptions instead.
  • Repeating instructions per turn. Factor them into system and reuse.
  • Huge tool schema dumps. Provide only the tools relevant to this step; swap tool sets as the workflow advances.
  • Tokenizer mismatch. Use the encoding that matches your model family or your estimates will be off.
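Swapping tool sets as the workflow advances can be as simple as a lookup table mapping each step to the tools it actually needs. The step names and tools below are made up for illustration:

```typescript
type Tool = { name: string; description: string };

// Full catalog of tools the agent could ever use (illustrative).
const ALL_TOOLS: Record<string, Tool> = {
  search: { name: "search", description: "Search the codebase" },
  edit: { name: "edit", description: "Apply a patch" },
  test: { name: "test", description: "Run the test suite" },
};

// Only the tools relevant to the current step go into the prompt;
// smaller schemas mean fewer tokens per turn.
const STEP_TOOLS: Record<"plan" | "implement" | "verify", string[]> = {
  plan: ["search"],
  implement: ["search", "edit"],
  verify: ["test"],
};

export function toolsForStep(step: keyof typeof STEP_TOOLS): Tool[] {
  return STEP_TOOLS[step].map((k) => ALL_TOOLS[k]);
}
```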

Unit tests for token budgets

Treat token budgets like API contracts. A simple Jest test catches regressions:

import { countTokens } from "../token-counter";

test("system prompt stays under 800 tokens", () => {
  const SYSTEM = `You are a code assistant. Use tools; be concise; ask clarifying questions.`;
  expect(countTokens(SYSTEM, "gpt-4o-mini")).toBeLessThan(800);
});

TL;DR

  • Tokens control cost, latency, and feasibility; measure them explicitly.
  • Use a tokenizer in tests and at runtime to enforce budgets.
  • Estimate cost with a tiny helper and keep a per‑turn budget.
  • Optimize by trimming context, using structure, and versioning prompts.

Teams that ship hate surprises. A little token discipline turns AI cost from a guess into a line item you can manage.