Token Usage Estimator

Estimate token usage per MCP server using tiktoken. Track input and output tokens per tool, export to CSV for analysis, and monitor live counters in the TUI.

Quick Reference

Property	Value
Handler	`token_usage`
Type	Auditing
Scope	Global
Default Critical	`false` (fail-open — observability should not block traffic)

How It Works

The Token Usage Estimator plugin:

Intercepts MCP requests, responses, and notifications
Estimates token counts using tiktoken's cl100k_base encoding
Tracks input tokens (server responses into LLM) and output tokens (tool call requests from LLM) per tool
Writes per-event CSV rows for historical analysis
Maintains a JSON state file for live TUI display

Accuracy Note: Token estimates use tiktoken's cl100k_base tokenizer when available (~88% accurate), or a pure-Python heuristic fallback otherwise. Either way, these numbers are for monitoring and trend analysis, not exact billing data. See Estimation Methods below.

Estimation Methods

The plugin uses two estimation methods, selected automatically:

tiktoken (default on macOS/Linux)

Uses OpenAI's cl100k_base tokenizer for ~88% accuracy across models. This is a compiled library installed automatically on macOS and Linux.

Heuristic fallback (automatic on Windows)

A pure-Python estimator that uses max(chars/4, words*1.33) as a rough approximation. Less accurate than tiktoken but requires no native dependencies. The heuristic is used automatically when tiktoken is unavailable.

Windows and WDAC

tiktoken is not installed on Windows by default because its compiled Rust extension (.pyd file) can be blocked by Windows Defender Application Control (WDAC), which prevents Gatekit from starting entirely.

If your Windows environment does not enforce WDAC (or your IT admin has whitelisted Python extensions), you can install tiktoken manually for better accuracy:

pip install tiktoken

If tiktoken is installed but blocked by WDAC at runtime, the plugin falls back to heuristic estimation automatically. To resolve the WDAC block, your IT administrator can create a supplemental WDAC policy that whitelists the Python installation directory using file path rules or publisher-based rules. See Microsoft's App Control documentation for details.

TUI Features

When enabled, the Token Usage Estimator adds live columns to the TUI server list:

Column	Label	Description
Input tokens	↑	Server responses into the LLM
Output tokens	↓	Tool call requests from the LLM

Token counts are displayed in humanized format (e.g., 1.2k, 3.4M).

Context menu actions (right-click or select a token column):

Reset counters for a specific server
Reset all server counters

Configuration Reference

output_file

Path to the CSV log file for historical token usage data.

Type: string Default: logs/token_usage.csv Required: Yes

Note: Relative paths are resolved relative to the config file's directory.

state_file

Path to the JSON state file used for live TUI counter display and multi-instance coordination.

Type: string Default: logs/token_usage_state.json Required: Yes

Note: Relative paths are resolved relative to the config file's directory.

flush_interval_seconds

How often to flush state to disk.

Type: number Default: 1.0 Range: 0.1–60.0 seconds

YAML Configuration

Minimal Configuration

plugins:
  auditing:
    _global:
      - handler: token_usage
        config:
          output_file: logs/token_usage.csv
          state_file: logs/token_usage_state.json

Full Configuration

plugins:
  auditing:
    _global:
      - handler: token_usage
        config:
          output_file: logs/token_usage.csv
          state_file: logs/token_usage_state.json
          flush_interval_seconds: 1.0
          critical: false

CSV Log Format

The CSV log appends one row per request/response pair:

Column	Description
`timestamp`	ISO 8601 timestamp (UTC)
`server`	MCP server name
`method`	MCP method (e.g., `tools/call`, `initialize`)
`tool`	Tool name for `tools/call`, otherwise `_other`
`input_tokens`	Estimated input tokens (server response into LLM)
`output_tokens`	Estimated output tokens (tool call request from LLM)

Example CSV output:

timestamp,server,method,tool,input_tokens,output_tokens
2026-01-15T10:30:45.123456Z,filesystem,tools/call,read_file,42,1250
2026-01-15T10:30:46.789012Z,filesystem,tools/call,write_file,890,15
2026-01-15T10:30:47.345678Z,github,tools/call,search,125,3400

Analysis tip: Load the CSV into a spreadsheet and create pivot tables by server or tool to identify which tools consume the most tokens.

Token usage CSV log opened in a spreadsheet

Multi-Instance Support

The plugin supports multiple gateway processes running concurrently:

Each gateway instance registers in the state file under its process ID (PID)
The TUI aggregates token counts across all running instances
Dead instances (stale PIDs) are automatically cleaned up
File locking prevents race conditions between instances
Counters persist across gateway restarts

State Persistence

Token counters are saved to the state file and restored on restart. The state file uses a JSON format with per-PID instance tracking:

On startup, the plugin restores its counters from the state file
On shutdown, it removes its own PID entry while preserving other instances
Reset operations propagate across all instances via reset IDs