ADR-004: Error Handling Strategy

Context

Gatekit operates as a security proxy in the MCP ecosystem, requiring robust error handling for:

  1. Protocol Compliance: Must return proper JSON-RPC 2.0 error responses
  2. Security Isolation: Errors from upstream servers must be sanitized
  3. Debugging Support: Developers need actionable error information
  4. Reliability: System should gracefully handle various failure modes
  5. Monitoring: Operations teams need visibility into error patterns

The error handling strategy will impact security, usability, and maintainability throughout the system.

Decision

We will implement a structured error handling strategy using JSON-RPC 2.0 error codes with Gatekit-specific extensions:

# Gatekit-specific error codes (see gatekit/protocol/errors.py)
# Uses IntEnum for type safety
from enum import IntEnum


class MCPErrorCodes(IntEnum):
    # Standard JSON-RPC error codes
    PARSE_ERROR = -32700
    INVALID_REQUEST = -32600
    METHOD_NOT_FOUND = -32601
    INVALID_PARAMS = -32602
    INTERNAL_ERROR = -32603

    # Gatekit-specific error codes (JSON-RPC 2.0 reserves -32000 to -32099
    # for implementation-defined server errors)
    SECURITY_VIOLATION = -32000
    CONFIGURATION_ERROR = -32001
    PLUGIN_LOADING_ERROR = -32002
    PERMISSION_ERROR = -32003
    UPSTREAM_UNAVAILABLE = -32004
    AUDITING_FAILURE = -32005
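
The sketch below illustrates why IntEnum suits this purpose: members behave as plain integers when serialized, and a raw integer can be mapped back to a named member. It repeats a subset of the codes so that it runs on its own. The helper is_server_defined() is a hypothetical name used only for illustration and is not part of Gatekit.

```python
import json
from enum import IntEnum


class MCPErrorCodes(IntEnum):
    # Subset of the codes above, repeated so this sketch runs on its own
    INVALID_PARAMS = -32602
    SECURITY_VIOLATION = -32000
    AUDITING_FAILURE = -32005


def is_server_defined(code: int) -> bool:
    """Hypothetical helper: True for the implementation-defined range."""
    return -32099 <= code <= -32000


# Members serialize as plain integers, so they can go straight into a payload
payload = json.dumps({"code": MCPErrorCodes.SECURITY_VIOLATION})

# A raw integer read from the wire can be mapped back to a member by value
code = MCPErrorCodes(-32005)
```

A range check of this kind could help guard against a Gatekit-specific code being assigned a value that collides with the standard JSON-RPC codes.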

Key Principles

  1. Protocol Compliance: All errors follow JSON-RPC 2.0 specification
  2. Security-First: Never leak sensitive information in error messages
  3. Structured Data: Consistent error format with codes and details
  4. Contextual Information: Include relevant context for debugging
  5. Graceful Degradation: System continues operating despite errors
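
One possible way to apply the Security-First principle to upstream failures is sketched below: the full exception is written to a server-side log, and the client receives only a fixed message. The function name sanitize_upstream_error() and the logger name are assumptions made for this sketch, not Gatekit's actual implementation.

```python
import logging
from typing import Any, Dict, Optional, Union

logger = logging.getLogger("gatekit.errors")  # logger name is an assumption

UPSTREAM_UNAVAILABLE = -32004  # value taken from MCPErrorCodes


def sanitize_upstream_error(
    exc: Exception, request_id: Optional[Union[str, int]]
) -> Dict[str, Any]:
    """Hypothetical helper: log full detail, return only safe fields."""
    # The raw exception text stays in the server-side log only
    logger.error("Upstream failure for request %r: %r", request_id, exc)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": UPSTREAM_UNAVAILABLE,
            # Fixed wording: nothing from str(exc) reaches the client
            "message": "Upstream server unavailable",
        },
    }


leaky = ConnectionError("connect to 10.0.0.5:8443 failed, token=abc123")
response = sanitize_upstream_error(leaky, request_id=7)
```

The trade-off is that the client sees less detail, so the request id has to be enough to correlate the response with the log entry.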

Alternatives Considered

Alternative 1: Simple Exception Propagation

# Just let Python exceptions bubble up
try:
    result = await server.request(message)
except Exception:
    raise  # Raw exception propagation

Alternative 2: Generic Error Responses

# Always return same generic error
def handle_error(e):
    return {"error": {"code": -1, "message": "An error occurred"}}

Alternative 3: HTTP-Style Status Codes

# Use HTTP status codes instead of JSON-RPC
class GatekitError:
    BAD_REQUEST = 400
    UNAUTHORIZED = 401
    FORBIDDEN = 403
    INTERNAL_ERROR = 500

Consequences

Positive

Negative

Implementation Notes

Error Response Format

{
  "jsonrpc": "2.0",
  "error": {
    "code": -32600,
    "message": "Validation failed",
    "data": {
      "details": "Request missing required 'method' field",
      "request_id": "req_123",
      "timestamp": "YYYY-MM-DDTHH:MM:SSZ"
    }
  },
  "id": null
}
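
A payload of this shape could be assembled as in the following sketch. The function build_error_payload() is a hypothetical name, and the timestamp layout is assumed to be ISO 8601 in UTC, matching the placeholder above. The top-level "id" defaults to None because JSON-RPC 2.0 requires a null id when the request id could not be determined.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Union


def build_error_payload(
    code: int,
    message: str,
    details: str,
    request_id: str,
    rpc_id: Optional[Union[str, int]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that fills in the format shown above."""
    # UTC timestamp rendered in the YYYY-MM-DDTHH:MM:SSZ layout
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "jsonrpc": "2.0",
        "error": {
            "code": code,
            "message": message,
            "data": {
                "details": details,
                "request_id": request_id,
                "timestamp": timestamp,
            },
        },
        "id": rpc_id,  # None serializes to JSON null
    }


payload = build_error_payload(
    code=-32600,
    message="Validation failed",
    details="Request missing required 'method' field",
    request_id="req_123",
)
wire_text = json.dumps(payload)
```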

Error Response Creation

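# Error responses are created via create_error_response() helper
# Error handling is distributed throughout the codebase rather than centralized
# MCPResponse is Gatekit's response model (its import is not shown here)
from typing import Any, Optional, Union


def create_error_response(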
    request_id: Optional[Union[str, int]],
    code: int,
    message: str,
    data: Optional[Any] = None,
) -> MCPResponse:
    """Create a JSON-RPC error response."""
    error_dict = {"code": code, "message": message}
    if data is not None:
        error_dict["data"] = data
    return MCPResponse(jsonrpc="2.0", id=request_id, error=error_dict)

# Example usage (always use keyword arguments for clarity):
response = create_error_response(
    request_id=request.id,
    code=MCPErrorCodes.INVALID_PARAMS,
    message="Tool call missing required 'name' parameter",
)
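
To try the helper in isolation, the sketch below pairs it with a stand-in MCPResponse dataclass. The stand-in's fields are an assumption chosen to match the call above, and the real Gatekit class may differ. The "plugin" value is a made-up example. The sketch shows that "data" appears in the error object only when it is supplied.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class MCPResponse:
    """Stand-in for Gatekit's response model (fields assumed)."""
    jsonrpc: str
    id: Optional[Union[str, int]]
    result: Optional[Any] = None
    error: Optional[Dict[str, Any]] = None


def create_error_response(
    request_id: Optional[Union[str, int]],
    code: int,
    message: str,
    data: Optional[Any] = None,
) -> MCPResponse:
    """Create a JSON-RPC error response."""
    error_dict: Dict[str, Any] = {"code": code, "message": message}
    if data is not None:
        error_dict["data"] = data
    return MCPResponse(jsonrpc="2.0", id=request_id, error=error_dict)


# With data: the extra context is attached under "data"
blocked = create_error_response(
    request_id=42,
    code=-32000,  # SECURITY_VIOLATION
    message="Request blocked by security policy",
    data={"plugin": "example_filter"},  # hypothetical plugin name
)

# Without data: the key is omitted entirely rather than set to null
unparsed = create_error_response(
    request_id=None,
    code=-32700,  # PARSE_ERROR
    message="Parse error",
)
```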

Security Considerations

Error Categories

Transport Errors

Validation Errors

Security Errors

Internal Errors

Monitoring and Observability

Error tracking is handled through:

Note: Prometheus-style metrics are not currently implemented but could be added as a future enhancement.

Review

This decision will be reviewed when: