ADR-004: Error Handling Strategy

Context

Gatekit operates as a security proxy in the MCP ecosystem, requiring robust error handling for:

Protocol Compliance: Must return proper JSON-RPC 2.0 error responses
Security Isolation: Errors from upstream servers must be sanitized
Debugging Support: Developers need actionable error information
Reliability: System should gracefully handle various failure modes
Monitoring: Operations teams need visibility into error patterns

The error handling strategy will impact security, usability, and maintainability throughout the system.

Decision

We will implement a structured error handling strategy using JSON-RPC 2.0 error codes with Gatekit-specific extensions:

# Gatekit-specific error codes (see gatekit/protocol/errors.py)
# Uses IntEnum for type safety
class MCPErrorCodes(IntEnum):
    # Standard JSON-RPC error codes
    PARSE_ERROR = -32700
    INVALID_REQUEST = -32600
    METHOD_NOT_FOUND = -32601
    INVALID_PARAMS = -32602
    INTERNAL_ERROR = -32603

    # Gatekit-specific error codes (-32099 to -32000 per JSON-RPC spec)
    SECURITY_VIOLATION = -32000
    CONFIGURATION_ERROR = -32001
    PLUGIN_LOADING_ERROR = -32002
    PERMISSION_ERROR = -32003
    UPSTREAM_UNAVAILABLE = -32004
    AUDITING_FAILURE = -32005

Key Principles

Protocol Compliance: All errors follow JSON-RPC 2.0 specification
Security-First: Never leak sensitive information in error messages
Structured Data: Consistent error format with codes and details
Contextual Information: Include relevant context for debugging
Graceful Degradation: System continues operating despite errors

Alternatives Considered

Alternative 1: Simple Exception Propagation

# Just let Python exceptions bubble up
try:
    result = await server.request(message)
except Exception as e:
    raise e  # Raw exception propagation

Pros: Simple, preserves full error details
Cons: Breaks JSON-RPC compliance, potential security leaks

Alternative 2: Generic Error Responses

# Always return same generic error
def handle_error(e):
    return {"error": {"code": -1, "message": "An error occurred"}}

Pros: Maximum security, simple implementation
Cons: Poor debugging experience, no actionable information

Alternative 3: HTTP-Style Status Codes

# Use HTTP status codes instead of JSON-RPC
class GatekitError:
    BAD_REQUEST = 400
    UNAUTHORIZED = 401
    FORBIDDEN = 403
    INTERNAL_ERROR = 500

Pros: Familiar to web developers
Cons: Doesn't follow JSON-RPC 2.0 specification

Consequences

Positive

Protocol Compliance: Follows JSON-RPC 2.0 error specification exactly
Security: Controlled error information prevents information leakage
Debugging: Structured errors with codes enable targeted debugging
Monitoring: Error codes allow for meaningful metrics and alerting
Client Support: Clients can handle errors programmatically

Negative

Complexity: More code to handle error categorization and formatting
Maintenance: Error codes must be documented and maintained
Potential Over-Engineering: May be more structure than needed initially

Implementation Notes

Error Response Format

{
  "jsonrpc": "2.0",
  "error": {
    "code": -32003,
    "message": "Validation failed",
    "data": {
      "details": "Request missing required 'method' field",
      "request_id": "req_123",
      "timestamp": "YYYY-MM-DDTHH:MM:SSZ"
    }
  },
  "id": null
}

Error Response Creation

# Error responses are created via create_error_response() helper
# Error handling is distributed throughout the codebase rather than centralized
def create_error_response(
    request_id: Optional[Union[str, int]],
    code: int,
    message: str,
    data: Optional[Any] = None,
) -> MCPResponse:
    """Create a JSON-RPC error response."""
    error_dict = {"code": code, "message": message}
    if data is not None:
        error_dict["data"] = data
    return MCPResponse(jsonrpc="2.0", id=request_id, error=error_dict)

# Example usage (always use named parameters for clarity):
response = create_error_response(
    request_id=request.id,
    code=MCPErrorCodes.INVALID_PARAMS,
    message="Tool call missing required 'name' parameter",
)

Security Considerations

Information Filtering: Remove stack traces and internal paths from client responses
Error Logging: Full error details logged internally for debugging
Rate Limiting: Prevent error-based enumeration attacks
Context Sanitization: Remove sensitive data from error context

Error Categories

Transport Errors

Connection failures to upstream servers
Timeout errors
Protocol-level communication issues

Validation Errors

Malformed JSON-RPC requests
Missing required fields
Invalid parameter types

Security Errors

Blocked requests due to security policies
Authentication failures
Authorization violations

Internal Errors

Unexpected system failures
Configuration errors
Resource exhaustion

Monitoring and Observability

Error tracking is handled through:

Auditing plugins: Log all requests/responses including errors (see gatekit/plugins/auditing/)
Processing pipeline: Full visibility into plugin decisions and transformations
Structured logging: Errors logged with context for debugging

Note: Prometheus-style metrics are not currently implemented but could be added as a future enhancement.

Review

This decision will be reviewed when:

JSON-RPC specification changes significantly
Security requirements become more stringent
Debugging needs change substantially
Monitoring requirements evolve