ADR-003: Test-Driven Development Approach
Context
Gatekit is a security-critical component that sits between MCP clients and servers. It must:
- Reliably validate and filter potentially malicious requests
- Maintain protocol compliance with MCP specifications
- Handle edge cases and error conditions gracefully
- Support future protocol evolution without regression
- Provide confidence in security guarantees
Given these security and protocol-handling requirements, we need a development approach that ensures comprehensive testing and high code quality.
Decision
We will follow a Test-Driven Development (TDD) approach throughout the project:
# Example TDD cycle for message validation
class TestMessageValidation:
    def test_valid_request_passes_validation(self):
        # Red: write the failing test first
        request = {"jsonrpc": "2.0", "method": "ping", "id": 1}
        validator = MessageValidator()
        result = validator.validate(request)  # Validation is synchronous
        assert result.is_valid
        assert result.errors == []

    def test_missing_jsonrpc_fails_validation(self):
        # Red: write the failing test
        request = {"method": "ping", "id": 1}  # Missing jsonrpc
        validator = MessageValidator()
        result = validator.validate(request)  # Validation is synchronous
        assert not result.is_valid
        assert "jsonrpc field required" in result.errors
Key Principles
- Red-Green-Refactor Cycle: Write test → Make it pass → Improve code
- Test First: No production code without a failing test
- Comprehensive Coverage: Test happy path, edge cases, and error conditions
- Fast Feedback: Tests run quickly and provide immediate feedback
- Living Documentation: Tests serve as executable specifications
Alternatives Considered
Alternative 1: Traditional Test-After Development
# Write implementation first, then add tests
def implement_feature():
    # Build the feature
    pass

def test_feature():
    # Test the built feature
    pass
- Pros: Faster initial development, familiar approach
- Cons: Often leads to untestable code, missing edge cases, lower coverage
Alternative 2: Behavior-Driven Development (BDD)
# Feature: Message Validation
#   Scenario: Valid message passes validation
#     Given a valid JSON-RPC message
#     When validation is performed
#     Then the message should be accepted
- Pros: Business-readable specifications, stakeholder involvement
- Cons: Additional tooling complexity, overkill for technical components
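For comparison, the same scenario under pytest-bdd might look like the sketch below. This is purely illustrative tooling Gatekit does not use, and it assumes the Gherkin above lives in a message_validation.feature file:

from pytest_bdd import scenario, given, when, then

@scenario("message_validation.feature", "Valid message passes validation")
def test_valid_message_passes_validation():
    pass  # Behavior is supplied by the step functions below

@given("a valid JSON-RPC message", target_fixture="message")
def valid_message():
    return {"jsonrpc": "2.0", "method": "ping", "id": 1}

@when("validation is performed", target_fixture="result")
def perform_validation(message):
    return MessageValidator().validate(message)

@then("the message should be accepted")
def message_is_accepted(result):
    assert result.is_valid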
Alternative 3: Property-Based Testing Only
from hypothesis import given, strategies as st

@given(st.dictionaries(st.text(), st.text()))
def test_message_validation(message):
    # Generate random inputs and check invariants that must always hold
    result = MessageValidator().validate(message)
    assert isinstance(result, ValidationResult)
- Pros: Discovers edge cases automatically
- Cons: Harder to understand failures, doesn't replace example-based tests
Consequences
Positive
- High Confidence: Comprehensive test coverage provides confidence in changes
- Better Design: TDD drives towards more testable, modular code
- Regression Prevention: Tests catch breaking changes immediately
- Documentation: Tests serve as living examples of how code should work
- Faster Debugging: Failing tests pinpoint exact issues
- Refactoring Safety: Can improve code structure without fear
Negative
- Initial Overhead: Writing tests first slows initial development
- Learning Curve: Team must be disciplined about TDD practices
- Test Maintenance: Tests require ongoing maintenance as code evolves
- Over-Testing Risk: May write tests for trivial functionality
Implementation Notes
Current Test Structure
tests/
├── unit/          # Fast, isolated unit tests
│   ├── test_plugin_manager.py
│   ├── test_config_loader.py
│   └── ...
├── integration/   # Component integration tests
│   ├── test_pii_integration.py
│   └── ...
├── validation/    # Manual validation scripts with third-party tools
│   └── test-files/
├── mocks/         # Mock utilities
├── utils/         # Test utilities
└── fixtures/      # Fixture definitions
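As an illustration of the fixtures/ directory, a shared fixture might look like the following sketch (hypothetical file path and fixture name):

# tests/fixtures/messages.py (hypothetical path and name)
import pytest

@pytest.fixture
def valid_request():
    # A minimal well-formed JSON-RPC request reused across unit tests
    return {"jsonrpc": "2.0", "method": "ping", "id": 1}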
Testing Tools and Patterns
- pytest: Test framework with excellent async support
- pytest-asyncio: Async test execution (async tests require the @pytest.mark.asyncio decorator)
- pytest-xdist: Parallel test execution (pytest tests/ -n auto)
- unittest.mock: Mocking for isolation (standard library)
- pytest-cov: Test coverage reporting
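A sketch showing these tools together; the Proxy class and its forward method are assumptions for illustration, not Gatekit's actual API:

import pytest
from unittest.mock import AsyncMock

@pytest.mark.asyncio  # pytest-asyncio runs this coroutine as a test
async def test_proxy_forwards_request():
    upstream = AsyncMock()  # stands in for a real transport
    upstream.send.return_value = {"jsonrpc": "2.0", "result": {}, "id": 1}

    proxy = Proxy(upstream)  # hypothetical component under test
    response = await proxy.forward({"jsonrpc": "2.0", "method": "ping", "id": 1})

    upstream.send.assert_awaited_once()  # AsyncMock records awaited calls
    assert response["id"] == 1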
TDD Workflow
1. Write a failing test describing the desired behavior
2. Run the test to confirm it fails for the right reason
3. Write minimal code to make the test pass
4. Run all tests to ensure no regression
5. Refactor code and tests for clarity
6. Repeat for the next requirement
Example TDD Implementation
import asyncio

import pytest

# Red: Test for transport connection
@pytest.mark.asyncio
async def test_stdio_transport_connects_successfully():
    transport = StdioTransport(command=["echo", "test"])
    await transport.connect()
    assert transport.is_connected

# Green: Minimal implementation
class StdioTransport:
    def __init__(self, command):
        self.command = command
        self.is_connected = False

    async def connect(self):
        self.process = await asyncio.create_subprocess_exec(*self.command)
        self.is_connected = True

# Refactor: Improve error handling and resource management
class StdioTransport:
    def __init__(self, command):
        self.command = command
        self.is_connected = False

    async def connect(self):
        try:
            self.process = await asyncio.create_subprocess_exec(
                *self.command,
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
            self.is_connected = True
        except Exception as e:
            raise TransportConnectionError(f"Failed to connect: {e}") from e
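Repeating the cycle, the refactored error path gets its own Red test. A sketch, assuming TransportConnectionError is the project's transport exception:

@pytest.mark.asyncio
async def test_stdio_transport_raises_on_bad_command():
    transport = StdioTransport(command=["/nonexistent/binary"])
    with pytest.raises(TransportConnectionError):
        await transport.connect()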
Review
This decision will be reviewed when:
- Development velocity significantly decreases due to test overhead
- Test maintenance burden becomes excessive
- Team composition changes significantly
- Project requirements shift away from security-critical operations