The Problem: Schema Drift
A real incident: the user service team renames user_id to userId in their API response. No tests break because no test covers the exact field name. The analytics pipeline silently produces null values for user attribution. The billing team discovers the issue 3 weeks later when revenue reports are off by 15%.
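The failure mode in this incident can be reproduced in a few lines. The sketch below is illustrative only: `attributeEvent`, the event shape, and the sample values are hypothetical and are not taken from the incident itself.

```typescript
// Hypothetical consumer code in an analytics pipeline (all names are illustrative).
interface AttributedEvent {
  userId: string | null;
  amount: number;
}

// The consumer reads `user_id` from a loosely typed payload, so a rename on the
// producer side raises no error; the lookup simply yields undefined.
function attributeEvent(payload: Record<string, unknown>, amount: number): AttributedEvent {
  const id = payload["user_id"];
  return { userId: typeof id === "string" ? id : null, amount };
}

const beforeRename = attributeEvent({ user_id: "u123" }, 40);
const afterRename = attributeEvent({ userId: "u123" }, 40);

console.log(beforeRename.userId); // "u123"
console.log(afterRename.userId); // null, and no exception is thrown
```

Nothing in this code fails loudly after the rename, which is why the problem can go unnoticed until a downstream report looks wrong.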
What Is a Data Contract?
| Component | JSON Schema Alone | Full Data Contract |
|---|---|---|
| Structure definition | Field names, types, required | Field names, types, required |
| Validation rules | Patterns, enums, min/max | Patterns, enums, min/max |
| Ownership | Not defined | Named owning team |
| Change process | Not defined | PR review, compatibility check, approval |
| Breaking change detection | Manual | Automated in CI/CD |
| Consumer awareness | None | Consumer-driven contracts, notifications |
| SLA guarantees | None | Availability, latency, data freshness |
| Semantic documentation | Minimal (description field) | Business meaning, units, examples |
```yaml
# contracts/user-service/user-profile.yaml
apiVersion: v1
kind: DataContract
metadata:
  name: user-profile
  version: 2.1.0
  owner: user-platform-team
  description: User profile data exposed via REST API and published to Kafka
  sla:
    availability: 99.9%
    freshness: 5 minutes
    response_time_p99: 200ms

channels:
  - type: rest_api
    path: /api/v2/users/{id}
    method: GET
  - type: kafka_topic
    name: user.profile.updated

schema:
  type: object
  required: [user_id, email, created_at]
  properties:
    user_id:
      type: string
      format: uuid
      description: Unique user identifier (immutable after creation)
    email:
      type: string
      format: email
      description: Primary email address (PII - handle with care)
    display_name:
      type: string
      maxLength: 100
      description: User-chosen display name
    plan:
      type: string
      enum: [free, pro, enterprise]
      description: Current subscription plan
    created_at:
      type: string
      format: date-time
      description: Account creation timestamp (ISO 8601 UTC)
```

Breaking vs Non-Breaking Changes
| Change Type | Example | Breaking? | Action Required |
|---|---|---|---|
| Add optional field | Add "avatar_url" to response | No | Minor version bump |
| Widen enum values | Add "team" to plan enum | No, provided consumers tolerate unknown values | Minor version bump, notify consumers |
| Remove a field | Remove "legacy_id" | Yes | Major version, migration period |
| Rename a field | user_id -> userId | Yes | Major version, migration period |
| Change field type | age: string -> number | Yes | Major version, new endpoint |
| Make optional required | display_name becomes required | Yes | Major version, notify consumers |
| Narrow enum | Remove "trial" from plan enum | Yes | Major version, verify no consumers use it |
| Change field semantics | status: HTTP code -> business code | Yes | Major version, document thoroughly |
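Several rows of this table can be checked mechanically. The following is a minimal sketch, assuming a heavily simplified schema shape (`ObjectSchema` and `findBreakingChanges` are hypothetical names, not part of any library). It detects removed or renamed fields, type changes, narrowed enums, and newly required fields; semantic changes cannot be detected this way and still need human review.

```typescript
// Simplified schema shape: only the parts needed for this illustration.
interface FieldSpec {
  type: string;
  enum?: string[];
}
interface ObjectSchema {
  required: string[];
  properties: Record<string, FieldSpec>;
}

// Returns human-readable descriptions of breaking changes between two versions.
function findBreakingChanges(oldSchema: ObjectSchema, newSchema: ObjectSchema): string[] {
  const breaking: string[] = [];
  for (const [name, oldField] of Object.entries(oldSchema.properties)) {
    const newField: FieldSpec | undefined = newSchema.properties[name];
    if (newField === undefined) {
      // A rename shows up as a removal of the old name.
      breaking.push(`field removed or renamed: ${name}`);
      continue;
    }
    if (newField.type !== oldField.type) {
      breaking.push(`type changed: ${name} (${oldField.type} -> ${newField.type})`);
    }
    const newEnum = newField.enum;
    if (oldField.enum !== undefined && newEnum !== undefined) {
      for (const value of oldField.enum) {
        if (!newEnum.includes(value)) breaking.push(`enum narrowed: ${name} lost "${value}"`);
      }
    }
  }
  for (const name of newSchema.required) {
    if (!oldSchema.required.includes(name)) breaking.push(`field became required: ${name}`);
  }
  return breaking;
}

// Example: rename user_id -> userId and widen the plan enum.
const v2: ObjectSchema = {
  required: ["user_id", "email"],
  properties: {
    user_id: { type: "string" },
    email: { type: "string" },
    plan: { type: "string", enum: ["free", "pro", "enterprise"] },
  },
};
const v3: ObjectSchema = {
  required: ["userId", "email"],
  properties: {
    userId: { type: "string" },
    email: { type: "string" },
    plan: { type: "string", enum: ["free", "pro", "enterprise", "team"] },
  },
};

const changes = findBreakingChanges(v2, v3);
console.log(changes);
```

In this example the rename is reported twice (once as a removed field, once as a newly required one), while the widened enum is not reported at all.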
Schema Governance Workflow
Step 1: Automated Compatibility Check
```yaml
name: Schema Compatibility Check
on:
  pull_request:
    paths: ['contracts/**/*.json', 'contracts/**/*.yaml']

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check backward compatibility
        run: |
          # Compare modified schemas against main branch.
          # --diff-filter=M skips added and deleted files, which have no
          # counterpart to compare against.
          for file in $(git diff --name-only --diff-filter=M origin/main -- contracts/); do
            echo "Checking $file..."
            npx json-schema-diff \
              <(git show "origin/main:$file") \
              "$file" \
              --fail-on-breaking
          done

      - name: Validate all schemas
        run: |
          npx ajv validate -s meta-schema.json -d 'contracts/**/*.json'
```

Consumer-Driven Contract Testing with Pact
Consumer Side: Define Expectations
```typescript
import { PactV3 } from '@pact-foundation/pact';

const provider = new PactV3({
  consumer: 'billing-service',
  provider: 'user-service',
});

describe('User API Contract', () => {
  it('returns user profile for billing', async () => {
    // Define what the consumer expects
    provider
      .given('user u123 exists')
      .uponReceiving('a request for user profile')
      .withRequest({ method: 'GET', path: '/api/v2/users/u123' })
      .willRespondWith({
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          user_id: 'u123', // billing uses this field
          email: 'user@example.com', // billing uses this field
          plan: 'pro', // billing uses this field
        },
      });

    await provider.executeTest(async (mockServer) => {
      const response = await fetch(
        `${mockServer.url}/api/v2/users/u123`
      );
      const user = await response.json();

      // Assert the fields billing actually depends on
      expect(user.user_id).toBeDefined();
      expect(user.email).toBeDefined();
      expect(user.plan).toMatch(/^(free|pro|enterprise)$/);
    });
  });
});
```

Producer Side: Verify Contracts
```typescript
import { Verifier } from '@pact-foundation/pact';

// `db` is assumed to be the service's own test database client,
// imported from wherever your data layer lives.

describe('User Service Provider Verification', () => {
  it('satisfies all consumer contracts', async () => {
    const verifier = new Verifier({
      providerBaseUrl: 'http://localhost:3000',
      pactBrokerUrl: process.env.PACT_BROKER_URL,
      provider: 'user-service',
      publishVerificationResult: true,
      providerVersion: process.env.GIT_SHA,

      // Set up test data for each contract state
      stateHandlers: {
        'user u123 exists': async () => {
          await db.users.create({
            user_id: 'u123',
            email: 'user@example.com',
            plan: 'pro',
          });
        },
      },
    });

    await verifier.verifyProvider();
    // If any consumer contract is broken, this test FAILS,
    // preventing the producer from deploying
  });
});
```

Schema Registry
A Schema Registry centralizes schema storage and enforces compatibility checks. When a producer registers a new schema version, the registry verifies it is compatible with the previous version before allowing it:
| Registry | Format Support | Compatibility Modes | Best For |
|---|---|---|---|
| Confluent Schema Registry | JSON Schema, Avro, Protobuf | BACKWARD, FORWARD, FULL, NONE | Kafka-centric systems |
| Apicurio Registry | JSON Schema, Avro, Protobuf, OpenAPI | Same as Confluent | Multi-protocol, open source |
| AWS Glue Schema Registry | JSON Schema, Avro | BACKWARD, FULL, NONE | AWS-native environments |
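The compatibility modes in the table differ in which direction they protect. BACKWARD means consumers on the new schema can still read data written with the previous one; FORWARD means consumers on the previous schema can read data written with the new one; FULL requires both. The sketch below illustrates this with a deliberately toy model in which a schema is only its list of required fields. `ToySchema`, `canRead`, and `isCompatible` are hypothetical names, and real registries apply much richer, format-specific rules.

```typescript
type Mode = "BACKWARD" | "FORWARD" | "FULL" | "NONE";

// Toy model (an assumption for illustration): a schema is just the fields it requires.
interface ToySchema {
  required: string[];
}

// A reader can process a writer's data if every field the reader requires
// is guaranteed to be present by the writer.
function canRead(reader: ToySchema, writer: ToySchema): boolean {
  return reader.required.every((field) => writer.required.includes(field));
}

function isCompatible(mode: Mode, previous: ToySchema, next: ToySchema): boolean {
  switch (mode) {
    case "BACKWARD":
      return canRead(next, previous); // new readers, old data
    case "FORWARD":
      return canRead(previous, next); // old readers, new data
    case "FULL":
      return canRead(next, previous) && canRead(previous, next);
    case "NONE":
      return true; // no check at all
  }
}

// Example: the new version starts requiring `plan`.
const previous: ToySchema = { required: ["user_id", "email"] };
const next: ToySchema = { required: ["user_id", "email", "plan"] };

console.log(isCompatible("BACKWARD", previous, next)); // false: old data may lack `plan`
console.log(isCompatible("FORWARD", previous, next)); // true: old readers get all they need
console.log(isCompatible("FULL", previous, next)); // false
```

Under this model, adding a required field passes a FORWARD check but fails BACKWARD and FULL, which is why the mode should be chosen according to whether producers or consumers are upgraded first.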
```bash
# Register a new schema version.
# The schema itself is passed as a JSON string, so its inner quotes must be escaped.
curl -X POST http://schema-registry:8081/subjects/user-profile-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schemaType": "JSON",
    "schema": "{\"type\":\"object\",\"required\":[\"user_id\",\"email\"],\"properties\":{\"user_id\":{\"type\":\"string\"},\"email\":{\"type\":\"string\"},\"plan\":{\"type\":\"string\",\"enum\":[\"free\",\"pro\",\"enterprise\"]}}}"
  }'

# Check compatibility before registering
curl -X POST http://schema-registry:8081/compatibility/subjects/user-profile-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schemaType": "JSON", "schema": "..."}'
# Response: {"is_compatible": true} or {"is_compatible": false}
```

Anti-Patterns
1. Schemas in Multiple Places
When the schema is defined in the producer code, duplicated in the consumer code, documented in Confluence, and partially described in the OpenAPI spec, no single source of truth exists. Changes in one place are not reflected elsewhere. Always have a single canonical schema file that all systems reference.
2. Implicit Contracts
An implicit contract exists when consumers depend on fields that are not formally documented. The producer team doesn't know consumers use internal_score because it was never in the contract — they remove it in a "cleanup" and break the recommendation engine.
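One way to surface implicit contracts is to record which fields a consumer actually reads and compare them with the declared contract. The following is a rough sketch of that idea using a JavaScript `Proxy`; `trackFieldAccess`, `undeclaredFields`, and the sample payload are hypothetical, and it only tracks top-level property reads.

```typescript
// Wraps a payload so that every top-level field a consumer reads is recorded.
function trackFieldAccess<T extends object>(payload: T, accessed: Set<string>): T {
  return new Proxy(payload, {
    get(target, prop, receiver) {
      if (typeof prop === "string") accessed.add(prop);
      return Reflect.get(target, prop, receiver);
    },
  });
}

// Fields the consumer read that the contract does not declare.
function undeclaredFields(accessed: Set<string>, contractFields: string[]): string[] {
  return Array.from(accessed)
    .filter((field) => !contractFields.includes(field))
    .sort();
}

const contractFields = ["user_id", "email", "plan"];
const accessed = new Set<string>();
const user = trackFieldAccess(
  { user_id: "u123", email: "user@example.com", plan: "pro", internal_score: 0.87 },
  accessed,
);

// Hypothetical recommendation-engine logic that leans on an undocumented field.
const boosted = user.internal_score > 0.5 && user.plan === "pro";

const implicit = undeclaredFields(accessed, contractFields);
console.log(boosted, implicit);
```

Running consumer test suites against wrapped payloads like this would flag `internal_score` as a dependency to either add to the contract or remove from the consumer before any cleanup happens.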
3. No Versioning
Without version numbers on schemas, there is no way to know which version a consumer expects, no way to maintain multiple versions during migration, and no way to roll back. Every schema should have an explicit version from day one.
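With explicit versions in place, both the next version number and a consumer's ability to keep reading can be derived mechanically. The sketch below assumes `MAJOR.MINOR.PATCH` version strings; `parseVersion`, `nextVersion`, and `consumerCanRead` are hypothetical helpers, and the PATCH case for non-structural fixes is an assumption based on standard semantic versioning.

```typescript
type ChangeKind = "breaking" | "addition" | "fix";

interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(version: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) throw new Error(`invalid version: ${version}`);
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// MAJOR for breaking changes, MINOR for additions, PATCH for everything else.
function nextVersion(current: string, change: ChangeKind): string {
  const { major, minor, patch } = parseVersion(current);
  if (change === "breaking") return `${major + 1}.0.0`;
  if (change === "addition") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// A consumer built against one version can read a published version with the
// same MAJOR number and an equal or higher MINOR number.
function consumerCanRead(consumerExpects: string, producerPublishes: string): boolean {
  const expected = parseVersion(consumerExpects);
  const published = parseVersion(producerPublishes);
  return expected.major === published.major && published.minor >= expected.minor;
}

console.log(nextVersion("2.1.0", "addition")); // "2.2.0"
console.log(nextVersion("2.1.0", "breaking")); // "3.0.0"
console.log(consumerCanRead("2.1.0", "2.2.0")); // true
console.log(consumerCanRead("2.1.0", "3.0.0")); // false
```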
Best Practices
- ✓ Store schemas in version control alongside code and treat them as first-class artifacts
- ✓ Run automated compatibility checks in CI on every schema change PR
- ✓ Use consumer-driven contract testing (Pact) for REST APIs
- ✓ Use a Schema Registry for event streams (Kafka, RabbitMQ)
- ✓ Apply semantic versioning: MAJOR for breaking changes, MINOR for additions
- ✓ Assign explicit ownership for every data contract (team name, not individual)
- ✓ Document field semantics beyond types: units, examples, business meaning
- ✗ Never remove or rename a field without a major version bump and migration period
- ✗ Never maintain the schema in multiple places; use a single source of truth
- ✗ Never skip compatibility checks "just this once"; that is when breakage happens