JSON Data Contracts — Schema Governance & Contract Testing

When multiple teams produce and consume JSON data, schemas drift silently. A producer renames a field, changes a type, or removes a key — and the consuming team discovers the breakage days later through corrupted reports. Data contracts are the organizational and technical solution to this problem.

The Problem: Schema Drift

A real incident: the user service team renames user_id to userId in their API response. No tests break because no test covers the exact field name. The analytics pipeline silently produces null values for user attribution. The billing team discovers the issue 3 weeks later when revenue reports are off by 15%.
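
To make the failure mode concrete, here is a minimal sketch of how such a rename can slip past a consumer without raising any error. The `attributeEvent` function and the payload shapes are hypothetical illustrations, not code from the incident.

```javascript
// Hypothetical consumer in the analytics pipeline: it reads `user_id`
// and never checks that the field is actually present.
function attributeEvent(payload) {
  return { user: payload.user_id ?? null, amount: payload.amount };
}

const before = { user_id: 'u123', amount: 40 }; // producer's old shape
const after = { userId: 'u123', amount: 40 };   // shape after the rename

console.log(attributeEvent(before)); // { user: 'u123', amount: 40 }
console.log(attributeEvent(after));  // { user: null, amount: 40 }
```

Nothing throws in the second call, which is why the problem surfaces only when someone inspects the downstream data.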

[Figure: Schema drift causing silent data corruption]

Warning: Schema drift is one of the most expensive problems in distributed systems. The cost is not the fix (usually trivial) but the detection delay: corrupted data flowing into downstream systems for days or weeks before anyone notices.

What Is a Data Contract?

| Component | JSON Schema Alone | Full Data Contract |
|---|---|---|
| Structure definition | Field names, types, required | Field names, types, required |
| Validation rules | Patterns, enums, min/max | Patterns, enums, min/max |
| Ownership | Not defined | Named team/individual owner |
| Change process | Not defined | PR review, compatibility check, approval |
| Breaking change detection | Manual | Automated in CI/CD |
| Consumer awareness | None | Consumer-driven contracts, notifications |
| SLA guarantees | None | Availability, latency, data freshness |
| Semantic documentation | Minimal (description field) | Business meaning, units, examples |

Data contract definition (YAML + JSON Schema)

```yaml
# contracts/user-service/user-profile.yaml
apiVersion: v1
kind: DataContract
metadata:
  name: user-profile
  version: 2.1.0
  owner: user-platform-team
  description: User profile data exposed via REST API and published to Kafka
  sla:
    availability: 99.9%
    freshness: 5 minutes
    response_time_p99: 200ms

channels:
  - type: rest_api
    path: /api/v2/users/{id}
    method: GET
  - type: kafka_topic
    name: user.profile.updated

schema:
  type: object
  required: [user_id, email, created_at]
  properties:
    user_id:
      type: string
      format: uuid
      description: Unique user identifier (immutable after creation)
    email:
      type: string
      format: email
      description: Primary email address (PII - handle with care)
    display_name:
      type: string
      maxLength: 100
      description: User-chosen display name
    plan:
      type: string
      enum: [free, pro, enterprise]
      description: Current subscription plan
    created_at:
      type: string
      format: date-time
      description: Account creation timestamp (ISO 8601 UTC)
```
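
As a rough illustration of what the `schema` section buys you, the sketch below checks a payload against a hand-transcribed subset of the contract. It is not a JSON Schema validator: the `checkPayload` helper is hypothetical, handles only `required`, string `type`, `maxLength`, and `enum`, and ignores `format`. In practice a real validator library would take its place.

```javascript
// Subset of the contract's schema section, transcribed into JS by hand.
const userProfileSchema = {
  required: ['user_id', 'email', 'created_at'],
  properties: {
    user_id: { type: 'string' },
    email: { type: 'string' },
    display_name: { type: 'string', maxLength: 100 },
    plan: { type: 'string', enum: ['free', 'pro', 'enterprise'] },
    created_at: { type: 'string' },
  },
};

// Hypothetical helper: returns a list of human-readable violations.
function checkPayload(schema, payload) {
  const errors = [];
  for (const field of schema.required) {
    if (!(field in payload)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(schema.properties)) {
    if (!(field in payload)) continue; // optional and absent: nothing to check
    const value = payload[field];
    if (rule.type === 'string' && typeof value !== 'string') {
      errors.push(`${field}: expected string`);
      continue;
    }
    if (rule.maxLength !== undefined && value.length > rule.maxLength) {
      errors.push(`${field}: longer than ${rule.maxLength}`);
    }
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${field}: not one of ${rule.enum.join(', ')}`);
    }
  }
  return errors;
}

// A payload with a renamed key and an enum value outside the contract.
const renamed = {
  userId: 'u123',
  email: 'a@example.com',
  created_at: '2024-01-01T00:00:00Z',
  plan: 'trial',
};
console.log(checkPayload(userProfileSchema, renamed));
```

Here the renamed key and the out-of-contract plan value are both reported, whereas a consumer that reads the payload directly would see neither.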

Breaking vs Non-Breaking Changes

| Change Type | Example | Breaking? | Action Required |
|---|---|---|---|
| Add optional field | Add "avatar_url" to response | No | Minor version bump |
| Widen enum values | Add "team" to plan enum | No | Minor version bump |
| Remove a field | Remove "legacy_id" | Yes | Major version, migration period |
| Rename a field | user_id -> userId | Yes | Major version, migration period |
| Change field type | age: string -> number | Yes | Major version, new endpoint |
| Make optional required | display_name becomes required | Yes | Major version, notify consumers |
| Narrow enum | Remove "trial" from plan enum | Yes | Major version, verify no consumers use it |
| Change field semantics | status: HTTP code -> business code | Yes | Major version, document thoroughly |
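
Most of the rules in this table can be checked mechanically. The following sketch compares two flat object schemas and sorts the differences into breaking and non-breaking. The `classifyChanges` helper is hypothetical and deliberately limited: it ignores nested objects, sees a rename as a removal plus an addition, and cannot detect semantic changes. Treating a newly added required field as breaking is an assumption made here by analogy with "make optional required".

```javascript
// Hypothetical helper: classifies differences between two flat object schemas.
function classifyChanges(oldSchema, newSchema) {
  const breaking = [];
  const nonBreaking = [];
  const oldProps = oldSchema.properties ?? {};
  const newProps = newSchema.properties ?? {};
  const oldRequired = new Set(oldSchema.required ?? []);
  const newRequired = new Set(newSchema.required ?? []);

  for (const [name, oldDef] of Object.entries(oldProps)) {
    if (!Object.hasOwn(newProps, name)) {
      breaking.push(`removed field: ${name}`);
      continue;
    }
    const newDef = newProps[name];
    if (oldDef.type !== newDef.type) {
      breaking.push(`type changed: ${name} (${oldDef.type} -> ${newDef.type})`);
    }
    if (!oldRequired.has(name) && newRequired.has(name)) {
      breaking.push(`now required: ${name}`);
    }
    if (oldDef.enum && newDef.enum) {
      const dropped = oldDef.enum.filter((v) => !newDef.enum.includes(v));
      const added = newDef.enum.filter((v) => !oldDef.enum.includes(v));
      if (dropped.length) breaking.push(`enum narrowed: ${name} (-${dropped.join(', -')})`);
      if (added.length) nonBreaking.push(`enum widened: ${name} (+${added.join(', +')})`);
    }
  }
  for (const name of Object.keys(newProps)) {
    if (Object.hasOwn(oldProps, name)) continue;
    // Assumption: a new field is only safe when it is optional.
    (newRequired.has(name) ? breaking : nonBreaking).push(`added field: ${name}`);
  }
  return { breaking, nonBreaking };
}

const v1 = {
  required: ['user_id'],
  properties: {
    user_id: { type: 'string' },
    plan: { type: 'string', enum: ['free', 'pro', 'enterprise'] },
  },
};
const v2 = {
  required: ['userId'],
  properties: {
    userId: { type: 'string' },
    plan: { type: 'string', enum: ['free', 'pro', 'enterprise', 'team'] },
    avatar_url: { type: 'string' },
  },
};

const result = classifyChanges(v1, v2);
console.log(result.breaking);    // removal of user_id, new required userId
console.log(result.nonBreaking); // widened plan enum, new optional avatar_url
```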

Schema Governance Workflow

[Figure: Schema change governance process]

Step 1: Automated Compatibility Check

.github/workflows/schema-check.yml

```yaml
name: Schema Compatibility Check
on:
  pull_request:
    paths: ['contracts/**/*.json', 'contracts/**/*.yaml']

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check backward compatibility
        run: |
          # Compare modified schemas against main branch.
          # --diff-filter=M skips added and deleted files, which have
          # no counterpart on the other side to compare against.
          for file in $(git diff --name-only --diff-filter=M origin/main -- contracts/); do
            echo "Checking $file..."
            npx json-schema-diff \
              <(git show "origin/main:$file") \
              "$file" \
              --fail-on-breaking
          done

      - name: Validate all schemas
        run: |
          # The ajv command is provided by the ajv-cli package
          npx ajv-cli validate -s meta-schema.json -d 'contracts/**/*.json'
```

Consumer-Driven Contract Testing with Pact

[Figure: Pact consumer-driven contract flow]

Consumer Side: Define Expectations

Consumer Pact test (JavaScript)

```javascript
import { PactV3 } from '@pact-foundation/pact';

const provider = new PactV3({
  consumer: 'billing-service',
  provider: 'user-service',
});

describe('User API Contract', () => {
  it('returns user profile for billing', async () => {
    // Define what the consumer expects
    await provider
      .given('user u123 exists')
      .uponReceiving('a request for user profile')
      .withRequest({ method: 'GET', path: '/api/v2/users/u123' })
      .willRespondWith({
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          user_id: 'u123',          // billing uses this field
          email: '[email protected]', // billing uses this field
          plan: 'pro',              // billing uses this field
        },
      });

    await provider.executeTest(async (mockServer) => {
      const response = await fetch(
        `${mockServer.url}/api/v2/users/u123`
      );
      const user = await response.json();

      // Assert the fields billing actually depends on
      expect(user.user_id).toBeDefined();
      expect(user.email).toBeDefined();
      expect(user.plan).toMatch(/^(free|pro|enterprise)$/);
    });
  });
});
```

Producer Side: Verify Contracts

Producer Pact verification (JavaScript)

```javascript
import { Verifier } from '@pact-foundation/pact';

describe('User Service Provider Verification', () => {
  it('satisfies all consumer contracts', async () => {
    const verifier = new Verifier({
      providerBaseUrl: 'http://localhost:3000',
      pactBrokerUrl: process.env.PACT_BROKER_URL,
      provider: 'user-service',
      publishVerificationResult: true,
      providerVersion: process.env.GIT_SHA,

      // Set up test data for each contract state
      stateHandlers: {
        'user u123 exists': async () => {
          // `db` stands for the provider's own data-access layer (not shown here)
          await db.users.create({
            user_id: 'u123',
            email: '[email protected]',
            plan: 'pro',
          });
        },
      },
    });

    await verifier.verifyProvider();
    // If any consumer contract is broken, this test FAILS,
    // preventing the producer from deploying
  });
});
```
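
The idea behind the two sides can be illustrated without Pact itself. The following is a simplified sketch of the concept, not a description of how Pact is implemented: each consumer records the fields it relies on, and the provider checks a sample response against every recorded expectation. The names `consumerContracts` and `verifyProvider`, and the second consumer, are hypothetical.

```javascript
// Hypothetical expectations recorded by two consumers: field name -> expected typeof.
const consumerContracts = {
  'billing-service': { user_id: 'string', email: 'string', plan: 'string' },
  'analytics-pipeline': { user_id: 'string', created_at: 'string' },
};

// Returns one failure message per unmet expectation. An empty list means
// every consumer's declared needs are satisfied by this response.
function verifyProvider(response, contracts) {
  const failures = [];
  for (const [consumer, expected] of Object.entries(contracts)) {
    for (const [field, type] of Object.entries(expected)) {
      if (typeof response[field] !== type) {
        failures.push(`${consumer}: expected ${field} to be ${type}`);
      }
    }
  }
  return failures;
}

// Provider response after a user_id -> userId rename.
const renamedResponse = {
  userId: 'u123',
  email: 'u123@example.com',
  plan: 'pro',
  created_at: '2024-01-01T00:00:00Z',
};

console.log(verifyProvider(renamedResponse, consumerContracts));
```

Both consumers are reported as broken by the rename. Fields that no consumer declared are ignored, which leaves the producer free to change them.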

Schema Registry

A Schema Registry centralizes schema storage and enforces compatibility checks. When a producer registers a new schema version, the registry verifies it is compatible with the previous version before allowing it:

| Registry | Format Support | Compatibility Modes | Best For |
|---|---|---|---|
| Confluent Schema Registry | JSON Schema, Avro, Protobuf | BACKWARD, FORWARD, FULL, NONE | Kafka-centric systems |
| Apicurio Registry | JSON Schema, Avro, Protobuf, OpenAPI | Same as Confluent | Multi-protocol, open source |
| AWS Glue Schema Registry | JSON Schema, Avro, Protobuf | BACKWARD, FORWARD, FULL (each with an _ALL variant), NONE, DISABLED | AWS-native environments |

Confluent Schema Registry: register and check compatibility

```bash
# Register a new schema version.
# The schema is passed as a JSON-encoded string, so its inner quotes are escaped.
curl -X POST http://schema-registry:8081/subjects/user-profile-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schemaType": "JSON",
    "schema": "{\"type\":\"object\",\"required\":[\"user_id\",\"email\"],\"properties\":{\"user_id\":{\"type\":\"string\"},\"email\":{\"type\":\"string\"},\"plan\":{\"type\":\"string\",\"enum\":[\"free\",\"pro\",\"enterprise\"]}}}"
  }'

# Check compatibility before registering
curl -X POST http://schema-registry:8081/compatibility/subjects/user-profile-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schemaType": "JSON", "schema": "..."}'
# Response: {"is_compatible": true} or {"is_compatible": false}
```
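
The `schema` field in the request body is itself a JSON document encoded as a string, so its inner quotes need escaping. One way to avoid escaping by hand is to build the body programmatically. The sketch below assumes the same request shape as the curl call above. The helper name `buildRegistryPayload` is hypothetical.

```javascript
// Hypothetical helper: wraps a schema object in the request body shape
// used by the curl example (schemaType plus the schema as a JSON string).
function buildRegistryPayload(schema) {
  return JSON.stringify({ schemaType: 'JSON', schema: JSON.stringify(schema) });
}

const schema = {
  type: 'object',
  required: ['user_id', 'email'],
  properties: {
    user_id: { type: 'string' },
    email: { type: 'string' },
    plan: { type: 'string', enum: ['free', 'pro', 'enterprise'] },
  },
};

const body = buildRegistryPayload(schema);
console.log(body);

// Round trip: the outer document parses, and so does the embedded schema string.
const parsed = JSON.parse(body);
console.log(JSON.parse(parsed.schema).required); // [ 'user_id', 'email' ]
```

The resulting string can be sent as the POST body, for example with `fetch` or by writing it to a file and passing it to curl with `-d @file`.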

Anti-Patterns

1. Schemas in Multiple Places

When the schema is defined in the producer code, duplicated in the consumer code, documented in Confluence, and partially described in the OpenAPI spec, no single source of truth exists. Changes in one place are not reflected elsewhere. Always have a single canonical schema file that all systems reference.

2. Implicit Contracts

An implicit contract exists when consumers depend on fields that are not formally documented. The producer team doesn't know consumers use internal_score because it was never in the contract — they remove it in a "cleanup" and break the recommendation engine.
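
One way to surface implicit contracts is to compare the fields each consumer actually reads against the fields the contract declares. The sketch below assumes you can obtain such a list per consumer (for example from code search or access logs). The helper and the field lists are hypothetical.

```javascript
// Hypothetical helper: lists fields a consumer reads that the contract does not declare.
function findImplicitDependencies(contractFields, consumerReads) {
  const declared = new Set(contractFields);
  return consumerReads.filter((field) => !declared.has(field));
}

const contractFields = ['user_id', 'email', 'display_name', 'plan', 'created_at'];
// Fields a (hypothetical) recommendation engine reads from the payload.
const recommendationEngineReads = ['user_id', 'plan', 'internal_score'];

console.log(findImplicitDependencies(contractFields, recommendationEngineReads));
// [ 'internal_score' ]
```

Each field reported this way either belongs in the contract or should stop being read by the consumer.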

3. No Versioning

Without version numbers on schemas, there is no way to know which version a consumer expects, no way to maintain multiple versions during migration, and no way to roll back. Every schema should have an explicit version from day one.

Best Practices

  • Store schemas in version control alongside code — treat them as first-class artifacts
  • Run automated compatibility checks in CI on every schema change PR
  • Use consumer-driven contract testing (Pact) for REST APIs
  • Use a Schema Registry for event streams (Kafka, RabbitMQ)
  • Apply semantic versioning: MAJOR for breaking changes, MINOR for additions
  • Assign explicit ownership for every data contract (team name, not individual)
  • Document field semantics beyond types: units, examples, business meaning
  • Never remove or rename a field without a major version bump and migration period
  • Never maintain the schema in multiple places — use a single source of truth
  • Never skip compatibility checks "just this once" — that is when breakage happens

Frequently Asked Questions

What is a data contract?
A data contract is a formal agreement between a data producer and its consumers about the structure, types, semantics, and quality of data exchanged. For JSON, this includes the schema (field names, types, required fields), the semantic meaning of fields, SLAs (availability, latency), and change management processes. It goes beyond JSON Schema validation by adding organizational ownership and change governance.
How is a data contract different from JSON Schema?
JSON Schema defines the structure and validation rules for a JSON document. A data contract is broader: it includes the schema but also defines who owns it, how changes are proposed and approved, what the SLAs are, and how breaking changes are handled. JSON Schema is a technical artifact; a data contract is a team agreement that uses JSON Schema as one of its components.
What is consumer-driven contract testing?
Consumer-driven contract testing (popularized by Pact) reverses the traditional approach: instead of the producer defining the API and consumers adapting, consumers define their expectations (which fields they use, what types they expect) and the producer verifies it meets all consumer contracts. This catches any change that would break an interaction a consumer has declared, before the producer deploys.
What counts as a breaking change in a JSON schema?
Breaking changes include: removing a field, renaming a field, changing a field type (string to number), making an optional field required, narrowing an enum (removing allowed values), changing the structure of nested objects, and changing field semantics (same name, different meaning). Non-breaking changes: adding a new optional field, widening an enum, adding a new endpoint.
What is a Schema Registry?
A Schema Registry is a centralized service that stores, versions, and validates schemas for data formats. Confluent Schema Registry (for Kafka) is the best known. When a producer registers a new schema version, the registry checks it against previous versions under the configured compatibility mode. If the check fails, registration is rejected, so a producer that serializes through the registry fails before incompatible data reaches any consumer.
How do I detect breaking changes automatically?
Use tools like json-schema-diff, openapi-diff, or Confluent Schema Registry compatibility checks in your CI/CD pipeline. When a PR modifies a schema file, the CI job compares the new schema against the currently deployed version and fails the build if a backward-incompatible change is detected. This prevents breaking changes from reaching production.
What is semantic versioning for JSON schemas?
Apply SemVer to your schemas: MAJOR version for breaking changes (field removals, type changes), MINOR version for backward-compatible additions (new optional fields), PATCH for documentation or description changes. When a schema bumps its major version, consumers must explicitly upgrade — they are not automatically affected.
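
The versioning rule is simple enough to encode directly. A minimal sketch, assuming changes have already been classified as breaking, additive, or documentation-only. The `bumpVersion` helper and its change labels are hypothetical.

```javascript
// Hypothetical helper: bumps a MAJOR.MINOR.PATCH string according to the kind of change.
function bumpVersion(version, change) {
  const [major, minor, patch] = version.split('.').map(Number);
  switch (change) {
    case 'breaking': return `${major + 1}.0.0`;            // field removals, type changes
    case 'addition': return `${major}.${minor + 1}.0`;     // new optional fields
    case 'docs': return `${major}.${minor}.${patch + 1}`;  // description-only changes
    default: throw new Error(`unknown change kind: ${change}`);
  }
}

console.log(bumpVersion('2.1.0', 'breaking')); // 3.0.0
console.log(bumpVersion('2.1.0', 'addition')); // 2.2.0
console.log(bumpVersion('2.1.0', 'docs'));     // 2.1.1
```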