JSON Data Contracts — Schema Governance, Contract Testing & Preventing Silent Data Corruption

The Problem: Schema Drift

A real incident: the user service team renames user_id to userId in their API response. No tests break because no test covers the exact field name. The analytics pipeline silently produces null values for user attribution. The billing team discovers the issue 3 weeks later when revenue reports are off by 15%.

Schema Drift Causing Silent Data Corruption

Warning

Schema drift is one of the most expensive problems in distributed systems. The cost is not the fix (usually trivial) but the detection delay: corrupted data flowing into downstream systems for days or weeks before anyone notices.
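
To make the failure mode concrete, here is a minimal sketch of how a rename like the one in the incident can slip through. The function names and the `event` field are hypothetical and not taken from any real pipeline. The point is only that a plain property read yields a missing value rather than an error, while an explicit check fails at the boundary.

```javascript
// Hypothetical consumer code in an analytics pipeline.
// It reads the field by its old name; after the rename the value is simply absent.
function toAttributionRow(payload) {
  return { userId: payload.user_id ?? null, event: payload.event };
}

// Same mapping, but it refuses payloads that lack the field it depends on.
function toAttributionRowStrict(payload) {
  if (typeof payload.user_id !== 'string') {
    throw new TypeError('payload is missing required string field "user_id"');
  }
  return { userId: payload.user_id, event: payload.event };
}

// Payload as sent after the producer renamed user_id to userId.
const renamedPayload = { userId: 'u123', event: 'purchase' };

// No error is raised; the row just carries a null user.
const silentRow = toAttributionRow(renamedPayload);

// The strict version surfaces the drift on the first message.
let strictError = null;
try {
  toAttributionRowStrict(renamedPayload);
} catch (err) {
  strictError = err;
}
```

The lenient version is what lets corrupted rows accumulate for weeks; the strict version turns the same drift into an immediate, visible failure.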

What Is a Data Contract?

Component	JSON Schema Alone	Full Data Contract
Structure definition	Field names, types, required	Field names, types, required
Validation rules	Patterns, enums, min/max	Patterns, enums, min/max
Ownership	Not defined	Named team/individual owner
Change process	Not defined	PR review, compatibility check, approval
Breaking change detection	Manual	Automated in CI/CD
Consumer awareness	None	Consumer-driven contracts, notifications
SLA guarantees	None	Availability, latency, data freshness
Semantic documentation	Minimal (description field)	Business meaning, units, examples

Data contract definition (YAML + JSON Schema)

# contracts/user-service/user-profile.yaml
apiVersion: v1
kind: DataContract
metadata:
  name: user-profile
  version: 2.1.0
  owner: user-platform-team
  description: User profile data exposed via REST API and published to Kafka
  sla:
    availability: 99.9%
    freshness: 5 minutes
    response_time_p99: 200ms

channels:
  - type: rest_api
    path: /api/v2/users/{id}
    method: GET
  - type: kafka_topic
    name: user.profile.updated

schema:
  type: object
  required: [user_id, email, created_at]
  properties:
    user_id:
      type: string
      format: uuid
      description: Unique user identifier (immutable after creation)
    email:
      type: string
      format: email
      description: Primary email address (PII - handle with care)
    display_name:
      type: string
      maxLength: 100
      description: User-chosen display name
    plan:
      type: string
      enum: [free, pro, enterprise]
      description: Current subscription plan
    created_at:
      type: string
      format: date-time
      description: Account creation timestamp (ISO 8601 UTC)
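
As an illustration of what the `schema` section buys you, the sketch below checks a payload against a subset of the schema in the contract. The `validateAgainst` helper is hypothetical and hand-rolled: it covers only `required`, string types, `enum`, and `maxLength`, and it ignores `format`. A real project would more likely use a full JSON Schema validator such as Ajv.

```javascript
// Subset of the user-profile schema from the contract, written as a JS object.
const userProfileSchema = {
  type: 'object',
  required: ['user_id', 'email', 'created_at'],
  properties: {
    user_id: { type: 'string' },
    email: { type: 'string' },
    display_name: { type: 'string', maxLength: 100 },
    plan: { type: 'string', enum: ['free', 'pro', 'enterprise'] },
    created_at: { type: 'string' },
  },
};

// Hand-rolled check covering only required, type: string, enum and maxLength.
// This is a sketch, not a JSON Schema implementation.
function validateAgainst(schema, data) {
  const errors = [];
  for (const field of schema.required ?? []) {
    if (!(field in data)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(schema.properties ?? {})) {
    if (!(field in data)) continue;
    const value = data[field];
    if (rule.type === 'string' && typeof value !== 'string') {
      errors.push(`${field}: expected string`);
      continue;
    }
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${field}: value not in enum`);
    }
    if (rule.maxLength !== undefined && value.length > rule.maxLength) {
      errors.push(`${field}: longer than ${rule.maxLength} characters`);
    }
  }
  return errors;
}

// A payload that follows the contract produces no errors.
const okErrors = validateAgainst(userProfileSchema, {
  user_id: '3f2b8c1e-0000-4000-8000-000000000000',
  email: 'user@example.com',
  plan: 'pro',
  created_at: '2024-01-01T00:00:00Z',
});

// A drifted payload (renamed id, unknown plan) is reported instead of passing silently.
const driftErrors = validateAgainst(userProfileSchema, {
  userId: 'u123',
  email: 'user@example.com',
  plan: 'trial',
  created_at: '2024-01-01T00:00:00Z',
});
```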

Breaking vs Non-Breaking Changes

Change Type	Example	Breaking?	Action Required
Add optional field	Add "avatar_url" to response	No	Minor version bump
Widen enum values	Add "team" to plan enum	No, if consumers tolerate unknown values	Minor version bump
Remove a field	Remove "legacy_id"	Yes	Major version, migration period
Rename a field	user_id -> userId	Yes	Major version, migration period
Change field type	age: string -> number	Yes	Major version, new endpoint
Make optional required	display_name becomes required	Yes	Major version, notify consumers
Narrow enum	Remove "trial" from plan enum	Yes	Major version, verify no consumers use it
Change field semantics	status: HTTP code -> business code	Yes	Major version, document thoroughly
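
The structural rows of this table can be checked mechanically. The sketch below is a hypothetical `findBreakingChanges` helper for flat object schemas; it covers removals, type changes, optional-to-required, and enum narrowing. It cannot detect semantic changes, and a rename appears as a removal because the old name disappears.

```javascript
// Compare two flat object schemas and list the breaking changes between them.
// Covers the removal, type-change, newly-required and enum-narrowing rows only.
function findBreakingChanges(oldSchema, newSchema) {
  const breaking = [];
  const oldProps = oldSchema.properties ?? {};
  const newProps = newSchema.properties ?? {};

  for (const [name, oldRule] of Object.entries(oldProps)) {
    const newRule = newProps[name];
    if (newRule === undefined) {
      breaking.push(`removed field: ${name}`);
      continue;
    }
    if (oldRule.type !== newRule.type) {
      breaking.push(`type changed: ${name} (${oldRule.type} -> ${newRule.type})`);
    }
    if (oldRule.enum && newRule.enum) {
      const dropped = oldRule.enum.filter((v) => !newRule.enum.includes(v));
      if (dropped.length > 0) {
        breaking.push(`enum narrowed: ${name} (dropped ${dropped.join(', ')})`);
      }
    }
  }

  // A field that already existed as optional and is now required.
  const oldRequired = new Set(oldSchema.required ?? []);
  for (const name of newSchema.required ?? []) {
    if (!oldRequired.has(name) && name in oldProps) {
      breaking.push(`optional field made required: ${name}`);
    }
  }
  return breaking;
}

const baseSchema = {
  required: ['user_id', 'email'],
  properties: {
    user_id: { type: 'string' },
    email: { type: 'string' },
    plan: { type: 'string', enum: ['free', 'pro', 'enterprise'] },
  },
};

// Adds an optional field: nothing is reported.
const additiveSchema = {
  ...baseSchema,
  properties: { ...baseSchema.properties, avatar_url: { type: 'string' } },
};

// Renames user_id to userId: reported as a removal of user_id.
const renamedSchema = {
  required: ['userId', 'email'],
  properties: {
    userId: { type: 'string' },
    email: { type: 'string' },
    plan: { type: 'string', enum: ['free', 'pro', 'enterprise'] },
  },
};

const additiveResult = findBreakingChanges(baseSchema, additiveSchema);
const renamedResult = findBreakingChanges(baseSchema, renamedSchema);
```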

Schema Governance Workflow

Schema Change Governance Process

Step 1: Automated Compatibility Check

.github/workflows/schema-check.yml

name: Schema Compatibility Check
on:
  pull_request:
    paths: ['contracts/**/*.json', 'contracts/**/*.yaml']

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so origin/main is available to diff against

      - name: Check backward compatibility
        run: |
          # Compare modified JSON schemas against the main branch.
          # --diff-filter=M skips added and deleted files, which have no counterpart to diff.
          for file in $(git diff --name-only --diff-filter=M origin/main -- contracts/ | grep '\.json$'); do
            echo "Checking $file..."
            npx json-schema-diff \
              <(git show "origin/main:$file") \
              "$file" \
              --fail-on-breaking
          done

      - name: Validate all schemas
        run: |
          # The ajv command-line tool is published as the ajv-cli package
          npx ajv-cli validate -s meta-schema.json -d 'contracts/**/*.json'

Consumer-Driven Contract Testing with Pact

Pact Consumer-Driven Contract Flow

Consumer Side: Define Expectations

Consumer Pact test (JavaScript)

import { PactV3 } from '@pact-foundation/pact';

const provider = new PactV3({
  consumer: 'billing-service',
  provider: 'user-service',
});

describe('User API Contract', () => {
  it('returns user profile for billing', async () => {
    // Define what the consumer expects
    await provider
      .given('user u123 exists')
      .uponReceiving('a request for user profile')
      .withRequest({ method: 'GET', path: '/api/v2/users/u123' })
      .willRespondWith({
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          user_id: 'u123',           // billing uses this field
          email: 'user@example.com', // billing uses this field
          plan: 'pro',               // billing uses this field
        },
      });

    await provider.executeTest(async (mockServer) => {
      const response = await fetch(
        `${mockServer.url}/api/v2/users/u123`
      );
      const user = await response.json();

      // Assert the fields billing actually depends on
      expect(user.user_id).toBeDefined();
      expect(user.email).toBeDefined();
      expect(user.plan).toMatch(/^(free|pro|enterprise)$/);
    });
  });
});

Producer Side: Verify Contracts

Producer Pact verification

import { Verifier } from '@pact-foundation/pact';
// `db` is assumed to be the user service's own data-access module,
// imported from wherever the service defines it.

describe('User Service Provider Verification', () => {
  it('satisfies all consumer contracts', async () => {
    const verifier = new Verifier({
      providerBaseUrl: 'http://localhost:3000',
      pactBrokerUrl: process.env.PACT_BROKER_URL,
      provider: 'user-service',
      publishVerificationResult: true,
      providerVersion: process.env.GIT_SHA,

      // Set up test data for each contract state
      stateHandlers: {
        'user u123 exists': async () => {
          await db.users.create({
            user_id: 'u123',
            email: 'user@example.com',
            plan: 'pro',
          });
        },
      },
    });

    await verifier.verifyProvider();
    // If any consumer contract is broken, this test FAILS
    // preventing the producer from deploying
  });
});
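
Conceptually, provider verification replays each recorded consumer expectation against the real provider and checks that the response still honors it. The sketch below illustrates that idea for a flat response body. It is not Pact's matching algorithm, which also handles matchers, nested bodies, headers, and status codes, and `findUnmetExpectations` is a hypothetical helper.

```javascript
// Returns the expected fields that the actual response fails to honor.
// Extra fields in the actual response are ignored, which is why a producer
// can add fields without breaking this consumer.
function findUnmetExpectations(expectedBody, actualBody) {
  const unmet = [];
  for (const [field, expected] of Object.entries(expectedBody)) {
    if (!(field in actualBody)) {
      unmet.push(`${field}: missing`);
    } else if (actualBody[field] !== expected) {
      unmet.push(`${field}: expected ${JSON.stringify(expected)}`);
    }
  }
  return unmet;
}

// What billing-service recorded in its contract.
const billingExpects = { user_id: 'u123', plan: 'pro' };

// Current producer response: a superset of what billing needs.
const currentResponse = { user_id: 'u123', plan: 'pro', display_name: 'Ada' };

// Producer response after renaming user_id to userId.
const renamedResponse = { userId: 'u123', plan: 'pro', display_name: 'Ada' };

const currentUnmet = findUnmetExpectations(billingExpects, currentResponse);
const renamedUnmet = findUnmetExpectations(billingExpects, renamedResponse);
```

Because only the fields a consumer declares are checked, the producer stays free to change anything no consumer has recorded a dependency on.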

Schema Registry

A Schema Registry centralizes schema storage and enforces compatibility checks. When a producer registers a new schema version, the registry verifies it is compatible with the previous version before allowing it:

Registry	Format Support	Compatibility Modes	Best For
Confluent Schema Registry	JSON Schema, Avro, Protobuf	BACKWARD, FORWARD, FULL, NONE	Kafka-centric systems
Apicurio Registry	JSON Schema, Avro, Protobuf, OpenAPI	Same as Confluent	Multi-protocol, open source
AWS Glue Schema Registry	JSON Schema, Avro, Protobuf	BACKWARD, FORWARD, FULL, NONE (plus _ALL variants)	AWS-native environments

Confluent Schema Registry: register and check compatibility

# Register a new schema version
# The schema is passed as a JSON-encoded string, so its inner quotes must be escaped
curl -X POST http://schema-registry:8081/subjects/user-profile-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schemaType": "JSON",
    "schema": "{\"type\":\"object\",\"required\":[\"user_id\",\"email\"],\"properties\":{\"user_id\":{\"type\":\"string\"},\"email\":{\"type\":\"string\"},\"plan\":{\"type\":\"string\",\"enum\":[\"free\",\"pro\",\"enterprise\"]}}}"
  }'

# Check compatibility before registering
curl -X POST http://schema-registry:8081/compatibility/subjects/user-profile-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schemaType": "JSON", "schema": "..."}'
# Response: {"is_compatible": true} or {"is_compatible": false}
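
The `schema` field in these requests is a JSON document encoded as a string inside the outer JSON body, which is easy to get wrong by hand. The sketch below builds the payload programmatically and wraps the compatibility call. The helper names and the injectable `fetchImpl` parameter are hypothetical; the endpoint path, content type, and `is_compatible` response field are the ones used in the curl commands.

```javascript
// Build the request body the registry expects. The schema must be a
// JSON-encoded string inside the outer JSON document, so it is stringified twice.
function buildRegistryPayload(schema) {
  return JSON.stringify({ schemaType: 'JSON', schema: JSON.stringify(schema) });
}

// Ask the registry whether `schema` is compatible with the latest version.
// `fetchImpl` is injectable so the call can be exercised without a live registry.
async function isCompatible(baseUrl, subject, schema, fetchImpl = fetch) {
  const url = `${baseUrl}/compatibility/subjects/${encodeURIComponent(subject)}/versions/latest`;
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/vnd.schemaregistry.v1+json' },
    body: buildRegistryPayload(schema),
  });
  if (!response.ok) {
    throw new Error(`registry responded with HTTP ${response.status}`);
  }
  const result = await response.json();
  return result.is_compatible === true;
}

const candidateSchema = {
  type: 'object',
  required: ['user_id', 'email'],
  properties: { user_id: { type: 'string' }, email: { type: 'string' } },
};
const payload = buildRegistryPayload(candidateSchema);
```

In a CI script this might be called as `await isCompatible('http://schema-registry:8081', 'user-profile-value', candidateSchema)`, failing the build when it resolves to `false`.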

Anti-Patterns

1. Schemas in Multiple Places

When the schema is defined in the producer code, duplicated in the consumer code, documented in Confluence, and partially described in the OpenAPI spec, no single source of truth exists. Changes in one place are not reflected elsewhere. Always have a single canonical schema file that all systems reference.

2. Implicit Contracts

An implicit contract exists when consumers depend on fields that are not formally documented. The producer team doesn't know consumers use internal_score because it was never in the contract — they remove it in a "cleanup" and break the recommendation engine.
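
One way to surface implicit contracts is to compare what consumers actually read against what the contract declares. The sketch below assumes a hypothetical usage inventory, for example one collected from consumer contract tests or a code search. The consumer names and the helper are illustrative only.

```javascript
// Fields the contract formally declares (from the user-profile schema).
const declaredFields = ['user_id', 'email', 'display_name', 'plan', 'created_at'];

// Hypothetical inventory of what each consumer actually reads.
const consumerUsage = {
  'billing-service': ['user_id', 'email', 'plan'],
  'recommendation-engine': ['user_id', 'internal_score'],
};

// Report every field a consumer reads that the contract does not declare.
function findImplicitDependencies(declared, usageByConsumer) {
  const declaredSet = new Set(declared);
  const findings = [];
  for (const [consumer, fields] of Object.entries(usageByConsumer)) {
    for (const field of fields) {
      if (!declaredSet.has(field)) findings.push({ consumer, field });
    }
  }
  return findings;
}

const implicitDependencies = findImplicitDependencies(declaredFields, consumerUsage);
```

Each finding is a field that should either be added to the contract or stop being read before the producer cleans it up.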

3. No Versioning

Without version numbers on schemas, there is no way to know which version a consumer expects, no way to maintain multiple versions during migration, and no way to roll back. Every schema should have an explicit version from day one.

Best Practices

✓ Store schemas in version control alongside code and treat them as first-class artifacts
✓ Run automated compatibility checks in CI on every schema change PR
✓ Use consumer-driven contract testing (Pact) for REST APIs
✓ Use a Schema Registry for event streams (Kafka, RabbitMQ)
✓ Apply semantic versioning: MAJOR for breaking changes, MINOR for additions
✓ Assign explicit ownership for every data contract (team name, not individual)
✓ Document field semantics beyond types: units, examples, business meaning
✗ Never remove or rename a field without a major version bump and migration period
✗ Never maintain the schema in multiple places; use a single source of truth
✗ Never skip compatibility checks "just this once"; that is when breakage happens

Frequently Asked Questions

What is a data contract?

A data contract is a formal agreement between a data producer and its consumers about the structure, types, semantics, and quality of data exchanged. For JSON, this includes the schema (field names, types, required fields), the semantic meaning of fields, SLAs (availability, latency), and change management processes. It goes beyond JSON Schema validation by adding organizational ownership and change governance.

How is a data contract different from JSON Schema?

JSON Schema defines the structure and validation rules for a JSON document. A data contract is broader: it includes the schema but also defines who owns it, how changes are proposed and approved, what the SLAs are, and how breaking changes are handled. JSON Schema is a technical artifact; a data contract is a team agreement that uses JSON Schema as one of its components.

What is consumer-driven contract testing?

Consumer-driven contract testing (popularized by Pact) reverses the traditional approach: instead of the producer defining the API and consumers adapting, consumers define their expectations (which fields they use, what types they expect) and the producer verifies it meets all consumer contracts. This catches producer changes that would break any interaction a consumer has recorded in its contract; behavior that no consumer test exercises is not protected.

What counts as a breaking change in a JSON schema?

Breaking changes include: removing a field, renaming a field, changing a field type (string to number), making an optional field required, narrowing an enum (removing allowed values), changing the structure of nested objects, and changing field semantics (same name, different meaning). Non-breaking changes: adding a new optional field, adding a new endpoint, and widening an enum (provided consumers tolerate unknown values).

What is a Schema Registry?

A Schema Registry is a centralized service that stores, versions, and validates schemas for data formats. Confluent Schema Registry (for Kafka) is the best known. When a producer registers a new schema version, which Kafka serializers typically do automatically before sending the first message that uses it, the registry checks it against previous versions under the subject's configured compatibility mode. If the check fails, registration is rejected and the producer cannot publish with that schema, so incompatible data never reaches consumers.

How do I detect breaking changes automatically?

Use tools like json-schema-diff, openapi-diff, or Confluent Schema Registry compatibility checks in your CI/CD pipeline. When a PR modifies a schema file, the CI job compares the new schema against the currently deployed version and fails the build if a backward-incompatible change is detected. This prevents breaking changes from reaching production.

What is semantic versioning for JSON schemas?

Apply SemVer to your schemas: MAJOR version for breaking changes (field removals, type changes), MINOR version for backward-compatible additions (new optional fields), PATCH for documentation or description changes. When a schema bumps its major version, consumers must explicitly upgrade — they are not automatically affected.
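
The versioning rule can be expressed as a small decision function. The sketch below is a minimal illustration; the change-kind labels and both helper names are hypothetical, and a real pipeline would derive the change kinds from a schema diff rather than hard-code them.

```javascript
// Map a set of change kinds to the smallest acceptable SemVer bump.
// The change-kind labels are hypothetical and follow the rules described here.
function requiredBump(changeKinds) {
  const breaking = ['field_removed', 'field_renamed', 'type_changed', 'enum_narrowed'];
  const additive = ['optional_field_added'];
  if (changeKinds.some((kind) => breaking.includes(kind))) return 'major';
  if (changeKinds.some((kind) => additive.includes(kind))) return 'minor';
  return 'patch';
}

// Compute the next version string for a bump level.
function nextVersion(current, bump) {
  const [major, minor, patch] = current.split('.').map(Number);
  if (bump === 'major') return `${major + 1}.0.0`;
  if (bump === 'minor') return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Starting from the contract version 2.1.0 used earlier in this guide.
const afterRename = nextVersion('2.1.0', requiredBump(['field_renamed']));
const afterAddition = nextVersion('2.1.0', requiredBump(['optional_field_added']));
const afterDocsOnly = nextVersion('2.1.0', requiredBump(['description_changed']));
```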