Large JSON Files — Performance, Streaming & JSON Lines

When your JSON file is 100 MB, 1 GB, or larger, JSON.parse() is no longer enough. This guide covers the main techniques for handling large JSON: streaming parsers, chunking, JSON Lines, and memory optimization.

The Problem with Large JSON

JSON.parse() needs the entire file in memory as a single string, builds the complete object tree, and only then returns. For a 500 MB file, that means roughly 2 GB of RAM:

| File size | Parse time (Node.js) | Memory used | Approach |
|---|---|---|---|
| 1 MB | ~8 ms | ~4 MB | JSON.parse() is fine |
| 10 MB | ~80 ms | ~40 MB | JSON.parse() is acceptable |
| 100 MB | ~800 ms | ~400 MB | Consider streaming |
| 500 MB | ~4 s | ~2 GB | Must stream |
| 1 GB+ | Crashes (OOM) | N/A | Must stream or use JSON Lines |
[Figure: JSON.parse() vs Streaming]
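The thresholds in the table can be turned into a small decision helper. This is a minimal sketch: the function names and exact cutoffs are illustrative assumptions read off the table, not hard limits of any runtime. In Node.js the size could come from `fs.statSync(path).size`.

```javascript
// Hypothetical helpers based on the table: the cutoffs are rough rules of
// thumb, not limits enforced by V8 or Node.js.
const MB = 1024 * 1024;

// The table shows JSON.parse() using roughly 4x the file size in memory.
function estimateParseMemory(sizeBytes) {
  return sizeBytes * 4;
}

// Pick an approach from the file size alone.
function chooseParseStrategy(sizeBytes) {
  if (sizeBytes < 100 * MB) return 'JSON.parse';
  if (sizeBytes < 500 * MB) return 'consider streaming';
  return 'stream (or use JSON Lines)';
}

console.log(chooseParseStrategy(10 * MB));       // → JSON.parse
console.log(chooseParseStrategy(100 * MB));      // → consider streaming
console.log(chooseParseStrategy(500 * MB));      // → stream (or use JSON Lines)
console.log(estimateParseMemory(500 * MB) / MB); // → 2000
```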

JSON Lines (NDJSON)

The simplest solution for large datasets: one JSON value (usually an object) per line.

Standard JSON Array

```json
[
  {"id": 1, "name": "Alice"},
  {"id": 2, "name": "Bob"},
  {"id": 3, "name": "Charlie"}
]
```

Must read entire file to validate. Can't append without rewriting.

JSON Lines (.jsonl)

```
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
```

Process line by line. Append with a single write. Stream-friendly.
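Converting between the two forms takes only a few lines in each direction. This is a minimal sketch, and the function names are hypothetical. Serializing each element with `JSON.stringify` (no indent argument) is safe because it escapes newlines inside strings as `\n`, so every record fits on one physical line. Parsing works line by line and reports the line number of a bad record.

```javascript
// Hypothetical helpers for converting between a JSON array and JSON Lines.

// JSON.stringify without an indent argument never emits a raw newline,
// so each record occupies exactly one line.
function toJsonLines(records) {
  return records.map((r) => JSON.stringify(r) + '\n').join('');
}

// Parse JSON Lines text, skipping blank lines and reporting the 1-based
// line number of the first malformed record.
function parseJsonLines(text) {
  const records = [];
  const lines = text.split('\n');
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim(); // also strips the '\r' of CRLF files
    if (!line) continue;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      throw new SyntaxError(`Line ${i + 1}: ${err.message}`);
    }
  }
  return records;
}

const users = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob\nSmith' }, // embedded newline gets escaped, not split
];
const jsonl = toJsonLines(users);
console.log(jsonl);
// {"id":1,"name":"Alice"}
// {"id":2,"name":"Bob\nSmith"}
console.log(parseJsonLines(jsonl).length); // → 2
```

Because each line stands alone, a lenient variant could log and skip bad lines instead of throwing. That is the "per-line validity" a JSON array cannot offer.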

| Feature | JSON array | JSON Lines |
|---|---|---|
| Stream processing | ✗ (needs full file or a streaming parser) | ✓ (line by line) |
| Append data | ✗ (rewrite file) | ✓ (append line) |
| Parallel processing | ✗ | ✓ (split by lines) |
| File validity | All-or-nothing | Per-line validity |
| Concatenation | Complex | Just `cat` files together |
| Tool support | Universal | jq, ndjson-cli, most DBs |
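Splitting by lines is what makes parallel processing straightforward. You pick rough cut points, then move each one forward to the next newline so no record is cut in half. Below is a minimal sketch over an in-memory string, and the function name is hypothetical. For real files, the same idea would apply to byte offsets, for example by reading each range with the `start`/`end` options of `fs.createReadStream` and handing it to a separate worker.

```javascript
// Hypothetical helper: split JSON Lines text into at most `n` chunks whose
// boundaries always fall just after a '\n', so each chunk holds whole records.
function splitJsonLines(text, n) {
  const chunks = [];
  const target = Math.ceil(text.length / n); // rough chunk size
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + target, text.length);
    // Move the cut forward to the end of the line it landed in.
    const nl = text.indexOf('\n', end - 1);
    end = nl === -1 ? text.length : nl + 1;
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}

const text = '{"id":1}\n{"id":2}\n{"id":3}\n{"id":4}\n';
const parts = splitJsonLines(text, 2);
console.log(parts.length); // → 2
for (const part of parts) {
  // Each chunk can be parsed independently (e.g. in its own worker).
  const ids = part.trim().split('\n').map((line) => JSON.parse(line).id);
  console.log(ids); // → [ 1, 2 ] then [ 3, 4 ]
}
```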

Streaming Parsers

Node.js — JSONStream

Stream a large JSON array

```javascript
import { createReadStream } from 'fs';
import JSONStream from 'JSONStream';

const stream = createReadStream('users.json', 'utf-8');
const parser = JSONStream.parse('*'); // emit each element of the top-level array

let count = 0;
parser.on('data', (user) => {
  count++;
  // Process one user at a time: memory stays roughly constant
  processUser(user); // your own handler
});

parser.on('end', () => {
  console.log(`Processed ${count} users`);
});

// pipe() does not forward errors, so listen on both streams
stream.on('error', (err) => console.error('Read failed:', err));
parser.on('error', (err) => console.error('Invalid JSON:', err));

stream.pipe(parser);
```

Node.js — Process JSON Lines

Read JSON Lines file

```javascript
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

const rl = createInterface({
  input: createReadStream('events.jsonl', 'utf-8'),
  crlfDelay: Infinity, // treat \r\n as a single line break
});

// Top-level for await requires an ES module
for await (const line of rl) {
  if (line.trim()) {
    const event = JSON.parse(line);
    processEvent(event); // your own handler
  }
}
```

Python — ijson

Stream JSON in Python

```python
import ijson

with open('huge_data.json', 'rb') as f:
    # Stream each item in the "users" array
    for user in ijson.items(f, 'users.item'):
        process_user(user)
        # Memory stays constant regardless of file size
```

Command Line — jq

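Process large JSON with jq

```sh
# Emit array items as compact JSON. Without --stream, jq still reads the
# whole document into memory first; one jq process per item is also slow,
# so for simple extraction prefer: jq -r '.users[].name' huge_data.json
jq -c '.users[]' huge_data.json | while read -r user; do
  echo "$user" | jq '.name'
done

# Convert JSON array to JSON Lines
jq -c '.[]' array.json > output.jsonl

# Same conversion in jq's streaming mode, which never holds the full array
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' array.json > output.jsonl

# Filter JSON Lines (jq reads one value at a time)
jq -c 'select(.type == "error")' events.jsonl
```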

Browser Strategies

Web Workers for Large JSON

Parsing large JSON blocks the main thread, freezing the UI. Move parsing to a Web Worker:

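Offload to Worker

```javascript
// In the Web Worker
self.onmessage = async ({ data: { url } }) => {
  try {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const json = await res.json(); // parse off main thread
    // postMessage copies the result (structured clone); send only what the UI needs
    self.postMessage({ result: json });
  } catch (err) {
    self.postMessage({ error: err.message });
  }
};
```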

Fetch with Streaming

Stream JSON Lines from an API

```javascript
const response = await fetch('/api/export');
if (!response.ok) throw new Error(`HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters split across chunks intact
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || ''; // keep incomplete line

  for (const line of lines) {
    if (line.trim()) {
      const item = JSON.parse(line);
      renderItem(item); // process incrementally
    }
  }
}

// Flush the decoder and handle a last line that has no trailing newline
buffer += decoder.decode();
if (buffer.trim()) renderItem(JSON.parse(buffer));
```
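The same loop can be packaged as an async generator, so any `ReadableStream` of UTF-8 bytes yields parsed records one at a time. That includes a `fetch()` response body in the browser or a web stream in Node.js 18+. This is a minimal sketch. `readJsonLines` is a hypothetical name, and the code relies only on the standard `ReadableStream`, `TextDecoder`, and `TextEncoder` globals.

```javascript
// Hypothetical helper: turn a ReadableStream of UTF-8 bytes into parsed
// JSON Lines records, yielding one record at a time.
async function* readJsonLines(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // last element is the (possibly empty) partial line
      for (const line of lines) {
        if (line.trim()) yield JSON.parse(line);
      }
    }
    buffer += decoder.decode(); // flush any buffered partial character
    if (buffer.trim()) yield JSON.parse(buffer); // final line without '\n'
  } finally {
    reader.releaseLock(); // runs even if the consumer stops early
  }
}

// Usage: a stream whose chunks split both a record and the two-byte 'é'
async function demo() {
  const bytes = new TextEncoder().encode('{"name":"Zoé"}\n{"name":"Bob"}');
  const stream = new ReadableStream({
    start(controller) {
      controller.enqueue(bytes.slice(0, 12)); // ends mid-way through 'é'
      controller.enqueue(bytes.slice(12));
      controller.close();
    },
  });
  for await (const item of readJsonLines(stream)) console.log(item.name);
}
demo().catch(console.error); // prints "Zoé", then "Bob"
```

In the browser, `for await (const item of readJsonLines(response.body))` then replaces the manual loop.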

Optimization Techniques

| Technique | When to use | Savings |
|---|---|---|
| JSON Lines | Log files, data exports, ETL | ~95% memory |
| Streaming parser | Processing arrays item by item | ~90% memory |
| Chunked splitting | Parallel processing | Memory bounded per chunk |
| JSON.parse with reviver | Filtering during parse | Moderate |
| Binary formats (MessagePack) | High-performance IPC | 30-50% (file size) |
| Compression (gzip) | Network transfer | 70-90% (transfer size) |
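A reviver passed as the second argument to `JSON.parse()` is called for every key/value pair, innermost values first. Returning `undefined` removes that property from the result, so unneeded fields can be dropped during the parse itself. The whole input string must still be in memory, which is why the savings are only moderate. This is a minimal sketch, and the field names `bio` and `avatar` are illustrative.

```javascript
// Illustrative field names: drop bulky fields we never use.
const DROP = new Set(['bio', 'avatar']);

// Note: this drops matching keys at any depth, not just at the top level.
function parseWithout(text, dropKeys) {
  return JSON.parse(text, (key, value) =>
    dropKeys.has(key) ? undefined : value, // undefined deletes the property
  );
}

const text = JSON.stringify([
  { id: 1, name: 'Alice', bio: 'long text...', avatar: 'data:image/png;base64,...' },
  { id: 2, name: 'Bob', bio: 'more text...' },
]);
console.log(JSON.stringify(parseWithout(text, DROP)));
// → [{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]
```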

Quick Win

If your API returns a large JSON array, add a ?format=jsonl option that returns JSON Lines instead. This lets clients stream the response and reduces memory on both sides.
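On the server side, a JSON Lines response can be written one record at a time instead of serializing one giant array. This is a minimal sketch using the standard `Response` and `ReadableStream` web APIs, which are available in Node.js 18+, Deno, and most edge runtimes. The handler name and shape, the `format` query check, and the `application/x-ndjson` content type (common, but not an IANA-registered type) are all assumptions.

```javascript
// Hypothetical export handler: `records` is any sync or async iterable of
// rows (for example a database cursor).
async function exportResponse(requestUrl, records) {
  const url = new URL(requestUrl);

  if (url.searchParams.get('format') !== 'jsonl') {
    // Default: one JSON array, which must be fully built in memory first
    const rows = [];
    for await (const row of records) rows.push(row);
    return new Response(JSON.stringify(rows), {
      headers: { 'Content-Type': 'application/json' },
    });
  }

  // ?format=jsonl: encode one record per line, only when the client pulls
  const encoder = new TextEncoder();
  const iterator = records[Symbol.asyncIterator]
    ? records[Symbol.asyncIterator]()
    : records[Symbol.iterator]();
  const body = new ReadableStream({
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(JSON.stringify(value) + '\n'));
    },
  });
  return new Response(body, {
    headers: { 'Content-Type': 'application/x-ndjson' }, // assumed media type
  });
}

exportResponse('https://example.com/api/export?format=jsonl', [{ id: 1 }, { id: 2 }])
  .then((res) => res.text())
  .then((text) => console.log(text));
// {"id":1}
// {"id":2}
```

Using `pull` rather than enqueueing everything up front means a slow client naturally slows down reading from the data source.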

Frequently Asked Questions

What is the maximum JSON file size?
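The JSON spec has no size limit. Practical limits depend on the parser and available memory. JSON.parse() in Node.js can handle roughly 500 MB with enough RAM; beyond that, reading the file into the single string JSON.parse() needs fails, because V8 caps string length (see buffer.constants.MAX_STRING_LENGTH). For larger files, use streaming parsers.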
What is JSON Lines (NDJSON)?
JSON Lines is a format in which each line holds one complete, valid JSON value (usually an object). It enables streaming: you can process one line at a time without loading the entire file into memory.
How do I process a 10 GB JSON file?
Use a streaming parser like JSONStream (Node.js), ijson (Python), or jq's --stream mode (command line). These read the file incrementally without loading it all into memory; plain jq without --stream reads the whole document first.
Why is JSON.parse() slow for large files?
JSON.parse() needs the entire input as one in-memory string, then tokenizes it and builds the complete object tree before returning. Together, the string and the tree take roughly 3-5x the file size in memory.
Should I use JSON Lines instead of JSON arrays for large datasets?
Yes, for most cases. JSON Lines allows line-by-line streaming, easy appending, and parallel processing. A JSON array is only valid as a whole file, and processing it incrementally requires a dedicated streaming parser.