Case Study · Confidential analytics platform, North America

10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables

How UnlockLive used PostgreSQL UNLOGGED tables, batched COPY ingestion, and a disciplined two-table pattern to push a high-volume analytics pipeline from 8K to 80K events/sec — without losing the data integrity guarantees the audited path required.

IndustryData / Analytics
Year2024
CountryUSA
Duration3 months

Talk to the team that built this

10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables hero screenshot

At-a-glance results

10xSustained ingest throughput (8K → 80K+ events/sec)
60%Lower RDS IOPS cost on the same instance class
<50msp95 stage-1 insert latency under sustained load
0Data-integrity regressions on the audited reporting path

The challenge

An analytics platform was ingesting product events from a growing customer base into a single "raw_events" table. The original schema was correct but slow: every event hit a logged table with seven indexes, the WAL was the bottleneck, and ingest throughput had topped out around 8,000 events per second. RDS IOPS were the dominant cost line, and a planned 3x customer expansion was going to break the pipeline before the next contract was signed.

The team had read the usual advice — "use a queue," "shard the table," "move to a dedicated TSDB" — but those answers all meant a multi-quarter migration. The real question was: can PostgreSQL keep up if we stop fighting it? The catch was a hard requirement that anything that reached the audited reporting tables had to be durable, consistent, and indexed. We couldn't sacrifice integrity on the analytics side to save the ingest side.

Our solution

We split the pipeline into two clearly separated stages with two different durability contracts.

Stage 1 (ingest): an UNLOGGED staging table that takes the firehose. UNLOGGED tables in PostgreSQL skip WAL writes for inserts, which is exactly the right trade-off for short-lived staging data — we get 5-10x raw insert throughput in exchange for losing the staged rows if the server crashes. Combined with batched COPY (not INSERT) from a Python ingest service and a deliberate "no indexes on the staging table" rule, raw ingest jumped from 8K to 80K+ events/sec on the same RDS instance.

Stage 2 (durable): a Celery worker drains the staging table in micro-batches into the real, fully-logged, indexed `events` table inside a single transaction with idempotent upserts. The audited reporting path only ever reads from the durable table. If the server crashes mid-ingest, we lose at most a few seconds of in-flight events from the staging table — and the upstream producers retry, so the durable table converges back to correct.

We paired this with a small but careful operational layer: a Grafana dashboard for stage-1 fill rate, a hard alert if the drainer falls behind, partition rotation on the durable table, and a written runbook for the three failure modes that matter.

UNLOGGED staging table with no indexes — engineered specifically for raw insert throughput
Batched COPY ingest from a Python/FastAPI service, not row-at-a-time INSERTs
Celery drainer moving micro-batches into the durable, fully-indexed events table
Idempotent upserts (ON CONFLICT DO NOTHING) so producer retries are safe
Monthly partitioning on the durable table to keep vacuum and index work bounded
End-to-end lag dashboards and alerts in Grafana / Datadog
Written runbook covering RDS failover, drainer crash, and producer back-pressure
Audit sign-off on the durability profile before launch — no surprises

How we built it

01
Profiling the real bottleneck
Before touching the schema we ran a week of pg_stat_statements, RDS Performance Insights, and a custom WAL throughput tracker. The data was unambiguous: WAL writes and index maintenance on the single fat events table accounted for 78% of write time. The CPU and memory weren't the problem — durability was.
02
Designing the two-table contract
We wrote a short design doc that defined exactly what UNLOGGED bought us, what it cost us, and what guarantees the upstream producers had to provide (at-least-once delivery, idempotent event IDs). The contract made the loss profile explicit so the audit team could sign off in advance — "can lose up to N seconds of staged events on RDS failover; durable table is unaffected."
03
Batched COPY, micro-batched drainer
We replaced row-at-a-time inserts with a Python ingest service that buffers events in-memory for up to 250ms and flushes via PostgreSQL COPY into the UNLOGGED staging table. A Celery drainer pulls 5,000-row micro-batches into the durable table inside a single transaction with ON CONFLICT DO NOTHING for idempotency.
04
Operations, partitioning, runbook
We added monthly partitioning to the durable table (so vacuum and index rebuilds stay cheap), Grafana dashboards for end-to-end lag, alerts on staging fill and drainer lag, and a written runbook covering the three real failure modes: drainer crash, RDS failover, and producer back-pressure. Handed off to the client's on-call team with shadowing.

Tech stack

PostgreSQL 16
Python
FastAPI
Celery
Redis
AWS RDS
AWS S3
Grafana
Datadog

Backend Performance Engineering
Python & FastAPI
Cloud Solutions
Database Architecture

Inside 10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables

SQL editor — UNLOGGED staging table and partitioned durable events table

Two-stage ingest pipeline — UNLOGGED staging draining into LOGGED partitioned table

Sustained 30-min load test report — 86,210 events/sec, zero errors

“We expected to spend a quarter migrating to a dedicated time-series database. Instead UnlockLive showed us how to make Postgres do the job — and our auditors signed off on the new design before we shipped it.”

Head of Data Engineering · Analytics platform (name confidential)

Frequently asked questions

Are PostgreSQL UNLOGGED tables safe to use in production?

Yes — for the right job. UNLOGGED tables skip WAL writes, which gives you 5-10x raw insert throughput in exchange for a documented loss profile: rows are dropped on a server crash and not replicated. They're a great fit for short-lived staging data when an upstream system can replay events. They're a terrible fit for anything you need to read back authoritatively. We always pair them with a durable, logged table the audited path actually reads from.

Why not just use a dedicated time-series database like TimescaleDB or ClickHouse?

Because the migration cost and operational surface area are real. If your team already runs PostgreSQL well and your throughput target is in the tens of thousands of events per second, the UNLOGGED + batched COPY pattern often gets you there in weeks instead of quarters. We recommend a dedicated TSDB when the workload genuinely needs columnar storage, time-partitioned indexes, or millisecond range queries at scale.

Can I add indexes to a PostgreSQL UNLOGGED staging table?

You can, but you almost shouldn't. The whole point of the staging tier is raw insert speed; every index you add costs you ingest throughput. We keep the staging table index-free and put all the indexes on the downstream durable table that the analytics queries actually run against.

How do you guarantee no data loss when using UNLOGGED tables in a pipeline?

Two rules. First, the upstream producers must give you at-least-once delivery with idempotent event IDs so retries are safe. Second, the durable side of the pipeline only ever reads from the fully-logged downstream table, never from the UNLOGGED staging table. The worst case becomes a few seconds of replayable staged events on a crash — never a missing report.

What ingest throughput can you get out of a single PostgreSQL instance with this pattern?

It depends on row size, network, and instance class, but on a modest AWS RDS db.r6g.xlarge with this exact two-table pattern we routinely see 60K-100K events per second sustained, with p95 stage-1 insert latency under 50ms. Beyond that we'd partition the staging table or scale horizontally with a write router.

Want a result like this?

Talk to the same team that built 10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables. We’ll scope your project, give you a fixed-price proposal, and show you the closest analog from our portfolio.

Book a strategy call

10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables

At-a-glance results

The challenge

Our solution

How we built it

Profiling the real bottleneck

Designing the two-table contract

Batched COPY, micro-batched drainer

Operations, partitioning, runbook

Tech stack

Inside 10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables

Frequently asked questions

Want a result like this?

Quick Links

Services

Locations

10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables

At-a-glance results

The challenge

Our solution

How we built it

Profiling the real bottleneck

Designing the two-table contract

Batched COPY, micro-batched drainer

Operations, partitioning, runbook

Tech stack

Inside 10x Ingest Throughput on a PostgreSQL Analytics Pipeline Using UNLOGGED Tables

Frequently asked questions

Related case studies

Cutting B2B SaaS API p95 Latency 8x with a Production Redis Caching Layer

Event-Driven Canvas LMS Integration at 50K+ Daily API Calls for a North American University

End-to-End Automated Lecture Capture: Canvas + Panopto + Python Orchestration

Want a result like this?

Quick Links

Services

Locations