ClimsTech
Cloud architecture27 Aug 2025

Postgres or MongoDB? Choosing without the dogma

Both databases have converged on each other's best features. The real differentiator in 2025 is your query pattern, not the data model — and getting that wrong is expensive to unwind.

ClimsTech Engineering · 19 min read · Sources verified 2 Jul 2026

The SQL vs. NoSQL debate died somewhere around 2018 when MongoDB shipped multi-document ACID transactions in version 4.0, and quietly before that in 2014 when PostgreSQL shipped JSONB — binary JSON with first-class indexing support. Engineers who still argue in those terms are arguing about a world that no longer exists. The actual question worth asking is narrower and more honest: when your application reads data, does it navigate a web of relationships between entities, or does it fetch self-contained units that already contain everything they need? That access pattern — not the database paradigm, not the marketing — determines which database will give you less friction.

The cost of getting this wrong is real. Discord blogged in 2017 about moving their messages table from MongoDB to Apache Cassandra when the document model could not handle their time-series write workload; they moved again to ScyllaDB in 2023. Notion published "Sharding Postgres at Notion" in December 2021, documenting how they ran 96 RDS shards under serious load — a reminder that the relational model can scale far further than most teams ever need to push it. Neither migration was cheap, and both were driven entirely by access patterns, not ideology.

PostgreSQL adoption by professional developers

55.6%

Developers using PostgreSQL

2025, up from 48.7% in 2024

5th

MongoDB 'most wanted' rank

down from #1 in 2017–2020

+6.9pp

PostgreSQL YoY growth

largest single-year gain on record

Source: Stack Overflow Developer Survey, 2025 (50,000 respondents, 177 countries)

Both databases grew up — and that changes the calculation

PostgreSQL 9.4 (December 2014) introduced JSONB: a binary-encoded, indexable JSON type that stores data in a decomposed format rather than as raw text. Unlike the original json type — which preserves whitespace and key order and is essentially a blob — JSONB supports GIN and GIST indexes, full containment queries with the @> operator, and since PostgreSQL 17 (September 2024), the full SQL/JSON path language including JSON_TABLE, JSON_EXISTS, JSON_VALUE, and JSON_QUERY. On the document side, MongoDB's version 4.0 (2018) added multi-document ACID transactions. Since then, it has added schema validation via $jsonSchema, a mature aggregation pipeline, Atlas Vector Search, time-series collections, and change streams for event-driven architectures.

The practical implication: you cannot justify choosing MongoDB because "Postgres doesn't do documents," and you cannot justify choosing Postgres because "Mongo doesn't do transactions." Both justifications were always shaky; now they are simply false. What remains is a genuine performance and ergonomics tradeoff rooted in how each engine stores and retrieves data internally, and how that maps to your query shape.

Neither database escaped its storage model. PostgreSQL stores data in heap files, row by row, with MVCC (Multi-Version Concurrency Control) for concurrent access. MongoDB stores BSON documents in a WiredTiger B-tree with document-level locking. When you join two Postgres tables, the planner may choose a nested-loop join, a hash join, or a merge join based on statistics — all transparent to the query. When you "join" in MongoDB, you use $lookup in an aggregation pipeline, and the performance is considerably more sensitive to whether the joined collection fits in working memory. These structural differences do not go away regardless of which features each product adds.

The relational model: what it buys you at query time

When your data has relationships — an order references a customer, a customer has many addresses, a product belongs to a category hierarchy — the relational model is not a constraint; it is the access pattern you wanted anyway. Normalization means each fact lives in one place, and joins reconstruct whatever logical view you need at query time without duplicating data across the schema.

Postgres's query planner maintains per-column statistics: histograms, most-common values, null fractions, and correlation with physical row order. It uses these to estimate row counts through each node of a query plan and make cost-based decisions about join order, join algorithm, and index use. This is why ad-hoc analytical queries run predictably well in Postgres: you did not have to predict your query shape at schema-design time.

-- Sales summary per region for the last 90 days, with running total
WITH regional_sales AS (
  SELECT
    c.region,
    DATE_TRUNC('week', o.created_at)    AS week,
    SUM(oi.quantity * oi.unit_price)     AS revenue
  FROM orders o
  JOIN order_items oi ON oi.order_id  = o.id
  JOIN customers   c  ON c.id         = o.customer_id
  WHERE o.created_at >= NOW() - INTERVAL '90 days'
  GROUP BY c.region, DATE_TRUNC('week', o.created_at)
)
SELECT
  region,
  week,
  revenue,
  SUM(revenue) OVER (PARTITION BY region ORDER BY week) AS running_total
FROM regional_sales
ORDER BY region, week;

This works because the planner pushes the date filter down to orders, uses an index on created_at, and hash-joins the result against the smaller lookup tables. Adding a new reporting dimension — say, product category — requires joining one more table, not redesigning the schema or pre-computing an aggregation pipeline.

Postgres also gives you row-level security, partial indexes, generated columns, and foreign key enforcement as first-class engine features:

-- Row-level security: each customer sees only their own orders
CREATE POLICY customer_isolation ON orders
  USING (customer_id = current_setting('app.current_customer_id')::bigint);
 
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

Enforcing isolation in the database means no code path in your application can accidentally bypass the constraint. This is a different category of guarantee from application-layer checks.

The document model: what it actually buys you

The document model solves a different problem. When your data is naturally hierarchical and self-contained — a product catalog entry with variable attributes, a user profile with embedded preferences, a log event with arbitrary fields — fetching a single document is a single B-tree lookup on a single collection with no joins. At high read concurrency, that simplicity compounds.

More importantly, documents accommodate schema evolution gracefully. If one product category needs five attributes and another needs twenty different ones, a document store handles this without NULL-heavy rows or a separate attribute table (the Entity-Attribute-Value anti-pattern that haunts many relational schemas). The tradeoff is that schema logic moves from the database into your application code, which is a maintenance cost that compounds over time.

MongoDB's aggregation pipeline is worth treating seriously in its own right. It is a dataflow DSL that composes well for document-centric transformations:

// Top-rated electronics: joined, filtered, paged
db.products.aggregate([
  { $match: { category: "electronics", status: "active" } },
  { $lookup: {
      from: "reviews",
      localField: "_id",
      foreignField: "product_id",
      as: "reviews"
  }},
  { $addFields: {
      avg_rating:   { $avg: "$reviews.rating" },
      review_count: { $size: "$reviews" }
  }},
  { $match: { avg_rating: { $gte: 4.0 } } },
  { $sort:  { review_count: -1 } },
  { $skip:  0 },
  { $limit: 20 }
])

Notice the $lookup stage: this is the equivalent of a SQL join, and it performs well when the joined collection fits comfortably in the WiredTiger cache. When it does not — when reviews is large and rarely accessed together — the aggregation pipeline stalls on disk I/O. This is the document model's structural limit: $lookup is fast when your data is already shaped like your query, and expensive when it is not. That sentence is a reasonable summary of the entire database choice.

MongoDB's horizontal sharding is also genuinely first-class in a way that Postgres is not, out of the box. The mongos router, config servers, and shard key allow you to distribute a collection across many nodes with the application talking to a single endpoint. Postgres sharding options — Citus (now Azure Cosmos DB for PostgreSQL), foreign data wrappers, or manual application-level routing — each carry operational complexity that MongoDB's native sharding sidesteps, at the cost of reduced query flexibility.

JSONB: Postgres in document mode — and its limits

JSONB is a legitimate solution for semi-structured data that lives alongside relational data. Understanding where it breaks down is the key to using it correctly.

JSONB supports two index types worth knowing:

-- GIN index on the full column: containment queries and key-existence checks
CREATE INDEX idx_metadata_gin ON products USING GIN (metadata);
 
-- Expression index for a high-cardinality path you query frequently
CREATE INDEX idx_metadata_sku ON products ((metadata->>'sku'));
 
-- Containment query (uses GIN index)
SELECT id, name
FROM products
WHERE metadata @> '{"category": "electronics", "brand": "Acme"}';
 
-- SQL/JSON path language (PostgreSQL 17+)
SELECT id, JSON_VALUE(metadata, '$.sku') AS sku
FROM products
WHERE JSON_EXISTS(metadata, '$.variants[*] ? (@.in_stock == true)');

Where JSONB breaks down is less obvious. The PostgreSQL statistics collector operates at the column level — it has no visibility into the keys or value distributions inside a JSONB document. The query planner cannot estimate how many rows a JSONB containment filter will match, so it falls back to a fixed estimate (typically around 1% of rows for @> queries). On a large table, this causes the planner to underestimate selectivity and potentially choose a sequential scan where an index seek would be better, or overestimate it and pick a nested-loop join that blows up. You can partially mitigate this with per-path expression indexes, but you are taking on a planning burden that simply does not exist with typed columns.

The second JSONB pitfall is update amplification under MVCC. When you update a JSONB column, PostgreSQL writes a new version of the entire row — even if you changed only one key inside the document. For documents stored via TOAST (above roughly 2 KB), this means additional I/O and faster dead-tuple accumulation. On write-heavy tables with large JSONB columns, tune autovacuum_vacuum_scale_factor down aggressively:

ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum at 1% dead tuples; default is 20%
  autovacuum_vacuum_cost_delay   = 2      -- ms; lower means more aggressive
);

Better still: decompose frequently-written keys into typed columns and reserve JSONB for genuinely variable structure. The JSONB update problem is a schema design signal, not just a configuration problem.

relational

JSONB in PostgreSQL

  • Stored alongside typed columns — one query retrieves the full record
  • GIN indexes enable containment and key-existence queries
  • SQL/JSON in PG17 adds JSON_TABLE, JSON_EXISTS, JSON_VALUE
  • No intra-document column statistics; planner estimates are rough
  • MVCC rewrites the whole row on every update — autovacuum pressure on write-heavy tables
  • Schema migrations tracked via Flyway or Liquibase; full change history
document

Native documents in MongoDB

  • Document is the storage unit — single-collection reads need one B-tree lookup
  • Compound indexes and covered queries are fine-grained and well-planned
  • Schema validation via $jsonSchema is opt-in but as strict as you configure it
  • In-place updates with $set and $inc are efficient; no row-level MVCC rewrite
  • WiredTiger compresses on disk; cache hit rate is the dominant performance variable
  • No enforced foreign keys; referential integrity lives entirely in application code
JSONB vs. native document storage: same data, different engine trade-offsSource: PostgreSQL 17 documentation; MongoDB 7 documentation; binaryigor.com benchmark, 2024

MongoDB's hidden costs

Multi-document transactions in MongoDB are real, but they are not free. The WiredTiger engine holds document locks across multiple collections for the duration of the transaction. Under load, this creates contention that does not exist in the single-document write pattern MongoDB was designed for. The moment you find yourself writing multi-document transactions to maintain consistency that a relational foreign key would enforce for free, you should reconsider whether MongoDB was the right fit for that data.

Schema validation is the other hidden cost. MongoDB markets schema flexibility as a feature, and for genuinely polymorphic data it is. But most applications need consistent shapes, and without a validation layer the database accumulates structural drift that is painful to clean up retroactively. Define $jsonSchema validators from day one:

db.createCollection("orders", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["customer_id", "items", "status", "created_at"],
      properties: {
        customer_id: { bsonType: "objectId" },
        status: { enum: ["pending", "fulfilled", "cancelled"] },
        items: {
          bsonType: "array",
          minItems: 1,
          items: {
            bsonType: "object",
            required: ["sku", "quantity", "unit_price"],
            properties: {
              quantity:   { bsonType: "int", minimum: 1 },
              unit_price: { bsonType: "decimal" }
            }
          }
        }
      }
    }
  },
  validationAction: "error"
})

The point is not that MongoDB lacks schema enforcement — it is that you own it explicitly rather than getting it from the engine's DDL. That is not inherently wrong, but it is a cost that is easy to skip until it is not.

Access pattern decision framework

Fill in the column for your dominant workload — the recommendation follows directly from the pattern.

| Workload characteristic | PostgreSQL | MongoDB | |---|---|---| | Complex joins across 3 or more tables | Strong (cost-based planner, histogram stats) | Weak ($lookup is expensive without index on join field) | | Ad-hoc reporting and analytics | Strong (window functions, CTEs, any query shape) | Moderate (aggregation pipeline, Atlas Data Federation) | | Multi-entity ACID transactions | Strong (native row-level, no contention penalty) | Moderate (multi-doc transactions, avoid in write-hot path) | | Highly variable document shape | Moderate (JSONB works; no intra-doc planner stats) | Strong (native BSON; schema validation opt-in) | | High-volume single-document reads | Moderate (index scan + heap fetch) | Strong (document is the storage unit; one B-tree lookup) | | Horizontal sharding at scale | Moderate (Citus or application-level routing) | Strong (native mongos routing with shard key) | | Schema evolution without migration window | Moderate (online DDL tools exist; migrations still required) | Strong (additive changes need no migration) | | Full-text search | Moderate (tsvector/tsquery; good enough for many uses) | Strong (Atlas Search, Lucene-based) | | Vector similarity search | Strong (pgvector, widely adopted, runs anywhere) | Moderate (Atlas Vector Search, Atlas-only) | | Referential integrity enforced by DB | Strong (foreign keys, check constraints) | None (application responsibility) | | Row-level security | Strong (native RLS policies, SQL-level) | None (application responsibility) |

No workload scores all-strong in one column. When the score is split, the operational simplicity of running one database tips toward PostgreSQL: the tooling ecosystem (Flyway, pgAdmin, psql, EXPLAIN ANALYZE, pg_stat_statements) is broader, and debugging "why is this query slow" is substantially better.

JSON document test-case wins: PostgreSQL vs. MongoDB across 17 scenarios
PostgreSQL wins9 of 17
MongoDB wins7 of 17
Draw1 of 17
Source: binaryigor.com, JSON Documents Performance benchmark, 2024

The near-even split matters. MongoDB's own engineering blog published a separate benchmark in January 2026 — 256 concurrent writers, 30-minute run, approximately 13 million existing documents — showing steadier MongoDB throughput and lower tail latency under that specific update-heavy workload. Both benchmarks are correct and both are incomplete: workload shape completely reverses the outcome, which means any benchmark that does not match your actual access pattern is noise.

Pitfalls and how to fix them

Pitfall 1: JSONB updates driving table bloat

Symptom: A table stores session or preference data as JSONB. A background job touches one key inside each document every few minutes. After days, the table is significantly larger than expected and vacuum cannot keep pace.

Why: PostgreSQL's MVCC writes a full new row version on every update, regardless of change size. For TOAST-stored documents above roughly 2 KB, this also means TOAST pointer churn. The default autovacuum_vacuum_scale_factor of 20% means autovacuum waits until 20% of the table is dead tuples before acting.

Fix: Lower the scale factor for affected tables (shown above) and decompose frequently-written fields into typed columns. The JSONB column should hold genuinely variable, infrequently-written structure.

Pitfall 2: MongoDB $lookup on an un-indexed field

Symptom: An aggregation that runs in milliseconds in development takes 30 seconds in production on a large collection.

Why: Without an index on the foreignField, MongoDB performs a full collection scan per matched document in the pipeline. Development collections were small; production is not.

Fix: Index the join field before shipping, and use explain("executionStats") to confirm index use:

// Index the foreign field first
db.reviews.createIndex({ product_id: 1 })
 
// Use a $lookup sub-pipeline to limit what gets joined
{ $lookup: {
    from: "reviews",
    localField: "_id",
    foreignField: "product_id",
    as: "reviews",
    pipeline: [
      { $sort:  { created_at: -1 } },
      { $limit: 5 }
    ]
}}

Pitfall 3: Secondary reads serving stale data

Symptom: You add readPreference: "secondary" to scale read throughput. Users intermittently see orders that appear to revert to a previous state for a few seconds after update.

Why: MongoDB replication is asynchronous. Under load, secondary lag can reach seconds. readPreference: "secondary" routes reads to potentially lagged nodes with no staleness bound.

Fix: Use readPreference: "primaryPreferred" for reads where freshness matters. Set maxStalenessSeconds for analytics reads that can tolerate some lag:

const client = new MongoClient(uri, {
  readPreference: new ReadPreference("secondary", null, {
    maxStalenessSeconds: 30
  })
})

Pitfall 4: JSONB expression as a join key

Symptom: You join across tables on metadata->>'tenant_id' with an expression index in place, but the query plan is wrong and performance is poor.

Why: Expression indexes are usable, but the planner has no histogram for a JSONB path — only the column-level statistics for the JSONB column itself. Row estimates are inaccurate, leading to a poor join strategy.

Fix: Materialize the join key as a stored generated column. The statistics collector then treats it as a regular typed column:

ALTER TABLE events
  ADD COLUMN tenant_id bigint
    GENERATED ALWAYS AS ((metadata->>'tenant_id')::bigint) STORED;
 
CREATE INDEX idx_events_tenant ON events (tenant_id);

Pitfall 5: WiredTiger cache undersized for the working set

Symptom: After a MongoDB server restart, tail latencies spike for several minutes before recovering. Under memory pressure, p99 latency is consistently high even for simple document fetches.

Why: WiredTiger maintains an in-memory cache (default: 50% of RAM minus 1 GB, minimum 256 MB). If the hot working set exceeds the cache, every cache miss goes to disk. MongoDB does not warm the cache after restart.

Fix: Size the cache explicitly and monitor eviction pressure:

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 12   # target 60–70% of available RAM; never leave at default on production

Monitor wiredTiger.cache.bytes currently in the cache and alert on rising tracked dirty bytes — that is the signal that eviction is not keeping pace with write load.

Polyglot or single database?

The mature answer to "which one?" is often "the one that fits your dominant workload, with a second tool only if you have a concrete, different secondary workload that justifies the operational overhead."

The common polyglot pattern: PostgreSQL as the transactional source of truth for orders, accounts, and billing; MongoDB as the denormalized read surface for high-volume, simple reads like product catalogs or activity feeds. The plumbing is typically change-data capture — Debezium reads PostgreSQL's logical replication slot, publishes changes to a Kafka topic, and a consumer writes them into MongoDB in the denormalized shape the application needs.

This is not a trivial addition. You gain an eventually-consistent read surface that developers must understand and handle correctly. You gain two sets of infrastructure to operate, monitor, and pay for. You gain two failure modes.

The threshold question is whether the simpler single-database model solves your problem first. For most applications — and especially for teams below roughly 20 engineers — it does. Notion's 96-shard Postgres setup is a useful upper bound on how far a relational model can go before you genuinely need to reconsider.

How to make the database choice
  1. 01

    Map your 10 most important queries

    List the 5 most frequent reads and 5 most frequent writes. Note how many entities each query touches and whether reads are predictable at schema-design time or genuinely ad-hoc.

  2. 02

    Count the relationships

    If most queries touch 3 or more tables with joins, relational is your natural model. If most reads fetch a single self-contained unit, document storage fits more naturally — and $lookup costs are acceptable.

  3. 03

    Assess schema stability

    Will document shape change frequently with no migration window? Schema-free documents ease that. If your data model is stable and correctness matters more than flexibility, typed columns and constraints are cheaper in the long run.

  4. 04

    Estimate scale trajectory honestly

    Both databases handle tens of millions of rows comfortably. If you are planning for partitionable data at hundreds of millions of documents with a clear, predictable shard key, MongoDB's native sharding saves real operational work.

  5. 05

    Default to PostgreSQL unless you have a specific reason not to

    The tooling, debugging story, and ecosystem maturity favor Postgres for most workloads. The burden of proof sits with MongoDB — it needs to win on a concrete access-pattern argument, not a general preference for flexibility or a discomfort with migrations.

Source: ClimsTech Engineering