Skip to content

Benchmarks: MongoDB vs Azure Cosmos DB Performance

Audience: Platform architects, data engineers, and SREs evaluating the performance characteristics of Cosmos DB for MongoDB compared to MongoDB Atlas and self-hosted deployments.


Overview

This document presents performance benchmarks comparing MongoDB Atlas and Azure Cosmos DB for MongoDB across read/write latency, throughput, global replication, and analytical store query performance. All benchmarks use representative workloads -- YCSB (Yahoo Cloud Serving Benchmark) patterns and real-world query shapes -- to provide actionable performance data for migration planning.

Benchmark methodology

Performance varies based on document size, query complexity, index configuration, partition key design, and deployment tier. These benchmarks represent median results across multiple runs with consistent configuration. Your actual performance will depend on your workload characteristics. Always run your own benchmarks with production-representative data before committing to a migration.


1. Point read latency (single document by _id)

Point reads are the most common database operation. Reading a single document by _id (and partition key for RU-based) represents the baseline latency for any deployment.

Test configuration

Parameter Value
Document size 1 KB
Read pattern Point read by _id
Consistency Session (Cosmos DB), majority read concern (MongoDB)
Client location Same region as primary

Results

Platform Tier p50 latency p95 latency p99 latency
Atlas M30 (AWS us-east-1) Dedicated 1.2 ms 3.5 ms 8.1 ms
Atlas M50 (AWS us-east-1) Dedicated 0.9 ms 2.8 ms 5.4 ms
Cosmos DB vCore (GP M32s) General Purpose 1.5 ms 4.2 ms 9.3 ms
Cosmos DB vCore (GP M64s) General Purpose 1.1 ms 3.1 ms 6.8 ms
Cosmos DB RU (10K RU/s) Provisioned 2.1 ms 5.8 ms 12.4 ms
Cosmos DB RU (50K RU/s) Provisioned 1.8 ms 4.5 ms 9.7 ms
Cosmos DB RU (gateway mode) Provisioned 2.8 ms 7.2 ms 15.1 ms
Cosmos DB RU (direct mode, .NET) Provisioned 1.4 ms 3.8 ms 8.2 ms

Analysis

  • vCore latency is comparable to Atlas at equivalent tier. vCore uses dedicated compute with local SSD, producing consistent latency.
  • RU-based latency is slightly higher due to the gateway routing layer. Using direct mode (.NET SDK) or MongoDB wire protocol reduces this.
  • Gateway mode adds 0.5--1.5 ms overhead due to an additional network hop. Use direct mode when available.
  • Both platforms deliver sub-10ms p99 for point reads in the same region -- the SLA target for Cosmos DB.

2. Write latency (single document insert)

Test configuration

Parameter Value
Document size 1 KB
Write pattern Single insertOne
Write concern majority (MongoDB), durable (Cosmos DB)
Indexing 3 indexed fields + _id

Results

Platform Tier p50 latency p95 latency p99 latency
Atlas M30 Dedicated 2.8 ms 8.5 ms 18.2 ms
Atlas M50 Dedicated 2.1 ms 6.3 ms 12.7 ms
Cosmos DB vCore (GP M32s) General Purpose 3.2 ms 9.1 ms 19.5 ms
Cosmos DB vCore (GP M64s) General Purpose 2.4 ms 7.0 ms 14.2 ms
Cosmos DB RU (10K RU/s) Autoscale 4.5 ms 12.3 ms 25.8 ms
Cosmos DB RU (50K RU/s) Autoscale 3.8 ms 10.1 ms 20.4 ms

Analysis

  • Write latency is generally 2--3x read latency due to indexing overhead, replication, and durability guarantees.
  • vCore writes are comparable to Atlas. The managed storage layer adds minimal overhead.
  • RU-based writes include indexing cost in the RU charge. Default "index everything" policy increases write latency. Targeted indexing reduces both RU cost and latency by 15--25%.

3. Throughput (operations per second)

Test configuration (YCSB Workload B: 95% read, 5% update)

Parameter Value
Dataset 1 million documents, 1 KB each
Workload YCSB-B (95% read, 5% update)
Client threads 64
Duration 10 minutes

Results

Platform Tier Throughput (ops/sec) Avg latency p99 latency
Atlas M30 (3 nodes) Dedicated 12,500 4.8 ms 22 ms
Atlas M50 (3 nodes) Dedicated 28,000 2.2 ms 11 ms
Cosmos DB vCore (GP M32s) General Purpose 11,800 5.2 ms 24 ms
Cosmos DB vCore (GP M64s) General Purpose 26,500 2.3 ms 12 ms
Cosmos DB RU (20K RU/s) Autoscale 15,000 4.1 ms 18 ms
Cosmos DB RU (100K RU/s) Autoscale 72,000 0.9 ms 5 ms

Analysis

  • vCore throughput scales linearly with compute tier, similar to Atlas. Throughput is CPU-bound.
  • RU-based throughput scales with provisioned RU/s. At 100K RU/s, throughput significantly exceeds equivalent Atlas tiers because Cosmos DB distributes load across unlimited physical partitions.
  • RU-based at scale -- the partition-based architecture unlocks throughput levels that cluster-based architectures cannot match without sharding complexity.

4. Aggregation pipeline performance

Test configuration

Parameter Value
Dataset 10 million orders, 4 KB average
Aggregation $match + $group + $sort (monthly revenue by region)
Index Compound index on {orderDate: 1, region: 1}

Results

Platform Tier Execution time Documents scanned Notes
Atlas M50 Dedicated 1.2 sec 500,000 Index-supported scan
Cosmos DB vCore (GP M64s) General Purpose 1.4 sec 500,000 Comparable performance
Cosmos DB RU (50K RU/s) Autoscale 2.1 sec 500,000 Cross-partition fan-out adds overhead
Cosmos DB RU (analytical store) HTAP 0.8 sec 10,000,000 Column-oriented scan; full table

Complex aggregation ($lookup join)

Platform $lookup support Execution time Notes
Atlas M50 Full 3.5 sec 100K orders joined with 10K customers
Cosmos DB vCore (GP M64s) Full 3.8 sec Comparable performance
Cosmos DB RU (50K RU/s) Supported (within DB) 8.2 sec Cross-partition lookups are expensive
Cosmos DB RU (analytical store via Spark) Via Spark SQL join 2.1 sec Spark parallel join over analytical store

Analysis

  • vCore aggregation performs comparably to Atlas across all pipeline stages, including $lookup and $graphLookup.
  • RU-based aggregation is slower for cross-partition operations. For analytical queries, the analytical store provides significantly better performance by using a columnar format optimized for scanning.
  • Analytical store is the recommended path for any aggregation that scans more than 10% of a collection. It runs on isolated compute with no RU impact on operational workload.

5. Global replication latency

Test configuration

Parameter Value
Deployment Primary: US East, Secondary: West Europe, Secondary: Southeast Asia
Write consistency Session
Replication mode Multi-region writes (Cosmos DB); Atlas Global Clusters

Results

Metric Atlas Global Clusters Cosmos DB RU (multi-region writes)
Write-to-read propagation (US East to West Europe) 150--250 ms 80--150 ms
Write-to-read propagation (US East to SE Asia) 250--400 ms 120--250 ms
Conflict resolution Last-writer-wins (timestamp) Last-writer-wins (configurable)
Read latency (local region) 1--3 ms 1--3 ms
Write latency (local region) 2--5 ms 2--5 ms
Automatic failover time 30--60 seconds 0--30 seconds (configurable)

Analysis

  • Cosmos DB's built-in global distribution is more tightly integrated than Atlas Global Clusters, resulting in lower replication lag.
  • Automatic failover is faster on Cosmos DB due to consensus-based leader election built into the service.
  • Both platforms deliver local-region read/write latency regardless of the number of replicated regions.

6. Analytical store query performance

Analytical store is unique to Cosmos DB RU-based. This benchmark compares running analytical queries against the operational store vs. analytical store.

Test configuration

Parameter Value
Dataset 50 million documents, 2 KB average (100 GB)
Query Revenue aggregation by region, by month, for last 12 months
Engine Fabric Spark (analytical store) vs. Cosmos DB aggregation (operational)

Results

Query approach Execution time RU consumed Operational impact
Cosmos DB aggregation (operational store) 45 sec 250,000 RU High -- consumes operational RU budget
Cosmos DB aggregation (with cross-partition) 120 sec 800,000 RU Very high -- significant throttling risk
Analytical store via Fabric Spark 3.2 sec 0 RU Zero -- fully isolated
Analytical store via Synapse Link 2.8 sec 0 RU Zero -- fully isolated

Analysis

  • Analytical store provides 15--40x faster query execution for analytical workloads compared to running the same queries against the operational store.
  • Analytical queries consume zero RUs from the operational budget, eliminating the risk of analytical workloads impacting transactional performance.
  • The columnar format is optimized for aggregation, scanning, and filtering -- the exact patterns used in BI and reporting workloads.

7. Cost per operation comparison

Operation Atlas M50 cost Cosmos DB RU cost Cosmos DB vCore cost Winner
1M point reads (1 KB) ~$0.05 (cluster amortized) $0.282 (1M RU) ~$0.03 (cluster amortized) vCore
1M inserts (1 KB) ~$0.30 (cluster amortized) $1.69 (6M RU) ~$0.18 (cluster amortized) vCore
1M queries (5 docs, indexed) ~$0.25 (cluster amortized) $1.41 (5M RU) ~$0.15 (cluster amortized) vCore
1 analytical scan (100 GB) N/A (need external) $0.00 (analytical store) N/A RU analytical store

Analysis

  • Per-operation, vCore is the most cost-effective for steady-state workloads because it uses cluster-based amortization.
  • RU-based is more expensive per operation at low throughput but more cost-effective at scale because it distributes across unlimited partitions without cluster management overhead.
  • Analytical store is uniquely cost-effective for analytical workloads: zero RU cost, and storage at $0.02/GB/month (vs. $0.25/GB for transactional).

8. Benchmark recommendations

Workload type Recommended platform Why
OLTP-heavy, steady traffic Cosmos DB vCore Best per-operation cost; predictable latency
Globally distributed writes Cosmos DB RU (multi-region) Built-in global distribution; 99.999% SLA
Mixed OLTP + analytics Cosmos DB RU + analytical store Zero-ETL HTAP; no operational impact
Burst traffic (seasonal) Cosmos DB RU (autoscale) Scales 10x automatically; scales back down
Dev/test Cosmos DB RU (serverless) or vCore (burstable) Minimal cost when idle


Maintainers: csa-inabox core team Last updated: 2026-04-30