Databases in Production · 8 of 8

Read replicas and multi-DB in Rails

When read replicas help, when they do not, what replication lag breaks in practice, Rails 6+ multi-database role API, and the "we added a replica and the app got slower" failure mode that catches teams who reach for replicas too early.

Where this rule comes from

Postgres has had streaming replication since version 9.0 (2010). A primary database streams its write-ahead log (WAL) to one or more standby servers. The standbys replay the WAL and stay close to the primary's state. Reads can be served from a standby, taking read load off the primary.

Rails 6 (2019) added native support for talking to multiple databases. The connects_to declaration on a model maps a role (writing, reading) to a database connection. Switching connections at runtime is done via ActiveRecord::Base.connected_to(role: :reading) do ... end. The middleware that ships with Rails routes GET/HEAD requests to the reading role by default.

The senior question is when replicas actually pay back, and when they make the app worse. Both answers are common in production. Adding a replica without a clear use case is one of the most consistent ways to make a Rails app slower and harder to reason about. The replica is a tool for specific shapes of workload, not a generic performance fix.

The anti-pattern: "we added a replica and the app got slower"

A team is paged for occasional slow queries on the primary. They add a read replica and route all reads to it. The next week, the app is consistently slower, replication lag is alerting, and someone notices that one specific feature is now showing yesterday's data.

Three things went wrong.

1. The replica's buffer cache is cold. The primary has been running for months; its shared_buffers contain the working set in memory. The replica is fresh; its cache is empty. The first time the replica serves a query, it reads pages from disk that the primary would have served from RAM. The query is slower on the replica than it would have been on the primary, even though the replica was supposed to be the fast path.

2. Replication lag turned into bugs. Users who saved a setting were redirected to a page that read from the replica, which had not yet received the WAL entry for the save. The page showed the old setting. Users assumed the save did not work, saved again, watched it fail to stick, and filed support tickets.

3. Connection pool exhaustion got worse. Before the replica, Puma processes had one pool of connections. After, each process needed two pools: writing and reading. Sized for peak, that meant double the connection count. The Postgres primary's max_connections setting was hit during peak traffic and new sessions failed entirely.

The replica was correct for some workloads (reporting, analytics, long-running OLAP queries that should not compete with OLTP traffic) and wrong for others (user-facing reads where consistency matters). Adding it everywhere as a blanket fix was the actual mistake.

When replicas help

Three workload shapes benefit from replicas.

1. Reporting and analytics. Long-running aggregation queries that touch huge amounts of data should not run on the primary, where they compete with OLTP traffic for resources. A dedicated reporting replica with the analytics queries pinned to it isolates the workload. The replica can be sized differently (more memory, more parallel workers) and lag does not matter because reports do not need real-time data.

2. Read-heavy endpoints with cacheable shape. A public homepage that lists posts, a marketing page that shows stats, an API endpoint that returns lookup data: these read but do not write, do not need write-after-read consistency, and benefit from being served by a replica that the primary does not have to compete with.

3. Geographic read distribution. A US-primary app with European users can put a replica in Frankfurt and serve European read traffic from it. The latency win for those users is real; the cost is one more piece of infrastructure to monitor.

Each of these has the same property: the read traffic is identifiable, isolated, and tolerant of staleness. Reads that do not have these properties belong on the primary.

When they do not

Four cases where replicas are the wrong tool.

1. Read-your-own-writes. A user saves a profile change, then navigates to the profile page. The save went to the primary; the page reads from the replica. If replication lag is non-zero (and it always is), the page shows the old data. The user thinks the save failed. Any user-facing read that happens shortly after a write the user made must hit the primary.

2. Transactions that mix reads and writes. A request that starts a transaction must keep all of it on one connection. Switching mid-transaction breaks atomicity. Rails handles this correctly out of the box (transactions stay on the writing connection), but custom code that uses connected_to blocks inside a transaction can get this wrong.

3. Low-volume apps where the primary is not saturated. A replica is operational overhead: another database to monitor, alert on, rotate credentials for, run migrations against (replicas usually replay the migration automatically, but configuration changes do not). For an app whose primary is at 5% CPU during peak, the operational cost outweighs the benefit.

4. Workloads with bursty write patterns. A primary that takes a sudden write spike (bulk import, daily aggregation, big backfill) produces a large WAL burst. Replicas have to catch up by replaying the burst. During catch-up, replication lag spikes; if read traffic on the replica returns stale data, the spike becomes user-visible.

Rails 6+ multi-DB role API

Setting up multi-database in database.yml:

production:
  primary:
    database: app_production
    host: <%= ENV['PRIMARY_DB_HOST'] %>
    user: app
  primary_replica:
    database: app_production
    host: <%= ENV['REPLICA_DB_HOST'] %>
    user: app
    replica: true

And in the abstract base model:

class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class

  connects_to database: { writing: :primary, reading: :primary_replica }
end

With the middleware enabled (it is on by default in Rails 6+ apps with multi-DB configured), GET and HEAD requests automatically route to the reading role. POST, PUT, PATCH, DELETE route to writing. For explicit routing inside controllers or jobs:

ActiveRecord::Base.connected_to(role: :reading) do
  # Reads go to the replica
  PostsReport.compute_top_authors
end

ActiveRecord::Base.connected_to(role: :writing) do
  # Force a read to hit the primary, useful right after a write
  user.reload
end

The DatabaseSelector middleware adds one more behaviour: for a configurable window after a write (default 2 seconds), it routes the same session's reads to the primary. This is the framework-level fix for the read-your-own-writes problem. The window can be tuned via database_selector.delay in the application config.

Replication lag in practice

Lag is measured from the replica's side as the time between the primary committing a WAL entry and the replica applying it. The query that reports it:

-- On the replica
SELECT
  pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn() AS bytes_lag,
  EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS seconds_lag;

Under normal load on a well-provisioned replica, the seconds_lag is under 100 milliseconds. During WAL bursts (large backfills on the primary, busy migrations), it can spike to seconds or minutes. The senior practice is to alert on lag thresholds and to consider routing reads back to the primary when lag exceeds an SLA.

Rails's DatabaseSelector::Resolver::Session supports a "lag check" pattern: before routing to the replica, check the replica's lag; if it is over a threshold, fall back to the primary. The pattern lives in the resolver class and adds one query per replica decision, which is cheap relative to the cost of returning stale data.

The principle at play

Replicas are a workload isolation tool, not a performance multiplier. They pay back when a specific subset of reads can be cleanly separated from the rest of the workload and that subset is tolerant of staleness. They cost operational overhead, double the connection pooling, and introduce a class of bugs (lag, stale reads, transaction-boundary mistakes) that the single-database app does not have.

The senior workflow on a slowing primary:

Find the slow queries first. pg_stat_statements, EXPLAIN ANALYZE, lessons 1 and 2 of this section.
Check if the primary is actually saturated (CPU, IO, connection count). If not, replicas will not help.
Identify which workload can be cleanly separated. Reporting? Geo distribution? A specific endpoint? Without a clean separation, the replica adds overhead without isolating anything.
Add a replica only for that workload, with explicit connected_to calls. Leave the rest of the app on the primary.
Monitor lag. Alert on it. Have a runbook for "lag exceeded SLA: cut over to primary."

Most Rails apps run on a single Postgres primary for their entire useful life. The teams that add replicas correctly are the ones that name the specific workload first and add the replica for that workload, not the ones that add a replica and hope reads get faster.

Practice exercise

Check your primary's CPU, IO, and connection count during peak. If the primary is at 30% CPU and connection count is fine, you do not need a replica. Solve the slow queries with the lessons in this section instead.
Identify one workload in your app that is read-only, tolerant of seconds of staleness, and large enough to matter (analytics, marketing pages, public API). That is your candidate for a replica.
If you already run replicas, audit which reads use them. Grep for connected_to and the middleware config. Find any read-your-own-writes paths that route to the replica; these are silent bug sources.
Set up the lag query above as a dashboard. Pick an SLA (often 1 second). Add an alert. The first time you see a spike, you will understand more about your write workload than a year of reading docs would have taught you.