Spot the Tax · Card 14 of 20
User.all.each will eventually run out of memory
Why a digest sender that works at launch becomes an outage trigger as the table grows.
The code
What will this cost you in six months?
```ruby
class WeeklyDigestSender
  def call
    User.all.each do |user|
      next unless user.subscribed?
      WeeklyDigestMailer.with(user: user).digest.deliver_later
    end
  end
end
```

The problem
User.all.each loads every row from the users table into memory before iterating. At 5,000 users that's fine. At 50,000 it's slow. At 2,000,000 the worker process runs out of memory long before it sends a single email. The code that worked perfectly at launch becomes an outage trigger the day the table grows past whatever memory limit your worker's host or container enforces (Sidekiq itself doesn't cap memory; your infrastructure does).
Take a moment. Before revealing, think about how you'd iterate over a table that doesn't fit in memory. What does Active Record give you for that?
The solution
Use find_each instead of each. It fetches records in batches (default 1,000) so memory usage stays bounded regardless of table size. While you're at it, push the subscribed? filter into the database so you skip the rows you weren't going to email anyway.
- Memory usage doesn't grow with the table
- The database does the filtering, which is faster than next unless in Ruby
- The same code works at 5K users and 5M users
```ruby
User.where(subscribed: true).find_each do |user|
  WeeklyDigestMailer.with(user: user).digest.deliver_later
end
```

The principle at play — Bounded memory iteration
Active Record's .each is just Enumerable#each on the result of the query. The query has to materialize first, which means the database returns every row and Ruby instantiates every one as a User object before the iteration begins. Memory usage grows linearly with the size of the table.
find_each works differently: it fetches one batch at a time, yields each record to your block, then loads the next batch. At any moment, only the current batch is in memory. The total work is the same; the peak memory usage stays flat.
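Under the hood, find_each orders by primary key and paginates with a keyset cursor, issuing queries like SELECT * FROM users WHERE id > ? ORDER BY id LIMIT 1000. Here is a minimal plain-Ruby sketch of that pattern; FakeTable and FakeUser are stand-ins for illustration, not Active Record:

```ruby
# Sketch of find_each-style keyset batching over a fake in-memory table.
# Only one batch of rows is "live" inside the loop at any moment.
FakeUser = Struct.new(:id, :email)

class FakeTable
  def initialize(rows)
    @rows = rows.sort_by(&:id) # find_each relies on primary-key order
  end

  def find_each(batch_size: 1000)
    last_id = 0
    loop do
      # Keyset cursor: "WHERE id > last_id ORDER BY id LIMIT batch_size"
      batch = @rows.select { |r| r.id > last_id }.first(batch_size)
      break if batch.empty?
      batch.each { |row| yield row }
      last_id = batch.last.id
    end
  end
end

table = FakeTable.new((1..2500).map { |i| FakeUser.new(i, "u#{i}@example.com") })
seen = 0
table.find_each(batch_size: 1000) { |_user| seen += 1 }
seen # => 2500, delivered in batches of 1000, 1000, 500
```

Keyset pagination is also why find_each ignores any order you put on the relation: it needs the primary-key ordering to know where the next batch starts.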
The general principle is that code which iterates over a collection should not assume the collection fits in memory. Tables grow, request payloads grow, log files grow. Anything that loads "all of X" works fine until X gets big enough to exceed your worker's memory budget — at which point your code transitions from working to causing an outage with no intermediate warning.
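The same rule applies outside the database. For instance, Ruby's File.foreach streams a file one line at a time, while File.readlines materializes every line into a single array first. A small sketch, using a temp file as a stand-in for a growing log:

```ruby
require "tempfile"

# File.readlines(path) would build an array of every line (memory grows
# with the file); File.foreach yields one line at a time (memory stays flat).
log = Tempfile.new("app.log")
10_000.times { |i| log.puts("request #{i} ok") }
log.flush

count = 0
File.foreach(log.path) { |_line| count += 1 }
count # => 10000

log.close
log.unlink
```

Same total work, same result, but peak memory is one line instead of the whole file.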