โ† Back to Course

Reading the Source ยท Card 3

How User.where(...) becomes SQL

You write User.where(active: true).order(:created_at).limit(10) and somehow you get back ten users. No SQL ran when you typed it. Then later, when you started iterating, suddenly the query fired. Here's what's happening, step by step.

1. User.where(...) returns an object, not rows.

When you call User.where(active: true), Rails does not talk to the database. It returns a Ruby object called an ActiveRecord::Relation. The Relation holds a little spec of what you want: the model class, a list of where-conditions, a list of order clauses, a limit, and so on. It's a description of a query, sitting in memory.

relation = User.where(active: true)

relation.class
# => User::ActiveRecord_Relation

# No SQL ran yet. The Relation is just a Ruby object holding:
#   - klass:     User
#   - where_clause: [active = true]
#   - order_values: []
#   - limit_value:  nil
#   - ...and a few more fields

๐Ÿ“‚ activerecord/lib/active_record/relation.rb

2. Each chained call adds to the spec.

When you chain .order(:created_at) onto the relation, Rails makes a new Relation that copies the previous one and adds the order. Same for .limit(10). The original Relation isn't changed. You end up with a small chain of Relation objects, each one a more specific version of the last.

step1 = User.where(active: true)
step2 = step1.order(:created_at)
step3 = step2.limit(10)

# step3's spec now looks like:
#   where_clause: [active = true]
#   order_values: [:created_at]
#   limit_value:  10
#
# Still no SQL. Still a Ruby object describing the query.

# This is why you can build queries up over many lines:
posts = Post.all
posts = posts.where(published: true) if filter_published
posts = posts.where("title ILIKE ?", "%#{q}%") if q.present?
posts = posts.order(:created_at).limit(20)
# All of the above adds to the spec. None of it hits the database.

๐Ÿ“‚ activerecord/lib/active_record/relation/query_methods.rb

3. The SQL runs when someone needs the data.

The Relation waits until you do something that actually needs the rows. Calling .each, .to_a, .first, .count, or interpolating the Relation into a view all force the query to run. Until then, the SQL stays in your imagination.

relation = User.where(active: true).order(:created_at).limit(10)
# No SQL.

relation.to_a
# Now Rails runs:
#   SELECT "users".* FROM "users"
#   WHERE "users"."active" = TRUE
#   ORDER BY "users"."created_at" ASC
#   LIMIT 10

# After this, the loaded results are cached on the relation:
relation.to_a  # second call: no new SQL, uses the cached array

# But this is a DIFFERENT relation (chained again):
relation.where(role: "admin").to_a  # runs SQL again

๐Ÿ“‚ activerecord/lib/active_record/relation.rb, the load and exec_queries methods

4. Rails builds the SQL through a tree, not string-concatenation.

When it's time to make the SQL string, Rails doesn't glue together strings like "SELECT * FROM " + table. It uses a small library called Arel, which builds a tree (an AST) representing the query, then asks the database adapter to turn the tree into the SQL dialect for your database (Postgres, MySQL, SQLite, etc.). This is why the same Ruby code produces slightly different SQL depending on which database you're on.

# A query like User.where(active: true).order(:created_at)
# becomes, roughly, an Arel tree:

SelectManager
  โ”œโ”€ projections: [users.*]
  โ”œโ”€ from:        users
  โ”œโ”€ wheres:      [Eq(users.active, true)]
  โ””โ”€ orders:      [Asc(users.created_at)]

# The Postgres adapter walks this tree and produces:
#   SELECT "users".* FROM "users"
#   WHERE "users"."active" = TRUE
#   ORDER BY "users"."created_at" ASC

# The MySQL adapter walks the same tree and produces:
#   SELECT `users`.* FROM `users`
#   WHERE `users`.`active` = TRUE
#   ORDER BY `users`.`created_at` ASC

# Same tree. Different SQL strings. Same Ruby code worked for both.

๐Ÿ“‚ activerecord/lib/arel/ and activerecord/lib/active_record/connection_adapters/

5. The database returns rows. Rails turns them into model objects.

The database adapter sends the SQL over the connection, gets back a result set, and hands Rails an array of hashes (one per row). Rails then walks that array and builds a Ruby User instance for each row, setting up the attribute values from the row data. The cached array on the Relation is now a list of real User objects, ready to use.

# Database returns:
#   [
#     { "id" => 1, "name" => "Karim", "active" => true, ... },
#     { "id" => 2, "name" => "Sara",  "active" => true, ... },
#     ...
#   ]

# Rails creates one User instance per row:
users = rows.map { |row| User.allocate.init_with_attributes(row) }

# Now relation.to_a returns [#<User id: 1, name: "Karim", ...>, ...].
# Each user has typed attributes (Integer id, String name, Boolean active)
# because ActiveRecord casts the raw column data using the column types.

๐Ÿ“‚ activerecord/lib/active_record/persistence.rb and connection_adapters/postgresql/database_statements.rb

6. The query cache. The thing that surprises everyone.

During a single web request, Rails wraps your code in a query cache. If you run the exact same SQL twice in the same request, Rails returns the cached result the second time, without hitting the database. This is helpful, but it has a sharp edge: the cache only matches on the SQL string. If you build "the same query" two different ways and end up with different SQL, you get two database calls. If you write to the table, the cache is cleared.

# Inside one web request:

User.where(active: true).to_a  # SQL runs once
User.where(active: true).to_a  # cache hit, no SQL

# But this misses the cache:
User.where(active: true).to_a
User.where("active = ?", true).to_a  # different SQL string โ†’ new query

# And writing busts it:
User.where(active: true).to_a   # SQL runs
User.create!(active: true)      # write happens, cache cleared
User.where(active: true).to_a   # SQL runs AGAIN

# The cache only exists for the duration of one request,
# and only inside ActiveRecord::Base.cache do ... end blocks
# (which Rails wraps your request in automatically).

๐Ÿ“‚ activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb

Why this is worth knowing

Knowing the difference between "a Relation exists" and "the SQL ran" changes how you debug and how you write queries:

  • You stop being surprised by N+1 queries. A method that returns @posts (a Relation) and then iterates inside a view to fetch each post's user fires one query per post. The fix, .includes(:user), adds to the spec before the SQL runs.
  • You write scopes with confidence. A scope is a chunk of the spec, ready to be combined with other chunks. Knowing the spec is in memory until something forces the SQL means you can chain scopes anywhere.
  • You debug "why did this query run twice?" faster. If the SQL string isn't byte-identical, the cache won't match. Often the fix is to chain the same way both places.
  • You read to_sql output as the source of truth. Calling relation.to_sql shows the actual SQL Rails will run, without running it. It's the fastest way to check whether your scope chain is what you think it is.