See all articles
Intro to Arel - the database agnostic SQL

Intro to Arel - the database agnostic SQL

Those in the know use Arel as a clever SQL AST manager for Ruby. It makes SQL queries easy to read, reliable, and secure - and it’s database agnostic. Learn the basics of Arel right here in our mini-tutorial.

In this blog post we will show you what Arel is and how it can be used to create database queries. If you are interested in writing complex, database engine agnostic queries directly from Ruby, then Arel may be a clever solution to your problem.

Arel (A Relational Algebra) is an SQL AST (Abstract Syntax Tree) manager for Ruby, which allows you to generate SQL queries for various database engines and is used by ActiveRecord under the hood. It also allows you to construct more complex queries without writing plain SQL or string interpolation - making them more readable, reliable (i.e. fixing the ambiguity problem - which we will show later) and secure.

Before diving into more complex statements in Arel, let’s start with the simplest ones.

Probably the most common Rails query is `find(id)` which allows us to find a record based on its primary key. The Arel equivalent will look like this:

users = User.arel_table
user = User.where(users[:id].eq(id)).take(1).first

Note: Previously, instead of using `User.arel_table` you could also use `Arel::Table.new(:users``).` This is no longer recommended since automatic type casting is deprecated (and will be removed in the newest version) in favor of an external type caster (which needs to be passed explicitly).

For more info on this deprecation, you can check this commit or compare the result of `User.arel_table` and `Arel::Table.new(:users)`:

> User.arel_table
=> #<Arel::Table:0x007fdca7c06300 @columns=nil, @name="users", @table_alias=nil, @type_caster= #<ActiveRecord::TypeCaster::Map:0x007fdca7c06350 ... >>
> Arel::Table.new(:users)
=> #<Arel::Table:0x007fdcad56ea28 @columns=nil, @name="users", @table_alias=nil, @type_caster=nil>

As you can see, the `type_caster` instance variable for `User.arel_table` is set automatically, but not so with `Arel::Table.new(:users)`.

Reliability

Now let’s talk about that ambiguity problem that we mentioned earlier. Let’s imagine that you would like to fetch users who have registered in 2017 with their published posts.

You might try `eager_load` to load users with posts:

User.eager_load(:posts).where("created_at >= ? AND published = ?", Date.new(2017), true)

However, if you try to run this query, you will get an error: `ActiveRecord::StatementInvalid: PG::AmbiguousColumn: ERROR: column reference "created_at" is ambiguous`. This happens because both the `users` and `posts` tables contain a `created_at` column. This problem can be easily fixed by adding the table’s name (`users`) to the column’s name in the condition:

User.eager_load(:posts).where("users.created_at >= ? AND published = ?", Date.new(2017), true)

Let’s turn to Arel to see how a similar query would look like:

recent_users = User.arel_table[:created_at].gteq(Date.new(2017))
published = Post.arel_table[:published].eq(true)
User.eager_load(:posts).where(recent_users.and(published))

When we work with Arel, we always have to explicitly provide the column and table names that we are matching - a query will never depend on implicit table names or aliases, meaning that we can’t make this particular mistake in our code. We also have additional predicate operators (here is the full list) at our disposal so we don’t need to interpolate the strings.

Readability

Imagine that you need to find a user who either has a first name or last name but not the one who has both or none.

Again we will start with an ActiveRecord example:

User.where('(first_name IS NULL AND last_name IS NOT NULL) OR (first_name IS NOT NULL AND last_name IS NULL)')

You can try to use intermediate variables for storing each condition:

first_name_clause = 'first_name IS NULL AND last_name IS NOT NULL'
last_name_clause = 'first_name IS NOT NULL AND last_name IS NULL'
User.where("#{first_name_clause} OR #{last_name_clause}")
# alternativly you can use `or` method from ActiveRecord
User.where(first_name_clause).or(User.where(last_name_clause))

But this is not the best way to split them - we either need to operates on strings directly or duplicate ActiveRecord relations.

Let’s instead try using Arel again:

table = User.arel_table
first_name_nil_clause = table[:first_name].eq(nil).and(table[:last_name].not_eq(nil))
last_name_nil_clause = table[:first_name].not_eq(nil).and(table[:last_name].eq(nil))
User.where(first_name_nil_clause.or(last_name_nil_clause))

With Arel, we are able to split the query conditions using objects and then nicely combine them when we need to run the query. Using these Arel building blocks makes query composition a whole lot easier and more traceable than writing a complete, complex SQL query.

A few more examples

Finding a record containing a substring

`ILIKE` is a Postgres specific function, so if you use it directly in your model’s scope or query object, it means you won’t be able to use it with different RDBMS.

With Arel, you can use an alternative approach, one that will use a function that is available in the database engine of your choice:

table = User.arel_table
where_clause = table[:first_name].lower.matches('%ronin')
User.where(where_clause)

Subquery join

Let’s try one more example, to see how Arel can help with creating more complex queries.

Imagine that you would like to select the top most popular authors, based on the number of comments their posts get. To do that, you’ll need to join `comments` with `posts` and then group them by `user_id` and count. Those results can be used to fetch `users` and sort them.

Here is how the SQL query might look like (there are more solutions to this problem but we just wanted to show how we can use Arel for one of them):

SELECT users.*, popularity.comments_no
FROM USERS
INNER JOIN (
  SELECT post.user_id, COUNT(*) AS comments_no
  FROM comments
  INNER JOIN posts
  ON posts.id = comments.post_id
  GROUP BY post.user_id
) as popularity
ON popular_authors.user_id = user.id
ORDER BY popularity.comments_no DESC

We will create a simple Query object for running this query:

class PopularAuthors < Struct.new(:limit)
  def fetch
    User.joins(join_clause)
        .select('users.*, popularity.comments_no')
        .order('popularity.comments_no DESC')
        .take(limit)
  end
  private
  def join_clause
    @join_clause ||= begin
      users = User.arel_table
      users.join(popularity).on(users[:id].eq(popularity[:user_id]))
    end
  end
  def popularity
    @popularity ||= begin
      comments = Comment.arel_table
      posts = Post.arel_table
      comments.join(posts).on(posts[:id].eq(comments[:post_id]))
              .project(posts[:user_id], comments[:id].count.as('comments_no'))
              .group(posts[:user_id])
              .as('popularity')
    end
  end
end

As you see, we are creating a `popularity` table which is joined with the `users` table and provides `comments_no` to then order the results.

A case that even Arel cannot solve

There is an interesting case which even Arel is not able to solve (unfortunately!). Imagine once again that you have both `users` and `posts` models:

class User < ActiveRecord::Base
  has_many :posts, inverse_of: :user, dependent: :destroy
end
class Post < ActiveRecord::Base
  belongs_to :user, inverse_of: :posts
end

and you would like to load all `users` - and then preload only a subset of each user’s `posts` (based on some runtime conditional):

User.includes(:posts).where('posts.created_at > ?', Date.new(2017, 11))

Unfortunately the code above won’t work, because it will use an `INNER JOIN` for fetching data - so it will only load those users who created a post in November. That isn’t what we want. We want to include all users - even those not active in November - but only load posts for each user if they were created in November.

Switching back to `joins` also won’t work, because it will fetch *all* `posts` when we call the method on the `User` instance (`joins` does not assign results of the join to associations).

> users = User.joins("LEFT OUTER JOIN posts ON posts.user_id = users.id AND posts.created_at > '2017-11-01'")
> user = users.first
> user.posts.loaded?
=> false
> user.first.posts # it fires a query to get all time sessions

We can try to break the nasty manual join with Arel (which can be useful if you are dealing with complex joins and multiple conditions):

users = User.arel_table
posts = Post.arel_table
on_clause = Arel::Nodes::Equality.new(users[:id], posts[:user_id])
and_clause = time_sessions[:created_at].gt(Date.new(2017, 11))
join_clause = users.join(time_sessions, Arel::Nodes::OuterJoin).on(on_clause.and(and_clause)).join_sources
User.joins(join_clause)

but it won’t solve the original problem, since we still leave managing association to Rails.

The only solution we are aware of relies on combining associations manually (ick, but works):

posts = Post.where('created_at > ?', Date.new(2017, 11))
users = User.all.each do |user|
  records = posts.select { |p| p.user_id == user.id }
  association = user.association(:posts)
  association.loaded!
  association.target.concat(records)
  records.each { |record| association.set_inverse_instance(record) }
end

Keep in mind this solution is not efficient if you are working on a large dataset, as it loads all objects into memory - so it’s good to perform some initial filtering if you don’t want to be waiting around for a lifetime to get the results.

This kind of solution is used by the active_loaders gem (with the `loaded` method). You can read more about this problem in the post from active_loaders author.

We know all about creating complex SQL queries and making different queries across different databases - that’s why we know Arel! If you have any complex database operations that you need solving or implementing, would like help in migrating to a different database, or making your own queries database agnostic then reach out to the experts at iRonin. We would love to help out!

Read Similar Articles