Rails: Don’t “pluck” Unnecessarily
Using pluck in Rails is both quick and efficient. But there are cases where it’s not what you want to use. For example if you’re selecting all users who have contacts, you might see something like this.
Rails’ where method allows for Arrays of items for lookup so the above code might turn into something like User.where(id: [1,2,3]) after the pluck method hits the database to get back the values you requested. The problem with this is you’re hitting the database twice for something that it can handle on its own in one request.
Now pluck is an eager method that hits the database immediately when called. That and it doesn’t permit you to call the to_sql method on it. For example:
Contact.pluck(:user_id).to_sql # (0.8ms) SELECT "contacts"."user_id" FROM "contacts" # NoMethodError: undefined method `to_sql' for #<Array:0x00000009486fa8>
Fortunately we can see the query being sent in the console so we can use this to compare with other queries. So we need to postpone this particular query and have it sent as part of the other SQL query we are generating. Fortunately we have an Arel equivalent method for this called project. Look at the SQL output from this and compare to what you see in the above example.
Contact.arel_table.project(:user_id).to_sql # => "SELECT user_id FROM \"contacts\""
As you can see we’re selecting the same column name from the same table. And yes it does come out slightly different, but it accomplishes the same task. Now with this we can change our first query to only hit the DB once using Arel.
User.where( User.arel_table[:id].in( Contact.arel_table.project(:user_id) ) ) # User Load (1.6ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (SELECT user_id FROM "contacts") # => #<ActiveRecord::Relation [#<User...]>
And now everything is performed in one query. Using project in this way will allow you to build more advanced queries without touching the database and only querying your data when you’re ready. Same data handled better.
Tip #1: Above I used Arel’s in method for inclusive finds, you can also use not_in for excluding records.
Tip #2: In the earlier examples of using pluck it would be more efficient to add distinct before pluck. This is because many of the records are duplicates and we only need the unique ones. e.g. Contact.distinct.pluck(:user_id)
Thanks to a tip from djtal64 I’ve found you can perform this query more simply with the following and it defers to just one query just as we would like it.
User.where(id: Contact.select(:user_id)) # User Load (1.5ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (SELECT "contacts"."user_id" FROM "contacts") # => #<ActiveRecord::Relation [#<User...]>
Also hat tip tofor another way to accomplish this.
User.joins(:contacts).uniq # User Load (3.3ms) SELECT DISTINCT "users".* FROM "users" INNER JOIN "contacts" ON "contacts"."user_id" = "users"."id" # => #<ActiveRecord::Relation [#<User...]>
The load times shown above are based on 455 contacts belonging to only 1 user. So the load times may be different for you depending on your situation.
pluck is great when you plan to process that data in Ruby/Rails, but if you’re handing the data back into the database you end up with a less efficient system which build Objects into Ruby memory which will also need to be garbage collected. So all things considered use pluck for stand alone queries, and Arel to combine multiple queries into one.
-Daniel P. Clark
Image by Andrew Bartram via the Creative Commons Attribution-NonCommercial 2.0 Generic License.