Don’t Use Objects as Hash Keys in Ruby*
Ruby’s Hash is optimized for Symbol and String keys. Both are technically objects, but this article shows how much of a difference it makes when you use other kinds of objects as hash keys. In some cases the difference is large, but often you won’t notice it.
I wrote a little experiment to see how different kinds of keys would perform for a hash. It’s been very common for me to use the self reference as a key in a hash and now I know that’s not good for performance. Here’s the code.
    A = Object.new

    def a
      :result
    end

    def value_of
      [1]
    end

    hash_of = {
      a: [1],
      "a" => [1],
      result: [1],
      A => [1]
    }

    require "benchmark/ips"

    Benchmark.ips do |x|
      x.report("Method Call") do
        value_of()
      end

      x.report("Hash w/ Symbol key") do
        hash_of[:a]
      end

      x.report("Hash w/ String key") do
        hash_of["a"]
      end

      x.report("Hash w/ Method key") do
        hash_of[a]
      end

      x.report("Hash w/ Constant key") do
        hash_of[A]
      end

      x.compare!
    end
The first part of the code is setting up objects and methods to act as keys for the hash, then there’s the hash, and finally the benchmark. Here are the results.
    Warming up --------------------------------------
             Method Call   144.797k i/100ms
      Hash w/ Symbol key   164.880k i/100ms
      Hash w/ String key   156.820k i/100ms
      Hash w/ Method key   141.293k i/100ms
    Hash w/ Constant key    92.234k i/100ms
    Calculating -------------------------------------
             Method Call      5.511M (± 2.7%) i/s -     27.656M in   5.021800s
      Hash w/ Symbol key      7.581M (± 2.7%) i/s -     37.922M in   5.006266s
      Hash w/ String key      6.620M (± 2.9%) i/s -     33.089M in   5.002952s
      Hash w/ Method key      4.542M (± 5.0%) i/s -     22.748M in   5.022166s
    Hash w/ Constant key      1.837M (± 1.0%) i/s -      9.223M in   5.022241s

    Comparison:
      Hash w/ Symbol key:  7580536.5 i/s
      Hash w/ String key:  6619759.4 i/s - 1.15x slower
             Method Call:  5511494.2 i/s - 1.38x slower
      Hash w/ Method key:  4541697.4 i/s - 1.67x slower
    Hash w/ Constant key:  1836695.9 i/s - 4.13x slower
So in this benchmark, indexing a hash with a plain object as the key ran 4.13x slower than indexing with a Symbol key, which means each lookup took about 313% longer.
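It may help to recall that Ruby’s Hash locates a key by calling the key’s `hash` method and, when it needs to confirm a match, `eql?`. The sketch below is illustrative only: `CountingKey` is a hypothetical class, not part of the benchmark, and it measures nothing about speed. It only shows that looking up a custom object dispatches to a Ruby-level method, which is one plausible source of the extra cost.

```ruby
# Hypothetical key class (not from the benchmark) that records
# every time Hash asks it for its hash code.
class CountingKey
  attr_reader :id, :hash_calls

  def initialize(id)
    @id = id
    @hash_calls = 0
  end

  # Hash calls this to decide where the key belongs.
  def hash
    @hash_calls += 1
    @id.hash
  end

  # Hash may call this to confirm that two keys really match.
  def eql?(other)
    other.is_a?(CountingKey) && other.id == @id
  end
end

key = CountingKey.new(42)
table = { key => [1] }
calls_after_insert = key.hash_calls

result = table[key]
calls_after_lookup = key.hash_calls

puts "hash calls after insert: #{calls_after_insert}"
puts "hash calls after lookup: #{calls_after_lookup}"
```

Symbols and Strings are handled by Ruby internally, so a counter like this can’t be attached to them in the same way; treat the sketch as a way to see the mechanism, not as a measurement.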
Summary
I had always heard that symbols were faster than strings in hash lookups, but I wasn’t aware that hashes could be faster than method calls (see the comment section below) or how slow objects were as keys. It’s okay to use objects as hash keys if you really want to. Just know that you pay a small price for doing so.
Where you really need to be more concerned with this is when you implement some code that will be used a lot in your code base. So when you implement something like a raw type which may be called thousands of times in one run this is where that difference really matters. Generally you don’t have to worry about this performance loss as Ruby itself is fast and in most cases the code written doesn’t get called that much.
One example of where it would make a big difference is the Pathname class in Ruby’s standard library. In older versions of Rails this was called many thousands of times per request because of the asset pipeline. I wrote the gem FasterPath to implement this heavily used code in Rust just to improve performance. The more times the code is called in a small time frame, the more you should keep performance in mind.
Hopefully you found this information useful! I know this is a short post. Let me know if you like posts like this and I’ll write more. Please feel free to comment, share, subscribe to my RSS Feed, and follow me on twitter @6ftdan
God Bless!
-Daniel P. Clark
Image by Levan Gokadze via the Creative Commons Attribution-ShareAlike 2.0 Generic License
Thomas
May 14, 2017 - 5:46 am
When testing the method call, you need to avoid unnecessary object allocations. So better change the method to:
~~~
X = [1]
def value_of
X
end
~~~
Then you will find that the plain method call is fastest.
Also note that calling `#value_of` and `Hash#[]` performs one method call each, so the only difference is in the implementation of the method. Although `#value_of` has a very simple implementation, Ruby has to still call Ruby code to perform this action. When invoking `Hash#[]`, a C method is invoked and although it does more, it’s still very fast.
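The allocation point can be checked directly. This is a minimal sketch, assuming nothing beyond core Ruby: `value_of` mirrors the benchmarked method, `X` follows the constant in the comment above, and `cached_value_of` is a name introduced here only so both versions can sit side by side.

```ruby
# Same shape as the benchmarked method: the array literal
# builds a new Array on every call.
def value_of
  [1]
end

# Preallocated array, as suggested in the comment above.
X = [1]

# Hypothetical name for the non-allocating version.
def cached_value_of
  X
end

# equal? compares object identity, not contents.
fresh_each_call = !value_of.equal?(value_of)
same_each_call  = cached_value_of.equal?(cached_value_of)

puts "value_of returns a new object per call: #{fresh_each_call}"
puts "cached_value_of returns the same object: #{same_each_call}"
```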
Daniel P. Clark
May 14, 2017 - 5:04 pm
Good catch! That is an important note to remember as it’s not an uncommon practice to write code like:
Thomas
May 15, 2017 - 7:28 am
Yes, that’s sadly true. In most cases it won’t make a difference but using such a method in a critical execution path can lead to much work for the GC.
Ego
May 15, 2017 - 12:43 pm
This is somewhat ‘anal’ of me, but you might want to change the title of this article. Everything in Ruby is an object. You cannot create a non-object key for a hash in Ruby.
Daniel P. Clark
May 15, 2017 - 2:01 pm
True, everything is basically an object in Ruby and hashes are optimized more so for specific objects such as symbols and strings.
I believe many people may be aware that it’s recommended to look up hash values with either a symbol or a string so I’m generally inclined to believe that the title will lead them to think of other objects in general. I will try to clear it up some.
Itay Ben Ari
May 19, 2017 - 3:57 am
Thanks for the post. If you freeze the string, the performance will be just like the symbol performance.
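One likely reason freezing helps, offered as an assumption and not something measured in the article: without the `# frozen_string_literal: true` magic comment, a plain literal such as `"a"` builds a new String each time it is evaluated, while calling `.freeze` directly on a literal returns the same frozen object each time. The sketch below assumes that magic comment is absent and Ruby 2.1 or later; `plain_key` and `frozen_key` are hypothetical names.

```ruby
# Assumes this file has no `# frozen_string_literal: true` comment.

# Hypothetical helper: evaluates an ordinary string literal.
def plain_key
  "a"
end

# Hypothetical helper: evaluates a literal frozen in place.
def frozen_key
  "a".freeze
end

hash_of = { "a" => [1] }

# equal? compares object identity, not contents.
plain_is_fresh   = !plain_key.equal?(plain_key)
frozen_is_reused = frozen_key.equal?(frozen_key)

# String keys are matched by content, so both find the same entry.
same_entry = hash_of[plain_key].equal?(hash_of[frozen_key])

puts "plain literal is a new object each time: #{plain_is_fresh}"
puts "frozen literal is reused: #{frozen_is_reused}"
puts "both keys find the same entry: #{same_entry}"
```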
soulcutter
May 23, 2017 - 10:03 am
When benchmarking it’s really handy to give Ruby version numbers since performance may vary (sometimes significantly) between versions.
I ran the benchmark against ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-darwin16] and the Method key was approximately the same as a String key.
I suppose your title is clickbait, but I think the best practice is to avoid premature optimization and write good, clear code – towards that end I would expect most people to not change how they use hash keys based on this microbenchmark. It’s a nifty piece of trivia, though, and could certainly come in handy when optimizing hot code paths.
Daniel P. Clark
May 24, 2017 - 11:01 am
I agree with you on this.
The title wasn’t intended to be clickbait. One of my popular posts was “Rails: Don’t “pluck” Unnecessarily” which was also a post on performance. I was merely going for a similar name for the title.