May 13, 2017 by Daniel P. Clark

Don’t Use Objects as Hash Keys in Ruby*

Ruby’s hashes are optimized for symbol and string keys.  Those are technically objects too, but this article shows how much of a difference it makes to use other kinds of objects as hash keys.  In some cases the difference is large, but often you won’t notice it.

I wrote a little experiment to see how different kinds of keys would perform for a hash.  It’s been very common for me to use the self reference as a key in a hash and now I know that’s not good for performance.  Here’s the code.

A = Object.new

def a
  :result
end

def value_of
  [1]
end

hash_of = {
  a: [1],
  "a" => [1],
  result: [1],
  A => [1]
}

require "benchmark/ips"
Benchmark.ips do |x|
  x.report("Method Call") do
    value_of()
  end

  x.report("Hash w/ Symbol key") do
    hash_of[:a]
  end

  x.report("Hash w/ String key") do
    hash_of["a"]
  end

  x.report("Hash w/ Method key") do
    hash_of[a]
  end

  x.report("Hash w/ Constant key") do
    hash_of[A]
  end

  x.compare!
end

The first part of the code is setting up objects and methods to act as keys for the hash, then there’s the hash, and finally the benchmark.  Here are the results.

Warming up --------------------------------------
         Method Call   144.797k i/100ms
  Hash w/ Symbol key   164.880k i/100ms
  Hash w/ String key   156.820k i/100ms
  Hash w/ Method key   141.293k i/100ms
Hash w/ Constant key    92.234k i/100ms
Calculating -------------------------------------
         Method Call      5.511M (± 2.7%) i/s -     27.656M in   5.021800s
  Hash w/ Symbol key      7.581M (± 2.7%) i/s -     37.922M in   5.006266s
  Hash w/ String key      6.620M (± 2.9%) i/s -     33.089M in   5.002952s
  Hash w/ Method key      4.542M (± 5.0%) i/s -     22.748M in   5.022166s
Hash w/ Constant key      1.837M (± 1.0%) i/s -      9.223M in   5.022241s

Comparison:
  Hash w/ Symbol key:  7580536.5 i/s
  Hash w/ String key:  6619759.4 i/s - 1.15x  slower
         Method Call:  5511494.2 i/s - 1.38x  slower
  Hash w/ Method key:  4541697.4 i/s - 1.67x  slower
Hash w/ Constant key:  1836695.9 i/s - 4.13x  slower

So in this benchmark, looking up a value with a plain object as the key ran 4.13 times slower than with a symbol key: about 1.84 million lookups per second against 7.58 million.
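One likely contributor to that gap, offered here as an assumption rather than something this benchmark measured: for a general object key, Ruby asks the key for its `#hash` value on each lookup, while symbols and strings are handled by specialized paths inside the C implementation.  The sketch below uses a hypothetical `TrackedKey` class (not part of the benchmark above) to make that `#hash` call visible.

```ruby
# Hypothetical key class, for illustration only. It counts how often
# Hash asks it for its hash value.
class TrackedKey
  attr_reader :id, :hash_calls

  def initialize(id)
    @id = id
    @hash_calls = 0
  end

  # Hash calls this to work out where the key is stored.
  def hash
    @hash_calls += 1
    @id.hash
  end

  # Hash calls this to confirm a match when the stored key is a
  # different object with the same hash value.
  def eql?(other)
    other.is_a?(TrackedKey) && other.id == id
  end
end

key = TrackedKey.new(1)
table = { key => [1] }

calls_before = key.hash_calls
found = table[key]
calls_after = key.hash_calls

puts "value found: #{found.inspect}"
puts "#hash calls made by the lookup: #{calls_after - calls_before}"
```

A second `TrackedKey.new(1)` finds the same entry, because it reports the same `#hash` value and is `eql?` to the stored key.  That flexibility is what you pay for when the key is not a symbol or string.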

Summary

I had always heard that symbols were faster than strings in hash lookups, but I wasn’t aware that hashes were faster than method calls (see comment section below) or how slow objects were for keys.  It’s okay to use objects as hash keys if you really want to.  Just know that you pay a small price for doing so.
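The comment section points out that the benchmarked `value_of` method builds a new array on every call, which is part of why the method call looked slower than a hash lookup.  Here is a minimal sketch of that difference.  The names `allocating_value`, `cached_value`, and `CACHED_VALUE` are made up for this example, and it only demonstrates the allocation, not the timing.

```ruby
# Allocating version, shaped like the benchmark's value_of:
# the array literal creates a new Array on every call.
def allocating_value
  [1]
end

# Allocation-free version: the Array is built once and reused.
CACHED_VALUE = [1].freeze

def cached_value
  CACHED_VALUE
end

puts allocating_value.equal?(allocating_value) # false: two distinct Arrays
puts cached_value.equal?(cached_value)         # true: one shared Array
```

`equal?` compares object identity, so `false` means each call handed back a freshly allocated array that the GC will later have to clean up.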

Where you really need to be concerned about this is in code that will be used heavily throughout your code base.  When you implement something like a raw type that may be called thousands of times in one run, that difference really matters.  Generally you don’t have to worry about this performance loss, as Ruby itself is fast and in most cases the code you write doesn’t get called that much.

One example of where it would make a big difference is the Pathname class in Ruby’s standard library.  In older versions of Rails this was called many thousands of times per request because of the asset pipeline.  I wrote the gem FasterPath to implement this heavily used code in Rust just to improve performance.  The more times the code is called in a small time frame, the more you should keep performance in mind.

Hopefully you found this information useful!  I know this is a short post.  Let me know if you like posts like this and I’ll write more.  Please feel free to comment, share, subscribe to my RSS Feed, and follow me on Twitter @6ftdan.

God Bless!
-Daniel P. Clark

Image by Levan Gokadze via the Creative Commons Attribution-ShareAlike 2.0 Generic License


Comments

  1. Thomas
    May 14, 2017 - 5:46 am

    When testing the method call, you need to avoid unnecessary object allocations. So better change the method to:

    ~~~
    X = [1]
    def value_of
      X
    end
    ~~~

    Then you will find that the plain method call is fastest.

    Also note that calling `#value_of` and `Hash#[]` performs one method call each, so the only difference is in the implementation of the method. Although `#value_of` has a very simple implementation, Ruby has to still call Ruby code to perform this action. When invoking `Hash#[]`, a C method is invoked and although it does more, it’s still very fast.

    • Daniel P. Clark
      May 14, 2017 - 5:04 pm

      Good catch! That is an important note to remember as it’s not an uncommon practice to write code like:

      
      def value_of
        [1]
      end
      
      • Thomas
        May 15, 2017 - 7:28 am

        Yes, that’s sadly true. In most cases it won’t make a difference but using such a method in a critical execution path can lead to much work for the GC.

  2. Ego
    May 15, 2017 - 12:43 pm

    This is somewhat ‘anal’ of me, but you might want to change the title of this article. Everything in Ruby is an object. You cannot create a non-object key for a hash in Ruby.

    • Daniel P. Clark
      May 15, 2017 - 2:01 pm

      True, everything is basically an object in Ruby and hashes are optimized more so for specific objects such as symbols and strings.

      I believe many people may be aware that it’s recommended to look up hash values with either a symbol or a string so I’m generally inclined to believe that the title will lead them to think of other objects in general. I will try to clear it up some.

  3. Itay Ben Ari
    May 19, 2017 - 3:57 am

    Thanks for the post, if you’ll freeze the string, the performance will be just like the symbol performance.

  4. soulcutter
    May 23, 2017 - 10:03 am

    When benchmarking it’s really handy to give Ruby version numbers since performance may vary (sometimes significantly) between versions.

    I ran the benchmark against ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-darwin16] and the Method key was approximately the same as a String key.

    I suppose your title is clickbait, but I think the best practice is to avoid premature optimization and write good, clear code – towards that end I would expect most people to not change how they use hash keys based on this microbenchmark. It’s a nifty piece of trivia, though, and could certainly come in handy when optimizing hot code paths.

    • Daniel P. Clark
      May 24, 2017 - 11:01 am

      “I think the best practice is to avoid premature optimization and write good, clear code – towards that end I would expect most people to not change how they use hash keys based on this microbenchmark. It’s a nifty piece of trivia, though, and could certainly come in handy when optimizing hot code paths.”

      I agree with you on this.

      “I suppose your title is clickbait”

      It wasn’t intended to be. One of my popular posts was “Rails: Don’t “pluck” Unnecessarily” which was also a post on performance. I was merely going for a similar name for the title.
