March 26, 2015 by Daniel P. Clark

Different Collection Types in Ruby

In computer science, a collection or container is a grouping of some variable number of data items (possibly zero) that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion.

– Wikipedia

In Ruby the most common collection types are Array and Hash.  Different languages have different flavors of these same things.  For Array you might have something like a Tuple, Set, or List.  And a Hash is known as a Dictionary in some other languages (like Python).

Arrays

Arrays in Ruby are ordered collections, written as comma separated items, that keep their Objects in the order they were given.  Arrays can hold any item that is an instance of Object; so basically anything that’s not a keyword.  Arrays are all instances of the Array class, and there are several ways to define one.

a = []
# => []
b = Array.new
# => []
c = Array[]
# => []
d = Array()
# => []

And %W, or %w, with (most) any symbol as the delimiter will make an Array of Strings.

%W^ 1 2 3 ^
# => ["1", "2", "3"]
%w% 1 2 3 %
# => ["1", "2", "3"]

Of all of these, only Array() will “array-ify” any Object.  So if you hand it nil it tries .to_ary and then .to_a on it and you get an empty Array, whereas Array[] would simply wrap the nil.  Both Array() and Array[] have been available since at least Ruby 1.8 .

Array(nil)
# => []
Array[nil]
# => [nil]
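
The difference shows up with other Objects too.  Here is a small sketch (the values are just for illustration) of Array() converting what it’s given while Array[] wraps it as a single item.

```ruby
Array([1, 2])   # already an Array, returned as is
# => [1, 2]
Array(1..3)     # a Range responds to to_a
# => [1, 2, 3]
Array("a")      # no to_ary or to_a, so it gets wrapped
# => ["a"]
Array[1..3]     # Array[] never converts, it only wraps
# => [1..3]
```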

Arrays can be initialized with a default set of Objects and at any length.  The first parameter you hand to Array.new is the length of the Array you’re instantiating, and the second parameter is the Object.

Array.new(4)
# => [nil, nil, nil, nil]
Array.new(4, 4)
# => [4, 4, 4, 4]
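
One thing to watch for, shown here as a small sketch with made-up variable names: the second parameter is a single Object that every slot points at, so changing it through one slot changes what you see in all of them.  Handing Array.new a block instead builds a fresh Object for each slot.

```ruby
shared = Array.new(3, [])       # every slot references the same inner Array
shared[0] << :x
shared
# => [[:x], [:x], [:x]]

separate = Array.new(3) { [] }  # the block runs once per slot
separate[0] << :x
separate
# => [[:x], [], []]
```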

Arrays can be accessed by index, which is the position in the Array starting with a count of zero.  So if you want the second item in the Array you index it by 1 (remember to start at 0).

arr = [:a, :b, :c]
arr[1]
# => :b

You can also set where you place, or replace, an item in the Array by index.

brr = [:a, :b, :c, :d]
brr[1] = 3
brr
# => [:a, 3, :c, :d]

If the Array isn’t as long as the index you use when you assign a value it auto fills up to that point with nil Objects.

crr = [1,2,3]
crr[10] = :foo
crr
# => [1, 2, 3, nil, nil, nil, nil, nil, nil, nil, :foo]

This auto fill feature is very useful for use in algorithms such as the sieve of Eratosthenes (example).
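
As a rough idea of what that can look like, here is my own minimal sketch of a sieve (not necessarily the linked example; the method name primes_up_to and the composite Array are made up for illustration).  Slots that were never crossed off are simply left as nil.

```ruby
# Hypothetical helper: returns every prime less than or equal to limit.
def primes_up_to(limit)
  return [] if limit < 2

  composite = []          # composite[n] becomes true once n is crossed off
  composite[limit] = nil  # assigning past the end pads the Array with nil

  (2..Integer.sqrt(limit)).each do |n|
    next if composite[n]
    # Cross off every multiple of n, starting at n * n.
    (n * n).step(limit, n) { |multiple| composite[multiple] = true }
  end

  (2..limit).reject { |n| composite[n] }
end

primes_up_to(30)
# => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```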

Arrays make an excellent “Stack” (last in, first out) or “Queue” (first in, first out).  The stack like methods for adding to and removing from the end of an Array are :push and :pop, and the methods for adding to and removing from the beginning of an Array are :unshift and :shift.  Pair :push with :shift and you get a queue.
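
Here is a quick sketch of those four methods in action, using one Array as a stack and another as a queue (the variable names are just for illustration).

```ruby
stack = []
stack.push(:a)     # add to the end
stack.push(:b)
stack.pop          # remove from the end: last in, first out
# => :b

queue = []
queue.push(:a)
queue.push(:b)
queue.shift        # remove from the beginning: first in, first out
# => :a

queue.unshift(:z)  # add to the beginning
queue
# => [:z, :b]
```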

There’s so much more to cover on Arrays; I’ll get into that in a later article.  For now know that some of the most used methods on Array, which will be useful for you, are :<< (append) and :include? .  And an Array you should be aware of in Ruby is $: (dollar colon), also available as $LOAD_PATH, which is the Array of directories your Ruby files can be required from.
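
A tiny sketch of those in use (the list variable is made up for illustration; $LOAD_PATH is the longer built-in name for $:).

```ruby
list = [1, 2]
list << 3
# => [1, 2, 3]
list.include?(2)
# => true
list.include?(9)
# => false

$:.equal?($LOAD_PATH)  # both names refer to the very same Array
# => true
```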

Hash

A Hash is a dictionary look up which stores data in key (to) value pairs.  Generally the key is something simple (something known) that can be used to access whatever has been stored.  Just like when you know a word but you don’t know the meaning so you go to the dictionary and look it up by the word and the meaning is given to you there.

Ruby hashes can have any Object as a key or value.  A Symbol generally makes a more efficient key than a String, since every use of the same Symbol refers to one shared Object.  But that doesn’t mean you have to use them.  You can use anything.

There are a few ways for instantiating a Hash.

e = {}
# => {}
f = Hash.new
# => {}
g = Hash[]
# => {}
h = Hash(nil)
# => {}

Just as with Array, the parentheses option Hash( … ) will “hash-ify” an Object.  It cannot be used without a parameter though.

Hashes are straightforward to use.

hash = Hash.new
hash[:book_name] = "I am the book"
hash[:book_name]
# => "I am the book"

When asking a Hash for a key that hasn’t been set yet it will, by default, return nil.  If you want to change the default then know that the very same default Object will get handed back for every missing key.

x = {}
# => {} 
x.default = []
# => [] 
x.default
# => [] 
x[:a] << 8
# => [8] 
x[:b] << 9
# => [8, 9] 
x[:c] << 10
# => [8, 9, 10] 
x.default
# => [8, 9, 10]

So now every missing key hands back the one default Object we set, and the values do not remain separate.  Note that x itself is still empty, since nothing was ever assigned to a key.  This also happens with Hash.new([]).  This is not behavior befitting the design and purpose of a Hash so we need to specify a lambda to ensure each new key is handled separately.

y = {}
y.default_proc = lambda{|hash, key| hash[key] = []}
y[:a] << 5
# => [5] 
y[:b] << 6
# => [6] 
y[:c] << 7
# => [7] 
y
# => {:a=>[5], :b=>[6], :c=>[7]}

This can also be achieved with Hash.new {|hash, key| hash[key] = []}
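
Here is that block form side by side with Hash.new([]), as a small sketch (the variable names z and shared are made up for illustration).

```ruby
z = Hash.new { |hash, key| hash[key] = [] }
z[:a] << 5
z[:b] << 6
z.keys
# => [:a, :b]
z[:a].equal?(z[:b])  # each key got its own Array
# => false

shared = Hash.new([])
shared[:a] << 5      # mutates the default, never stores anything under :a
shared.keys
# => []
shared[:b]
# => [5]
```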

In general Hashes (and Dictionaries as in Python) are not required to keep things sorted in the order you created them.  Ruby has kept Hash entries in insertion order since 1.9, but even if the language does that for you, you shouldn’t be in the practice of using it as if it were ordered.  That will bite you later when you switch languages.

Ruby’s Hash instances give you an Array of keys and an Array of values with the appropriately named :keys and :values methods.
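
For example (the book Hash here is made up for illustration):

```ruby
book = { title: "I am the book", pages: 120 }

book.keys
# => [:title, :pages]
book.values
# => ["I am the book", 120]

# The two Arrays line up pair for pair.
book.keys.zip(book.values) == book.to_a
# => true
```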

Set

Set is a lesser known collection in Ruby but I like it a lot.  Set is basically an Array that enforces uniqueness and has some handy methods to help you with this.  Set is part of the Ruby standard library but you will have to require it.

require 'set'

s = Set.new
# => #<Set: {}> 
s << :a
# => #<Set: {:a}> 
s << :a
# => #<Set: {:a}> 
s << :b
# => #<Set: {:a, :b}> 
s << :c
# => #<Set: {:a, :b, :c}> 
s << :c
# => #<Set: {:a, :b, :c}>

When you require Set it automatically adds a method on all Array instances called :to_set (it is defined on Enumerable, so other collections get it too).  Sets also have a :to_a for converting into an Array.

My favorite feature of Set is the :add? method.  It adds the Object if it isn’t already included, and returns the Set itself if it was added, or nil if it already exists within the Set.  This is great for keeping track of hierarchical copies to make sure you don’t infinitely loop.  For example:  I’ll make a Hash where the key-value pairs have an infinite relationship loop.  Then I’ll copy the key-value pairs to an Array while using :add? to make sure I don’t overlap.
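
A short sketch of going back and forth between the two (the letters variable is just for illustration):

```ruby
require 'set'

letters = [:a, :b, :a, :c, :b].to_set  # duplicates are dropped
letters.size
# => 3
letters.include?(:b)
# => true
letters.to_a
# => [:a, :b, :c]
```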

foo = Hash.new

foo[:a] = [:c, 1]
foo[:b] = [:a, 2]
foo[:c] = [:b, 3]

require 'set'

output = []
copy_check = Set.new
key = foo.keys.first

while copy_check.add?(key) # Also works with foo[key][0] as the parameter
  output << [key, foo[key]]
  key = foo[key][0]
end

output
# => [[:a, [:c, 1]], [:c, [:b, 3]], [:b, [:a, 2]]]

This is actually the methodology I used in Rails for duplicating ActiveRecord hierarchical trees (in PBT) while ensuring I wasn’t following any bad circular references.  Pretty neat!

Rinda (for Tuples)

Believe it or not Rinda has been in Ruby since early 1.8 versions.  What is it you might ask?  Well it’s Ruby’s own Tuple feature built on top of dRuby.  No, dRuby isn’t another flavor of Ruby; it’s a distributed Ruby library included in the standard library.

dRuby is a distributed object system for Ruby. It is written in pure Ruby and uses its own protocol. No add-in services are needed beyond those provided by the Ruby runtime, such as TCP sockets. …

dRuby allows methods to be called in one Ruby process upon a Ruby object located in another Ruby process, even on another machine. References to objects can be passed between processes. Method arguments and return values are dumped and loaded in marshalled format. All of this is done transparently to both the caller of the remote method and the object that it is called upon.

– ruby-doc.org

A Tuple is a fixed Array.  You can’t modify a Tuple.  In Python a Tuple is written with parentheses: (1,2,3).  In Ruby things aren’t as fixed, so you don’t have Tuples as a common thing to reach for.  But in distributed Ruby, DRb, you have Rinda, which was designed after a system (not in Ruby) called Linda.

The idea behind how Rinda works is that a “TupleSpace” is shared between connected DRb instances.  You use it as a shared pool to put Tuples (fixed Arrays) of Ruby Objects into, and any machine (Ruby DRb process) may retrieve them.  These Tuples get pushed into the TupleSpace with a write method, $tuple_space.write(['add', 2, 5]), and any other machine (DRb process) can take one out by asking with a matching pattern: $tuple_space.take([/add|sub/, Integer, Integer]) .  So with this you can hand Ruby Objects around over the network and split/share the work to be processed.
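
To get a feel for write and take without any networking, here is a minimal single-process sketch.  It assumes the rinda library is available to your Ruby (it ships with Ruby, as a bundled gem in newer versions), and it leaves out the DRb setup you would need to share the TupleSpace between processes.  The variable names are made up for illustration.

```ruby
require 'rinda/tuplespace'

# A local TupleSpace; sharing it between processes would also need DRb.
tuple_space = Rinda::TupleSpace.new

# Put a Tuple of work into the space.
tuple_space.write(['add', 2, 5])

# take waits until a Tuple matches the pattern, then removes and returns it.
# Each slot in the pattern is compared with ===, so a Regexp or a Class works.
operation, left, right = tuple_space.take([/add|sub/, Integer, Integer])

result = operation == 'add' ? left + right : left - right
# => 7
```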

Last I checked there is virtually no English documentation or blog(s) about Rinda.  A book only became available in English in 2012: “The dRuby Book” by Masatoshi Seki, via The Pragmatic Programmers.  Get the book on Amazon while you can as it is out of print!  I highly encourage people to look into using this.  And I’d love to see blogs about it.  I can’t wait to use it myself.  Just keep in mind that security wasn’t part of the design.

Struct

Struct allows you to predefine what values you can set by providing keys when created.

class Face < Struct.new(:hair, :eyes, :skin)
end

my_face = Face.new :blond, :blue, :awesome
# => #<struct Face hair=:blond, eyes=:blue, skin=:awesome>

my_face[:hair]
# => :blond 
my_face.hair
# => :blond 
my_face.nose
# NoMethodError: undefined method `nose' for #<struct Face hair=:blond, eyes=:blue, skin=:awesome>
my_face[:nose]
# NameError: no member 'nose' in struct

As you can see Struct preserves what “keys” are allowed by predefining them at inheritance.

You can define some Array-like method behavior on Structs, since Struct includes Enumerable, which comes in pretty handy.

class PayChecks < Struct.new(:week1, :week2, :week3, :week4)
  def total
    inject :+
  end
end

pay_checks = PayChecks.new 100, 800, 300, 45
# => #<struct PayChecks week1=100, week2=800, week3=300, week4=45> 
pay_checks.total
# => 1245 
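
Struct.new also accepts a block, which lets you add methods without inheriting from an anonymous class.  This is a small sketch of that alternative style; the Point Struct and its method are made up for illustration.

```ruby
# Hypothetical Struct defined with the block form instead of inheritance.
Point = Struct.new(:x, :y) do
  def distance_from_origin
    Math.sqrt(x**2 + y**2)
  end
end

point = Point.new(3, 4)
point.distance_from_origin
# => 5.0
point.to_a
# => [3, 4]
```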

OpenStruct

OpenStruct is similar to Struct but is more “open”; more accepting.  You will have to require it with require 'ostruct' .

require 'ostruct'

x = OpenStruct.new
# => #<OpenStruct> 

x[:a]
# => nil 
x
# => #<OpenStruct> 

x[:a] = 0
# => 0 
x
# => #<OpenStruct a=0> 
x.a
# => 0 

x.b
# => nil 
x
# => #<OpenStruct a=0> 
x[:b]  = 123
# => 123 
x
# => #<OpenStruct a=0, b=123>

As you can see you don’t inherit from OpenStruct like you did with Struct.  Also you don’t get errors when accessing a key that hasn’t been set yet.  This can lead to problems if you need to store nil as an actual value: a key that was never set, or that has been removed, also answers nil, so you can’t tell the two apart just by reading it.
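
One way around that, sketched here with a made-up person Object, is to ask the underlying Hash (via :to_h) whether the key exists instead of reading the value.

```ruby
require 'ostruct'

person = OpenStruct.new(name: nil)
person.name
# => nil
person.age
# => nil

person.to_h.key?(:name)  # set, even though the value is nil
# => true
person.to_h.key?(:age)   # never set
# => false

person.delete_field(:name)
person.to_h.key?(:name)
# => false
```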

Other

You can define your own collections in Ruby.  Objects can have Objects within them (introspect yourself 😉 ).  And you can leverage a lot of the same methods from what Arrays can do by including the Enumerable module in your class and defining an each method.  So you can do most any kind of collection you can imagine with all the tools Ruby provides you. ^_^  And that’s how it should be.
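
Here is a minimal sketch of that idea.  The Playlist class is made up for illustration; all it does is wrap an Array and define each, and Enumerable supplies the rest.

```ruby
# Hypothetical collection class that wraps an Array of song titles.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # Enumerable only needs each; it builds map, select, sort, etc. on top of it.
  def each(&block)
    return enum_for(:each) unless block_given?
    @songs.each(&block)
    self
  end
end

playlist = Playlist.new("Blue", "Amber", "Crimson")
playlist.sort
# => ["Amber", "Blue", "Crimson"]
playlist.map(&:upcase)
# => ["BLUE", "AMBER", "CRIMSON"]
playlist.include?("Amber")
# => true
```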

Summary

By covering so many different collection types I have barely gone into detail on just how powerful they are and the methods that “perform magic”.  Be sure to read the Ruby Docs on them and experiment.  I’ll be getting into more of this in the future I’m sure; so look for it!  Please feel free to comment, share, subscribe to my RSS Feed, and follow me on twitter @6ftdan!

God Bless!
-Daniel P. Clark

Image by Evan Leeson via the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.


Comments

  1. Mark
    March 26, 2015 - 11:34 pm

    Good Read 🙂

  2. Sharath B. Patel
    March 29, 2015 - 12:24 pm

    I read this 3 times so that i can memorize everything. Good stuff!

    • Daniel P. Clark
      March 29, 2015 - 2:12 pm

      Thanks! Glad you found this information useful to you.
