Using Ruby Object Type Classes to Safely Build Data
When building collections of data, you will run into situations where the types aren’t what you planned to work with. By types I mean, generically, arrays, hashes, strings, integers, nil, etc. Everything’s cosy when you know what you’re getting. For example, putting 10 integers into an Array:
    arr = []
    10.times do |num|
      arr << num
    end
    arr # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But if you try to work with something that is not yet defined, or is nil, you’ll get an error:
    y << 5 # NameError: undefined local variable or method `y' for main:Object
    y = nil # => nil
    y << 5 # NoMethodError: undefined method `<<' for nil:NilClass
You will find yourself more likely to run into this kind of situation when working with Hashes:
    example = {} # => {}
    example[:foo] # => nil
    example[:foo] << 5 # NoMethodError: undefined method `<<' for nil:NilClass
In each of these examples I’ve been using << to put something into what we would like to be an Array. But the Array Object must first be instantiated before we can insert items into it. We could do it like so:
    example[:bar] = [] # => []
    example[:bar] << 5 # => [5]
But now we’ve taken two lines to accomplish this. We may end up writing LOTS of code with similar behavior so we don’t really want to have to write more than we need to. Well there’s good news. Ruby has classes that allow us to create the right Objects for just this kind of situation.
    Array(nil) # => []
    Hash(nil) # => {}
    String(nil) # => ""
As you can see, we handed each of these a nil Object and got back an empty object of the matching type. So with this we can take our example and one-line the nil Object assignment and insertion.
    example[:fiz] # => nil
    example[:fiz] = Array(example[:fiz]) << 5 # => [5]
And it worked! Array(example[:fiz]) created an empty Array, since example[:fiz] was nil, which allowed us to insert 5 into the Array and finally save the result into example[:fiz] on the left side of the equals sign. It looks a lot like the way += works, except that += will not work on nil. Now let’s try it with the loop.
    sample = {} # => {}
    10.times do |num|
      sample[:fez] << num
    end
    # NoMethodError: undefined method `<<' for nil:NilClass

    10.times do |num|
      sample[:fez] = Array(sample[:fez]) << num
    end
    sample[:fez] # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In this situation the Array() method ensured we had an Array the very first time: it took nil and made an empty Array. On every later cycle of the loop it returned the existing Array unchanged, and the new number was appended to the end. With this we don’t have to care about nil being an issue. That issue’s been dealt with.
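The same one-line pattern carries over to any loop that sorts items into buckets. Here is a minimal sketch, assuming a made-up helper named group_by_first_letter and a made-up word list, neither of which is part of the original example:

```ruby
# Hypothetical helper: group words by their first letter.
def group_by_first_letter(words)
  groups = {}
  words.each do |word|
    key = word[0]
    # Array() turns the initial nil into [] and passes an existing Array through.
    groups[key] = Array(groups[key]) << word
  end
  groups
end

groups = group_by_first_letter(%w[apple avocado banana cherry])
groups["a"] # => ["apple", "avocado"]
groups["b"] # => ["banana"]
groups["c"] # => ["cherry"]
```

No key has to be initialized ahead of time; the first word for each letter creates its own Array.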
A Complex Example
When organizing data you will get into far more complex situations where you will need this. One good use is walking files and directories and organizing them into collections. Here’s the start of an MP3 play-list creator I’m working on. The way it works is it finds all mp3s in subdirectories and then uses the folder as the play-list group that they will belong to.
    def parse_dir(dir)
      result = {}
      Dir.glob("#{dir}/**/*.mp3").each { |path|
        m3u = "#{path.split('/')[-2].gsub(' ', '_').downcase}.m3u"
        result[m3u] = Hash(result[m3u]).update({
          files: Array(Hash(result[m3u])[:files]).<<({
            path: path,
            filename: path.split('/').last
          })
        })
      }
      result
    end
There’s a lot going on here. You can see that I’m using both Array() and Hash(). The first time each of these methods is reached, the Object within it evaluates to nil, so it creates the new empty Array or Hash that the collection gets built from.
Let’s break it down. Dir.glob("#{dir}/**/*.mp3") takes the path we hand in to the parse_dir method and goes through all subdirectories no matter how deep. The double stars ** are what make the glob method traverse all the directories. The end of the string, *.mp3, selects any file that ends with .mp3. The result is an Array of matching paths which we can iterate over.
Now that we have the list of results we want to take each one and place them in a “play-list” group in our Hash called result. So with each we hand each item in the list to the variable path and start our process.
First we want to create the play-list name. So we take the file’s complete path and split it by the directory separator “/”. From there we take the second from last element, [-2], which is the directory the file is in ([-1] is the file name itself). We’ll want uniform file names, so we replace the spaces with underscores and lower-case the whole thing. From there we have our string “my_directory_name.m3u” stored in the variable m3u.
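The naming step can be tried on its own. Below is a small sketch, assuming a hypothetical helper called playlist_name and an invented example path; it mirrors the split, gsub and downcase calls used in parse_dir:

```ruby
# Hypothetical helper that mirrors the naming step from parse_dir.
def playlist_name(path)
  folder = path.split('/')[-2] # the directory that contains the file
  "#{folder.gsub(' ', '_').downcase}.m3u"
end

playlist_name("/music/My Directory Name/song.mp3") # => "my_directory_name.m3u"

# File.dirname and File.basename reach the same folder name without manual splitting.
File.basename(File.dirname("/music/My Directory Name/song.mp3")) # => "My Directory Name"
```

The File-based form is only an alternative way to reach the folder name; the split version is what the play-list code above uses.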
Now we apply the technique I’ve shown here with result[m3u] = Hash(result[m3u]). The first time this code is cycled through result[m3u] is nil, so Hash() turns it into an empty Hash {} and it gets assigned to result[m3u] = {}.
Next we update the Hash with new values. Now this may take a little bit to wrap your head around so I’ll see if I can simplify it. The first time this block is run it looks like this (with nils):
    result[m3u] = Hash(nil).update({
      files: Array(Hash(nil)[:files]).<<({
        path: path,
        filename: path.split('/').last
      })
    })
The first time through, Hash(nil)[:files] looks up the symbol :files on an empty Hash: {}[:files] #=> nil. That turns the expression into files: Array(nil).<< which further turns into files: [] <<. So the first time, the :files key inside the result[m3u] Hash is created with an empty Array, and the first item is then inserted into that Array via the << method. The Object inserted into the Array is the handwritten Hash you see above, which will look something like:
{ path: "/computer/path/to/file.mp3", filename: "file.mp3" }
Each time the loop goes through it will now update the Array within the Hash within the Hash with the individual file details as hashes of their own. Let me simplify it. So result has multiple keys labelled as m3u lists (“my_directory_name.m3u”) based on the directory names. Each of those keys will access the Hash that has the key :files which returns an Array (list) of files with their own hashes of path/filename.
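To see the resulting shape without touching the disk, here is a sketch that assumes a hypothetical group_paths method taking a plain list of invented paths instead of calling Dir.glob. It performs the same steps as parse_dir, with each one on its own line:

```ruby
# Hypothetical variant of parse_dir that takes a list of paths instead of
# reading the disk, with each step on its own line.
def group_paths(paths)
  result = {}
  paths.each do |path|
    m3u = "#{path.split('/')[-2].gsub(' ', '_').downcase}.m3u"
    playlist = Hash(result[m3u])    # {} the first time, the stored Hash afterwards
    files = Array(playlist[:files]) # [] the first time, the stored Array afterwards
    files << { path: path, filename: path.split('/').last }
    result[m3u] = playlist.update(files: files)
  end
  result
end

grouped = group_paths([
  "/music/Road Trip/one.mp3",
  "/music/Road Trip/two.mp3",
  "/music/Chill/three.mp3"
])
grouped.keys                                              # => ["road_trip.m3u", "chill.m3u"]
grouped["road_trip.m3u"][:files].map { |f| f[:filename] } # => ["one.mp3", "two.mp3"]
grouped["chill.m3u"][:files].first[:path]                 # => "/music/Chill/three.mp3"
```

Splitting the nested expression into playlist and files variables is purely for readability; the coercion calls are the same ones used in the one-expression version.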
And now we have successfully grouped all the file details by directory name. If you’re wondering why I used a Hash with a :files key instead of just an Array here, it is because I have more items in the Hash in my production code. My next step in the project is to dynamically render an m3u play-list as a view and use it to stream audio over my local area network. It’s fun and I’m looking forward to using it.
Cautionary Note:
These methods are good for working with collections. But not all Ruby Type Classes are equal. If you use the Integer() method on nil it doesn’t produce zero.
    Integer(nil) # TypeError: can't convert nil into Integer
    nil.to_i # => 0

So you’ll need to use .to_i for integers. That being the case, if you want to use any other methods like this, test them first to confirm their behavior.
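The difference goes beyond nil: to_i also accepts text that Integer() rejects. The sketch below illustrates this and assumes a hypothetical helper named integer_or_zero for cases where only nil should become zero:

```ruby
# Integer() refuses nil and non-numeric text, while to_i quietly falls back to 0.
nil.to_i     # => 0
"abc".to_i   # => 0
"12abc".to_i # => 12

# Hypothetical helper: treat nil as zero but stay strict about everything else.
def integer_or_zero(value)
  value.nil? ? 0 : Integer(value)
end

integer_or_zero(nil)  # => 0
integer_or_zero("42") # => 42

begin
  integer_or_zero("abc")
rescue ArgumentError => e
  e.message # => invalid value for Integer(): "abc"
end
```

Which behavior you want depends on whether bad input should be silently treated as zero or reported as an error.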
Summary
Using type classes is a good practice. Incorporating them on the right side of an assignment saves time and effort. It’s also nice that Array([]) doesn’t double the depth of the Array into [[]] but still returns []. So these methods are a safety net against nil. These methods are like having your cake and eating it too ^_^.
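Before leaning on these methods it is worth checking how they treat values other than nil. This sketch uses only Array(), Hash() and String() from Ruby core; the variable name list is just for illustration:

```ruby
# Array() leaves an existing Array alone and returns the very same object.
list = [1, 2]
Array(list)              # => [1, 2]
Array(list).equal?(list) # => true

# A single non-collection value gets wrapped.
Array(5)                 # => [5]

# Anything that responds to to_a is expanded, which may surprise you with a Hash.
Array(1..3)              # => [1, 2, 3]
Array({ a: 1 })          # => [[:a, 1]]

# Hash() accepts nil, an empty Array, or something that is already a Hash.
Hash(nil)                # => {}
Hash([])                 # => {}
Hash({ a: 1 })           # the same Hash, unchanged

# String() calls to_s on whatever it is given.
String(nil)              # => ""
String(42)               # => "42"
```

The Hash case is the one to watch: Array() on a Hash yields key/value pairs rather than wrapping the Hash in a one-element Array.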
I hope you enjoyed this and that it was insightful. Please comment, share, subscribe to my RSS Feed, and follow me on twitter @6ftdan!
God Bless!
-Daniel P. Clark
Image by Mark Thurman via the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.
Andrew Berls
February 5, 2015 - 2:30 pm
So these are not really “type classes” in the traditional sense – type classes are a way to define functions that have different implementations depending on the type of data they’re given (“ad-hoc polymorphism”). In Haskell, you might define a typeclass and an associated “instance”
so now you can call a == b to your heart's desire and the compiler will be able to figure out which function to call based on the types. That's a typeclass. In Ruby, things like Array() and Hash() are just methods defined in the [Kernel module](http://www.ruby-doc.org/core-2.2.0/Kernel.html#method-i-Array).
In your case, I think it'd be simpler to just provide a default value for your hashes. For example, instead of writing
, you can just provide a default like this:
Daniel P. Clark
February 5, 2015 - 3:11 pm
Thank you for informing me about the traditional sense of “type classes”. I was using it in the more literal sense of a method of the class for the type. I had originally learned about the coercive methods like Array from Avdi Grimm’s blogs and “Confident Ruby” talk.
There are many who seem to share your view of what is “simpler”. But when it comes to that I believe that may be a matter of preference/perception. If I were new to the language then all of this “{ |h,k| h[k] = [] }” would have me scratching my head in confusion. Coercing with an English word of what you’d like something to ‘be’ would be “simpler” just as example = {} seems easier on the eyes than example = Hash.new {…}.
I’m not saying you’re wrong, far from it. It seems that those who have spoken up on this article have preferred the style you’ve suggested. Perhaps you all are trying to “coerce” me ;-).
Before I had learned about the coercive Array() method I was doing some ugly code such as:
It worked, but I wasn’t happy with it visually.
There are many ways to accomplish the same feat in Ruby. I wrote this to illustrate the usefulness of coercion to protect building data collections. As Kache has recommended on Reddit, the complex example would look a lot nicer with “dedicated, serializable Playlist and Mp3Path classes”. I do agree that your suggestion is simple. But I believe it’s only “simpler” as a matter of practice.
I admit I am a bit opinionated. I hope to put that aside though and learn what I can despite myself. Again thank you for sharing! It was insightful to me.
Stephan
February 6, 2015 - 7:06 pm
For the hash example, you can use:
Daniel P. Clark
February 6, 2015 - 7:19 pm
Thanks for the input! That works nicely.
Aaron Schrab
February 11, 2015 - 12:19 pm
That will use the same Array object for every key that isn’t previously set. If after the above somebody did:
hash[:another] << 7
Then `hash[:test]` and `hash[:another]` would both return `[5,7]`.
Daniel P. Clark
February 11, 2015 - 12:30 pm
Good catch. Wouldn’t want that unexpected surprise.
Stephan
February 12, 2015 - 4:24 am
Yeah. I just realised. Strange that it keeps re-using the same array for every non-initialized key-value pair. I do get why though, as it would require a deep clone to fix the issue, which would significantly reduce performance. Guess this approach isn’t any good for arrays and hashes (which have the same issue). I have tried to come up with a clever way to fix it using lambdas etc. but I can’t seem to make this work. You could of course monkey-patch the Hash class itself and make it return an empty array in the case that the element does not exist, but that seems like too much work, and it would not really benefit anyone.
Aaron Schrab
February 12, 2015 - 7:43 am
There’s no need for monkey patching or doing anything overly fancy. You can instead supply a block to Hash.new:
Hash.new{ |hash,key| hash[key] = [] }
The block is responsible for putting the new item into the hash, and returning the value for the initial access.
This method had already been mentioned in a previous comment, so I didn’t address it in my earlier reply.
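To make the difference discussed in this thread concrete, here is a short sketch comparing a shared default object with the block form. It assumes the earlier suggestion was a default passed as an argument, Hash.new([]); the variable names shared and fresh are invented for the example:

```ruby
# A default passed as an argument is one object shared by every missing key.
shared = Hash.new([])
shared[:test] << 5
shared[:another] << 7
shared[:test]                          # => [5, 7]
shared[:test].equal?(shared[:another]) # => true
shared.keys                            # => [] (nothing was ever stored in the Hash)

# The block form builds and stores a fresh Array per key.
fresh = Hash.new { |hash, key| hash[key] = [] }
fresh[:test] << 5
fresh[:another] << 7
fresh[:test]    # => [5]
fresh[:another] # => [7]
fresh.keys      # => [:test, :another]
```

Note that with the shared default the keys are never actually added, so iterating over that Hash finds nothing.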