March 12, 2015 by Daniel P. Clark

Substitution with Regex Groupings

As I continue to grow in experience, I’ve been looking into how to do the kind of in-place substitution that I was accustomed to performing with array matching (split-map-join).  What I’m referring to isn’t just a letter-for-letter substitution, but something that finds a match and modifies it.  For example, if I wanted to quote words prefixed with a pound sign, I used to do this:
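The original snippet isn't reproduced here, so the following is only a sketch of what a split-map-join approach could look like.  The method name `quote_pound_words_split` and the sample sentence are made up for illustration, and "quote" is assumed to mean wrapping the word in double quotes.

```ruby
# Hypothetical reconstruction of the split-map-join style:
# wrap every word that starts with a pound sign in double quotes.
def quote_pound_words_split(text)
  words = text.split(" ")          # break the sentence into words
  quoted = words.map do |word|
    if word.start_with?("#")
      "\"#{word}\""                # quote pound-prefixed words
    else
      word                         # leave everything else alone
    end
  end
  quoted.join(" ")                 # stitch the sentence back together
end

puts quote_pound_words_split("I love #ruby and #regex")
# prints: I love "#ruby" and "#regex"
```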

This was some of my earliest Ruby programming technique for parsing strings.  After learning a bit more my code would look more like this.
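Again as an assumption rather than the original code, a more compact version of the same idea might chain the three steps on one line.  The method name `quote_pound_words_compact` is hypothetical.

```ruby
# Hypothetical compact variant: same split-map-join, chained together.
def quote_pound_words_compact(text)
  text.split.map { |word| word.start_with?("#") ? %("#{word}") : word }.join(" ")
end

puts quote_pound_words_compact("I love #ruby and #regex")
# prints: I love "#ruby" and "#regex"
```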

And then, after finding gsub, I didn’t take full advantage of it.
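One plausible example of using gsub without its grouping features is the block form, where the replacement is still assembled by hand.  This is a guess at the style, not the original code, and `quote_pound_words_block` is a made-up name.

```ruby
# Hypothetical "early gsub" style: a block builds the replacement manually.
def quote_pound_words_block(text)
  text.gsub(/#\w+/) { |match| "\"" + match + "\"" }
end

puts quote_pound_words_block("I love #ruby and #regex")
# prints: I love "#ruby" and "#regex"
```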

Needless to say, looking back, these ways of doing things are wasteful and don’t need to be written out at such length.  Learning regex, match data, and gsub more fully has simplified this down to one simple gsub call, and it works with regex groupings.  I’ll show you the style I like the most and then I’ll detail some alternatives.
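Here is a minimal sketch of that style, assuming the same pound-prefixed-word task as before.  The group name foo comes from the discussion in this post; the method name `quote_pound_words` and the sample sentence are invented for illustration.

```ruby
# A named group (?<foo>...) captures the pound-prefixed word, and
# \k<foo> in the replacement string puts the captured text back.
def quote_pound_words(text)
  # Single quotes keep the backslash in \k<foo> literal.
  text.gsub(/(?<foo>#\w+)/, '"\k<foo>"')
end

puts quote_pound_words("I love #ruby and #regex")
# prints: I love "#ruby" and "#regex"
```

The replacement is written with single quotes so Ruby passes the backslash through to gsub untouched; with double quotes it would need to be written as `\\k<foo>`.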

The syntax I use above with ?<foo> sets a group name, foo, which is referenced by \k<foo>.  The regex that gets matched is whatever follows the ?<foo> within the enclosing () parentheses, and the matched text is put in place of \k<foo> in the second parameter.  I’ve chosen to use the word foo here, and the group name syntax, as words that resemble English are usually easier to follow and learn.

Regex groups don’t have to be named; they can be numbered in the order in which their opening parentheses appear.  Each group is delimited by () parentheses.  Instead of \k<foo> you simply use \1 for the first group (not just the first match).
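A sketch of the numbered form, using the same assumed pound-word example (the method name `quote_pound_words_numbered` is hypothetical):

```ruby
# The same substitution with an unnamed group: \1 refers to the
# first pair of parentheses in the pattern.
def quote_pound_words_numbered(text)
  text.gsub(/(#\w+)/, '"\1"')
end

puts quote_pound_words_numbered("I love #ruby and #regex")
# prints: I love "#ruby" and "#regex"
```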

And to get an idea of multiple matches, I will use an or pipe | between two parenthesized () regex groups.
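As an illustration (the words foo and bar, the `<>` markers, and the method name `bracket_alternatives` are assumptions chosen for this sketch), two groups joined by a pipe could look like this:

```ruby
# Two groups separated by |. For any single match only one of the
# groups participates, so the other back-reference is empty.
def bracket_alternatives(text)
  text.gsub(/(foo)|(bar)/, '<\1><\2>')
end

puts bracket_alternatives("foo bar")
# prints: <foo><> <><bar>
```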

Notice that since the regex options were or’d, only one of the <> sections was filled in for each match, and notice the order they printed in.

Now, putting one inside the other, we’ll match the second one starting at o.
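A small sketch of nesting, again with an assumed sample word foo and a hypothetical method name `bracket_nested`:

```ruby
# Nested groups are numbered by their opening parenthesis:
# \1 is the outer group (foo), \2 is the inner group (oo).
def bracket_nested(text)
  text.gsub(/(f(oo))/, '<\1><\2>')
end

puts bracket_nested("foo bar")
# prints: <foo><oo> bar
```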

Notice that the output had a value for each group and how it worked?  If you want multiple matchers with each of them having its own output, then just append another gsub call.  Each gsub then targets one specific replacement.
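Chaining could be sketched like this, with each call handling one pattern and its own replacement.  The words, the bracket styles, and the method name `bracket_each` are illustrative assumptions.

```ruby
# Each gsub call handles one pattern with its own replacement.
def bracket_each(text)
  text.gsub(/(foo)/, '<\1>').gsub(/(bar)/, '[\1]')
end

puts bracket_each("foo bar")
# prints: <foo> [bar]
```

One thing to keep in mind with chaining is that the second gsub runs on the output of the first, so a later pattern can also match text that an earlier replacement inserted.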

A cool feature available with gsub is being able to replace data using a Hash of key-value pairs: each piece of matched text is looked up as a key and replaced by its value.  In my testing with this I found it didn’t work on complex results such as HTML/XML tag substitution.  But you can keep it simple.
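A simple sketch of the Hash form, where the sample words and the method name `swap_pets` are made up for illustration:

```ruby
# When the second argument is a Hash, gsub looks up each matched
# string as a key and substitutes the corresponding value.
def swap_pets(text)
  replacements = { "cat" => "dog", "dog" => "cat" }
  text.gsub(/cat|dog/, replacements)
end

puts swap_pets("cat chases dog")
# prints: dog chases cat
```

Because the lookup uses the whole matched text as the key, the keys need to be the exact strings the pattern can match.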

Going back to nested matches, they may come in handy for something like a Pig Latin translator.
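The sketch below is one way such a translator could be written with the (()()) shape.  The exact pattern, the leading `\b` word boundary, and the method name `pig_latin` are assumptions, and it only handles lowercase words that start with a single consonant.

```ruby
# Rough Pig Latin: move a leading consonant to the end and add "ay".
# Group 1 is the whole word, group 2 is the first non-vowel character,
# group 3 is the rest of the word.
def pig_latin(text)
  text.gsub(/\b(([^aeiou\W])(\w+))/, '\3\2ay')
end

puts pig_latin("hello world")
# prints: ellohay orldway
```

Words that begin with a vowel are left untouched by this pattern.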

It’s not a perfect Pig Latin translator, but it’s passable.  Here there are three sets of parentheses: (()()).  We’re only using the inner two, mapping those with \2 and \3.  The outer parentheses make sure that all the inner ones match for the result to evaluate.  The first inner set of parentheses matches any one non-vowel character, and the second inner set matches the rest of the word.  Then we just swap them and add ay to the end for Pig Latin.

Summary

gsub rocks!  I’ve only recently learned about group variables in regex matchers.  I found out about them while looking into a better way to do inline substitution for my new gem color_pound_spec_reporter.  I wanted colors in my test output, but I didn’t want to have to do some complex string splitting just to wrap the ANSI color methods around it with map.  But now with gsub that’s super easy.

e.g.
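This is not the gem's actual code, just a minimal sketch of the idea using raw ANSI escape codes.  The method name `colorize_true` and the sample string are hypothetical.

```ruby
# Wrap the word "true" in ANSI escape codes: \e[32m switches the
# terminal to green and \e[0m resets it.
def colorize_true(text)
  # Double quotes are needed for \e, so the back-reference is written \\1.
  text.gsub(/\b(true)\b/, "\e[32m\\1\e[0m")
end

puts colorize_true("expected result: true")
```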

And all looks lovely; true shows up in the color green.  Hopefully this will be as invaluable to you as it is to me.  Please feel free to comment, share, subscribe to my RSS Feed, and follow me on twitter @6ftdan!

God Bless!
-Daniel P. Clark

Image by Ian D. Keating via the Creative Commons Attribution 2.0 Generic License.

P.S. I’m still not a master at using regex and I realize that there may be better ways to do things.  I’ve only written this to illustrate substitution examples with gsub.
