Ruby’s $_ variable
Ruby can be invoked from the command line in order to create powerful text-processing one-liners. I wrote about this a while back, in “Ruby’s -e, -n, and -p switches”.
These one-liners are powerful, concise, and expressive. For an example that verges on the magical, how about outputting the third field in a CSV file, but only if the line contains a URL?
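The original command isn’t reproduced here, so what follows is only a sketch of what such a one-liner could look like. It assumes a simple CSV with no quoted commas, and the choice of the -a and -F switches (which split each line into $F) is an assumption too. The one_liner helper and the sample data are hypothetical; the helper just runs a child Ruby process so that the example works as an ordinary script.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

csv = "alice,https://example.com/a,admin\nbob,n/a,guest\n"

# Shell equivalent: ruby -F, -ane 'puts $F[2] if /https?:\/\//' users.csv
# (users.csv is a made-up file name.)
puts one_liner("-F,", "-ane", 'puts $F[2] if /https?:\/\//', input: csv)
```

Only the first line contains a URL, so only its third field is printed.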
Or outputting the contents of a Markdown file, with curly quotes switched to straight ones?
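Again, the original command is missing, so this is a hedged sketch rather than the author’s version. It assumes UTF-8 input (hence the -E switch) and writes the curly quotes as \u escapes to keep the code ASCII-only. The one_liner helper is hypothetical and simply runs a child Ruby process.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# \u201C and \u201D are curly double quotes; \u2018 and \u2019 are curly single quotes.
code = 'gsub(/[\u201C\u201D]/, "\""); gsub(/[\u2018\u2019]/, "\x27")'
markdown = "\u201CHello,\u201D she said. It\u2019s fine.\n"

# -p prints each line after the code has run, so no puts is needed in `code`.
puts one_liner("-E", "UTF-8", "-pe", code, input: markdown)
```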
In both of these examples, and indeed in all Ruby one-liners, an oddly
named global variable is at work even when we don’t see it used
explicitly. Its name is $_ (dollar underscore).
Ruby has many global variables like this; there’s a complete list of
them here. But $_ is one of the most useful. Indeed, along with the
globals relating to regular expressions, it’s the only one I use with
genuine regularity.
There are five key places where $_ is used. In each one, it’s likely
that we won’t actually see the variable itself; instead, it’s used by
Ruby internally. But knowing that it’s there can help to explain what’s
going on; it helps us thread a connection through several different
areas of the Ruby language, allowing us to peer behind the curtain and
understand Ruby’s magic a little better. Let’s dig in.
1. It’s set to the content of the current line
There are two scenarios in which we loop over lines of input. The first
is when, as in the examples we saw earlier, we use the -n or -p
switches when invoking Ruby.
When we do this, the Ruby interpreter will loop over the lines of input
for us, running the code that we pass to it once for each line of input.
In doing so, it sets the value of the $_ variable to the contents of
the current line. For example:
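A minimal sketch, not the original example: the code passed with -n reads $_ to get at the current line. The one_liner helper is hypothetical; it runs a child Ruby process so the switch itself can be exercised from a script.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'foo\nquux\n' | ruby -ne 'puts $_.chomp.length'
puts one_liner("-ne", 'puts $_.chomp.length', input: "foo\nquux\n")
# prints 3 and 4, one per line
```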
This happens because using the -n and -p switches is essentially like
wrapping your code in the following:
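The wrapper is a plain "while gets ... end" loop. The sketch below writes that loop out by hand; it reads from a StringIO rather than real standard input so that it can run anywhere, and the method name and loop body are illustrative assumptions.

```ruby
require "stringio"

# A hand-written version of the loop that -n implies. `input` stands in for
# standard input; with the real switch, Ruby calls Kernel#gets for us.
def run_like_n(input)
  results = []
  while input.gets               # gets returns nil at end of input, ending the loop
    results << $_.chomp.length   # "your code" goes here and can read $_
    # With -p, Ruby would also print $_ at this point, after the body has run.
  end
  results
end

p run_like_n(StringIO.new("foo\nquux\n"))
```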
It’s actually gets that sets the $_ variable, which means it’s also
accessible in regular Ruby scripts, not only one-liners. Wherever you
call gets, $_ will be set to the input that gets received.
2. It’s outputted automatically when using -p
If we use the -p option when starting Ruby, it’s not necessary for us
to write a puts or print statement to generate some output; Ruby
will do it for us. It still executes our code once per line of input,
but after each line it will output something too.
But what does Ruby actually output? You guessed it: the $_ variable.
This means that, if we pass the -p option to Ruby, we can affect the
output of our script by manipulating the content of the $_ variable:
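As a sketch (the original example isn’t shown here), this reassigns $_ to a new string on every line and lets -p do the printing. The one_liner helper is hypothetical, as is the choice of prefixing each line with "> ".

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'foo\nbar\n' | ruby -pe '$_ = "> " + $_'
puts one_liner("-pe", '$_ = "> " + $_', input: "foo\nbar\n")
```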
In that case we reassigned the variable entirely, but we can also mutate it:
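A possible version of the mutating form, assuming the method in question is String#upcase!; the one_liner helper is hypothetical and only runs a child Ruby process.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'foo\nbar\n' | ruby -pe '$_.upcase!'
puts one_liner("-pe", '$_.upcase!', input: "foo\nbar\n")
```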
In this case, we transform the line of input from lowercase to
uppercase. (We can tell that the method mutates the string, rather than
returning a new one, because of the ! at the end of its name.)
3. It’s an implicit argument to print
When we invoke Ruby with the -n or -p switches, some of Ruby’s core
methods become more useful, because $_ is always set. One example is
how print behaves if we don’t pass it an argument.
In an ordinary Ruby script, or a one-liner without -n or -p, calling
print without any arguments outputs nothing, because $_ is nil until
something like gets sets it:
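A small sketch of that behaviour. The second call is an addition of mine to show the underlying cause: print always falls back to $_, which stays nil until something like gets has run. The one_liner helper is hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# No -n or -p and no call to gets: $_ is nil, so a bare print writes nothing.
nothing = one_liner("-e", "print", input: "")

# After an explicit gets, $_ holds the line and a bare print outputs it.
first_line = one_liner("-e", "gets; print", input: "foo\nbar\n")

p nothing
p first_line
```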
If we invoke Ruby with -n or -p, though, print will output $_ if we
call it without arguments:
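For instance, a bare print under -n echoes every line. This is a sketch with made-up input, and the one_liner helper is hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'foo\nbar\n' | ruby -ne 'print'
puts one_liner("-ne", "print", input: "foo\nbar\n")
```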
This makes it really easy to write filters that only output lines that meet certain conditions. For example:
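A filter of this kind might look like the following sketch; the sample input and the one_liner helper are hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'foo\nbar\nfizz\n' | ruby -ne 'print if /^f/'
puts one_liner("-ne", 'print if /^f/', input: "foo\nbar\nfizz\n")
# prints foo and fizz, but not bar
```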
This one-liner outputs only those lines that start with the letter f.
4. It’s the implicit receiver of some global string methods
Another behaviour that changes when Ruby is invoked with either the -n
or -p option is that some global methods are defined. They are:
sub
gsub
chop
chomp
They’re defined in the Kernel module, the same place as print and
puts, which means that we don’t call them with a receiver; they’re
global methods.
But how can we call, say, gsub in this way? Normally, the receiver of
the gsub method is the string that we want to perform a substitution
within. If there’s no receiver, what string will be used instead?
There are no prizes for guessing that the answer is $_. In this way,
these global methods allow us to perform operations on each line of
input without having to refer to that input explicitly. For example:
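One way this could look, as a sketch: the global gsub works on $_ and returns the result, which we hand to puts. The sample input and the one_liner helper are hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'hello\nworld\n' | ruby -ne 'puts gsub(/[aeiou]/, "_")'
puts one_liner("-ne", 'puts gsub(/[aeiou]/, "_")', input: "hello\nworld\n")
```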
In this case, we output each line of input, except with all vowels replaced with underscores.
This behaviour is even more useful when used with -p, since we can
skip the output step:
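Sketched with -p, the same substitution needs no puts at all. The one_liner helper and the sample input are hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'hello\nworld\n' | ruby -pe 'gsub(/[aeiou]/, "_")'
puts one_liner("-pe", 'gsub(/[aeiou]/, "_")', input: "hello\nworld\n")
```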
This works because these global methods update $_ with their result as
well as returning it; they’re roughly equivalent to the !-suffixed
methods on String, and so the above example is equivalent to:
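The explicit form, sketched next to the implicit one so the two can be compared. The vowel-replacing code is assumed from the description above, and the one_liner helper is hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

text = "hello\nworld\n"
implicit = one_liner("-pe", 'gsub(/[aeiou]/, "_")', input: text)
explicit = one_liner("-pe", '$_.gsub!(/[aeiou]/, "_")', input: text)

puts explicit
puts implicit == explicit   # both forms produce the same output here
```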
Particularly if you’re not comfortable with using sed, but even if you
are, this is a really powerful way to perform find-and-replace
operations from the command line.
These global methods are otherwise identical to their counterparts from
the String class; they’re just a useful shortcut for a common
operation.
5. It’s the implicit matcher of regular expressions
The final place where $_ is used is as the implicit subject of regular
expression matches. It’s this behaviour that I exploited in the very
first example in this post, and it’s this behaviour that’s perhaps most
obscure (or magical, depending on your viewpoint).
This behaviour is triggered either when we use a regular expression in
a conditional context, or by using the ~ operator on a regular
expression. For example:
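Both forms, sketched with made-up input: first the ~ operator, whose result we inspect with p, then a regular expression used as a condition. The one_liner helper is hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

lines = "foo\nbar\n"

# The ~ operator matches the expression against $_ and returns the match position.
puts one_liner("-ne", 'p(~/foo/)', input: lines)
# prints 0 for "foo" and nil for "bar"

# A regular expression on its own in a condition is matched against $_ too.
puts one_liner("-ne", 'print if /foo/', input: lines)
```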
With the ~ operator, an integer is returned if the expression matched
the current line of input: the index at which the match starts (so 0
if the expression matched at the very first character). If the
expression didn’t match, the method returns nil.
It’s the conditional form that’s most useful, since it allows us to do something based on whether a line matches a given expression, which is an incredibly common requirement for filter scripts.
Behind the scenes, this translates to the following:
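Spelled out, the match against $_ becomes explicit. This is a sketch that assumes the implicit form was "print if /foo/"; the one_liner helper is hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run Ruby with the given switches and code, feed it
# `input` on standard input, and return what it wrote to standard output.
def one_liner(*args, input:)
  output, _status = Open3.capture2(RbConfig.ruby, *args, stdin_data: input)
  output
end

# Shell equivalent: printf 'foo\nbar\n' | ruby -ne 'print if $_ =~ /foo/'
puts one_liner("-ne", 'print if $_ =~ /foo/', input: "foo\nbar\n")
```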
The implicit example is much more magical, but it’s also much shorter and easier to read — and with one-liners, every character counts.
Summing up
Ruby has lots of cryptic globals, but one that crops up in lots of
different places is $_. It’s always connected to the idea of
processing input line-by-line, which is a really common requirement.
Getting to know it can help you write nicely concise text processing
scripts — and concision is particularly helpful when you’re writing
one-liners.