Ruby's -e, -n and -p switches
It’s often said that Ruby is “Perl done right”: that it combines the terseness and text processing power of Perl with inspiration from Smalltalk, Lisp, and CLU, and in doing so creates a language that’s “the best of all possible worlds”.
Regardless of the merit of this idea, it’s certainly true that — when it comes to text processing — anything you can do in Perl you can do in Ruby, thanks mainly to the fact that Ruby steals wholesale many of the best text processing ideas Perl has. And yet lots of Ruby developers aren’t aware of the power Ruby can give you when it comes to writing throwaway one-liners in the shell.
All users of Unix-like operating systems will find themselves in this position eventually: you have to process some output from a process or some files, and you’re just reaching the point where standard tools like grep, head, cut, tr, wc and their brethren are beginning to show their limitations.
You could learn awk. Or you could reach for a powerful tool that you already have in your box: Ruby!
The -e switch
We all know, I’m sure, that you can invoke Ruby from the command line by passing it the filename of a script to run:
But did you know you can also pass code as an argument and have Ruby interpret it? Just use the -e flag when invoking Ruby:
Nifty, perhaps. But we can get much niftier.
The -n switch
The -n switch acts as though the code you pass to Ruby was wrapped in the following:
In short, this means that the code you pass in the -e argument is executed once for each line in your input. So, imagining that you had a file called foo.txt, with the following content:
foo
bar
baz
Then invoking Ruby like so:
Will output:
foo
bar
baz
Congratulations! You’ve just implemented cat in Ruby.
But what’s this $_?
Throughout these examples, you’ll perhaps have noticed the use of the special global variable $_. When you invoke Ruby this way, it sets $_ to the current line that’s being processed; so if you wanted to do something like only print lines that start with “f”, that would be very easy:
Working with standard input
Of course, like cat, this doesn’t work only with files; you can also pipe the output of another process, and use its output as your input. To use a slightly contrived example, we might want to find the ID of any instances of top that are running on our system.
We can get a list of all running processes with ps ax. It outputs an enormous amount, but each line is formatted as follows:
49175 s010 Ss 0:00.18 login -fp rob
We have the process ID in the first column, and the command in the last; so all we need to do is print the first column if the line contains top. Easy:
You could then pipe that into something like xargs kill (kill takes process IDs as arguments rather than reading them from standard input) to get rid of all the matching processes. Handy!
(If you’d like to find out more about how you’re able to use the same code to work with both files and standard input, without changing anything, then you can read up on ARGF in Ruby.)
The -p switch
These solutions are pretty concise already. But what if you feel as though all the puts statements are a bit unnecessary? Well, Ruby has you covered.
The -p switch acts similarly to -n, in that it loops over each of the lines in the input. However, it goes a bit further: after your code has finished, it always prints the value of $_. So, you can imagine it as:
It’s really useful, then, for doing transformations on the input. If you wanted to take every line you were given, but replace every instance of the letter e you found with the letter a, you could do:
Here, we modify the value of $_, and this modified value is what’s printed to the screen.
Using BEGIN blocks
Of course, our code here runs in a loop; what if we wanted to run something just once, before our loop starts? We might want to initialise a variable, for example.
In Ruby, we can use BEGIN blocks to do this. They’re an idiom borrowed from awk, and allow us to execute code just once, at the start of the program.
So, to output line numbers from your input, you could do:
Here, we initialise i to 0 at the start of the script. The BEGIN block executes only once, before the first line is read, rather than on every pass through the loop; we can then increment i for each line, producing the following output:
1 foo
2 bar
3 baz
Wrapping up
Of course, all of these examples are fairly contrived; I haven’t done anything that wouldn’t already be possible with tools like grep, pgrep, tr, and so on.
But in reality you have access to the whole world of not just the Ruby standard library but every Ruby Gem too. Just think of the power in Ruby’s String class alone: gsub, scan, ljust and rjust, squeeze. Think of Digest; think of all of the power of Regexp; Ruby’s date and time processing; CSV, Net::HTTP, and Zlib. The possibilities are endless.
Getting used to the idea that Ruby can be as much a part of your standard pipeline toolchain as any of the usual Unix tools is important: it suddenly opens up a world of possibilities to do complex processing in a terse and expressive way. Go try it!
Text Processing with Ruby
Enjoyed this and want to find out more about data wrangling and text munging in Ruby? You might be interested in Text Processing with Ruby, a book that covers all that and more. It’s published by Pragmatic Bookshelf and is available now!