ARGF in Ruby
In my recent post about using Ruby for text processing, I used examples that worked with both standard input and files without actually having to alter my code in any way.
I was able to do this using a construct that’s yet another part of Ruby’s Perl heritage[1]: ARGF. It’s a stream that reads from either the files that’ve been passed on the command line or, if none have been specified, from standard input.
Importantly, it does this without the calling code actually having to know or care which input it’s reading from; this enables you to emulate the behaviour of many Unix utilities (such as cat, cut, grep, and hosts of others) that allow you either to pipe input or read from files.
Diving in
Like other streams in Ruby, ARGF responds to each; the block you pass to it will be invoked once per line in the stream. So to demonstrate how ARGF works, here’s perhaps the simplest possible use of it: a script, argf.rb, whose only job is to call ARGF.each with a block that prints each line.
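A minimal sketch of that script, assuming it is saved as argf.rb (the file name that appears in the error trace and shell examples later in this post):

```ruby
ARGF.each do |line|   # one iteration per line, from files or standard input
  puts line           # echo the line back unchanged
end
```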
Reading from files
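If we run the above script with arguments, that is, with one or more file names following argf.rb on the command line, then Ruby will assume that each of the arguments is a file, and ARGF will read from each of the files in turn, from left to right. That means that our script is equivalent to opening each named file in order and printing its lines.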
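Spelled out by hand, that equivalence might look something like the sketch below. The cat_files helper is a name invented here for illustration; it simply opens each path in order and prints its lines.

```ruby
# Hand-rolled stand-in for what ARGF does when file names are given.
# `cat_files` is an illustrative helper, not part of Ruby.
def cat_files(paths, output = $stdout)
  paths.each do |path|             # left to right, as ARGF does
    File.foreach(path) do |line|   # raises Errno::ENOENT if the file is missing
      output.puts line
    end
  end
end

cat_files(ARGV)
```

The point of ARGF is that none of this bookkeeping has to appear in your own script.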
If one of the files doesn’t exist, Ruby will raise its standard Errno::ENOENT error, like so:
$ ruby argf.rb nonexistent.txt
argf.rb:1:in `each': No such file or directory - nonexistent.txt (Errno::ENOENT)
from argf.rb:1:in `<main>'
Reading from standard input
If no arguments are specified, then ARGF will read from standard input. That means that our example script is equivalent to iterating over $stdin and printing each line.
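As a rough sketch of that equivalence, the echo_lines helper below (an illustrative name, not something from Ruby itself) does by hand what ARGF does when no file names are given. A StringIO stands in for standard input so the snippet runs on its own; in a real script you would call echo_lines with no arguments.

```ruby
require "stringio"

# Illustrative helper: iterate over an input stream line by line and echo it.
# With no arguments it reads from $stdin, mirroring ARGF when no files are named.
def echo_lines(input = $stdin, output = $stdout)
  input.each_line { |line| output.puts line }
end

# A StringIO stands in for piped input here.
echo_lines(StringIO.new("foo\nbar\n"))
```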
This enables us to pipe input into our script. So we could call:
$ printf 'foo\nbar\n' | ruby argf.rb
And we’d see the output:
foo
bar
More usefully, this means that we could pipe the input from another process into our script and do something interesting with it.
This “simplest possible” script is, you may have noticed, functionally equivalent to cat; it will concatenate files passed to it, and it will echo back standard input.
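In the same spirit, a grep-like filter is only a few lines longer. This is a sketch rather than a faithful grep: the matching_lines helper is invented for illustration, and the script assumes its first argument is a Ruby regular expression, with any remaining arguments treated as files.

```ruby
# Illustrative helper: collect the lines of `input` that match `pattern`.
def matching_lines(pattern, input = ARGF)
  input.each_line.grep(pattern)
end

# Only run when a pattern was supplied, e.g. `ruby grep.rb foo notes.txt`
# (hypothetical script and file names).
if ARGV.any?
  pattern = Regexp.new(ARGV.shift)   # remove the pattern so ARGF sees only files
  puts matching_lines(pattern)
end
```

Because the pattern is shifted off ARGV before ARGF is first read, the remaining arguments are treated as files, and if there are none the filter falls back to standard input.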
Digging deeper
ARGF has a few methods that are unique to it.
A few are useful when ARGF is reading from files: we can use ARGF.filename to get the name of the file that’s currently being read, and use ARGF.file to get an IO object pointing to the current file.
If you want to know when you’ve moved on to a new file, ARGF.file will come in handy: ARGF.file.lineno returns the number of the line that’s currently being read within that file, which will naturally be 1 when a new file is started. So, to read from all the files passed on the command line, but output the name of the file before starting a new file, you could print ARGF.filename whenever ARGF.file.lineno is 1.
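A sketch of that idea follows. The print_with_headers name and the ==> ... <== banner format are choices made here for illustration; the lineno check is the part that matters.

```ruby
# Illustrative helper: echo every line, announcing each file as it starts.
def print_with_headers(input = ARGF, output = $stdout)
  input.each_line do |line|
    # ARGF.file.lineno counts lines within the current file only, so it is 1
    # exactly when the first line of a file has just been read.
    output.puts "==> #{input.filename} <==" if input.file.lineno == 1
    output.puts line
  end
end

# Run only when file names were given on the command line.
print_with_headers if ARGV.any?
```

Note that ARGF.lineno, without the .file, keeps counting across files, which is why the per-file counter is the one to test.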
If you’d rather not process a file, ARGF has you covered too; just call ARGF.skip. This is useful if you only want to process files of a certain type, or want to stop processing part-way through a file (once you’ve got what you need, for example).
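For instance, the sketch below stops reading a file as soon as it meets a marker line and carries on with the next one. The lines_before_marker helper and the __END__ marker are assumptions made for the example, not conventions that ARGF knows about.

```ruby
# Illustrative helper: collect lines from each file up to a marker line,
# then jump straight to the next file with ARGF.skip.
def lines_before_marker(input = ARGF, marker: "__END__")
  kept = []
  while (line = input.gets)
    if line.chomp == marker
      input.skip   # abandon the rest of the current file
      next
    end
    kept << line
  end
  kept
end

# Run only when file names were given on the command line.
puts lines_before_marker if ARGV.any?
```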
Summing up
ARGF is one of the many great examples of how Ruby’s built-in functionality respects “the Unix way”. It’s essential that flexible and well-behaved Unix tools accept input both from standard input and from files, and with ARGF Ruby makes it trivial to support just that behaviour.
If you’ve written scripts that either emulate this behaviour themselves or that only support one method of input (e.g. only accepting standard input, or only reading from files), then consider using ARGF instead; it can make your life easier and make your scripts more flexible, one of those win-win situations that are pleasingly frequent in Ruby.
Text Processing with Ruby
Enjoyed this and want to find out more about data wrangling and text munging in Ruby? You might be interested in Text Processing with Ruby, a book that covers all that and more. It’s published by Pragmatic Bookshelf and is available now!
[1] It’s the equivalent of Perl’s while(<>) idiom.