Paths aren't strings
In Ruby, we often deal with files: reading them, writing them, checking
whether or not they exist. When working with these files, we generally
reference them by their paths on the filesystem: /etc/hosts, for
example, or /usr/local/bin/git.
As in other languages, it’s pretty common in Ruby to represent these
filesystem paths as strings. In a way, that’s fine: it works okay, and
if we want to do something that gets to the files that they represent,
there are methods on File that can help us find what we want (for
fetching the absolute path of a relative filename, or checking whether
a file exists, for example).
But in the world of Ruby, with its rich object model, this feels neither very idiomatic nor very object oriented. There’s lots of behaviour associated with paths, and strings don’t encapsulate this behaviour very well.
Paths can be relative, for example. That is, multiple paths that seem
different when expressed as strings can in fact correspond to the same
file; if we’re in the /usr/local directory, for example, we can reach
/etc/hosts using the paths /etc/hosts or ../../etc/hosts; we can reach
/usr/local/bin/git with both /usr/local/bin/git and bin/git. To check
whether one string path is the same as another, then, we can’t just do
path1 == path2.
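To make that concrete, here is a small sketch using the paths above. It resolves each relative path against /usr/local with File.expand_path rather than relying on the real working directory, so it behaves the same wherever it is run (on a Unix-like system):

```ruby
absolute = "/usr/local/bin/git"
relative = "bin/git"

# As plain strings, the two paths are simply different.
absolute == relative                                   # => false

# Resolved against /usr/local, they name the same file.
File.expand_path(relative, "/usr/local") == absolute   # => true

# The same goes for the two routes to /etc/hosts.
File.expand_path("../../etc/hosts", "/usr/local")      # => "/etc/hosts"
```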
That’s not all. Paths are representations of files, and those files have attributes and states that matter to our programs. Does the path point to a directory, for example? Does the file the path points to exist? How big is it? Can we read from it? Can we write to it?
Paths are also fundamentally a hierarchical data type, expressed using
a delimiter (usually /); we can traverse deeper into the filesystem by
adding slash-separated values to a path, and climb back up the
filesystem hierarchy by removing them.
The String class in Ruby is aware of precisely none of these
behaviours, and so if we want to use them then we’re forced to use
a kludgey mix of static methods: things like File.join to build up
paths, File.exist? to check for the existence of files, and so on.
Some things can’t really be done at all if we store our paths as
strings, assuming that things like traversing the filesystem by using
split("/") fill you (rightly) with unease.
So if storing paths as strings is an anti-pattern, what are we to do?
Well, it turns out that the Ruby standard library comes with a type for
just this purpose, albeit one that’s underused: Pathname.
Pathname is part of the standard library in Ruby; it’s not an external
dependency like a gem, so you can safely rely on it being present in all
your scripts. Once we’ve required the library, we can create
a Pathname in Ruby by passing a string to Pathname.new:
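A minimal sketch, assuming /etc/hosts as the example path (any path would do, and the file need not exist for this to work):

```ruby
require "pathname"

# /etc/hosts is an assumed example path; creating a Pathname does not
# touch the filesystem.
path = Pathname.new("/etc/hosts")

p path        # prints #<Pathname:/etc/hosts>
path.class    # => Pathname
```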
In fact, there’s a shortcut for Pathname.new; just call Pathname
like a method:
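Again as a sketch, with an illustrative path:

```ruby
require "pathname"

# Kernel#Pathname is the method-style shortcut for Pathname.new.
path = Pathname("/etc/hosts")

path == Pathname.new("/etc/hosts")   # => true
```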
If we do nothing else, we’ve got ourselves an object that behaves in
many ways like a string. Its to_s method, for example, returns the
path as a human-readable, ordinary string:
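For instance, assuming the same illustrative path as before:

```ruby
require "pathname"

path = Pathname("/etc/hosts")   # assumed example path

path.to_s           # => "/etc/hosts"
"Editing #{path}"   # => "Editing /etc/hosts" (interpolation calls to_s)
```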
In places where things are implicitly converted to strings, then (like
puts and print), we can use our Pathname object just as we would
a normal string.
It also implements to_path, which is used internally by the File
class; so, we can pass our Pathname object into something like
File.open, and it will act just the same as if we passed it the path
as a string:
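The sketch below uses a throwaway file in a temporary directory so that it runs anywhere; the directory prefix and the file name greeting.txt are illustrative assumptions:

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# A throwaway file, so the sketch doesn't depend on any real path.
dir  = Pathname(Dir.mktmpdir("pathname-demo"))
path = dir + "greeting.txt"

# File.open takes the Pathname directly, for writing...
File.open(path, "w") { |file| file.puts "hello" }

# ...and for reading, exactly as it would a string path.
contents = File.open(path) { |file| file.read }   # => "hello\n"

FileUtils.remove_entry(dir.to_s)   # tidy up the temporary directory
```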
But we also gain a lot of methods that a string doesn’t have. In this brief overview, I’m going to split them into two categories: inquiry and traversal.
Inquiry
Since our Pathname object, unlike a string, knows that it represents
a path to a file, we can ask it questions about the file that our
path represents. To continue our above example, we might want to check
whether the path points to a directory:
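In this sketch the system temporary directory stands in for the example path, simply because it exists on any machine:

```ruby
require "pathname"
require "tmpdir"

# Dir.tmpdir is an assumed stand-in for the example path.
path = Pathname(Dir.tmpdir)

path.directory?   # => true
path.file?        # => false
```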
Or whether the file actually exists:
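A sketch along the same lines; the name of the missing file is made up, and is assumed not to exist:

```ruby
require "pathname"
require "tmpdir"

existing = Pathname(Dir.tmpdir)                      # exists on any system
missing  = existing + "no-such-file-pathname-demo"   # assumed not to exist

existing.exist?   # => true
missing.exist?    # false, provided nothing has that name
```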
We can also check whether the current process has permission to either read from or write to the file:
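As a sketch, again with the temporary directory as an assumed stand-in path:

```ruby
require "pathname"
require "tmpdir"

path = Pathname(Dir.tmpdir)

# Both methods answer for the current process, not for users in general.
puts "#{path} can be read from"  if path.readable?
puts "#{path} can be written to" if path.writable?
```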
Of course, these aren’t particularly exciting features; they’re already
fairly accessible as part of the File class thanks to the FileTest
module. But it certainly feels a lot more OO to pass these messages to
the path itself, rather than using some entirely separate static
methods.
Traversal
For my money, though, it’s when traversing the filesystem that
representing paths as Pathname objects really starts to feel
worthwhile.
Let’s imagine that you have the following folder structure:
lib/
+ script.rb
data/
+ file.txt
You want to access file.txt from your script.rb script, but you want
to make sure that this works whatever working directory you run the
script from. That means you need to figure out what the absolute path to
file.txt is, and then reference it using this absolute path.
If you’ve written a gem, for example, you might well have encountered this sort of task before. A solution I often see is something like the following:
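The listing below is a reconstruction of that familiar idiom, not any particular gem’s code; the variable name data_file is an assumption:

```ruby
# Start from this file's directory, go up one level, then down into data/.
data_file = File.expand_path(
  File.join(File.dirname(__FILE__), "..", "data", "file.txt")
)
```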
I see this pattern in gems a lot, and despite having seen it hundreds of times and knowing instinctively what it’s doing, it still throws me a little when I encounter it: there’s so much noise there that I have to actively think about what the author is doing.
Let’s rewrite this to use Pathname, and see if we can’t reveal our
intentions a little more clearly:
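A sketch of how that might look, assuming the folder structure above. The expand_path call is an addition of this sketch, there so that the result is absolute even when __FILE__ is a relative path:

```ruby
require "pathname"

# The current file, then its directory, then that directory's parent,
# then down into data/file.txt.
data_file = Pathname(__FILE__).expand_path.dirname.parent + "data" + "file.txt"
```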
We start by getting a reference to the current file. Then, we go up one level to the directory that the file resides in; then up another to the directory one level above.[1]
The next step, if you’re used to representing files as strings, might
seem odd: we’re just using the + operator to add elements onto the
path, but we’re not adding a separator as we might otherwise do either
manually ("foo" + "/bar") or with File.join. That’s because Pathname
will take care of adding the separators for us every time we
append a new element to the path.
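A small sketch of that behaviour in isolation, with an illustrative base path:

```ruby
require "pathname"

base = Pathname("/usr/local")   # illustrative path

# Each + appends one element and supplies the separator itself.
git = base + "bin" + "git"

git.to_s   # => "/usr/local/bin/git"
```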
I don’t know about you, but to me the second example seems clearer.
We’re not just limited to this simple traversal, either. Let’s imagine we have a path to a file deep in the hierarchy of the file system:
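The exact path is an assumption here, pieced together from the directory names used in the discussion of it; it does not need to exist on disk:

```ruby
require "pathname"

# Assumed example path, several levels deep.
path = Pathname("/some/really/deep/folder/file.txt")

# The individual components, from the top of the hierarchy down.
path.each_filename.to_a   # => ["some", "really", "deep", "folder", "file.txt"]
```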
Imagine we want to work our way up the filesystem from our current
location until we hit a certain point: a directory with a certain name,
for example. There’s no straightforward way to do this with the path
represented as a string, but with Pathname it’s easy:
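A sketch using ascend, with an assumed example path and a hypothetical variable named found to hold the result:

```ruby
require "pathname"

path = Pathname("/some/really/deep/folder/file.txt")   # assumed example path

found = nil

# ascend yields the path itself first, then each parent in turn, ending
# at the filesystem root.
path.ascend do |ancestor|
  if ancestor.basename.to_s == "some"
    found = ancestor
    break
  end
end

found   # => #<Pathname:/some>
```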
Here, we climb upwards through the filesystem (so we get to folder,
then up to deep, then up to really, and so on backwards through the
path). As soon as we find a directory whose name is some, we’ve found
what we’re looking for and so break out of our loop.
(If we wanted to proceed in the opposite direction, that is, to start
with /, then /some, then /some/really, and so on, we could use
descend, which is otherwise identical to ascend.)
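A quick sketch of descend on a shortened, equally hypothetical path, collecting each step as a string:

```ruby
require "pathname"

steps = []

# descend starts at the root and works down to the path itself.
Pathname("/some/really/deep").descend { |step| steps << step.to_s }

steps   # => ["/", "/some", "/some/really", "/some/really/deep"]
```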
The great thing about this type of traversal is that we don’t have to
touch the filesystem at all. The above example, with its path that
doesn’t exist at all, will still execute perfectly well; Pathname has
enough information from the path to know each step along the way, right
up to the filesystem root.
That’s not to say that we can’t access the filesystem when we want to,
though. For example, we don’t have to traverse the filesystem upwards:
we can drill down into it with children:
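A sketch that assumes a Unix-like system, where /etc exists and is readable:

```ruby
require "pathname"

# Unlike ascend and descend, children reads the directory from disk.
entries = Pathname("/etc").children

entries.all? { |entry| entry.is_a?(Pathname) }   # => true
puts entries.sort.first(5)                       # a handful of the entries
```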
The array returned by children contains references (as Pathname
objects, naturally) to all the files and directories in the /etc
directory.
From here, it’s a short leap to powerful and expressive traversal of the
filesystem, especially for methods like children that return arrays
(and so have the full power of Enumerable available to them). For
example, let’s fetch all the directories in the current directory that
have more than 10 files in them:
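One way this might look, wrapped in a hypothetical helper so that it can be pointed at any directory. The name crowded_directories and the readable? guard, which skips directories the process cannot list, are assumptions of this sketch:

```ruby
require "pathname"

# Hypothetical helper: directories directly under root that contain more
# than threshold regular files.
def crowded_directories(root, threshold = 10)
  Pathname(root).children.select do |child|
    child.directory? && child.readable? &&
      child.children.count(&:file?) > threshold
  end
end

crowded_directories(".")   # an array of Pathname objects, possibly empty
```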
Or find all the directories that have at least one CSS file in them:
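Sketched the same way, with a hypothetical helper named directories_with_css that looks only at each directory’s immediate contents:

```ruby
require "pathname"

# Hypothetical helper: directories directly under root that contain at
# least one entry with a .css extension.
def directories_with_css(root)
  Pathname(root).children.select do |child|
    child.directory? && child.readable? &&
      child.children.any? { |entry| entry.extname == ".css" }
  end
end

directories_with_css(".")   # an array of Pathname objects, possibly empty
```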
There’s much more to Pathname than the small snippet I’ve presented
here, but hopefully it’s been enough to convince you to think about
using Pathname the next time you want to represent file paths in Ruby.
It’s powerful, semantic and, since it’s part of the Ruby standard
library, there’s not much excuse not to use it.
[1] In Ruby 2.0 and later, we can simplify this further by calling
Pathname(__dir__), eliminating the need for the call to dirname.