Persisting data in Ruby with PStore
It’s quite common when developing scripts to want to persist data. Configuration variables; the last options chosen; the previous files read; a cache of method return values or the results of complex calculations. To use a database for this sort of simple persistence, even a lightweight database like SQLite, can seem like overkill.
Faced with this situation, many developers opt to “roll their own” persistence solution. Generally, they store what they want to persist in a hash, then write that hash as JSON or YAML to a file. Sometimes these writes happen as the script runs; sometimes they happen at the end of the script’s execution.
But rolling your own in this way is a bad idea. Apart from requiring you to write a lot of boilerplate, it’s also less convenient: you’re forced to write the file manually whenever you want to persist your data. It’s also prone to data loss if multiple processes read from and write to the same file, which forces you either to implement locking or to skip it and risk corrupting your data.
Wouldn’t it be nice if there were a hash or hash-like data structure that persisted itself to disk for us, without us having to worry about serialising data or writing files?
Fortunately, Ruby’s standard library has us covered with its persistent store library, which it calls PStore; it’s both a specific implementation of a persistent store and also an interface for other implementations. Let’s take a look at both the regular form of PStore and an alternate implementation, to see how it can help us persist data simply and safely.
Regular PStore
PStore is part of the standard library, so we can start using it with a simple require; no gems or external dependencies are needed.
We can create a new persistent store by passing a filename (or an IO object) to PStore.new.
If the file doesn’t exist, it’ll be created the first time we write to the store; otherwise, the existing data will be read when a transaction starts.
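As a minimal sketch of that setup, the snippet below requires the library and builds a store. The file name and the use of the system temp directory are assumptions made for illustration; any writable path should do.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical path, chosen only for this illustration.
path = File.join(Dir.tmpdir, "pstore_demo_create.pstore")

# Build the store. No transaction has run yet, so no data has been
# read or written at this point.
store = PStore.new(path)
```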
That’s all the groundwork we need. Our store variable now points to a persistent hash, and we can start writing data to it.
Writing data
If we try to treat our persistent store like a regular hash, though, we won’t have much luck: assigning to a key outside a transaction raises a PStore::Error.
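Here is a small sketch of that failure. The key and file name are made up for the example; the rescue is there only so the script keeps running and we can look at the error.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_outside.pstore"))

error = nil
begin
  # Treating the store like a plain hash, with no transaction open.
  store[:last_file] = "report.txt"
rescue PStore::Error => e
  error = e
  puts "#{e.class}: #{e.message}"
end
```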
That’s because PStore requires you to both read and write data from within transactions. But that’s a blessing, not a hindrance. Transactions neatly solve the problem of multiple processes accessing the same store, since only one transaction can run at a time; they also allow you to roll back your changes if you encounter a problem, ensuring that data is never written in an incomplete or corrupted state.
We can start a new transaction by calling the transaction method on our store and passing it a block to execute. Inside the block, we can make the same modification we tried to make before.
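A minimal sketch of such a transaction follows, with the same hypothetical key as before. The second, read-only transaction is only there to confirm that the value reached the store.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_write.pstore"))

# Changes made inside the block are written out when the block finishes.
store.transaction do
  store[:last_file] = "report.txt"
end

# Passing true opens a read-only transaction; here it reads the value back.
saved = store.transaction(true) { store[:last_file] }
```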
This time an exception isn’t raised, and after our block finishes the changes we’ve made are written to the store automatically. We don’t need to wait for our script to finish or for some other later point for the persistence to happen; it’s a constant process, happening after each transaction.
We can make as many changes to the store as we like during one transaction.
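For instance, one transaction might set several keys and delete another. The keys and values below are invented for the sketch.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_many.pstore"))

store.transaction do
  store[:last_file] = "report.txt"
  store[:options]   = { verbose: true, format: "csv" }
  store.delete(:stale_key) # deletions are part of the same transaction
end

# Read both values back in a single read-only transaction.
last_file, options = store.transaction(true) do
  [store[:last_file], store[:options]]
end
```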
Sometimes, we’ll be calculating data as we progress through a transaction. If we discover part-way through that we can’t or don’t want to finish the transaction, we can call abort. It’ll return from the block and discard any changes that have been made.
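The sketch below shows one way this could look. The find_user helper is hypothetical, standing in for whatever lookup a real script performs, and it always returns nil here so that the abort path is taken.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_abort.pstore"))

# Hypothetical lookup; returns nil to simulate "no user found".
def find_user(_path)
  nil
end

store.transaction do
  store[:last_file] = "report.txt"
  user = find_user("report.txt")
  store.abort unless user # leaves the block and discards the write above
  store[:last_file_user] = user
end

# Nothing was saved, so the key reads back as nil.
after_abort = store.transaction(true) { store[:last_file] }
```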
In a transaction that writes one value, aborts if no user is found, and then stores the user’s details, that final line will only be reached if we actually have a user; otherwise, no data will be stored at all, not even the first value we wrote.
We might also want to do the opposite, exiting the block but saving what we’ve done so far. For that, we can use commit rather than abort.
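A sketch of the commit variant, again with a hypothetical find_user helper that returns nil so the early exit is taken:

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_commit.pstore"))

# Hypothetical lookup; returns nil to simulate "no user found".
def find_user(_path)
  nil
end

store.transaction do
  store[:last_file] = "report.txt"
  user = find_user("report.txt")
  store.commit unless user # leaves the block but keeps the write above
  store[:last_file_user] = user
end

last_file, last_file_user = store.transaction(true) do
  [store[:last_file], store[:last_file_user]]
end
```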
With commit in place of abort, if a user isn’t found, we’ll exit the block before writing the :last_file_user value, but the :last_file value will be written.
Reading data
Reading data is straightforward. Just like when we write data, we need to read it in a transaction, to make sure nobody’s trying to write the data at the same time, but otherwise we just treat the store like a normal hash.
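As a sketch, the snippet below first writes a value so there is something to read, then reads it in a read-only transaction (requested by passing true). The keys are hypothetical.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_read.pstore"))

# Seed the store so there is something to read.
store.transaction { store[:last_file] = "report.txt" }

last_file = nil
missing   = :unset

# A read-only transaction; attempting to write inside it would raise.
store.transaction(true) do
  last_file = store[:last_file]
  missing   = store[:no_such_key] # missing keys read as nil, as with a Hash
end
```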
The transaction method returns whatever the block returns, so if we’re just fetching a single value we can do it in a neat one-liner.
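That one-liner might look like the last line of this sketch; the lines before it are only setup, with a hypothetical file name and key.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_oneliner.pstore"))
store.transaction { store[:last_file] = "report.txt" }

# The block's return value becomes the return value of transaction.
last_file = store.transaction(true) { store[:last_file] }
```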
PStore offers a fetch method too, which, like Hash#fetch, allows you to specify a default value, used when the key doesn’t exist in the store. If we call fetch with a key that doesn’t exist and pass a default as the second argument, fetch returns that second argument. Without a default, a missing key raises a PStore::Error.
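A short sketch of both behaviours, using a hypothetical :theme key that is never written:

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration; :theme is never stored.
store = PStore.new(File.join(Dir.tmpdir, "pstore_demo_fetch.pstore"))

# With a default, a missing key returns the second argument.
theme = store.transaction(true) { store.fetch(:theme, "light") }

# Without a default, a missing key raises PStore::Error.
error = nil
begin
  store.transaction(true) { store.fetch(:theme) }
rescue PStore::Error => e
  error = e
end
```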
The file format
Under the hood, Ruby uses Marshal to convert the hash to something it can write to disk. Marshal produces a binary byte stream that isn’t human-readable; regardless of other issues, that alone makes it unsuitable for some applications.
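To see what that means in practice, the sketch below writes a value and then reads the raw file back. It assumes the file holds nothing but the Marshal dump of the hash, which is how I understand the current implementation to behave; treat that as an assumption rather than a documented guarantee.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical store file for this illustration.
path  = File.join(Dir.tmpdir, "pstore_demo_marshal.pstore")
store = PStore.new(path)

store.transaction { store[:greeting] = "hello" }

# The file on disk is binary data, not text we could comfortably edit.
raw = File.binread(path)
puts raw.inspect

# Assumption: the file is a plain Marshal dump of the underlying hash.
decoded = Marshal.load(raw)
```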
Fortunately, the PStore class isn’t the only way to use PStore; there are other implementations that use the same interface, but read and write data from and to different formats. One of these that also comes with the standard library is YAML::Store.
YAML::Store
YAML::Store, as the name suggests, uses YAML as its data format when persisting the hash; the nice thing about this is that the data as written is human-readable and human-editable. Although finding yourself editing your persisted data regularly should probably prompt you to reconsider how you’re doing things, personally I often find it useful to be able to peek into the data and modify it, removing individual keys or adjusting individual values, something that’s impractical with Marshaled data.
YAML::Store actually inherits from PStore; so, the only difference between the two is how you create them. Rather than requiring pstore and calling PStore.new, we require yaml/store and call YAML::Store.new.
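A minimal sketch, with a hypothetical file name and key, that writes one value and then prints the file so we can see the YAML it contains:

```ruby
require "yaml/store"
require "tmpdir"

# Hypothetical store file for this illustration.
path  = File.join(Dir.tmpdir, "pstore_demo_store.yml")
store = YAML::Store.new(path)

# Same transaction interface as PStore.
store.transaction do
  store[:last_file] = "report.txt"
end

# The file is plain YAML text that we can read and edit by hand.
yaml_text = File.read(path)
puts yaml_text

read_back = store.transaction(true) { store[:last_file] }
```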
After that, the store we have works the same as a regular PStore; we read and write data using transactions in exactly the same way, and it behaves like a hash.
Which one you choose, for my money, depends on whether you value performance or human readability more in the particular application you’re writing. I almost never have a need to optimise the performance of my persistent store, and I generally place a value on being able to view and edit the data; so, I primarily choose YAML::Store. Your mileage, as ever, may vary.
Next time you find yourself wanting to cache small snippets of data, persist options, or otherwise write structured data to disk in a painless way that you know is safe, reach for PStore. Compared to rolling your own solution, you’re likely both to save time and to improve reliability.