Real progress in long-running command-line scripts
Earlier in the week I wrote a post on how, when executing long-running sub-processes in your Ruby scripts, it might be useful to show some kind of progress to your user.
But the progress displayed there was fake: it bore no resemblance to the actual progress of the sub-process. In many respects this is better than nothing, but in others it’s worse: because the progress bar is explicitly indeterminate, we’re not actually lying to the user, but we’re still not distinguishing adequately between the importing state and the hung state.
Surely there must be a way to improve on this, and show real progress to the user? Turns out, there is. With the application of a little more Unix knowledge, we can do just that.
For this example, I’m going to revisit the use case from my first post: importing a database dump into MySQL. For reference, it was the equivalent of running:
mysql some_database < dump.sql
Showing real progress
This solution, while perhaps involving more conceptual understanding of Unix fundamentals, is actually simpler than our previous “fake” solution.
Here’s the code in full. I’ll then break it down and discuss it.
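(A sketch of that code, assuming the ruby-progressbar gem for the bar itself; the variable names are illustrative.)

require "ruby-progressbar"

# Count the lines in the dump up front, using wc.
line_count = `wc -l dump.sql`.split.first.to_i

# Open the dump for reading, and create a progress bar with one
# tick per line in the file.
dump = File.open("dump.sql")
progress_bar = ProgressBar.create(total: line_count)

# Run mysql as a sub-process, opened for writing; the block is
# passed a handle connected to mysql's standard input.
IO.popen("mysql some_database", "w") do |stdin|
  dump.each_line do |line|
    progress_bar.increment
    stdin.puts line
  end
end

dump.close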
What’s going on here, then?
First, we use the Unix wc command to get the total number of lines in the file; it’s much quicker than trying to do this calculation ourselves in Ruby.
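From the sketch above, shelling out with backticks and parsing the first field of wc’s output:

line_count = `wc -l dump.sql`.split.first.to_i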
Next, we open the dump.sql file for reading, and create a new progress bar. We set the total value of the progress bar to the number of lines in the dump; this will allow us to increment once per line and have the progress bar finish when we’ve finished reading the file.
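The corresponding lines from the sketch above (total is ruby-progressbar’s name for the bar’s maximum value):

dump = File.open("dump.sql")
progress_bar = ProgressBar.create(total: line_count)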
This is where the magic happens. We use Ruby’s IO.popen to execute our MySQL command as a sub-process. We tell popen that we’re interested in writing to it by passing a mode of w (just like when we open a file for writing).
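The corresponding call from the sketch above:

IO.popen("mysql some_database", "w") do |stdin|
  # everything we write to stdin arrives on mysql's standard input
end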
This is just like opening a file: if we passed a block to File.open, the block would be passed a handle that would allow it to write to the file. We’re doing the same here, except the handle we have will write to the mysql command’s standard input stream, rather than to a file.
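Side by side, the two block forms look like this (the file name is hypothetical):

File.open("output.sql", "w") do |handle|
  handle.puts "-- this line is written to a file on disk"
end

IO.popen("mysql some_database", "w") do |handle|
  handle.puts "-- this line is written to mysql's standard input"
end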
Now we loop over the lines in the dump. For each one, we increment the progress bar by one, and then pass the line to the MySQL process. stdin here refers to the handle that we opened with popen.
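From the sketch above:

dump.each_line do |line|
  progress_bar.increment
  stdin.puts line
end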
Now, this code might be a bit hard to wrap your head around, especially if you’re not familiar with Unix pipelines. But if you’ve ever used a shell, you’ve used this idiom: it’s what happens under the hood when we pipe between two processes in our shell. In essence, what we’re doing here is:
cat dump.sql | mysql some_database
The only difference is that our Ruby script sits in the middle, orchestrating the pipeline and displaying progress to the user in their shell.
This illustrates neatly one of the foundational principles of the Unix philosophy: everything is a file. Swap out popen for open, and we’re doing exactly the same things that we’d do if we wanted to write the database dump to another file; it’s also the same as we’d do if we wanted to write to a network socket. Across all of these disparate interfaces, Unix presents the same, consistent abstraction: the file.
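To illustrate, the same loop could just as well stream the dump to a server listening on another machine; the host and port here are invented for illustration:

require "socket"

# Stream the dump over TCP instead of into a sub-process; the
# receiving end might be something like `nc -l 9000 | mysql some_database`.
socket = TCPSocket.new("db.example.com", 9000)
dump.each_line do |line|
  progress_bar.increment
  socket.puts line
end
socket.close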
That’s it! We should now see a proper progress bar on the command line that increments as the file is imported into MySQL.
Since the file is never read into memory at any point — it’s processed line-by-line — it should scale to any size of input file that you can throw at it.
There aren’t really many downsides to this approach. I’ve never found it to be measurably slower than simply shelling out to the command (unsurprising, given that we’re doing essentially what the shell does when it pipes between two processes), and even if there were a performance hit, there are many use cases where a slightly slower import with accurate feedback would be the preferable option.
Ta-da: another example of how a little knowledge of Unix processes can help us out in our day-to-day development.