Irregular Expression: Implementing IO::Path in Rakudo

class='post-header-line-1'></div> </div> <div class='post-body entry-content' id='post-body-6687406856608519456' itemprop='description articleBody'> I started off implementing the File::Spec module for Perl 6, <a href="">as explained in the last blog post</a>, but what I really wanted to do was to get some sanity in working with paths through IO::Path objects.&nbsp; And if I was going to do this, I needed to actually edit the core modules of Rakudo.<br /> <br /> Starting with a module which does stringy operations on directory paths, I set out with the goal of making some sort of easy-to-use, path manipulation class in the core.&nbsp; Something like how <a href="">Path::Class</a> works in Perl 5.&nbsp; Then I looked at <a href="">S32::IO</a>, and realized that IO::Path was exactly what I was seeking.&nbsp; But it was only partially implemented, and only for POSIX.<br /> <br /> So I'm going to walk through the steps I took to integrate multiple-OS path support into Rakudo, in the hopes that it will help other people to avoid my mistakes.&nbsp; Which were fairly numerous. 
:/<br /> <br /> This was my first foray into hacking a compiler, and I must confess it was fairly intimidating.&nbsp; I'm no script kiddie, but I hadn't worked on any large open-source projects before.&nbsp; However, the setting is made of Perl 6 code, so it was more or less a matter of integrating what FROGGS and I had already written.<br /> <br /> <h3> Baby's first steps</h3> <br /> I had played around with using File::Spec as a backend to an IO::Path in the <a href="">IO::Path::More module</a>, so it became clear that this was the best way forward for Rakudo.&nbsp; <br /> <br /> I realized right away that IO::Path's interface would have to change to accommodate systems with a concept of volume.&nbsp; I made a small edit to add a <code>$.volume</code> attribute, changed a few lines of code in <code>sub dir</code>, and compiled.&nbsp; Everything worked.&nbsp; I sent a pull request into Rakudo, just to get the interface-changing out of the way first.&nbsp; It tested okay, and was accepted.&nbsp; Wow, I'm good at this!<br /> <br /> Naturally, it all got worse from there.<br /> <br /> <h3> Problem 1: Biting off more than I can chew</h3> <br /> The next step was to add the File::Spec modules into the core.&nbsp; So I started by copying the .pm files into the core directory.&nbsp; Unlike in normal Perl code, the modules aren't included with <code>use Module;</code>.&nbsp; Instead, I edited the <code></code> to add the modules in the correct order.<br /> <br /> Since the File::Spec object needs to inherit from the subclasses, and the other subclasses inherit from File::Spec::Unix, I went with this order:&nbsp; File::Spec::Unix, File::Spec::Win32, File::Spec::Cygwin, File::Spec.&nbsp; Now, some of you might already see a problem with that.&nbsp; To them, I say: "Shh!&nbsp; No spoilers!"<br /> <br /> I realized that I couldn't use the file-scoped class definition (<code>class Foo;</code>) if it was going to end up all in one file, so I switched those out for curly 
braces.&nbsp; Then I rebuilt the makefile and compiled Rakudo.<br /> <br /> That generated a whole mess of errors.&nbsp; And not all nice errors like before -- hadn't even loaded yet!&nbsp; This was a bunch of nqp recursion errors.&nbsp; I tried scaling back a little at a time, even commenting out the entire inside of classes, but I still had issues building Rakudo.&nbsp; Eventually, I had to scale back my approach, and add files to the build one at a time.<br /> <br /> <h3> Problem 2: Inheritance</h3> <br /> It turned out that each file in my additions had its own unique problems.&nbsp; The first was that File::Spec::Unix just, well, disappeared.&nbsp; Unless I completely removed the File::Spec class, and then it worked.<br /> <br /> When you declare a subclass, you're actually adding to the main class's package.&nbsp; So File::Spec::Unix is really <code>File::Spec.WHO&lt;Unix&gt;</code>.&nbsp; So if you initialize File::Spec after File::Spec::Unix, it nukes the previous package and its symbol table.&nbsp; This problem was a lot of no fun to figure out, and I'm glad moritz++ and jnthn++ walked me through it.<br /> <br /> The solution here was simple enough -- stub out File::Spec with <code>class File::Spec { ... }</code> before creating File::Spec::Unix.&nbsp; This is enough to make sure File::Spec will be able to refer to its children.<br /> <br /> Although... 
the last thing I need is some yahoo doing <code>my class File</code> and then complaining about why they can't load File::Spec.&nbsp; So it was at this time I decided to change File::Spec to IO::Spec.&nbsp; Making a File class I can see -- if you decide to replace class IO, then you deserve what you get.<br /> <br /> <h3> Problem 3: The language is in the process of building</h3> <br /> The setting may feel like normal Perl code, but it's not.&nbsp; It's still in the process of being built.&nbsp; It's like a house under construction.&nbsp; If there's only a wood frame, you can still hang a portrait on the "walls" -- but this will only get in the way when it's time to hang the drywall.&nbsp; Things need to come together in the correct order.<br /> <br /> I encountered these problems in a couple of different ways.&nbsp; The first was in using <code>rx//</code> to precompile some patterns before was loaded.&nbsp; Windows-style paths really need this for readability, because they use both kinds of slashes as separators and the concept of volume is fairly complicated.&nbsp; I tried a lot of different ways of formatting, each of which made the build fail in new and unique ways. 
Then I discovered <code>MAKE_REGEX()</code> had loaded a bit before the IO modules.&nbsp; This particular problem seemed to be solved.<br /> <br /> The next couple of problems were caused by <code>$*OS</code> not being in scope at build time, as was way, way down at the end.&nbsp; It works just fine in method calls, but if it's needed as a class attribute, it's simply not in scope when you're building the class.&nbsp; I ended up replicating the same op used in to get the kernel string, so I could have it available earlier.&nbsp; Early enough to figure out which subclass of IO::Spec to use for the main object.<br /> <br /> So remember, object building happens right away, but subs and methods can carry references to things that happen later.<br /> <br /> <h3> Problem 4: <code>Br</code>eaking <code>Pa</code>nda</h3> <br /> Everything seemed like it was working pretty okay at this point.&nbsp; Until I got to the day of the masakism IRC seminar.&nbsp; It was at this point that I tried to install a module for the class, so I couldn't help but notice that Panda seemed to die horribly.&nbsp; I checked out and built the nom branch to use for the duration, but I really had no idea what was going on.<br /> <br /> When I golfed the breakage in Panda, it came down to its "use lib" line -- and it is shockingly simple.&nbsp; Running <code>use lib 'foo'</code> in the REPL alternated between three different errors from the NQP level.&nbsp; Something was seriously wrong.<br /> <br /> My only choice was to work backwards, and see what was causing the problem.&nbsp; I would say <code>git bisect</code> here, but I hadn't actually been making enough commits to effectively get at the problem.&nbsp; So that was the first learning experience here -- commit any time you think you have functional code.<br /> <br /> Anyway, it took a lot of edits, and I got most of the way through a novel while waiting for Rakudo to recompile, but I eventually traced it back to the precompiled regexes that were 
giving me a problem earlier.&nbsp; At this point I was about to give up, and make long, ugly regexes. Finally jnthn++ noted that hadn't loaded when this was trying to run, so I should just move all of the IO modules to later in the build.<br /> <br /> So I swapped back in the <code>rx//</code> syntax, and naturally, it all worked.&nbsp; The lesson here is that running some real software can pick up bugs (although the spectests would have shown it too).&nbsp; And that you really do need to make sure that dependencies come earlier.&nbsp; And most of all, if you're stuck, just ask in #perl6.<br /> <br /> <h3> Spectesting</h3> <br /> Porting the methods I developed for IO::Path::More over to IO::Path went painlessly.&nbsp; I ended up writing an additional set of methods for IO::Spec -- <code>.split</code> and <code>.join</code>, to replace <code>.splitpath</code> and <code>.catpath</code> but with <code>basename</code> and <code>dirname</code> syntax.&nbsp; That allows IO::Path.basename to always have the current item in question, and all of the trailing slashes are gone.<br /> <br /> It was at this point that I started thinking about testing.&nbsp; IO::Spec had literally hundreds of tests from File::Spec in Perl 5.&nbsp; But ironically, IO::Spec wasn't actually specced.&nbsp; So the question became, should IO::Spec be just a backend, or a fully specified part of Perl 6?<br /> <br /> Implementations in Perl 6 are supposed to inform the spec, as well as the other way around.&nbsp; And the more I thought about it, *something* has to do the low-level string operations on paths.&nbsp; And there is no reason to hide it, either.&nbsp; Rakudo already provides access to all of its lower layers via nqp or pir ops, so it made sense to include it as a specced part of Perl 6.<br /> <br /> So I went ahead and edited the Specification for S32::IO, adding IO::Spec and several methods for manipulating IO::Paths.&nbsp; Lots of text.&nbsp; And then even more went into writing tests for 
IO::Path.&nbsp; Naturally, these uncovered some more minor bugs, but that's what tests are for.<br /> <br /> <h3> Patch Approved</h3> <br /> It didn't take all that long for my pull request to get merged, especially after I started writing tests.&nbsp; This whole process took about three weeks.&nbsp; I still have a few minor cleanups to do in the next couple of days, as I resolve a bug in using IO::Spec::Unix.rel2abs.&nbsp; Parrot just added a readlink op on my request, so IO::Path.resolve should be working soon.<br /> <br /> And what we have to show for all this work is a Perl 6 implementation that does file path modifications on Linux, Cygwin, or Windows/DOS.<br /> <code>&nbsp;&nbsp;&nbsp; On Linux:<br />&nbsp;&nbsp;&nbsp; "/foo/./bar//"\&nbsp;&nbsp; .path.cleanup.parent;&nbsp; #yields "/foo"<br />&nbsp;&nbsp;&nbsp; On Windows: <br />&nbsp;&nbsp;&nbsp; "C:/foo\\.\\bar\\".path.cleanup.parent;&nbsp; #yields "C:\foo"<br />&nbsp;&nbsp;&nbsp; On any platform:<br />&nbsp;&nbsp;&nbsp;"C:/baz").volume;&nbsp;&nbsp;&nbsp; #yields "C:"</code><br /> I never finished VMS or Mac Classic, but at this point they can just be dropped in by adding a new IO::Spec subclass.<br /> <br /> So there it is, at long last: sanity in file paths in Perl.&nbsp; I think, if I had known how it was going to go from the beginning, I would have been even <i>more</i> intimidated.&nbsp; Even so, it was just the same kind of debugging I'm used to in modules.&nbsp; Only without the safety rails of the parser, and with a much longer build time.<br /> <br /> But if you can write a module and a class in Perl 6, you already have most of the skills to contribute to the setting.&nbsp; A compiler with internals that feel like a high-level scripting language:&nbsp; that, like digital watches, is a pretty neat idea. 