1. Debugging in the Deep End

    The Problem

    Last week I was working with Aaron on a series of VCV Rack plugin modules, and we were trying to add our own custom graphics to them. VCV Rack uses SVG for its plugin graphics, so Aaron had built a front face for one of our modules, but it wasn't properly aligned. I imported it into Affinity Designer and tried to fix it up, but when I exported my new version and loaded it, suddenly all of our modules had vanished. Since our modules weren't supposed to vanish, and I hadn't done anything obviously wrong, I decided that this must be a bug in VCV Rack. Over the next few hours, I diagnosed and managed to fix this bug, and by the magic of open source and some luck, the PRs got merged the next day. Notably, I made this fix without ever having looked at any of this code before, and I'd like to share the process that made that possible.

    Debugging the SVG

    The first phase of fixing a bug is reproducing it. Here, because the rendering worked fine with Aaron's SVG until I re-exported it, I suspected that some feature used in Affinity's SVG export wasn't supported by the VCV Rack SVG renderer. To figure out which one, I used the first technique: minimize your failing case.

    First, I tried changing export settings, removing groups to flatten the SVG, doing everything I could to strip out features. As I went, I inspected the working and non-working SVGs side by side to see what the differences were. I didn't make much progress this way, so I started from the other direction, building up instead of tearing down. I saved a simple blank grey square, just a single element. When that didn't work, I figured it must have something to do with one of the attributes on the <svg> container element. For reference, a minimal SVG exported from Affinity might look something like:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
              "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
    <svg width="100%" height="100%" viewBox="0 0 240 380" version="1.1"
         xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
         xml:space="preserve" xmlns:serif="http://www.serif.com/"
         style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;
                stroke-miterlimit:1.41421;">
         <rect x="0" y="0" width="240" height="380" style="fill:rgb(235,235,235);"/>
    </svg>
    

    Looking through the attributes on that element, I noticed that it was setting the width and height to 100%, whereas the working one set them to explicit pixel values. I copied the width and height out of the working file into the non-working one… and that fixed it. That suggested a hypothesis: when the <svg> element used percentage dimensions, the renderer wasn't correctly calculating the size of the image, and was simply making it 0 by 0. This was a good enough guess for me, so I set about figuring out how to fix it.
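    For illustration, the repaired opening tag looked something like this (the exact pixel values here are my reconstruction; any explicit dimensions matching the viewBox would do):

```xml
<svg width="240px" height="380px" viewBox="0 0 240 380" version="1.1"
     xmlns="http://www.w3.org/2000/svg">
```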

    Spelunking the Code

    This brings us to the second phase of solving the bug: find a piece of code that's related to the bug, so you have a place to start. I suspected that I could find where the SVGs were loaded in VCV Rack and fix it to handle those percentages correctly. I didn't know exactly how I would handle them yet, I had to see what it was doing first. To find this, I took a simple approach: search the source code for the word "SVG" and see what I could find! I used ripgrep, a very good search tool, but you can use whatever tool you have available as long as it can search all the code at once. If your editor can jump to definitions in a project, searching for related words and then jumping from definition to definition can help you find the part of the code you're interested in very quickly; having good code navigation tools helps a lot.

    Using this, I found SVG widgets, followed their class hierarchy up to rendering components, and eventually found my way to a class calling functions from "nanosvg." Curious, I looked it up and saw that it was a small SVG parser library that produces a set of shape paths. To avoid having to resize all of those paths (I assumed that would otherwise be necessary), I decided to try fixing the bug inside nanosvg instead of inside VCV Rack. Knowing that it was a problem with dimensions, I searched the nanosvg code for the string "width". The second result was a very promising-looking function:

    static void nsvg__parseSVG(NSVGparser* p, const char** attr)
    {
        int i;
        for (i = 0; attr[i]; i += 2) {
            if (!nsvg__parseAttr(p, attr[i], attr[i + 1])) {
                if (strcmp(attr[i], "width") == 0) {
                    p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);
                } else if (strcmp(attr[i], "height") == 0) {
                    p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);
    // …
    

    Writing a Fix

    I'd found a likely location for the bug, so now I changed modes from code spelunking to trying to understand what the code did. Since this function looked so relevant, I first tried to figure out what nsvg__parseSVG was doing. A good way to do that was to find where it was used: it was called in exactly one place, from nsvg__startElement, when an <svg> tag was found, to set up the parsing context from the tag's attributes… perfect. The parameter const char** attr suggested a list of attribute strings, and the usage attr[i] and attr[i + 1] suggested the SVG key/value pairs. Therefore, it seemed like

    if (strcmp(attr[i], "width") == 0)
        p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);
    

    would parse the width coordinate value. To figure out what those extra arguments meant, we want to go look at nsvg__parseCoordinate.

    static float nsvg__parseCoordinate(NSVGparser* p, const char* str,
                                       float orig, float length)
    {
        NSVGcoordinate coord = nsvg__parseCoordinateRaw(str);
        return nsvg__convertToPixels(p, coord, orig, length);
    }
    

    Following those definitions, nsvg__parseCoordinateRaw takes a few steps to get to the unit parsing, but it's largely straightforward parsing of the data, with no fancy processing. The fact that the issue involves percentages suggests that nsvg__convertToPixels is doing something interesting. And indeed, looking at the code for that function made it clear what the length argument did:

    static float nsvg__convertToPixels(NSVGparser* p, NSVGcoordinate c,
                                       float orig, float length)
    {
        NSVGattrib* attr = nsvg__getAttr(p);
        switch (c.units) {
            // …
            case NSVG_UNITS_PERCENT:    return orig + c.value / 100.0f * length;
        }
        return c.value;
    }
    

    It was used as the base value that the percentage should be taken relative to. Then it becomes clear: nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f); turns 100% into 1px. So now that we know exactly what has gone wrong, how do we solve it? Since I didn't know what the percentages should be relative to, I started researching, looking at the Mozilla references for how percent dimensions should behave.
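    To make the arithmetic concrete, here's the percentage branch transcribed into Python (a toy model of nsvg__convertToPixels, not a real binding):

```python
def convert_to_pixels(value, units, orig, length):
    """Toy model of nanosvg's coordinate conversion, percent case only."""
    if units == "%":
        # A percentage is taken relative to `length`, offset by `orig`.
        return orig + value / 100.0 * length
    return value  # other units elided

# The buggy call site passed length=1.0f, so "100%" collapses to one pixel:
print(convert_to_pixels(100.0, "%", 0.0, 1.0))  # 1.0
```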

    I didn't find an answer, but while I was researching, I ran into lots of examples that didn't specify dimensions at all. This made me suspicious: nanosvg handles most SVGs correctly, so it must have some code to handle this case. When you're fixing a bug, often the edge case you're running into is similar to another edge case that's already handled, and you just need to make the existing handling cover your case as well. Since this had to be related to the dimensions, and the dimension handling sets the width field while parsing the <svg> element, I went searching for ->width and .width in the code. I immediately found nsvg__scaleToViewbox, which contains a promising-looking block of code:

    if (p->viewWidth == 0) {
        if (p->image->width > 0) {
            p->viewWidth = p->image->width;
        } else {
            p->viewMinx = bounds[0];
            p->viewWidth = bounds[2] - bounds[0];
        }
    }
    

    This looks like what we want! It will recalculate the width and height if they're set to 0, so we just need to make sure that our 100% sets it to 0 instead of 1. And to fix that, we can simply change:

     if (strcmp(attr[i], "width") == 0) {
    -    p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);
    +    p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f);
     } else if (strcmp(attr[i], "height") == 0) {
    -    p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);
    +    p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f);
     } else if (strcmp(attr[i], "viewBox") == 0) {
    

    And that's the whole fix!
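    To see why passing 0.0f works end to end, here's a toy Python sketch of the two pieces together (my own simplification of the C above, not code from either project):

```python
def parse_dimension(value, units, length):
    """Toy model of parsing an <svg> width/height attribute."""
    if units == "%":
        # With the fix, the call site passes length=0.0, so "100%" parses to 0.
        return 0.0 + value / 100.0 * length
    return value

def view_width(image_width, bounds):
    """Toy model of nsvg__scaleToViewbox's fallback: a zero width is
    recomputed from the shape bounds [minx, miny, maxx, maxy]."""
    if image_width > 0:
        return image_width
    return bounds[2] - bounds[0]

# Before the fix: "100%" became a 1px-wide image, which skips the fallback.
print(view_width(parse_dimension(100.0, "%", 1.0), [0, 0, 240, 380]))  # 1.0
# After the fix: "100%" becomes 0, so the width comes from the shape bounds.
print(view_width(parse_dimension(100.0, "%", 0.0), [0, 0, 240, 380]))  # 240
```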

    Conclusions

    You can use these techniques the next time you have to jump into a large codebase that's unfamiliar. Finding a simple case that fails, making a hypothesis about why it fails, and then searching for terms related to that gives you a big head-start navigating the code. Being able to jump to definitions helps you build a mental map of a thin slice of the code. Even though Rack is about 11K lines of code, and nanosvg is almost 3K, in the process of fixing this bug I only glanced at a few hundred lines of code, and only tried to understand a few dozen of them. The next time you want to try to examine a new codebase, keep these tricks in mind.

  2. You Got Your Race Condition Inside My Package Manager!

    A Case of Broken Builds

    The continuous integration servers at my current job are unfortunately stateful. Every week or so, we run a bunch of configuration processes to reinstall packages to keep the environment clean. One of these reinstalls pip and the Python libraries used by build tools. This morning, I got a message from one of the build engineers telling me that the Python libraries weren't installing correctly anymore. (Even though I'm an intern, I'm apparently one of the office Python experts now.) So, I opened up the build log, and began looking around.

    What was failing was pretty clear:

    Collecting ruamel.yaml
      Using cached ruamel.yaml-0.15.28.tar.gz
    Installing collected packages: ruamel.yaml
     ...
     error: Microsoft Visual C++ 9.0 is required
    

    but why was this suddenly happening now, without us making any changes to our configuration? Also, why are we installing ruamel.yaml? We're not using that!

    Long story short, ruamel.yaml was a transitive dependency of dateparser, an excellent library for parsing natural language dates. It wasn't clear to me why it would suddenly be failing, though, so I decided to investigate further. Looking at the release notes of dateparser, I saw that they had recently pinned ruamel.yaml to <0.14, which we clearly weren't getting. Previously, the version was un-pinned, so I went to look at the release notes for ruamel.yaml, and sure enough, there were releases over the weekend—those must've been what broke it.
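    The pin semantics are easy to sketch in plain Python (a toy comparison of numeric components; pip's real version logic follows PEP 440):

```python
def version_tuple(v):
    """Parse a dotted numeric version like '0.15.28' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def satisfies_pin(installed, pin_max):
    """True if installed < pin_max, mimicking a '<0.14' style pin."""
    return version_tuple(installed) < version_tuple(pin_max)

print(satisfies_pin("0.13.14", "0.14"))  # True: the older release is allowed
print(satisfies_pin("0.15.28", "0.14"))  # False: the release that broke us
```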

    We upgraded our dependency on dateparser to 0.6, and tried again... and it still failed while trying to build the newest version of ruamel.yaml. One period of looking at GitHub blame views, commit histories, and unpacked PyPI tarballs later, I determined that version 0.6 of dateparser as released on PyPI doesn't actually pin the version of ruamel.yaml, despite what the changelog claims. (I opened dateparser issue #342 for this.)

    Since the version wasn't pinned, we just asked pip to install an older version of ruamel.yaml first, hoping it would take priority when dateparser pulled it in. So, we put ruamel.yaml==0.13.14 in our package list, and tried again. Finally, everything worked perfectly.

    Case closed.


    This Fix is a Mystery

    But wait, what's this? Looking closer at the successful build logs, we can see that both ruamel.yaml-0.13.14 and ruamel.yaml-0.15.29 are installing without complaint. What stopped the error? Well, if you look at the version number up at the top, we were installing ruamel.yaml-0.15.28 before; just one hour previously, while I was on my lunch break, an update to ruamel.yaml had been released. Looking back at previous versions on PyPI, I finally figured out what had gone wrong. If you look at the downloads on the PyPI page for ruamel.yaml version 0.15.28, you'll see that there are no Windows wheels. (Wheels are the format that Python uses to distribute compiled C extensions and pre-packaged libraries.) However, if you go to the page for version 0.15.29, you'll see that Windows wheels are present. So, I guess until dateparser fixes their version pinning, we'll just have to hope that ruamel.yaml stays packaged correctly.

    Case closed.


    We Get Very Unlucky

    Oops, nope, it's not. Later in the afternoon, I got another message that some of the builds had failed. Looking at the first build that started failing, again we see that...

    Collecting ruamel.yaml
      Using cached ruamel.yaml-0.15.30.tar.gz
    Installing collected packages: ruamel.yaml
     ...
     error: Microsoft Visual C++ 9.0 is required
    

    okay, this project releases fast: this is the fourth release in two days. In any case, the last few builds succeeded with 0.15.30, so what happened? Well, I don't know for sure, but I have a pretty good guess. I suspect that the release process for ruamel.yaml isn't atomic: they upload their source releases first, and the wheels come a bit later. We were unlucky enough to start a build during that window, when only the source package was available and there were no Windows wheels. But the few builds that got held up and started 4 minutes after the others took long enough that the wheels had become available, and so they installed without any fuss.
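    My guess about the race can be sketched as a toy model of the installer's choice (a big simplification of what pip actually does):

```python
def pick_distribution(available, platform="win"):
    """Prefer a platform-matching wheel; otherwise fall back to the
    source distribution, which needs a C compiler to build."""
    wheel = f"{platform}-wheel"
    if wheel in available:
        return wheel
    return "sdist"

# During the upload window only the sdist exists, so the build machine
# tries to compile the C extension and hits the Visual C++ error:
print(pick_distribution({"sdist"}))               # 'sdist'
# A few minutes later the wheels are up, and installs succeed:
print(pick_distribution({"sdist", "win-wheel"}))  # 'win-wheel'
```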

    This was an exceptionally unlucky coincidence. But I've got a very good story now, and also a much greater appreciation for package managers' .lock files.

  3. I'm a Rust Contributor

    A month or two ago I was in the #rust IRC channel when someone discovered that pow() didn't act quite right for unsigned numbers. This was a bug isolated to a single function, so it seemed like something I could handle. The issue got posted, I claimed it and debugged it, and actually managed to fix it! It took a little while, but very early this morning PR #34942, "Fix overflow checking in unsigned pow()", was merged. Now I'm a contributor to Rust!

    Try to find a small bug in something that you use. Somewhere in there is an issue that's the right size for you to fix. It's a great experience. (I really look forward to the release notes for 1.12...)

    Update: It turns out my fix made it into the 1.13 release, and my name is in the contributors section in the release notes.

  4. Trying Docker

    I only vaguely know what Docker is, but I know the pain of manually installing the requirements of a piece of software. A few days ago I read Creating a Basic Webservice in Rust and it inspired me to go back and look at Docker again. I have several ideas for small webservices knocking around in my head right now, so I figured "why not try to deploy them right?"

    The first thing I did was go to the Docker site. The newest version on OS X is a fancy application with some GUI components and a nice menu-bar whale icon. Installing and getting started is a breeze as well: the getting started on Mac docs show every step with pictures, and even walk you through some examples.

    Trying out some of the examples shows a few cool things right away. For one, trying the nginx example:

    ~ caleb$ docker run -d -p 80:80 --name webserver nginx
    Unable to find image 'nginx:latest' locally
    latest: Pulling from library/nginx
    51f5c6a04d83: Downloading 30.41 MB/51.36 MB
    a3ed95caeb02: Download complete
    51d229e136d0: Download complete
    bcd41daec8cc: Download complete
    

    you can see that multiple downloads take place concurrently; in this snapshot the largest one is still running. Once it completes, just going to localhost shows that we now have a webserver running beautifully, with essentially no setup. I also noticed that when starting the server, I mapped port 80 inside the container to port 80 on the host. Again, coming at this with essentially no experience, I hadn't realized quite how much Docker did for you.

    Now that I've made it through the Mac getting started page, I'm going to move on to the Getting Started With Docker page. It's exceptionally well written and self-explanatory, and the Docker docs (at least all the introductory ones I've looked at so far) are just as helpful.

    ~ caleb$ docker run docker/whalesay cowsay $(fortune)
    _________________________________________
    / I worked in a health food store once. A \
    | guy came in and asked me, "If I melt    |
    | dry ice, can I take a bath without      |
    \ getting wet?" -- Steven Wright          /
    -----------------------------------------
        \
        \
          \
                        ##        .
                  ## ## ##       ==
              ## ## ## ##      ===
          /""""""""""""""""___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
          \______ o          __/
            \    \        __/
              \____\______/
    

    I haven't even really used Docker yet, but from a first impression I love the experience. The docs are great, the interface is great, and even the whale aesthetic is really cute. I'm going to go back to experimenting with this, and next time I'll tell you how to deploy a program.

  5. Trying Pelican

    I wrote my own Markdown parser because I wanted the option to add features, and to generate more semantic HTML than I was used to getting from other static site/blogging platforms. That turned out to be a maintainability problem, and I never got some of the trickier features like lists (and especially nested lists) working properly. Instead, I've decided to use a standard platform, Pelican, to write my blog and build my site. I chose Pelican because, while Ruby is nice, I still prefer Python and use it far more frequently. In addition, Pelican uses reStructuredText, which is extensible, so if I want to add more features to my documents, I can. Even by default, reStructuredText has more features than Markdown does [1].

    [1] See? It has footnotes!