Following Up on a Debian Bug
What happens when you make one contribution to Debian? You get that irresistible urge to follow up. Human beings are predictable like that - we keep doing the things that reward us.
I wrote about my first real contribution to Debian the other day. My merge request was accepted and the patches I made landed in the unstable version of Debian.
I had a full-blown installation of Debian Unstable running on my hard disk by then. It is kind of funny how you install unstable (also called sid). You first install a stable version, then you edit /etc/apt/sources.list to point it at unstable, and then you apt full-upgrade your system. That’s it. You are now “unstable”. I also went ahead and installed KDE Plasma - the winner of the DE show-off in the free software camp. I installed a hundred Matrix clients and discovered a glossy one called Mirage. I reveled in some nostalgia looking at the Synaptic package manager (and XFCE, before installing KDE). I set up Kmail. I set up a schroot for package testing. And by then, Praveen had found a critical bug in the node-yarnpkg package.
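The switch itself is a one-line change in /etc/apt/sources.list (the mirror URL below is the common default; yours may differ):

```
# /etc/apt/sources.list - point apt at unstable instead of a stable codename
deb http://deb.debian.org/debian unstable main
```

An apt update followed by apt full-upgrade then pulls the whole system up to sid.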
Until then we were happy that yarn had shipped, and Praveen even suggested that I write a yarn plugin similar to rubygems-integration as my next contribution. But then things went bad. The yarnpkg we had shipped would fail to install packages if ~/.cache/yarn was empty. I installed yarnpkg on my own fresh unstable and reproduced the same issue. Praveen suspected my hacky second patch to be the cause. But I knew that the function call I had patched out was only doing something non-critical. How could I prove it, though? The only way was to fix the broken package.
I first tried a bisect (although I could have used git bisect, I did this by hand) to see if I could pinpoint a later commit as the cause. (The debian-js team is so fast that they had made a dozen commits on that package after mine.) I did git checkout c122f5c2, which is the last commit I had made, and built the package. This time the yarnpkg command was working even after I removed ~/.cache/yarn. Then I switched to a commit somewhere in between mine and the most recent one and repeated the test. It was failing. I was convinced that something had changed after my commit.
So I bisected even more, but I could only narrow it down to two commits (because one of them wouldn’t build). And one of them included a large import of upstream changes, which meant that figuring out the cause from the commit history alone was not going to be easy.
Therefore I decided on a different approach: I would work off the latest commit and make it work there.
I switched to the HEAD of master and built a local deb. In the next terminal tab, I had a root user session of the same schroot, where I ran dpkg -i on the newly created deb. In the third terminal tab, I would run rm -rf ~/.cache/yarn && rm -rf node_modules followed by yarnpkg add express. Having three terminals open like this allowed me to not miss any step in the rest of the debugging session.
Next I had to figure out where yarnpkg add was failing. If the command had failed with an error message, it would have been super easy to locate the issue. But in our case it wasn’t failing; it was just fetching dependencies and then exiting without doing anything else. Earlier I had written a post about using strace to debug when logs aren’t available. But in this case we had full access to the source code and could add a debugging statement anywhere. So the question was where to add the debugging statement and what to look for.
In situations like these, it is a good use of time to read the source code. If you’re lucky and the code is readable, you can follow the path of execution and figure out what’s happening. I started with src/cli/index.js, adding console.log statements and running the three steps (build the deb, install, run) to see what the output would be. The only trouble was that each build took around 15 seconds, so this wasn’t a very quick feedback cycle. Nevertheless, there was a feedback cycle.
I finally tracked it down to the fetchFromExternal function in tarball-fetcher. Execution was entering this function and stopping somewhere inside. This is where I opened the node streams can of worms.
Streams/pipes are a concept that node had before promises and async/await became a thing. You can have readable streams and writable streams. You read from readable streams and write into writable streams. A network request’s response, for example, is a readable stream which you can pipe into a writable file stream - and you will have saved the network page to a file. With streams come events and callbacks and all of those dark things from the past of nodejs. These days you don’t need streams unless you’re actually “streaming” data. But unfortunately yarn 1 is legacy code, and it uses streams quite extensively.
In the fetchFromExternal function above, there is a stream created by the (now deprecated) request library - the package’s tgz file coming from npm. This is piped to a hashValidateStream, which presumably checks the hash. Then it is piped to an integrityValidateStream (which we can presume checks that the package integrity is maintained). Then it is piped to gunzip-maybe, which inspects the stream and unzips it on the fly if it is gzipped. And finally it is piped into a tar-fs stream, which untars the files into a target directory. The issue could be with any of these pipes.
I had to isolate which pipe was causing the trouble. The hashValidateStream and integrityValidateStream seemed like they would only check the stream and not really transform it. So I removed them from the pipeline by directly connecting the network response to gunzip-maybe. Now I was left with only gunzip-maybe and tar-fs.
I first looked at the internals of gunzip-maybe. It seemed to be using multiple pipes within it. Fortunately for me, gunzip-maybe was an embedded dependency, so I could debug it directly from the source code of node-yarnpkg. After some poking around, I figured out that gunzip-maybe was working correctly. Which meant that tar-fs was the culprit.
Sure enough, when I cd‘ed into the tar-fs directory and tried to run npm test, all the tests were failing. I looked at what had changed between the previous versions of that directory in node-yarnpkg. It seemed the debian/watch file had been changed to make tar-fs use version 1.x only. I downloaded the latest version (2.x) of tar-fs from GitHub and replaced the tar-fs folder manually. This time the tests were more successful. So I built again to see if yarnpkg would succeed, and sure enough it did.
After 58 builds with various trials and errors, I had a working build. I committed the tar-fs directory changes and made an MR. But the CI checks failed. Paolo was quick to help and pointed me to GroupSourcesTutorial. And that’s when I set out to learn what debian/watch actually does. debian/watch is the configuration for uscan, which is a way to keep a Debian package up to date with the upstream versions. Essentially, you specify a URL to check for updates and a regular expression to find newer versions on that page.
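I don’t know the exact contents of node-yarnpkg’s watch entry for tar-fs, but a watch file for an npm-hosted package typically looks something like this (the URL pattern below is illustrative):

```
version=4
https://registry.npmjs.org/tar-fs https://registry.npmjs.org/tar-fs/-/tar-fs-(\d[\d.]*)\.tgz
```

Pinning to 1.x would then be a matter of tightening the version group in the regular expression, for example (1[\d.]*) - presumably the kind of change that had been made here.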
gbp, or git-buildpackage, is a complicated beast. It allows maintaining packages through git. It has a good manual, although it doesn’t cover embedded modules. I’m yet to understand it fully, but it uses different git branches for different purposes. A pristine-tar branch keeps track of upstream (or built?) tars. An upstream branch keeps track of upstream files. And a master branch keeps track of the Debian changes.
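That branch layout is usually spelled out in a gbp.conf. The option names below are real gbp settings, but this particular file is an illustration, not node-yarnpkg’s actual configuration:

```
# debian/gbp.conf (illustrative)
[DEFAULT]
debian-branch = master
upstream-branch = upstream
pristine-tar = True
```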
I suppose it is a little bit complicated because Debian is a project that started long before git itself. One good thing about learning Debian packaging is that it forces you to use some ancient technologies that are nevertheless solid - like IRC, email, tar, and Makefiles.
I tried changing the regexp in debian/watch for tar-fs, but uscan -dd left me with some very cryptic version numbers, so I requested Praveen to fix the tarballs. Praveen was kind enough to document the commands used for the update as well.
Later that day there was an online meeting of the Debian packaging team at the free software camp. I was sleeping when it started and joined only towards the very end. The chat log was interesting, and I will try to decode what they were probably discussing.
There was a discussion about CentOS alternatives. Praveen probably explained how Debian is a community-maintained operating system and how nobody has single-handed control over its future.
There were mentions of youtube-dl and gnome 2-3/mate. They were probably discussing how free software, by design, distributes power: people can fork a project if someone tries to take too much control.
There was a mention of a FreeBSD version of Debian, which is also about the advantages of freedom.
Then they started discussing packaging. First about the environment, with schroot or containers (podman?). The schroot setup was discussed with a link to Abraham Raji’s wiki. (Later they also shared a simple packaging tutorial and a packaging intro video in Malayalam.)
Then a few links about packaging itself were shared: an intro, the new maintainers’ guide, and one specifically for node.
There was a mention of using tracker.debian.org to see the status of any package, for example node-yarnpkg.
Someone asked why some version numbers have -dfsg in them. The link that follows is to the Debian Free Software Guidelines (DFSG). The answer seems to be that for software which doesn’t comply with the DFSG, a DFSG-compliant version gets tagged -dfsg. (That DebianMentorsFaq page seems very useful.)
Someone asked what a tarball is, and the first response was a link to Linear Tape-Open - something I had never heard about. Possibly the tar format has something to do with it, tar being short for “tape archive”? The next link was to the different types of archive formats.
What followed was probably a live quiz where Praveen gave upstream version numbers in different scenarios and the participants had to guess the Debian version number. This is something I truly would have benefitted from attending.
Then they looked at the various links in node-yarnpkg on tracker, including the news item where Praveen updated tar-fs as above.
A huge ecosystem comes with projects of differing quality. Many libraries are written by people for fun, some are half-baked, some are buggy, and some are frankly malicious. Sometimes people take something they have written for a project and publish it as a library for others to use. Although npmjs makes discovering libraries easy, it doesn’t give much indication of quality (except download numbers, which often give no real clue about quality). As a result, a lot of nice projects are built on a lot of poor-quality dependencies.
Sometimes you can avoid complexity by cleanly separating things or by using one dependency that can do the work of two. But that’s where the mindset difference between a developer and a packager comes in. A developer often cares only about working code. A packager also worries about shipping that code to users (and then managing their expectations). When a developer wears both hats (like a devops person), they write better code - code that is easier to maintain, upgrade, and ship.