the john report: 2009

Sunday, September 06, 2009

mount usb disks using by-label

When the screen on my old family laptop died a few years ago, I found the perfect use for it: as a headless server on the family network. Running Ubuntu 9.04 Server, this laptop primarily acts as a file server, dishing out bits to a wide variety of clients (Mac, PC, iPod Touch and an Xbox 360). The internal hard drive is relatively small and used almost exclusively as the system drive; an external 500 GB USB drive holds data, and I recently added a second 1 TB USB drive to give the growing bit collection some much-needed breathing room.

The addition of the second USB drive, however, has exacerbated an annoyance that I've suffered since setting this system up: the challenge of configuring consistent mount points. The USB drives appear as SCSI drives to Ubuntu, and therefore show up as /dev/sd* in the /dev hierarchy. Although Ubuntu magically recognizes the drives as they are plugged into or unplugged from the server, a given drive doesn't consistently show up as the same sd* device. I don't often unplug these drives, but when I do and then plug them back in (or when the system has been restarted), I have to hunt around the /dev directory, find the correct sd* device, and then manually mount the drive. With one drive, I could easily tell which sd* device had just been added to the system; with two, it became troublesome enough that I searched for a better way to handle my USB drives.
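As a rough illustration of the by-label idea from the post title: on udev-based systems such as Ubuntu, each labelled filesystem normally gets a symlink under /dev/disk/by-label that points at whichever sd* device it currently occupies. The Python sketch below simply resolves those symlinks; the function name `labels_to_devices` is made up for this example, and it assumes the drives have already been given filesystem labels.

```python
import os


def labels_to_devices(by_label_dir="/dev/disk/by-label"):
    """Map each filesystem label to the device node its symlink points at.

    Hypothetical helper for illustration; the default path assumes a
    udev-based Linux system.
    """
    mapping = {}
    if not os.path.isdir(by_label_dir):
        # No labelled filesystems are present, or the directory does not exist.
        return mapping
    for label in sorted(os.listdir(by_label_dir)):
        link = os.path.join(by_label_dir, label)
        if os.path.islink(link):
            # Resolve e.g. ../../sdb1 to an absolute path such as /dev/sdb1.
            mapping[label] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for label, device in labels_to_devices().items():
        print(f"{label} -> {device}")
```

Because the label stays with the filesystem no matter which sd* name the kernel assigns, a drive can then be mounted by its label (for example with a LABEL= entry in /etc/fstab) instead of by a device name that may change between boots.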

Friday, August 28, 2009

HN reader survey results

UPDATE: the charts below are smaller than I'd like, so I've posted full-sized versions in a Picasa Web Album if you'd prefer to view something larger.

I've been a fan of the Hacker News aggregation web site ever since I discovered it, and I was intrigued by the quick survey that Dave Lyon posted to HN to gather data for a class in machine learning algorithms. In a little more than a day, Dave collected more than 2,000 responses and posted a page pointing to the data. Jon von Gillem noticed that the standard charts generated by the Google Spreadsheets survey were fairly simplistic, and crunched the data to squeeze out some histogram and scatter plot goodness.

I've started learning more about R recently, and since I learn best by doing, I decided to take a crack at analyzing the data using R. I quickly abandoned the standard plotting package in favour of the excellent ggplot2 package by Hadley Wickham, which made even the somewhat complex colour scatter plots below easy to generate. Before crunching the data, I removed some of the more "suspect" submissions, and in the end decided to remove submissions with reported income > $200k to better highlight the majority of submissions in the scatter plots below.
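The income cut-off described above amounts to a simple row filter. The analysis itself was done in R; the sketch below shows the same idea in Python, using a made-up `income` field name and invented sample rows rather than the actual survey columns or data.

```python
def drop_high_income(rows, cap=200_000, key="income"):
    """Keep only rows whose reported income does not exceed the cap.

    `key` is a hypothetical column name; the real survey data may differ.
    """
    return [row for row in rows if row[key] <= cap]


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [
        {"age": 25, "income": 60_000},
        {"age": 40, "income": 250_000},
        {"age": 31, "income": 200_000},
    ]
    for row in drop_high_income(sample):
        print(row)
```

Rows exactly at the cap are kept, matching the "income > $200k" removal rule stated in the post.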