Sunday, September 06, 2009

mount usb disks using by-label

When the screen on my old family laptop died a few years ago, I found the perfect use for it: a headless server on the family network. Running Ubuntu 9.04 Server, the laptop acts primarily as a file server, dishing out bits to a wide variety of clients (Mac, PC, iPod Touch and an Xbox 360). The internal hard drive is relatively small and is used almost exclusively as the system drive; an external 500 GB USB drive holds data, and I recently added a second 1 TB USB drive to give the growing bit collection some much-needed breathing room.
The addition of the second USB drive, however, has exacerbated an annoyance I've suffered since setting this system up: the challenge of configuring consistent mount points. The USB drives appear as SCSI drives to Ubuntu, and therefore show up as /dev/sd* in the /dev hierarchy. Although Ubuntu magically recognizes the drives as they are plugged into or unplugged from the server, they don't consistently show up as a particular sd* device. I don't often unplug these drives, but when I do and then plug them back in (or when the system has been restarted), I have to hunt around the /dev directory, find the correct sd* device, and then mount the drive manually. With one drive, I could easily tell which sd* device had just been added to the system; with two, it became troublesome enough that I went looking for a better way to handle my USB drives.
Luckily, I quickly stumbled on a howto article describing a hierarchy under /dev that I'd never explored before: /dev/disk (argh!). In a nutshell, labels and other identifiers associated with the filesystems on attached drives can be used to reference those drives consistently and mount them. Better yet, I discovered that the volume labels on my FAT32 drives were recognized automatically, making the mount operation as simple as this:
mount /dev/disk/by-label/jukebox /mnt/jukebox
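Each entry under /dev/disk/by-label is a symlink that udev points at whichever sd* node the drive was given this time, so resolving that symlink answers the "which device is it today?" question directly. Here's a minimal sketch; the function name resolve_label and its optional directory argument are hypothetical, and it assumes GNU readlink (for the -f option).

```shell
# resolve_label LABEL [DIR]
# Hypothetical helper: print the device node that a filesystem label
# currently points at. DIR defaults to /dev/disk/by-label, where udev
# keeps one symlink per label (e.g. jukebox -> ../../sdb1).
resolve_label() {
    dir="${2:-/dev/disk/by-label}"
    if [ ! -L "$dir/$1" ]; then
        echo "no filesystem labelled '$1' in $dir" >&2
        return 1
    fi
    # GNU readlink -f follows the symlink and prints an absolute path
    readlink -f "$dir/$1"
}
```

For example, resolve_label jukebox might print /dev/sdb1 today and /dev/sdc1 after the drive has been unplugged and plugged back in.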
Since the drives can now be referenced consistently, I've *finally* been able to add them to /etc/fstab, making mount and umount operations *much* easier to deal with!
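For reference, an fstab entry that mounts by label looks something like the output of the helper below. It's a sketch under assumptions: the function name fstab_line is made up, the filesystem type defaults to vfat to match FAT32 drives, and "defaults 0 0" (no dump, no boot-time fsck) is just one reasonable choice for an external data drive.

```shell
# fstab_line LABEL MOUNTPOINT [FSTYPE]
# Hypothetical helper: print an /etc/fstab entry that mounts a drive by
# its label instead of by whichever /dev/sd* node it was assigned.
# FSTYPE defaults to vfat (FAT32). "defaults" and the trailing "0 0"
# (no dump, no boot-time fsck) are assumptions for an external data drive.
# Labels containing spaces are not escaped (fstab expects \040).
fstab_line() {
    printf '/dev/disk/by-label/%s\t%s\t%s\tdefaults\t0\t0\n' \
        "$1" "$2" "${3:-vfat}"
}
```

Something like fstab_line jukebox /mnt/jukebox | sudo tee -a /etc/fstab would append the entry; after that, sudo mount /mnt/jukebox and sudo umount /mnt/jukebox only need the mount point. mount also accepts LABEL=jukebox in the first field of an fstab line.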

Friday, August 28, 2009

HN reader survey results

UPDATE: the charts below are smaller than I'd like, so I've posted full-sized versions in a Picasa Web Album if you'd prefer to view something larger.

I've been a fan of the Hacker News aggregation web site ever since I discovered it, and I was intrigued by the quick survey that Dave Lyon posted to HN in order to gather data for a class in machine learning algorithms. In a little more than a day, Dave collected more than 2000 responses, and posted a page pointing to the data collected. Jon von Gillem noticed that the standard charts generated by the Google Spreadsheets survey were fairly simplistic, and crunched the data to squeeze out some histogram and scatter plot goodness.
I've started learning more about R recently, and since I learn best by doing, I decided to take a crack at analyzing the data using R. I quickly abandoned the standard plotting package in favour of the excellent ggplot2 package by Hadley Wickham, which made even the somewhat complex colour scatter plots below easy to generate. Before crunching the data, I removed some of the more "suspect" submissions, and in the end decided to remove submissions with reported income > $200k to better highlight the majority of submissions in the scatter plots below.
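The analysis itself was done in R, but the income filter is simple enough to sketch with awk. This assumes a plain comma-separated export with a header row and no quoted commas; the column position is a parameter because nothing is assumed about the survey's actual layout, and filter_income is a hypothetical name.

```shell
# filter_income FILE COLUMN [MAX]
# Hypothetical sketch: print the header row plus every row whose income
# (field COLUMN of a simple comma-separated file, no quoted commas) is
# present and at most MAX (default 200000). Rows with a blank income are
# dropped. The column layout is an assumption, not the survey's format.
filter_income() {
    awk -F, -v col="$2" -v max="${3:-200000}" \
        'NR == 1 || ($col != "" && $col + 0 <= max)' "$1"
}
```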
The following histograms provide a more detailed profile of HN survey participants by age, income, years in their industry, and hours worked each week. I find the age histogram particularly depressing, as I'm definitely in the long tail of the chart. I wonder why there are so few older geeks? Perhaps we disappear in some Logan's Run-esque fashion?
I was curious to see what relationships might exist between some of the variables captured by the survey, and decided to test how income relates to hours worked each week. The scatter plot below does suggest that those working 20 hours or fewer a week earn less than those working more, but working more than 40 hours a week doesn't appear to increase income dramatically.
I also wanted to test the relationship between age and income while grouping the data by factors such as education and type of employment, to see what impact those factors had. I used the spiffy capabilities of ggplot2 to quickly generate the two scatter plots below. To my (admittedly aging) eyes, no patterns immediately jump out - perhaps generating separate scatter plots for each level of a factor would help highlight any patterns that exist.
I've made available the survey data I used in my analysis (filtered to remove both suspect entries and entries with income > $200k), in case anyone is interested in crunching it themselves. If you do find something interesting, be sure to drop a note in a comment on this post!
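To put a number on that visual impression, the rows can be bucketed by weekly hours and the mean income of each bucket compared. Another awk sketch under the same assumptions (comma-separated, header row, hypothetical column positions and function name); the bucket edges at 20 and 40 hours simply mirror the thresholds mentioned above.

```shell
# mean_income_by_hours FILE HOURS_COL INCOME_COL
# Hypothetical sketch: bucket the rows of a comma-separated file (header
# on line 1) by weekly hours worked and print, for each non-empty bucket,
# its label, row count and mean income (tab-separated). Column positions
# are parameters because the real survey layout is not assumed here.
mean_income_by_hours() {
    awk -F, -v h="$2" -v inc="$3" '
        NR == 1 { next }                  # skip the header row
        {
            hours = $h + 0
            if (hours <= 20)
                b = "<=20"
            else if (hours <= 40)
                b = "21-40"
            else
                b = ">40"
            sum[b] += $inc
            n[b]++
        }
        END {
            # print buckets in a fixed order rather than awk hash order
            split("<=20 21-40 >40", order, " ")
            for (i = 1; i <= 3; i++) {
                k = order[i]
                if (n[k] > 0)
                    printf "%s\t%d\t%.0f\n", k, n[k], sum[k] / n[k]
            }
        }' "$1"
}
```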

Sunday, May 25, 2008

cmd key confusion


When I picked up an old 12" PowerBook G4 last fall (just in time for the launch of Leopard!), I wondered how long it would take for OS X keyboard shortcuts to become reflex actions. Not long at all, as it turns out - especially once I worked out that the Command key on a Mac is similar in concept to the Windows key on a PC.

After several months of switching between a Windows PC (at work) and a Mac (at home), I find the daily transition fairly seamless - with one exception. I use keyboard shortcuts often when browsing, especially to jump to the browser's Search and Address fields. Jumping to the Address field in Safari is Cmd-L, so after an evening of web surfing, the first thing I do the next day at work on my Windows PC is hit Windows-L - which locks my workstation. It's been months now, and I still do this every two or three days! Argh!

Sunday, May 18, 2008

ted, this case is closed

I've been excited about receiving a TED 1001 home energy meter for two weeks now. I downloaded the TED manual and skimmed through it to see what the installation requirements were, and I Googled online forums to learn how I might connect the TED display unit to my home Linux server and archive energy data. On Thursday, I received a new TED on loan for the energy efficiency experiment I had in mind. Even better - with the Victoria Day long weekend, I figured I had plenty of time to install the metering unit and test communications over the power lines to the display unit.

Then I opened my electrical panel and found this:



The main service conductors come into the panel at the top left, just below the yellow sticker. These bad boys are thick and sturdy, and placed far too close together for me to clamp the current transformers (CTs) around them. Better yet, I can't de-energize these cables in order to feed them through the CTs - that would have to be done on the BC Hydro side of the circuit.

So that's it - the monitoring experiment case is closed. BC Hydro does offer reasonable historical billing data, however, so I may still grab coincident weather data and run a quick model to check the efficiency of my home. The TED will be passed on to some lucky colleague at work to play with - I hope their electrical panel is easier to work with than mine!
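A common quick model for this kind of check is a heating-degree-day regression; this is an assumption about what such a model might look like, not a result. Each billing period's consumption is fit as kWh = a + b * HDD by ordinary least squares, where the intercept a approximates baseload and the slope b reflects the energy used per heating degree day. A minimal awk sketch, with a hypothetical input format of one "HDD kWh" pair per line and a made-up function name:

```shell
# fit_degree_days FILE
# Hypothetical sketch: FILE holds one billing period per line as
# "HDD kWh" (heating degree days, energy used), whitespace-separated.
# Prints the ordinary least-squares fit of kWh = a + b * HDD, where a
# approximates baseload and b is kWh per heating degree day.
fit_degree_days() {
    awk '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
        END {
            d = n * sxx - sx * sx
            if (n < 2 || d == 0) {
                print "need two or more distinct HDD values" > "/dev/stderr"
                exit 1
            }
            b = (n * sxy - sx * sy) / d
            a = (sy - b * sx) / n
            printf "a=%.2f b=%.4f\n", a, b
        }' "$1"
}
```

Billing periods of different lengths would ideally be normalized to per-day figures first; this sketch doesn't do that.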

Friday, May 09, 2008

email purgatory

As mentioned in a previous post, I'm a fan of desktop search engines like Google Desktop. With files at home and at work numbering in the tens of thousands, I can't imagine any other way of quickly finding what I'm looking for. I still often organize files into directories by project or category, but I also find that even when I remember exactly which folder a file is in, it's faster to retrieve it via search than to drill down through five folders to get it.

I used to have rich hierarchies of folders for storing work email messages, but now I rely primarily on just two - Keepers and Purgatory. The vast majority of messages I receive at work are either (a) messages I can scan and delete immediately, (b) messages I wish to keep for reference forever, or (c) messages I wish to keep for reference but which have a limited "shelf life". The realization that this third "Purgatory" category exists has helped me prune down my inbox tremendously. I found that I wanted to keep many messages with information that was useful over a span of weeks or months, but that after that period, the information was "stale" and no longer needed. I now drop such messages into the Purgatory folder, which has a simple Outlook archive rule - delete all messages older than 6 months. This sliding 6-month window lets me keep messages while they are useful and prunes out those that are not.
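Outside Outlook, the same sliding-window rule is easy to express with find, assuming a mail folder stored as one file per message (as in a Maildir) whose modification times reflect when each message arrived. The function name purgatory_expired is hypothetical, and 182 days stands in for "6 months".

```shell
# purgatory_expired DIR [DAYS]
# Hypothetical sketch of the 6-month purgatory rule outside Outlook:
# list message files under DIR (e.g. a Maildir folder, one file per
# message) last modified more than DAYS days ago (default 182, about
# six months). It only prints paths; add -delete to find to prune them.
purgatory_expired() {
    find "$1" -type f -mtime +"${2:-182}" -print
}
```

Listing first and deleting only after a glance at the output is the cautious way to run it.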

A final note - a number of new tools are arriving to help people organize and find their email. One of them is Xobni, an Outlook extension that uses social connections gleaned from your email to help you find messages and information about contacts. The Xobni Insight extension beta recently went public, but I had a chance to participate in the private beta a few months ago. My verdict? Although the organization by social context was cool and the email stats were spiffy, I still found myself gravitating towards Google Desktop to find messages. If I wanted to find a recent message from Joe, it was faster to hit CTRL-CTRL to bring up the Google Desktop search box and type "from joe" than to find the contact in the Xobni sidebar in Outlook.


Sunday, May 04, 2008

franken-coder

My first computer (everyone has impossibly fond memories of their first computer, don't they?) was a Radio Shack Colour Computer. With 16k of RAM. For the kids out there, that's not a typo - I'm talkin' 16 kilobytes of space to drop code into. Friends helped me double that to 32k by piggyback-soldering additional DIP-style RAM chips on top (except for the address/strobe pin - that one we bent and connected to some address/strobe line on the motherboard). Yup - those were the days...

The reason I wanted a computer in the first place was so that I could code - I'd been fascinated with the concept of creating my own programs ever since buying a book on programming in Basic and reading it cover-to-cover the year before. I messed around with several little programs from that book (like Hunt the Wumpus) but the program I was most proud of creating was a Tron-style light cycles game for two players. I remember it taking forever to get several timing delays just right!

I never really coded much after leaving high school. Sure, there were Fortran and Pascal and other courses that were part of the standard engineering program, but those courses were never fun in the way coding the Tron light cycles game was. Assigned projects were just that - assigned by someone else, to write a program that didn't "scratch an itch" that I had myself.

The release of Google App Engine has sparked my interest in programming again, especially a style I'll call franken-coding. I expect many see cloud platforms such as GAE as low-cost ways to host the comprehensive applications they wish to write and deploy. I'm more interested in the new kinds of mini-applications that GAE's zero-cost approach will enable. Think Unix-style tools for the cloud: the web-application equivalents of grep, cat, etc. While some folks may draw satisfaction from writing everything from scratch, I'm quite happy to stitch together such mini-applications to accomplish a task.

I'm tempted to make my first GAE application a Tron light cycles game...

Saturday, April 12, 2008

embedded google app engine



Sometimes I get the craziest ideas whilst taking an hour-long walk by my lonesome. Here's one of them.

The recent launch of Google App Engine has everyone all a-twitter about the possibilities offered by cloud computing - especially when the starting cost is zero. I heard the news the day after GAE launched, which means I was half a day late in trying to get one of the 10,000 accounts available during the initial beta release. I did notice, however, that anyone could download the GAE SDK, which, as one of the GAE help pages says:

...includes a web server application that simulates the App Engine environment, including a local version of the datastore, Google Accounts, and the ability to fetch URLs and send email directly from your computer using the App Engine APIs.

The SDK runs on any computer with Python 2.5 and comes packaged for Windows, Mac OS X and Linux.

These facts were tumbling around in my mind during my walk when my thoughts turned to embedded device projects I'd like to work on sometime. I switched my Linksys WRT54G router to the Tomato firmware some time ago, and I've been interested in turning a Linksys NSLU2 (aka "the slug") into a Linux box for dedicated applications (web server, iTunes shared library, etc.).

Then the statement above from the GAE help page popped into my head: "The SDK runs on any computer with Python 2.5...". Wait, WHAT? A few quick Google searches confirmed what I expected: several Linux distros for embedded devices have optional Python 2.5 packages. Another quick check showed that the GAE SDK is less than 3MB in size.
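A first feasibility check on any candidate device is whether its Python is new enough. This sketch is based only on the "Python 2.5" requirement quoted above; the function names are hypothetical, and it assumes the SDK needs a 2.x interpreter at 2.5 or later, so a 3.x interpreter is rejected.

```shell
# gae_python_ok [PYTHON]
# Hypothetical check: succeed if the given interpreter (default: python)
# reports version 2.5 or a later 2.x release. Assumes 3.x will not run
# the SDK; checks nothing else the SDK might need.
gae_python_ok() {
    # Python 2 prints e.g. "Python 2.5.2" to stderr for -V, hence 2>&1
    ver=$("${1:-python}" -V 2>&1) || return 1
    python_version_ok "$ver"
}

# python_version_ok "Python X.Y[.Z]": the version test on its own
python_version_ok() {
    v=${1#Python }
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    [ "$major" = 2 ] && [ "$minor" -ge 5 ] 2>/dev/null
}
```

On a device, gae_python_ok (or gae_python_ok /path/to/python) returns success only if the interpreter found there looks suitable.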

Question: does this mean I can essentially run the GAE development environment on an embedded device? And if so, what the heck would you use that for? I'm not sure yet, but it would be fun to be one of the first to run GAE applications on my router...