I2C using USI on the MSP430

I've released a tiny library that implements I2C master functionality for MSP430 chips that have the USI module (such as the MSP430G2412 or MSP430G2452, which I use often).

The code is on GitHub and is MIT-licensed, so you can do whatever you want with it.

From the README:


  • Small.
  • Works.
  • Reads and writes.
  • Implements repeated start.
  • Uses the Bus Pirate convention.


I wrote this out of frustration. There is lots of code floating around, most of which I didn't like. TI supplies examples which seem to have been written by an intern and never looked at again. The examples are overly complex, unusable in practical applications, ugly and badly formatted, and sometimes even incorrect.

The MSP430G2xx2 devices are tiny and inexpensive and could be used in many applications requiring I2C, but many people avoid them because it is so annoyingly difficult to use I2C with the USI module.

This code is very, very loosely based on the msp430g2xx2usi16.c example from TI, but if you compare you will notice that:

  • the state machine is different (simpler): see doc/usi-i2c-state-diagram.pdf for details,
  • it actually has a useful interface,
  • it is smaller.


This is a simple I2C master that needs to fit on devices that have 128 bytes of RAM, so scale your expectations accordingly. There is no error detection, no arbitration-loss detection, and only master mode is implemented. Addressing is fully manual: it is your responsibility to shift the 7-bit I2C address to the left and add the R/W bit.
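To make the manual addressing concrete, here is how the address byte is formed. This is the general I2C convention, not this library's API; the 0x48 device address is just an example:

```c
#include <stdint.h>

/* The 7-bit I2C address occupies bits 7:1 of the address byte;
   bit 0 is the R/W flag (0 = write, 1 = read). */
#define I2C_WRITE_ADDR(addr7) ((uint8_t)(((addr7) << 1) | 0))
#define I2C_READ_ADDR(addr7)  ((uint8_t)(((addr7) << 1) | 1))

/* For a hypothetical sensor at 7-bit address 0x48:
   the write byte is 0x90 and the read byte is 0x91. */
```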

Have fun!

Bus Pirate Reference Card

The Bus Pirate from Dangerous Prototypes is a really useful tool. I use it regularly to bring up and test new I2C devices, or as a cheap protocol analyzer for I2C and SPI.

Unfortunately, the supplied cables aren't clearly labeled (only color-coded), and I found the reference cards available on the net lacking: they were usually not clearly readable and didn't help much. So I created my own.

You can print it on A4 paper and laminate it (which is what I did), or cut away just the upper color-coding portion and make a smaller laminated card.

Enjoy — Bus Pirate Reference Card (PDF)

Kinetis K and L Series Power Consumption

While considering a new microcontroller to use, I looked at the power consumption figures for the Freescale Kinetis K and L Series. Having some experience with various MSP430 MCUs, I am used to shaving off microamps and running systems on battery power, sometimes for years.

While the real answer can only be obtained with a real setup, one can get some preliminary information from the datasheets. And I found some surprises lurking there.

I was mostly interested in comparing three MCUs: the KL05, the KL25 (basically a KL05 plus USB) and the K20. The K10, which appears in the figures, can be thought of as a K20 without USB, and I didn't expect its power figures to differ from the K20's.

The Kinetis parts have a range of power modes, so comparisons aren't easy. I left out those that basically amount to a full power-down (losing the contents of the RAM). The numbers do not include analog supply current, and are taken at 3V, 25C. The RUN values are at full core frequency (48MHz) with a while(1) loop executing from flash. VLPR and RUN values are with peripheral clocks enabled. Let's look at the first four modes:

As expected, the Cortex M4 core draws significantly more current when running at full speed. About three times more than Cortex M0+, in fact. Note that this doesn't necessarily mean the final product will use more power — if you draw three times more current, but get your computations done three times as fast, you're basically even.
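To put the "basically even" claim in numbers: energy per task is current times voltage times time. A minimal sketch, with made-up currents chosen to illustrate the point (not the datasheet figures):

```c
/* Energy per task in microjoules: E = I * V * t.
   The inputs used below are illustrative, not datasheet values. */
double energy_uj(double current_a, double volts, double seconds) {
    return current_a * volts * seconds * 1e6;
}

/* A hypothetical Cortex M4 drawing 12 mA for 1 ms and a hypothetical
   Cortex M0+ drawing 4 mA for 3 ms both spend 36 uJ on the same task:
   3x the current, 1/3 the time, identical energy. */
```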

Note what begins to happen in STOP mode, though. The power draw for all chips is nearly the same. Continuing on into the lower-power modes (note the change of scale to 5µA max):

Apart from the strange discrepancy with KL25 in VLPS mode, all numbers are nearly identical.

This is not something I expected — although in retrospect, it makes sense: a stopped core and stopped peripherals consume little to no current, and you can stop nearly everything on all MCUs. Still, this has some real-life implications.

My takeaway from this is:

  • if your application spends most time in deep sleep modes, then it doesn't really matter which MCU you choose, from the power consumption point of view they are nearly the same,

  • if you use the CPU a lot, the choice isn't at all clear, and will really depend on the application and its computation patterns.

For comparison, let's look at the MSP430F5510, a chip comparable (from the application point of view) with the KL25, though the CPU is significantly less powerful. Values are in mA, taken at 3V, 25C, RUN values at full frequency (25MHz) running from flash:


The interesting fact here is that the F5510 at 25MHz running code consumes about as much current as the KL05 or KL25 running at 48MHz. ARM Cortex M0+ really is a very power efficient core. It makes sense: the MSP430 is a rather old design. But note how years of experience allow TI to achieve the impressive LPM0 number: 83µA. LPM0 is the lightest sleep mode on the MSP430, roughly comparable to WAIT mode (core stopped, peripheral clocks active, ability to wake up on any interrupt). When you get down to deep-sleep modes (LPM3), the numbers become more comparable.

Take all this with a grain of salt, as it is only based on datasheet numbers. Still, I found the data interesting and worth sharing.

TI MSP430 vs Freescale Kinetis: a price comparison

I've been using MSP430 microcontrollers for a while now, but I wanted to look into something more powerful, though not necessarily much more expensive. I believe component cost is a major factor when designing electronics (for many reasons).

I found the Freescale Kinetis line of devices, which looks really good capability-wise. So let's try to compare the pricing of (some) MSP430 devices and (some) Kinetis K and L series devices, with a Tiva-C (Stellaris) thrown in for comparison.

Here's the table of prices compiled from Farnell. The prices are in PLN, but it really doesn't matter that much, I cared mostly about the relative pricing. Divide by 3 to get USD if you need to. All prices are from the first price break (usually qty 10), and I only chose devices that make sense to me in my projects.


Things I noted:

  • The low-end Kinetis KL05 devices (MKL05Z32VFK4, MKL05Z32VFM4, MKL05Z32VLC4 and MKL05Z32VLF4) are almost the same price as the tiny MSP430G2412 (~$1.60-$2 or so). That is amazing, considering they are modern 48MHz 32-bit ARM processors, have 4 times the flash memory, 16 times the RAM, and many more peripherals (G2412 has no ADC).
  • It makes little sense to use the MSP430G2553 unless you already have designs using it (I do).
  • The cheapest USB-capable device in the Kinetis line is the KL25 (MKL25Z128VFM4). It is slightly more expensive than the USB-capable MSP430F5502, but has much more flash and RAM. Also, Farnell doesn't sell cheaper KL25 devices; I suspect those would end up being the lowest-cost USB solution.
  • The first K10 (MK10DN32VFM5) at roughly $2.50 is a 48MHz Cortex-M4 core with DSP instructions.
  • The first K20 (MK20DN32VLF5) is only 3 times as expensive (at $3.65) as the tiniest MSP430 I use. That is also amazing.
  • Tiva-C (TM4C123AH6PMI) is only price-competitive if you need large memory sizes and floating point. Sadly, it makes zero sense to me as a hobbyist — also because I could not find Code Composer licensing terms for Tiva-C devices (for MSP430, Code Composer is free up to a 16kB code-size limit).

My key takeaways from this:

  • I will switch to Freescale Kinetis (KL05, KL25 and K20) for most of my new projects. The chips are good, inexpensive, and the tools are free (CodeWarrior is free up to 128kB, which covers all low-end devices I might want to use).
  • I will continue to use MSP430 in legacy projects, in really simple projects, or in designs where every microamp counts (another blog post on power consumption of MSP430 vs Kinetis is coming).
  • Sadly, the Tiva-C (formerly Stellaris), which I had high hopes for, isn't an option at all. It's way too expensive, plus it isn't clear if developer tools will be free.

Home-made reflow oven for SMD

I wanted to build electronic devices with SMD components. From what I can see, many people shy away from SMD, trying to solder QFN components with a soldering iron and fighting the trend. I have no idea why.

I find SMD components easy to work with, cheap and small. If you restrict yourself to components 0603 or larger (I use mostly 0603 in my designs) and don't try to use BGA components, you'll be fine. Even the dreaded QFN packages aren't a problem at all.

For reflowing boards you can just get solder paste in a syringe, apply it manually, place your components manually, and then reflow the board. For tiny boards even a hot-air soldering station is enough. For larger ones, it turns out you can get decent results from a cheap assembly involving a tiny oven, a thermocouple (to measure the temperature), a thermocouple interface chip (I used the MAX31855), an SSR (solid-state relay), and a TI MSP430 Launchpad to control it all.
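The control side can be simple: a reflow profile is just a target temperature as a function of time, and an SSR only needs on/off (bang-bang) control. Here is a host-testable sketch of that logic; the segment temperatures and times are rough assumptions, not a calibrated profile:

```c
#include <stdbool.h>

/* One segment of a reflow profile: ramp to end_temp_c over duration_s. */
struct segment { int duration_s; int end_temp_c; };

/* A rough lead-free-style profile (illustrative numbers only). */
static const struct segment profile[] = {
    { 90, 150 },   /* preheat: ramp to 150 C in 90 s */
    { 90, 180 },   /* soak: creep up to 180 C        */
    { 60, 245 },   /* reflow: peak at 245 C          */
    { 60,  50 },   /* cool down                      */
};

/* Target temperature at time t_s, linearly interpolated within a segment. */
int target_temp(int t_s) {
    int t0 = 0, temp0 = 25;   /* start at room temperature */
    for (unsigned i = 0; i < sizeof profile / sizeof profile[0]; i++) {
        int t1 = t0 + profile[i].duration_s;
        if (t_s < t1)
            return temp0 + (profile[i].end_temp_c - temp0) * (t_s - t0)
                           / profile[i].duration_s;
        t0 = t1;
        temp0 = profile[i].end_temp_c;
    }
    return profile[sizeof profile / sizeof profile[0] - 1].end_temp_c;
}

/* Bang-bang decision: drive the SSR on when below target. A little
   hysteresis keeps the relay from chattering around the setpoint. */
bool ssr_on(int measured_c, int target_c) {
    return measured_c < target_c - 2;
}
```

On the Launchpad the loop then runs about once a second: read the thermocouple temperature from the MAX31855 over SPI, call target_temp() with the elapsed time, and set the SSR pin from ssr_on().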

See the graph: it's a plot of temperature vs time. Sure, it isn't perfect, but it's good enough for amateur work.

EuroClojure 2012 impressions

I just got back from the EuroClojure 2012 conference. I won't try to summarize all the talks here, just convey some general impressions that I got:

  • The Clojure community is awesome. People are incredibly nice to each other. Ask a question on the mailing list, and you'll get a number of replies from people much smarter and more experienced than you. Same at the conference: you can approach anyone and expect to get great advice even if the competence gap between you and the other person is comparable in size to the Grand Canyon.

  • A lot of the talks focused on how to approach difficult and complex real-life problems better. Those weren't talks about syntactic sugar, "best practices", or new "features" the language should have. Instead, speakers presented results of months of thinking and experimenting: new architectures, new approaches, new ways to think about problems. If semicolons were discussed, it was a discussion about how to preserve them while doing source-code transformations. This is incredibly important: you can't overestimate the value of listening to smart people talk about ideas they thought hard about and developed for many months. I had several "aha!" moments where I suddenly saw that the architectures I developed were a poor-man's subset of a more general solution.

  • There was a focus on building real systems. I'd say that hobbyists and academics were in the minority: most people were there to learn how to better build code that makes money. It was also interesting to see the range of sizes of companies that use Clojure: from one-person consultancies through startups and small web-development shops, all the way through large financial institutions.

  • The median age of participants was probably between 35 and 40 years. This is clearly not a bunch of teenagers, but rather a group that gained significant experience in various languages and then moved on to Clojure. I think this has a lot to do with my first point — the community is both incredibly nice and mature, which often go together.

Overall, the conference was a success — more so than I expected. The organization was nearly flawless (not an easy task, I know, so hats off to Marco Abis). And it is now very clear that a European conference about Clojure is necessary. I'm looking forward to EuroClojure 2013!

Fixing a Tektronix 2246A oscilloscope

I bought an (obviously used) Tektronix 2246A oscilloscope at an auction site. It wasn't expensive, it looked fairly good, and the seller was nice and provided a start-up warranty.

After getting the scope it turned out that channel 3 was dead. Channels 1, 2 and 4 worked fine, but channel 3 would just draw a flat line, no signals registered.

I negotiated a discount with the seller and set out to repair the thing. Thank heavens for the extremely detailed service manuals! The manual for the 2246A Mod A wasn't difficult to find. I opened the enclosure and, after performing the standard self-test procedures suggested by the manual, started following the channel 3 signal path with an old scope I had.

Tektronix 2246A main board

I quickly discovered that the input signal was fine right until it entered the preamp IC (U230), a custom Tektronix chip. The inputs were fine, the output was flat. I was rather disappointed, as a failed IC would mean I'd need to find a replacement, which would be neither easy nor inexpensive.

But -- I then supplied the same signal to channel 4 and started comparing the two paths. According to the schematics, they should be identical. When I got to the preamp IC for channel 4, I started comparing the preamp ICs for both channels pin by pin. And… it turned out that the enable signal for U230 wasn't there!

One of these four solder joints is unlike the others...

The enable signals are generated in the U600 ("slow logic ic") chip, at the back of the main board. This is where things got weird, as the service manual I had diverged from the board I had in front of me. Not sure why, but clearly my main board was very different, with U600 and U602 placed in completely different locations and rotated. But, with some detective work I managed to figure out which chip is the U600 "slow logic ic". And the enable signal for channel 3 preamp was clearly there on the chip's pin. So I started following the signal and bingo! One of the resistors on the CH 3 EN signal path wasn't soldered at all!

Now, I'm not sure how this is possible -- would a cold solder joint produce such an effect? But clearly the connection was broken. Fortunately, this was easily fixable with a soldering iron, and a couple of minutes later I had my scope with all 4 channels working just fine.

Given that I paid about $200 for the scope, I'm quite happy with the end result.

[Posts like this one are written for search engines: one day someone might be looking for repair tips and find this page. I hope it helps someone.]

Making your Targus Bluetooth Presenter actually usable

Here's a tip that will come in useful if you'd like to use a Targus Bluetooth Presenter (AMP11US or AMP11EU) with your Mac.

It seems that the Targus wireless presenter remote (an otherwise nice device) was designed by a committee of morons, none of whom ever actually gave a Keynote presentation.

Apparently someone at Targus said that the buttons are supposed to be for "Next Slide" and "Previous Slide", which other people took literally, so the buttons just jump over to the next slide, skipping any builds or transitions that you might have in place. All you can have is flat slides. Goodbye builds, goodbye special effects, goodbye bullet points, goodbye movies. The buttons generate "Shift-CursorDown" and "Shift-CursorUp", forcibly skipping over anything that isn't a full slide.

Am I being unnecessarily harsh calling the designers "morons"? I don't think so. If you design a device whose only purpose is to facilitate presenting, and then you create a version specifically for the Mac (I quote from the Targus web page: "the only wireless presenter dedicated to Mac users") to be used with Keynote — is it too much to ask that you design it so that the two keys on the device actually perform useful functions? I mean, seriously — two keys, next step and previous step, how hard is that?

It's also rather clear that most "reviews" that you can find online are junk and the "reviewers" haven't actually used the device to perform presentations.

Fortunately, there is a solution: a small, free utility called KeyRemap4MacBook. Download it, install it (requires a restart), then go into your Mac OS X System Preferences, open the KeyRemap4MacBook panel, and from within its last settings pane open the private.xml file that stores custom key mappings.

Once you get there, enter the following:

<?xml version="1.0"?>
<root>
  <item>
    <name>Targus Wireless Presenter Keynote Fix</name>
    <identifier>private.targus_presenter_keynote_fix</identifier>
    <autogen>--KeyToKey-- KeyCode::CURSOR_DOWN, VK_SHIFT, KeyCode::CURSOR_RIGHT</autogen>
    <autogen>--KeyToKey-- KeyCode::CURSOR_UP, VK_SHIFT, KeyCode::CURSOR_LEFT</autogen>
  </item>
</root>

(KeyRemap4MacBook expects the root/item wrapper and an identifier starting with "private."; the identifier name itself is arbitrary.)

Save the file, go back to the first pane of the KeyRemap configuration and click "Reload XML". You might also want to check the box that says "Don't remap an internal keyboard".

And there you go — what this does is remap the useless key combinations that the Targus Presenter generates to simple "cursor right" and "cursor left", which do the right thing in Keynote.


Check your provider's spam reputation before signing up

Before you choose a hosting provider, always check their reputation with Spamhaus.

I rent virtual servers from Bluemile (formerly also Fivebean). This morning I was greeted with bouncing e-mail, and a quick check showed:

Ref: SBL99441 is listed on the Spamhaus Block List (SBL) 30-Nov-2010 08:26 GMT | SR02

bluemilenetworks.com (escalation)

BLUEMILENETWORKS.COM ignores spam complaints, hosts spammers including known spam operations (ROKSO), assigns non-SBL'd IPs to spammers who get their assigned IPs listed in SBL, provides snowshoe spam configurations, fails to provide rwhois information as required by ARIN (thus providing anonymity for spammers) and generally acts like a network unconcerned with its mailing reputation. Spamhaus thereby treats it accordingly.

Here's a link to the actual updated Spamhaus page, which might be different when you look it up. Notice that it's a /20 block that is being blocked — I can't do anything about it!

I reported this to Bluemile support, who were completely unconcerned: they would deal with it once an engineer came to work in the morning (business hours). Well, several hours have passed, then several business hours have passed, and there are no results to be seen. Meanwhile, almost all my outgoing e-mail keeps bouncing, and I have zero control over that.

I am not a spammer. I take extensive care to make sure my servers never relay any E-mail. And yet here I am, listed in the SBL because I didn't check my provider's reputation.

I am really angry. And for those of you who suggest changing providers: sure, but moving mail and DNS servers is not that easy. It takes time and effort.

One thing is sure: the next time I look for a hosting provider, I will check their IP ranges and check with spamhaus (and other lists, possibly) to see if they are a spammer haven. I don't want to have anything to do with providers that are.

Drobo and DroboShare — a review

Executive summary: don't buy it.

Convinced by people on podcasts (mostly TWiP and This Week in Tech) raving about how great the Drobo (from Data Robotics) storage device is, I decided to budget two into a project I'm working on. Expectations were high — Drobo marketing pushes the devices as easy to use, reliable and flexible. Being a Mac user, I expected an "Apple experience": plug it in and forget it's even there.

Nothing could be farther from the truth.

To begin with, the Drobo is Loud. Not just "loud", but REALLY LOUD. And it isn't the drives, it's the fan that cools the whole thing. To give you an idea of what I mean by Loud, one single Drobo with ultra-quiet WD Green drives spun down is louder than my 8-core Mac Pro with 4 drives and an army of fans in it. It's that loud. To make matters worse, the fan in the Drobo turns on very frequently, even when the drives have been spun down for hours. I don't know why, as the drives are very cool to the touch.

You won't want to have a Drobo under your desk, or anywhere in your vicinity, trust me. And that means the fancy fast FireWire-800 interface that you just paid for is pretty much useless. I used a DroboShare to set up my Drobo in a remote location where I can't hear it.

The DroboShare comes with Gigabit Ethernet, as the marketing will point out. What they won't point out is that it connects to your Drobo with a USB cable, which (together with SMB) pretty much limits your transfer speeds to about 5-8MB/s. That's about 6 times slower than when connected via FireWire-800.

What you should also know is that using the DroboShare brings its own annoyances. As an example, I found it impossible to create a sparsebundle disk image for use with SuperDuper on the Drobo. Go figure. SMB introduces other annoying problems, too — I couldn't copy my music collection onto the Drobo, because some filenames had non-ASCII characters in them.

But all of the above are merely inconveniences. The real issue is with reliability. I bought the Drobo so that I can trust it with my data and forget about failing drives and losing data. Which is why I was slightly miffed when Drobo Dashboard kept crashing on me and reporting unreliable data, annoyed when it hung in the middle of the night when doing my first real backup, slightly angry when support told me my Drobo is defective and needs to be replaced, and really pissed off when the second unit I got corrupted my volume and lost data (when connected to a DroboShare). And then Data Robotics support asked me... whether I have a backup. Or a copy of DiskWarrior.

I have so far been through TWO Drobo replacements. Despite my asking, Data Robotics was unwilling to provide an upgraded (better) unit.

What's worse is that now I don't trust the Drobo at all. I looked closer: the DroboShare seems to use the plain Linux support for HFS+, which is known to be shaky. There is NO FSCK (filesystem check) program for HFS+ on Linux at all! Data Robotics will tell you that you can switch your Drobo between a Mac and a DroboShare and you will be OK — but that seems to be exactly what resulted in my data corruption problems.

Then there is Data Robotics support. When you make "reliable data storage devices", you really need support that cares about customers, reads their emails and responds instantly. Responding after one business day is not enough. Given that support people will forget what was written before, or begin by asking what your address is and when you bought your Drobo, it can easily take a week before you get to the real issue.

What you should also realize is that when your Drobo unit fails, there is no way for you to read data off the drives. You need a working Drobo unit to do that, and it has to recognize the filesystem and mount it.

I bought a Drobo so that I can have reliable data storage without worrying about reliable data storage. The net effect was that I got an unreliable solution that I have to manage, worry about and spend time and money on. That's a failure in my book. I will never buy another Drobo unit again.

[... the above had been drafted, and then 3 months passed ...]

So, today my volume (a Drobo mounted via a DroboShare) unexpectedly disappeared on my Mac. Investigation of the DroboShare logs shows:

MOUNT HFS+ : s_id = [sda1]
scsi: unknown opcode 0xea
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105544
Buffer I/O error on device sda1, logical block 566638188
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105552
Buffer I/O error on device sda1, logical block 566638189
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105560
Buffer I/O error on device sda1, logical block 566638190
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105568
Buffer I/O error on device sda1, logical block 566638191
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105576
Buffer I/O error on device sda1, logical block 566638192
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105584
Buffer I/O error on device sda1, logical block 566638193
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105592
Buffer I/O error on device sda1, logical block 566638194
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105600
Buffer I/O error on device sda1, logical block 566638195
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105608
Buffer I/O error on device sda1, logical block 566638196
usb 1-1: USB disconnect, address 2
SCSI error : <2 0 0 0> return code = 0x70000
end_request: I/O error, dev sda, sector 4533105616
Buffer I/O error on device sda1, logical block 566638197

Buffer I/O error on device sda1, logical block 270838
scsi2 (0:0): rejecting I/O to dead device
Buffer I/O error on device sda1, logical block 270838
scsi2 (0:0): rejecting I/O to dead device
Buffer I/O error on device sda1, logical block 276472
scsi2 (0:0): rejecting I/O to dead device
Buffer I/O error on device sda1, logical block 276472
scsi2 (0:0): rejecting I/O to dead device
Buffer I/O error on device sda1, logical block 422806275
Buffer I/O error on device sda1, logical block 422806276
Buffer I/O error on device sda1, logical block 422806277
scsi2 (0:0): rejecting I/O to dead device
scsi2 (0:0): rejecting I/O to dead device
scsi2 (0:0): rejecting I/O to dead device

Drobo Dashboard doesn't launch; the console shows crash logs for the ddserviced daemon, which crashes every 10 seconds or so. Reinstalling Drobo Dashboard doesn't help.

I am so tired. I bought the Drobo so that I can save time, not so that I can run around and service it all the time, jumping through hoops set up by "support" from Data Robotics. I can already see how I'll have to spend several hours debugging the problems, dealing with support, reinstalling things.

I am posting this so that people are warned. Hopefully people will google for "Drobo" before buying it and I will save someone the hassle and frustration.

Will I lose data again this time?

Don't buy a Drobo.

Who's the sheep?

Cory Doctorow tells us we shouldn't buy iPads. Others join him, whining about how iPad makes us all consumers, sheep, or worse, and how we are headed for a future similar to the one in Idiocracy, where we won't be able to do much except consume digital media.

To all those who complain about how un-hackable the iPad is: what have you hacked recently? Have you actually modified any hardware? Written interesting new software for an existing device? Released anything as open-source perhaps?

Well guess what: I have. I have been using Linux for 15 years, on desktops, laptops, handhelds, servers, tablets and embedded devices. I compiled software, fixed bugs, wrote drivers, improved things. I took to my HP-48G with a soldering iron and expanded the memory. I struggled with Linux on a Sharp Zaurus because I believed in an open device. I had to reverse engineer a Fujitsu tablet and write a Linux driver for a microcontroller that serviced the keys and orientation sensor, just so that I could use Linux on that tablet.

And you know what -- life is too short. I'll be buying an iPad so that I can work on more interesting things than making my hardware work properly. I'll use the device to jot down ideas, read articles, write notes, create presentations, sketch diagrams.

I'm not "losing" anything by buying and using the iPad. Just as I don't have to tinker with the jet engine of the airplane that will take me to London, I don't have to tinker with the internals of the iPad. If I want to tinker and hack, I can build a model airplane or an ultralight. In the computer world, there is Arduino, OpenMoko, and many other similar projects. Tinker and hack to your heart's delight and get educated about how electronics and software work on every level.

But I wonder -- why aren't you hacking and tinkering? Where are those "cool ideas from the creative universe" that you need so badly to give to me to run on my hardware?

More importantly, why aren't you designing something better?

Look at you: you have to actively convince people not to buy iPads. This means the product is so good and people want it so badly, that you have to fight the trend. So why hasn't anyone invented and designed a product that is this good and ships with full schematics and has this all-open architecture you crave?

Why haven't you?

If you actually wanted to write software for your iPad, instead of writing lengthy articles complaining about stuff, nothing prevents you from doing so. Just download the SDK and off you go. Yes, you will need Apple's acceptance to sell your app in the App Store, but it's all about ideals, isn't it, so no worries.

I know. It's easier to complain. But who's the sheep now?

Dear American Website Owner

You live in the United States of America. You design all your forms to have a mandatory "State" field. And then you decide it might just be a good idea to sell to the other 95.4% of the world. But you know, most of the world does not use the concept of a "State" all that much.

The moment you put a "country" field in your form, two things should happen:

  • you should remove the State field if the country isn't set to U.S.A., or at least make it optional
  • you should stop insisting on NANPA-formatted phone numbers (NNN-NNN-NNNN)

I write this after wrestling with a number of unbelievably stupid web forms, all of which required me to provide a "State" name (I don't have one), choose from a list of states, or provide a fake phone number just to satisfy a stupid validator routine.

HTML5, H.264 and Free Software: it's the wrong game!

Two important articles appeared in the last few days, both elaborating on why Mozilla is reluctant to adopt the H.264 video codec. Both are well thought out, but Mozilla is playing the wrong game here.

The implied conclusion is that we should all switch to Theora, since that is unencumbered by patents. Well guess what — pretty much every algorithm used in modern video compression is patented. And there are only so many ways you can slice and 2D-DCT a macroblock. There is no reason to believe that Theora is somehow designed "around" all those patents. It might very well be impossible to create a video codec that doesn't infringe on something. This article has a much more realistic approach to the issue at hand.

The game to play is to either abolish the patent system altogether (it has outlived its usefulness) or to make patent claims on algorithms void and unenforceable. Simply avoiding H.264, the one codec whose patent licensing actually is sorted out, won't get us anywhere. We'll end up adopting something else (be it Theora or On2 VPwhatever) and finding out about patent claims years later, once the codec becomes popular.

Mozilla, HTML5 and H.264

Robert O’Callahan, Mozilla Hacker, wrote an interesting article about why he believes Mozilla should not support the H.264 format.

Other issues aside, I don’t understand why supporting a proprietary Flash plugin from a single vendor is better than opening support for a standardized (albeit similarly patent-encumbered) video format with open-source implementations.

x86 assembly encounter

Every couple of years I have an encounter with assembly programming. It's funny how rules that applied years ago are useless now. The most recent encounter lasted about two weeks and resulted in a 600x speedup in a critical function. But, all wasn't nice and rosy: it was more difficult than I initially planned, took more time and provided a few surprises.

Key takeaway points, so that I can remember them and so that people googling for answers may find them:

  • If you're looking for the PSRLB (parallel shift right logical bytes) SSE instruction, it isn't there. But there are two ways around it: you can either shift words using PSRLW and then mask out the higher bits, or for shifts with a count of one, use (xmm14 contains 1 in every byte and xmm15 is 0):
   psubusb xmm0, xmm14
   pavgb xmm0, xmm15
  • If you need to "horizontally" sum 16 bytes in an XMM register, you will find that the PHADDB instruction doesn't exist, either. There is PHADDW and you could use that in combination with PMADDUBSW (multiply-add bytes to words), but the resulting sequence of instructions is far from optimal. Fortunately, there is a trick: use PSADBW. This computes the sum of absolute differences, which if you use 0 as the source parameter will correspond to your sum, and stores it in two quadwords, which gets you halfway there. In my case, I simply accumulated the results using two quadwords per register, and combined them at the end.
  • There is a nice PMOVMSKB instruction which converts a byte mask to a bit mask. But why, oh why isn't there an instruction which does the opposite? Extracting a 16-bit mask to a 16-byte register turns out to be painful.
  • The last time I programmed in x86 assembly was using a Pentium 4 with the infamous NetBurst architecture. It was an ugly, unpredictable beast, where a mispredicted branch could cost you a fortune in performance terms. It seems that with the newer Nehalem chips Intel really got things right -- latencies for most instructions are small and predictable and overall performance is more consistent across the board. There are fewer traps. And unaligned data accesses aren't penalized as badly as before!
  • LOOP is slower than
   sub rcx, 1
   jnz .loop

Go figure.

  • Thank God and AMD for FINALLY adding registers. Back in the P4 days it was ridiculous: having a 3GHz processor with only 6 usable general-purpose registers and 8 SIMD ones sounded like a joke.

And the final observation: just as several years ago, the state of x86 assemblers is a sad, sad affair. To use a construction industry metaphor, an average x86 assembler has the complexity and usefulness of a hammer, while the DSP world is using high-speed mag-rail blast-o-matic nail guns with automatic feeders and superconducting magnets. I mean, seriously, do I really have to manually track register allocations?! Manually reorder instructions and measure performance to see which arrangement is faster (hoping not to break any dependencies)? Manually update stack pointer offsets after pushing something onto the stack? Write prologs and epilogs for C-linkable functions myself?

If anybody is thinking about writing or improving an x86 assembler, take a look at what Texas Instruments provides for their DSPs. See how you can write "linear assembly" and have your compiler schedule VLIW execution units for you. See how you don't need a piece of paper with a huge table detailing which registers are used in which part of your code.

I find it ridiculous that the most popular computing platform in the world does not have a decent assembler. What's even worse, from the discussions I've seen on the net, people are mostly interested in how fast the assembler is (?!) rather than how much time it saves the programmer.

Anyway, the net result of this encounter is a function that is about 600x faster than the original C implementation. It is about 4x slower than the theoretical limit (calculated assuming only arithmetic ops, no overhead, no memory accesses, and 16 ops per cycle), which I'm very happy with.

x86 assembly, see you in several years!

UPDATE (22.12.2009): I wrote this post hoping it would help people searching for the non-existent PSRLB instruction -- and it worked: I can already see the searches in the logs!

Folder actions on Mac OS X: usable now?

AppleScript release notes for Mac OS X 10.5.6:
Folder Actions now attempts to delay calling “files added” actions on files until they are done being copied. Previous versions called “files added” actions on new files as soon as they appeared in the file system. This was a problem for files that were being copied into place: if the file was sufficiently large or coming over a slow server link, the file might appear several seconds before it was done being copied, and running the folder action before it finished copying would behave badly. Folder Actions now watches if the file is changing size: if it stays the same size for more than three seconds, it is deemed “done”, and the action is called.

My experience with folder actions was that they are one big race condition waiting to bite you. It’s something all the tutorials conveniently glossed over. I kept wondering why Apple kept them if they are so obviously unreliable.

Hopefully that change, while far from correct, will make them usable.

Why I will steal music

Dear Music Industry Executives,

This is to explain why I will “steal” music using BitTorrent, eDonkey and any other easily available means.

I will not do it because I want to save money or because I’m cheap. Far from it. My collection of some 600 CDs, packed into boxes and stored in my basement, should attest to that.

I’ve been trying to pay for music online. I really have. I wanted to use the iTunes store, but it doesn’t sell music or movies in my country. I tried to register as a US customer, but a US-based credit card is required to do that.

I managed to buy several albums from Amazon MP3 right when it opened, before it told me my money was not welcome (“Please note that AmazonMP3.com is currently only available to US customers”). Hulu told me its video library can only be streamed from within the United States.

I own a Sonos system, so I tried to get a Rhapsody subscription. But they didn’t want my money (“The Rhapsody MP3 Store is currently only available inside the United States”). Pandora didn’t want me either (“We are deeply, deeply sorry to say that due to licensing constraints, we can no longer allow access to Pandora for listeners located outside of the U.S.”).

Spotify was a glimmer of hope (it isn’t in the US!), until it told me that “Unfortunately, due to licensing restrictions we are not yet available in your country.”

So, in spite of my best efforts over the past several years, I have been unable to pay for music online. And frankly, I’m tired of trying. What difference does it make which country I’m in? Is my credit card any different from any other one? Are my dollars/euros of lesser value?

You have been playing your silly regional games and you think you can keep playing them forever. Make these people wait. Release the album here, see if it gets traction, then price it higher there. Regionalize DVDs to control releases and pricing. Well, the game is over.

From now on, I will have no qualms about downloading digital music. I will continue to buy from sources that want my money (Magnatune, artists like Ronald Jenkees). For everything else, I will just download it. It only takes a couple of minutes anyway.

So, next time you wonder about why your sales and profits are declining, remember — it’s because you didn’t want my money. And perhaps instead of complaining about P2P, hiring hordes of lawyers or buying expensive ad campaigns it is easier to simply start SELLING your stuff to people who want to pay for it.

GTD apps for the Mac: a subjective review

Having tried all major GTD apps for the Mac I thought I’d summarize my thoughts. While many people try to compare features, I would like to concentrate on a more subjective review. After all, a GTD app is something you use on a daily basis, so it isn’t just tables with features that matter.

Since I have used all the major GTD apps on the Mac extensively (that is, I moved my entire life into each of them in turn), I think I’m qualified to form an opinion.

There are three major contenders in the Mac OS native application GTD arena:
* OmniFocus
* Things
* The Hit List

There used to be iGTD as well — but it has been discontinued now that its developer joined Cultured Code and works on Things. I used iGTD a long time ago, but found it too heavy on features and too crash-prone.

I should also probably mention TaskPaper, which while cool, isn’t really a full-blown GTD app.

Let’s go through each of the three in turn.

OmniFocus (The Omni Group) is the most mature of the apps. It was clearly developed with lots of user feedback. It is quite complex, with a lot of user interface. However, I found that I was spending lots of time on the mechanics of managing tasks instead of actually doing stuff. There is lots of clicking, tabbing and cursoring around to be had in OmniFocus. Plus there is that ubiquitous Omni inspector thing, which some people love and some people hate. I fall into the latter category: I don’t like multi-window apps.

Things is carefully designed to look nice, which scores it a lot of marketing points. It also seems simple to use. I jumped onto it with enthusiasm, also buying the Things Touch iPhone app. But after several weeks problems became apparent. First, Things forces a structure upon you, and that structure isn’t very well designed. There are Projects, Areas and “Focuses”, which don’t really complement each other. In theory, Projects are for ordered, sequential lists of tasks, Areas are for single-shot tasks, and Focuses cut across them, letting you see which tasks you have to do immediately and which can wait. But if that is so, why can’t I schedule a task in a project to be done in the future?

The biggest problem with Things might seem inconsequential until you realize it happens dozens of times a day. Let’s say I have a task in my Inbox. I know it belongs to a project and I need to start it today. I can either drag it to a project or drag it to “Today”, but in either case the task will disappear from my Inbox, and I then have to hunt it down again. This is a complete showstopper.

Until very recently Things also had no keyboard support at all — even the tabbing order seemed wrong. This has been improved in recent versions, but it is clear the developers never use the app without a mouse.

Things Touch was nice until I filled it with tasks. Then it became so slow that it was virtually useless. Unreliable syncing didn’t help either.

I then tried The Hit List — and after an hour moved my life into it and never looked back. It isn’t perfect, but it gets most things right. Here’s what I really like about the app, all of this is in contrast to the others:
* In the Inbox, you can drag tasks to “Today” and they still remain in the Inbox, which lets you then assign them to projects.
* There are lists and folders. You can use the lists as projects, areas, shopping lists, anything you want. There is no artificial distinction between “Areas” and “Projects”.
* Smart folders let you organize tasks your own way (I have a “Stale tasks” smart folder that picks up untouched things for review).
* Insanely great keyboard support. Navigate to a task, press “F”, type several letters from any of your project names, press Enter, and the task gets moved. Similarly, to jump to a project, press “G” and type any subsequence of characters of the project’s name. I wish all apps had this nailed down so well.
* A great interface for repeating tasks. Press “Cmd-R” on a task, type “every week”, and the task becomes a repeating one.
* Tabs that let you keep frequently used views easily accessible.
* Auto-suggested tags that really work (surprisingly).

The overall feeling after several weeks of usage was that I was on top of things. I could manage my tasks easily without spending too much time on the mechanics of it.

The Hit List seems to contain everything I wanted from OmniFocus, but with a much better interface. I just hope the author will keep improving it very carefully, without implementing every feature people ask for. In GTD apps, streamlined interface and usability are more important than features!

Roughly quoting Merlin Mann (43Folders.com): „asking which GTD app is better is like asking if mustard is better than ketchup”. Those are subjective choices, hence my subjective review.

Experiments with parallel genetic programming in Clojure

I’ve been experimenting with genetic programming, learning Clojure as I go. I came to the point where I wanted to make my program parallel.

First of all, I am amazed at how readable and concise the code turns out to be. As an example, take a look at this function:

(defn choose-reproducing-parents [individuals]
  (take 2 
    (sort-by :fitness > 
      (select-randomly 5 individuals))))

It doesn’t get more readable than this!

But the real joy came when I started to parallelize my code. Normally, the process would involve extra libraries, lots of fussing around with locks, and hours spent debugging deadlocks. So let’s look at an example function I needed to parallelize. Generating random code takes quite a bit of CPU time, especially if one needs to generate code for thousands of individuals. There is a function for that:

(defn create-random-individuals [number code-generator]
  (map create-individual-from-code (take number (repeatedly code-generator))))

And here’s the reworked parallel version:

(defn create-random-individuals [number code-generator]
  (pmap create-individual-from-code (take number (repeatedly code-generator))))

Can you spot the difference? Yes, that’s it. The little letter ‘p’ is all it takes for the work to be spread among all of my CPU cores.

Other functions required more work (again, notice how concise and readable the code is):

(defn produce-offspring [population number]
  (take number
	(repeatedly #(reproduce
		      (choose-reproducing-parents population)))))

For those unfamiliar with Clojure, repeatedly produces a lazy sequence whose elements are produced by the function (in this case an anonymous function) supplied as the argument. take simply takes the first number elements of a sequence.

And now for the parallel version:

(defn produce-offspring [population number]
  (pmap reproduce
	(take number
	      (repeatedly #(choose-reproducing-parents population)))))

Encouraged by this, I then moved on to parallelize the most time-consuming step of all GP programs: fitness evaluations. I’ll spare you the boring details of extended parallelization work I did on the function (but see examples above). The result was:

(defn test-generation [population fitness-function]
  (pmap #(set-fitness % (fitness-function %)) population))

After fitness was evaluated for all individuals, the next generation was produced and the fitness evaluations were run in parallel again.

This worked fine, but I thought there should be a better way. pmap limits me to a single multicore machine. This is fine for now, but in the future I plan to move to a distributed cluster, where the synchronous nature of map would be limiting. So I tried to write an asynchronous implementation.

First, I defined my pool: a collection where individuals are gathered once their fitness is evaluated:

(def pool (agent (vector)))

The pool is a Clojure agent. Agents are a synchronization primitive: you can send them actions (functions), which will be queued and executed in order.

As you can see, the pool initially starts as an empty vector. So how do individuals get to the pool? Their fitness needs to be evaluated, and then they need to be added to the pool. It all starts with this function:

(defn run-individuals [individuals]
  (dorun
   (map #(send % test-individual *fitness-function*)
        (map #(add-watcher (agent %) :send pool fitness-tested) individuals))))

First, we make each individual an agent, so that we can add a watcher to it. Watchers are a cool Clojure feature — they let you watch for state changes. Using add-watcher we add a watcher to each individual, telling it to send a fitness-tested action to the pool (which is also an agent, remember?). Then, once we’ve set up watching for state changes, we send a test-individual action to each individual, giving it the fitness function as a parameter. test-individual is a really simple function, all it does is call the fitness evaluation function and return the new state of the individual.

The dorun is necessary, because we’re dealing with lazy sequences and discarding the result (sending agent actions is a side effect). If the dorun wasn’t there, the entire sequence would never get evaluated and actions would never get sent.

Let’s see what happens once the pool is notified of a state change in an individual:

(defn fitness-tested [watched-population individual]
  (let [population (conj watched-population @individual)]
    (if (>= (count population) target-population-size)
      (let [new-population (prune-population population)]
        (run-individuals
         (produce-offspring new-population
                            (- target-population-size (count new-population))))
        new-population)
      population)))

First, we add the new individual to the pool. If we haven’t gathered enough individuals, we simply return the pool with the individual added — this updates the global pool.

The fun part begins when we have enough individuals to produce the next generation. Then, we prune the population, deleting the poorest individuals and produce a new batch of individuals, letting them go using the previously described run-individuals.

If you’ve done any parallel programming, you’ll probably worry about multiple threads modifying (pruning) the population simultaneously. Not to worry — Clojure agents are monitors, you are guaranteed that only one action will execute at a time.

We now have a fully asynchronous parallel GP implementation. Notice how there aren’t any queues, locks, thread pools to manage. All we have is a single global variable and a couple of simple functions. We don’t need any new data structures! The beauty of this solution is that because we’re using agents and watchers, Clojure does the queueing for us. Look, ma, no queues!

I am very happy with how easy and clean this solution turned out to be. I can now see why people keep raving about Clojure. Somebody finally did some serious thinking and implemented a new approach to parallel programming, not just a rehash of old ideas.

In less than an hour I went from a sequential implementation to a parallel, asynchronous one. And most importantly, the same code still runs on a single-CPU machine with minimal performance impact. I am very impressed.