<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Squarespace Site Server v5.9.2 (http://www.squarespace.com/) on Sat, 13 Mar 2010 18:12:37 GMT--><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><title>Jan Rychter: Blog [EN]</title><link>http://jan.rychter.com/enblog/</link><description></description><lastBuildDate>Sat, 23 Jan 2010 14:55:59 +0000</lastBuildDate><copyright></copyright><language>en-US</language><generator>Squarespace Site Server v5.9.2 (http://www.squarespace.com/)</generator><item><title>HTML5, H.264 and Free Software: it's the wrong game!</title><dc:creator>Jan Rychter</dc:creator><pubDate>Mon, 25 Jan 2010 09:58:22 +0000</pubDate><link>http://jan.rychter.com/enblog/2010/1/25/html5-h264-and-free-software-its-the-wrong-game.html</link><guid isPermaLink="false">329406:3464523:6423343</guid><description><![CDATA[<p>Two <a href="http://weblogs.mozillazine.org/roc/archives/2010/01/video_freedom_a.html">important</a> <a href="http://www.0xdeadbeef.com/weblog/2010/01/html5-video-and-h-264-what-history-tells-us-and-why-were-standing-with-the-web/">articles</a> appeared in the last few days, both elaborating on why Mozilla is reluctant to adopt the <span class="caps">H.264 </span>video codec. Both are well thought out, but Mozilla is playing the wrong game here.</p>

<p>The implied conclusion is that we should all switch to Theora, since that is unencumbered with patents. Well guess what — pretty much every algorithm used in modern video compression is patented. And there are only so many ways you can slice and 2D-DCT a macroblock. There is no reason to believe that Theora is somehow designed &#8220;around&#8221; <strong>all</strong> those patents. It might very well be <em>impossible</em> to create a video codec that doesn&#8217;t infringe on <strong>something</strong>. <a href="http://lockshot.wordpress.com/2009/07/30/whats-the-problem-with-ogg-theora/">This article</a> has a much more realistic approach to the issue at hand.</p>

<p>The game to play is to either abolish the patent system altogether (it has outlived its usefulness), or to make patent claims on algorithms void and unenforceable. Simply avoiding <span class="caps">H.264 </span>just because the licensing situation there is sorted out won&#8217;t get us anywhere. We&#8217;ll end up adopting something else (be it Theora or On2 VPwhatever) and finding out about patent claims years later, once the codec becomes popular.</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-6423343.xml</wfw:commentRss></item><item><title>Mozilla, HTML5 and H.264</title><dc:creator>Jan Rychter</dc:creator><pubDate>Sat, 23 Jan 2010 14:53:22 +0000</pubDate><link>http://jan.rychter.com/enblog/2010/1/23/mozilla-html5-and-h264.html</link><guid isPermaLink="false">329406:3464523:6406518</guid><description><![CDATA[<p>Robert <span class="caps">O&#8217;C</span>allahan, Mozilla Hacker, <a href="http://weblogs.mozillazine.org/roc/archives/2010/01/video_freedom_a.html">wrote an interesting article</a> about why he believes Mozilla should not support the <span class="caps">H.264 </span>format.</p>

<p>Other issues aside, I don&#8217;t understand why supporting a proprietary Flash plugin from a single vendor is better than opening support for a standardized (albeit similarly patent-encumbered) video format with open-source implementations.</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-6406518.xml</wfw:commentRss></item><item><title>x86 assembly encounter</title><dc:creator>Jan Rychter</dc:creator><pubDate>Fri, 04 Dec 2009 14:59:48 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/12/4/x86-assembly-encounter.html</link><guid isPermaLink="false">329406:3464523:5986822</guid><description><![CDATA[<p>Every couple of years I have an encounter with assembly programming. It&#8217;s funny how rules that applied years ago are useless now. The most recent encounter lasted about two weeks and resulted in a 600x speedup in a critical function. But, all wasn&#8217;t nice and rosy: it was more difficult than I initially planned, took more time and provided a few surprises.</p>

<p>Key takeaway points, so that I can remember them and so that people googling for answers may find them:</p>

<ul>
<li>If you&#8217;re looking for the <span class="caps">PSRLB </span>(parallel shift right logical bytes) <span class="caps">SSE </span>instruction, it isn&#8217;t there. But there are two ways around it: you can either shift words using <span class="caps">PSRLW </span>and then mask out the higher bits, or for shifts with a count of one, use (xmm14 contains 1 in every byte and xmm15 is 0):</li>
</ul>


<pre class="prettyprint">
   psubusb xmm0, xmm14
   pavgb xmm0, xmm15
</pre>
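
<p>A quick way to convince yourself that this two-instruction sequence really is a per-byte shift right by one is to model the arithmetic. This is only a sketch, written in Clojure rather than assembly; <code>psubusb</code> and <code>pavgb</code> below are hypothetical one-byte models of the instructions (unsigned saturating subtract, and average rounded up), not the instructions themselves:</p>

<pre class="prettyprint lang-lisp">
;; One-byte models of the two SSE instructions (illustrative only).
(defn psubusb [a b] (max 0 (- a b)))      ; unsigned saturating subtract
(defn pavgb   [a b] (quot (+ a b 1) 2))   ; unsigned average, rounded up

;; Subtract 1 with saturation, then average with 0.
(defn shift-right-one [x] (pavgb (psubusb x 1) 0))

;; Compare against a plain shift for every possible byte value.
(def trick-holds
  (every? #(= (shift-right-one %) (bit-shift-right % 1)) (range 256)))
</pre>

<p>For any nonzero byte the subtraction and the rounding cancel out, and zero saturates to zero, so <code>trick-holds</code> should come out true.</p>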



<ul>
<li>If you need to &#8220;horizontally&#8221; sum 16 bytes in an <span class="caps">XMM </span>register, you will find that the <span class="caps">PHADDB </span>instruction doesn&#8217;t exist, either. There is <span class="caps">PHADDW </span>and you could use that in combination with <span class="caps">PMADDUBSW </span>(multiply-add bytes to words), but the resulting sequence of instructions is far from optimal. Fortunately, there is a trick: use <span class="caps">PSADBW.</span> This computes the sum of absolute differences, which if you use 0 as the source parameter will correspond to your sum, and stores it in two quadwords, which gets you halfway there. In my case, I simply accumulated the results using two quadwords per register, and combined them at the end.</li>
</ul>
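
<p>To see why a sum of absolute differences against zero gives the byte sum in two halves, here is a small model of what the instruction computes. It is a Clojure sketch, not assembly; <code>psadbw-vs-zero</code> is a made-up name and the register is modelled as a sequence of 16 byte values:</p>

<pre class="prettyprint lang-lisp">
;; Model of PSADBW against an all-zero operand: |b - 0| = b for unsigned
;; bytes, and the instruction produces one sum per 8-byte half.
(defn psadbw-vs-zero [bytes16]
  (map #(reduce + %) (partition 8 bytes16)))

(def halves (psadbw-vs-zero (range 16)))   ; two partial sums
(def total  (reduce + halves))             ; combine them at the end
</pre>

<p>For the bytes 0 through 15 the two halves are 28 and 92, which add up to 120.</p>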

<ul>
<li>There is a nice <span class="caps">PMOVMSKB </span>instruction which converts a byte mask to a bit mask. But why, oh why isn&#8217;t there an instruction which does the opposite? Extracting a 16-bit mask to a 16-byte register turns out to be painful.</li>
</ul>

<ul>
<li>The last time I programmed in x86 assembly was using a Pentium 4 with the infamous NetBurst architecture. It was an ugly, unpredictable beast, where a mispredicted branch could cost you a fortune in performance terms. It seems that with the newer Nehalem chips Intel really got things right &#8212; latencies for most instructions are small and predictable and overall performance is more consistent across the board. There are fewer traps. And unaligned data accesses aren&#8217;t penalized as badly as before!</li>
</ul>

<ul>
<li><span class="caps">LOOP </span>is slower than </li>
</ul>


<pre class="prettyprint">
   sub rcx, 1
   jnz .loop
</pre>


<p>Go figure.</p>

<ul>
<li>Thank God and <span class="caps">AMD </span>for <span class="caps">FINALLY </span>adding registers. Back in the P4 days it was ridiculous: having a 3GHz processor with only 6 usable general-purpose registers and 8 <span class="caps">SIMD </span>ones sounded like a joke.</li>
</ul>

<p>And the final observation: just as several years ago, the state of x86 assemblers is a sad, sad affair. To use a construction industry metaphor, an average x86 assembler has the complexity and usefulness of a hammer, while the <span class="caps">DSP </span>world is using high-speed mag-rail blast-o-matic nail guns with automatic feeders and superconducting magnets. I mean, seriously, do I really have to manually track register allocations?! Manually reorder instructions and measure performance to see which arrangement is faster (hoping not to break any dependencies)? Manually update stack pointer offsets after pushing something onto the stack? Write prologs and epilogs for C-linkable functions myself?</p>

<p>If anybody is thinking about writing or improving an x86 assembler, take a look at what Texas Instruments provides for their <span class="caps">DSP</span>s. See how you can write &#8220;linear assembly&#8221; and have your compiler schedule <span class="caps">VLIW </span>execution units for you. See how you don&#8217;t need a piece of paper with a huge table detailing which registers are used in which part of your code.</p>

<p>I find it ridiculous that the most popular computing platform in the world does not have a decent assembler. What&#8217;s even worse, from the discussions I&#8217;ve seen on the net, people are mostly interested in how fast the assembler is (?!) rather than how much time it saves the programmer.</p>

<p>Anyway, the net result of this encounter is a function that is about 600x faster than the original C implementation. It is about 4x slower than the theoretical limit (calculated assuming only arithmetic ops, no overhead, no memory accesses, and 16 ops per cycle), which I&#8217;m very happy with.</p>

<p>x86 assembly, see you in several years!</p>

<p><strong><span class="caps">UPDATE </span>(22.12.2009):</strong> I wrote this post hoping that it would help people searching for the nonexistent <span class="caps">PSRLB </span>instruction, and it worked: I can already see it in the logs!</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-5986822.xml</wfw:commentRss></item><item><title>Folder actions on Mac OS X: usable now?</title><category>mac</category><dc:creator>Jan Rychter</dc:creator><pubDate>Thu, 01 Oct 2009 17:46:18 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/10/1/folder-actions-on-mac-os-x-usable-now.html</link><guid isPermaLink="false">329406:3464523:5357824</guid><description><![CDATA[AppleScript <a href="http://developer.apple.com/mac/library/releasenotes/AppleScript/RN-AppleScript/RN-10_6/RN-10_6.html">release notes for Snow Leopard</a> (Mac OS X 10.6):<br />
<blockquote>Folder Actions now attempts to delay calling “files added” actions on files until they are done being copied. Previous versions called “files added” actions on new files as soon as they appeared in the file system. This was a problem for files that were being copied into place: if the file was sufficiently large or coming over a slow server link, the file might appear several seconds before it was done being copied, and running the folder action before it finished copying would behave badly. Folder Actions now watches if the file is changing size: if it stays the same size for more than three seconds, it is deemed “done”, and the action is called.</blockquote>

<p>My experience with folder actions was that they are one big race condition waiting to bite you. It&#8217;s something all the tutorials conveniently glossed over. I kept wondering why Apple kept them if they are so obviously unreliable.</p>

<p>Hopefully that change will make them usable, though it is far from correct: a copy that stalls for more than three seconds will still trigger the action too early.</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-5357824.xml</wfw:commentRss></item><item><title>Adobe applications on Macs</title><dc:creator>Jan Rychter</dc:creator><pubDate>Mon, 28 Sep 2009 20:33:29 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/9/28/adobe-applications-on-macs.html</link><guid isPermaLink="false">329406:3464523:5328788</guid><description><![CDATA[<p><a href="http://www.kungfugrippe.com/post/199148868/adobe-bricks">Merlin Mann</a>:</p>

<blockquote>Because, with Adobe apps, everything from installation through activation through re-activation through software updates through more re-re-reactivations through (HEY! more updates!) is like a giant rectal exam. That I paid for. </blockquote>

<p>Couldn&#8217;t have phrased it better myself.</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-5328788.xml</wfw:commentRss></item><item><title>Why I will steal music</title><dc:creator>Jan Rychter</dc:creator><pubDate>Thu, 17 Sep 2009 19:21:02 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/9/17/why-i-will-steal-music.html</link><guid isPermaLink="false">329406:3464523:5224443</guid><description><![CDATA[<p><span class="full-image-float-right ssNonEditable"><span><img src="http://jan.rychter.com/storage/spotify-not-available.png?__SQUARESPACE_CACHEVERSION=1253215679054" alt=""/></span></span>Dear Music Industry Executives,</p>

<p>This is to explain why I will &#8220;steal&#8221; music using BitTorrent, eDonkey and any other easily available means.</p>

<p>I will not do it because I want to save money or because I&#8217;m cheap. Far from it. My 600-or-so CD collection packaged into boxes and stored in my basement should attest to it.</p>

<p>I&#8217;ve been trying to pay for music online. I really have. I wanted to use the iTunes store, but it doesn&#8217;t sell music or movies in my country. I tried to register as a US customer, but a US-based credit card is required to do that.</p>

<p>I managed to buy several albums from Amazon <span class="caps">MP3 </span>right when it opened, before it told me my money was not welcome (&#8220;Please note that AmazonMP3.com is currently only available to US customers&#8221;). Hulu told me its video library can only be streamed from within the United States.</p>

<p>I own a <a href="http://sonos.com/">Sonos</a> system, so I tried to get a Rhapsody subscription. But they didn&#8217;t want my money (&#8220;The Rhapsody <span class="caps">MP3</span> Store is currently only available inside the United States&#8221;). Pandora didn&#8217;t want me either (&#8220;We are deeply, deeply sorry to say that due to licensing constraints, we can no longer allow access to Pandora for listeners located outside of the <span class="caps">U.S.</span>&#8221;).</p>

<p>Spotify was a glimmer of hope (it isn&#8217;t in the US!), until it told me that &#8220;Unfortunately, due to licensing restrictions we are not yet available in your country.&#8221;</p>

<p>So, in spite of my best efforts over the past several years, I have been unable to pay for music online. And frankly, I&#8217;m tired of trying. What difference does it make which country I&#8217;m in? Is my credit card any different from any other one? Are my dollars/euros of lesser value?</p>

<p>You have been playing your silly regional games and you think you can keep playing them forever. Make these people wait. Release the album here, see if it gets traction, then price it higher there. Regionalize <span class="caps">DVD</span>s to control releases and pricing. Well, the game is over.</p>

<p>From now on, I will have no qualms about downloading digital music. I will continue to buy from sources that want my money (<a href="http://magnatune.com/">Magnatune</a>, artists like <a href="http://www.ronaldjenkees.com/">Ronald Jenkees</a>). For everything else, I will just download it. It only takes a couple of minutes anyway.</p>

<p>So, next time you wonder about why your sales and profits are declining, remember &#8212; it&#8217;s because you didn&#8217;t want my money. And perhaps instead of complaining about <span class="caps">P2P, </span>hiring hordes of lawyers or buying expensive ad campaigns it is easier to simply start <strong><span class="caps">SELLING</span></strong> your stuff to people who <strong>want to pay for it</strong>.</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-5224443.xml</wfw:commentRss></item><item><title>GTD apps for the Mac: a subjective review</title><dc:creator>Jan Rychter</dc:creator><pubDate>Wed, 02 Sep 2009 09:16:44 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/9/2/gtd-apps-for-the-mac-a-subjective-review.html</link><guid isPermaLink="false">329406:3464523:5060736</guid><description><![CDATA[<p><span class="full-image-float-right ssNonEditable"><span><img src="http://jan.rychter.com/storage/gtd-3icons-s.png?__SQUARESPACE_CACHEVERSION=1251883673091" alt=""/></span></span>Having tried all major <span class="caps">GTD </span>apps for the Mac I thought I’d summarize my thoughts. While many people try to compare features, I would like to concentrate on a more subjective review. After all, a <span class="caps">GTD </span>app is something you use on a daily basis, so it isn’t just tables with features that matter.</p>

<p>Since I have used all major <span class="caps">GTD </span>apps on the Mac extensively (i.e. I moved my entire life into each of them in turn), I think I’m qualified to form an opinion.</p>

<p>There are three major contenders in the Mac OS native application <span class="caps">GTD </span>arena:<br />
* <a href="http://www.omnigroup.com/applications/omnifocus/">OmniFocus</a><br />
* <a href="http://culturedcode.com/things/">Things</a><br />
* <a href="http://www.potionfactory.com/thehitlist/">The Hit List</a></p>

<p>There used to be iGTD as well — but it has been discontinued now that its developer joined Cultured Code and works on Things. I used iGTD a long time ago, but found it too heavy on features and too crash-prone.</p>

<p>I should also probably mention TaskPaper, which while cool, isn’t really a full-blown <span class="caps">GTD </span>app.</p>

<p>Let’s go through each of the three in turn.</p>

<p><strong>OmniFocus</strong> (The Omni Group) is the most mature of the apps. It was clearly developed with lots of user feedback. It is quite complex, with a lot of user interface to deal with. However, I found that I was spending lots of time on the mechanics of managing tasks instead of actually doing stuff. There is lots of clicking, tabbing and cursoring around to be had in OmniFocus. Plus there is that ubiquitous Omni inspector thing, which some people love and some people hate. I fall into the second category. I don’t like multiple-window apps.</p>

<p><strong>Things</strong> is carefully designed to look nice, which scores it a lot of marketing points. It also seems simple to use. I jumped onto it with enthusiasm, also buying the Things Touch iPhone app. But after several weeks problems became apparent. First, Things forces a structure upon you and that structure isn’t very well designed. There are projects, areas and „focuses”, which don’t really complement each other. In theory, Projects are for ordered, sequential lists of tasks, Areas for single-shot tasks and Focuses cut across them, letting you see which tasks you have to do immediately and which can wait. But if this is so, why can’t I schedule a task in a project to be done in the future?</p>

<p>The biggest problem with Things might seem inconsequential until you realize it happens dozens of times a day. Let’s say I have a task in my Inbox. I know it belongs to a project and I need to start it today. I can either drag it to a project or drag it to „Today”, but in either case the task will disappear from my Inbox. I then have to hunt it down again by searching for it. This is a complete showstopper.</p>

<p>Until very recently Things also had no keyboard support at all — even the tabbing order seemed wrong. This has been improved in recent versions, but it is clear the developers never use the app without a mouse.</p>

<p>Things Touch was nice until I filled it with tasks. Then it became so slow that it was virtually useless. Unreliable syncing didn’t help either.</p>

<p>I then tried <strong>The Hit List</strong>, and after an hour I moved my life into it and never looked back. It isn’t perfect, but it gets most things right. Here’s what I really like about the app, all of it in contrast to the others:<br />
* In the Inbox, you can drag things to „Today” and they still remain in the Inbox, which lets you then assign them to projects.<br />
* There are lists and folders. You can use these lists as projects, areas, shopping lists, anything you want. No artificial distinction into „Areas” and „Projects”.<br />
* Smart Folders let you organize tasks your own way (I have a „Stale tasks” smart folder that picks up untouched stale things for review).<br />
* Insanely great keyboard support. Navigate to a task, press „F”, and then type several letters from any of your project names, press enter and your task gets moved. Similarly for jumping to projects, use „G” and type any subsequence of characters of your project’s name. I wish all apps had this nailed down so well.<br />
* Great interface for repeating tasks. Press „Cmd-R” on a task, type „every week” and the task becomes a repeating one.<br />
* Tabs that let you keep frequently used views easily accessible.<br />
* Auto-suggested tags that really work (surprisingly).</p>

<p>My overall feeling after several weeks of use was that I was on top of things. I could manage my tasks easily without spending too much time on the mechanics of it.</p>

<p>The Hit List seems to contain everything I wanted from OmniFocus, but with a much better interface. I just hope the author will keep improving it very carefully, without implementing every feature people ask for. In <span class="caps">GTD </span>apps, streamlined interface and usability are more important than features!</p>

<p>Roughly quoting Merlin Mann (43Folders.com): „asking which <span class="caps">GTD </span>app is better is like asking if mustard is better than ketchup”. Those are subjective choices, hence my subjective review.</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-5060736.xml</wfw:commentRss></item><item><title>Experiments with parallel genetic programming in Clojure</title><category>Clojure</category><dc:creator>Jan Rychter</dc:creator><pubDate>Wed, 26 Aug 2009 16:20:54 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/8/26/experiments-with-parallel-genetic-programming-in-clojure.html</link><guid isPermaLink="false">329406:3464523:5011267</guid><description><![CDATA[<p><span class="full-image-float-right ssNonEditable"><span><img src="http://jan.rychter.com/storage/crw_3737-s.png?__SQUARESPACE_CACHEVERSION=1251304399929" alt=""/></span></span>I&#8217;ve been experimenting with genetic programming, learning Clojure as I go. I came to the point where I wanted to make my program parallel.</p>

<p>First of all, I am amazed at how readable and concise the code turns out to be. As an example, take a look at this function:</p>



<pre class="prettyprint lang-lisp">
(defn choose-reproducing-parents [individuals]
  (take 2 
    (sort-by :fitness &gt; 
      (select-randomly 5 individuals))))
</pre>



<p>It doesn&#8217;t get more readable than this!</p>

<p>But the real joy came when I started to parallelize my code. Normally,  the process would involve extra libraries, lots of fussing around with locks, and hours spent debugging deadlocks. So let&#8217;s look at an example function I needed to parallelize. Generating random code takes quite a bit of <span class="caps">CPU </span>time, especially if one needs to generate code for thousands of individuals. There is a function for that:</p>



<pre class="prettyprint lang-lisp">
(defn create-random-individuals [number code-generator]
  (map create-individual-from-code (take number (repeatedly code-generator))))
</pre>



<p>And here&#8217;s the reworked parallel version:</p>



<pre class="prettyprint lang-lisp">
(defn create-random-individuals [number code-generator]
  (pmap create-individual-from-code (take number (repeatedly code-generator))))
</pre>



<p>Can you spot the difference? Yes, that&#8217;s it. The little letter &#8216;p&#8217; is all it takes for the work to be spread among all of my <span class="caps">CPU </span>cores.</p>
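
<p>If you want to try this on something smaller, the following sketch compares the two. <code>slow-square</code> is a hypothetical stand-in for an expensive step such as code generation; it is not part of the GP code:</p>

<pre class="prettyprint lang-lisp">
;; Hypothetical stand-in for an expensive computation.
(defn slow-square [n]
  (Thread/sleep 50)
  (* n n))

;; doall forces the lazy sequences so the work actually happens here.
(def serial-result   (doall (map  slow-square (range 8))))
(def parallel-result (doall (pmap slow-square (range 8))))

;; Same values, same order; only the wall-clock time differs.
(def same-result (= serial-result parallel-result))
</pre>

<p>Wrapping each <code>doall</code> form in <code>time</code> at the REPL should show the difference on a multicore machine.</p>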

<p>Other functions required more work (again, notice how concise and readable the code is):</p>



<pre class="prettyprint lang-lisp">
(defn produce-offspring [population number]
  (take number
        (repeatedly #(reproduce
                      (choose-reproducing-parents population)))))
</pre>



<p>For those unfamiliar with Clojure, <em>repeatedly</em> produces a lazy sequence whose elements are produced by the function (in this case an anonymous function) supplied as the argument. <em>take</em> simply takes the first <em>number</em> elements of a sequence.</p>
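
<p>A tiny illustration of the two, as a sketch with a made-up generator function (a dice roll instead of a parent selection):</p>

<pre class="prettyprint lang-lisp">
;; repeatedly calls the function each time an element is needed;
;; take limits how many elements are ever asked for.
(def rolls (take 5 (repeatedly #(rand-int 6))))

(def roll-count (count rolls))   ; always 5
(def in-range   (every? #(contains? #{0 1 2 3 4 5} %) rolls))
</pre>

<p>Without the <em>take</em>, the sequence would be infinite, so anything that tries to realize all of it would never finish.</p>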

<p>And now for the parallel version:</p>



<pre class="prettyprint lang-lisp">
(defn produce-offspring [population number]
  (pmap reproduce
        (take number
              (repeatedly #(choose-reproducing-parents population)))))
</pre>



<p>Encouraged by this, I then moved on to parallelize the most time-consuming step of all GP programs: fitness evaluations. I&#8217;ll spare you the boring details of extended parallelization work I did on the function (but see examples above). The result was:</p>



<pre class="prettyprint lang-lisp">
(defn test-generation [population fitness-function]
  (pmap #(set-fitness % (fitness-function %)) population))
</pre>



<p>After fitness was evaluated for all individuals, the next generation was produced and the fitness evaluations were run in parallel again.</p>

<p>This worked fine, but I thought there should be a better way. <em>pmap</em> limits me to a single multicore machine. This is fine for now, but in the future I plan to move to a distributed cluster, where the synchronous, one-generation-at-a-time nature of <em>pmap</em> would be limiting. So I tried to write an asynchronous implementation.</p>

<p>First, I defined my <em>pool</em>: a collection where individuals are gathered once their fitness is evaluated:</p>



<pre class="prettyprint lang-lisp">
(def pool (agent (vector)))
</pre>



<p>The <em>pool</em> is a Clojure <em>agent</em>. Agents are a synchronization primitive: you can send them actions (functions), which will be queued and executed in order.</p>
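
<p>In case agents are new to you, here is a minimal sketch of sending actions to one. <code>counter</code> is just an illustrative name:</p>

<pre class="prettyprint lang-lisp">
(def counter (agent 0))

;; Actions are functions of the current state (plus any extra arguments).
(send counter inc)
(send counter + 10)

;; Block until the actions sent so far from this thread have run.
(await counter)

(def final-value @counter)   ; 11
</pre>

<p>The two actions run one after the other, in the order they were sent, on a thread managed by Clojure.</p>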

<p>As you can see, the pool initially starts as an empty vector. So how do individuals get to the pool? Their fitness needs to be evaluated, and then they need to be added to the pool. It all starts with this function:</p>



<pre class="prettyprint lang-lisp">
(defn run-individuals [individuals]
  (dorun
   (map #(send % test-individual *fitness-function*)
        (map #(add-watcher (agent %) :send pool fitness-tested) individuals))))
</pre>



<p>First, we make each individual an agent, so that we can add a watcher to it. Watchers are a cool Clojure feature &#8212; they let you watch for state changes. Using <em>add-watcher</em> we add a watcher to each individual, telling it to send a <em>fitness-tested</em> action to the <em>pool</em> (which is also an agent, remember?). Then, once we&#8217;ve set up watching for state changes, we send a <em>test-individual</em> action to each individual, giving it the fitness function as a parameter. <em>test-individual</em> is a really simple function, all it does is call the fitness evaluation function and return the new state of the individual.</p>
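
<p>A note for readers on later Clojure releases: <code>add-watcher</code> gave way to <code>add-watch</code>, which takes a key and a four-argument function instead of a send type and a target agent. The following is only a rough sketch of the same idea under that assumption; <code>demo-pool</code>, <code>collect</code> and <code>watch-individual</code> are hypothetical names, not the code from this post:</p>

<pre class="prettyprint lang-lisp">
;; Sketch for Clojure versions that provide add-watch instead of add-watcher.
(def demo-pool (agent []))

;; Action run on the pool agent: append one finished individual.
(defn collect [gathered individual]
  (conj gathered individual))

;; The watch function receives key, reference, old state and new state.
(defn watch-individual [individual-agent]
  (add-watch individual-agent :fitness-tested
             (fn [_key _ref _old-state new-state]
               (send demo-pool collect new-state)))
  individual-agent)

(def individual (watch-individual (agent {:code '(+ 1 2)})))
(send individual assoc :fitness 3)

(await individual)   ; the individual has been updated and the watch has fired
(await demo-pool)    ; the pool has processed the collect action
</pre>

<p>After both <code>await</code> calls return, <code>@demo-pool</code> should hold the single updated individual.</p>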

<p>The <em>dorun</em> is necessary because we&#8217;re dealing with lazy sequences and discarding the result (sending agent actions is a side effect). If the <em>dorun</em> weren&#8217;t there, the sequence would never get realized and the actions would never get sent.</p>
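
<p>The effect is easy to reproduce in isolation. In this sketch a counter (a hypothetical <code>calls</code> atom) stands in for the side effect of sending an action:</p>

<pre class="prettyprint lang-lisp">
(def calls (atom 0))

;; Nothing runs yet: map only builds a lazy sequence.
(def pending (map (fn [x] (swap! calls inc) x) [1 2 3]))
(def calls-before @calls)   ; 0

;; dorun walks the sequence for its side effects and discards the values.
(dorun pending)
(def calls-after @calls)    ; 3
</pre>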

<p>Let&#8217;s see what happens once the pool is notified of a state change in an individual:</p>



<pre class="prettyprint lang-lisp">
(defn fitness-tested [watched-population individual]
  (let [population (conj watched-population @individual)]
    (if (&gt;= (count population) target-population-size)
      (let [new-population (prune-population population)]
        (run-individuals
         (produce-offspring new-population
                            (- target-population-size (count new-population))))
        new-population)
      population)))
</pre>



<p>First, we add the new individual to the pool. If we haven&#8217;t gathered enough individuals, we simply return the pool with the individual added &#8212; this updates the global pool.</p>

<p>The fun part begins when we have enough individuals to produce the next generation. Then, we prune the population, deleting the poorest individuals and produce a new batch of individuals, letting them go using the previously described <em>run-individuals</em>.</p>

<p>If you&#8217;ve done any parallel programming, you&#8217;ll probably worry about multiple threads modifying (pruning) the population simultaneously. Not to worry: Clojure agents behave like monitors, so you are guaranteed that only one action will execute on a given agent at a time.</p>

<p>We now have a fully asynchronous parallel GP implementation. Notice how there aren&#8217;t any queues, locks, thread pools to manage. All we have is a single global variable and a couple of simple functions. We don&#8217;t need any new data structures! The beauty of this solution is that because we&#8217;re using agents and watchers, Clojure does the queueing for us. Look, ma, no queues!</p>

<p>I am very happy with how easy and clean this solution turned out to be. I can now see why people keep raving about Clojure. Somebody finally did some serious thinking and implemented a new approach to parallel programming, not just a rehash of old ideas.</p>

<p>In less than an hour I went from a sequential implementation to a parallel asynchronous one. I&#8217;d say that&#8217;s impressive. And most importantly, the same code still runs on a single-CPU machine, with minimal performance impact.</p>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-5011267.xml</wfw:commentRss></item><item><title>Clojure performance revisited</title><category>Clojure</category><dc:creator>Jan Rychter</dc:creator><pubDate>Wed, 29 Jul 2009 11:07:19 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/7/29/clojure-performance-revisited.html</link><guid isPermaLink="false">329406:3464523:4775910</guid><description><![CDATA[<p>Since many people asked me about this, here are some additional notes about Clojure performance.</p>

<p>First, something that came as a surprise to me: the single biggest performance jump I got with my application was achieved by switching from Java 5 to Java 6 (64-bit, Mac OS X). The jump was huge: from interpreting around 850,000 instructions per second right up to 1,300,000 instr/s. That’s an improvement of more than 50% that required <span class="caps">ZERO </span>work on my part. Two clicks in Java Preferences.</p>

<p>Invoking a function is expensive. I am back to old Common Lisp techniques of using macros instead of functions in many places.</p>
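
<p>As a sketch of the kind of rewrite I mean (the names are made up for illustration, and whether it pays off depends on your JVM and Clojure version, so measure before and after):</p>

<pre class="prettyprint lang-lisp">
;; As a function: every use is a call through the var.
(defn square-fn [x] (* x x))

;; As a macro: the multiplication is expanded in place at compile time.
;; v# is an auto-generated local so the argument is evaluated only once.
(defmacro square-m [x]
  `(let [v# ~x] (* v# v#)))

(def a (square-fn 7))
(def b (square-m 7))
</pre>

<p>Both forms give the same value; the macro version just trades a function call for inlined code, at the cost of not being usable as a first-class function.</p>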

<p>Watch out for var lookups (yes, I mentioned this before, but this is important).</p>

<p>The other things I did were application-specific, so there isn’t much point in describing them here.</p>

<p>And if you’re interested in how the <span class="caps">JIT </span>performs, here’s a sample run of the application. As you can see, it takes about a dozen runs until the times stabilize at around 1.5 million interpreted instructions per second. The improvement is dramatic: about 277% from the first run to the last one.</p>



<pre>
<code>
Executed 616154 instructions in 1.543867 seconds, instruction rate: 399097.84 inst/s
Executed 616154 instructions in 0.653465 seconds, instruction rate: 942902.9 inst/s
Executed 616154 instructions in 0.522443 seconds, instruction rate: 1179370.8 inst/s
Executed 616154 instructions in 0.492671 seconds, instruction rate: 1250639.9 inst/s
Executed 616154 instructions in 0.482119 seconds, instruction rate: 1278012.2 inst/s
Executed 616154 instructions in 0.424934 seconds, instruction rate: 1449999.2 inst/s
Executed 616154 instructions in 0.424169 seconds, instruction rate: 1452614.4 inst/s
Executed 616154 instructions in 0.416273 seconds, instruction rate: 1480168.1 inst/s
Executed 616154 instructions in 0.420429 seconds, instruction rate: 1465536.4 inst/s
Executed 616154 instructions in 0.421797 seconds, instruction rate: 1460783.2 inst/s
Executed 616154 instructions in 0.421114 seconds, instruction rate: 1463152.5 inst/s
Executed 616154 instructions in 0.4115 seconds, instruction rate: 1497336.5 inst/s
Executed 616154 instructions in 0.410837 seconds, instruction rate: 1499753.0 inst/s
Executed 616154 instructions in 0.411064 seconds, instruction rate: 1498924.8 inst/s
Executed 616154 instructions in 0.410936 seconds, instruction rate: 1499391.6 inst/s
Executed 616154 instructions in 0.410301 seconds, instruction rate: 1501712.1 inst/s
Executed 616154 instructions in 0.410638 seconds, instruction rate: 1500479.8 inst/s
Executed 616154 instructions in 0.408832 seconds, instruction rate: 1507108.0 inst/s
Executed 616154 instructions in 0.410466 seconds, instruction rate: 1501108.5 inst/s
Executed 616154 instructions in 0.410113 seconds, instruction rate: 1502400.5 inst/s
Executed 616154 instructions in 0.409741 seconds, instruction rate: 1503764.5 inst/s
</code></pre>
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-4775910.xml</wfw:commentRss></item><item><title>Clojure performance tuning</title><category>Clojure</category><dc:creator>Jan Rychter</dc:creator><pubDate>Mon, 20 Jul 2009 16:59:09 +0000</pubDate><link>http://jan.rychter.com/enblog/2009/7/20/clojure-performance-tuning.html</link><guid isPermaLink="false">329406:3464523:4685485</guid><description><![CDATA[<p>Having done some actual coding in Clojure I can post notes that will hopefully help others. I have code that gets executed a lot (it&#8217;s a stack-based language interpreter) and needed to bring it up to reasonable performance. Here are some notes from the process:</p>

<ol>
<li>You really should know the difference between a seq and a list. The seq functions (such as <code>drop</code> or <code>take</code>) do not return a list: they return a <code>LazySeq</code>, which has no O(1) <code>count</code> operation, so <code>count</code> becomes O(n). If <code>count</code> is something you call frequently (I do), you will want to avoid this. In my case, my stacks were implemented as lists and <code>count</code> gets called very frequently, so this was a serious and surprising problem. Surprising, because most tutorials skim over the difference, only emphasizing how general seqs are. So if your <code>count</code> performance takes a dive, see if you can replace <code>(drop 2 mylist)</code> with <code>(pop (pop mylist))</code>. The latter will keep the <code>PersistentList</code> structure, but note that <code>pop</code> throws on an empty list where <code>drop</code> simply returns an empty seq.</li>
<li>I implemented my stacks as both lists and vectors. There was almost no performance difference between the two (lists were actually slightly faster). This surprised me; I expected vectors to be faster. I still think vectors might have lower memory requirements, but I don&#8217;t know how to check.</li>
<li>For vector access, <code>(v n)</code> is faster than <code>(nth v n)</code>, which in turn is faster than <code>(get v n)</code>. This is not something I expected. I think this will get ironed out in the future, as Clojure matures.</li>
<li>Attempts to replace Clojure vectors containing integers with primitive Java int arrays produced no performance gains. In fact, they managed to hurt performance because of some necessary conversions.</li>
<li>As expected, there are very few places where type declarations improve performance. But if you have loops that get executed a lot, you might want to check. Always measure, as the results might be unexpected, and remember that less is more.</li>
<li>When measuring anything running on top of the <span class="caps">JVM, </span>let it run for a while before you draw any conclusions. The HotSpot <span class="caps">JIT </span>compiler does really cool things with the code, but only after it has run for a while. In my case, I run code for at least 10-20 seconds and watch the results. I take them into account only after they stabilize. It is common to see a 2x improvement between the first run and the last one.</li>
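<li>Accessing vars costs cycles. Clojure has no constants, so many people use <code>*global-parameters*</code>. This is not a good idea in performance-sensitive code. There are two ways around it, both of which capture the value once, when the code is compiled or loaded, so later changes to the var (including <code>binding</code>) are not seen:<ol>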
<li>Define macros that expand to constants:<br />
<code class="prettyprint lang-lisp">
  (defmacro global-parameter [] *global-parameter*)
</code><br />
and use <code>(global-parameter)</code> in your code.</li>
<li>Enclose your function definitions in let forms, rebinding global constants to lexical variables:<br />
<code class="prettyprint lang-lisp">
  (let [global-parameter *global-parameter*]
    (defn my-function []
      ...))
</code></li>
</ol></li>
</ol>
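
<p>To make the first point concrete, here is a minimal sketch you can paste into a <span class="caps">REPL</span>. The names <code>stack</code>, <code>dropped</code> and <code>popped</code> are made up for illustration and are not from my interpreter. It uses <code>counted?</code>, which reports whether a collection implements <code>count</code> in constant time:</p>

<pre>
<code class="prettyprint lang-lisp">
;; Hypothetical stack, implemented as a plain list.
(def stack (list 1 2 3 4 5))

;; Going through the seq functions gives back a lazy seq...
(def dropped (drop 2 stack))

;; ...while pop stays within the list type.
(def popped (pop (pop stack)))

(println (class dropped) (counted? dropped)) ; counted? is false here
(println (class popped) (counted? popped))   ; counted? is true here

;; Both hold the same elements, so they still compare equal.
(println (= dropped popped))                 ; true
</code></pre>

<p>The two results are interchangeable as far as equality goes, which is probably why the difference is so easy to miss until <code>count</code> shows up in your timings.</p>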

<p>As for profiling tools, the tried-and-true method of actually measuring wall time worked best. I tried the YourKit profiler, but wasn&#8217;t impressed, and at $499 it is way overpriced. If they come out with a Clojure edition (drop some features) at $79, I&#8217;ll consider it. I also tried JVisualVM, but it turned out that it is buggy on a Mac and the profiler doesn&#8217;t work. I hope this will be fixed in the future.</p>
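
<p>If you want to try the wall-time approach yourself, something along these lines should do. This is only a sketch, not the harness I actually used: <code>time-call</code> and <code>workload</code> are hypothetical names, and <code>workload</code> is a stand-in for whatever code you want to measure. It runs the same function repeatedly so that you can watch the times settle as the <span class="caps">JIT </span>warms up:</p>

<pre>
<code class="prettyprint lang-lisp">
(defn time-call
  "Calls (f) once and returns the elapsed wall-clock time in seconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e9)))

;; Hypothetical workload; replace with the code under test.
(defn workload []
  (reduce + (range 1000000)))

;; Run it 20 times and print each timing; ignore the early runs
;; and only trust the numbers once they stop changing.
(dotimes [i 20]
  (println "run" i "took" (time-call workload) "seconds"))
</code></pre>

<p>The number of runs (20 here) is arbitrary. The point is to keep going until consecutive runs agree, rather than to trust the first measurement.</p>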
]]></description><wfw:commentRss>http://jan.rychter.com/enblog/rss-comments-entry-4685485.xml</wfw:commentRss></item></channel></rss>