<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Beyond Syntax</title>
	<atom:link href="http://www.beyond-syntax.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.beyond-syntax.com</link>
	<description>Looking beyond syntactical meaning</description>
	<lastBuildDate>Thu, 01 Jul 2010 22:45:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Performance Monitoring with OProfile</title>
		<link>http://www.beyond-syntax.com/2010/07/performance-monitoring-with-oprofile/</link>
		<comments>http://www.beyond-syntax.com/2010/07/performance-monitoring-with-oprofile/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 21:32:00 +0000</pubDate>
		<dc:creator>Michael Schultz</dc:creator>
				<category><![CDATA[computers]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[guide]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[oprofile]]></category>
		<category><![CDATA[performance monitoring]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=162</guid>
		<description><![CDATA[oprofile is a low overhead, open-source tool that hooks into Linux and can keep track of CPU event monitoring information.  This is a fairly general statement and for this post I&#8217;ll be using the Intel Penryn microarchitecture, which should have similar event counters to most recent Intel processors.  You can get the canonical [...]]]></description>
			<content:encoded><![CDATA[<p><a title="oprofile home page" href="http://oprofile.sourceforge.net/">oprofile</a> is a low overhead, open-source tool that hooks into Linux and can keep track of CPU event monitoring information.  This is a fairly general statement and for this post I&#8217;ll be using the Intel Penryn microarchitecture, which should have similar event counters to most recent Intel processors.  You can get the canonical list of event counters from Intel&#8217;s own documentation in Chapter 30, Performance Monitoring, of Volume 3B, System Programming Guide (available from <a title="Intel 64 and IA-32 Architectures Software Developer's Manuals" href="http://www.intel.com/products/processor/manuals/">Intel&#8217;s site</a>).  Alternatively, the Japan Advanced Institute of Science and Technology have an <a href="http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/index.htm">interactive version</a> with all the events for most Intel processors.</p>
<p><span id="more-162"></span></p>
<h3>Event Counters</h3>
<p>If you are unaware, almost every processor manufactured in recent history has some collection of event counters that are incremented when some processor event occurs.  These events range from clock cycles ticking by and instructions being retired to thermal thresholds being passed and second-level cache misses.</p>
<p>So far, I&#8217;ve only really used the CPU clock cycles, level 1 cache line replacement, and instructions retired event counters.  Your needs might not match mine, so venture over to the Programmer Manual when you need something else!</p>
<h4>Event Ratios</h4>
<p>Related to the event counters are event ratios.  These simple ratios can help you find specific performance issues in your program.  For example, if your program does a lot of memory accesses, the processor may need to replace cache lines frequently.  But cache line replacements occur naturally in any program, so how do we tell when they are excessive?  Simple!  We can just use the ratio of L1 cache replacements to the number of instructions retired.  Then we&#8217;ll have an idea of how many times per instruction an L1 cache line is replaced.</p>
<h3>Using <code>oprofile</code></h3>
<p>First, you&#8217;ll have to be running Linux, then you&#8217;ll want to install the &#8220;oprofile&#8221; package.  Since this software installs kernel modules for monitoring, you&#8217;ll also need root/sudo access to allow the module to be loaded and unloaded for monitoring sessions.  Here,  I&#8217;ll be running as a user and using the <code>sudo</code> command when needed.</p>
<h4><code>opcontrol</code></h4>
<p><code>opcontrol</code> is the main program that lets you interact with the kernel.  If you need a down-and-dirty list of the events available for monitoring, <code>opcontrol --list-events</code> will show you all the event counters at your disposal.</p>
<p>On my processor, the default event to monitor is CPU_CLK_UNHALTED which will tell me where the processor spent most of the time executing.  If you want to monitor different events, you can specify what event(s) to monitor at the command line.</p>
<pre>$ sudo opcontrol --event L1D_REPL:10000 --event INST_RETIRED:10000</pre>
<p>The <code>:10000</code> after each counter simply specifies the trigger threshold for raising a processor exception.  In other words, every 10,000 instructions retired the processor raises an exception that the oprofile daemon will catch and then increment the sample counter for that event.  So, if you see that oprofile has 1 sample of the INST_RETIRED counter then the processor has seen 10,000 such events.</p>
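<p>To make that arithmetic concrete, here is a minimal Python sketch (not part of oprofile; the function name is made up for illustration) that turns a sample count back into an approximate event count.  It assumes every sample stands for exactly one full threshold of events.</p>

```python
def estimated_events(samples, threshold=10000):
    """Approximate number of events represented by `samples` samples.

    Hypothetical helper: assumes each sample was taken after exactly
    `threshold` occurrences of the monitored event.
    """
    return samples * threshold


print(estimated_events(1))    # one INST_RETIRED sample at :10000
print(estimated_events(250))  # 250 samples at the same threshold
```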
<p>Now that we have the event counters configured, we can start the monitoring.</p>
<pre>$ sudo opcontrol --start</pre>
<p>Since the system is doing other activities, it is best if what you want to monitor can monopolize the system for a while.  In my case I built a simple program that purposefully causes a lot of L1 cache misses (<a href="http://dev.beyond-syntax.com/l1thrash/l1thrash.c">l1thrash source code</a>).  I&#8217;ll also set the program to execute on one processor (CPU 1).</p>
<pre>$ taskset 02 ./l1thrash</pre>
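<p>The <code>02</code> above is a hexadecimal CPU mask in which bit <em>n</em> selects CPU <em>n</em>, so <code>02</code> means CPU 1 only.  As a rough illustration, the following Python sketch decodes such a mask; the helper name is hypothetical and nothing here is part of <code>taskset</code> itself.</p>

```python
def cpus_in_mask(mask_hex):
    """List the CPU numbers selected by a taskset-style hexadecimal mask.

    Hypothetical helper for illustration only.
    """
    mask = int(mask_hex, 16)
    # Reverse the binary digits so that index 0 is the lowest bit (CPU 0).
    bits = bin(mask)[2:][::-1]
    return [cpu for cpu, bit in enumerate(bits) if bit == "1"]


print(cpus_in_mask("02"))  # the mask used for l1thrash
print(cpus_in_mask("ff"))  # all eight CPUs of an eight core system
```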
<p>After it finishes executing, stop oprofile from running and save the profile session on the disk.</p>
<pre>$ sudo opcontrol --stop
$ sudo opcontrol --save l1thrash</pre>
<p>Now we have our profile saved to disk and we can view it with <code>opreport</code>.</p>
<h4><code>opreport</code></h4>
<p>Finally, we get to see how the program handled!  Since we were smart and saved our profile to a session, we&#8217;ll have to specify that at the command line.  You might want to pipe the output to less since it can be long at times.  On my eight core system the output looks ugly.</p>
<pre>$ opreport session:l1thrash
CPU: Core 2, speed 2494.04 MHz (estimated)
Counted L1D_REPL events (Cache lines allocated in the L1 data cache) with a unit mask of 0x0f (No unit mask) count 10000
Samples on CPU 0
Samples on CPU 1
Samples on CPU 2
Samples on CPU 3
Samples on CPU 4
Samples on CPU 5
Samples on CPU 6
Samples on CPU 7
    cpu:0|            cpu:1|            cpu:2|            cpu:3|            cpu:4|            cpu:5|            cpu:6|            cpu:7|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------------------------------------------------------------------------------
      541 95.7522      2969  0.9630       301 92.6154       484 69.9422       797 92.6744       707 88.2647       675 90.3614       707 89.8348 vmlinux
        7  1.2389        21  0.0068         6  1.8462         6  0.8671         9  1.0465         6  0.7491         6  0.8032         5  0.6353 oprofile
        6  1.0619         3 9.7e-04         6  1.8462         1  0.1445         3  0.3488         2  0.2497         1  0.1339         4  0.5083 nf_ses_watch
        5  0.8850        16  0.0052         6  1.8462         7  1.0116        25  2.9070        23  2.8714        30  4.0161        27  3.4307 libc-2.5.so
        3  0.5310         2 6.5e-04         1  0.3077         1  0.1445         2  0.2326         4  0.4994         0       0         1  0.1271 libpython2.4.so.1.0
        1  0.1770         0       0         0       0         0       0         0       0         0       0         0       0         0       0 e1000e
        1  0.1770         0       0         0       0         0       0         0       0         0       0         0       0         0       0 irqbalance
        1  0.1770         1 3.2e-04         1  0.3077         0       0         0       0         0       0         0       0         0       0 sshd
        0       0         2 6.5e-04         0       0         0       0         6  0.6977         8  0.9988         8  1.0710        18  2.2872 bash
        0       0         0       0         0       0         0       0         1  0.1163         0       0         0       0         1  0.1271 gawk
        0       0         0       0         0       0         3  0.4335         1  0.1163         2  0.2497         1  0.1339         0       0 bnx2
        0       0         0       0         3  0.9231         0       0         0       0         0       0         0       0         0       0 ehci_hcd
        0       0    305283 99.0179         0       0         0       0         1  0.1163         4  0.4994         1  0.1339         2  0.2541 l1thrash
        0       0        10  0.0032         0       0         0       0        14  1.6279        12  1.4981        19  2.5435        13  1.6518 ld-2.5.so
        0       0         3 9.7e-04         0       0         0       0         1  0.1163         1  0.1248         2  0.2677         2  0.2541 libcrypto.so.0.9.8b
        0       0         0       0         1  0.3077         0       0         0       0         0       0         0       0         0       0 libm-2.5.so
        0       0         0       0         0       0         0       0         0       0         0       0         1  0.1339         0       0 libpthread-2.5.so
        0       0         0       0         0       0         0       0         0       0         0       0         1  0.1339         0       0 syslogd
        0       0         0       0         0       0         0       0         0       0         1  0.1248         0       0         0       0 which
        0       0         0       0         0       0         1  0.1445         0       0         0       0         0       0         0       0 libcups.so.2
        0       0         0       0         0       0         0       0         0       0         0       0         2  0.2677         0       0 libusb-0.1.so.4.4.4
        0       0         0       0         0       0       189 27.3121         0       0        30  3.7453         0       0         7  0.8895 oprofiled
        0       0         1 3.2e-04         0       0         0       0         0       0         1  0.1248         0       0         0       0 cupsd</pre>
<p>You may notice that the columns try to be sorted in descending order by the number of samples taken for a specific process.  However, on CPU 1 (where we ran <code>l1thrash</code>) the sorted order isn&#8217;t close to correct.  Luckily, we know that the bulk of our program only ran on CPU 1, so we can reissue the <code>opreport</code> command specifying that we only care about that processor.</p>
<pre>$ opreport session:l1thrash cpu:1
CPU: Core 2, speed 2494.04 MHz (estimated)
Counted INST_RETIRED.ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 10000
Counted L1D_REPL events (Cache lines allocated in the L1 data cache) with a unit mask of 0x0f (No unit mask) count 10000
INST_RETIRED:1...|   L1D_REPL:10000|
  samples|      %|  samples|      %|
------------------------------------
  1834500 91.0882    305283 99.0179 l1thrash
   154499  7.6713      2969  0.9630 vmlinux
    21655  1.0752        21  0.0068 oprofile
     2176  0.1080        16  0.0052 libc-2.5.so
      442  0.0219        10  0.0032 ld-2.5.so
      435  0.0216         2 6.5e-04 bash
      108  0.0054         3 9.7e-04 libcrypto.so.0.9.8b
       47  0.0023         3 9.7e-04 nf_ses_watch
       43  0.0021         1 3.2e-04 sshd
       35  0.0017         2 6.5e-04 libpython2.4.so.1.0
       10 5.0e-04         0       0 libavahi-common.so.3.4.3
       10 5.0e-04         1 3.2e-04 cupsd
        9 4.5e-04         0       0 libcups.so.2
        7 3.5e-04         0       0 bnx2
        3 1.5e-04         0       0 libavahi-core.so.4.0.5
        1 5.0e-05         0       0 libpthread-2.5.so
        1 5.0e-05         0       0 timemodule.so</pre>
<p>That looks better!  Since we&#8217;ve narrowed down the output to one CPU, we now get to see both events that we monitored too.  You can see that the majority of the time was spent in our <code>l1thrash</code> program, but how did it do?</p>
<p>We know that the number of samples is the number of times that the event counter on the processor hit 10,000 for both counters.  So, we find that our <code>l1thrash</code> program caused <img src='http://s.wordpress.com/latex.php?latex=%28305283%29%2810000%29%20%3D%203052830000&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(305283)(10000) = 3052830000' title='(305283)(10000) = 3052830000' class='latex' /> level 1 cache replacements and retired <img src='http://s.wordpress.com/latex.php?latex=%281834500%29%2810000%29%20%3D%2018345000000&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(1834500)(10000) = 18345000000' title='(1834500)(10000) = 18345000000' class='latex' /> instructions.  Egads!  Is that good or bad?  Well, now we can throw in our ratio calculation for the L1 data cache miss:</p>
<p style="text-align:center;"><img src='http://s.wordpress.com/latex.php?latex=L1_%7Bmiss%7D%20%3D%20%5Cfrac%7BL1D%5C_REPL%7D%7BINST%5C_RETIRED%7D%20%3D%20%5Cfrac%7B305283%7D%7B1834500%7D%20%3D%20%5Csim%2016.6%5C%25&#038;bg=ffffff&#038;fg=000000&#038;s=2' alt='L1_{miss} = \frac{L1D\_REPL}{INST\_RETIRED} = \frac{305283}{1834500} = \sim 16.6\%' title='L1_{miss} = \frac{L1D\_REPL}{INST\_RETIRED} = \frac{305283}{1834500} = \sim 16.6\%' class='latex' /></p>
<p>That seems pretty bad to me!  We can also see that the Linux kernel (<code>vmlinux</code>) had a ratio of 2,969:154,499, or about 1.9%, which is a fairly typical miss ratio.</p>
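<p>The same ratios can be computed straight from the report rows.  The Python sketch below is only an illustration under a few assumptions: it handles just the two-event layout shown above (INST_RETIRED samples first, L1D_REPL samples third), the function names are made up, and it relies on both counters having used the same threshold of 10000, so the ratio of samples equals the ratio of events.</p>

```python
# First three rows of the `opreport session:l1thrash cpu:1` output above.
REPORT = """\
  1834500 91.0882    305283 99.0179 l1thrash
   154499  7.6713      2969  0.9630 vmlinux
    21655  1.0752        21  0.0068 oprofile
"""


def parse_rows(text):
    """Yield (image, inst_retired_samples, l1d_repl_samples) per report row."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # not a two-event data row (header, separator, blank)
        inst, _, repl, _, image = fields
        yield image, int(inst), int(repl)


def replacement_ratio(repl_samples, inst_samples):
    """L1D_REPL samples per INST_RETIRED sample (same threshold assumed)."""
    return repl_samples / inst_samples


for image, inst, repl in parse_rows(REPORT):
    print(f"{image:10s} {100 * replacement_ratio(repl, inst):5.1f}%")
```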
<h3>A Second Example</h3>
<p>This is a real example of a program I am actively trying to improve.  The program is a kernel module (<code>nf_ses_watch</code>) designed to intercept packets at a decent rate, but it is not performing well.  Here I&#8217;ll use the default CPU_CLK_UNHALTED event monitor to see where the processor spends most of its time.</p>
<pre>$ # I've already loaded the kernel module and started my packet generator
$ sudo opcontrol --event default
$ sudo opcontrol --start
$ # I'll wait about 30 seconds so there are enough samples to be meaningful
$ sudo opcontrol --stop
$ sudo opcontrol --save bombard</pre>
<p>Now I have my saved session and can look at the profile.  I&#8217;ve also taken the time to set the interrupt affinity of the Ethernet device to a specific processor (CPU 7), so now we can see whether the time was spent in my code or in Linux code.</p>
<pre>$ opreport session:bombard cpu:7
CPU: Core 2, speed 2494.04 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
   737746 86.6169 nf_ses_watch
    88183 10.3533 vmlinux
    16810  1.9736 e1000e
     3680  0.4321 oprofiled
     2594  0.3046 oprofile
     1578  0.1853 libc-2.5.so
      900  0.1057 bash
       78  0.0092 ld-2.5.so
       52  0.0061 ophelp
       26  0.0031 libavahi-common.so.3.4.3
       22  0.0026 libavahi-core.so.4.0.5
       13  0.0015 gawk
        9  0.0011 libcrypto.so.0.9.8b
        9  0.0011 libpython2.4.so.1.0
        9  0.0011 sshd
        8 9.4e-04 bnx2
        4 4.7e-04 libpthread-2.5.so
        3 3.5e-04 grep
        2 2.3e-04 ipv6
        2 2.3e-04 auditd
        1 1.2e-04 cat
        1 1.2e-04 libdl-2.5.so
        1 1.2e-04 libm-2.5.so
        1 1.2e-04 libpcre.so.0.0.1
        1 1.2e-04 dirname
        1 1.2e-04 automount</pre>
<p>Wow!  Over 86% of the time we were executing code in the <code>nf_ses_watch</code> kernel module (my code)!  Let&#8217;s see if we can dig a little deeper.  First, oprofile has already done the work for us and tracks the specific symbol name within a piece of code that was active when the sample was taken with the <code>--symbols</code> option (this results in a very long list of kernel symbols).  But, in the case of a kernel module, <code>opreport</code> doesn&#8217;t know where to find the symbol names so we have to tell it where the kernel module lives with <code>--image-path</code>.</p>
<pre>$ opreport session:bombard cpu:7 --symbols --image-path ~/nf_ses_watch/kmod | head
warning: /bnx2 could not be found.
warning: /e1000e could not be found.
warning: /ipv6 could not be found.
warning: /oprofile could not be found.
warning: /sbin/auditd could not be read.
CPU: Core 2, speed 2494.04 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10000
warning: could not check that the binary file /home/mjschultz/mon/module/kmod/nf_ses_watch.ko has not been modified since the profile was taken. Results may be inaccurate.
samples  %        image name               app name                 symbol name
733996   86.1767  nf_ses_watch.ko          nf_ses_watch             do_rip_entry
16810     1.9736  e1000e                   e1000e                   (no symbols)
10308     1.2102  vmlinux                  vmlinux                  rb_get_reader_page
9785      1.1488  vmlinux                  vmlinux                  read_hpet
8701      1.0216  vmlinux                  vmlinux                  ring_buffer_consume
3606      0.4234  vmlinux                  vmlinux                  netif_receive_skb
3530      0.4144  vmlinux                  vmlinux                  kfree</pre>
<p><em>(I&#8217;ve piped the output through <code>head</code> to keep it reasonable.)</em>  We can see the real dirt here!  By a huge margin, the <code>do_rip_entry</code> symbol in my <code>nf_ses_watch</code> module executes more than the Ethernet driver that is handling the raw packets.  So that is where I&#8217;ll be looking when I try to resolve my bug.</p>
<h3>Conclusions</h3>
<p>If you are looking to optimize your program, oprofile is a great tool to use.  The default event monitor (CPU clock cycles on most processors) can give you an idea of what part of your program is using most of the processor time.  Once you know that, you can focus your efforts on reducing the number of cycles spent in that function.  But don&#8217;t forget about all those other events too.  If you have a memory-intensive application, maybe you could reduce the memory contention and get an effective speedup with almost no refactoring!</p>
<p><em>(I&#8217;ve tried my best to be accurate with this information and I welcome any explicit corrections or clarifications.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2010/07/performance-monitoring-with-oprofile/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The nth Backup Solution</title>
		<link>http://www.beyond-syntax.com/2010/02/the-nth-backup-solution/</link>
		<comments>http://www.beyond-syntax.com/2010/02/the-nth-backup-solution/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 20:01:24 +0000</pubDate>
		<dc:creator>Michael Schultz</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[os x]]></category>
		<category><![CDATA[backups]]></category>
		<category><![CDATA[cron]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=147</guid>
		<description><![CDATA[In the past, I had developed my own backup solution.  Unfortunately, over time it didn&#8217;t work out (mainly from changing systems, moving, using a laptop instead of a desktop, and maintaining it).  However, I still like the idea of incremental backups as well as a mirrored version of my files (it saves space and lets [...]]]></description>
			<content:encoded><![CDATA[<p>In the past, I had developed my own <a href="http://www.beyond-syntax.com/2007/10/automatic-backups-using-cron-and-tar/">backup solution</a>.  Unfortunately, over time it didn&#8217;t work out (mainly from changing systems, moving, using a laptop instead of a desktop, and maintaining it).  However, I still like the idea of incremental backups as well as a mirrored version of my files (it saves space and lets me keep a history going back some number of days).</p>
<p><span id="more-147"></span>Now that I&#8217;m somewhat settled (and a little wiser), I decided to once more try my hand at a solid backup plan.  This was mainly motivated by a recent reinstall of my wife&#8217;s system (no lost data, just operating system upgrade).  Since I don&#8217;t have vast amounts of time on my hands, I didn&#8217;t want to forward port my old solution to get it to work on Linux and Mac OS X, so I looked for new solutions.  I recalled <a href="http://www.mscs.mu.edu/~brylow/">my advisor</a> from Marquette mentioning <a href="http://rdiff-backup.nongnu.org/">rdiff-backup</a> as what he put on his wife&#8217;s machine during her dissertation days.</p>
<p>As it turns out, rdiff-backup does most of what I wanted out of my backup solution and, in fact, does it a little better.  The main issue I had with my system was that it would periodically (monthly) take a snapshot of my home directory; after that it would periodically (weekly) build incremental diffs based off that snapshot.  What this boils down to is that, if a catastrophic failure happened, I would roll back to the most recent snapshot, then progress forward in time to the most recent incremental file.  Not bad, but if you want better-than-weekly granularity it could be a lot of work.  Obviously, I had scripted this part, but still it is wasted time.  With rdiff-backup, it would be a single copy operation to restore to the most recent version.  If you wanted older versions you could roll back through the incremental diffs (again, it is automated).</p>
<p>The other feature that I needed was the ability to remove backups/incremental data older than some time frame (monthly).  Again, rdiff-backup gives me this ability at the command line.  Other bonuses include the fact that it is cross-platform (via macports or most Linux repositories), written in Python, and not maintained by me!</p>
<p>With the basic service in place, it was time to make it automated.  Again, linked off rdiff-backup&#8217;s page is an article on <a href="http://arctic.org/~dean/rdiff-backup/unattended.html">how to do unattended backups</a>.  Besides the typical unattended SSH-keypair-without-a-passphrase and protecting-the-account steps, it introduced me to a new trick (which for some reason, despite having the knowledge on how to do it, I never put together) using SSH config.</p>
<pre>Host athena-backup
	Hostname athena.olympus
	User backups
	IdentityFile ~/.ssh/backups_rsa
	Compression yes
	Protocol 2</pre>
<p>Now, if I try to <code>ssh athena-backup</code>, it&#8217;ll automatically use the correct identity file and user name which saves me from having to specify it on the command line (which you can&#8217;t typically do with wrapper functionality).  More importantly, it doesn&#8217;t break normal SSHing onto that host since we made it a special host (that&#8217;s the part I never put together, even though I knew it was possible).</p>
<p>The next issue, which I never took the time to think about before, was my having moved from desktop to laptop (thereby making 1:00am backups worthless since the laptop isn&#8217;t always on).  Because rdiff-backup does a roll-back model instead of my roll-forward model, I decided to do hourly backups to my home machine; that way I&#8217;ll likely catch at least one of these a day.  But I&#8217;m not always at home!  Getting around that is trivial: I&#8217;ll just ping the backup server before trying.  If it doesn&#8217;t respond, I don&#8217;t back up.  This is done through:</p>
<pre>ping -c1 -t1 $SERVER &gt; /dev/null 2&gt;&amp;1</pre>
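<p>where <code>$SERVER</code> is just the name of the backup server.  It pings the host once with a timeout of 1 second; if it succeeds the backup continues, otherwise the script exits.  (On Mac OS X, <code>-t1</code> is the timeout; Linux <code>ping</code> uses <code>-t</code> for the TTL and <code>-W</code> for the reply timeout.)</p>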
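<p>For comparison, the same reachability check could be sketched in Python roughly as follows.  This is only an illustration, not the contents of <code>rdiff-backup.sh</code>: the function name and the <code>run</code> parameter (there so the check can be exercised without a network) are assumptions, and it relies on the <code>subprocess</code> timeout instead of a platform-specific <code>ping</code> flag.</p>

```python
import subprocess


def server_reachable(server, timeout_s=1.0, run=subprocess.run):
    """Return True if a single ping to `server` succeeds within the timeout.

    Hypothetical helper: `run` defaults to subprocess.run and can be
    swapped for a stand-in when testing.
    """
    try:
        result = run(
            ["ping", "-c1", server],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or ping itself could not be started.
        return False
    return result.returncode == 0
```

<p>A backup script would call <code>server_reachable</code> with the backup server&#8217;s name and simply exit when it returns <code>False</code>.</p>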
<p>Of course, setting up the cronjob is as simple as:</p>
<pre>0 */1 * * * $HOME/.crontab/rdiff-backup.sh</pre>
<p>Hopefully this time around the backup solution is more robust than before.</p>
<p>Attachment: <a href="http://dev.beyond-syntax.com/scripts/rdiff-backup.sh">rdiff-backup.sh</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2010/02/the-nth-backup-solution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Fun</title>
		<link>http://www.beyond-syntax.com/2010/01/mozilla-fun/</link>
		<comments>http://www.beyond-syntax.com/2010/01/mozilla-fun/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 22:29:25 +0000</pubDate>
		<dc:creator>Michael Schultz</dc:creator>
				<category><![CDATA[meta]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=105</guid>
		<description><![CDATA[I was just looking at some XML, and saw that the namespace for XUL is
http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul
It looks like someone likes Ghostbusters at Mozilla.  I just found it amusing.
]]></description>
			<content:encoded><![CDATA[<p>I was just looking at some XML, and saw that the namespace for XUL is</p>
<p><a href="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul</a></p>
<p>It looks like someone likes Ghostbusters at Mozilla.  I just found it amusing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2010/01/mozilla-fun/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Technology and Courage</title>
		<link>http://www.beyond-syntax.com/2009/10/technology-and-courage/</link>
		<comments>http://www.beyond-syntax.com/2009/10/technology-and-courage/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 15:09:24 +0000</pubDate>
		<dc:creator>Michael Schultz</dc:creator>
				<category><![CDATA[meta]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=102</guid>
		<description><![CDATA[A few weeks ago, Ivan Sutherland came to Washington University to give a talk to drum up interest in a new idea he is working on (Fleet, Infinity &#38; Marina [PDF slideshow]).  In my experience, most &#8220;old guy&#8221; talks aren&#8217;t that interesting because they meander with long tangential stories about their children.  Luckily, those were [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, <a href="http://www.wikipedia.org/wiki/Ivan_Sutherland">Ivan Sutherland</a> came to Washington University to give a talk to drum up interest in a new idea he is working on (<a href="http://fleet.cs.berkeley.edu/docs/07.Jul.2009-slides.pdf">Fleet, Infinity &amp; Marina</a> [PDF slideshow]).  In my experience, most &#8220;old guy&#8221; talks aren&#8217;t that interesting because they meander with long tangential stories about their children.  Luckily, those were kept to a minimum and he had a good sense of humor too!</p>
<p>Now&#8212;interesting as the talk was&#8212;he suggested everyone read his only non-technical paper titled, &#8220;<a href="http://research.sun.com/techrep/Perspectives/smli_ps-1.pdf">Technology and Courage</a>&#8221; [PDF from Sun].  It took me until yesterday to read it, but it was certainly an interesting article.  I recommend everyone read it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2009/10/technology-and-courage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RSS Sucks.</title>
		<link>http://www.beyond-syntax.com/2009/09/rss-sucks/</link>
		<comments>http://www.beyond-syntax.com/2009/09/rss-sucks/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 18:25:38 +0000</pubDate>
		<dc:creator>Michael Schultz</dc:creator>
				<category><![CDATA[meta]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[distributed]]></category>
		<category><![CDATA[peer-to-peer]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[syndication]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=100</guid>
		<description><![CDATA[I&#8217;ll admit that I haven&#8217;t spent too much time working with RSS feeds, but so far I&#8217;m unimpressed.  All they really seem to provide is a consistent view of published data for clients to read when they want.  That seems okay, but inefficient and a little redundant.  It seems like you could implement the same [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll admit that I haven&#8217;t spent too much time working with RSS feeds, but so far I&#8217;m unimpressed.  All they really seem to provide is a consistent view of published data for clients to read when they want.  That seems okay, but inefficient and a little redundant.  It seems like you could implement the same thing by just sending an email to people who want to subscribe.  At least then the end-user doesn&#8217;t have to use both an email client and feed reader (yes, I understand some programs combine the two technologies).  Alright, fine, maybe you don&#8217;t want to give the &#8220;evil-faceless-corporate-giant&#8221; your email address; after all, you <em>know </em>they&#8217;re going to sell it to someone.  Is there a better way to publish data?</p>
<p><span id="more-100"></span>So, I&#8217;ll start with what possessed me to write this.  I&#8217;m trying to watch a Google Code project and I want to get updates whenever something changes.  The &#8220;easiest&#8221; way to do that is through the RSS feed.  But, I don&#8217;t really want to download and use another application just to watch the feed for updates.  Even if I did download the program, it wouldn&#8217;t really gain me anything since it is just going to query (&#8220;poll&#8221;) the server for new updates periodically, just like my email client already does.</p>
<p>I begin searching online for something that will watch RSS feeds on my behalf and send me an email when one updates.  The first thing I come across is <a href="http://www.feedmyinbox.com/">Feed My Inbox</a>, which seems to offer the exact service I want.  Upon closer inspection, they promise to only send one email every 24 hours.  That won&#8217;t cut it: I want my updates and I want them now!  After a bit more searching I find <a href="http://rss2email.infogami.com/">rss2email</a>, a simple Python program that keeps track of multiple RSS feeds and converts entries into emails when it executes.  I go through the initial configuration and set up a cronjob to check for new entries every 4 minutes.  Good enough for now.</p>
<p>However, this brings up an annoyance with RSS feeds:</p>
<blockquote><p>RSS feeds do not provide real-time updates</p></blockquote>
<p>Once you get down to it, all an RSS feed does is provide some subset of content on a page.  It is still up to the client to ask the server whether new content exists.  This fact has bugged me very slightly in the past since I know <a href="http://www.xkcd.com/">xkcd</a> updates every Monday, Wednesday, and Friday at 11pm central time, but my RSS subscription in Firefox won&#8217;t provide me with the link until it (at its discretion) polls the server for new content.  Now, it bothers me slightly more since I know I may have to wait <em>up to </em>4 minutes for my cronjob to run, plus any time it might take Google to publish the updates in the feed (but I&#8217;ll ignore that part).</p>
<p>Is there any way to improve this and give end-users real-time updates from content-providers?  It seems like this would be great for users of <a href="http://www.twitter.com/">Twitter</a> and <a href="http://www.facebook.com/">Facebook</a>, since the end-users want to know what is happening <strong>now</strong>.</p>
<p>One answer seems to be push notifications, which seem to have been popularized by BlackBerry and iPhone applications.  These allow a central server to send a tiny message to the phone that tells the device there is data to be had.  Of course, this works very well on phone systems that can easily associate a content generator with a telephone number to contact.  However, it is a bit tougher with IP-only devices that migrate from network to network.  That said, it seems Apple&#8217;s Push Notification Service (APNS) should be able to do this.  Either way, this technology seems to be heading in the right direction.</p>
<p>APNS works by maintaining a connection between the client and server; that way, when an event happens on the server side, the server just sends it on to all the connected clients.  Unfortunately, I&#8217;m not convinced these push notifications will scale well.  A system implementing IP push notifications seems like it could easily have on the order of 1000s of simultaneous, persistent connections.  According to this &#8220;<a href="http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2007-016.pdf">Comparison of Push and Pull Techniques for AJAX</a>&#8221; (tech report, PDF), push-style systems do bog down servers a bit.  (Admittedly, the methodology for that paper might not be the best, but I&#8217;m guessing the conclusions are valid; I would like to see more/better studies of this.)</p>
<p>This brings me to what I want to see implemented (or to have someone point me to an existing implementation of): a distributed content syndication protocol (DCSP).  The high-level view that I think would work (I haven&#8217;t thought long or carefully about it) would be similar to other distributed networks.  The content provider would maintain the feed itself and a complete list of the computers currently subscribed to it.  The client would run software that asks the server who to connect to, selects a few peers, and creates long-running connections to them.  When new content arrives, the server pushes the content to its peers, who push to their peers, and so forth.  This removes the burden of pushing content to <strong>all</strong> subscribers from the server, giving it scalability (in my mind).  It would then be up to the client to connect and maintain connections with a collection of peers to get the real-time updates.  I&#8217;m sure there would have to be some control messages to prevent flooding.  But it seems like it would give real-time updates to users.</p>
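<p>To make the idea above a little more concrete, here is a minimal in-memory sketch in Ruby.  Nothing here is a real protocol: the <code>Node</code> class, <code>build_demo_network</code>, and the item id <code>post-1</code> are all names made up for illustration, there is no networking, and the only flood control assumed is that each node forwards an item the first time it sees it and ignores repeats.</p>

```ruby
require "set"

# Hypothetical model of one participant (the content provider or a subscriber).
class Node
  attr_reader :name, :peers, :seen

  def initialize(name)
    @name = name
    @peers = []        # nodes we hold a long-running connection to
    @seen = Set.new    # ids of items already received (crude flood control)
  end

  # Create a two-way connection between this node and another one.
  def connect(other)
    peers.push(other) unless peers.include?(other)
    other.peers.push(self) unless other.peers.include?(self)
  end

  # Accept an item once, then forward it to every peer except the sender.
  def receive(item_id, from = nil)
    return unless seen.add?(item_id)  # add? returns nil for a repeat
    peers.each do |peer|
      peer.receive(item_id, self) unless peer.equal?(from)
    end
  end
end

# Build a small made-up network: the server talks to only two subscribers,
# and the rest are reachable through other subscribers.
def build_demo_network
  server = Node.new("server")
  a, b, c, d, e, f = %w[a b c d e f].map { |n| Node.new(n) }
  server.connect(a)
  server.connect(b)
  a.connect(c)
  a.connect(d)
  b.connect(d)
  b.connect(e)
  e.connect(f)
  [server, a, b, c, d, e, f]
end

nodes = build_demo_network
server = nodes.first
server.receive("post-1")  # the server "publishes" by receiving its own item

puts "server has #{server.peers.size} direct peers"
nodes.each { |n| puts "#{n.name} has post-1: #{n.seen.include?('post-1')}" }
```

<p>In this toy network the server only ever sends to its two direct peers, yet every subscriber ends up with the item.  A real version would still need to deal with peers dropping out, ordering, and trust, none of which this sketch attempts.</p>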
<p>I suppose this would mean I would have to run another program on my system, but it could either be a front-end client that handles my content feeds or a daemon running in the background and set up to deliver an email to a local mailbox (or even a remote mailbox) when fresh content arrives.</p>
<p>Ah well, who knows if it would work.  Hell, maybe I just missed a fact about RSS that doesn&#8217;t make it suck as much as I think it does.  Thus ends my stream of thought.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2009/09/rss-sucks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Remote Instance of Firefox via SSH -X</title>
		<link>http://www.beyond-syntax.com/2009/07/remote-instance-of-firefox-via-ssh-x/</link>
		<comments>http://www.beyond-syntax.com/2009/07/remote-instance-of-firefox-via-ssh-x/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 18:46:57 +0000</pubDate>
		<dc:creator>Michael Schultz</dc:creator>
				<category><![CDATA[bash]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[shell]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=95</guid>
		<description><![CDATA[Firefox is a pretty decent web browser.  However, it can be a bit more clever than I want it at times.  For example, if I want to SSH into a remote machine and launch an instance of Firefox (to take on the remote machine&#8217;s IP address or access localhost) I would have to [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Read about the Firefox web browser" href="http://www.getfirefox.com/">Firefox</a> is a pretty decent web browser.  However, it can be a bit more clever than I want it at times.  For example, if I want to SSH into a remote machine and launch an instance of Firefox (to take on the remote machine&#8217;s IP address or access localhost) I would have to close the local instance then launch the remote instance.  That is annoying and unacceptable behaviour.</p>
<p>Luckily, the solution is fairly straightforward.  Once you have SSH&#8217;d into a remote host (using <code>ssh -X</code>), you simply need to run <code>firefox -no-remote</code>.  Of course you may want to tack on <code>&gt; /dev/null</code> and an ampersand <code>&amp;</code> to ignore the output and background the task. (Thanks to <a href="http://www.theopensourcerer.com/2007/11/15/remote-firefox-over-xssh/">The Open Sourcer</a>.)</p>
<p>With Firefox 2.x this behaviour was somewhat undocumented, but with Firefox 3.x, running <code>firefox --help</code> from the command line shows the <code>-no-remote</code> option.  It also seems that the default (i.e. <code>-remote</code>) is &#8220;documented&#8221; on Mozilla&#8217;s site for <a href="http://www.mozilla.org/unix/remote.html">Remote Control of UNIX Mozilla</a>.</p>
<p>If you wanted to make the <code>-no-remote</code> behaviour the default when SSH&#8217;d into remote machines, you could simply add a few lines to your bash profile to alias the <code>firefox</code> command.</p>
<pre># If we're forwarding X over SSH, make firefox execute on this machine
if [ -n "$SSH_CONNECTION" ] &amp;&amp; [ -n "$DISPLAY" ]; then
    alias firefox='firefox -no-remote'
fi</pre>
<p>At least that is what I did.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2009/07/remote-instance-of-firefox-via-ssh-x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Archiving your Mail</title>
		<link>http://www.beyond-syntax.com/2009/07/archiving-your-mail/</link>
		<comments>http://www.beyond-syntax.com/2009/07/archiving-your-mail/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 19:30:03 +0000</pubDate>
		<dc:creator>Michael Schultz</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[archival]]></category>
		<category><![CDATA[mail]]></category>
		<category><![CDATA[props]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=67</guid>
		<description><![CDATA[For those that don&#8217;t know, I use mutt for my email needs.  This provides several niceties such as stripping out all the various formatting people like to include in their emails (fonts, graphics, etc), a keyboard driven interface, and, well, it just sucks less than most mail clients.
With mutt I choose to download all [...]]]></description>
			<content:encoded><![CDATA[<p>For those that don&#8217;t know, I use <a href="http://www.mutt.org/">mutt</a> for my email needs.  This provides several niceties such as stripping out all the various formatting people like to include in their emails (fonts, graphics, etc), a keyboard driven interface, and, well, it just sucks less than most mail clients.</p>
<p>With mutt I choose to download all my email via POP3 to a local machine where I can read it when I get around to it (rigorous, isn&#8217;t it).  After I read a message and deem it complete I move it to a folder named after the sender (or possibly a group) where I can <code>grep</code> the files and read them at a later date.</p>
<p>However, after a while these files pile up and I need to periodically compress and archive them.  This, of course, gets annoying and is frequently forgotten.  To solve this I needed a script that could parse messages in a number of mail formats, find a date, and determine if it is beyond some threshold at which point it should be archived.  These requirements brought me to <a href="http://archivemail.sourceforge.net/">archivemail</a>.  Archivemail supports several input formats (IMAP, mh, mbox, Maildir), archives the messages, and outputs a single mbox formatted file (which can be compressed).  While I&#8217;m not a huge fan of the mbox format I can easily deal with it for archived mail.</p>
<p><span id="more-67"></span></p>
<p>Archivemail has several perks that fit my requirements quite well.  First, it was easy to get (packages are available on OS X, Fedora, Debian, and Ubuntu); this probably stems from the fact that it is written in Python and can easily run on almost any system.  Next, it provides several useful command line options.  I personally have a cronjob that archives messages in four folders (logwatches and mail lists) once they are 30 days old and archives other messages after 180 days.  This is simply done with the <code>--days</code> command line switch.  I also specify a directory to dump all the archived messages into so they don&#8217;t clutter up my mail directory.  Depending on how you handle your mail there are also options to not archive unread messages or only archive messages older than some fixed date.</p>
<p>For those interested, here is my script that I run as a weekly cronjob to archive and compress my mail messages:</p>
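<pre>#!/bin/sh
# Archive old mail into a separate directory so it stays out of the way.
ARCMAIL="/usr/bin/archivemail --quiet --output-dir=$HOME/mail/archive/"

# High-volume folders: archive messages older than 30 days.
$ARCMAIL --days  30 $HOME/mail/logwatch \
                    $HOME/mail/netflix  \
                    $HOME/mail/amazon   \
                    $HOME/mail/dreamhost

# Everything else: archive messages older than 180 days.
$ARCMAIL --days 180 $HOME/mail/*</pre>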
<p>Fairly straightforward, eh?</p>
<p>To search through an archive you can just change into the <code>archive/</code> directory and run <code>gunzip -c &lt;filename&gt; | grep &lt;word&gt;</code>. Alternatively, you can use mutt&#8217;s built-in search and run <code>gunzip &lt;filename&gt;.gz ; mutt -f &lt;filename&gt;</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2009/07/archiving-your-mail/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>git branch &#8211;force</title>
		<link>http://www.beyond-syntax.com/2009/05/git-branch-force/</link>
		<comments>http://www.beyond-syntax.com/2009/05/git-branch-force/#comments</comments>
		<pubDate>Fri, 29 May 2009 04:24:12 +0000</pubDate>
		<dc:creator>phinze</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[git]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/2009/05/git-branch-force/</guid>
		<description><![CDATA[The headaches of coordinating a transition from svn => bzr at $JOB a few months ago have had time to fade, and I&#8217;ve now had some time to get used to using a DVCS on a day-to-day basis.  Much like the experience of working with revision control is to the absence of it, I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>The headaches of coordinating a transition from svn => bzr at $JOB a few months ago have had time to fade, and I&#8217;ve now had some time to get used to using a DVCS on a day-to-day basis.  Much as working with revision control makes you reluctant to ever go without it, I&#8217;ve found that the move from centralized to distributed revision control leaves one reluctant to return to the old way.</p>
<p>So, I&#8217;ve got Bazaar down pretty well (its design has been engineered for a smooth transition from Subversion), but Git is a different beast.</p>
<p>Git breaks a lot of the conceptual assumptions made in svn-land that you didn&#8217;t even know you were using to understand your daily VCS use; the basic &#8220;checkout, update, commit&#8221; operations don&#8217;t cleanly map to the git paradigm.  This can create many WTF moments spent staring at lines of help like:</p>
<blockquote>
<pre>git-rebase - Forward-port local commits to the updated upstream head</pre>
</blockquote>
<p>The payoff for diving into these treacherous waters is, IMHO, worth it.  When your first &#8216;git commit&#8217; or &#8216;git checkout&#8217; comes back within a few milliseconds and you&#8217;re left sitting there still trying to believe it&#8217;s already done&#8230; this is when you realize &#8220;Wow, this tool might actually change the way I work.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2009/05/git-branch-force/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>rubygems requirement syntax for config.gem :version</title>
		<link>http://www.beyond-syntax.com/2009/05/rubygems-requirement-syntax-for-configgem-version/</link>
		<comments>http://www.beyond-syntax.com/2009/05/rubygems-requirement-syntax-for-configgem-version/#comments</comments>
		<pubDate>Mon, 11 May 2009 18:44:55 +0000</pubDate>
		<dc:creator>phinze</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ruby rails rubygems]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=73</guid>
		<description><![CDATA[This shouldn&#8217;t have been as difficult to find as it was, so I figured I&#8217;d better throw it somewhere.

  OPS = &#123;
    &#34;=&#34;  =&#62;  lambda &#123; &#124;v, r&#124; v == r &#125;,
    &#34;!=&#34; =&#62;  lambda &#123; &#124;v, r&#124; v != r &#125;,
    [...]]]></description>
			<content:encoded><![CDATA[<p>This shouldn&#8217;t have been as difficult to find as it was, so I figured I&#8217;d better throw it somewhere.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">  OPS = <span style="color:#006600; font-weight:bold;">&#123;</span>
    <span style="color:#996600;">&quot;=&quot;</span>  <span style="color:#006600; font-weight:bold;">=&gt;</span>  <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>v, r<span style="color:#006600; font-weight:bold;">|</span> v == r <span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#996600;">&quot;!=&quot;</span> <span style="color:#006600; font-weight:bold;">=&gt;</span>  <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>v, r<span style="color:#006600; font-weight:bold;">|</span> v != r <span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#996600;">&quot;&gt;&quot;</span>  <span style="color:#006600; font-weight:bold;">=&gt;</span>  <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>v, r<span style="color:#006600; font-weight:bold;">|</span> v <span style="color:#006600; font-weight:bold;">&gt;</span> r <span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#996600;">&quot;&lt;&quot;</span>  <span style="color:#006600; font-weight:bold;">=&gt;</span>  <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>v, r<span style="color:#006600; font-weight:bold;">|</span> v <span style="color:#006600; font-weight:bold;">&lt;</span> r <span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#996600;">&quot;&gt;=&quot;</span> <span style="color:#006600; font-weight:bold;">=&gt;</span>  <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>v, r<span style="color:#006600; font-weight:bold;">|</span> v <span style="color:#006600; font-weight:bold;">&gt;</span>= r <span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#996600;">&quot;&lt;=&quot;</span> <span style="color:#006600; font-weight:bold;">=&gt;</span>  <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>v, r<span style="color:#006600; font-weight:bold;">|</span> v <span style="color:#006600; font-weight:bold;">&lt;</span>= r <span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#996600;">&quot;~&gt;&quot;</span> <span style="color:#006600; font-weight:bold;">=&gt;</span>  <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>v, r<span style="color:#006600; font-weight:bold;">|</span> v = v.<span style="color:#9900CC;">release</span>; v <span style="color:#006600; font-weight:bold;">&gt;</span>= r <span style="color:#006600; font-weight:bold;">&amp;&amp;</span> v <span style="color:#006600; font-weight:bold;">&lt;</span> r.<span style="color:#9900CC;">bump</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
  <span style="color:#006600; font-weight:bold;">&#125;</span></pre></div></div>

<p>Found deep in the <a href="http://rubygems.rubyforge.org/svn/trunk/lib/rubygems/requirement.rb">rubygems source</a>.</p>
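<p>As a quick way to see what those operators mean in practice, the sketch below asks RubyGems itself.  <code>Gem::Requirement</code> and <code>Gem::Version</code> are the RubyGems classes that table comes from; the <code>satisfies?</code> helper and the version numbers are made up for the example.  The same requirement strings should be what you pass as <code>:version</code> to <code>config.gem</code>.</p>

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the `requirement` string?
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?("= 1.2.3", "1.2.3")    # true
puts satisfies?(">= 1.2", "1.10")      # true  (versions compare by segment, not as strings)
puts satisfies?("~> 1.2", "1.9")       # true  (at least 1.2, below 2.0)
puts satisfies?("~> 1.2", "2.0")       # false
puts satisfies?("~> 1.2.3", "1.2.9")   # true  (at least 1.2.3, below 1.3)
puts satisfies?("~> 1.2.3", "1.3.0")   # false
```

<p>The last four lines are the <code>~&gt;</code> entry from the table at work: the requirement is bumped at its second-to-last segment to get the upper bound.</p>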
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2009/05/rubygems-requirement-syntax-for-configgem-version/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If YAML.dump can&#8217;t produce valid YAML&#8230;</title>
		<link>http://www.beyond-syntax.com/2009/02/if-yaml-dump-cant-produce-valid-yaml/</link>
		<comments>http://www.beyond-syntax.com/2009/02/if-yaml-dump-cant-produce-valid-yaml/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 05:12:05 +0000</pubDate>
		<dc:creator>phinze</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[yaml]]></category>

		<guid isPermaLink="false">http://www.beyond-syntax.com/?p=3</guid>
		<description><![CDATA[Yesterday I had fun with ruby&#8217;s YAML module not loading a piece of YAML I needed it to load.  Syntax Error&#8230; fine&#8211;all in a day&#8217;s work. But then I started looking at  the YAML in question and I realized, this was GENERATED by the module itself!

First of all, the reason I was using [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I had fun with ruby&#8217;s YAML module not loading a piece of YAML I needed it to load.  <code>Syntax Error</code>&#8230; fine&#8211;all in a day&#8217;s work. But then I started looking at  the YAML in question and I realized, this was GENERATED by the module itself!</p>
<p><span id="more-3"></span></p>
<p>First of all, the reason I was using YAML was as a quick, simple, human-readable backend for a tiny little side-project.  I chose YAML over SQLite or CSV because neither has the human-readable thing going for it, plus I&#8217;m working in Ruby and it seemed to have nice built-in support for YAML.</p>
<p>You can take any object you like (with obvious exceptions like streams and sockets) and just <code>YAML.dump</code> it, creating a happy string waiting to be written to the nearest file.  So given this class:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> Foo
  <span style="color:#9966CC; font-weight:bold;">def</span> initialize
    <span style="color:#0066ff; font-weight:bold;">@bar</span> = <span style="color:#996600;">'one'</span>
    <span style="color:#0066ff; font-weight:bold;">@baz</span> = <span style="color:#006666;">2</span>
    <span style="color:#0066ff; font-weight:bold;">@qux</span> = <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">3</span>, <span style="color:#996600;">'four'</span>, <span style="color:#006666;">5</span><span style="color:#006600; font-weight:bold;">&#93;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>You can do something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">&gt;&gt; f = Foo.<span style="color:#9900CC;">new</span>
=&gt; <span style="color:#008000; font-style:italic;">#</span>
&gt;&gt; <span style="color:#CC00FF; font-weight:bold;">YAML</span>.<span style="color:#9900CC;">dump</span><span style="color:#006600; font-weight:bold;">&#40;</span>f<span style="color:#006600; font-weight:bold;">&#41;</span>
=&gt; <span style="color:#996600;">&quot;--- !ruby/object:Foo <span style="color:#000099;">\n</span>bar: one<span style="color:#000099;">\n</span>baz: 2<span style="color:#000099;">\n</span>qux: <span style="color:#000099;">\n</span>- 3<span style="color:#000099;">\n</span>- four<span style="color:#000099;">\n</span>- 5<span style="color:#000099;">\n</span>&quot;</span></pre></div></div>

<p>And that YAML string at the end there looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="yaml" style="font-family:monospace;">--- !ruby/object:Foo
bar: one
baz: 2
qux:
- 3
- four
- 5</pre></div></div>

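<p>So you can stuff that wherever you&#8217;d like (<code>file.yml</code> let&#8217;s say) and then at some later date all you need to do is load it back.  (The final comparison here assumes <code>Foo</code> defines <code>==</code> to compare its instance variables; Ruby&#8217;s default <code>==</code> only checks object identity.)</p>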

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">&gt;&gt; new_f = <span style="color:#CC00FF; font-weight:bold;">YAML</span>.<span style="color:#9900CC;">load_file</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'file.yml'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
=&gt; <span style="color:#008000; font-style:italic;">#</span>
&gt;&gt; f == new_f
=&gt; <span style="color:#0000FF; font-weight:bold;">true</span></pre></div></div>

<p>Fun times, no?  And most importantly, fun times that don&#8217;t require learning YAML in great detail.  Well, let me show you exactly where the fun times ended for me.  For an example, let&#8217;s make a simple hash with our <code>f</code> as a key.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">&gt;&gt; h = <span style="color:#006600; font-weight:bold;">&#123;</span> f =&gt; <span style="color:#006666;">2</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
=&gt; <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#008000; font-style:italic;">#=&gt;2}</span>
&gt;&gt; <span style="color:#CC00FF; font-weight:bold;">YAML</span>.<span style="color:#9900CC;">dump</span><span style="color:#006600; font-weight:bold;">&#40;</span>h<span style="color:#006600; font-weight:bold;">&#41;</span>
=&gt; <span style="color:#996600;">&quot;--- <span style="color:#000099;">\n</span>!ruby/object:Foo ? <span style="color:#000099;">\n</span>  bar: one<span style="color:#000099;">\n</span>  baz: 2<span style="color:#000099;">\n</span>  qux: <span style="color:#000099;">\n</span>  - 3<span style="color:#000099;">\n</span>  - four<span style="color:#000099;">\n</span>  - 5<span style="color:#000099;">\n</span>: 2<span style="color:#000099;">\n</span><span style="color:#000099;">\n</span>&quot;</span></pre></div></div>

<p>Seems fine, no?</p>

<div class="wp_syntax"><div class="code"><pre class="yaml" style="font-family:monospace;">---
!ruby/object:Foo ?
  bar: one
  baz: 2
  qux:
  - 3
  - four
  - 5
: 2</pre></div></div>

<p>So we go about our business, right?</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">&gt;&gt; <span style="color:#CC00FF; font-weight:bold;">YAML</span>.<span style="color:#CC0066; font-weight:bold;">load</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC00FF; font-weight:bold;">YAML</span>.<span style="color:#9900CC;">dump</span><span style="color:#006600; font-weight:bold;">&#40;</span>h<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#CC00FF; font-weight:bold;">ArgumentError</span>: syntax error on line <span style="color:#006666;">2</span>, col <span style="color:#006600; font-weight:bold;">-</span><span style="color:#006666;">1</span></pre></div></div>

<p>Boo.  So now I have to go and do what I was trying to avoid doing in the first place: learn about YAML.  So I waded through the schema documents for YAML, which are long, full of BNF, and no fun.  At the end of the day it turns out that <code>YAML.dump</code> produces invalid YAML whenever an object is used as a key in a Hash.  Without going into too much detail (partially because I don&#8217;t fully understand it), the question mark needs to come before the ruby object tag for it to be valid.</p>
<p>Here&#8217;s a workaround that just does some regexp munging on the string:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">&gt;&gt; <span style="color:#CC00FF; font-weight:bold;">YAML</span>.<span style="color:#CC0066; font-weight:bold;">load</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC00FF; font-weight:bold;">YAML</span>.<span style="color:#9900CC;">dump</span><span style="color:#006600; font-weight:bold;">&#40;</span>h<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#CC0066; font-weight:bold;">gsub</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">/</span>^<span style="color:#006600; font-weight:bold;">&#40;</span>!ruby.<span style="color:#006600; font-weight:bold;">*</span><span style="color:#006600; font-weight:bold;">&#41;</span> \? <span style="color:#006600; font-weight:bold;">*</span>$<span style="color:#006600; font-weight:bold;">/</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>m<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#996600;">&quot;? #{$1}&quot;</span> <span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
=&gt; <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#008000; font-style:italic;">#=&gt;2}</span></pre></div></div>
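
<p>The regexp is easier to follow in isolation.  The sketch below applies it to a plain string shaped like the broken dump above, so it runs without the <code>Foo</code> class or any YAML parsing.  <code>fix_object_keys</code> is a made-up name, and it is only meant to cover this one shape of output.</p>

```ruby
# Hypothetical helper: on any line that is a "!ruby/..." tag followed by a
# trailing "?", move the "?" in front of the tag.
def fix_object_keys(yaml)
  yaml.gsub(/^(!ruby.*) \? *$/) { "? #{$1}" }
end

# A shortened stand-in for the string YAML.dump produced above.
broken = "--- \n!ruby/object:Foo ? \n  bar: one\n  baz: 2\n: 2\n"

puts fix_object_keys(broken)
```

<p>Only the tag line changes: <code>!ruby/object:Foo ?</code> becomes <code>? !ruby/object:Foo</code>, and every other line passes through untouched.</p>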

<p>Looks like I&#8217;ll be submitting my first bug report to rubylang.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.beyond-syntax.com/2009/02/if-yaml-dump-cant-produce-valid-yaml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
