Benchmarking row multiply in Ruby’s Linalg

While using LinAlg to write and understand/review information gain, I found that I wanted the equivalent of a row multiply, notated as [*] (it’s non-standard. I made up the notation), where:

A [*] x = B

[[a00, ..., a0M] [[x0] [[a00 * x0, ..., a0M * x0]
[... ...] [*] [x1] = [... * x1, .... ...]
[aN0, ..., aNM]] [x2]] [aN0 * x2, ..., aNM * x2]]

I want to map each row with each corresponding value of x. This operation is not standard matrix multiplication, though I feel like it should be!

I admit I slept in Linear Algebra class. Thus I was a bit dumbfounded about how to express it using normal matrix multiplication. While I could do it using LinAlg’s mapping functions, I knew that it could be done because it was all linear transformations.

Luckily James Lawrence, who maintains LinAlg was emailing me, and asked me about it (thanks!). He gave me some code that did it both ways. After I read it, I slapped myself.

# using straight up matrix multiplication
def calculate1(con)

# using map
def calculate2(con)
DMatrix.diagonal((con*,1,1)).map { |e| 1.0/e })*con

I wasn’t sure which one was faster. Maybe they’d be about the same. Maybe doing matrix multiply would be slower because you’d have to go through all the multiplications. Per James’s suggestion, I benchmarked them, in the fine form of yak shaving.

Run on a laptop 1.1GHz Pentium M, 758MBs, I did two experiments. For one experiment, I kept the matrix at 800 rows, then I grew the columns. Let’s see what it looks like:

time (secs) vs column size

I should have labeled the axis in GnuPlot, but you’ll live. The y axis is the number of seconds, and the x axis is the column size. Uninteresting, but nice to know it’s linear. The two functions aren’t significantly different from each other in the amount of time that it takes. I expected the map (calculate2) to be much faster since it didn’t have to do all the multiplies. oh well.

I almost didn’t run the second test, but it proved to be a bit more interesting. This time I kept 800 columns and grew the number of rows. Same axis. Different graph:

time (secs) vs row size

Whoa! It’s exponential. Or quadratic. I can’t tell. Anyway, anything curving up is bad news. I suspect this might have something to do with row major/column major ordering. C stores matrices row by row, whereas Fortran stores it column by column.. Update: As corrected by James L., in the first experiment, growing columns will create more multiplies inside the diag() call, but the size of the diagonal will stay the same. Growing the rows, however, will create less multiplies inside the diag() call, but each increase in row size will increase both dimensions of the resulting diagonal matrix, giving us n^2. So it’s quadratic.

So what I went looking for wasn’t what I meant to find. But given that I’ve read about it before, it wasn’t super interestingit would have made sense had I thought about it, I guess it’s not super surprising. Let’s see if we can use transpose to cut down on the time. For one part, we’ll grow the rows as before, and compare it to growing rows, but transposing the input then transposing the output, to get the same deal. What’s it look like:

time(secs) vs row size

This is good news. Even if transpose is an extra couple of manipulations, it saves us computation for bigger matrix sizes. The most interesting part of the graph is the crossing of the two graphs. If somehow, LinAlg (or any other package for that matter) can detect where that inflection point is going to be, it can switch off between the two. The only thing I can think of is another package lying underneath doing sampling of each implementation randomly whenever a user calls the function to do interpolation of its growth curve, and then calculate the crossing analytically. I don’t currently know of any package that does this (or if it does, I don’t know about it, cuz it already performs so well by doing the switch!)

This was a nice little diversion from my side projects…a side project of side projects. Back to learning about information gain and its ilk. At least something came out of it. I have a nice little experiment module that I can use to do other experiments. And I spent way too much time on it not to post something…

I share my code below, and you can run it if you want. snippet!

# These are the experiments that I ran

require 'experiment'

require 'linalg'
include Linalg

def calculate1(con)

def calculate2(con)
DMatrix.diagonal((con*,1,1)).map { |e| 1.0/e })*con

MAX = 800
MID = MAX / 2

# Growing columns
Experiment::compare(1..MAX, STEP,
proc { |col_size| DMatrix.rand(MID, col_size) },
proc { |matrix, col_size|
proc { |matrix, col_size|

# Growing rows
Experiment::compare(1..MAX, STEP,
proc { |row_size| DMatrix.rand(row_size, MID) },
proc { |matrix, row_size|
proc { |matrix, row_size|

# Growing rows, transposing
Experiment::compare(1..MAX, STEP,
proc { |row_size| DMatrix.rand(row_size, MID) },
proc { |matrix, col_size|
proc { |matrix, col_size|
# Experiment.rb - perform timed experiments across on one dimension
require 'rubygems'
require 'gnuplot'

# Experiment is a module where you can use to plot and benchmark pieces of code
# and plot it on a gnuplot
module Experiment
class << self

def timer
start_time =
diff_time = - start_time
puts "That run took #{diff_time} seconds"
return diff_time

# takes range to try and the step resolution of the range and runs
# the benchmark_block with each of the different ranges.
# the init_proc only gets run once during setup.
# Experiment::benchmark((1..50), 10,
# proc { |col_size| DMatrix.rand(500, col_size)},
# proc { |matrix, col_size| })
def benchmark(range, resolution, init_proc, benchmark_block)
xs, ts = [], []
range.step(resolution) do |x_size|
object =
xs << x_size
ts << timer {, x_size) }
plot(xs, ts)

# same idea as benchmark, but does two at the same time. So
# it takes an init_proc to initialize the experiment,
# and two different procs, standard and comparison, to run
# for each step in the range at the given resolution.
# Experiment::benchmark((1..50), 10,
# proc { |col_size| DMatrix.rand(500, col_size)},
# proc { |matrix, col_size| },
# proc { |matrix, col_size| })
def compare(range, resolution, init_proc,
standard_block, comparison_block)
xs, s_ts, c_ts = [], [], []
range.step(resolution) do |x_size|
object =
xs << x_size
s_ts << timer {, x_size) }
c_ts << timer {, x_size) }
puts "#{x_size} = standard : comparison :: #{s_ts.last} : #{c_ts.last} secs"
plot(xs, s_ts, c_ts)

def plot(x, *ys) do |gp| do |plot|
ys.each do |y| <<[x, y]) do |ds|
ds.with = "linespoints"


Algorithm always matters, even when parallel

I’ve finally had some time to start looking at things that looked interesting to me. One of the things that I started looking at was parallel algorithms.

I was reading Hacker News, and I saw some link that lead me to discover Cilk, a C extension to easily make parallel programs.

thread int fib(int n) {
if (n < 2)
return n;
else {
cont int x, y;
x = spawn fib (n-2);
y = spawn fib (n-1);
return x+y;

This was the example given for calculating the Fibonacci sequence in parallel. This is the standard mathematical way to define it, and it looks clean enough. So instead of trying it out in Cilk, I fired up Erlang to try my hand at doing a port. I found it a little bit difficult because while you can easily spawn processes in Erlang, there was no quick way for the parent process to wait/sync/join child processes and get their results. Since that was besides the point of the exercise, I fired up Ruby, even though they had a slow Threading library (which is suppose to be fixed in 1.9, plus Fibers!) I’ll do with it Erlang some other time.

First I wrote the threaded version to mimic the Cilk code:

def fib_threaded(n)
if n < 2
return 1
threads = []
x = 0
y = 0
threads << - 2) { |i| x = fib_threaded(i) }
threads << - 1) { |i| y = fib_threaded(i) }
threads.each { |thread| thread.join }
return x + y

I don’t have a multi-core processor. I’m on a 3 year old 1GHz laptop. At a mere fib(18), it was taking about 21 seconds to run. To see if there was a difference, I wrote a serial version.

def fib_serial(n)
n < 2 ? 1 : fib_serial(n - 1) + fib_serial(n - 2)

This one ran much much faster. It took about 0.02594 seconds. At this point, it’s probably the overhead of thread creation that’s making it take so long to run. Maybe with green threads or lightweight threads, the threaded version would run much faster. That makes me want to try it in Erlang just to compare. But wtf, adding shouldn’t take that long, even if it is 0.025 seconds

When I thought about it, it was an efficient algorithm: there’s a lot of wasted cycles. In order to compute f(n), you have to calculate f(n – 1) and f(n – 2) in separate threads.

  • The f(n – 1) thread requires it to spawn two more threads to compute f(n – 2) and f(n – 3).
  • The f(n – 2) thread requires it to spawn two more threads to compute f(n – 3) and f(n – 4).

Notice that both the threads for f(n – 1) and f(n – 2) have to spawn two different threads to calculate f(n – 3). And since this algorithm has no way for threads to share their results, they have to recompute values all over again. The higher the n, the worse the problem is, exponentially.

To calculate the speedup given to an algorithm by adding more processors, you calculate the amount of total work required and divide it by the span of the parallel graph. If that didn’t make sense, read lecture one for Cilk, which is where the diagram comes from. So for fib(n)

Twork = O(n^2)

The total amount of work is the total number of processes spawned. Since every f(n) recursively spawns two other processes, it’s about n^2 processes.

Tspan = O(ln n)

The total span is how many nodes a particular calculation traverses. A la the diagram, it’s about the height of the tree, so that’s about ln n nodes.

Therefore, for fib(n), the processor speed up is at most:

Tw / Ts = O(n^2) / O(ln n)

I don’t know of any reductions for that off the top of my head, but you can see that the processor speedup gain grows somewhere in between n and n^2. On one hand, it means this algorithm can benefit from speedups by adding up to somewhere between n and n^2 processors. However, that also means that to make this algorithm go as fast as it can to process fib(1000), you need more than 1000 processors to make it worthwhile. Not so good for a parallel program that’s just doing addition.

As a last version, I wrote one that computed the Fibonacci from 0 up to n, and keeping the total as I go along, instead of the recursive version that has to work its way n back down to zero.

def fib_loop(n)
sequence = [1, 1]
if n < 2
sequence = (1..n-1).inject(sequence) do |t, e|
t << (t[-2] + t[-1])

It’s not space effective since I wrote it quickly, but this beat the pants off the other two running at 0.00014 seconds. As you can see, you’re not recalculating any f(n) more times than you need to.

I wish Cilk had a better first example to parallel programs. Given that the guy making Cilk is the same guy that co-wrote the famous mobile book for algorithms, I guess I was surprised. However, it was a fun exercise, and made me think about algorithms again.

I’ll find some other little project that requires me to write in Erlang, rather than falling back on the comfortable Ruby. snippet! Below if you want to run it yourself.

# rubinacci.rb by Wilhelm

def fib_threaded(n)
if n < 2
return 1
threads = []
x = 0
y = 0
threads << – 2) { |i| x = fib_threaded(i) }
threads << – 1) { |i| y = fib_threaded(i) }
threads.each { |thread| thread.join }
return x + y

def fib_serial(n)
n < 2 ? 1 : fib_serial(n – 1) + fib_serial(n – 2)

def fib_loop(n)
sequence = [1, 1]
if n < 2
sequence = (1..n-1).inject(sequence) do |t, e|
t << (t[-2] + t[-1])

def run(&algorithm)
start_time =
integer = 18
answer =
elapsed = – start_time

puts “Fib(#{integer}) = #{answer}”
puts “Total elapsed time: #{elapsed} seconds”

puts “threaded:”
run do |integer|

puts “serial:”
run do |integer|

puts “loop:”
run do |integer|

The twenty-seven word score club

Like lots of people on facebook, I’ve been playing scrabulous on facebook. I’m not much of a wordsmith, but I have fun playing people. Justin told me that his goal in life was to span two triple-word scores, to get a 9x word score. So not to be outdone, I wondered what words would be able to give you a 27-word score if you spanned across all three triple-word scores. We would need fifteen lettered words.

Since you can only put down at most seven tiles per turn, there needs to be a word in-between the triple letter scores to help you fill it out. These “bridge words” can’t already be on the triple word score already, and they must be between two and six letters long on each side, where the total length of both words has to be greater than eight.

So I wrote a program in a couple hours to find them. I did take into account whether a word was possible to make based on the scrabble tile distribution, as well as taking into account blanks. There’s 286 of them thus far in the TWL scrabble dictionary. I didn’t find ones that used more than one bridge word on a side. The points aren’t completely accurate either.

The first number is the points you’d get, and then the two bridge words. Based simply on the probability of drawing the numbers from a full bag, “irrationalities” is the most likely word. (in reality, this never happens, since you need to draw tile in order to place those words to reach the side.)

459 : irrationalistic : [“ratio”, “alist”]

You can score a whopping 459 points with it. The word that has the biggest word score is “coenzymatically”

972 : coenzymatically : [“enzym”, “tical”]

Yes. “tical” is a word.

ti·cal –noun, plural
1. a former silver coin and monetary unit of Siam, equal to 100 satang: replaced in 1928 by the baht.

There are quite a number of common words, you wouldn’t think, as well as quite a number odd ones. As a note, the point scores aren’t exactly accurate. I didn’t take into account the double letter scores that might occur if you place a letter one it. But given that the multiplier is 27 here, and I picked the longest bridge words (which usually cover the double letter score), it shouldn’t affect it too much. I had held off posting it until I fixed that, but this was sort of a one off amusement and curiosity, rather than anything significant, so I figured I’d just post it. Enjoy!

567 : accountableness : [“count”, “lenes”]
648 : accountantships : [“count”, “ship”]
486 : administrations : [“minis”, “ration”]
648 : ammonifications : [“mon”, “cation”]
594 : amorphousnesses : [“morpho”, “ness”]
621 : anthropological : [“thro”, “logic”]
783 : anthropomorphic : [“thro”, “morph”]
594 : antihistaminics : [“his”, “aminic”]
540 : antitheoretical : [“tithe”, “etic”]
594 : aromatherapists : [“math”, “rapist”]
540 : astronautically : [“trona”, “tical”]
756 : astrophysically : [“strop”, “call”]
675 : astrophysicists : [“strop”, “cist”]
540 : atheroscleroses : [“heros”, “erose”]
540 : atherosclerosis : [“heros”, “eros”]
594 : atherosclerotic : [“heros”, “roti”]
540 : authentications : [“then”, “cation”]
594 : autoradiographs : [“tora”, “graph”]
675 : autoradiography : [“tora”, “graph”]
810 : bathymetrically : [“thyme”, “call”]
594 : beautifications : [“eau”, “cation”]
594 : benightednesses : [“night”, “ness”]
675 : blameworthiness : [“lame”, “thine”]
540 : brotherlinesses : [“other”, “ness”]
486 : businesspersons : [“sines”, “person”]
405 : ceaselessnesses : [“easel”, “ness”]
729 : chancellorships : [“hance”, “ship”]
729 : chemotherapists : [“moth”, “rapist”]
810 : cholangiography : [“lang”, “graph”]
675 : cholecystitises : [“hole”, “titis”]
675 : cholestyramines : [“holes”, “amine”]
540 : cholinesterases : [“line”, “erase”]
675 : cinematographer : [“nema”, “graph”]
729 : cinematographic : [“nema”, “graph”]
486 : clandestineness : [“land”, “nenes”]
594 : classifications : [“lassi”, “cation”]
972 : coenzymatically : [“enzym”, “tical”]
648 : comfortableness : [“fort”, “lenes”]
567 : commensurations : [“men”, “ration”]
594 : commercialistic : [“merc”, “alist”]
702 : communistically : [“muni”, “tical”]
756 : computerphobias : [“put”, “phobia”]
648 : conceivableness : [“once”, “lenes”]
567 : concelebrations : [“once”, “ration”]
540 : conceptualistic : [“once”, “alist”]
540 : conglomerations : [“glom”, “ration”]
486 : conglutinations : [“glut”, “nation”]
540 : congresspersons : [“res”, “person”]
486 : considerateness : [“onside”, “ate”]
540 : containerboards : [“tain”, “board”]
594 : convertibleness : [“vert”, “lenes”]
702 : crashworthiness : [“rash”, “thine”]
594 : crotchetinesses : [“rotche”, “ness”]
513 : customarinesses : [“stoma”, “ness”]
459 : dangerousnesses : [“anger”, “ness”]
837 : decarboxylating : [“carbo”, “lati”]
810 : decarboxylation : [“carbo”, “lati”]
540 : deconcentration : [“once”, “ratio”]
540 : deconsecrations : [“cons”, “ration”]
621 : dedifferentiate : [“diff”, “entia”]
513 : defenestrations : [“fen”, “ration”]
513 : delegitimations : [“elegit”, “mat”]
567 : delightednesses : [“light”, “ness”]
864 : demisemiquavers : [“mise”, “quaver”]
810 : denazifications : [“nazi”, “cation”]
567 : dialectological : [“alec”, “logic”]
486 : diastereoisomer : [“aster”, “some”]
648 : dichloroethanes : [“ich”, “ethane”]
540 : discontinuances : [“con”, “nuance”]
540 : discriminations : [“scrim”, “nation”]
486 : disinclinations : [“sin”, “nation”]
459 : disintegrations : [“sin”, “ration”]
648 : dissatisfactory : [“sati”, “factor”]
567 : divertissements : [“vert”, “semen”]
675 : dyslogistically : [“slog”, “tical”]
567 : eclaircissement : [“lair”, “semen”]
486 : educationalists : [“ducat”, “alist”]
459 : elaboratenesses : [“labor”, “ness”]
459 : emotionlessness : [“motion”, “ess”]
756 : encephalographs : [“epha”, “graph”]
837 : encephalography : [“epha”, “graph”]
675 : enfranchisement : [“franc”, “semen”]
621 : epidemiological : [“idem”, “logic”]
594 : epistemological : [“piste”, “logic”]
540 : epistemologists : [“piste”, “gist”]
378 : essentialnesses : [“senti”, “ness”]
540 : esterifications : [“rif”, “cation”]
594 : eutrophications : [“trop”, “cation”]
702 : excrementitious : [“creme”, “titi”]
621 : exsanguinations : [“sang”, “nation”]
702 : extemporisation : [“tempo”, “sati”]
540 : fantastications : [“antas”, “cation”]
567 : flibbertigibbet : [“libber”, “gib”]
513 : foreordinations : [“ore”, “nation”]
567 : fragmentariness : [“ragmen”, “rin”]
675 : frightfulnesses : [“right”, “ness”]
567 : fundamentalists : [“dame”, “alist”]
567 : gentrifications : [“rif”, “cation”]
513 : gluconeogeneses : [“cone”, “genes”]
513 : gluconeogenesis : [“cone”, “genes”]
675 : grotesquenesses : [“rotes”, “ness”]
648 : historiographer : [“tori”, “graph”]
702 : historiographic : [“tori”, “graph”]
432 : houselessnesses : [“ousel”, “ness”]
702 : humidifications : [“midi”, “cation”]
783 : hypervigilances : [“perv”, “lance”]
756 : hypnotherapists : [“not”, “rapist”]
567 : identifications : [“dent”, “cation”]
459 : illiberalnesses : [“liber”, “ness”]
513 : illimitableness : [“limit”, “lenes”]
486 : illogicalnesses : [“logic”, “ness”]
513 : immaterialities : [“mater”, “alit”]
540 : immediatenesses : [“media”, “ness”]
459 : inalterableness : [“alter”, “lenes”]
459 : inanimatenesses : [“anima”, “ness”]
702 : inconsequential : [“cons”, “entia”]
567 : inconsiderately : [“cons”, “ratel”]
486 : inconsideration : [“cons”, “ratio”]
486 : incoordinations : [“coo”, “nation”]
567 : incrementalisms : [“creme”, “tali”]
513 : incrementalists : [“creme”, “alist”]
459 : incuriousnesses : [“curio”, “ness”]
567 : indefinableness : [“defi”, “lenes”]
486 : indoctrinations : [“doc”, “nation”]
540 : indomitableness : [“omit”, “lenes”]
648 : inflammableness : [“flam”, “lenes”]
513 : instrumentalism : [“strum”, “tali”]
459 : instrumentalist : [“strum”, “tali”]
540 : instrumentality : [“strum”, “tali”]
459 : intolerableness : [“tole”, “lenes”]
486 : inviolatenesses : [“viola”, “ness”]
459 : irrationalistic : [“ratio”, “alist”]
405 : irrationalities : [“ratio”, “alit”]
567 : irreconcilables : [“recon”, “able”]
729 : kinesthetically : [“nest”, “tical”]
648 : lickerishnesses : [“icker”, “ness”]
540 : loathsomenesses : [“oaths”, “ness”]
621 : magistratically : [“agist”, “tical”]
594 : martensitically : [“tens”, “tical”]
540 : masterfulnesses : [“aster”, “ness”]
621 : mercaptopurines : [“cap”, “purine”]
621 : metallographers : [“tall”, “raphe”]
702 : methamphetamine : [“eth”, “etamin”]
891 : methoxyfluranes : [“ethoxy”, “ran”]
729 : methylmercuries : [“ethyl”, “curie”]
891 : methylxanthines : [“ethyl”, “thine”]
648 : microanalytical : [“roan”, “lytic”]
621 : misapplications : [“sap”, “cation”]
513 : momentarinesses : [“omenta”, “ness”]
486 : monounsaturated : [“nouns”, “urate”]
459 : monounsaturates : [“nouns”, “urate”]
513 : monumentalities : [“numen”, “alit”]
567 : multiplications : [“tip”, “cation”]
540 : neurobiological : [“euro”, “logic”]
513 : neuroblastomata : [“euro”, “stoma”]
783 : neuropsychology : [“euro”, “cholo”]
513 : nonbarbiturates : [“barb”, “urate”]
486 : nonbelligerents : [“bell”, “gerent”]
513 : noncelebrations : [“once”, “ration”]
513 : noncooperations : [“coop”, “ration”]
594 : nonenforcements : [“one”, “cement”]
567 : nonimplications : [“nim”, “cation”]
756 : nonphotographic : [“phot”, “graph”]
486 : opinionatedness : [“pinion”, “ted”]
729 : overextractions : [“ere”, “action”]
621 : overmedications : [“med”, “cation”]
486 : oversaturations : [“ers”, “ration”]
567 : overstimulating : [“verst”, “lati”]
540 : overstimulation : [“verst”, “lati”]
864 : oxytetracycline : [“tet”, “cyclin”]
459 : painterlinesses : [“inter”, “ness”]
621 : paragenetically : [“rage”, “tical”]
675 : parenthetically : [“rent”, “tical”]
648 : parthenocarpies : [“then”, “carpi”]
567 : parthenogeneses : [“then”, “genes”]
567 : parthenogenesis : [“then”, “genes”]
621 : parthenogenetic : [“then”, “genet”]
513 : pectinesterases : [“tine”, “erase”]
567 : permissibleness : [“miss”, “lenes”]
702 : pharmaceuticals : [“harm”, “tical”]
729 : pharmacological : [“harm”, “logic”]
648 : phenomenalistic : [“nome”, “alist”]
675 : photoengravings : [“hot”, “raving”]
675 : photoperiodisms : [“tope”, “iodism”]
783 : phototelegraphy : [“tote”, “graph”]
729 : pithecanthropus : [“theca”, “thro”]
648 : planimetrically : [“anime”, “call”]
621 : platinocyanides : [“latino”, “nide”]
513 : pleasurableness : [“leas”, “lenes”]
486 : predestinarians : [“redes”, “aria”]
486 : predestinations : [“redes”, “nation”]
621 : preestablishing : [“reest”, “shin”]
648 : prefabrications : [“ref”, “cation”]
594 : preformationist : [“reform”, “ion”]
540 : preponderations : [“repo”, “ration”]
621 : prepublications : [“rep”, “cation”]
513 : presentableness : [“resent”, “lenes”]
513 : preterminations : [“rete”, “nation”]
594 : prettifications : [“ret”, “cation”]
702 : problematically : [“roble”, “tical”]
486 : proletarianised : [“role”, “anise”]
459 : proletarianises : [“role”, “anise”]
675 : proteolytically : [“rote”, “tical”]
783 : quantifications : [“anti”, “cation”]
729 : rechoreographed : [“chore”, “graph”]
621 : recodifications : [“cod”, “cation”]
513 : reconcentration : [“once”, “ratio”]
513 : reconsecrations : [“cons”, “ration”]
486 : reconsideration : [“cons”, “ratio”]
459 : redintegrations : [“dint”, “ration”]
567 : reductivenesses : [“educt”, “ness”]
567 : refrangibleness : [“rang”, “lenes”]
513 : regretfulnesses : [“egret”, “ness”]
513 : reinvigorations : [“vig”, “ration”]
432 : reregistrations : [“egis”, “ration”]
567 : respectableness : [“spec”, “lenes”]
513 : responsibleness : [“pons”, “lenes”]
459 : retroperitoneal : [“trope”, “tone”]
540 : retroreflectors : [“ore”, “lector”]
594 : rigidifications : [“gid”, “cation”]
513 : sacramentalists : [“cram”, “alist”]
594 : saponifications : [“apo”, “cation”]
810 : saprophytically : [“prop”, “tical”]
513 : scintillometers : [“inti”, “meter”]
567 : seductivenesses : [“educt”, “ness”]
540 : selectivenesses : [“elect”, “ness”]
459 : semiterrestrial : [“miter”, “stria”]
513 : semitransparent : [“emit”, “spare”]
405 : sensationalists : [“sati”, “alist”]
459 : sentimentalists : [“time”, “alist”]
594 : serviceableness : [“vice”, “lenes”]
648 : simplifications : [“imp”, “cation”]
594 : slaughterhouses : [“laugh”, “house”]
459 : snippersnappers : [“nipper”, “napper”]
567 : solidifications : [“lid”, “cation”]
594 : solipsistically : [“lips”, “tical”]
594 : sophistications : [“phis”, “cation”]
540 : spermatogeneses : [“perm”, “genes”]
540 : spermatogenesis : [“perm”, “genes”]
648 : spinthariscopes : [“pint”, “scope”]
567 : sprightlinesses : [“right”, “ness”]
540 : stadtholderates : [“tad”, “derate”]
864 : straightjackets : [“rai”, “jacket”]
540 : stratifications : [“rat”, “cation”]
540 : stratovolcanoes : [“rato”, “canoe”]
567 : strikebreakings : [“trike”, “akin”]
648 : superphenomenon : [“perp”, “nomen”]
810 : sympathetically : [“path”, “tical”]
351 : tastelessnesses : [“stele”, “ness”]
621 : teletypewriters : [“let”, “writer”]
567 : thanklessnesses : [“ankle”, “ness”]
675 : therapeutically : [“rape”, “tical”]
675 : thunderstricken : [“under”, “trick”]
432 : toastmistresses : [“oast”, “tress”]
432 : traditionalists : [“adit”, “alist”]
459 : transaminations : [“ansa”, “nation”]
486 : transmigrations : [“ran”, “ration”]
621 : trihalomethanes : [“halo”, “ethane”]
540 : troubleshooters : [“rouble”, “hooter”]
567 : troubleshooting : [“rouble”, “hoot”]
513 : troublesomeness : [“rouble”, “omen”]
567 : trustworthiness : [“rust”, “thine”]
540 : unadulteratedly : [“adult”, “rated”]
459 : unalterableness : [“alter”, “lenes”]
621 : unanticipatedly : [“antic”, “pated”]
513 : unboundednesses : [“bound”, “ness”]
729 : unchoreographed : [“chore”, “graph”]
459 : uncleanlinesses : [“clean”, “ness”]
621 : unclimbableness : [“climb”, “lenes”]
702 : uncomplimentary : [“comp”, “menta”]
513 : underestimating : [“dere”, “matin”]
486 : unearthlinesses : [“earth”, “ness”]
702 : unextraordinary : [“extra”, “dinar”]
621 : unfavorableness : [“favor”, “lenes”]
486 : unguardednesses : [“guard”, “ness”]
540 : unrealistically : [“real”, “tical”]
513 : unsightlinesses : [“sight”, “ness”]
513 : unworldlinesses : [“world”, “ness”]
837 : wappenschawings : [“pens”, “hawing”]
540 : warrantableness : [“arrant”, “lenes”]
486 : westernisations : [“ester”, “sati”]
810 : whatchamacallit : [“hatch”, “call”]
621 : whippersnappers : [“hipper”, “napper”]
621 : wholesomenesses : [“holes”, “ness”]
540 : worrisomenesses : [“orris”, “ness”]
864 : xeroradiography : [“orad”, “graph”]

Nine letter word riddle

It’s not too common that I get forwards nowadays. With the advent of social news, all the stupid links have migrated there. But on occasion, I’ll get one from the older crowd. This one was a riddle with a movie of the answer attached.

What common English word is 9 letters long, and each time you remove a letter from it, it still remains an English word… from 9 letters all the way down to a single remaining letter?

It was only one answer, however, which it gave as “startling”. I ended up wondering if there were more than one, so I wanted to see how fast I could write something to give me the answer. It’d be good practice, since most web programming is design and architectural hard, rather than algorithms hard. Besides, it’s been a while since I wrote a recursive function.

Embarrasingly, it took 2.5-3 hours. I thought I’d be able to knock it out in one. I had some problems first removing a single letter from a word. Ever since I came to Ruby, I hardly ever deal with indicies, so finding those methods took a bit of time. Then, the recursion is always a bit of a mind bender when you don’t do it often.

I also spent some time looking up what were considered one letter words, but then found out that there’s a whole dictionary of one letter words. So I only considered “i” and “a” as valid one letter words. I also threw out all contractions.

See if you can write shorter/faster/better code. It’s certainly a better workout than fizzbuzz. Seeing how it took me a bit, and I didn’t clean up the code, I set the bar pretty low to beat. There were other things that would optimize it. I didn’t throw out shorter words to check in the dictionary if I already checked them in a longer word–ie. I just ran down the list of dictionary words. Try it out yourself, in whatever language. (This sounds like a good beginning problem to write in Arc.) Go ahead. I’ll wait.

Got it?

Here’s the list I came up with along with their chains. You’ll notice that it’s actually a tree that branches.

[[“cleanses”, [“cleanse”, [“cleans”, [“clean”, [“clan”, [“can”, [“an”, [“a”]]]], [“lean”, [“lea”, [“la”, [“a”]]]]], [“clans”, [“clan”, [“can”, [“an”, [“a”]]]], [“cans”, [“can”, [“an”, [“a”]]]]], [“leans”, [“lean”, [“lea”, [“la”, [“a”]]]], [“leas”, [“lea”, [“la”, [“a”]]]]]]]], [“cleanser”, [“cleanse”, [“cleans”, [“clean”, [“clan”, [“can”, [“an”, [“a”]]]], [“lean”, [“lea”, [“la”, [“a”]]]]], [“clans”, [“clan”, [“can”, [“an”, [“a”]]]], [“cans”, [“can”, [“an”, [“a”]]]]], [“leans”, [“lean”, [“lea”, [“la”, [“a”]]]], [“leas”, [“lea”, [“la”, [“a”]]]]]]]]]

[[“discuses”, [“discuss”, [“discus”, [“discs”, [“disc”, [“dis”, [“is”, [“i”]]]], [“diss”, [“dis”, [“is”, [“i”]]]]]]]]]

[[“drowning”, [“downing”, [“owning”, [“owing”, [“wing”, [“win”, [“in”, [“i”]]]]]]]]]

[[“painting”, [“paining”, [“pining”, [“piing”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

[[“piercing”, [“piecing”, [“pieing”, [“piing”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

[[“pickling”, [“picking”, [“piking”, [“piing”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]], [“pricking”, [“picking”, [“piking”, [“piing”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

[[“restated”, [“restate”, [“estate”, [“state”, [“sate”, [“sat”, [“at”, [“a”]]], [“ate”, [“at”, [“a”]]]]]]]]]

[[“crapping”, [“rapping”, [“raping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

[[“sparking”, [“sparing”, [“spring”, [“sprig”, [“prig”, [“pig”, [“pi”, [“i”]]]]]]]]]

[[“platters”, [“platter”, [“latter”, [“later”, [“late”, [“ate”, [“at”, [“a”]]]]]]]], [“splatter”, [“platter”, [“latter”, [“later”, [“late”, [“ate”, [“at”, [“a”]]]]]]]]]

[[“slitting”, [“sitting”, [“siting”, [“sting”, [“sing”, [“sin”, [“in”, [“i”]]]], [“ting”, [“tin”, [“ti”, [“i”]], [“in”, [“i”]]]]]]]], [“spitting”, [“spiting”, [“siting”, [“sting”, [“sing”, [“sin”, [“in”, [“i”]]]], [“ting”, [“tin”, [“ti”, [“i”]], [“in”, [“i”]]]]]]], [“sitting”, [“siting”, [“sting”, [“sing”, [“sin”, [“in”, [“i”]]]], [“ting”, [“tin”, [“ti”, [“i”]], [“in”, [“i”]]]]]]]]]

[[“stampede”, [“stamped”, [“tamped”, [“tamed”, [“tame”, [“tam”, [“am”, [“a”]]]]]]]]]

[[“stampede”, [“stamped”, [“tamped”, [“tamed”, [“tame”, [“tam”, [“am”, [“a”]]]]]]]]]

[[“starling”, [“staring”, [“string”, [“sting”, [“sing”, [“sin”, [“in”, [“i”]]]], [“ting”, [“tin”, [“ti”, [“i”]], [“in”, [“i”]]]]]]]], [“starting”, [“staring”, [“string”, [“sting”, [“sing”, [“sin”, [“in”, [“i”]]]], [“ting”, [“tin”, [“ti”, [“i”]], [“in”, [“i”]]]]]]], [“stating”, [“sating”, [“sting”, [“sing”, [“sin”, [“in”, [“i”]]]], [“ting”, [“tin”, [“ti”, [“i”]], [“in”, [“i”]]]]]]]]]

[[“starving”, [“staring”, [“string”, [“sting”, [“sing”, [“sin”, [“in”, [“i”]]]], [“ting”, [“tin”, [“ti”, [“i”]], [“in”, [“i”]]]]]]]]]

[[“trapping”, [“tapping”, [“taping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]], [“rapping”, [“raping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

[[“stingers”, [“stinger”, [“singer”, [“singe”, [“sing”, [“sin”, [“in”, [“i”]]]], [“sine”, [“sin”, [“in”, [“i”]]]]]]], [“singers”, [“singer”, [“singe”, [“sing”, [“sin”, [“in”, [“i”]]]], [“sine”, [“sin”, [“in”, [“i”]]]]]], [“singes”, [“singe”, [“sing”, [“sin”, [“in”, [“i”]]]], [“sine”, [“sin”, [“in”, [“i”]]]]], [“sings”, [“sing”, [“sin”, [“in”, [“i”]]]], [“sins”, [“sin”, [“in”, [“i”]]], [“sis”, [“is”, [“i”]]], [“ins”, [“in”, [“i”]], [“is”, [“i”]]]]]]]], [“stringer”, [“stinger”, [“singer”, [“singe”, [“sing”, [“sin”, [“in”, [“i”]]]], [“sine”, [“sin”, [“in”, [“i”]]]]]]]]]

[[“stingier”, [“stinger”, [“singer”, [“singe”, [“sing”, [“sin”, [“in”, [“i”]]]], [“sine”, [“sin”, [“in”, [“i”]]]]]]]], [“stringer”, [“stinger”, [“singer”, [“singe”, [“sing”, [“sin”, [“in”, [“i”]]]], [“sine”, [“sin”, [“in”, [“i”]]]]]]]]]

[[“tramping”, [“tamping”, [“taping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]], [“amping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

[[“trapping”, [“tapping”, [“taping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]], [“rapping”, [“raping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

[[“whittler”, [“whitter”, [“whiter”, [“white”, [“whit”, [“wit”, [“it”, [“i”]]], [“hit”, [“hi”, [“i”]], [“it”, [“i”]]]], [“wite”, [“wit”, [“it”, [“i”]]]]]]]]]

[[“wrapping”, [“rapping”, [“raping”, [“aping”, [“ping”, [“pin”, [“pi”, [“i”]], [“in”, [“i”]]], [“pig”, [“pi”, [“i”]]]]]]]]]

And here’s my code:


class Array
def one_less
total = [self[1..-1]]
self.each_with_index do |e, i|
total << self.values_at(0..i, (i+2)..-1)
return total[0..-2]

def sub_words(word)
result = word.split("") { |word_array| word_array.join }.uniq
result == [""] ? [] : result

def find_sub_word_chain(word)
return nil unless @@dict.include?(word)
return [word] if word.length == 1 && @@dict.include?(word)
valid_sub_words = sub_words(word).reject { |w| !@@dict.include?(w) }
word_chain = do |sub_word|
chain = find_sub_word_chain(sub_word)
if chain.nil?
if sub_word.length == 1
(chain << sub_word).reverse
word_chain.empty? ? nil : word_chain

@@dict = {}
ARGV[0] ||= "/etc/dictionaries-common/words"
words = File.readlines(ARGV[0]).map { |w| w.chomp }
words.each do |word|
if word.length == 1
next unless ["a", "i"].include?(word)
@@dict[word] = word

words.reject { |e| e.length != 9 }.reject { |word| word =~ /'/ }.map do |word|
chain = find_sub_word_chain(word)
next if chain.nil?
puts word
puts chain.inspect

Testing MIME response types

I feel like I might have covered this before, but I was looking for a way to test respond_to. I had found this post on how to test it, but after looking at it for a while, I found myself rewriting it. Mainly, I took out parts that convert the Mime types, and inserted Rail’s own Mime type objects. You can use it like this:

request_mime(:fbml) do
get :list

request_mime("text/xml") do
get :list

Just include it in your test_helper.rb file in test/

class Test::Unit::TestCase
include Threecglabs::MimeTestHelpers

Here’s “mime_test_helpers.rb”. Just throw it in lib/

module Threecglabs
module MimeTestHelpers

def self.included(mod)
mod.class_eval do
include MimeRequest
include MimeAssertions

module MimeRequest
# changes the mime type of the request within the block
# request_mime(:fbml) do
# get :list
# assert_response_mime(:fbml)
# end
def request_mime(mime_type_name_or_extension)
if mime_type_name_or_extension.kind_of?(String)
mime_type = Mime::Type.lookup(mime_type_name_or_extension)
elsif mime_type_name_or_extension.kind_of?(Symbol)
mime_type = Mime::Type.lookup_by_extension(mime_type_name_or_extension.to_s)
raise"mime type must be string or symbol")
old_mime_type = @request.accepts
@request.accept = mime_type.to_s
@request.accept = old_mime_type

# These are assertions to test respond_to, whether they return a specific mime type
# as a response to a request
module MimeAssertions

# Helps out with response testing, by letting to assert that the most recently-made
# request responded with a particular MIME extension, like :html, :fbml, :xml, etc.
def assert_response_mime(expected_mime_type_ext)
expected_mime_type = Mime::Type.lookup_by_extension(expected_mime_type_ext.to_s)

# Mime::Type.parse doesn't parse accept parameters correctly, therefore
# we account for having multiple types in the accept
response_mime_types = @response.headers['type'].split(/,\s*/).map do |accept_type|
mime_type_name = accept_type.split(/;\s*/).first
assert_block("Responded with #{} when expecting #{expected_mime_type}") {
response_mime_types.any? { |response_mime_type| expected_mime_type == response_mime_type }



MIME responder filter for Rails

I didn’t think I had to do this, but I ended up writing a filter that acts like a switch statement for different MIME types. Let me explain. Normally, in Rails, you can respond to different requests for different content with something like this:

class PostController < ActionController::Base
def list
@posts = Post.find(:all)
respond_to do |format|
format.xml { render :xml => @people.to_xml }

When you have something like this, the browser (or whatever client) can ask for different MIME types. Here, we can return html to a browser, xml to maybe a data importer, and fbml to facebook.

I spent last week integrating Mobtropolis with facebook. Mobtropolis doesn’t require a facebook account to use it, so like other websites, it has its own authentication mechanism, something like:

class PostController < ActionController::Base
before_filter :website_authenticate_filter, :except => [:index, :list]

When I started using facebooker library, it already came with an authentication before_filter. That means we have two authentication filters, one native, and one for facebook. Mobtropolis users don’t have to be in facebook to use it, and facebookers don’t have to sign up again in mobtropolis to use it.

However, since before_filters are executed in succession, it leads to a case where the facebook authentication would be called if html was requested, and vice versa. The alternative was to take apart both authentication filters, and create a monolithic filter to handle the two different cases. Instead, I did this:

class PostController < ActionController::Base
before_respond_to_filter :except => [ :index, :list ] do |format|
format.html :website_authentication_filter
format.fbml :facebook_authentication_filter

That way, I didn’t have to mix together the guts of each authentication filter, and it solved the problem of the wrong authentication filter being run. You can also use it like:

class PostController < ActionController::Base
before_responds_to_filter :only => :home do |format|
format.html do |controller|
return if controller.logged_in?
controller.send(:redirect_to, :controller => :home)
format.fbml :ensure_application_is_installed_by_facebook_user

By the way, I tried to alter the filter_chain as a request came in. Filter chains are copied and passed around the filters, so you can’t write a filter that alters the filter chain. So don’t waste your time crawling around in the guts of Rails to do this like I did. It’s just as well, as that’d be a nightmare to maintain.

It does have some weaknesses though. You can only assign the filters to the same set of :except and :only options in the filters.

It ended up the code for this sort of magic was fairly easy. I’m not sure if there’s an easier way to do what I wanted, but I’ll see if Rails core people would find it useful (or not). In the meanwhile, for those of you Rubyists that have written plugins before that want to play with it. As with the usual mumbo jumbo, it’s provided as is, I’m not maintaining it, and do whatever you want with it:

module Threecglabs
module Filters

# MimeResponderFilter
module MimeResponderFilter

def self.included(mod)

# Filters can respond to different mime types, so that you can use
# different filters depending on which mime type is being requested
# before_responds_to_filter :except => [:login, :signup, :forgot, :invite_request, :profile] do |format|
# format.html :authentication_filter
# format.fbml :ensure_application_is_installed_by_facebook_user
# end
# This way, one can take the appropriate actions in setting up authentication
# from different mime types, and still separate the implemenation of the different
# kinds of implementations
# The formats also take blocks, like regular filters
# before_responds_to_filter :only => :home do |format|
# format.html do |controller|
# return if controller.logged_in?
# controller.send(:redirect_to, :controller => :home)
# end
# format.fbml :ensure_application_is_installed_by_facebook_user
# end
# NOTE: an :all format defaults to :html, therefore, a format.html is required
module ClassMethods
def before_respond_to_filter(options = {}, &block)
before_filter, options

# This is a call that implements a MIME responder filter
class MimeResponderFilter#:nodoc:
attr_reader :filters

def initialize(&block)
@filters = {}

def filter(controller)
filter = @filters[controller.request.format.to_sym] || @filters[:html]
if filter.kind_of?(Proc)

# implements the "format.#{mime_type}" part of the filter
def method_missing(mime_type, method_name = nil, &block)
if block_given?
@filters[mime_type.to_sym] = block
@filters[mime_type.to_sym] = method_name.to_sym



DRYing up your views with a TableBuilder

In the last two months, when I was adding more features to mobtropolis, I found it painful to try and lay things out all over again from scratch. As a result, it sucked to see ugly layouts on the new pages juxtaposed with all the styling I had done before. It wasn’t until a week ago that I said to myself, “Stop the madness!” and started refactoring my views–something I never thought of doing much of until now. When you don’t, the barrage of angle brackets blows out of proportion and it starts to look pretty damn fugly with complex views.

What I should be able to do is take common mini-layouts in my views and make them available as helpers so that I can think in terms of bigger chunks to piece together pages, rather than in divs and spans. In addition, it makes your interface more consistent for your users.

Some good resources were presentations from the 2007 RailsConf, like V is for Vexing and Keeping your views DRY. While a lot of view DRYing talks about form builders, I didn’t see any on table builders, so I decided to take a stab at it. Personally, I don’t like to overuse tables for layouts. But as certain elements in my page layouts have been repeated, I refactored them into first helpers, and then when I did more than one, I extracted it out into a simple table builder. This is how you’d use it:

For example, I have a mini-layout where I show simple stats:

Here’s how I used a simple table builder to display the above:

And I find that I started using the same sort of thing in other places, like in a user’s profile:

I cut out some details so you can see that it’s just a block that gets passed a ScoreCard object, from which you call placard to add another score to the score_card. It sure beats writing

over and over again.

To declare the helper, we create a helper with the structure of the table inside the declaration of a ScoreCard object. We have a ScoreCard object to hold the contents of the placards. When they’re called in the block above in the template, they get stored in the ScoreCard object, and not written out to erb immediately. That way, we can place them wherever in the table we please, by making the call to card.display(:placards):

module ScorecardHelper
def score_card(html_options = {}, &block)
options = { :class => :scorecard, :width => "100%" }.merge(html_options) do |xm, card|
xm.table(options) do => :top) do
xm << card.display(:placards)

So then what’s ScoreCard look like? Pretty simple. It has a call to each cell that can be filled in the mini-layout. It’s kinda analogous to how form_for passes in a form object, on which you can call text_field, etc.

require 'lib/table_builder'

# Used to create a scorecard helper
class ScoreCard < TableBuilder
cells :placards

# a placard is placed into scorecards
def placard(text, text_id, score, widget)
xm =
@placards[:html] += do
xm.span(:style => "font-size: 1.2em") { xm << "#{text}" }
xm.em(:id => text_id, :class => "primary_number") { xm << "#{score}" }
xm << "#{widget}"


Notice that there’s a call to cells() to initialize the type of cell, and a method of the same name that builds the html for that cell. If you have other types of cells, you simply put it in the list of cells, and then create a method for it that is called in the template. By convention, you’d stick the html of the cell contents in “@#{name_of_cell}”[:html], and if you wanted, pass in the html_options, and stick it in “@#{name_of_cell}”[:options]. Then, you can access those in the helper wherever you want.

Let’s try another one. I have a mini_layout with a picture, and some caption underneath it, like a polaroid.

The associated helper and PolaroidCard object are pretty simple:

module PolaroidCardHelper
# a polaroid card is used to show a picture with a caption below it.
def polaroid_card(html_options = {}, &block)
options = { :class => :polaroidcard, :style => "display: inline;" } do |xm, card|
xm.table(options) do { {
xm << card.display(:picture)
}} { => "caption")) {
xm << card.display(:caption)

require 'lib/table_builder'

class PolaroidCard < TableBuilder
cells :picture, :caption

def picture(html_options = {}, &block)
@picture[:html] = capture(&block)
@picture[:options] = html_options

def caption(html_options = {}, &block)
@caption[:html] = capture(&block)
@caption[:options] = html_options

I’ve tried to pull all the plumbing out into TableBuilder (dropped it into lib/), and only leave the flexibility of creating the table structure in the helper, and the format of the cell in the object. It ends up TableBuilder isn’t too complex either. It uses some metaprogramming to create some instance variables. I know it pollutes the object instance variable namespace, but I wanted to be able to say @caption[:html], rather than @cells[:caption][:html].

class TableBuilder < ActionView::Base
class << self
# used in the child class declaration to name and initialize cells
def cells(*names)
define_method(:initialize_cells) do
@cell_names = { |n| "@#{n}".to_sym }
@cell_names.each do |name|
if instance_variable_defined?(name)
raise"name clash with ActionView::Base instance variables")
instance_variable_set(name, { :html => "", :options => {} })

def initialize(decor_block, &table_block)
html =, self)
concat(html, decor_block.binding)

def display(cell_name)

def html_options(cell_name)


I’ve found have these helpers cleans up my views significantly, though I have to admit, it’s not exactly easiest to use yet. In addition, I’m not exactly thrilled about having TableBuilder inherit from ActionView::Base, but it was the only way I could figure out to get the call to concat() to work. In any case, the point is to show you that refactoring your views into helpers is a good idea, and even something like a table builder goes a long way, even if you don’t do it the way I did it. Lemme know if this helps or hinders you. snippet!

A simple distributed crawler in Ruby

A week ago, I took a break from Mobtropolis, and…of all things ended up writing a simple distributed crawler in Ruby. I hesitated posting it at first, since crawlers are conceptually pretty simple. But eh, what the heck.

This was just an exercise to do some DRb and Hpricot, so don’t use this for your production work, whatever it may be. An actual crawler is far more robust than what I wrote. And don’t keep it running hammering at stuff, since it’ll get you banned.

First, this is how you use it:

WebCrawler.start("") do |doc|
puts "#{"title").inner_html}"

And that’s it. It returns documents in an XPath traversable form, courtesy of Hpricot.

A web crawler is a program that simply downloads pages, takes notes of what links there are on that page, and puts those links on its queue of links to crawl. Then it takes the next link off its queue and downloads that page and does the same thing. Rise and Repeat.

First, we create a class method named start that creates an instance of a webcrawler and then starts it. Of course, we could have done without this helper method, but it makes it easier to call.

module Crawler
class WebCrawler
class << self
def start(url)
crawler =
crawler.start(url) do |doc|
yield doc
return crawler

So next, we define the initialization method.

module Crawler
class WebCrawler
def initialize
puts "Starting WebCrawler..."
DRb.start_service "druby://localhost:9999"
puts "Initializing first crawler"
puts "Starting RingServer..."

puts "Starting URL work queue"
@work_provider =,, "Queue of URLs to crawl")

puts "Starting URL visited tuple"
@visited_provider =,, "Tuplespace of URLs visited")
rescue Errno::EADDRINUSE => e
puts "Initialize other crawlers"
puts "Looking for RingServer..."
@ring_server = Rinda::RingFinger.primary

@urls_to_crawl =[:name, :urls_to_crawl, nil, nil])[2]
@urls_status =[:name, :urls_status, nil, nil])[2]
@delay = 1

This bears a little explaining. The first webcrawler you start will create a DRb server if it doesn’t already exist and do the setup. Then, every subsequent webcrawler it’ll connect to the server and start picking URLs off the work queue.

So when you start a DRb server, you call start_server with a URI, then you start a RingServer. What a RingServer provides is a way from subsequent clients to find services provided by the server or other clients.

Next, we register a URL work queue and a URLs visited hash as services. The URL work queue is a TupleSpace. If you haven’t heard of TupleSpace, the easiest way to think of it is as like a bulletin board. Clients post items on there, and other clients can take them out. This is what we’ll use as a work queue of URLs to crawl.

The URLs visited is a Hash so we can check which URLs we’ve already visited. Ideally, we’d use the URL work queue, but DRb seems to only provide blocking calls for reading/taking from the TupleSpace. That doesn’t make sense, but I couldn’t find a call that day. Lemme know if I’m wrong.

module Crawler
class WebCrawler
def start(start_url)
@urls_to_crawl.write([:url, URI(start_url)])
crawl do |doc|
yield doc


def crawl
loop do
url = @urls_to_crawl.take([:url, nil])[1]
@urls_status[url.to_s] = true

doc = download_resource(url) do |file|
end or next
yield doc

time_begin =
add_new_urls(extract_urls(doc, url))
puts "Elapsed: #{ - time_begin}"

Here is the guts of the crawler. It loops forever taking a url off the work queue using take(). It looks for a pattern in the TupleSpace, and finds the first one that matches. Then, we mark it as ‘visited’ in @urls_status. Then, we download the resource at the url and use Hpricot to parse it into a document then yield it. If we can’t download it for whatever reason, then we grab the next URL. Lastly, extract all the urls in the document and add it to the work queue TupleSpace. Then we do it again.

The private methods download_resource(), extract_urls(), and add_new_urls() are merely details, and I won’t go over it. But if you want to check it out, you can download the entire file. There are weaknesses to it that I haven’t solved, of course. If the first client goes down, everyone goes down. Also, there’s no central place to put the processing done by the clients. But like lazy textbook writers, I’ll say I’ll leave that as an exercise for the readers. snippet!


State change observer for ActiveRecord

When I started writing some code recently, I noticed that my controllers were getting fat. There was much to do, but there was a bunch of stuff in there that didn’t have anything to do with actually carrying out the action–things like sending notifications. ActiveRecord already has observers to take action on certain callbacks. However, what I needed was to take actions on certain state transitions. Not seeing any immediate solutions in the Rails API, I decided to test myself and try writing one. I was bored too. So while I’m not sure if it was worth the time writing it, it certainly was kinda interesting. Here’s what I came up with:

Just as a contrived example, let’s say we are modeling the transmission of a car. It has three modes: “park”, “reverse”, “drive”. We want to send a notification when a user tries to change it from “reverse” to “drive”, but not when he tries to change it from “park” to “drive”. If it didn’t matter, and we just wanted to send notifications when the state changed to drive, we’d just use the observers that came with ActiveRecord. But since we do care where the state transition came from, here’s what I came up with:

class CreateCarTransmission < ActiveRecord::Migration
def self.up
create_table :car_transmission do |t|
t.column :engine_id, :integer, :null => false
t.column :mode, :string, :null => false, :default => "park"

def self.down
drop_table :car_transmission

class CarTransmission < ActiveRecord::Base
include StateTransition::Observable
state_observable CarTransmissionNotifier, :state_name => :mode

So then for my notifier I have:

class CarTransmissionNotifier < StateTransition::Observer
def mode_from_drive_to_reverse(transmission)
# send out mail and flash lights about how this is bad.

And that’s it. Whenever in the controller, I change the state from “reverse” to “drive”, lights will flash and emails will be sent out condemning the action, and my controllers stay small and lean.

class CarController < ApplicationController
def dismantle
@car = Car.find(params[:id])
@car.update_attribute :mode, "reverse"
@car.update_attribute :mode, "drive"

So where’s the magic? It took a bit of digging around. There were two major things I had to do. I had to insert observers during initialization and I had to override setting of attributes to include an update to notify observers.

ActiveRecord doesn’t exactly allow you to override the constructor. I don’t think I tried too hard to mess around with it. Looking on the web, I happened upon has_many :through again, where he has some good tips that helped me through Rail’s rough edges. While I didn’t exactly follow his advice, I did find out about the call back, :after_initialize. It must be something new, because I don’t see it in the 2nd edition of the Rails book, and the current official API doesn’t list it. Other Rails API manuals seem to be more comprehensive, like RailsBrain and Rails Manual.

Then overridding attributes has always been a bit of a mystery. I found a listing of the attribute update semantics, which was helpful to figure out what I was looking for, but it was false, in that you can’t use the first one (article.attributes[:attr_name]=value) to set an attribute. Looking in the Rails code for 1.2.3, it shows that attributes is a read_only hash. But it’s right that you should override the second one (article.attr_name=value), since update_attribute() and update_attributes() depends on it.

Again, it ends up that the function I was looking for wasn’t found in the official API as a method, other than a short mention in the description of ActiveRecord under Overriding Attributes, which makes it harder to find. Ends up that we can use write_attribute().

So that’s pretty much it. Using some standard meta-programming like how plugins do it, you wrap it up, and it’s pretty simple:

require 'observer'

module StateTransition
module Observable
class StateNameNotFoundError < RuntimeError
def message
"option :state_name needs to be set to the name of an attribute"

def self.included(mod)

module ClassMethods

def state_observable(observer_class, options)
raise if options[:state_name].nil?
state_name = options[:state_name].to_s

include Object::Observable

define_method(:after_initialize) do

define_method("#{state_name}=") do |new_state|
old_state = read_attribute(state_name)
if old_state != new_state
write_attribute(state_name, new_state) # TODO yield the update method
notify_observers(self, state_name, old_state, new_state)



class Observer
def update(observable, state_name, old_state, new_state)
send("#{state_name}_from_#{old_state}_to_#{new_state}", observable)
rescue NoMethodError => e
# ignore any methods not found here


I had a difficult time figuring out how to define methods for an instance of a class. The only thing I came up with was to use define_method, or to include a module with instance methods in them. instance_eval() didn’t work. The meta programming for ruby gets rather confusing when you’re doing it inside a method–it seems hard to keep track of which context you’re in.

So if you can make a use of this, great. If you think it’s worth moving it into a plugin, let me know that too. If you know of a better way, by all means, let me know. tip!

Using ensure in yielding methods

I never really ever find too many occasion to use ‘ensure’. It’s a Ruby keyword that you can use for blocks of code that ensures, no matter what happens, exceptions or not, the code will be run when the block is finished. And then a quickie that I found in the rails core:

  def silence_warnings
old_verbose, $VERBOSE = $VERBOSE, nil
$VERBOSE = old_verbose

It does something simple, just silences the warnings for a particular block of code. On first glance, I would have just written it without the ‘ensure’. However, that won’t work for yielded blocks that call return or if exceptions are thrown in it, I think. This way, no matter what happens in the block, it will always restore the state that it changed.