Benchmarking row multiply in Ruby’s Linalg

While using LinAlg to write and understand/review information gain, I found that I wanted the equivalent of a row multiply, notated as [*] (it’s non-standard. I made up the notation), where:

A [*] x = B

[[a00, ..., a0M] [[x0] [[a00 * x0, ..., a0M * x0]
[... ...] [*] [x1] = [... * x1, .... ...]
[aN0, ..., aNM]] [x2]] [aN0 * x2, ..., aNM * x2]]

I want to map each row with each corresponding value of x. This operation is not standard matrix multiplication, though I feel like it should be!

I admit I slept in Linear Algebra class. Thus I was a bit dumbfounded about how to express it using normal matrix multiplication. While I could do it using LinAlg’s mapping functions, I knew that it could be done because it was all linear transformations.

Luckily James Lawrence, who maintains LinAlg was emailing me, and asked me about it (thanks!). He gave me some code that did it both ways. After I read it, I slapped myself.


# using straight up matrix multiplication
def calculate1(con)
DMatrix.diagonal(con*DMatrix.new(con.hsize,1,1)).inverse*con
end

# using map
def calculate2(con)
DMatrix.diagonal((con*DMatrix.new(con.hsize,1,1)).map { |e| 1.0/e })*con
end

I wasn’t sure which one was faster. Maybe they’d be about the same. Maybe doing matrix multiply would be slower because you’d have to go through all the multiplications. Per James’s suggestion, I benchmarked them, in the fine form of yak shaving.

Run on a laptop 1.1GHz Pentium M, 758MBs, I did two experiments. For one experiment, I kept the matrix at 800 rows, then I grew the columns. Let’s see what it looks like:

time (secs) vs column size

I should have labeled the axis in GnuPlot, but you’ll live. The y axis is the number of seconds, and the x axis is the column size. Uninteresting, but nice to know it’s linear. The two functions aren’t significantly different from each other in the amount of time that it takes. I expected the map (calculate2) to be much faster since it didn’t have to do all the multiplies. oh well.

I almost didn’t run the second test, but it proved to be a bit more interesting. This time I kept 800 columns and grew the number of rows. Same axis. Different graph:

time (secs) vs row size

Whoa! It’s exponential. Or quadratic. I can’t tell. Anyway, anything curving up is bad news. I suspect this might have something to do with row major/column major ordering. C stores matrices row by row, whereas Fortran stores it column by column.. Update: As corrected by James L., in the first experiment, growing columns will create more multiplies inside the diag() call, but the size of the diagonal will stay the same. Growing the rows, however, will create less multiplies inside the diag() call, but each increase in row size will increase both dimensions of the resulting diagonal matrix, giving us n^2. So it’s quadratic.

So what I went looking for wasn’t what I meant to find. But given that I’ve read about it before, it wasn’t super interestingit would have made sense had I thought about it, I guess it’s not super surprising. Let’s see if we can use transpose to cut down on the time. For one part, we’ll grow the rows as before, and compare it to growing rows, but transposing the input then transposing the output, to get the same deal. What’s it look like:

time(secs) vs row size

This is good news. Even if transpose is an extra couple of manipulations, it saves us computation for bigger matrix sizes. The most interesting part of the graph is the crossing of the two graphs. If somehow, LinAlg (or any other package for that matter) can detect where that inflection point is going to be, it can switch off between the two. The only thing I can think of is another package lying underneath doing sampling of each implementation randomly whenever a user calls the function to do interpolation of its growth curve, and then calculate the crossing analytically. I don’t currently know of any package that does this (or if it does, I don’t know about it, cuz it already performs so well by doing the switch!)

This was a nice little diversion from my side projects…a side project of side projects. Back to learning about information gain and its ilk. At least something came out of it. I have a nice little experiment module that I can use to do other experiments. And I spent way too much time on it not to post something…

I share my code below, and you can run it if you want. snippet!

# These are the experiments that I ran
#!/usr/bin/ruby

require 'experiment'

require 'linalg'
include Linalg

def calculate1(con)
DMatrix.diagonal(con*DMatrix.new(con.hsize,1,1)).inverse*con
end

def calculate2(con)
DMatrix.diagonal((con*DMatrix.new(con.hsize,1,1)).map { |e| 1.0/e })*con
end

DATA_PTS = 10
MAX = 800
MID = MAX / 2
STEP = MAX / DATA_PTS

# Growing columns
Experiment::compare(1..MAX, STEP,
proc { |col_size| DMatrix.rand(MID, col_size) },
proc { |matrix, col_size|
calculate1(matrix)
},
proc { |matrix, col_size|
calculate2(matrix)
})

# Growing rows
Experiment::compare(1..MAX, STEP,
proc { |row_size| DMatrix.rand(row_size, MID) },
proc { |matrix, row_size|
calculate1(matrix)
},
proc { |matrix, row_size|
calculate2(matrix)
})

# Growing rows, transposing
Experiment::compare(1..MAX, STEP,
proc { |row_size| DMatrix.rand(row_size, MID) },
proc { |matrix, col_size|
calculate1(matrix)
},
proc { |matrix, col_size|
calculate1(matrix.transpose).transpose
})
# Experiment.rb - perform timed experiments across on one dimension
require 'rubygems'
require 'gnuplot'

# Experiment is a module where you can use to plot and benchmark pieces of code
# and plot it on a gnuplot
#
#
module Experiment
class << self

def timer
start_time = Time.now
yield
diff_time = Time.now - start_time
puts "That run took #{diff_time} seconds"
return diff_time
end

# takes range to try and the step resolution of the range and runs
# the benchmark_block with each of the different ranges.
# the init_proc only gets run once during setup.
#
# Experiment::benchmark((1..50), 10,
# proc { |col_size| DMatrix.rand(500, col_size)},
# proc { |matrix, col_size| ProbSpace.new(matrix).entropy })
def benchmark(range, resolution, init_proc, benchmark_block)
xs, ts = [], []
range.step(resolution) do |x_size|
object = init_proc.call(x_size)
xs << x_size
ts << timer { benchmark_block.call(object, x_size) }
end
plot(xs, ts)
end

# same idea as benchmark, but does two at the same time. So
# it takes an init_proc to initialize the experiment,
# and two different procs, standard and comparison, to run
# for each step in the range at the given resolution.
#
# Experiment::benchmark((1..50), 10,
# proc { |col_size| DMatrix.rand(500, col_size)},
# proc { |matrix, col_size| ProbSpace.new(matrix).entropy1 },
# proc { |matrix, col_size| ProbSpace.new(matrix).entropy2 })
def compare(range, resolution, init_proc,
standard_block, comparison_block)
xs, s_ts, c_ts = [], [], []
range.step(resolution) do |x_size|
object = init_proc.call(x_size)
xs << x_size
s_ts << timer { standard_block.call(object, x_size) }
c_ts << timer { comparison_block.call(object, x_size) }
puts "#{x_size} = standard : comparison :: #{s_ts.last} : #{c_ts.last} secs"
end
plot(xs, s_ts, c_ts)
end

def plot(x, *ys)
Gnuplot.open do |gp|
Gnuplot::Plot.new(gp) do |plot|
ys.each do |y|
plot.data << Gnuplot::DataSet.new([x, y]) do |ds|
ds.with = "linespoints"
ds.notitle
end
end
end
end
end

end
end

Is preloading child tables always a good idea?

Optimization isn’t something you should do too early on, but I think a little house cleaning every so often to make sure your pages aren’t ridiculously slow is healthy. With any optimization task, you’d want to benchmark the results and see if there’s an actual gain. The very basic tool for benchmarking is the ordinary script/performance/benchmark. The easiest to find analysis tools is the rails_analyzer gem. The last time I used rails analyzer, it wasn’t that easy to use. The command line arguments seemed arcane. But its bench tool, which can benchmark controllers as opposed to just object models, is fairly easy to use.

Using the bookmarking example from before, let’s say you have something like:

class SceneController < ApplicationController
def list
@books = Book.find_books
end
end

class Book < ActiveRecord::Base
def self.find_books
find(:all, :include => [:bookmarks],
:conditions => ["books.created_on > ?", 6.month.ago])
end

def bookmarked_by?(user)
self.bookmarks.select { |bm| bm.owner_id == user.id }.empty? ? false : true
end
end

In the listing of books, one would display whether it’s actually bookmarked by a user or not. Normally, without the :include, the listing would make repeated queries to the DB every time it displayed a book list element, since it will use bookmarked_by?(user_id) to determine if a user bookmarked the book. So instead of just 1 query, it would make n + 1 queries.

Preloading child tables isn’t necessarily wise all the time. It really depends on what you intend to do with the data after you fetch it. As the Agile rails book warns, preloading all that data will take time. If you look at your log files, you’ll see that it’s a significant amount.

If you’re only going to load a limited number of these book list elements on a single page at a time, it actually might make sense to forgo preloading of child tables, and just use a find() instead of a select.

class SceneController < ApplicationController
def list
@books = Book.find_books
end
end

class Book < ActiveRecord::Base
def self.find_books
find(:all, :conditions => ["created_on > ?", 6.month.ago],
:limit => 20, :order => "created_on desc")
end

def bookmarked_by?(user)
Bookmark.find(:first,
:conditions => ["book_id = ? and owner_id = ?", id, user.id]) ? true : false

end
end

And if you’re going to display counts of arrays, but all means, use counter caching. It’s easy to do (as long as you follow instructions!), for most situations.

Intuitively, if you want to display over a certain n number of book list elements, it makes more sense to use :include and select it. However, I wanted to point out that when you make decisions like this, you’ll always want to measure the load times, because you earn what you measure.

Also, use the right number of runs. Too short number of a number of times you run a function, the more variation you’ll have in your benchmarks. Let’s say that you get two numbers for two different methods.

$ bench -u http://localhost:3000/method1 -r 50 -c 5
50....45....40....35....30....25....20....15....10....5....
Total time: 240.383527755737
Average time: 4.80767055511475

$ bench -u http://localhost:3000/method2 -r 50 -c 5
50....45....40....35....30....25....20....15....10....5....
Total time: 156.147093772888
Average time: 3.12294187545776

So it’s obvious that method2 is better right? Well, not necessarily. While benchmarks only show averages, you’ll need to pay attention to standard deviations. The bigger the standard deviation, the more runs you’ll need to figure out the average load time, and the number of decimal points you can trust. That way, you can figure out whether the difference in load times is statistically significant or not.

That way, you can ascertain whether the optimization you made were worth the trouble or not. tip!