Returning the keys of all documents in CouchDB

There’s a bit of a learning curve when trying to use CouchDB’s mapreduce. One of the harder parts is writing the reduce function, which has to handle two separate cases: being called on the values emitted by the map function, and being called again on the output of earlier reduce calls (the rereduce case).

When you emit data from map, the examples show you emitting the document, but you can emit any data structure you care to dream up in the key and value portions of the emit. I needed a mapreduce view that returned all the keys that were present in all the documents. So if I had documents in the db in the form:

{"year": 2008, "birth_rate": 20.0 }
{"year": 2009, "birth_rate": 21.0 }
{"year": 2008, "death_rate": 20.0 }
{"year": 2009, "death_rate": 20.0 }

I wanted something that returned: ["year", "birth_rate", "death_rate"]

Here’s one way to do it:
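The view code itself isn't reproduced here, so what follows is only a rough sketch of the idea, simulated in plain Ruby rather than written as a real CouchDB view (in CouchDB the map and reduce functions would be JavaScript). The names map_doc and reduce_keys are made up for illustration. The point is that the reduce step sees arrays of key names in both cases, so the same union works whether its input came from map or from an earlier reduce.

```ruby
# Plain-Ruby simulation of map / reduce / rereduce. This is NOT CouchDB's API;
# map_doc and reduce_keys are hypothetical names used only for this sketch.
docs = [
  { "year" => 2008, "birth_rate" => 20.0 },
  { "year" => 2009, "birth_rate" => 21.0 },
  { "year" => 2008, "death_rate" => 20.0 },
  { "year" => 2009, "death_rate" => 20.0 }
]

# Map step: emit one [key, value] pair per document.
# The value is the list of field names found in that document.
def map_doc(doc)
  [[nil, doc.keys]]
end

# Reduce step: `values` is a list of arrays of field names, whether they came
# straight from map (rereduce = false) or from earlier reduce calls
# (rereduce = true). Because the shape is the same, one union handles both.
def reduce_keys(values, rereduce = false)
  values.flatten.uniq
end

emitted  = docs.flat_map { |doc| map_doc(doc) }
values   = emitted.map { |_key, value| value }

# Pretend the database reduced the emitted values in two separate batches...
partials = values.each_slice(2).map { |chunk| reduce_keys(chunk, false) }

# ...and then combined the partial results with a rereduce call.
all_keys = reduce_keys(partials, true)

puts all_keys.inspect
```

Keeping the value the same shape in both cases is the design choice that makes the rereduce case painless in this sketch.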

Tip!

How to add helpers, controllers, models, and views of your plugin into the Rails loadpath

Sometimes, when you’re writing a plugin, you end up writing models,
helpers, and controllers that the main app can use.  However, you
don’t want to copy them into the main app all the time.  You’d like to
keep the plugin’s code separate, but still be able to include it in
the load path of the main app.

To do this, put the following in your init.rb file in the root
of your plugin.  To add a new view path in your plugin that’s at
PLUGIN_ROOT/lib/views (where PLUGIN_ROOT is the root directory of your
plugin):


ActionController::Base.append_view_path(File.join(PLUGIN_ROOT, "lib", "views"))

Any template files (like html.erb) that you put in that path will be
seen in your app.

To add new helper, model, or controller directories in the rails load path:


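%w{ helpers model controller }.each do |dir|
  path = File.join(PLUGIN_ROOT, 'lib', dir)
  # let plain require find files in this directory
  $LOAD_PATH << path
  # let Rails autoload constants from it
  # (newer Rails 2.x releases call this ActiveSupport::Dependencies)
  Dependencies.load_paths << path
  # allow the code to be reloaded between requests in development
  Dependencies.load_once_paths.delete(path)
end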

And now, anything you put in lib/model, lib/controller, and
lib/helpers will be in the rails load path.
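To see the plain-Ruby half of that loop in isolation, here's a small sketch that runs without Rails. It only demonstrates the $LOAD_PATH part; the Dependencies calls need a real Rails app and are left out. The temporary directory stands in for PLUGIN_ROOT, and GreetingHelper is a made-up helper, not something from a real plugin.

```ruby
require 'tmpdir'
require 'fileutils'

# A throwaway directory standing in for PLUGIN_ROOT (an assumption for the demo).
plugin_root = Dir.mktmpdir("demo_plugin")
helpers_dir = File.join(plugin_root, "lib", "helpers")
FileUtils.mkdir_p(helpers_dir)

# Write a hypothetical helper file into PLUGIN_ROOT/lib/helpers.
File.write(File.join(helpers_dir, "greeting_helper.rb"), <<~'RUBY')
  module GreetingHelper
    def greet(name)
      "Hello, #{name}!"
    end
  end
RUBY

# Same move as in init.rb: once the directory is on $LOAD_PATH,
# a bare require can find files inside it.
$LOAD_PATH << helpers_dir
require "greeting_helper"

greeting = Object.new.extend(GreetingHelper).greet("plugin")
puts greeting

# Clean up the temporary plugin directory.
FileUtils.remove_entry(plugin_root)
```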

Of course, this might all be moot with the reintroduction of Rails
engines in 2.3.  I haven’t gotten around to using them or figuring it
out yet, but for now, this is how you do it with plugins.  tip!

Bastardized recursion

I seem to be posting less. I’ve been thinking about why that is. Perhaps fewer things are surprising to me now (i.e. I’m not learning as much as before). When doing this Rails stuff, the bulk is standard fare, and only occasionally do you run into something mildly interesting. I have been queuing up posts, however. Between work, small side projects, reading, and hanging out, there’s less time than before.

I stumbled on something, which I saw in the Rails source once. Thought I’d share.

Say I have a :blog that has_many :posts, and Post is subclassed into many different types. I wanted that post type information from Blog in a few different formats. Originally, it looked something like this:


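class Blog < ActiveRecord::Base
  has_many :posts

  # the post classes themselves
  def post_types
    Post.subclasses
  end

  # class names without the Post:: namespace
  def post_names
    post_types.map { |pt| pt.name.gsub('Post::','') }
  end

  # quoted, comma-separated names
  def post_string
    post_names.map { |n| "'" + n + "'" }.join(",")
  end
end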

Since they progressively built off of each other, I figured I can use a bastardized recursion, like I saw in find() in ActiveRecord::Base.


class Blog < ActiveRecord::Base
  has_many :posts

  # each format is built from the one before it
  def post_types(format = :classes)
    case format
    when :classes
      Post.subclasses
    when :names
      post_types(:classes).map { |pt| pt.name.gsub('Post::','') }
    when :string
      post_types(:names).map { |n| "'" + n + "'" }.join(",")
    end
  end
end

Seems alright. It reduces the clutter of functions that are related to each other, so I’m on the lookout for other places where I can fold related functions together like that. tip~!
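If you want to poke at the pattern outside of Rails, here’s a minimal plain-Ruby sketch. It’s not the code from the app: the TYPES registry, the inherited hook, and the Text and Photo subclasses are all stand-ins I made up so it runs without ActiveRecord. It also raises on an unknown format, which the version above silently ignores.

```ruby
# Stand-in for the ActiveRecord models; TYPES, Text and Photo are hypothetical.
class Post
  TYPES = []

  # Record every subclass as it is defined, so there is a list to return.
  def self.inherited(child)
    super
    TYPES << child
  end

  class Text < Post; end
  class Photo < Post; end
end

class Blog
  # Each format is built by calling the method again with the previous format.
  def post_types(format = :classes)
    case format
    when :classes
      Post::TYPES
    when :names
      post_types(:classes).map { |pt| pt.name.sub('Post::', '') }
    when :string
      post_types(:names).map { |n| "'#{n}'" }.join(",")
    else
      raise ArgumentError, "unknown format: #{format.inspect}"
    end
  end
end

blog = Blog.new
puts blog.post_types(:names).inspect
puts blog.post_types(:string)
```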

Updated:
Found this reverse engineering brief on obfuscated code that recites the 12 days of Christmas. It uses the same technique that I described above. I suppose as always, case statements can be abused.

Figuring out a branching strategy

To be honest, before I started working at Frogmetrics, I didn’t know how project branches were managed. I was working at a research lab, and most of what we did was prototyping. After we demonstrated that it worked, we threw it over a wall, and then it was someone else’s problem (poor them). A lot of the time, I was doing work on my own, so there was no (perceived) need for source control. The only time we used source control was when I was working on the New Horizons spacecraft. And even then, someone else managed the branches. Because it was SVN, we all mainly worked off trunk.

Until recently, I never contributed to an open source project either. Therefore, I really had no idea when to branch. So when we started working on the analytics, we really had no idea what a good branching strategy would be.

Googling didn’t help, because either everyone else doesn’t call it “branching strategy”, or everyone already knew how to do it. I eventually figured something out though.

Last week, after talking to AJ of Scoopler about git, he ended up asking about branching strategy. It became apparent that branching strategy wasn’t an obvious thing, so I decided to write something here. This is obviously not the only way to do it, so if you have other suggestions, by all means, comment.

At first, we didn’t know what we were doing. We knew that we wanted to have a branch that had the same code on the production server, and another branch where we’re working on the ‘next’ version. So we had a branching strategy that looked like this:

As you can see, we branched every time we deployed, which was weekly. This gave us the option of doing bug fixes on each deployed version while keeping a working branch. However, this was a terrible way to do things. This branching strategy required you to keep merging back bug fixes that you had made earlier. In addition, we were using bug tracking software to track all the issues from week to week, which resulted in much ticket shuffling and overhead.

Now, we’re doing this. Locally, we still branch for every feature that we’re working on. And when there’s a major set of features that need to be implemented by more than one developer, we create a remote branch for it that we push and pull to/from.

Thus, we’re treating master as the golden copy of the code base. It is always deployable, passes all tests, and is perfect code as we know it. This still gives us the advantage of doing bug fixes and deploying independently of what features are currently on deck or in the hole, and yet we don’t have to do merges every time we are about to release a new version. We simply (and somewhat arbitrarily) tag versions as we go along, and only merge feature branches back. When we can, we rebase the branches to keep the history clean. In addition, we try to make our commits atomic and about one thing, rather than one feature set. That makes it much easier to remove a piece of code, cherry-pick a changeset to another branch, or find an offending commit that broke something.

So far, it’s worked pretty well, but it might evolve as we go on.

Well, hope that helps. This post wasn’t as fun to write, but it was something I hadn’t seen too much of out on the web, so I figured I’d contribute. Fun times.

Installing Ruby’s linalg in Ubuntu

Ruby has a project for download called linalg that exposes the LAPACK functions in Ruby. LAPACK is a set of routines written in Fortran to do Linear Algebra operations. Now, before you cry foul about Fortran, LAPACK is used in a lot of places, and it’s pretty damn old and has held up over the years.

The linalg package was updated about two weeks ago, for the first time since 2004. Unfortunately, there’s no easy gem to install. You’ll have to download the tar, and then run


sudo ruby install.rb

Of course, you’ll run into some problems, unless you have other packages installed:


sudo apt-get install lapack3 lapack3-dev libg2c0-dev

Don’t worry, if you don’t like it, you can run uninstall. Read the INSTALL file for more directions.

I used to wonder how people knew which packages to look for. Apparently, you look at the output and see which files configure is crapping out on. Then you use apt-file to search for the package that file is in. It’s one of those basic things that seems not worth mentioning in hindsight. Like calling “source ~/.bashrc” after you’ve edited your bash files. No one ever mentions that.


sudo apt-get install apt-file
sudo apt-file update
apt-file search g2c.h

Knowing that, you’ll know how to fish, and I’ll not post how to install stuff in the future.

tip!

Anonymous scope, the unknown cousin of Named scope

Last time, I showed you the well known named scopes. This time, I’ll talk about the little documented anonymous scopes.

Anonymous scopes were mentioned briefly on Ryan’s Scraps. And in the API, I found it tucked away in ActiveRecord::NamedScope module documentation.

All subclasses of ActiveRecord::Base have two named_scopes:

  • all, which is similar to a find(:all) query, and
  • scoped, which allows for the creation of anonymous scopes, on the fly:
    Shirt.scoped(:conditions => {:color => 'red'}).scoped(:include => :washing_instructions)

These anonymous scopes tend to be useful when procedurally generating complex queries, where passing intermediate values (scopes) around as first-class objects is convenient.

How would this be useful? In the example given, it’s really not. And most of the time, named scope will suffice for what you need to do. However, there are times when named scope doesn’t give you the flexibility that you need, and anonymous scope is actually quite powerful when it’s used in conjunction with association proxies.

I was using the better nested set plugin. It allows you to have a fast access tree structure in a relational database. And while it’s a neat plugin, I couldn’t chain my calls like such:

@father.subtree.with_email  # => fails

to find all the father’s descendants that had an email. That’s because subtree() exists in the plugin and it uses find(), which returns an array of objects. You can’t further extend the call, because find returns an array, not an association proxy.
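To make the difference concrete, here’s a toy model in plain Ruby. It is not how ActiveRecord implements scopes; ToyScope, older_than, and the sample people are all made up for the illustration. A scope object just remembers its filters and hands back another scope, so you can keep chaining. Once you’ve turned it into an array, the chain is over.

```ruby
# A toy, in-memory stand-in for a scope. Not ActiveRecord's implementation.
class ToyScope
  include Enumerable

  def initialize(records, filters = [])
    @records = records
    @filters = filters
  end

  # Returns a NEW scope with one more filter; nothing is evaluated yet.
  def scoped(&filter)
    ToyScope.new(@records, @filters + [filter])
  end

  def with_email
    scoped { |r| !r[:email].nil? }
  end

  def older_than(age)
    scoped { |r| r[:age] > age }
  end

  # Filters are applied only when the results are actually enumerated.
  def each(&block)
    @records.select { |r| @filters.all? { |f| f.call(r) } }.each(&block)
  end
end

people = [
  { :name => "Ann", :email => "ann@example.com", :age => 40 },
  { :name => "Bob", :email => nil,               :age => 50 },
  { :name => "Cy",  :email => "cy@example.com",  :age => 12 }
]

# Scopes chain, because each call returns another scope.
names = ToyScope.new(people).older_than(18).with_email.map { |r| r[:name] }
puts names.inspect

# An array is the end of the line: it knows nothing about with_email.
as_array = ToyScope.new(people).older_than(18).to_a
puts as_array.respond_to?(:with_email)
```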

In our association proxies, if we expect to chain the calls, we can use scoped() instead of find(). Just to demonstrate:

class Person < ActiveRecord::Base
  has_many :people do
    def subtree
      # inside an association extension, self is the proxy,
      # so reach the parent record through proxy_owner
      scoped(:conditions => ["lft between ? and ?",
                             proxy_owner.lft, proxy_owner.rgt])
    end
  end

  named_scope :with_emails, :conditions => ["email is not null"]
end

That means we would be able to chain other scopes after subtree():

@father.subtree.with_emails # => returns all descendants that have an email

There’s not much to it, but it’s nice when, once again, you’re into breaking Law of Demeter.

tip!

Named scope, how do I love thee

I’m not sure how I missed it, but named_scope is something that I’ve been looking for. I should really read more of Ryan’s scraps. Just in case you don’t know, named_scope is a way to add filters and conditions to the finder methods on your model.

There’s a couple other hipper rails programmers that have covered it months ago, so I’ll defer to original author and the aforementioned Ryan and his table scraps to tell you about the basic things you need to know. This functionality has been absorbed into Rails 2.1 and you can find it under the method name, named_scope.

In this post, I’ll talk about some of the uses I’ve found for it. There’s more code posting in this one than usual, but it’s incremental, so all you have to do is notice what’s different between the sets of code examples.

Lately, I’ve found that I needed to mix and match different kinds of conditions in the finder methods in my models. Let’s say we have articles that each have many comments. How do we find comments that have an email address? How about if we wanted comments with a url address included in the comment post? We could make another has_many association.


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
  has_many :comments_with_email,
           :conditions => "email is not null",
           :order => "comments.created_at desc"
  has_many :comments_with_url,
           :conditions => "url is not null",
           :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
end

Or instead of cluttering things up in the class namespace, we can use an association proxy extension so that instead of calling @article.comments_with_email, we can call @article.comments.with_email (and violate Law of Demeter)


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc" do
    def with_email
      # we can do it this way
      with_scope(:find => { :conditions => "email is not null",
                            :order => "comments.created_at desc" }) do
        find(:all)
      end
    end

    def with_url
      # or we can do it this way
      find(:all, :conditions => "url is not null",
                 :order => "comments.created_at desc")
    end
  end
end

class Comment < ActiveRecord::Base
  belongs_to :article
end

This is all fine and well, until you need to find all comments with both an email and a url. You can make finders that take arguments, but consider the following problem: find() in the association proxy extensions actually returns an Array, so you cannot chain them, like @article.comments.with_email.with_url

How do we do this? named_scope() is one way to do it.


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
end

That means you can do things like


@article.comments.with_email

Or you can call count() on it, so that the SQL does the counting. That is much faster than instantiating all the ActiveRecord objects in an array and then calling size:


@article.comments.with_email.count

Not only that, but if there are other models that associate with comments, you have the scoping filters in one place in the code.


class User < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
  belongs_to :user

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
end

So not only can you find all comments with both email and url for an article, you can do the same for users:


@article.comments.with_email.with_url # all comments with email and url of an article
@user.comments.with_email.with_url # all comments with email and url by a user

Therefore, if you have common intersecting conditions, like all the comments in a period of time for an article, named scope will help. For that, I’d add a scope that takes arguments:


class User < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
  belongs_to :user

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
  # the lambda lets the scope take arguments at call time
  named_scope :in_period, lambda { |start_date, end_date|
    { :conditions => ["comments.created_at >= ? and " +
                      "comments.created_at <= ?",
                      start_date, end_date] }
  }
end

So now we can call:


@article.comments.in_period(@start_date, @end_date)
@article.comments.with_email.in_period(@start_date, @end_date)

Cool, you say! Now before you go back into your code and start replacing all of your stuff with named_scopes, keep in mind that there are edge cases where named_scopes wouldn’t be appropriate. I fell into the trap of thinking that I could use named_scope for everything. Like a kid that found a new hammer, the whole world looked like a nail. So I spent more time than I should have trying to bend named_scope to my will.

One of the things that fails is that there is no way (as far as I know) to override named scope conditions, like with_scope, outside of going into rails and messing with it and submitting a patch.

For example, if we already have an association of comments with the article that sorts in descending order, we cannot have named scopes that ask for the earliest and latest comment.


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article

  named_scope :earliest, :order => "comments.created_at asc",
                         :limit => 1
  named_scope :latest, :order => "comments.created_at desc",
                       :limit => 1
end

This won’t work because named_scope assumes that you’d want to merge all the conditions throughout the entire chain.


# Works, because both ORDER BY terms agree. The SQL looks like:
#   SELECT * FROM `comments`
#   ......blah blah....
#   ORDER BY comments.created_at desc,
#            comments.created_at desc
#   LIMIT 1
@article.comments.latest

# Does not work, because the association's "desc" ordering comes first
# and wins over the scope's "asc". The SQL looks like:
#   SELECT * FROM `comments`
#   ......blah blah....
#   ORDER BY comments.created_at desc,
#            comments.created_at asc
#   LIMIT 1
@article.comments.earliest
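If it’s not obvious why the merged ordering breaks earliest, here’s a plain-Ruby imitation of the two ORDER BY clauses. No database or ActiveRecord is involved; the Comment struct and the timestamps are made up. Sorting by “created_at desc, created_at asc” is really just sorting by created_at desc, since the second term only breaks ties that the first term left.

```ruby
# In-memory stand-in for the comments table; the data is hypothetical.
Comment = Struct.new(:id, :created_at)

comments = [
  Comment.new(1, 100),  # oldest
  Comment.new(2, 300),  # newest
  Comment.new(3, 200)
]

# Imitates: ORDER BY created_at DESC, created_at ASC LIMIT 1
# The first sort key decides; the second only matters for ties on the first.
merged_order = comments.sort_by { |c| [-c.created_at, c.created_at] }.first

# Imitates what we actually wanted: ORDER BY created_at ASC LIMIT 1
intended = comments.min_by { |c| c.created_at }

puts "merged ordering picks comment #{merged_order.id}"
puts "intended ordering picks comment #{intended.id}"
```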

Next time, I’ll cover named_scope’s cousin, which is not very well documented and so is easy to skip over: anonymous scopes.

Tip!