Returning the keys of all documents in CouchDB

There’s a bit of a learning curve when trying to use CouchDB’s map/reduce. One of the harder parts is writing the reduce function, which has to handle two separate cases: being called on the output of the map function, and being called again on the output of previous reduces (the rereduce case).

When you emit data from map, the examples show you emitting the document, but you can emit any data structure you care to dream up in the key and value portions of the emit. I needed a map/reduce view that returned all the keys present across all the documents. So if I had documents in the db of the form:

{"year": 2008, "birth_rate": 20.0 }
{"year": 2009, "birth_rate": 21.0 }
{"year": 2008, "death_rate": 20.0 }
{"year": 2009, "death_rate": 20.0 }

I wanted something that returned: ["year", "birth_rate", "death_rate"]

Here’s one way to do it:
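
The idea: the map emits every key of each document, and the reduce merges those keys into one de-duplicated list. This is a rough sketch against the standard JavaScript view server; the rereduce branch is the part that’s easy to get wrong:

// map: emit each key of the document (skipping CouchDB's _id/_rev fields)
function(doc) {
  for (var key in doc) {
    if (key.charAt(0) != '_') {
      emit(key, null);
    }
  }
}

// reduce: merge the emitted keys into a single de-duplicated array
function(keys, values, rereduce) {
  var seen = {};
  if (rereduce) {
    // values is an array of arrays returned by earlier reduce calls
    for (var i = 0; i < values.length; i++) {
      for (var j = 0; j < values[i].length; j++) {
        seen[values[i][j]] = true;
      }
    }
  } else {
    // keys is an array of [emitted_key, doc_id] pairs
    for (var i = 0; i < keys.length; i++) {
      seen[keys[i][0]] = true;
    }
  }
  var result = [];
  for (var k in seen) {
    result.push(k);
  }
  return result;
}

Querying the view with the reduce turned on then gives back a single row whose value is the array of key names.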

Tip!

How to add helpers, controllers, models, and views of your plugin into the Rails loadpath

Sometimes, when you’re writing a plugin, you end up writing models,
helpers, and controllers that the main app can use. However, you
don’t want to keep copying them into the main app. You’d like to
keep the plugin’s code separate, but still have it included in the
main app’s load path.

To do this, put the following in the init.rb file at the root
of your plugin. To add a new view path that lives at
PLUGIN_ROOT/lib/views (where PLUGIN_ROOT is the root directory of your
plugin):


ActionController::Base.append_view_path(File.join(PLUGIN_ROOT, "lib", "views"))

Any template files (like .html.erb) that you put in that path will be
visible to your app.
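
As a made-up example, if the plugin ships a partial at PLUGIN_ROOT/lib/views/widgets/_badge.html.erb, the main app can render it like any of its own partials:

<%# anywhere in one of the main app's views %>
<%= render :partial => "widgets/badge" %>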

To add new helper, model, or controller directories in the rails load path:


%w{ helpers model controller }.each do |dir|
  path = File.join(PLUGIN_ROOT, 'lib', dir)
  $LOAD_PATH << path                          # so plain requires can find it
  Dependencies.load_paths << path             # so Rails autoloads constants from it
  Dependencies.load_once_paths.delete(path)   # so the classes reload in development
end

And now, any models, controllers, or helpers you put in lib/model,
lib/controller, and lib/helpers will be on the Rails load path.
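
For instance (a made-up example), a model file in the plugin becomes available to the app just like one under app/models:

# PLUGIN_ROOT/lib/model/audit_log.rb
class AuditLog < ActiveRecord::Base
end

# anywhere in the main app
AuditLog.create(:message => "loaded from the plugin")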

Of course, this might all be moot with the reintroduction of Rails
engines in 2.3. I haven’t gotten around to using them or figuring them
out yet, but for now, this is how you do it with plugins. tip!

Bastardized recursion

I seem to be posting less, and I’ve been thinking about why that is. Perhaps fewer things surprise me now (i.e. I’m not learning as much as before). When doing this Rails stuff, the bulk of it is standard fare, and only occasionally do you run into something mildly interesting. I have been queuing up posts, however. Between work, small side projects, reading, and hanging out, there’s less time than before.

I stumbled on something I’d seen in the Rails source once. Thought I’d share.

Say I have a Blog that has_many :posts, but Post is subclassed into many different types, and I want that post-type information from Blog in different formats. Originally, it looked something like this:


class Blog < ActiveRecord::Base
  has_many :posts

  def post_types
    Post.subclasses
  end

  def post_names
    post_types.map { |pt| pt.name.gsub('Post::', '') }
  end

  def post_string
    post_names.map { |n| "'" + n + "'" }.join(",")
  end
end

Since they progressively built off of each other, I figured I could use a bastardized recursion, like the one I saw in find() in ActiveRecord::Base.


class Blog < ActiveRecord::Base
  has_many :posts

  def post_types(format = :classes)
    case format
    when :classes
      Post.subclasses
    when :names
      post_types(:classes).map { |pt| pt.name.gsub('Post::', '') }
    when :string
      post_types(:names).map { |n| "'" + n + "'" }.join(",")
    end
  end
end
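
So the caller picks the format it wants. The subclass names below are made up, just to show the shape of the output:

blog = Blog.find(:first)
blog.post_types            # => [Post::Text, Post::Photo]
blog.post_types(:names)    # => ["Text", "Photo"]
blog.post_types(:string)   # => "'Text','Photo'"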

Seems alright. It reduces the clutter of related functions, so I’m on the lookout for more places where I can fold related functions together like that. tip!

Updated:
Found this reverse engineering brief on obfuscated code that recites the 12 days of Christmas. It uses the same technique I described above. I suppose, as always, case statements can be abused.

Figuring out a branching strategy

To be honest, before I started working at Frogmetrics, I didn’t know how project branches were managed. I was working at a research lab, and most of what we did was prototyping. After we demonstrated that something worked, we threw it over a wall, and then it was someone else’s problem (poor them). A lot of the time I was working on my own, so there was no (perceived) need for source control. The only time we used source control was when I was working on the New Horizons spacecraft, and even then, someone else managed the branches. Because it was SVN, we all mainly worked off trunk.

Until recently, I had never contributed to an open source project either, so I really had no idea when to branch. When we started working on the analytics, we had no idea what a good branching strategy would be.

Googling didn’t help, because either no one else calls it a “branching strategy”, or everyone already knew how to do it. I eventually figured something out, though.

Last week, I was talking to AJ of Scoopler about git, and he ended up asking about branching strategy. It became apparent that branching strategy isn’t an obvious thing, so I decided to write something up here. This is obviously not the only way to do it, so if you have other suggestions, by all means, comment.

At first, we didn’t know what we were doing. We knew that we wanted one branch that matched the code on the production server, and another branch where we worked on the ‘next’ version. So our first branching strategy was to branch every time we deployed, which was weekly.

This gave us the option of doing bug fixes on the deployed version while keeping a working branch. However, it was a terrible way to do things. It required you to keep merging back bug fixes that you had made earlier. In addition, we were using bug tracking software to track all the issues from week to week, which resulted in a lot of ticket shuffling and overhead.

Now we do it differently. Locally, we still branch for every feature we’re working on, and when there’s a major set of features that needs more than one developer, we create a remote branch for it that we push and pull to/from.

Thus, we treat master as the golden copy of the code base. It is always deployable, passes all tests, and is as perfect as we know how to make it. This still gives us the advantage of doing bug fixes and deploying independently of whatever features are currently on deck or in the hole, and yet we don’t have to do merges every time we’re about to cut a new version. We simply (and somewhat arbitrarily) tag versions as we go along, and only merge feature branches back. When we can, we rebase the branches to keep the history clean. In addition, we try to make our commits atomic and about one thing, rather than one feature set. That makes it much easier to back out a piece of code, cherry-pick a changeset to another branch, or find the offending commit that broke something.

So far, it’s worked pretty well, but it might evolve as we go on.

Well, hope that helps. This post wasn’t as fun to write, but it was something I hadn’t seen too much of out on the web, so I figured I’d contribute. Fun times.

Installing Ruby’s linalg in Ubuntu

Ruby has a project available for download called linalg that exposes the LAPACK functions in Ruby. LAPACK is a set of routines written in Fortran for doing linear algebra operations. Now, before you cry foul about Fortran: LAPACK is used in a lot of places, it’s pretty damn old, and it has held up over the years.

The linalg package was just updated about two weeks ago, for the first time since 2004, so it has some updates. Unfortunately, there’s no easy gem to install. You’ll have to download the tarball and then run:


sudo ruby install.rb

Of course, you’ll run into some problems unless you have a few other packages installed:


sudo apt-get install lapack3 lapack3-dev libg2c0-dev

Don’t worry: if you don’t like it, you can run the uninstall. Read the INSTALL file for more directions.
I used to wonder how people knew which packages to look for. Apparently, you look at the output, see which files the configure step is crapping out on, and then use apt-file to search for the package that file lives in. It’s one of those basic things that seems not worth mentioning in hindsight, like calling “source ~/.bashrc” after you’ve edited your bash files. No one ever mentions that.


sudo apt-get install apt-file
sudo apt-file update
apt-file list g2c.h

Knowing that, you now know how to fish, and I won’t post how to install stuff in the future.

tip!

Anonymous scope, the unknown cousin of Named scope

Last time, I showed you the well-known named scopes. This time, I’ll talk about the little-documented anonymous scopes.

Anonymous scopes were mentioned briefly on Ryan’s Scraps, and in the API I found them tucked away in the ActiveRecord::NamedScope module documentation:

All subclasses of ActiveRecord::Base have two named_scopes:
  • all, which is similar to a find(:all) query, and
  • scoped, which allows for the creation of anonymous scopes, on the fly:
    Shirt.scoped(:conditions => {:color => 'red'}).scoped(:include => :washing_instructions)

These anonymous scopes tend to be useful when procedurally generating complex queries, where passing intermediate values (scopes) around as first-class objects is convenient.

How would this be useful? In the example given, it’s really not, and most of the time a named scope will do what you need. But there are times when named_scope doesn’t give you the flexibility you need, and scoped() turns out to be quite powerful, especially once you start chaining it with named scopes and association proxies.
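
Just to make the “passing scopes around as first-class objects” bit concrete, here’s a rough sketch (Shirt and the params are made up):

# build up a query bit by bit; nothing hits the database yet
scope = Shirt.scoped(:conditions => { :color => 'red' })
scope = scope.scoped(:conditions => ["size = ?", params[:size]]) if params[:size]
scope = scope.scoped(:order => "created_at desc")

scope.find(:all)   # the query finally runs here
scope.count        # or issue a COUNT with the same conditions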

I was using the better nested set plugin. It allows you to have a fast-access tree structure in a relational database. And while it’s a neat plugin, I couldn’t chain my calls like so:

@father.subtree.with_emails  # => fails

to find all the father’s descendants that had an email address. That fails because subtree() comes from the plugin and uses find() under the hood, and find() returns a plain Array of objects. You can’t chain anything else onto the call, because an Array isn’t a scope or an association proxy.

Wherever we expect to chain calls, whether from a model method or inside an association proxy extension, we can use scoped() instead of find(). Just to demonstrate, here’s a chainable subtree():

class Person < ActiveRecord::Base
  acts_as_nested_set   # from the better nested set plugin

  named_scope :with_emails, :conditions => "email is not null"

  # an anonymous scope over the nested set's lft/rgt range (the descendants),
  # which can be chained with other scopes
  def subtree
    self.class.scoped(:conditions => ["lft > ? and lft < ?", lft, rgt])
  end
end

That means we can chain other scopes after subtree():

@father.subtree.with_emails # => all of @father's descendants that have an email

There’s not much to it, but it’s nice when, once again, you’re into breaking the Law of Demeter.

tip!

Named scope, how do I love thee

I’m not sure how I missed it, but named_scope is something that I’ve been looking for. I should really read more of Ryan’s Scraps. Just in case you don’t know, named_scope is a way to add filters and conditions to the finder methods on your model.

A couple of other, hipper Rails programmers covered it months ago, so I’ll defer to the original author and the aforementioned Ryan and his table scraps for the basic things you need to know. This functionality has been absorbed into Rails 2.1, and you can find it under the method name named_scope.

In this post, I’ll talk about some of the uses I’ve found for it. There’s more code in this one than usual, but it’s incremental, so all you have to do is notice what’s different between the sets of code examples.

Lately, I’ve found that I needed to mix and match different kinds of conditions in my models’ finder methods. Let’s say we have articles that each have many comments. How do we find comments that have an email address? How about comments that include a URL? We could make additional has_many associations.


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
  has_many :comments_with_email, :class_name => "Comment",
           :conditions => "email is not null",
           :order => "comments.created_at desc"
  has_many :comments_with_url, :class_name => "Comment",
           :conditions => "url is not null",
           :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
end

Or, instead of cluttering up the class namespace, we can use an association proxy extension, so that instead of calling @article.comments_with_email, we can call @article.comments.with_email (and violate the Law of Demeter):


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc" do
    def with_email
      # we can do it this way
      with_scope(:find => { :conditions => "email is not null",
                            :order => "comments.created_at desc" }) do
        find(:all)
      end
    end

    def with_url
      # or we can do it this way
      find(:all, :conditions => "url is not null",
                 :order => "comments.created_at desc")
    end
  end
end

class Comment < ActiveRecord::Base
  belongs_to :article
end

This is all fine and well, until you need to find all comments that have both an email and a url. You could write finders that take arguments, but consider the problem: find() in an association proxy extension returns an Array, so you cannot chain the calls, like @article.comments.with_email.with_url.

How do we do this? named_scope() is one way to do it.


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
end

That means you can do things like


@article.comments.with_email

Or you can call count() on it, so that the SQL issues a COUNT instead of instantiating all the ActiveRecord objects in an array and then calling size, which is much faster:


@article.comments.with_email.count

Not only that, but if other models are associated with comments, you have the scoping filters in one place in the code.


class User < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
  belongs_to :user

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
end

So not only can you find all comments with both email and url for an article, you can do the same for users:


@article.comments.with_email.with_url # all comments with email and url of an article
@user.comments.with_email.with_url # all comments with email and url by a user

Therefore, named scope helps whenever you have common intersecting conditions, like all the comments on an article within a period of time. Since I’d like to pass the dates in, I use a lambda for that scope:


class User < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
  belongs_to :user

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
  named_scope :in_period, lambda { |start_date, end_date|
    { :conditions => ["comments.created_at >= ? and " +
                      "comments.created_at <= ?",
                      start_date, end_date] }
  }
end

So now we can call:


@article.comments.in_period(@start_date, @end_date)
@article.comments.with_email.in_period(@start_date, @end_date)

Cool, you say! Now, before you go back into your code and start replacing all of your stuff with named_scopes, keep in mind that there are edge cases where named_scope isn’t appropriate. I fell into the trap of thinking I could use named_scope for everything; like a kid who’s found a new hammer, I saw the whole world as a nail, and I spent more time than I should have trying to bend named_scope to my will.

One of the things that fails is that there is no way (as far as I know) to override a named scope’s conditions the way you can with with_scope, short of going into Rails, messing with it, and submitting a patch.

For example, if we already have an association of comments on an article that sorts in descending order, we can’t write named scopes that ask for the earliest and latest comment.


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article

  named_scope :earliest, :order => "comments.created_at asc",
                         :limit => 1
  named_scope :latest, :order => "comments.created_at desc",
                       :limit => 1
end

This won’t work because named_scope assumes that you’d want to merge all the conditions throughout the entire chain.


@article.comments.latest   # will work, because the SQL will look like:
# SELECT * FROM `comments`
# ......blah blah....
# ORDER BY comments.created_at desc,
#          comments.created_at desc
# LIMIT 1

@article.comments.earliest # will NOT work, because the SQL will look like:
# SELECT * FROM `comments`
# ......blah blah....
# ORDER BY comments.created_at desc,
#          comments.created_at asc
# LIMIT 1

Next time, I’ll cover named_scope’s cousin that’s not very well documented, so it’s easy to skip over: anonymous scopes.

Tip!

Gotchas of internal iFrame facebook apps and external web apps using Facebooker gem

A while back, I added Mobtropolis to facebook as an internal app. I decided to go with FBML because the how-tos offered more support for it, and it promised a tighter look, feel, and integration.

However, unlike many facebook apps, Mobtropolis also exists as a stand-alone external web app. That decidedly made things a little hairier: I had to write a custom mime-response filter to tell whether a call was coming from a web client (HTML) or from the internal facebook app (FBML), in order to authenticate correctly. I also ended up having to write some custom testing methods for it.

Then I revamped the layout of mobtropolis.

It’s major suckage to have to maintain two separate sets of views, so I decided to go with an iFrame for the internal facebook app. It took a bit of work to convert, because authentication gets a little more complicated, but it’s something I only have to deal with once. Subsequent changes to the layout won’t affect it as much.

In retrospect, I should have gone with an iFrame from the beginning, though at the time Mobtropolis was fairly ugly. This is what people call “judgement”; I made the wrong call and it cost me about three weeks. The thing is, you just make the best decision you can at the time, and make sure you can change directions easily.

There were a couple gotchas when using iFrames.

  1. Double facebook frames on redirect to install page.
  2. External app’s layout is wider than iFrame
  3. Facebook only sends fb params on the first call to your app

Hopefully this will save some time for whoever’s looking for this info.

1) Double facebook frames

When you use ensure_application_is_installed_by_facebook_user or ensure_authenticated_to_facebook, Facebooker will automatically reroute the user to an install page if he hasn’t installed your application. The problem is, it assumes that you’re not in an iFrame. It turns out you can override application_is_not_installed_by_facebook_user in your controllers:


def application_is_not_installed_by_facebook_user
  redirect_to add_internal_facebook_app_url
end

Here, add_internal_facebook_app_url is the route to an action in a controller (say, my_controller) that renders JavaScript to change the location of the top frame:


def add_internal_facebook_app
  render :layout => false, :inline => %Q{
    <script type="text/javascript">
      // point the top frame at your app's install/canvas URL
      top.location.href = "";
    </script>
  }
end

You have to make sure you hook it up as a named route in config/routes.rb, so that the redirect in the overridden application_is_not_installed_by_facebook_user() works:


map.add_internal_facebook_app('add_facebook_internal_app',
                              :controller => "my_controller",
                              :action => "add_internal_facebook_app")

2) External app is wider than iFrame

I think there is a way to resize the Facebook iFrame, but I didn’t find out about it until after I did this. By default, the Facebook iFrame “smartsizes” itself to fill out the rest of the page.

First, I created a stylesheet called fb_internal_layout.css with extra styling that squeezes the interface into a 446px-wide iFrame. Then I included it in the head of my layouts.
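
From memory, the include looked something like this; the important parts are the title attribute and the "alternate stylesheet" rel:

<%= stylesheet_link_tag "fb_internal_layout",
                        :title => "fb_internal_layout",
                        :rel => "alternate stylesheet" %>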


Make sure you include titles in the link, so that you can actually switch it out.

Then we use JavaScript to turn this alternate stylesheet on or off, depending on whether we’re in an iframe or not. You can use something like what’s described in A List Apart’s article on alternate stylesheets to do the switching.

To detect whether I was in an iFrame, I simply checked whether (frames.top == frames.self). If they weren’t equal (i.e. the page was framed), I turned on the alternate stylesheet.
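
In other words, something along these lines, where setActiveStyleSheet is the helper from the A List Apart article:

// when the page is framed, switch to the narrow facebook stylesheet
window.onload = function() {
  if (frames.top != frames.self) {
    setActiveStyleSheet("fb_internal_layout");
  }
};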

3) Facebook only sends fb params on the first call to your app

This is actually not a problem if you use FBML. It’s also not a problem if you’re using iFrames and you require a user to install your facebook app before they can see anything in it.

However, even though this is how a lot of facebook apps operate, I don’t think it’s very user friendly. The user has no way to judge whether they want to install your app if they can’t even sample it. I would rather have users add the app because they want to, rather than get people who add it and then remove it shortly after. That not only gives you an inaccurate indication of how many people really want to use your app, but it also annoys the hell out of them.

But making some pages of an iFrame app public is a bit tricky. Only the first click into your facebook app carries fb_params in the request. Every subsequent click happens inside your iFrame, so it looks as if the user is on the external web page.

There are a couple of solutions, but I ended up storing in the session the fact that the user had come in through the internal app before. You can’t recover the params on subsequent requests, so using old fb_params to authenticate is difficult at best. With that flag set, the session is likely to be coming from the internal facebook app, so when it hits a private page, it gets redirected to install Mobtropolis, using 1) above. It’s not a perfect solution, but it covers the cases well enough.
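
A rough sketch of what I mean; the filter names and session key are my own made-up ones, while facebook_session and the fb_sig parameter come from Facebooker and facebook:

class ApplicationController < ActionController::Base
  before_filter :remember_facebook_referral

  protected

  # the first click in from facebook carries the fb_sig params; remember it
  def remember_facebook_referral
    session[:via_facebook] = true if params[:fb_sig]
  end

  # use this on private pages instead of Facebooker's ensure_* filters
  def require_install_if_via_facebook
    if session[:via_facebook] && !facebook_session
      redirect_to add_internal_facebook_app_url   # the named route from 1) above
    end
  end
end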

This, however, doesn’t account for a user who has already installed the app. For that case, I just went ahead and grabbed a facebook session on the first request to the facebook app.

Hope that helped, and I hope never to have to mess with this sort of stuff again, and that you don’t either. More interesting posts in the future. Tip!

Foxy Fixtures and polymorphic tables

Well, I’m behind on everything, which means a bunch of interesting blog posts are queued up. But this one seemed short enough to warrant a small post.

I’ve always hated fixtures for the same reasons other people hate them, but nonetheless, I’ve bitten the bullet and used them. Along come Rails 2.0’s foxy fixtures, and it becomes a little easier.

What it doesn’t detail, however, is how to use your newly foxy fixtures with polymorphic models. Say I have a Vote model that can be used to vote on any type of record; with the old fixtures, I’d have:


my_vote:
  id: 1
  account_id: 1
  votable_id: 3
  votable_type: "Scene"

Normally, you just get rid of the foreign keys, since the fixtures now check each model’s belongs_to associations and you can use the label names instead. The same goes for the primary key id; it’ll be autogenerated from a hash of the fixture label.


my_vote:
  account: my_account
  votable: eat_hotdog
  votable_type: "Scene"

Note that you’re using the association names, and NOT the foreign key name, so you don’t use “_id” anymore (that bit me in the ass for a little bit).

However, you’ll find that with polymorphic models, you can’t do that. Searching around the good ’ole web led me to find that foxy fixtures originally came from a plugin called Rathole, and at the very end of its README, it states:

Also, sometimes (like when porting older join table fixtures) you’ll need to be able to get ahold of Rathole’s identifier for a given label. ERB to the rescue:

Go John Barnette! The identifier in question is Fixtures.identify, so you can simply do something like this in your fixtures as a fall-back:


my_vote:
  account: my_account
  votable_id: <%= Fixtures.identify(:eat_hotdog) %>
  votable_type: "Scene"

Tip!