Yeah, I use Google Wave to ‘sign off’ on informal agreements

Yeah, I use Google Wave.  I honestly don't know what everyone else is talking about when they say they don't know what it's for.

Ian and I don't talk on the phone about Graphbug as much as we maybe should.  There are a number of reasons for this, but as a result, we mostly communicate over Google Wave, which also serves as documentation for us.  

It's better than email for communication that's consistently about something specific.  We know what the other's doing code-wise through Pivotal and commit logs, and things outside of code we update in a Google Wave.  Or if we have a spec that we need to hammer out, it's better to do it through Wave.  It's more like a real-time wiki in this sense.

But we also use it to "sign off" on informal agreements.  We'll type up a set of bullet points and keep editing and arguing about them in the discussions below the main wave.  The threaded nature of a wave keeps arguments and counter-arguments on target.  And then when we're both in agreement, we sign the bottom of the top wave (with all the bullet points) along with the date.  

Because you can replay a wave, and that history doesn't change, we can be sure that neither one of us changed the bullet points after we've 'signed off' on the agreement.  

Of course, there's nothing verifying the authenticity of the signature (meaning we can't verify who signed it), only that it was signed at the right time.  There's probably a way, but for just us, putting our names and the date down is enough.  Given that there are only two of us, we know who signed what name.
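For what it's worth, if we ever did need to verify who signed, Ruby's standard OpenSSL bindings would be enough for a minimal sketch.  Everything here (the keypair, the agreement text) is hypothetical, just to illustrate the idea:

```ruby
require 'openssl'

# Each of us would generate a keypair once and share the public half.
key = OpenSSL::PKey::RSA.new(2048)

agreement = "1. First bullet point.\n2. Second bullet point.\nSigned 2010-02-25"

# Signing binds a name to this exact text: change a single character
# and verification fails.
signature = key.sign(OpenSSL::Digest.new('SHA256'), agreement)

pub = key.public_key
puts pub.verify(OpenSSL::Digest.new('SHA256'), signature, agreement)        # true
puts pub.verify(OpenSSL::Digest.new('SHA256'), signature, agreement + "!")  # false
```

But like I said, for two people, names and a date are plenty.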


It sucks to visualize public data


A while back, I made a graph overlaying the Senate party majority with the national debt.  While correlation isn’t sufficient for causation, I did learn a lot about the shape of the national debt, as well as that the relationship between political party and national debt is not as clear-cut as political commentators and pundits would like you to believe.  

However, the entire experience of producing the graphs in that blog post left me dumbfounded by how tedious it was just to see the shape of the data.  

First, finding the data was difficult.  Many of the Census Bureau and other government websites have confusingly bad user interfaces.  It took me a while to figure out how to find the number of single females by year and state.  Go ahead, try it.  

Once you’ve found the data, you need to shape it into what you need for the visualization, which is difficult because data on the web is in a sad state of affairs.  Images, video, and text on the web are limited to a few popular formats.  Not only that, the actual format is abstracted away with an image tag (and soon a video tag) by the browser or Flash.  No widespread consensus exists for raw data, with the exception of a limited number of domains dictated by microformats, KML, and their ilk.  Many publish data through HTML tables or CSV files, which come in a surprising variety of hard-to-parse formats.

Then, it was tedious simply to find a free visualization tool that produces moderately good-looking graphs.  There is a plethora of javascript graphing libraries, not to mention Flash libraries like amCharts and Flare, each with their own strengths, weaknesses, aesthetics, and ramp-up time.  It took a while to pick one that was suitable.

The whole thing was harder than it should have been, and if I couldn’t convey to you how mad I am, just insert expletives above, whenever possible.

As a programmer, I can figure out how to find and graph the data, however painfully.  But it’s completely inaccessible to the regular, everyday person who uses the internet.  There’s a lot of public data out there, but people aren’t able to access it easily.  If it doesn’t show up in Google’s search, that’s where they stop.

It’s not that people aren’t interested in the data; it’s that the data is completely inaccessible.  If blog and news articles about unemployment, STD rates, and gas prices are any indication, people want to know this kind of information.  

Even more so, people want to be able to explore different aspects of this data, like the New York Times’ interactive visualizations, by cutting and slicing the data as they see fit, because oftentimes a statistic only makes sense when you can answer the question, “compared to what and how?”  And people want to be able to explore it visually in a way that gives insight.  

This is why I’m working on Graphbug–making public data more easily accessible through visualizations.  Some questions are best answered visually.  

We want to make it easy to find the data you need, and make it as fluid as possible to move between different datasets to compare them, in the same way that Google Maps made it fluid to navigate a map.  Then if you need to do more hardcore analysis, you can download the data and play with it however you like.  

Of course, the topic is too broad at the moment.  We’d like to focus on particular datasets first, and though we picked the US Census to start, we’d like to hear from you: what sorts of datasets would you be interested in?  iPhone market share?  Number of earthquakes in each state by year?  The male-to-female ratio at colleges over the years?

So visit graphbug.com and let us know: did you feel the same pain, and do you have a need for this?  If so, what are you currently using to solve the problem?

What to listen for from technology scrooges

Consider today’s online world. The Usenet, a worldwide bulletin board, allows anyone to post messages across the nation. Your word gets out, leapfrogging editors and publishers. Every voice can be heard cheaply and instantly. The result? Every voice is heard. The cacophony more closely resembles citizens band radio, complete with handles, harassment, and anonymous threats. When most everyone shouts, few listen.

I saw this short opinion piece from 1995, and it’s related to the technology myopia I wrote about recently: it exists even amongst the technologically well-versed.

Once again, it’s easy for us to say Mr. Stoll here is a fool. But that’s not what we should glean from this. We now know that if every voice can be heard cheaply, and that data can be easily accessed in text form, someone will come up with an algorithm to organize it all. Not only that, there’s more than one algorithm. Some, like PageRank, are all in a computer. Others, like Digg, Reddit, and HN, are computer-human hybrids that require a bit of social engineering. So while Mr. Stoll correctly identified the problem, he let it stop him there. Whether, much less how, that problem would be solved was beyond his imagination.

The same goes for any technology pundit talking about the current new crop of technologies. He might not have the background to know whether it’s possible at all, or he might simply lack the imagination. You need both to see clearly.

That’s not to say you should ignore everything these pundits say. Instead, listen carefully. The hard part of product development is making something people want, and what pundits have done is the hard part of identifying a problem. It’s up to you to figure out how to prove them wrong.

I take it back. Haml’s got some vinegar for a reason.

I take it back.

I gave HAML a second shot, and I rather like it. The conversion was pretty easy, and having Emacs’s haml-mode saved a lot of indentation headaches, even when moving blocks left and right in indentation.

How come I liked it better the second time around?  The first time I saw it, it was written by someone who ignored convention and just powered through, nesting HAML div elements 10 to 12 levels deep in one file.  You know what?  You’re doing it wrong.  

Writing HAML from scratch this time, and reading Haml Sucks for Content, I’ve found that HAML is designed with intentional weaknesses to make you stay clean and clear. Indentation too far in an HTML document? Chances are, there are repeating elements. Refactor it out to a partial, and you get to start at column zero again. Have the urge to make multi-line ruby statements? Refactor it out to a helper. You’ll live longer that way.

By making some things intentionally hard to do (giving them some syntactic vinegar), you’d hope that your users end up Doing the Right Thing.  But then again, some people are just immune to having code indented 10 to 12 levels deep, or to naming their variables ‘ii’ and ‘jj’ when they’re not programming in Fortran.

Who would pay for a message sent to nobody in particular?

I recently read a post on HN, Technology isn’t Everything, about how the ranter wouldn’t use eBooks.  

Consider books. I still buy and read all of my books in the form of compressed wood pulp. There are newfangled e-book readers, but I don’t want one. Why? Because the only places I read are 1) In the bathtub, and 2) Lying in bed. Taking a computer into the bathtub is generally not a good idea, and holding a Kindle above my head for 3 hours is awkward compared to lying a (3-D) book on the bed beside me with one page bent up so I can read it. 

via briancarper.net

It’s something I hear often about new technologies–“Why would I want to do that?”  I hear it on blog posts for new tech.  I hear it in person during meetups.  When I hear something like this, I’m reminded of quotes about technologies that we take for granted as being obvious now.

Let’s start with the radio.  Before radio as we know it now (as radio stations), when people said ‘radio’ in the 1920s, they meant the wireless transmission of messages over the air.  They considered it a communication medium to relay news, like the sinking of the Titanic.  Because of that, people used to pay directly to send messages.  No one was using it as a way to broadcast music the way we know it now; there was no concept of a radio station.  And hence, no one imagined that advertisers would pay to run ads alongside music broadcasts.  So when David Sarnoff was pioneering the idea, what was the reaction from his potential investors?

By 1916, along with Armstrong and de Forest, [David Sarnoff] was using his newfound fame to push the idea of commercial radio, something he called the “wireless music box,” although this idea was before its time. Even as late as 1920, one potential investor wrote him to say, “The wireless music box has no imaginable commercial value. Who would pay for a message sent to nobody in particular?” 

Even the Marconi Company, his employer, rejected the idea of radio as anything but a communications medium. So he went to work for the Radio Corporation of America [RCA] in 1920.
— Radio Pioneers enter story of the wire on David Sarnoff’s associates in response to his urgings for investment in radio. [emphasis mine]

Who indeed.  We laugh now, but we have the luxury of living in the future with our buddy Hindsight.  It’s easy to forget what used to be non-obvious.  What’s more interesting is that de Forest, also at the edge of innovating on the wireless music box, had this to say about television:

“While theoretically and technically television may be feasible, commercially and financially I consider it an impossibility, a development of which we need waste little time dreaming.”  

— Wikiquote — Lee de Forest, American radio pioneer and inventor of the vacuum tube.

Sounds like what people have been saying about Twitter since it came out.

The point to notice here is not: “People in the past were dumb, and haha, they were wrong.”  The point is, when a new technology or a new use of technology comes out, we’re often colored by how we use similar technologies now.  This is especially true when the new thing is bad at doing what the current thing does–“Why would I use eBooks when a regular book would never run out of batteries?”  This is why even the most tech-savvy amongst us deride new tech with, “Why would I use that?  The current thing does that much better.”  What we often miss is: though the new thing isn’t as good at something (yet), what new usage vectors does it introduce?  You might not be able to read eBooks in the tub like paper books, but what do they allow you to do that paper books can’t?

In fact, there’s already a whole book written on the subject: The Innovator’s Dilemma.  I’ll summarize from the review on Amazon: Seagate was the biggest producer of 5.25″ hard drives in 1985, and was doing research into smaller 3.5″ drives.  However, they shelved the research because marketing found out that their biggest current customers weren’t interested in 3.5″ drives.  Who needs those?  They don’t store as much as 5.25″ drives and they’re not as fast.  So Seagate didn’t develop the 3.5″ tech and had their lunch eaten by startups attacking that emerging market, because as it turned out, the rise of laptops and small personal devices like the iPod needed small drives.  

The iPod is another famous example, with CmdrTaco, the editor of Slashdot, saying:

No wireless. Less space than a nomad. Lame. 

 via Slashdot

The same was true of Posterous when they were in our batch.  I remember founders of other startups in the batch thought it was a pretty dumb idea and that Posterous was going to fail, but now they’re some of its biggest fans.  

When it comes to eBooks, what new usages do they allow that normal books can’t?  Though I might not be able to read one in a tub and the batteries might run out, I can easily look up the definition of a word right there and then without consulting a separate dictionary.  I can potentially make comments on a textbook, shared by everyone in the class as real-time study notes.  I can search reference books and textbooks for exactly what I’m looking for.  I can potentially read a book in another language, machine-translated by my eBook right there and then.  These are all things that eBooks can do that regular books can’t–and not only that, they introduce new possibilities.

Just as we laughed at our predecessors, people in the future are going to laugh at you.  So when you look at Google Buzz, don’t just think about what it does that seems the same as Twitter, but think about what it allows you to do that Twitter can’t.  When you look at Blippy (like twittering your receipts publicly), don’t just ask why anyone would do that, but also what it allows users to do that they couldn’t before.  Same when Facebook introduces a new feature or layout.  Same when you look at FourSquare and Loopt.  Same when you look at the iPad.  

Of course, that’s not to say that every new thing has potential.  I don’t know that the electric can opener has much.  But that may just be a lack of my own imagination.  Be wary about dismissing something at first glance, especially when you’ve never tried using it yourself.  A failure of imagination on your part doesn’t necessarily mean that there’s nothing there.  

Google Buzz seems more like Friendfeed than Twitter

A lot of people compare Google Buzz to Twitter.  On the surface, they look more or less the same: you multicast some sort of status.  But I think there are some fundamental differences that make the usage a little different.

Buzz enables public conversations that you can see all in one place.  On Twitter, everyone can be talking about something, but most clients don't show this view unless you explicitly search for a particular hashtag.  And people's replies to a status are fragmented across the board–if you don't subscribe to someone, you can't hear what they're saying.  So conversations are relegated to people that know each other directly.  Buzz and Friendfeed take it one step further and make a semi-public conversation between friends and friends of friends possible.

Friendfeed's insight is that people like to converse around something, so it makes it easy to come up with topics.  In fact, Friendfeed–and subsequently Google Buzz–are basically forums and bulletin boards with really lightweight thread creation.  So lightweight, in fact, that instead of having to come up with a topic, the topics are implicit in the activities that you do online.  Posted pictures on Flickr?  Your friends can talk about them, even if they don't know each other.  Listening to music posted in your gchat status?  Your friends can make other suggestions or berate your bad taste.

What Google was able to do, which took Friendfeed a couple years of traction, was to tie it to already-existing Google services without asking.  That provides a lot of conversation topics without setting anything up, and people can get started right away.  

Facebook has all the mechanisms I've described above, with the exception that, culturally, it's private.  I've rarely seen friends of mine talk to each other through the comments unless they already know each other.  Google starts semi-public, and that gives people permission to talk to each other, even if they don't know each other.  That, I think, is a plus.

Linking data in XML and HATEOAS

Earlier, I had talked about how a description language is not a good idea for REST services.  

To continue in that vein, we start with Tim Berners-Lee, who talked at TED about linked data as the next step for the web.  He's thought deeply about the formats and protocols of the web, and I think he's right about the overall benefits of linking data.  If you could link data together, you could easily join different datasets simply by traversing them.

However, for a programmer implementing a web app, there's no immediate benefit to linking data.  It doesn't show up in web browsers, usable semantic browsers are pretty much non-existent (maybe Disco), and none of the web frameworks make it easy to link data.  The fact that none of the maintainers of Rails, Django, etc. do so is indicative of the high cost-to-benefit ratio of doing it.  

Taking a look at the Linked Data homepage, I felt the barrier was pretty high and heavyweight just to link data together.  You'd not only have to learn RDF, but also OWL and SPARQL.  And a simple search for RDF projects on github reveals only one project (reddy) with any attention from other devs: 3 forks and 39 watchers.  It seems overcomplicated to have a separate RDF file linking data together.

While having ontologies is great, I don't think they're the low-hanging fruit.  I was searching about REST when I tripped over a concept called HATEOAS.  It's an often-overlooked design constraint of REST: using hypermedia as the engine of application state.  

Given that idea, here's the punchline: why don't we link data directly from within the XML data itself?  Instead of messing around with RDF, why can't we link in XML?  Here's part of the data returned by the Sunlight Foundation's API when I query for a single legislator.

<response>
<legislators>
<legislator>
  <firstname>Neil</firstname>
  <lastname>Abercrombie</lastname>
  <email>Neil.Abercrombie@mail.house.gov</email>
  <official_rss>http://www.house.gov/apps/list/press/hi01_abercrombie/RSS.xml</official_rss>
  <website>http://www.house.gov/abercrombie/</website>
  <congresspedia_url>http://www.opencongress.org/wiki/Neil_Abercrombie</congresspedia_url>
  <state>HI</state>
  <twitter_id>neilabercrombie</twitter_id>
</legislator>
</legislators>
</response>

Notice that some of the fields are pointers to URLs, but the twitter_id only gives the twitter id, and the state only says "HI" for Hawaii.  Why not point the twitter attribute at the URL of the twitter API?  Instead of just stating the state, why not link it to geonames.org?  It might look something like this:

<response xmlns:xlink="http://www.w3.org/1999/xlink">
<legislators>
<legislator>
  <firstname>Neil</firstname>
  <lastname>Abercrombie</lastname>
  <email>Neil.Abercrombie@mail.house.gov</email>
  <official_rss>http://www.house.gov/apps/list/press/hi01_abercrombie/RSS.xml</official_rss>
  <website>http://www.house.gov/abercrombie/</website>
  <congresspedia_url>http://www.opencongress.org/wiki/Neil_Abercrombie</congresspedia_url>
  <twitter xlink:href="http://twitter.com/statuses/user_timeline/neilabercrombie.xml">neilabercrombie</twitter>
  <state xlink:href="http://ws.geonames.org/search?name_equals=hawaii&amp;country=US">HI</state>
</legislator>
</legislators>
</response>

This way, you can traverse from XML document to XML document.  If you need to look up further information about the attribute "state" with value "HI", you can do so by following the link to http://ws.geonames.org/search?name_equals=hawaii&country=US.  In your application, you can then traverse it as if it were composed data: when you execute @legislator.state, you don't just get back "HI", you get back another set of data with the attributes returned from geonames for the state of Hawaii.
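To make the traversal concrete, here's a sketch of what a client might do with such a document, using Ruby's built-in REXML.  The fragment is trimmed down from the response above, and a real client would fetch the href rather than stop at printing it:

```ruby
require 'rexml/document'

# A trimmed-down version of the linked response above.  The xlink
# namespace is declared here so the fragment parses on its own.
xml = <<XML
<legislator xmlns:xlink="http://www.w3.org/1999/xlink">
  <lastname>Abercrombie</lastname>
  <state xlink:href="http://ws.geonames.org/search?name_equals=hawaii&amp;country=US">HI</state>
</legislator>
XML

doc   = REXML::Document.new(xml)
state = doc.elements['legislator/state']

puts state.text                      # the plain value: "HI"
puts state.attributes['xlink:href']  # the link to follow for more data

# A real client would GET that href and parse the returned document the
# same way: the application's state is just the last document fetched.
```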

You don't need to link just to other web services; you can also link back to your own API.  Instead of having just the district number our legislator works in, Sunlight should link back to its own API for districts.  When you do this, you push the burden of maintaining application state to the client.  The state of the client is merely the XML document it last requested.  

And more importantly, if different methods in your API need to be called in a specific order, you no longer need to state this in the documentation of your API.  The only methods that can be called are the href links present in the XML document.  It's best described by an example quoted by subbu:

There are three pages in a UI. The first page has a link to go to the second page. The second page has a link to go to the previous page as well as the third page. The third has a link to the second page and another link to the first page.

A client starts from the first page, and then through the link on that page, goes to the second page. The fact that this page has one link to the first page and another to the third page implies that the current state of the application (i.e. the interactions) is that "the client is viewing the second page". That is what it means by hypermedia as the engine of application state. It does not necessarily mean serializing application state, such as "<page>2</page>" into representations. 

Right now, our REST APIs are returning XML with just IDs.  It's up to you to figure out what they're pointing to.  It's as if we had webpages that just said "next page" and expected you to know which URL to go to and change it in the address bar of your browser yourself.
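Under HATEOAS, the three-page example above might serialize the allowed transitions as links.  The element names and URLs here are made up, just a sketch of the shape:

```xml
<page xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- the client is viewing page 2; the only legal moves are these links -->
  <prev xlink:href="http://example.com/pages/1"/>
  <next xlink:href="http://example.com/pages/3"/>
</page>
```

The client doesn't need documentation to know what it can do next; it just follows one of the hrefs it was handed.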

Obviously, I'm not the first to think about this.  Tim Bray has talked about linking plain ole XML, and others have mentioned XLink for XML, but nowhere in my searches did there seem to be any explicit connection with Tim Berners-Lee's linked data, or with finding a consistent way to access REST APIs.  None of the XML data returned by the REST APIs I've seen had links in it, and none of the API wrapper libraries I've used tried to traverse to a different REST API URI using links in the XML documents.

It seems like a really simple solution to linking data, and it's way overlooked.  When it all came together for me, it seemed like something people would be excited about, but doing a search on Google, Google Trends, and Google AdSense keywords, no one seems to be talking about it.  

Anyone know why XLink was abandoned, or why linked data doesn't follow this concept?

If you forked technomancy’s emacs-starter-kit, upgrade nXhtml

UGH.  I recently forked technomancy’s emacs-starter-kit, and all was well until I started editing erb files.  After a while, Emacs would stutter (hang and then go), eating up 94% CPU while editing erb files.  I didn’t know exactly what was causing it.  Emacs kept complaining about “MU new post-command chunk” in the *Messages* buffer, and something about “not safe forward word”.  It took a whole night’s worth of digging to figure this out.  

Solution: make sure you upgrade to the latest version of nXhtml (2.05-091202) if you forked technomancy’s emacs-starter-kit at commit 452c395556e0aac213b0d7d1653f2673554a4b73 or earlier.

Sometimes, I hate emacs.

Edit: looks like I spoke too soon.  It still didn’t fix the problem.  I’m removing nXhtml to find an erb replacement.  If that doesn’t work, I’m just going back to my own emacs config file.  At least there, I’ll know what I recently added that messed it all up.

I want to use SASS, but not HAML.

The title of this post is more or less a verbatim quote from a coworker, as well as from an unrelated colleague of mine at a previous job. I was asked why both times and, to be honest, I was a little thrown. I didn’t actually know why I use HAML.  I guess I’d never really considered it much after I started doing everything with it.  It just seemed better and more fun.

Oddly enough, I want to do Sass, but not Haml.  HTML isn’t too hard to begin with, and I guess it’s never bugged me.  

What I do dislike about Haml is the indentation in a larger piece of code.  When I want to shift elements around, suddenly I’m not sure which level of indentation to put them at, let alone how to make sure all the sub-levels are indented correctly as well.  haml-mode in emacs didn’t seem to help much.

This means one of two things:

  1. Web markup should be shallower, and I’m doing it wrong.  Perhaps with Rails helpers and partials I can compress it more to keep myself DRY.
  2. The very nature of web markup is a deeply nested tree, and Haml’s indentation isn’t a right fit for it.  This seems more likely, though again, it depends on the design of the page.  If a page is designed with heterogeneously abundant page elements, then this is true.

I’ve yet to decide on either.  Perhaps I’ll give Haml another shot.

CSS, on the other hand, has a fairly shallow structure.  In fact, it doesn’t have nested structures at all, and you have to re-declare parent selectors.  This lends itself to really long files that I can’t keep track of.  I’ve often wished for color variables too.  SASS indentation is good here because it gives you a way to group your CSS together in a more compact way, with hierarchy.

Hence, HTML is too deep for whitespace indentation and, in my opinion, not a good fit for HAML.  And CSS is too shallow, and hence perfect for SASS.

So it’s not that I don’t want to learn HAML because it’s something new; it’s that it doesn’t seem to fit a need I have for the markup I’m working with.  SASS, on the other hand, solves a real pain I’ve had with CSS.