Linking data in XML and HATEOAS

Earlier, I talked about why a description language is not a good idea for REST services.

To continue in that vein, let's start with Tim Berners-Lee, who spoke at TED about linked data as the next step for the web.  He's thought deeply about the formats and protocols of the web, and I think he's right about the overall benefits of linking data.  If data is linked, you can join different datasets together simply by traversing the links.

However, for a programmer implementing a web app, there's no immediate benefit to linking data.  It doesn't show up in web browsers, usable semantic browsers are pretty much non-existent (maybe Disco), and none of the web frameworks make it easy to link data.  The fact that none of the maintainers of Rails, Django, etc. have added support for it is indicative of a high cost-to-benefit ratio.

Taking a look at the Linked Data homepage, I felt the barrier to entry was pretty high and the tooling heavyweight just to link data together.  You'd have to learn not only RDF, but also OWL and SPARQL.  And a simple search for RDF projects on GitHub reveals only one project (reddy) with any attention from other devs, at 3 forks and 39 watchers.  It seems overcomplicated to have a separate RDF file linking data together.

While having ontologies is great, I don't think it's low-hanging fruit.  I was reading about REST when I stumbled on a concept called HATEOAS: hypermedia as the engine of application state.  It is a design constraint of REST that often gets overlooked.

Given that idea, here's the punchline:  Why don't we link data directly from within the XML data?  Instead of messing around with RDF, why can't we link in XML?  Here's part of the data returned by the Sunlight Foundation's API when I query for a single legislator:

<response>
<legislators>
<legislator>
  <firstname>Neil</firstname>
  <lastname>Abercrombie</lastname>
  <email>Neil.Abercrombie@mail.house.gov</email>
  <official_rss>http://www.house.gov/apps/list/press/hi01_abercrombie/RSS.xml</official_rss>
  <website>http://www.house.gov/abercrombie/</website>
  <congresspedia_url>http://www.opencongress.org/wiki/Neil_Abercrombie</congresspedia_url>
  <state>HI</state>
  <twitter_id>neilabercrombie</twitter_id>
</legislator>
</legislators>
</response>

Notice that some of the fields are pointers to URLs, but twitter_id gives only the Twitter id, and state gives only "HI" for Hawaii.  Why not point the twitter element at the URL of the Twitter API?  Instead of just naming the state, why not link it to geonames.org?  It might look something like this:

<response xmlns:xlink="http://www.w3.org/1999/xlink">
<legislators>
<legislator>
  <firstname>Neil</firstname>
  <lastname>Abercrombie</lastname>
  <email>Neil.Abercrombie@mail.house.gov</email>
  <official_rss>http://www.house.gov/apps/list/press/hi01_abercrombie/RSS.xml</official_rss>
  <website>http://www.house.gov/abercrombie/</website>
  <congresspedia_url>http://www.opencongress.org/wiki/Neil_Abercrombie</congresspedia_url>
  <twitter xlink:href="http://twitter.com/statuses/user_timeline/neilabercrombie.xml">neilabercrombie</twitter>
  <state xlink:href="http://ws.geonames.org/search?name_equals=hawaii&amp;country=US">HI</state>
</legislator>
</legislators>
</response>

This way, you can traverse from XML document to XML document.  If you need further information about the "state" element with value "HI", you can get it by following the link to http://ws.geonames.org/search?name_equals=hawaii&country=US.  Your application can then traverse the linked documents as if they were one composed dataset.  Now, when you execute @legislator.state, you don't get back only "HI"; you get back another set of data, with the attributes geonames returns for the state of Hawaii.
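To make the traversal idea concrete, here is a minimal sketch in Python of a wrapper that follows xlink:href links on attribute access.  Everything in it is an assumption for illustration: the Resource class, the fetch function, and the example.org URLs and documents are hypothetical stand-ins, not the Sunlight or geonames APIs, and the "web" is an in-memory dictionary so the example runs without a network.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes as "{namespace-uri}localname".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Canned documents standing in for real HTTP responses (hypothetical content).
FAKE_WEB = {
    "http://example.org/legislators/abercrombie": """
<response xmlns:xlink="http://www.w3.org/1999/xlink">
  <legislator>
    <lastname>Abercrombie</lastname>
    <state xlink:href="http://example.org/states/HI">HI</state>
  </legislator>
</response>""",
    "http://example.org/states/HI": """
<state>
  <name>Hawaii</name>
  <capital>Honolulu</capital>
</state>""",
}


def fetch(url):
    """Stand-in for an HTTP GET: return the parsed root element for a URL."""
    return ET.fromstring(FAKE_WEB[url].strip())


class Resource:
    """Wrap an element so that attribute access reads its child elements.

    A child carrying xlink:href is replaced by the document it links to.
    """

    def __init__(self, element, fetcher=fetch):
        self._element = element
        self._fetcher = fetcher

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        href = child.get(XLINK_HREF)
        if href is not None:
            # Linked data: fetch the target document and wrap it.
            return Resource(self._fetcher(href), self._fetcher)
        if len(child):
            # Nested element with children of its own: wrap it in place.
            return Resource(child, self._fetcher)
        return child.text


root = Resource(fetch("http://example.org/legislators/abercrombie"))
legislator = root.legislator
print(legislator.lastname)    # plain text child
print(legislator.state.name)  # read from the linked state document
```

One design choice to note: in this sketch a linked element returns the linked document and drops its own text ("HI").  A real wrapper would probably want to keep both.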

You don't have to link only to other web services; you can also link back to your own API.  Instead of giving just the number of the district our legislator represents, Sunlight could link back to its own API for districts.  When you do this, you push the burden of maintaining application state to the client.  The state of the client is merely the XML document it last requested.

And more importantly, if the methods in your API need to be called in a specific order, you no longer need to state this in your API documentation.  The only methods a client may call are the ones linked by href in the current XML document.  It's best described by an example quoted by subbu:

There are three pages in a UI. The first page has a link to go to the second page. The second page has a link to go to the previous page as well as the third page. The third has a link to the second page and another link to the first page.

A client starts from the first page, and then through the link on that page, goes to the second page. The fact that this page has one link to the first page and another to the third page implies that the current state of the application (i.e. the interactions) is that "the client is viewing the second page". That is what it means by hypermedia as the engine of application state. It does not necessarily mean serializing application state, such as "<page>2</page>" into representations. 
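The three-page example can be sketched in a few lines of Python.  This is only an illustration under assumptions: the Client class, the /pages/N URLs, and the element names (next, previous, first) are made up, and the pages live in a dictionary rather than behind a real server.  The point it tries to show is that the client holds no state other than the last document it received, and its allowed moves are exactly the links in that document.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"

# Hypothetical server: three pages that link to one another as in the quote.
PAGES = {
    "/pages/1": f'<page xmlns:xlink="{XLINK}">'
                f'<next xlink:href="/pages/2"/></page>',
    "/pages/2": f'<page xmlns:xlink="{XLINK}">'
                f'<previous xlink:href="/pages/1"/>'
                f'<next xlink:href="/pages/3"/></page>',
    "/pages/3": f'<page xmlns:xlink="{XLINK}">'
                f'<previous xlink:href="/pages/2"/>'
                f'<first xlink:href="/pages/1"/></page>',
}


class Client:
    """A client whose only state is the last document it requested."""

    def __init__(self, start):
        self.document = ET.fromstring(PAGES[start])

    def links(self):
        """Map link name to target URL for every link in the current document."""
        return {
            child.tag: child.get(HREF)
            for child in self.document
            if child.get(HREF)
        }

    def follow(self, name):
        """Move to the linked document; anything not linked is not allowed."""
        links = self.links()
        if name not in links:
            raise ValueError(f"no '{name}' link in the current document")
        self.document = ET.fromstring(PAGES[links[name]])


client = Client("/pages/1")
client.follow("next")
print(client.links())  # the links on page two: previous and next
```

Nothing in the documents says "page 2"; the client is on the second page simply because the document it holds offers a link back to the first page and a link on to the third.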

Right now, our REST APIs are returning XML with just IDs.  It's up to you to figure out what they're pointing to.  It's as if we had webpages that just said "next page" and expected you to know which URL to go to and type it into the address bar of your browser.

Obviously, I'm not the first to think about this.  Tim Bray has talked about linking plain ole XML, and others have mentioned XLink for XML, but nowhere in my searches did there seem to be any explicit connection with Tim Berners-Lee's linked data or with finding a consistent way to access REST APIs.  None of the XML data returned by the REST APIs I've seen had links in it, and none of the API wrapper libraries I've used tried to traverse to a different REST API URI using links in the XML documents.

It seems like a really simple solution to linking data, and it's badly overlooked.  When it all came together for me, it seemed like something people would be excited about, but judging from searches on Google, Google Trends, and Google AdSense keywords, no one seems to be talking about it.

Anyone know why XLink was abandoned, or why linked data doesn't follow this concept?