There's a lot of public data out there, and more is being added every day. However, it's published in all sorts of oddball formats: HTML with wildly formatted tables, CSV, PDFs, Excel files, and so on. Web services at least use formats that are easier to parse, like XML and JSON, served through SOAP and REST APIs.
And there's no uniform way to access and parse all this data.
Even when you just focus on REST APIs, there's no uniform way to access them all. Right now, if you want to use an API, you need a separate wrapper library for each one. REST is simple enough that all you need are URIs and HTTP verbs, but without documentation or a service description, you don't know what methods the service makes available to you. So if you want to join data from two different REST APIs in a mashup, you currently need two separate wrapper libraries, and then you join the data yourself in your application.
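To make that concrete, here's a rough sketch of what the mashup looks like today. Everything in it is made up for illustration: the two wrapper classes, the `weather.example` and `census.example` URLs, the JSON field names, and the canned responses that stand in for real HTTP calls. The point is only that each API gets its own hand-written wrapper, and the join happens in your code.

```python
import json
from typing import Callable

# Assumption: `fetch` stands in for an HTTP GET that returns the response body.
Fetch = Callable[[str], str]


class WeatherClient:
    """Hypothetical wrapper library for the first REST API."""

    def __init__(self, fetch: Fetch):
        self.fetch = fetch

    def temperature(self, city: str) -> int:
        body = json.loads(self.fetch(f"https://weather.example/cities/{city}"))
        return body["temp_c"]


class PopulationClient:
    """Hypothetical wrapper library for the second REST API."""

    def __init__(self, fetch: Fetch):
        self.fetch = fetch

    def population(self, city: str) -> int:
        body = json.loads(
            self.fetch(f"https://census.example/v1/population?city={city}")
        )
        return body["count"]


def join_city_data(cities, weather, census):
    """The join lives in application code, one call per API per city."""
    return [
        {
            "city": city,
            "temp_c": weather.temperature(city),
            "population": census.population(city),
        }
        for city in cities
    ]


# Canned, made-up responses so the sketch runs without a network.
CANNED = {
    "https://weather.example/cities/oslo": '{"temp_c": 4}',
    "https://census.example/v1/population?city=oslo": '{"count": 700000}',
}


def fake_fetch(url: str) -> str:
    return CANNED[url]


rows = join_city_data(
    ["oslo"], WeatherClient(fake_fetch), PopulationClient(fake_fetch)
)
print(rows)
```

Two APIs means two wrappers with two different shapes, and every new data source adds another.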
Taking a first crack at the problem, you might think that we should have machine-readable descriptions of these web services. There have been some attempts at that. One is YQL, which treats the web as a database: you use SQL-like statements to get and join data from REST APIs.
Another is WADL, a machine-readable description of REST resources, like a WSDL for REST. The idea is that if you have a machine-readable description of a resource, you can generate API wrapper code for it. Then joining data from two different REST APIs wouldn't require two different hand-written interfaces.
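Here's a minimal sketch of that idea. It does not parse real WADL (which is XML); the description is a plain Python dict in a format I invented for this example, and `api.example`, `get_user`, and `list_posts` are all hypothetical. It only shows the general mechanism: read a description, get a client whose methods know the verb and URI.

```python
from types import SimpleNamespace
from urllib.parse import urlencode

# Assumption: a made-up description format, far simpler than real WADL.
DESCRIPTION = {
    "base": "https://api.example/v1",
    "resources": {
        "get_user": {
            "verb": "GET",
            "path": "/users/{user_id}",
            "query": ["fields"],
        },
        "list_posts": {
            "verb": "GET",
            "path": "/users/{user_id}/posts",
            "query": ["page"],
        },
    },
}


def make_client(description):
    """Build an object with one method per described resource."""

    def make_method(spec):
        def method(**params):
            # Split declared query parameters from path parameters.
            query = {k: params.pop(k) for k in spec["query"] if k in params}
            url = description["base"] + spec["path"].format(**params)
            if query:
                url += "?" + urlencode(query)
            # A real client would send the request; the sketch returns it.
            return spec["verb"], url

        return method

    methods = {
        name: make_method(spec)
        for name, spec in description["resources"].items()
    }
    return SimpleNamespace(**methods)


client = make_client(DESCRIPTION)
print(client.get_user(user_id=42, fields="name"))
print(client.list_posts(user_id=42))
```

With something like this, a second API is just a second description fed to the same `make_client`, so the calling code looks the same for both.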
These two solutions share a problem: someone has to write an explicit mapping of which service exposes which URIs and methods, along with what parameters they take. YQL relies on a global repository of table mappings, kept in a GitHub repo, to convert SQL statements to REST methods. WADL requires the maintainers of a REST API to keep three things in sync: the API, its documentation, and a WADL file. When the service API changes, the WADL (like a WSDL) can fall out of date, unless the web framework generates it for them. To my knowledge, none of the major web frameworks do such a thing.
There's even a blog post by Joe Gregorio about whether we need WADL. He says no. I think I agree, though what he proposes instead is to have a limited number of service descriptors, the same way that we have a limited number of MIME types. That way, when we see a web service resource description type, we expect it to describe a REST API with certain methods, the same way we expect an image/png or text/html file to have certain properties and operators. He may be onto something, but if descriptions of REST APIs are not the way to go, what's an alternative?