The 3 body problem in Erlang

The three-body problem asks what paths three massive objects in space will take when each one’s gravity affects the other two. With two bodies, the problem can be solved analytically, as Kepler did. But with three bodies, the paths are chaotic. If you just hit play on the last link and watch for a minute, you’ll see what I mean. And that’s the easy version of the problem, since the two suns are fixed. If they were three bodies of comparable masses, it’d be even harder.

From Ezra: http://ezrakilty.net/research/2006/02/3body_problem_in_erlang.html

The first conceptual problem I hit was the question of synchronization. In order for the sim to be a decent finite approximation to the continuous world of physics, we need to break it into discrete time steps, each of which depends on the previous time step (at least, that’s the only way I know of to get a fair approximation). This means that each particle can’t just crunch away at its own speed, working as fast as it can to calculate its position at various times. Easily the particles could grow out of sync with one another, making an inaccurate physical model.

I hadn’t thought about this, but I think Ezra is right. In a simulation of the three-body problem, each future calculation depends on the current ones, and the current calculations depend on each other, so you need to make sure all the calculations stay ‘in step’.
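To make that dependency concrete, here’s a minimal sketch of one lockstep time step (in Python rather than Erlang, with made-up masses, positions, and step size): computing any body’s state at step t+1 requires every body’s position from step t, so no body can run ahead of the others.

    # Minimal lockstep sketch: each time step needs *all* positions from the
    # previous step, so no body can be advanced independently of the others.
    # Masses, positions, and the time step are made-up illustrative values.

    G = 6.674e-11   # gravitational constant
    DT = 1.0        # time step in seconds

    bodies = [
        # mass (kg), position (m), velocity (m/s)
        {"m": 5e24, "pos": [0.0, 0.0],   "vel": [0.0, 0.0]},
        {"m": 5e22, "pos": [4.0e8, 0.0], "vel": [0.0, 1.0e3]},
        {"m": 5e22, "pos": [0.0, 4.0e8], "vel": [-1.0e3, 0.0]},
    ]

    def acceleration(i, snapshot):
        """Acceleration on body i from every other body, using one consistent snapshot."""
        ax = ay = 0.0
        xi, yi = snapshot[i]["pos"]
        for j, other in enumerate(snapshot):
            if j == i:
                continue
            dx = other["pos"][0] - xi
            dy = other["pos"][1] - yi
            r2 = dx * dx + dy * dy
            r = r2 ** 0.5
            a = G * other["m"] / r2
            ax += a * dx / r
            ay += a * dy / r
        return ax, ay

    def step(snapshot):
        """Advance every body by one time step from the same snapshot (Euler integration)."""
        new = []
        for i, b in enumerate(snapshot):
            ax, ay = acceleration(i, snapshot)
            vx = b["vel"][0] + ax * DT
            vy = b["vel"][1] + ay * DT
            new.append({
                "m": b["m"],
                "pos": [b["pos"][0] + vx * DT, b["pos"][1] + vy * DT],
                "vel": [vx, vy],
            })
        return new

    for t in range(1000):
        bodies = step(bodies)   # step t+1 depends on *all* of step t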

This calls into question my earlier thought that asynchronous simulations would work, where each process just handles messages whenever they happen to arrive. In a decentralized simulation of termites gathering wood chips, I imagine an asynchronous simulation would suffice: it doesn’t really matter what exact paths the termites take, but rather the end result of that chaos. But in a gravity simulation, asynchronous simulation doesn’t seem to work, because what you’re interested in is the actual paths.

If the calculations of all the threads must be synchronous, or in lockstep, that would seem to put an upper bound on how fast the simulation can go, even in a multi-threaded environment. The further into the future you calculate with slightly incorrect values, the more wrong the results become, so what kind of useful computation can you do if you don’t yet have all the inputs to your formula?
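Here’s a hypothetical sketch of what that lockstep looks like with one thread per body (the thread-per-body framing, the barrier, and the stub physics function are my own illustration, not from Ezra’s post): every thread has to wait at a barrier before any of them can move on, so the whole simulation advances only as fast as the slowest body’s calculation for that step.

    import threading

    STEPS = 100
    NUM_BODIES = 3

    # Shared positions for the three bodies (made-up values).
    positions = [[0.0, 0.0], [4.0e8, 0.0], [0.0, 4.0e8]]
    barrier = threading.Barrier(NUM_BODIES)

    def compute_next_position(i, snapshot):
        # Stand-in for the real gravity calculation; the point here is
        # the synchronization, not the physics.
        x, y = snapshot[i]
        return [x + 1.0, y + 1.0]

    def body_worker(i):
        for _ in range(STEPS):
            # Read everyone's position from the current step.
            snapshot = [list(p) for p in positions]
            new_pos = compute_next_position(i, snapshot)
            # No one writes the next step until everyone has read this one...
            barrier.wait()
            positions[i] = new_pos
            # ...and no one reads the next step until everyone has written it.
            barrier.wait()

    threads = [threading.Thread(target=body_worker, args=(i,)) for i in range(NUM_BODIES)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()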

The only thing I can think of is to have different sets of three threads, one thread per mass, processing the simulation at different simulation times, so you can reduce the processing load on the trailing set of threads. Say you had a leading set of threads that always operated at simulation time t + n. That leading set can narrow the scope of possible answers: since it knows it’s operating on a chaotic system, it knows roughly what the error is for a given lead time of n, so it should be able to put upper and lower bounds on the possible right answers. Then the trailing set of threads, operating at simulation time t, only has to correct the error, which hopefully is less computationally intensive.

Google Gears Lets Developers Take Apps Offline

This is certainly newsworthy. Google announced Gears, which is something you install on your desktop so that online applications can work offline. I remember about 3 to 5 years ago when Google said, no, we’re not interested in the desktop, because it’s not what we’re good at; we’re doing search.

If anything, I think they learned from Netscape’s mistake in the past. Marc Andreessen, the founder of Netscape, announced that, as a startup, they were taking on Microsoft and were going to beat it into the ground. Of course, when you use strong words like that, you’re going to get Bill Gates’s attention, and it’s always dangerous to wake a sleeping dragon when you’re not bigger yourself.

Despite the ever-growing ubiquity of wireless connections and connectivity all around, I think there’s still a place for offline applications. To me, this sort of thing isn’t really about being able to do your work on planes, though it’s certainly useful for that. It’s about caching results the user might want to see or act on next, so that the experience is fast and responsive without network latency getting in the way. While AJAX is fast and tolerable for most things, I imagine some applications will make good use of this kind of offline caching mechanism, making possible things that weren’t before.

Of course, caching is irrelevant when bandwidth is high, but you will either find yourself 1) in places where bandwidth is lower, or 2) working with a dataset whose bandwidth requirements are higher than what you currently have. Mapping applications come to mind as benefiting a lot from caching mechanisms. And if bandwidth jumps enough to make caching in mapping applications obsolete, there will be other datasets that are too large to stream; classifiers and their training data sets come to mind, as does a record of the user’s digital life.

Update: I didn’t mention this, but I think it makes even more sense for mobile devices, per this opengardens post on it.

Internet retailer connects with TextPayMe – Puget Sound Business Journal (Seattle):

This is a bit of interesting news to me. Originally, I was wondering where TextPayMe was going to get its traction; Amazon, apparently, is the answer. Mobile payments aren’t a new idea, and they’re already implemented in Japan and other Asian countries, but they still don’t have much traction in the States.

I’m a bit amazed at what’s possible through the limited common cell phone interface. Mobile phones are gaining in power, and lots of people anticipate that something’s around the corner, but no one knows which platform it will land on or what will be built on top of it. Others are more dismissive, saying that mobile phones are underpowered devices with limited interfaces, and therefore not exciting to develop for compared to the web and desktop.

I think they forget that many said the same about web applications when they first appeared in the 90s. Web applications back then were clunky at best, and didn’t get responsive until a couple of years ago, when Google Maps came out.

In addition, mobile devices aren’t just small desktops; they have characteristics that are unique to the platform. They are always on, always connected, always on your person, considered personal devices, equipped with sensors, and aware of their location. While the mobile video and music currently offered by the carriers are nice, I don’t think they quite hit the spot yet. The mobile application that plays to the platform’s strengths will be the one that does.

Google Maps Street View released~

2140 Taylor St, San Francisco, CA 94133 – Google Maps

Google just released Street View in SF. You can see the street corners of the city as if you were actually there. That’s kinda amazing. At this point, they have enough manpower to go and scour a city, I suppose…or they bought this imagery from another company (or the whole company itself) that was doing this. I like how the street arrows tell you which way is north when you’re panning around.

I think the potential for this is in augmented reality for mobile devices. I would not be surprised if Google either released a mobile phone or a mobile phone application that allowed you to do the panning in real-time, all the while telling you which way is north, as well as where the nearest gas-station/food/store is.

What I think they’ll miss is the potential for social information, such as where your friends are all gathering. I’d leave it to Loopt or Facebook to see the potential in that.

Upgrading backgroundrb

When upgrading to backgroundrb 0.2.x, make sure you delete the old ./script/backgroundrb directory. Also make sure that backgroundrb.yml is renamed/deleted. After installing, run “rake backgroundrb:setup” to generate the appropriate setup files.

If you get something like
“LoadError: no such file to load — slave”

Make sure you have the slave (1.1.0+) and daemon (1.0.2+) gems installed.

Then, according to the mailing list, if you get:
“ERROR: there is already one or more instance(s) of the program running”

Make sure you delete the old log/backgroundrb.pid. Things should work after this.

Exploring: reCAPTCHA: A new way to fight spam

This particular piece of news has been floating around lately. It’s a CAPTCHA service that also uses the CAPTCHA information entered by users to teach computers how to digitize books.

It’s so freakin’ obvious, I slapped myself on the forehead. I even advocated for and watched Luis von Ahn’s videos on human computation, and didn’t think of it. Anyway, it does seem a little odd to use a task that computers can’t solve to teach computers how to read, and hence how to solve CAPTCHAs. Not knowing enough details, I wonder if the success of reCAPTCHA will spell the demise of the CAPTCHA.

The usual concerns about cheating were rampant in the reddit comments: “What if people just put in random stuff? Then you’ll have a computer that spews out crap when digitizing books.” If his lecture on the ESP game is any indication, he has a number of ways to fight it (not to mention he also specializes in online cheating). In the ESP game, he counteracts cheating by giving the player a couple of items he already knows the answers to and seeing how far off they are. He also keeps track of statistics for each image and throws away some results randomly. It’s a little hard to see how he’ll track individual users, other than through their IP, but otherwise one could feasibly use the same methods for reCAPTCHA, as sketched below.
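To illustrate the known-answer idea, here’s a toy sketch. This is my own illustration, not reCAPTCHA’s actual implementation, and all the image ids and words are made up: mix in challenges whose answers you already know, only count a user’s answer for the unknown word if they got the known word right, and accumulate votes until enough users agree.

    import random

    # Hypothetical data: words we already know, and scanned words we want transcribed.
    KNOWN = {"img_001": "river", "img_002": "market", "img_003": "wagon"}
    UNKNOWN = ["img_101", "img_102", "img_103"]

    def make_challenge():
        """Pair one known word with one unknown word, like a two-word CAPTCHA."""
        return random.choice(list(KNOWN)), random.choice(UNKNOWN)

    def record_answers(known_id, unknown_id, answers, votes):
        """Only count the unknown answer if the known control word was typed correctly."""
        if answers[known_id].strip().lower() != KNOWN[known_id]:
            return False  # failed the control word, so discard the whole submission
        word = answers[unknown_id].strip().lower()
        votes.setdefault(unknown_id, {})
        votes[unknown_id][word] = votes[unknown_id].get(word, 0) + 1
        return True

    # Usage: accept a transcription for an unknown image once enough
    # independent users agree on the same word.
    votes = {}
    record_answers("img_001", "img_101",
                   {"img_001": "river", "img_101": "fence"}, votes)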

Naive Bayesian Classifier Hashed and Slashed

I know, I’ve been bad at posting over the course of the week, when readers would be more likely to be reading this stuff at work. But I have been busy myself too…mostly thinking about non-technical stuff.

Well, some weeks ago, I spent some time looking at Bayesian probabilities. I learned it way back in high school, though I never gave it much thought. Then I stumbled on it again in college in my ECE courses, and nearly failed the class. And then I took it again in grad school, and though I did well enough in the class, I still felt weak in probability.

This time, when implementing a Bayesian classifier, I learned the ins and outs of a seemingly simple naive Bayesian classifier, and I learned how to spell ‘bayesian’. That ‘e’ after the ‘y’ gets me every time.

Anyway, I was looking at Paul Graham’s famous A Plan for Spam, and I couldn’t figure out where he got that last formula. I’m kinda mad at myself for not seeing it earlier, cuz it really is pretty simple. It took me a while, but I worked it out. Turns out I learned more when I implemented it…or rather, I would have learned it already had I concentrated the first two times.

We start from the product rule, from which Bayes’ theorem follows:

(1) P(a,b) = P(a|b) * P(b) = P(b|a) * P(a)

With some algebra on the above, we can also derive Bayes’ rule:

(2) P(a|b) = P(b|a) * P(a) / P(b)

But also note that if we take (1) and condition every term on some event ‘d’, the identity still holds. If you draw out the Venn diagrams, you’ll see this is true.

(3) P(a,b|d) = P(a|b,d) * P(b|d) = P(b|a,d) * P(a|d)

We also have the total probability rule, which says that a probability can be broken into the pieces contributed by an event and its complement. If you expand each term with (1), you’ll see that it’s true.

(4) P(b) = P(b|a) * P(a) + P(b|a’) * P(a’)

So this means that Bayes’ rule in (2) can be rewritten using (4) as:

(5) P(a|b) = P(b|a) * P(a) / (P(b|a) * P(a) + P(b|a’) * P(a’))

We also need the probability chain rule. It says that the joint probability of a, b, and c can be factored as follows, by applying (1) and (3) over and over again:

(6) P(a,b,c) = P(a,b|c) * P(c) = P(a|b,c) * P(b|c) * P(c)

And lastly, the conditional independence assumption, which is what makes the Bayesian classifier naive: we assume each feature tells you nothing about the others once you know the class.

(7) P(a|b,c) = P(a|c) if "a" is indep. of "b" given "c"

Now we can solve for the probability that a message is spam given its words, where each word is treated as an independent feature. Below, s means spam, s' means not spam, and f0, f1, .. are the word features:

P(s|f0,f1) = P(f0,f1|s) * P(s) / P(f0,f1)                                        by (2)

           = P(f0,f1|s) * P(s) /
             (P(f0,f1|s) * P(s) + P(f0,f1|s') * P(s'))                           by (4)

           = P(f0|f1,s) * P(f1|s) * P(s) /
             (P(f0|f1,s) * P(f1|s) * P(s) + P(f0|f1,s') * P(f1|s') * P(s'))      by (6)

           = P(f0|s) * P(f1|s) * P(s) /
             (P(f0|s) * P(f1|s) * P(s) + P(f0|s') * P(f1|s') * P(s'))            by (7)

Generalizing from two features to n features by applying (6) and (7) repeatedly, and then dropping the priors:

P(s|f0,..,fn) ~= P(f0|s) * .. * P(fn|s) * P(s) /
                 (P(f0|s) * .. * P(fn|s) * P(s) + P(f0|s') * .. * P(fn|s') * P(s'))

              ~= P(f0|s) * .. * P(fn|s) /
                 (P(f0|s) * .. * P(fn|s) + P(f0|s') * .. * P(fn|s'))

The last step needs a little explaining. All the P(s) and P(s') terms drop out of the equation when we’re building a classifier, since for any piece of evidence f0…fn, P(s), the prior probability of spam, is always the same across classifications; and since P(s) is constant, P(s') = 1 - P(s) is constant too. So when we’re comparing values to decide whether a message belongs in the spam or ham category, we can drop the constants. Dropping them this way is effectively assuming equal priors, P(s) = P(s').

The actual hard part about Bayesian classifiers is estimating the underlying probability distribution. If you’ve never seen a piece of evidence in training, you’re going to say the probability of it occurring is zero, which isn’t correct if that evidence shows up during classification. There are various techniques for dealing with this, mostly under the term ‘smoothing’. I won’t describe them all here, but that should be enough to get you started.
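As one concrete (and toy) example of how the final formula and add-one ‘Laplace’ smoothing fit together, here’s a sketch in Python. This is my own illustration, not Graham’s exact scheme, and it assumes equal priors, as in the derivation above.

    from collections import Counter

    class NaiveBayesSpamFilter:
        """Toy naive Bayes classifier using the final formula above,
        with add-one (Laplace) smoothing so unseen words don't zero everything out."""

        def __init__(self):
            self.word_counts = {"spam": Counter(), "ham": Counter()}
            self.totals = {"spam": 0, "ham": 0}
            self.vocab = set()

        def train(self, words, label):
            self.word_counts[label].update(words)
            self.totals[label] += len(words)
            self.vocab.update(words)

        def p_word_given(self, word, label):
            # Laplace smoothing: pretend every vocabulary word was seen one extra time.
            return (self.word_counts[label][word] + 1) / (self.totals[label] + len(self.vocab))

        def p_spam(self, words):
            # The last line of the derivation: P(f0|s)*..*P(fn|s) /
            # (P(f0|s)*..*P(fn|s) + P(f0|s')*..*P(fn|s')), with equal priors assumed.
            p_s = p_h = 1.0
            for w in words:
                p_s *= self.p_word_given(w, "spam")
                p_h *= self.p_word_given(w, "ham")
            return p_s / (p_s + p_h)

    # Tiny made-up training set, just to show usage.
    clf = NaiveBayesSpamFilter()
    clf.train(["viagra", "free", "money"], "spam")
    clf.train(["meeting", "tomorrow", "lunch"], "ham")
    print(clf.p_spam(["free", "lunch"]))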

Hackszine.com: Detecting and reducing power consumption in Linux

Power consumption was a secondary concern for desktop computer engineers for a long while, but not so today. When you’re cramming all those transistors into such a small space, operating at high speeds, power definitely becomes an issue. Now that mobile and embedded devices are slowly infiltrating our daily lives, power should be on the table for improvement. Google has been doing some work on this: they now build their own power supplies, and even site their data centers where power is cheaper (old news, so I won’t link it).

Battery life is one area of computing that really is way behind. I’m still hoping for some feat of chemical engineering to save us…but hopefully this era of scarce energy for mobile devices will drive some creative hardware engineering.

I have hope for Facebook being the new Google

Facebook just released Facebook Marketplace, where its members can sell things like housing, jobs, or textbooks. Strategically, this makes a lot of sense, since it’s something that’s actually useful to its members, especially if it ties your social network information into what you want to buy and sell. From the looks of it, though, it doesn’t do that. But I’m sure someone at Facebook is thinking about it.

Facebook is social networking done right–at least better than any competitors that I’ve seen. On the surface they might all seem the same; there’s a personal profile page, there’s a list of friends, and you can send messages back and forth with each other. However, I think there’s some critical differences.

MySpace has a larger user base, but it is largely seen by its owners as a platform for media advertising. It’s an unsupported assertion, but given its mishmash feature set and large ads, it’s hard to think otherwise.

Friendster was the leader for quite some time, but has since lost the attention of the under-25 demographic (anecdotal evidence). Their mistake was adding things that were technically neat but ultimately made the site too slow to use. It’s a lot better now, and people are still using it. Based on the features they’ve put out, it seems like they’re interested in helping people publish media to their personal networks via blogs, videos, etc. However, there’s no news of them attracting otaku developers, and I’m sure firing the person who is now the founder of Renkoo didn’t help win over their hearts and minds.

On the other hand, Facebook is seen by its owners as a platform for technology-driven innovation that helps keep up social interactions between individuals. I’m not sure when the transition happened, but it became more evident to me after news feeds were released. Most people were vehemently opposed to them, but I saw them as two things.

First, it was a feedback mechanism to open up sharing. The more you share about yourself with your friends, the more you appear on their radar, and the more you’ll interact with and message them. This seems to be in line with the goal of keeping people talking with each other.

Secondly, it was the basis for publishing personal news without even having to push a button. We all gather news about the world, but beyond CNN there’s another type of news we’re interested in: information about what our trusted friends are doing. Blogs let you publish just by pushing a button. Facebook mini-feeds let you publish just by doing what you normally do on Facebook. It’s not inconceivable that in the future you could also publish from your mobile that you have free time to chill out, and people could just join you to hang out because they saw you were available in the mini-feed on their mobiles.

Facebook is pulling ahead in terms of feature offerings because they seem able to attract the otaku of programmers, developers willing to build things that are actually useful to their users. Which other social network puts programming puzzles in its mini-feeds? Which other social network has an API? The alacrity with which they deploy features is stunning as well; they implemented Twitter pretty easily by listing their users’ status updates. It is in this way that I see them being a ‘new Google’: they are setting themselves up as a hacker’s paradise and attracting otaku programmers that way.

When Zuckerberg held out against getting bought out, he was either being greedy or he had plans for what he would be able to do with a social network in the future. Most of the press criticized him for the former, but it’s looking like it’s the latter. As long as Facebook is useful to its users, there’s value in the social network data that future applications can use. If they can establish themselves as the standard platform through which all social information about an individual is gathered via their API, the world would be a changed place, just as Google changed the world with its technology.