The 3 body problem in Erlang

The three body problem is a simulation of the paths that objects with mass would take through space when all three exert gravitational forces on each other. With two bodies, the problem can be solved analytically, as Kepler did. But with three bodies, the paths are chaotic. If you just hit play on the last link and watch for a minute, you'll see what I mean. And that's the easy version of the problem, since the two suns are fixed. If they were three bodies of comparable masses, it'd be even harder.

From Ezra: http://ezrakilty.net/research/2006/02/3body_problem_in_erlang.html

The first conceptual problem I hit was the question of synchronization. In order for the sim to be a decent finite approximation to the continuous world of physics, we need to break it into discrete time steps, each of which depends on the previous time step (at least, that’s the only way I know of to get a fair approximation). This means that each particle can’t just crunch away at its own speed, working as fast as it can to calculate its position at various times. Easily the particles could grow out of sync with one another, making an inaccurate physical model.

I hadn't thought about this, but I think Ezra is right. In a simulation of the 3 body problem, the future calculations depend on the current ones, and the current calculations depend on each other, so you need to make sure the calculations are 'in step'.

This calls into question my earlier thought that asynchronous simulations would work, where messages simply get processed whenever they arrive. In a decentralized simulation of termites gathering wood chips, I imagine an asynchronous simulation would suffice. It doesn't really matter what exact paths the termites take, but rather the end result of that chaos. But in a gravity simulation, asynchrony doesn't seem to work, because what you're interested in is the actual paths.

If the calculations of all threads must be synchronous, or in lockstep, that would seem to put an upper bound on how fast the simulation can go, even in a multi-threaded environment. And since the errors compound the further into the future you calculate with slightly incorrect values, what kind of useful computation can you do if you don't have all the initial conditions in your formula?

The only thing I can think of is to have different sets of three threads–one for each mass–processing the simulation at different simulation times, to reduce the processing load for the trailing set of threads. Say you had a leading set of threads that always operated at simulation time t + n. That leading set can narrow the scope of possible answers: since it knows it's operating on a chaotic system, it knows what the error is for a given lead time of n, so it should be able to put upper and lower bounds on the possible right answers. Then the trailing set of threads, operating at simulation time t, only has to correct the error, which hopefully is less computationally intensive.
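To make the lockstep idea concrete, here's a toy sketch in Python rather than Erlang, with made-up numbers: three threads, one per body, advancing a 1-D gravity sim with naive Euler steps, and a barrier ensuring no thread starts step t+1 before everyone has finished step t.

```python
import threading

# Toy lockstep three-body sim: three threads, one per body, on a 1-D line
# with unit masses and G = 1 (all numbers invented for illustration).
# The barrier forces every thread to finish reading step t's state before
# any thread writes step t+1, so no body "crunches away at its own speed".
G, DT, STEPS = 1.0, 0.001, 100
pos = [0.0, 1.0, 2.0]
vel = [0.0, 0.0, 0.0]
barrier = threading.Barrier(3)

def run(i):
    for _ in range(STEPS):
        snapshot = list(pos)       # every thread reads the same step-t state
        barrier.wait()             # ...before any thread writes step t+1
        acc = sum(G / (snapshot[j] - snapshot[i]) ** 2 *
                  (1 if snapshot[j] > snapshot[i] else -1)
                  for j in range(3) if j != i)
        vel[i] += acc * DT         # naive Euler integration step
        pos[i] += vel[i] * DT
        barrier.wait()             # wait for all step t+1 writes to land

threads = [threading.Thread(target=run, args=(i,)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(pos)   # outer bodies fall inward; the middle one stays put by symmetry
```

Because reads happen strictly between one barrier and the next, and writes between the other pair, the threads can never observe a half-updated time step, which is exactly the property an unsynchronized asynchronous version would lose.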


Google Gears Lets Developers Take Apps Offline

This is certainly newsworthy. Google announced Gears, something you install on your desktop that lets you use online applications offline. I remember about 3 to 5 years ago when Google said, no, we're not interested in the desktop, because it's not what we're good at. We're doing search.

If anything, I think they learned from Netscape's past mistake. Marc Andreessen, the founder of Netscape, announced that, as a startup, they were taking on Microsoft and were going to beat it into the ground. Of course, when you use strong words like that, you're going to get Bill Gates's attention, and it's always dangerous to wake a sleeping dragon when you're not bigger yourself.

Despite the ever growing ubiquity of wireless connections and connectivity all around, I think there's still a place for offline applications. This sort of thing, to me, isn't really about being able to do your work on planes, though it's certainly useful for that. To me, this is about caching results that the user might want to see or do next, so that the user experience is fast and responsive without network latency. While AJAX is fast, and tolerable for most things, I imagine there will be some applications that can make good use of this type of offline caching mechanism, making possible what was impossible before.

Of course, caching is irrelevant when bandwidth is high, but you will either find yourself 1) in places where bandwidth is lower, or 2) with datasets whose bandwidth requirements are higher than what you currently have. Mapping applications come to mind as benefiting a lot from caching mechanisms. And even if bandwidth jumps enough to make caching in mapping applications obsolete, there will be other datasets too large to stream in the future. I can imagine classifiers or their training data sets being one example, as well as a record of the user's digital life.

Update: I didn’t mention this, but I think it makes even more sense for mobile devices, per this opengardens post on it.

Internet retailer connects with TextPayMe – Puget Sound Business Journal (Seattle):

This is a bit of interesting news to me. Originally, I was wondering where TextPayMe was going to get its traction. Amazon, apparently, is the answer. While mobile payment is not a new idea, and is already implemented in Japan and other Asian countries, it still doesn't have much traction in the States.

I'm a bit amazed at what's possible through the limited interface of the common cell phone. Mobile phones are gaining in power, and lots of people anticipate that something's around the corner, but no one knows on which platform, or what's going to be built on top of it. Others are more dismissive, saying that mobile phones are underpowered devices with limited interfaces, and therefore not exciting to develop on compared to the web and desktop.

I think they forget that many said the same about web applications when they first appeared in the '90s. Web applications back then were clunky at best, and didn't boast much responsiveness until two years ago, when Google Maps came out.

In addition, mobile devices aren't just small desktops. They have characteristics unique to the platform: they are always on, considered personal devices, always on a person, always connected, have sensors, and know their location. While the mobile video and music currently offered by the carriers are nice, I don't think they quite hit the spot yet. The mobile application that plays to the platform's strengths will be the one that hits it.

Google maps street view released~

2140 Taylor St, San Francisco, CA 94133 – Google Maps

Google just released street views in SF. You can see the street corners of the city as if you were actually there. That's kinda amazing. At this point, they have enough manpower to go and scour a city, I suppose…or they bought this information from another company (or the whole company itself) that was doing this. I like how the street arrows tell you which way is north when you're panning around.

I think the potential for this is in augmented reality for mobile devices. I would not be surprised if Google either released a mobile phone or a mobile phone application that allowed you to do the panning in real-time, all the while telling you which way is north, as well as where the nearest gas-station/food/store is.

What I think they'll miss is the potential for social information, such as where your friends are all gathering. I'd count on Loopt or Facebook to see the potential for that.

Upgrading backgroundrb

When upgrading to backgroundrb 0.2.x, make sure you delete the old ./script/backgroundrb directory. Also make sure that backgroundrb.yml is renamed/deleted. After installing, run “rake backgroundrb:setup” to generate the appropriate setup files.

If you get something like
“LoadError: no such file to load — slave”

Make sure you have the slave (1.1.0+) and daemon (1.0.2+) gems installed.

Then, according to the mailing list, if you get:
“ERROR: there is already one or more instance(s) of the program running”

Make sure you delete the old log/backgroundrb.pid. Things should work after this.

Exploring: reCAPTCHA: A new way to fight spam

This particular piece of news has been floating around lately. It’s a CAPTCHA service that also uses the CAPTCHA information entered by users to teach computers how to digitize books.

It's so freakin' obvious, I slapped myself on the forehead. I even advocated and watched Luis von Ahn's videos on human computation, and didn't think of it. It does seem a little odd, though, to use a technique computers can't solve to teach computers how to read–and hence, how to solve CAPTCHAs. Not knowing enough details, I wonder if the success of reCAPTCHA will spell the demise of the CAPTCHA.

The usual concerns about cheating were rampant in the reddit comments. "What if people just put in random stuff? Then you'll have a computer that spews out crap when digitizing books." If his lecture on the ESP game is any indication, he has a number of ways to fight it (not to mention he specializes in online cheating as well). In the ESP game, he counteracts cheating by giving the player a couple of images he knows the answers to and seeing how far off they are. He also keeps track of the statistics for each image, and throws away results randomly. It's a little hard to see how he'll track individual users–other than through their IP–but otherwise, one could feasibly use the same methods for reCAPTCHA.

Naive Bayesian Classifier Hashed and Slashed

I know, I've been bad at posting during the week, when readers would be more likely to be reading this stuff at work. But I have been busy myself too…mostly thinking about non-technical stuff.

Well, some weeks ago, I spent some time looking at Bayesian probabilities. I learned it way back in high school, though I never gave it much thought. Then I stumbled on it again in college in my ECE courses, and nearly failed the class. And then I took it again in grad school, and though I did well enough in the class, I still felt weak in probability.

This time, when implementing a Bayesian classifier, I learned the ins and outs of a seemingly simple naive Bayesian classifier, and I learned how to spell 'bayesian'. That 'e' after the 'y' gets me every time.

Anyway, I was looking at Paul Graham's famous A Plan for Spam, and I couldn't figure out where he got that last formula. I'm kinda mad at myself for not seeing it earlier, cuz it really is pretty simple. It took me a while, but I worked it out. Turns out I learned more when I implemented it…or rather, I would have learned it had I concentrated the first two times.

We know Bayes' Theorem is as follows:

(1) P(a,b) = P(a|b) * P(b) = P(b|a) * P(a)

With some algebra of the above we can also derive

(2) P(a|b) = P(b|a) * P(a) / P(b)

But also note that if we take (1) and condition everything on some event 'd', it still holds, as long as the probabilities on both sides are also conditioned on 'd'. If you draw out the Venn diagrams, you'll see this is true.

(3) P(a,b|d) = P(a|b,d) * P(b|d) = P(b|a,d) * P(a|d)

We also have the Total Probability Rule, which says that the total probability is made up of its parts. If you apply Bayes' rule from (1), you'll see that it's true.

(4) P(b) = P(b|a) * P(a) + P(b|a’) * P(a’)

So this means that Bayes' rule in (2) can be rewritten with (4) as:

(5) P(a|b) = P(b|a) * P(a) / (P(b|a) * P(a) + P(b|a’) * P(a’))

We also need the Probability Chain Rule. It says that the joint probability of a, b, and c can be rewritten as the following, by applying equation (1) over and over again:

(6) P(a,b,c) = P(a,b|c) * P(c) = P(a|b,c) * P(b|c) * P(c)

And lastly, the Independence Rule, which is what makes the Bayesian classifier naive:

(7) P(a,b) = P(a|b) * P(b) = P(a) * P(b) iff "a" is independent of "b"

(In the derivation below, we apply it in conditional form: if f0 is independent of f1 given s, then P(f0|f1,s) = P(f0|s).)
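Since these identities are easy to get wrong, here's a quick numeric sanity check of rules (1), (2), and (4) in Python, using a small made-up joint distribution:

```python
# Numeric check of rules (1), (2), and (4) on a tiny invented joint
# distribution over two booleans: p[(a, b)] = P(A=a, B=b).
p = {(True, True): 0.1, (True, False): 0.2,
     (False, True): 0.3, (False, False): 0.4}

P_a = p[(True, True)] + p[(True, False)]   # P(a)  = 0.3
P_b = p[(True, True)] + p[(False, True)]   # P(b)  = 0.4
P_a_given_b = p[(True, True)] / P_b        # P(a|b)
P_b_given_a = p[(True, True)] / P_a        # P(b|a)

# (1) P(a,b) = P(a|b) * P(b) = P(b|a) * P(a)
assert abs(P_a_given_b * P_b - p[(True, True)]) < 1e-12
assert abs(P_b_given_a * P_a - p[(True, True)]) < 1e-12

# (2) P(a|b) = P(b|a) * P(a) / P(b)
assert abs(P_a_given_b - P_b_given_a * P_a / P_b) < 1e-12

# (4) P(b) = P(b|a) * P(a) + P(b|a') * P(a')
P_b_given_not_a = p[(False, True)] / (1 - P_a)
assert abs(P_b_given_a * P_a + P_b_given_not_a * (1 - P_a) - P_b) < 1e-12
print("all identities check out")
```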

Now we can solve for the probability of spam given the joint probability of words, where each word is treated as an orthogonal, independent dimension:

P(s|f0,f1) = P(f0,f1|s) * P(s) / P(f0,f1)                            ...by (2)

           = P(f0,f1|s) * P(s)
             / (P(f0,f1|s) * P(s) + P(f0,f1|s') * P(s'))             ...by (4)

           = P(f0|f1,s) * P(f1|s) * P(s)
             / (P(f0|f1,s) * P(f1|s) * P(s)
                + P(f0|f1,s') * P(f1|s') * P(s'))                    ...by (6)

           = P(f0|s) * P(f1|s) * P(s)
             / (P(f0|s) * P(f1|s) * P(s)
                + P(f0|s') * P(f1|s') * P(s'))                       ...by (7)

Generalizing to n features:

P(s|f0,..,fn) = P(f0|s)*..*P(fn|s) * P(s)
                / (P(f0|s)*..*P(fn|s) * P(s) + P(f0|s')*..*P(fn|s') * P(s'))

             ~= P(f0|s)*..*P(fn|s)
                / (P(f0|s)*..*P(fn|s) + P(f0|s')*..*P(fn|s'))

The last step needs a little explaining. All the P(s) and P(s') terms drop out of the equation when we're building a classifier, since for any piece of evidence f0…fn, P(s), the probability of spam occurring, is always the same across any classification. Since P(s) is constant, and P(s') is (1 – P(s)), it is also constant. Therefore, when we are comparing values to determine whether a message belongs in the spam or ham category, we can get rid of the constants.

The actual hard part about Bayesian classifiers is estimating the underlying probability distribution. If you've never seen a piece of evidence in training, you'll say the probability of it occurring is zero, which isn't correct if the evidence shows up during classification. There are various techniques for dealing with this, mostly under the term 'smoothing'. I won't describe the various techniques here, but this should be enough to get you started.
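To tie it together, here's a tiny sketch of the final formula in Python, using add-one (Laplace) smoothing as one example of the smoothing techniques just mentioned. The corpora and words are all invented for illustration.

```python
from collections import Counter

# Tiny sketch of the final formula:
#   P(s|f0..fn) ~= prod P(fi|s) / (prod P(fi|s) + prod P(fi|s'))
# with add-one (Laplace) smoothing so unseen words don't zero the product.
# The documents and words below are made up for illustration.
spam_docs = [["cheap", "pills", "now"], ["cheap", "watches", "now"]]
ham_docs  = [["meeting", "notes", "now"], ["lunch", "meeting", "tomorrow"]]

spam_counts = Counter(w for d in spam_docs for w in d)
ham_counts  = Counter(w for d in ham_docs for w in d)
vocab = set(spam_counts) | set(ham_counts)

def p_word(word, counts):
    # Laplace smoothing: pretend every vocabulary word was seen once more,
    # so a word absent from training still gets a small nonzero probability.
    return (counts[word] + 1) / (sum(counts.values()) + len(vocab))

def spam_score(words):
    ps = pns = 1.0
    for w in words:
        ps  *= p_word(w, spam_counts)   # prod P(fi|s)
        pns *= p_word(w, ham_counts)    # prod P(fi|s')
    return ps / (ps + pns)   # P(s) and P(s') dropped, as explained above

print(spam_score(["cheap", "pills"]))    # > 0.5: looks spammy
print(spam_score(["meeting", "notes"]))  # < 0.5: looks like ham
```

In a real filter you'd keep P(s) and P(s') (or tune a decision threshold), but the score above is enough to see the comparison-only argument in action.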