All heil the Robot Jesus
I came across a fairly hilarious group of people who believe they are doing Enormous Good to Mankind. No, I'm not speaking of some Christian evangelists, but that was a very close guess. I'm speaking of some singularitarians. Not the ones that hang around Kurzweil. The ones that have a recently deconverted Christian for a director (as of 2012), which may seem like a low blow to point out, but the religious approach is essentially what distinguishes the group I'm speaking of from the more mainstream singularitarian crowd that gathers around actual AI scientists such as Kurzweil.

I am speaking of the Singularity Institute, which "exists to ensure that the creation of smarter-than-human intelligence benefits society." These folks make estimates of their performance which range all the way up to saving 8 lives per dollar that they are given. A brief introduction of the characters: Rain is one of the people they somehow managed to talk into giving them money. Vladimir Nesov and Michael Vassar, who explain why it is not rational to react to such claims with incredulity, are from the Singularity Institute. Must feel good to be that good! You donate a mere ten thousand dollars, and you have averted a war crime - for the exclusive low price of only ten thousand. Why, if you accept such an argument and you're a super-nerd, they're happy to inform you that you should not donate to anyone else.

Such estimates are all too easy to dismiss as Pascal's wager, and the whole thing is easy to overlook as a scam based on appearances alone (though I would say that at some level they truly believe). Yet seeing that some people are actually falling for this amusing misapplication of arithmetic, it seems that a counter-argument is necessary.

Here's the deal: this is not an estimate, and the result is not an expected utility nor an estimate of one.
An estimate would need to consider a representative, sufficiently large sample of possible consequences, which would include negative ones. One argument is not an estimate of a sum over all arguments.
For instance, they may annoy editors of important journals, making actual safety research harder to publish. (Pei Wang is an accomplished AI scientist who was educated in China; Luke Muehlhauser is an arrogant formerly devout Christian with no background in anything; and Eliezer Yudkowsky is the guy that Luke Muehlhauser thinks has the best insights about artificial intelligence. And it's not an empty stereotype that the Chinese really dislike arrogance.) There's a Russian proverb that a stupid friend is worse than a smart enemy.

If that doesn't concern you - there are more technical concerns to chew on:

There's an urban legend I heard once or twice - it may be a true story - about a nuclear reactor technician who accidentally mis-wired the control rod servos, making them move in the opposite direction. There's also the true story of the Chernobyl accident, where the graphite tips on the control rods - which at a brief glance would increase controllability and safety - led to much the same effect.

Suppose we actually find a way to encode some sort of concept of human well-being, or extrapolated human desires, into an AI. That sounds good, doesn't it? Surely that would be a net benefit? Warning: I am about to privilege a scary hypothesis as an illustration, and do some nonsense arithmetic:

All goes well and a Friendly AI is developed. But a terrible mistake may be made - for whatever simple or complicated reason, the sign of that utility is negated. Maybe someone made a typo. Maybe the specification had a mistake. Maybe the decision theory employed by this AI has a subtle flaw in how it processes its hypotheses - a buffer overrun of some kind, or a subtler issue still.

Now that's an Unfriendly[1] AI with a capital 'Un' - an AI that will pretty much maximize human suffering. How bad can it be? Well, first off, humans in a state of maximum suffering would seem to require fewer resources per person than humans in a state of maximum awesomeness. You don't even need to store actual memories - you can procedurally generate terrible memories on recall. That could not merely allow for some huge constant factor more humans, but also speed up the AI's growth curve. That's a LOT more humans. Huge numbers - maybe a billion or a trillion times more people, maybe multiplied by N^1.1 vs N (for a factor of N^0.1) due to the increase in the growth rate.
Secondly, when you really think about it, the absolute value of maximum suffering seems much larger than that of maximum awesomeness. Anaesthesia is a good example of this asymmetry in preferences, especially if you look at how readily people accepted anaesthesia back when it was much riskier.
So, how careful do you have to be? Well, the chance of flipping the sign has to be unimaginably low to merely break even; if the risk is above that, you're not breaking even - the expected utility is negative (a toy version of this break-even arithmetic is sketched below). You may be more likely to create an artificial heaven than a hell, but the hell is much more densely populated, and is probably much more awful than the heaven is awesome. (Don't be concerned over this either - I'm just illustrating a point here with regards to flawed utility estimation.)
Not at all surprisingly, it turns out that to create better living through software, far from doing some sort of fuzzy armchair philosophy of mind, motivation, and will, one would need to focus on improving provable programming techniques to an unimaginable level of reliability.
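To make that break-even claim concrete, here is a toy sketch in Python. Every number in it is my own invention, purely for illustration - the factors k and m and the flip probabilities are assumptions, not anything taken from the Singularity Institute's material. The only point is that when the downside is vastly larger than the upside, the probability of the sign flip has to be smaller than roughly one over that ratio just to reach zero expected utility.

    # Toy break-even arithmetic for the sign-flip scenario.
    # All numbers are invented for illustration; this is not an estimate of anything.

    U_heaven = 1.0   # utility of the intended outcome, in arbitrary units
    k = 1e9          # assumed factor: how many times more people fit into "hell"
    m = 10.0         # assumed factor: how much worse maximum suffering is than maximum awesomeness is good
    U_hell = -k * m * U_heaven

    def expected_utility(p_flip):
        """Two-outcome toy 'estimate': intended outcome with probability 1 - p_flip,
        sign-flipped outcome with probability p_flip."""
        return (1.0 - p_flip) * U_heaven + p_flip * U_hell

    p_breakeven = 1.0 / (1.0 + k * m)    # expected_utility(p_breakeven) == 0
    print(p_breakeven)                   # ~1e-10: the flip must be that improbable just to break even
    print(expected_utility(1e-6))        # still hugely negative at a one-in-a-million flip chance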

Such "estimates" can not be taken too seriously. I can not stress this enough. Not the 8 lives per dollar, and not the flipped sign argument. These are not a representative sample of possible scenarios (nor do you have any method for making a representative sample). Ignore them. I can make an argument involving alien AI hell expanding towards us at close to the speed of light, or something like this, if you want, to sway the utility the other way. If you think you should take this bullshit seriously - do not. It really is just a story.

Let me tell you instead of my pet butterfly (or, as I like to call it, Der Schmetterling!!! - a name that strikes fear into the hearts of men :) ). You see, this butterfly is a hideously evil monster. Any time it flaps its wings of doom, that creates a disturbance which has a 1% chance of, sometime down the road, causing a series of hurricanes which would kill 800 people. And you can't kill the butterfly, or else the Nazis will win an election sometime. Here's the deal: I am saving 8 lives by discouraging this butterfly from flapping its wings. What's wrong with this estimate? Well, I didn't factor in that stopping the butterfly from flapping its wings can cause a hurricane too. I was telling one side of the story.
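The butterfly arithmetic, spelled out in the same toy Python style (all numbers invented, of course): counting only the hurricane-from-flapping branch gives 0.01 x 800 = 8 "lives saved" per discouraged flap, but the moment you write down the equally made-up branch where not flapping also perturbs the weather, the estimate evaporates.

    # One-sided butterfly "estimate" vs. the two-sided version.
    # Every number is invented; the point is the bookkeeping, not the values.

    p_hurricane_if_flap = 0.01
    deaths_per_hurricane = 800

    # The one-sided story: only count the harm from flapping.
    lives_saved_one_sided = p_hurricane_if_flap * deaths_per_hurricane
    print(lives_saved_one_sided)     # 8.0 "lives saved" per discouraged flap

    # The other side of the story: not flapping disturbs the weather too.
    p_hurricane_if_no_flap = 0.01    # just as made-up as the first number
    lives_saved_two_sided = (p_hurricane_if_flap - p_hurricane_if_no_flap) * deaths_per_hurricane
    print(lives_saved_two_sided)     # 0.0 - the "estimate" was an artifact of what was left out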

And with regards to the expected utility: when you are dealing with a very large sum over possible scenarios with positive and negative utilities, then unless you have a representative sample of the terms of that sum, scaled down appropriately for the poor statistical significance given the sample size (!), a partial sum is not an expected utility, nor an estimate of one. It's just some random measure of the un-representativeness of the sample and of sampling errors. Nor should a biased choice of which components of the sum to consider change the expected utility. Nor can you employ heuristics, as different heuristics tug on the utility in different directions, and you can't evaluate all of them or make a sufficiently large representative sample.
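As a rough illustration of that last paragraph - a sketch over an arbitrary made-up distribution of scenario utilities, nothing more - an unbiased random sample of the terms, averaged, does estimate the expectation; a sum over cherry-picked terms only measures how they were picked.

    # A partial sum over cherry-picked scenarios is not an estimate of the expectation.
    # The "scenario utilities" are drawn from an arbitrary made-up distribution.
    import random

    random.seed(0)
    scenarios = [random.gauss(0.0, 1.0) for _ in range(100_000)]   # true mean is ~0

    true_expectation = sum(scenarios) / len(scenarios)

    # Unbiased sampling: pick terms at random and average them.
    sample = random.sample(scenarios, 1000)
    unbiased_estimate = sum(sample) / len(sample)      # noisy, but centered on the truth

    # Biased "sampling": keep only the most vivid, argument-worthy scenarios.
    vivid = sorted(scenarios, reverse=True)[:1000]
    biased_estimate = sum(vivid) / len(vivid)          # measures the selection, not the expectation

    print(true_expectation)     # ~0.0
    print(unbiased_estimate)    # ~0.0, give or take sampling error
    print(biased_estimate)      # ~2.7: an artifact of picking only the extreme scenarios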


[1]: Curiously, they also hijack "Unfriendly" to mean any AI that isn't explicitly Friendly; through clever sophistry involving the implicit assumption that any AI has a utility function over the environment (which is actually an unsolved problem), it is argued that any AI which is not explicitly Friendly would kill everyone.



(C) 2004..2014 Dmytry Lavrov.
