Some grad students at Columbia have some interesting statistical data suggesting the Iranian election really was stolen.

As much as I feel that asking Obama to wear green ties is silly, there are some things people here can do to assist the Iranian protesters if they wish. Here are some suggestions from tech-savvy readers:

(a) “Help cover the bloggers: change your twitter settings so that your location is TEHRAN and your time zone is GMT +3.30. Security forces are hunting for bloggers using location and timezone searches. If we all become ‘Iranians’ it becomes much harder to find them.” (link)

(b) Make your machine available as a proxy server. I have no idea how to do this, but a bunch of people have mentioned it in the threads. Does anyone have a link on how to do this?

ghost poet

This link shows how to set up a proxy server. Make sure you read the comments, though, because the original instructions have some issues.

http://blog.austinheap.com/2009/06/15/how-to-setup-a-proxy-for-iran-citizens-for-windows/

Yutsano

I think I confessed to being a theist here before, so I’ve been praying my ass off to God/Allah/FSM that the Iranian people work this out in the most safe and sane manner available to them. I don’t know if the theocracy will fall any time soon, but I agree with most analysts that it has indeed been changed permanently.

Calming Influence

http://www.torproject.org/

freelancer

http://blog.austinheap.com/2009/06/15/how-to-setup-a-proxy-for-iran-citizens-for-windows/

Here ya go

Simba B

I would actually recommend against the proxy server thing, not because I’m against the principle but because setting up a server is hard to do right and there are ~~security concerns~~. Alright, I see they’ve got it limited to Iranian IP addresses, but maybe it’s just the networking geek in me that sees this as a bad idea.

At any rate, people might run afoul of their ISPs doing this sort of thing.

geg6

I have never done the Twitter thing, mainly because it’s impossible for me to say anything in 140 characters (no need for any smartass commentary, please). But I’d consider opening an account just to do something helpful. I really do think the Iranians risking their lives are amazing.

Calming Influence

http://www.boingboing.net/2009/06/16/cyberwar-guide-for-i.html

Softail

Setting up a proxy server can be problematic. I set up one last year for my own use and was amazed at the traffic I got. I did have security settings so nothing I didn’t want to appeared to be getting through, so by all means follow the instructions for limiting this to Iranian address blocks. You almost certainly want to take this down after the current crisis passes as well.

Comrade Mary, Would-Be Minion Of Bad Horse

I have an account I’ve never actually posted to, but it’s in Tehran now.

br

Doug, aren’t you a math professor? Their argument is pure garbage numerology.

tammanycall

There seems to be a debate going on in the comments section of Boing Boing about the effectiveness of that:

———–

“Changing your Twitter timezone has absolutely NO EFFECT. I wish people would stop retweeting this because it is useless and, in fact, counterproductive since it distracts people from other, more productive means of showing solidarity.

ALL Tweets are timestamped with US Pacific Time, just as all Twitter maintenance is announced in US Pacific Time. Your Twitter timezone setting (which is not visible to anyone else), merely transposes the difference between Twitter’s Pacific Time and the local time zone you tell it, so that Twitter can present archived tweets in your local time.

Example. I live in Chicago and my Twitter timezone is US Central. If I’m logged in to Twitter and go back past 24 hours in anyone’s timeline, I will see a specific time and date stamp on that tweet. If I log out, that same tweet is now timestamped two hours earlier, in US Pacific.

You can easily try this for yourself. Go to any user’s page and find tweets older than 24 hours, and which thus have a time/date stamp. Note the time. Log out, refresh the page. You now see it in US Pacific because Twitter is no longer transposing that to your “local” time.”

——————————————————-

What do people think about not using twitter right now in order to ensure server resources and tech support are devoted to those who need them the most?

Anton Sirius

Does anybody actually think it matters anymore whether the election was a sham or not… ?

Yutsano

You and several analysts have pondered the same thing, and at this point I’m going with nope. They could agree to hold another election tomorrow and it wouldn’t matter. The genie of change (or djinn to go with a better Middle Eastern metaphor) is out of the bottle, and the youth of Iran are demanding a better life for themselves. We just need to sit back and see where the djinn takes them now.

DougJ

No, they’re not, they’re just the kind of thing that happens when people fake numbers. It’s harder to create random numbers than you think. That’s why this kind of analysis is useful for things like rigged elections.

Martin

Agreed. Shows that the government either didn’t plan this, or they’re really stupid. You hand the job of stealing an election to a good statistician and they should have caught this kind of thing.

jimBOB

change your twitter settings so that your location is TEHRAN and your time zone is GMT +3.30

If that sort of thing actually worked, couldn’t Iranian tweeters just change their own settings to something outside Iran? Wouldn’t that be just as effective?

This reminds me of the meaningless internet petitions people circulate against third-world thuggery of various stripes; it’s irrelevant hocus pocus whose only purpose is to let participants pretend they are making some sort of difference when they aren’t.

Nylund

@BR (comment 10). If you’re seeing something I’m not, please explain. I’ve taken a couple Ph.D. level stats courses and should be able to follow any statistical argument you can make.

Assuming the sample size is large enough, I don’t see anything glaringly wrong with assuming the last 2 digits should have a uniform distribution and testing that.

Fr33d0m

Create Anonymous Squid Proxy For Iranian Election Protestors

For the Linux version (Specifically Ubuntu is used).

Ked

After thinking that through a bit, the individual analyses are good, but I’m somewhat more dubious of combining the two to generate the bottom-line .005 probability. It strikes me that the two variables are almost certainly not independent… though proving that is more than I’m up to after two beers.

Dee Loralei

Doug, Hi, first time caller, etc. Didn’t Nate Silver cover something much like this article on Thursday? He was talking about first numbers and not last numbers, so it was a different stat. He said it was Benford’s Law and had reservations. Here’s the link (if I did it right): http://www.fivethirtyeight.com/2009/06/karroubis-unlucky-7s.html

If I didn’t, it’s an article about Karroubi’s unlucky 7s.

I’m curious to know what he’d say about the convergence of the two anomalies. And you’re a scientist, I bet you can guesstimate some statistics and odds for neither the first nor the last numbers in a sequence to seem to fall within the normal parameters of chance.

Anyway, since I’ve now jumped on to the balloon, or in to the juice, I love you guys! I’ve read you for quite awhile, yes even the comments, and this is one of the best places online. I think I even sent John a recipe way back in Nov for stuffed cabbage. So thanks to the front pagers and all the commenters. And please keep it up.

Dee

Chris O.

@jimBOB: That’s what I think about the Twitter thing too. In fact, having Iranians change should be much MORE effective, if that indeed works at all. If they did, there would be nothing for the government to find, instead of the government having some false positives but also some actual positives. (Which is as close as I can get to a plain-English version of recall vs. precision in searches)

br

@Nylund: But they didn’t test if the numbers came from a uniform distribution! That would require far more data than the 116 numbers they examined. They just came up with some arbitrary sufficient statistics and declared that 1 in 200 odds was implausible.

I mean, just try their bullshit experiment: generate 116 numbers sampled randomly between 00 and 99. I guarantee that, more often than not, you will see digits occurring in the last place more than 14% of the time.
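The experiment described here is easy to try with a short simulation. What follows is only a sketch under stated assumptions, not the method used in the Columbia analysis: it assumes 116 independent draws between 00 and 99, and the function names `max_digit_share` and `estimate_prob_some_digit_exceeds` are made up for this illustration.

```python
import random
from collections import Counter

def max_digit_share(num_values, rng):
    """Draw num_values integers in 00..99 and return the largest share
    that any single digit takes in the last place."""
    last_digits = [rng.randint(0, 99) % 10 for _ in range(num_values)]
    counts = Counter(last_digits)
    return max(counts.values()) / num_values

def estimate_prob_some_digit_exceeds(share, num_values=116, trials=20000, seed=0):
    """Fraction of simulated data sets in which some last digit
    occurs more than `share` of the time."""
    rng = random.Random(seed)  # fixed seed so a rerun gives the same estimate
    hits = sum(max_digit_share(num_values, rng) > share for _ in range(trials))
    return hits / trials

print(estimate_prob_some_digit_exceeds(0.14, trials=5000))
```

Under these assumptions the estimate tends to land near one half, so a single digit showing up more than 14% of the time is unremarkable on its own.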

Whatever, I’m with Anton @ 12. It doesn’t matter.

omen

how about this? petition the regime for the release of political prisoners.

http://www.iranhumanrights.org/

people known to be backed by advocacy are less likely to be mistreated.

ninerdave

Be very careful if setting up your personal computer (or a computer on your home network) as a proxy. You are opening a big security hole and could get hacked. KNOW WHAT YOU ARE DOING!

That said, I have three, and they saw a lot of traffic until they got blocked. I’m in the process of setting up more.

Best to use Amazon’s cloud servers:

http://www.duanestorey.com/2009/06/help-iran-with-a-proxy-server/

In that case, who cares if you get broken into. It’s on Amazon’s dime. It’s also cheap.

Lastly, do not publicize your proxy on Twitter; instead email [email protected] directly.

I put “New Proxy Server 666.666.666.666” in the subject so he doesn’t have to actually read the email (where 666.666.666.666 is your IP address).

This is easily the best thing you can do to take direct action for the Iranian citizens. Letting them get their message out is the whole reason this has become a story.

Ked

Speaking of raw numerical analysis, someone ran a Benford’s Law analysis on the election returns and it looked crooked that way too. Not sure where that link went, though I saw it at Sully’s fairly soon after the election.

But numerical analysis isn’t really needed. As I understand, there is no dispute that more than 170 precinct-equivalents returned votes for greater than 100% turnout. Sloppy, sloppy, sloppy.

Punchy

I care about Iran……becuz?

Doug H.

@Punchy: I’ll let Dr. Sagan explain.

Chris Johnson

Actually, that’s pretty awesome- similar to using Benford realness metrics to determine if statistical data is truly random, or doctored by humans.

This is not numerology, not one bit. It’s a numbers trick that comes out of statistics and its primary purpose is to determine if a source of numbers has been doctored- which is exactly the claim here.

I use it to create a dither (wordlength reducer) for audio that has a more naturalistic sound- more of a weird perversion of the trick, but it does have an effect :)

Numerology hell. To those with familiarity with these methods, this is a smoking gun. I’m now 200-to-1 certain the election WAS stolen. We may never know what would have happened if it wasn’t- but the point is, somebody cheated.

omen

@Punchy:

what do you care about?

omen

why are comments constantly swallowed? is that just the norm for this site and should be expected?

The Dangerman

Information on setting up a Proxy here:

http://gr88.tumblr.com/page/2

gbear

…But this one goes to eleven.

cosanostradamus

.

Latest videos from Iran: Protests continue despite police assaults, woman’s death.

Figures don’t lie, but liars figure. Just ask Bush. Lucky for him Americans have no b*lls.

.

Svlad Jelly

Here’s 538 on the Benford’s Law thing.

It’s all greek to me.

Death By Mosquito Truck

Just a heads up, when this shit is behind us with Iran as a successful theocratic Islamic state with the same president or a different president or the same Supreme Leader or a new Supreme Leader chosen by the Assembly of Experts, I’m gonna remember each and every one of you clueless motherfuckers that couldn’t be fucking bothered to learn what the fuck is going on over there, much the same way I view Montysano as a fucking clown for screaming about peak oil when gas was $4/gallon because speculators were driving up the price.

Christ in a fucking wheelchair.

Mike D.

Wouldn’t a bunch of people claiming to be in Tehran radically undermine the already precarious question of authenticity for the Tweeters?

DougJ

My gut reaction is not to put too much stock in the Benford’s law analysis here. I find the stuff from the grad students at Columbia fairly persuasive.

omen

@Death By Mosquito Truck:

like what? give us some links to give us an idea of what you’re talking about.

Comrade Stuck

@Mike D.:

In 30 years when they have the next revolution to overthrow this one, some student protester will be holding up an IP address and chanting “Death to the Weekly Standard”

omen

Wouldn’t a bunch of people claiming to be in Tehran radically undermine the already precarious question of authenticity for the Tweeters?

we could all claim to be israelis and give the conspiracy theorists something to crow about.

MikeJ

I’m a premature anti-fascist.

Paul

Terrifically pessimistic and not entirely off topic:

http://www.youtube.com/watch?v=oUbGLVvfB7Y

Nylund

BR. After I wrote my comment, I went back and looked at the sample size. I didn’t realize it was only 116. That is way too small. Some guy in the comments over there said he did a simulation and it worked out just as the paper said. I doubted that so I did a simulation myself, and yeah, with a sample size of only 116, getting the same final digit to appear more than 14% of the time is actually quite common.

I am in total agreement with you.

ninerdave

@Death By Mosquito Truck:

Can’t speak for anyone but myself, but I know this is essentially 1979 part deux. Anyone with an open mind, and following what’s going on, can see that. Yet, it’s what the people in Iran want. I, for one, support that. It’s their fucking country and they should be governed as the people wish.

Mousavi ain’t George Washington or Thomas Jefferson. He was one of the people who brought on the Islamic Republic. He doesn’t want democracy as we know it. Now I’ll agree with you, if people don’t understand that, they’re idiots. However with that in mind, as I said above, the people in Iran should be governed as they choose, and from what I can see that isn’t Ahmadinejad.

I see no problem with helping the people of Iran get their story out.

Although I have to admit, watching a holocaust denier get ousted on his keister would be mighty enjoyable.

AhabTRuler

Yeah, like there are a lot of those around.

Tavi

BR and Nylund —

The 14% is referring to the results from the last U.S. presidential election, not the Iranian results. There (according to the article) the digit 7 appears in the final position 17% of the time and 5 appears 4% of the time. Their claim that this would occur, at random, less than 4 out of 100 times is essentially correct.

Here’s a Mathematica snippet to do 10000 simulations, each of which generates 116 last digits where each digit occurs with probability 1/10:

p = {.1, .1, .1, .1, .1, .1, .1, .1, .1, .1};

mdist = MultinomialDistribution[116, p];

Hits = 0;

High = Floor[.17*116]

Low = Ceiling[.04*116]

For[i = 1, i ≤ 10000, i++, {

X = Random[mdist];

If[Max[X] ≥ High && Min[X] ≤ Low, Hits++]

}]

N[Hits/10000]

The result was 0.0619. This is a bit higher than their claim, but that is probably because of rounding differences between my assumptions and their actual numbers. It’s pretty close to the 4/100 probability they give.

If you replace 17% with 14%, the outcome does indeed become very likely (around 0.8 probability), but that isn’t the Iranian situation.

ninerdave

@AhabTRuler:

People with an open mind by default exclude wingnuts and most traditional media types.

scarshapedstar

http://anonygreen.wordpress.com/2009/06/18/how-to-setup-a-tor-relay-or-tor-bridge/

How to set up a tor relay.

Bill E Pilgrim

Via Sadly Naught, proof positive that Peak Wingnut was reached some time ago.

The code words have taken over to where that’s all there is, I now have no idea what they’re even outraged about half the time.

It’s reached a point where it’s like that joke about inmates telling jokes by number because they’ve all heard them so often. “Fourteen!” someone says, and everyone laughs.

bob h

c) Twitter out the g.p.s. coordinates of the homes of all regime officials, Basiji and Revolutionary Guard barracks, etc. for future use.

geg6

Death By Mosquito Truck: I am not being naïve or stupidly thinking the Iranians in the streets are out there because they want some sort of Jeffersonian democracy. However, I think they are incredibly brave and standing up for the kind of government THEY envision as best for them. I’m idealistic enough, despite my age and usual level of cynicism, to want them to have self-determination in how they are governed. Their way may not be ours but they are willing to fight and die in the streets to get it. I find that admirable and I only wish more Americans (including myself) would have displayed such cojones in 2000.

R-Jud

OT:

Y’all may have noticed an item about an inquiry into the causes of the Iraq war that’s about to happen here in the UK. It is supposedly going to be a closed inquiry, but, via this morning’s Observer…

Blair pushed Brown to hold Iraq War Inquiry in private, fears “show trial”

And also this:

Confidential Memo Reveals US Plan To Invoke Invasion of Iraq

Yawn.

Robert Johnston

That’s ridiculous. 1 in 200 long shots happen on a routine basis, as we have thousands upon thousands of elections every year. The only way to find the material from the Columbia students fairly persuasive is if you already have an independent belief that the election was stolen. Absent such a belief the Columbia students’ results fall in the range of results we should expect to see many times a year from all over the world.
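The point about long shots across many elections can be put in numbers. The sketch below assumes the elections are independent and that each has the same 1-in-200 chance of showing such a pattern by luck; the function names and the election counts in the loop are illustrative only.

```python
def prob_at_least_one(p_single, num_elections):
    """Chance that an event of probability p_single shows up in at least
    one of num_elections independent elections."""
    return 1.0 - (1.0 - p_single) ** num_elections

def expected_count(p_single, num_elections):
    """Average number of elections in which the event shows up."""
    return p_single * num_elections

# Election counts chosen only for illustration.
for n in (1, 200, 1000):
    print(n, expected_count(1 / 200, n), round(prob_at_least_one(1 / 200, n), 4))
```

With 1,000 elections the expected number of 1-in-200 flukes is five, and the chance of seeing at least one is above 99%.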

Jon H

@R-Jud:

The second is old news.

LA Confidential Pantload

Maybe the election wasn’t stolen.

http://www.globalresearch.ca/index.php?context=va&aid=14018

DougJ

The cut off p-value for these things is about 1 in 20, FYI.

Joel

@Punchy: I think your mom needs to cut off internet access down there in the basement.

@Death By Mosquito Truck: Oh for the love of god, get a life.

Ryan Cunningham

@Martin:

I agree with the principle of the study, but there’s no multiple hypothesis correction evident, the analysis was done by political science grad students, and it was published directly to the media. This has the stink of pseudoscience about it.

Ash

@LA Confidential Pantload: From that article:

Anything that cites that god awful poll is a sham.

Zach

I’m trying to simulate this and getting about a 25% chance of this happening… blaming my code instead of accusing the opinion piece of anything, though :)

Whick

Um. Demonstrations by tens of thousands in various Iranian cities with discrediting of the election results both by prominent Iranians and by well-regarded analysts outside Iran would seem to provide an independent reason to believe the election was stolen. These reactions are not typical of the thousands of elections that happen every year. Or do you suppose that the demonstrations didn’t happen until the demonstrators had looked at the statistics on those last two digits?

Ryan Cunningham

@Whick:

We do have independent reasons to suspect that the election was stolen.

That’s the reason we should be so careful about our biases! We don’t want to accept statistics just because they say what we want them to say. We must be skeptical! The power of the scientific method is that it leads you to the truth regardless of your presuppositions.

Garrigus Carraig

@LA Confidential Pantload: Maybe not. But an article that gets the name of one of the candidates wrong after ten days of solid publicity is probably not the most persuasive.

Tavi

DougJ —

While I think the probabilities given in the article are correct, I’m not convinced that the Iranian numbers (by themselves) are indicative of fraud. As a ‘control’, the authors cite the state-by-state numbers in the last U.S. election: “Look,” they say, “the last digits show a more likely spread than the Iranian digits.”

So?

Let’s look at election results a bit further back, say to 1980 (an arbitrary stopping point — I didn’t look at any other years). Here are the state-by-state frequency results of the last digits of the popular vote for each major candidate (Source)

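1980: [14, 25, 23, 18, 17, 12, 9, 14, 9, 12]; total 153; low/high 9, 25; probability 0.06473

1984: [22, 7, 6, 9, 7, 7, 12, 8, 9, 15]; total 102; low/high 6, 22; probability 0.00502

1988: [6, 14, 9, 12, 9, 7, 12, 15, 5, 13]; total 102; low/high 5, 15; probability 0.32682

1992: [21, 21, 12, 12, 22, 12, 12, 14, 15, 12]; total 153; low/high 12, 22; probability 0.46328

1996: [16, 12, 17, 20, 15, 11, 11, 12, 20, 19]; total 153; low/high 11, 20; probability 0.76776

2000: [21, 9, 19, 11, 15, 11, 15, 23, 16, 13]; total 153; low/high 9, 23; probability 0.16959

2004: [13, 10, 9, 11, 14, 7, 10, 8, 9, 11]; total 102; low/high 7, 14; probability 0.83199

2008: [7, 12, 9, 9, 13, 8, 9, 11, 10, 14]; total 102; low/high 7, 14; probability 0.83199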

So, for example, in 1980 the last digit of a state’s popular vote for one of three candidates was ‘0’ 14 times, ‘1’ 25 times and so on, for a total of 153 last digits; the lowest/highest frequencies that year were 9 and 25.

Also given are the approximate probabilities such an event occurs (‘event’ here being that in a randomly generated set of (102,153) digits, each with probability 1/10, the most frequent has at least the highest frequency that year and the least frequent has at most the lowest frequency that year).

While most of the years look fine, 1980 and 1984 stand out. According to the authors of the article, the fact that the probabilities of the distributions in these years are so small would “provide[] strong evidence that the numbers released…..were manipulated.”

Darn that Reagan.

Now, I’ve read enough other reports from election observers to believe that something fishy went on in Iran. To what extent or how much it determined the election, I don’t know. But I just don’t find the statistics produced in the article convincing.
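A more conventional way to compare tallies like these against a uniform last-digit distribution is a Pearson chi-square statistic. This is a swapped-in technique, not the max/min test used in this comment or attributed to the article; the sketch below applies it to the 1984 and 2008 tallies listed above, and the function and constant names are made up for the example.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Last-digit tallies copied from the comment above.
TALLY_1984 = [22, 7, 6, 9, 7, 7, 12, 8, 9, 15]
TALLY_2008 = [7, 12, 9, 9, 13, 8, 9, 11, 10, 14]

# 5% critical value of the chi-square distribution with 9 degrees of freedom.
CRITICAL_5_PERCENT_DF9 = 16.919

for year, tally in (("1984", TALLY_1984), ("2008", TALLY_2008)):
    stat = chi_square_uniform(tally)
    print(year, round(stat, 2), stat > CRITICAL_5_PERCENT_DF9)
```

The 1984 tally gives a statistic of about 21.7, above the 5% cutoff, while 2008 gives about 4.5, well below it. That fits the point being made here: a perfectly ordinary U.S. election can trip a uniformity test.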

Persian

Hey DougJ,

I was wrong about you last time. Keep up the great work!

@Robert Johnston: you’re right, 1 in 200 in and of itself is not indicative of fraud, without our independent belief in it … that’s EXACTLY the point. When you combine the 1 in 200 with the 1 in (my opinion) 100 chance that Ahmadinejad actually has 62% of the electorate’s votes AND the 1 in 100 chance that Karroubi would do so miserably in his home state of Luristan, then we have a 0.00005% chance that the official results are correct. How do you like them odds?
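The arithmetic behind that last figure is a straight product of the three odds, which is only valid if the three events are independent. A minimal check, using exact fractions so no rounding creeps in (the variable names are illustrative):

```python
from fractions import Fraction

# The three odds quoted in the comment above; the last two are the
# commenter's own guesses.
odds = [Fraction(1, 200), Fraction(1, 100), Fraction(1, 100)]

combined = Fraction(1, 1)
for p in odds:
    combined *= p  # multiplying is valid only if the events are independent

print(combined)        # 1/2000000
print(combined * 100)  # the same chance expressed in percent, as a fraction
```

One in two million is 0.00005%, matching the figure quoted, but the result is only as good as the independence assumption and the two guessed inputs.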

Whick

@Ryan Cunningham

This has nothing to do with what I want the results to be. It is objective fact that Iran is in extraordinary tumult over this election result. It doesn’t just seem that way to some of us American liberals. (Yeah, I suppose there were mass demonstrations and a media clampdown after the Lebanese election, I just didn’t notice that because of my bias, right?)

I’m generally skeptical of a lot of claims, including the Columbia students’ article when I first heard of it. But after reading it and doing my own analysis, I’m more skeptical of people who are too clever in “debunking” a pretty basic statistical argument.

Little Dreamer

In case anyone missed the piece that was on Huffington Post two days ago about WHY the election was probably stolen, check this out.

Whick

In answer to myself, I am now ready to join the skeptics on the Beber and Scacco article about statistical hints of fabrication in the Iranian provincial vote totals. I decided to look up the vote totals myself and check the second-to-last and third-to-last digits with the same test Beber and Scacco applied to the last digits, and the distributions seemed unremarkable. So why would fabricators be so good at randomizing the second- and third-to-last digits and then give themselves away on the final digits?

Nylund

Tavi,

Looking at your code, I think you’re overlooking something.

You’re seeing how many times out of your 10,000 numbers you get 17 or higher.

You should be looking at your 1,000 simulations and seeing how often 1 of those 10 numbers is 17 or higher.

The answer will be about 10%. The chances that one of them is 14 or higher is more like 80%.

In short, the chances that any particular digit appears more than 17% of the time is quite low, but the chances that at least 1 of the 10 does, is quite high.

That is, if you blindly picked a number out of those 10,000, the chance that the one you chose is 17 or higher is very small. But, if you blindly chose one of your 1,000 simulations, the chance that 1 of the 10 numbers in that set was 17 or higher is actually quite high.

EXAMPLE, here are 4 simulations of the frequency of digits when drawing from a sample of 116:

8 12 18 8 9 8 8 13 10 7

9 16 11 9 8 14 10 6 11 6

12 9 11 6 11 12 11 14 7 6

8 9 14 8 9 10 9 9 12 11

If you just pick a single number from the above, the chances of getting one above 17 is pretty slim (1 in 40). But, if you’re just picking a set of 10 numbers, the chances of getting a set with a number above 17 is only 1 in 4.
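The gap between "one particular digit" and "any of the ten digits" can also be computed directly rather than read off four sample rows. The sketch below uses an exact binomial tail for a single digit's count out of 116, then a rough any-of-ten figure that treats the ten counts as independent, which they are not quite since they must sum to 116. The function names are made up for the example.

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def approx_any_of_ten(single):
    """Rough chance that at least one of ten digits clears the bar,
    treating the ten counts as independent (an approximation)."""
    return 1 - (1 - single) ** 10

# 17 of 116 is just over 14%; 20 of 116 is just over 17%.
for threshold in (17, 20):
    single = binomial_tail(116, 0.1, threshold)
    print(threshold, round(single, 4), round(approx_any_of_ten(single), 3))
```

For a count of 17 or more, a given digit has roughly a 7% chance but some digit out of ten has about an even chance. For 20 or more, the figures are roughly 1% and 10%.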

PS. None of this has anything to do with my feelings on the election, it’s solely on the statistics of the paper.

Nylund

For the record, I think the Iranian election was rigged.

Don’t confuse my attack on the paper as support for the election results. A bad paper does not equal a fair election. A bad paper is just a bad paper.

Tavi

@Nylund:

No, I ran 10,000 simulations — in each simulation, 116 digits are picked independently according to the uniform distribution (each digit is equally likely of being selected).

Then I did what you indicated: the vector X contains the frequencies that each digit came up and I’m counting a ‘hit’ if the spread between the highest and lowest frequencies (values in X) is at least (roughly) the amount indicated in the article.

According to the article, the probability of getting a hit is less than 4% — my probability was around 6%, but that’s likely due to differences in rounding.

Here’s some python code to do the same thing:

>>> def SimProb(NumDigits,High,Low,Trials):

. Hits = 0.0

. for i in range(Trials):

. freq = [0 for n in range(10)]

. for n in range(NumDigits):

. x = randint(0,9)

. freq[x] += 1

. if max(freq) >= High and min(freq) <= Low:

. Hits += 1

. return Hits/Trials

>>> SimProb(116,20,5,10000)

0.0352

>>> 1.0 - SimProb(116,17,6,10000)

0.7046

These values match those given in the article, the first for the probability of the Iranian results, the second for the probability of the 2008 U.S. results. (BTW, I’m not sure why the numbers ’17’ and ’14’ keep coming up — are those being confused with percentages?)

All that said, I still think the article’s conclusion (that election fraud is highly likely based solely on the numbers) is faulty.

Tavi

Grrr….the code preview looked okay…trying again, this time with the indentation restored and the import it needs

from random import randint

def SimProb(NumDigits, High, Low, Trials):
    # Fraction of trials in which the most frequent digit appears at least
    # High times and the least frequent digit appears at most Low times.
    Hits = 0.0
    for i in range(Trials):
        freq = [0 for n in range(10)]  # tally for each digit 0 through 9
        for n in range(NumDigits):
            x = randint(0, 9)  # randint includes both endpoints
            freq[x] += 1
        if max(freq) >= High and min(freq) <= Low:
            Hits += 1
    return Hits/Trials

Zach

OK, so I simulated this and the article is wrong in one obvious way: the odds of the Iranian election coming out the way it did by chance are definitely less than 0.5%; they’re actually 0.15%, which is the product of the probabilities of both events that they describe happening simultaneously (if you think about it, those are entirely independent events). Here are the results of one 100,000-election simulation if you’re interested (condition/number of times met per 100k simulations):

Last digit occurs 17% of the time: 10,832 (10.8%)

Last digit occurs 4% of the time: 19,842 (19.8%)

Both conditions simultaneous for last digit: 3,546 (3.5%)

62% non-adjacent digits: 4,131 (4.1%)

All three conditions simultaneously: 155 (0.15%)

So, this means that the Iranian result will occur only once in over 600 elections! Does that mean this is definitive proof of fraud? No way. Consider a real-life example: clinical testing of drugs. To prove efficacy, you have to establish a statistical test prior to the trial. If it fails and you want to try again using a different end-point, you have to come up with a test that’s even harder to meet by random chance. You certainly can’t pore through your results and find some way in which your drug performed better than the placebo. The problem is that every series of random events meets some rare condition.

In this case, there are any number of things that would be equally improbable, and equally convincing to these authors if they occurred (the behavioral psych behind the adjacent numbers example is interesting, though). For example, if the second-to-last digit met the same criteria that these authors use for the last digit, but the last digit did not, they could write the same letter. This doubles the odds of a similarly rare event happening. What if instead of one digit occurring 17% of the time and another 4% of the time, one digit occurred 22% of the time? That’s equally improbable. Now we’ve got a 1 in 150 chance instead of a 1 in 600 chance of a similarly rare combination of events occurring. It’s obvious that you can continue to extend this to the point where it’s nearly certain that a similar set of rare events will happen in a given election.
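The "doubling" step can be checked with a small simulation. The sketch below assumes the last and second-to-last digits are independent and uniform, and it takes the thresholds of 20 and 5 out of 116 from the Python call posted earlier in the thread. The function names are hypothetical.

```python
import random

def meets_criterion(counts, high=20, low=5):
    """True if some digit appears at least `high` times and some digit
    appears at most `low` times."""
    return max(counts) >= high and min(counts) <= low

def digit_counts(num_digits, rng):
    """Tally of num_digits uniformly random digits."""
    counts = [0] * 10
    for digit in rng.choices(range(10), k=num_digits):
        counts[digit] += 1
    return counts

def estimate(trials=20000, num_digits=116, seed=1):
    """Return (P(last digit meets criterion), P(either position meets it))."""
    rng = random.Random(seed)
    last_hits = either_hits = 0
    for _ in range(trials):
        last = meets_criterion(digit_counts(num_digits, rng))
        second_to_last = meets_criterion(digit_counts(num_digits, rng))
        last_hits += last
        either_hits += last or second_to_last
    return last_hits / trials, either_hits / trials

print(estimate(trials=5000))
```

Letting the test fire on either digit position comes close to doubling the chance of a "suspicious" result, and each further candidate pattern pushes it higher still.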

So, yes, with the exception of the point about non-adjacent numbers (Does the deviation match that in studies about this? The authors don’t discuss it) this article essentially is selling numerology.

Zach

Oh, and FYI if you’re trying to replicate the results, 0 and 9 count as adjacent digits. You should get over 4% of results meeting the adjacent digit criteria and about 3.5% meeting both number frequency criteria.
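For anyone replicating this, the wraparound rule is the easy part to get wrong. The sketch below assumes that "non-adjacent" means the two digits are neither identical nor neighbours, with 0 and 9 counted as neighbours; that definition is an assumption here, as is the function name.

```python
def is_non_adjacent(a, b):
    """True if digits a and b are neither equal nor neighbours,
    with 0 and 9 treated as neighbours."""
    gap = abs(a - b)
    circular_gap = min(gap, 10 - gap)  # distance around the 0..9 circle
    return circular_gap >= 2

pairs = [(a, b) for a in range(10) for b in range(10)]
non_adjacent = sum(is_non_adjacent(a, b) for a, b in pairs)
print(non_adjacent, "of", len(pairs))  # 70 of 100
```

Under that definition, 70 of the 100 possible digit pairs are non-adjacent, so random digits should give about 70% non-adjacent pairs.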

Zach

This might be an easier way to think about their 17%/4% point. You could distribute these odds across any digit or combination of digits (up to 3 since the lowest count is 3 digits) and come to the same conclusion. So, for a 3-digit number XYZ, digit X could have something occur 17% of the time and digit Y could have something occur 4% of the time at roughly the same probability of both occurring for the same digit (it’s actually somewhat more rare). You could also have some integer appear 17% of the time at X and some integer appear 17% at Z. There are 18 permutations of this; so while any one occurs about 4% of the time, it’s more likely than not that a sequence equivalent to the one identified in this article will occur with randomly selected digits.

Note that this isn’t exactly true because it’s actually twice as rare for a digit to appear more than 17% of the time than it is for it to appear less than 4% of the time. You could change it to 16% and 3%, or 15% and 2%… you get the point.

Zach

Data pulled from the Election 2008 Wikipedia page for the second-to-last digit (a set of 102 McCain/Obama vote totals for 50 states + DC) is similarly sketchy. 19.6% of the numbers are 7s and 4.9% of them are 8s. The odds of this happening are less than 2% …

Comrade Stuck

Some day, this thread will die.

4tehlulz

The probability of massive vote fraud in the Iranian election is now over 100%.

I know this not because of fancy math and programming, but because the Guardian Council said so:

But don’t worry, there was only a 3 million vote overcount, and “it has yet to be determined whether the amount is decisive in the election results”.

Zach

@Tavi: I think the reason you’re getting incompatible results is because randint(0,9) gives random integers from 0 to 8 in Python. I only know this because it happened to me playing with this, too. I use Python so rarely that I run into this range-exclusive-of-the-ceiling quirk every time.

john b

maybe i’m misunderstanding benford’s law (and i haven’t read the columbia paper) but shouldn’t you be tallying the leading digit in the totals and not the final digit? benford’s law states that the longer the number is, the closer the probability comes to 10%.

Zach

@john b: This analysis isn’t looking at Benford’s law. Benford’s law explains what the distribution of first digits should be (if the probability of the underlying numbers follows a power law distribution). There’s also a defined distribution of the second digit. It’s debatable whether any of this applies to elections totals.
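For reference, the first-digit distribution that Benford's law predicts is the standard formula P(d) = log10(1 + 1/d). The few lines below just tabulate it; the function name is made up for the example.

```python
from math import log10

def benford_first_digit(d):
    """Benford's law probability that the leading digit is d (1 through 9)."""
    return log10(1 + 1 / d)

for d in range(1, 10):
    print(d, round(benford_first_digit(d), 3))
```

A leading 1 is expected about 30% of the time and a leading 9 under 5%, which is very different from the flat 10% expected of trailing digits.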

This article looks at digits beyond the first two (well, one candidate got fewer than 1,000 votes in one province) which will follow a random distribution in a fair election. The fallacy of the article, best exemplified by Chris Johnson above (“I’m now 200-to-1 certain the election WAS stolen.”), is that 199 out of 200 elections that fit these criteria are fraudulent; you could come up with any number of 200-to-1 scenarios that characterize any set of 116 random numbers.

Tavi

@Zach:

From the python help docs: “randint(a,b) returns a random integer N such that a <= N <= b.” I just checked it to make sure and got both 0 and 9 as returned values.
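The endpoint behaviour is easy to confirm with the standard library alone. In the sketch below, `random.randint` includes its upper bound while `random.randrange` excludes it. NumPy's function of the same name, `numpy.random.randint`, also excludes its upper bound, which is one way the two can get confused.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# randint(a, b) includes both endpoints; randrange(a, b) excludes b.
inclusive = {rng.randint(0, 9) for _ in range(10000)}
exclusive = {rng.randrange(0, 9) for _ in range(10000)}

print(sorted(inclusive))  # values drawn by randint(0, 9)
print(sorted(exclusive))  # values drawn by randrange(0, 9)
```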

I’m not sure what you mean by ‘incompatible’ — my simulations agree with your own results and those given in the article. It’s just that the article’s conclusions based on those numbers are suspect. (See, for example, the stolen U.S. election of 1984 as shown in my comment up-thread.)

Zach

@Tavi: That’s what my 2.5 Python w/ NumPy does; I guess the random module with NumPy is different. I read the 6% number you posted earlier as the number of times you were getting the 17%/4% result.