CVE-2020-28052 Failure Analysis (Or, Internet Dickhead Opines)

I find this incident EXTREMELY fascinating and want to share it as a case study in system failure. Take what I say with a big grain of salt, as I have virtually no authority with which to say it. That said, I believe everything below to be both an accurate portrayal of facts and a charitable interpretation of them, so there’s that.

I came across this tweet this morning, and immediately I was hooked:

This code reads as VERY dumb. Like, how could anyone ever write that?

  1. why on earth roll your own method to compare strings?
  2. The code isn’t complex, and clearly seems wrong. .indexOf(i)? Don’t you mean mystring[i]?

With code this obviously broken and weird, how did it ever ship? How did it get through code review even? The NSA’s mission charter includes ratfucking publicly used infrastructure like this to leave themselves cleverly-disguised backdoors: is this that?


The referenced article has some good facts, like the discovering researchers found the vulnerability in 1.65 and 1.66, but not 1.67. Prior versions unaffected. I’ve used C# bouncycastle before, but the latest is 1.8.x, which has been true since like 2015. Turns out, this is the java version. Pull up tag 1.65 on github, git blame the offending line, and here’s the commit in question, where line 312 is the new (broken) code, replacing line 239 which was a simple oldstring.equals(newstring)

So like, why? Turns out, there’s a totally-non-nefarious answer over at this issue: The gist is, string.equals(string) stops processing as soon as they’re not equal: it’s subject to a timing attack that tells you a little about how much of the string matches, which can reduce practical attack time. Instead, comparing each character successively whether or not you’ve already got an answer doesn’t leak such information, hence the DIY. Of course, the author figured out after not-too-long that it was wrong. A couple months. The fix was actually the result of the CVE responsible disclosure back in November, along with updated testing. Here’s the update to fix it:

Which leaves the question: how did such an elementary mistake ship horribly broken code to zillions of deployment environments?

Factors leading to failure:

1. Human factor

The author of this commit is, David Hook. He appears to be an original author of, but now relatively infrequent contributor to, BouncyCastle. A quick look at linkedin suggests he is rather busy, and maybe doesn’t do a ton of day-in day-out coding anymore. Even a seasoned professional could forgivably WRITE such a dumb line of code, so let’s just assume moving forward that we’re talking about a seasoned professional whose mind was likely in a very different place the day before. Totally easy to WRITE this line, I think.

Given that he’s a core contributor with write access to the upstream repository, his code also doesn’t need to go through review before inclusion. While a sufficiently-awake, competent programmer would parse this line trying to determine its function and immediately question the .indexOf(i), nobody had to prior to shipment. And it seems like a pretty simple piece of code, let’s be honest. You could gloss over it.

2. Test accuracy

Test Driven Development feels like magic pixie dust when it’s working, I know this much from personal experience. You write the tests, realizing that you had no idea what you ACTUALLY wanted the code to do. With that out of the way, the path is clear to write a function that does the thing you now understand, and if your tests pass, you know you did it right!

(Editor’s note: the paragraph below I’m pretty sure isn’t totally correct, but I have Christmas shopping to do. The outcome I believe to be true is that the positive (string equal) test will always pass, and the negative (string unequal) test will pass a statistically significant percent of the time for randomly generated vectors.)

Unfortunately in this case, sane tests are stochastically likely to pass. The researchers suggest 20% of passwords are vulnerable in the first 1000 attempts, which means 80% of random passwords are virtually guaranteed to successfully pass a failure condition test. Unless a high count of randomly selected passwords are tested, sane tests don’t catch this specific failure. And all “supposed to pass” tests will: a correct password will always evaluate as correct.

In other words, if you have sane test coverage on this code, and didn’t go about making that coverage seemingly a little overkill, and if you only wrote code until the tests pass, this exact failure is pretty likely.

3. The lanugage tees you up for confusion/brain-fart

My first thought looking at this, not being a java developer, is “why string.charAt(i) instead of string[i]? My second thought, not being a java developer, is “languages often do weird, prescriptive crap, maybe java doesn’t support indexing with syntactic sugar like every other popular language does. Lo, it is true: In python, it’d be hard to confuse mystring[i] with mystring.index(i) because they look totally different, and you only reach for the method on the rare(r) occasion you need to. Java tees you up for such a brain-fart, by making the less-common idiom syntactically identical to the super-common one.

To make matters worse, look at that third organic search result:

It’s not an answer to the question I asked, but it’s a drop-in replacement that breaks everything without failing tests. Not suggesting David Hooke googled string indexing before writing this, but like, I have. Beginners must make this kind of mistake a LOT in java.


So, long story short:

  1. Seasoned programmer addresses hypothetical vulnerability by increasing complexity (verified)
  2. Seasoned programmer makes simple brain-fart the language readily encourages (highly likely)
  3. Tests pass when they should have failed (~80% likely) (This merits further investigation since it’s verifiable, and I just haven’t. If this WASN’T unit tested and just wasn’t negitave-case manually tested, so much the liklier)
  4. Seasoned programmer pushes without second set of eyes (highly likely)
  5. Nobody notices for months because it ALWAYS works when it should, and rarely fails when it shouldn’t. (we all know the answer here).

Comedy of errors, I think. I guess the real conclusion here is just generically to stay humble.

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *