I built some 64-bit Fedora 13/14 RPMs of John the Ripper 1.7.6 with the OpenMP-7 patch applied, so I could use more than one core to crack the Gawker list. Download here.

I had to remove the Fedora-specific CFLAGS, edit params.h to force “#define JOHN_SYSTEMWIDE 1”, and apply the single-have_words-fix-1 patch, which is needed when not building with Jumbo-9.
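
The params.h change can be scripted. This is a minimal sketch, not what my RPM spec file actually does: it assumes GNU sed (for `-i`) and that params.h contains a line of the form `#define JOHN_SYSTEMWIDE 0` (spaces or tabs between the tokens). `force_systemwide` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite "#define JOHN_SYSTEMWIDE 0" as
# "#define JOHN_SYSTEMWIDE 1" in the given file (GNU sed -i assumed).
# Returns non-zero if no line ends up defining JOHN_SYSTEMWIDE as 1.
force_systemwide() {
  local file="$1"
  sed -i -E 's/^#define[[:space:]]+JOHN_SYSTEMWIDE[[:space:]]+0[[:space:]]*$/#define JOHN_SYSTEMWIDE 1/' "$file"
  grep -q '^#define JOHN_SYSTEMWIDE 1$' "$file"
}
```

For example, `force_systemwide src/params.h` from the top of the unpacked source tree (the `src/` location is also an assumption).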

So I acquired the Gawker password list, which contains 1,247,893 accounts. Only 541,520 of those also have email addresses (which I expect will be seeing an increase in their spam soon!), and 748,559 have DES password hashes; some have no password at all, which I assume means they’re locked accounts.

```
$ wc -l full_db.log
1247893
```

```
$ grep '@' full_db.log | wc -l
541520
```

To convert every account that has a DES hash (first command), or just those with email addresses (second command), into a JtR-friendly format, use:

```bash
# All accounts with a DES hash
sed 's/ ::: /:/g' full_db.log | awk -F: '{print $1":"$2":"$4}' | grep -v ':NULL:' > full_db.log.all

# Only accounts that also have an email address
grep '@' full_db.log | sed 's/ ::: /:/g' | awk -F: '{print $1":"$2":"$4}' | grep -v ':NULL:' > full_db.log.emails
```
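
To see what that pipeline does, here it is wrapped in a function and run on three made-up records. The field layout is an assumption suggested by the `$1`/`$2`/`$4` selection (login in field 1, DES hash in field 2, email in field 4; field 3 is just a placeholder here), the hashes are placeholder strings, and `gawker_to_jtr` is a hypothetical name. Because `:NULL:` can only match the middle (hash) field of the output, an account with a NULL email is kept while one with a NULL hash is dropped.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the pipeline above: read " ::: "-separated
# records on stdin and write login:hash:email lines, dropping NULL hashes.
gawker_to_jtr() {
  sed 's/ ::: /:/g' | awk -F: '{print $1":"$2":"$4}' | grep -v ':NULL:'
}

# Made-up sample records (field order assumed, hashes are placeholders).
printf '%s\n' \
  'alice ::: abJnggxhB/yWI ::: 1 ::: alice@example.com' \
  'bob ::: NULL ::: 2 ::: bob@example.com' \
  'carol ::: cdL8s4fqTQzQk ::: 3 ::: NULL' |
  gawker_to_jtr
# prints:
# alice:abJnggxhB/yWI:alice@example.com
# carol:cdL8s4fqTQzQk:NULL
```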

The top 250 passwords discovered so far (linked from here) can be turned into a JtR wordlist using:

```bash
awk '{print $2}' top250gawker.txt > top250_pwonly.txt
```

Essentially we’re just getting rid of the first “count” column. The top five are:

```
   2516 123456
   2188 password
   1205 12345678
    696 qwerty
    498 abc123
```

Then we run my OpenMP-enabled JtR against the 750k entries with our wordlist:

```bash
john-ompdes7 --session=gawker --wordlist=top250_pwonly.txt --rules full_db.log.all
```

This will instantly crack about 15,000 passwords. I found that JtR’s built-in wordlist.lst did pretty well, but all.lst hammered through them, getting about a third of the way through in under an hour on my Core i5 750 @ 3.2GHz:

```
$ john --show full_db.log.all
....
209397 password hashes cracked, 537335 left
```

I tried to speed things up by making a charset based on the results so far:

```bash
john --make-charset=gawker.chr full_db.log.all
sudo cp gawker.chr /usr/share/john
```

Then I added the following incremental mode to /etc/john.conf:

```ini
[Incremental:Gawker]
File = $JOHN/gawker.chr
MinLen = 0
MaxLen = 8
CharCount = 81
```

Then I ran JtR in that mode:

```bash
john-ompdes7 --session=gawker --incremental=Gawker full_db.log.all
```

It still didn’t seem to be as good as all.lst, which I guess means not many people share the same passwords; that’s backed up by the fact that the top 250 passwords only account for about 15k (roughly 2%) of the 750k hashes.

What’s a bit disturbing is that most of the passwords are lowercase letters only. In fact, there are only two uppercase characters in the whole top 250, in “Password” and “Highlife”.

41 of the top 250 passwords contain digits, including 23 that are digits only, and none of them contain special/symbol characters.

Cracking slowed down after a while: getting to 234k took 4.5 hours, and shortly afterwards all.lst finished at:

```
238521 password hashes cracked, 508211 left
```

So I recreated gawker.chr from those results, and it whizzed through them at first, cracking another 2k in the first 8 minutes, but it eventually slowed to a crawl. I then ran single mode and got another 2k or so.

Then I used [rockyou.chr](http://www.korelogic.com/tools.html#jtr), which cracked about 60k in 24 hours. That is a massive slowdown from the first 210k in an hour (top 250 and wordlist.lst) and the 25k in the next 3.5 hours (all.lst, gawker.chr and single). This suggests that rockyou.chr in incremental mode is slower, but finds more passwords than all.lst in wordlist mode.
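
Putting rough rates on those approximate figures (cracks per hour for the top 250 + wordlist.lst stage, the all.lst + gawker.chr + single stage, and the rockyou.chr stage respectively) shows how big the slowdown is:

$$
\frac{210{,}000}{1\ \mathrm{h}} = 210{,}000\ \mathrm{h}^{-1}, \qquad
\frac{25{,}000}{3.5\ \mathrm{h}} \approx 7{,}100\ \mathrm{h}^{-1}, \qquad
\frac{60{,}000}{24\ \mathrm{h}} = 2{,}500\ \mathrm{h}^{-1}
$$

So rockyou.chr ran at roughly a third of the rate of the stage before it, and at around 1% of the initial wordlist rate.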

My computer crashed as I tried to load 330k rows into Gnumeric, so I gave up at:

```
331642 password hashes cracked, 415090 left
```
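
Each of those summary lines adds up to the same 746,732 loaded hashes (cracked + left), so they can be turned into a percentage with a small awk filter. This is a sketch: `cracked_pct` is a hypothetical helper, and it assumes the summary keeps the “N password hashes cracked, M left” wording shown above. On the three snapshots in this post it gives 28.0%, 31.9% and 44.4%.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `john --show` output on stdin, find the
# "N password hashes cracked, M left" summary and print N / (N + M) as a percentage.
cracked_pct() {
  LC_ALL=C awk '/password hash(es)? cracked/ {
    cracked = $1
    left = $(NF - 1)            # the number just before the final word "left"
    if (cracked + left > 0)
      printf "%.1f%%\n", 100 * cracked / (cracked + left)
  }'
}

echo "331642 password hashes cracked, 415090 left" | cracked_pct   # prints 44.4%
```

On live output: `john --show full_db.log.all | cracked_pct`.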

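To list every cracked password (first command) or just the distinct ones (second command):

```bash
# NF > 1 skips the blank line and the "N password hashes cracked, M left"
# summary that john --show prints after the cracked accounts
john --show full_db.log.all | awk -F: 'NF > 1 {print $2}' | sort > all_passwords.txt

john --show full_db.log.all | awk -F: 'NF > 1 {print $2}' | sort -u > unique_passwords.txt
```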

Then we can do some statistical analysis:

152,747 (46%) are distinct, which explains why making a gawker.chr charset from existing results wasn’t very effective:

```bash
wc -l unique_passwords.txt
```
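
Note that `sort -u` counts distinct passwords, so a password shared by 500 accounts still counts once. A sketch that also counts the passwords used by exactly one account, assuming a sorted file with one password per line such as all_passwords.txt; `reuse_summary` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for a sorted file with one password per line, report how
# many distinct passwords it holds and how many were used by exactly one account.
reuse_summary() {
  local file="$1" distinct once
  distinct=$(uniq "$file" | wc -l)
  once=$(uniq -u "$file" | wc -l)
  # $(( )) strips the padding some wc implementations add
  echo "distinct: $((distinct)), used once: $((once))"
}
```

Usage: `reuse_summary all_passwords.txt`.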

30,028 (9%) are all-numeric:

```bash
grep -E -c "^[0-9]+$" all_passwords.txt
```

224,513 (68%) are all letters:

```bash
grep -E -c "^[A-Za-z]+$" all_passwords.txt
```

215,845 (65%) are all lowercase letters:

```bash
grep -E -c "^[a-z]+$" all_passwords.txt
```

2,090 (0.6%) are all uppercase letters:

```bash
grep -E -c "^[A-Z]+$" all_passwords.txt
```

11,838 (3.6%) include one or more uppercase letters:

```bash
grep -E -c "[A-Z]" all_passwords.txt
```

299,224 (90%) include one or more lowercase letters:

```bash
grep -E -c "[a-z]" all_passwords.txt
```

105,808 (32%) include one or more numbers:

```bash
grep -E -c "[0-9]" all_passwords.txt
```

330,245 (99%) are all alphanumeric:

```bash
grep -E -c "^[0-9A-Za-z]+$" all_passwords.txt
```

94 (0.03%) are all symbols, 9 of which are variations of “!@#$%^&*”, which is Shift-12345678 on a US keyboard:

```bash
grep -E "^[^0-9A-Za-z]+$" all_passwords.txt
```

319,798 (96%) are made up entirely of lowercase letters and numbers:

```bash
grep -E -c "^[a-z0-9]+$" all_passwords.txt
```
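
The individual grep commands above can also be collapsed into a single pass over the file. This is a sketch, not the commands that produced the figures above: `password_stats` is a hypothetical helper, it skips empty lines, and it forces the C locale so that ranges like `[a-z]` mean ASCII letters only. Percentages are of the non-empty lines it reads.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one-pass character-class breakdown of a password list
# (one password per line, from the files named as arguments or from stdin).
password_stats() {
  LC_ALL=C awk '
    function show(label, n) {
      printf "%s: %d (%.1f%%)\n", label, n, 100 * n / total
    }
    length($0) == 0 { next }                  # ignore empty lines
    {
      total++
      if ($0 ~ /^[0-9]+$/)        digits++
      if ($0 ~ /^[A-Za-z]+$/)     letters++
      if ($0 ~ /^[a-z]+$/)        lower++
      if ($0 ~ /^[A-Z]+$/)        upper++
      if ($0 ~ /[A-Z]/)           has_upper++
      if ($0 ~ /[a-z]/)           has_lower++
      if ($0 ~ /[0-9]/)           has_digit++
      if ($0 ~ /^[0-9A-Za-z]+$/)  alnum++
      if ($0 ~ /^[^0-9A-Za-z]+$/) symbols++
      if ($0 ~ /^[a-z0-9]+$/)     lower_digit++
    }
    END {
      if (total == 0) exit
      show("all digits", digits)
      show("all letters", letters)
      show("all lowercase", lower)
      show("all uppercase", upper)
      show("some uppercase", has_upper)
      show("some lowercase", has_lower)
      show("some digits", has_digit)
      show("all alphanumeric", alnum)
      show("all symbols", symbols)
      show("lowercase + digits only", lower_digit)
    }
  ' "$@"
}

printf '%s\n' 123456 password Password abc123 '!@#$' QWERTY | password_stats
```

On the real data: `password_stats all_passwords.txt`.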

I found the 25 most common passwords were:

```
123456 : 4153
password : 3327
12345678 : 1442
lifehack : 858
qwerty : 763
abc123 : 529
12345 : 502
monkey : 469
111111 : 438
consumer : 406
letmein : 391
1234 : 371
dragon : 330
gizmodo : 320
baseball : 318
whatever : 311
superman : 305
1234567 : 288
iloveyou : 278
sunshine : 274
fuckyou : 269
starwars : 267
shadow : 267
princess : 245
cheese : 241
```
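
A list in that “password : count” layout can be produced from all_passwords.txt by counting duplicate lines. This is a sketch assuming one password per line; `top_passwords` is a hypothetical helper, and passwords with equal counts may come out in a different order than shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the N most common lines on stdin (default 25)
# as "password : count", most frequent first.
top_passwords() {
  sort | uniq -c | sort -rn | head -n "${1:-25}" |
    awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print $0 " : " count }'
}

printf '%s\n' 123456 password 123456 qwerty 123456 password | top_passwords 2
# prints:
# 123456 : 3
# password : 2
```

Stripping the count with `sub()` rather than printing `$2` keeps passwords that contain spaces intact. On the real data: `top_passwords 25 < all_passwords.txt`.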


I've uploaded my [gawker.chr](http://www.the-jedi.co.uk/downloads/john/gawker.chr) charset based on my last results.

So the ideal wordlist for speed (i.e. simplicity) would be all lowercase letters, which covers 65% of the cracked passwords, making sure to include “password”.

If you want to crack the vast majority (96%) of passwords, albeit more slowly (though not as slowly as using every character class), then a mix of lowercase letters and numbers would be best, making sure to include “abc123”.

You're almost completely wasting your time with symbols or even uppercase letters in wordlists.
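
Following that advice, an existing wordlist such as all.lst can be cut down to lowercase letters and digits. A minimal sketch, with `lower_digit_wordlist` as a hypothetical helper: since traditional DES crypt() only uses the first 8 characters of a password, it also truncates each word to 8 characters and drops the duplicates that creates. It de-duplicates with `awk '!seen[$0]++'` rather than `sort -u`, so the first occurrence wins and the wordlist's original order is preserved.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only words made of a-z and 0-9, truncate them to
# 8 characters (all that DES crypt() uses) and drop duplicates, preserving order.
lower_digit_wordlist() {
  LC_ALL=C grep -hE '^[a-z0-9]+$' "$@" | cut -c1-8 | awk '!seen[$0]++'
}

printf '%s\n' password Password abc123 'pass!' longpassword1 longpass | lower_digit_wordlist
# prints:
# password
# abc123
# longpass
```

For example: `lower_digit_wordlist all.lst > lower_digit.lst`.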

I also updated the blog to WordPress [3.0.3](http://codex.wordpress.org/Version_3.0.3), the “just when you thought it was safe….” release.