AverageSecurityGuy

Security, Programming, Pentesting

7 February 2011

Improving Brute Force Attacks with Frequency Analysis

by averagesecurityguy

There are two primary methods for cracking passwords: dictionary attacks and brute force attacks. With both methods, a candidate password is generated, hashed, and compared to the hash of the unknown password. If the hashes match, we have found the password; otherwise, we move on to the next candidate. In a dictionary attack, the candidate passwords come from a word list, and in some cases the words are modified to add capital letters, numbers, or symbols. A dictionary attack runs much faster than a brute force attack because the number of candidate passwords is much smaller, but there is no guarantee it will find the password. A brute force attack, which tries every combination of a chosen character set up to a chosen length, is guaranteed to find any password within that space but can take a very long time to run.
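The generate, hash, and compare loop shared by both methods can be sketched in a few lines of Python. This is only an illustration: the use of MD5, the function names, and the tiny word list and character set are assumptions made for the example, not details from the original experiment.

```python
import hashlib
from itertools import product


def hash_candidate(candidate):
    # MD5 is an assumption for illustration; real targets use many hash schemes.
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash, words):
    """Try each word in a word list; returns None if the password is not listed."""
    for word in words:
        if hash_candidate(word) == target_hash:
            return word
    return None


def brute_force_attack(target_hash, charset, max_length):
    """Try every string over `charset` with 1..max_length characters."""
    for length in range(1, max_length + 1):
        for combo in product(charset, repeat=length):
            candidate = "".join(combo)
            if hash_candidate(candidate) == target_hash:
                return candidate
    return None


target = hash_candidate("cab")
print(dictionary_attack(target, ["password", "letmein"]))  # word list misses it
print(brute_force_attack(target, "abc", 3))  # exhaustive search finds it
```

The dictionary attack stops after two guesses and fails, while the brute force attack succeeds only because the password happens to fall inside the searched character set and length.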

What I want to do is get the efficiency of a dictionary attack with the effectiveness of a brute force attack by looking at the character frequencies of previously cracked passwords. Skull Security maintains lists of previously cracked passwords, including lists for phpbb, hotmail, and rockyou. I used a script to count how often each character appears in each file. I then calculated each character's share of the total character count and, with the characters sorted from most to least frequent, a cumulative percentage. Using these numbers, I was able to find the character set that accounts for approximately 90% of all the characters in each file.

Filename   Distinct chars in file   Chars in 90% set   90% character set
Hotmail    91                       29                 aeoi1r0ln2st9m8c3756u4dbphgyv
phpbb      119                      29                 aeoir1nstl02md3chpbu94k8576gy
rockyou    211                      29                 ae10i2onrls938t45m67cdyhubkgp
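The counting step that produces a table like this might look roughly like the following. It is a minimal sketch, not the script actually used: the function name, the `coverage` parameter, and the rule of taking the shortest most-frequent-first prefix that reaches the threshold are all assumptions, and the sample passwords are made up.

```python
from collections import Counter


def frequency_charset(passwords, coverage=0.90):
    """Return the most frequent characters whose cumulative share of all
    characters first reaches `coverage` (assumed rule, see lead-in)."""
    counts = Counter()
    for password in passwords:
        counts.update(password)  # counts each character in the password

    total = sum(counts.values())
    if total == 0:
        return ""

    chosen = []
    cumulative = 0
    for char, count in counts.most_common():  # most frequent first
        chosen.append(char)
        cumulative += count
        if cumulative / total >= coverage:
            break
    return "".join(chosen)


# Made-up sample; a real run would read one password per line from a file.
sample = ["aaaa", "aab", "c"]
print(frequency_charset(sample, coverage=0.75))
print(frequency_charset(sample, coverage=0.90))
```

Characters with equal counts come out in the order they were first seen, so two runs over differently ordered files could break ties differently.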

I was surprised to see how much the three 90% sets overlap. Next, I used a script to determine how many passwords in each file are made up entirely of characters from that file's 90% set. Then I combined the three 90% sets into one set and determined how many passwords in each file are made up entirely of characters from the combined set.

Filename   % passwords in 90% set   % passwords in combined set
Hotmail    64                       77
phpbb      59                       74
rockyou    58                       73
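The coverage check behind these percentages can be sketched as a subset test on each password. The three character sets below are copied from the first table; the function name and the sample passwords are hypothetical, and the combined set here is simply the union of the sets as printed, which may not match exactly how the original combined set was built.

```python
def fraction_covered(passwords, charset):
    """Fraction of passwords made up entirely of characters in `charset`."""
    allowed = set(charset)
    if not passwords:
        return 0.0
    hits = sum(1 for password in passwords if set(password) <= allowed)
    return hits / len(passwords)


# 90% sets as listed in the first table.
HOTMAIL_SET = "aeoi1r0ln2st9m8c3756u4dbphgyv"
PHPBB_SET = "aeoir1nstl02md3chpbu94k8576gy"
ROCKYOU_SET = "ae10i2onrls938t45m67cdyhubkgp"

# Assumed combination rule: plain set union.
combined = set(HOTMAIL_SET) | set(PHPBB_SET) | set(ROCKYOU_SET)

# Made-up sample passwords, not taken from any of the lists.
sample = ["monkey1", "dragon", "Passw0rd!", "qwerty"]
print(len(combined))
print(fraction_covered(sample, HOTMAIL_SET))
print(fraction_covered(sample, combined))
```

A single character outside the set is enough to exclude a password, which is why per-password coverage sits well below the 90% per-character coverage.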

In all three files, the combined set covered over 70% of the passwords. It looks like frequency analysis could be useful in improving brute force password attacks. In another post, I will put this theory to the test by attacking a different set of passwords.
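To get a rough feel for what a reduced character set buys, the number of brute force candidates can be compared directly. The figures here are assumptions for illustration only: 95 is the count of printable ASCII characters, 30 is the size of the union of the three 90% sets listed in the first table, and the maximum length of 8 is arbitrary.

```python
def keyspace(charset_size, max_length):
    """Number of candidate strings with 1..max_length characters."""
    return sum(charset_size ** length for length in range(1, max_length + 1))


FULL_CHARSET = 95     # assumption: all printable ASCII characters
REDUCED_CHARSET = 30  # union of the three 90% sets from the first table
MAX_LENGTH = 8        # assumption: arbitrary cutoff for the example

full = keyspace(FULL_CHARSET, MAX_LENGTH)
reduced = keyspace(REDUCED_CHARSET, MAX_LENGTH)

print(f"full keyspace:    {full:,}")
print(f"reduced keyspace: {reduced:,}")
print(f"reduction factor: {full / reduced:,.0f}x")
```

The trade-off is that the reduced search can only ever find passwords built entirely from the reduced set, so the speedup comes at the cost of the guarantee a full brute force attack provides.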

If you want an in-depth study of character frequency analysis of password files, check out Matt Weir's work at reusablesec.blogspot.com.

tags: password cracking