Security, Programming, Pentesting
by averagesecurityguy
There are two primary methods for cracking passwords: dictionary attacks and brute force attacks. With both methods, a candidate password is generated, hashed, and compared to the hash of the target password. If the hashes match, we have found the password; otherwise, we move on to the next candidate. In a dictionary attack, the candidate passwords come from a word list, and in some cases the words are modified to add capital letters, numbers, or symbols. Dictionary attacks run much faster than brute force attacks because the number of candidate passwords is much smaller, but there is no guarantee you will find the password. A brute force attack tries every combination of characters up to a given length, so it is guaranteed to find the password as long as the password's length and characters fall within the search space, but it can take a very long time to run.
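Here is a minimal sketch of that generate, hash, and compare loop for both attack types. It assumes unsalted MD5 hashes, which the post does not specify, and the mangling rules in the hypothetical `dictionary_attack` helper (capitalize, append one digit) are only examples of the modifications described above.

```python
import hashlib
import itertools
import string


def md5_hex(candidate):
    """Hash a candidate password (unsalted MD5 is assumed for illustration)."""
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash, wordlist):
    """Hash each word and a few simple variations; return the match or None."""
    for word in wordlist:
        # Hypothetical mangling rules: as-is, capitalized, and with one trailing digit.
        variants = [word, word.capitalize()] + [word + d for d in string.digits]
        for candidate in variants:
            if md5_hex(candidate) == target_hash:
                return candidate
    return None


def brute_force_attack(target_hash, charset, max_len):
    """Hash every string over charset, shortest first, up to max_len characters."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(charset, repeat=length):
            candidate = "".join(chars)
            if md5_hex(candidate) == target_hash:
                return candidate
    # Reaching here means the password is longer than max_len or uses other characters.
    return None


if __name__ == "__main__":
    print(dictionary_attack(md5_hex("cat7"), ["dog", "cat", "bird"]))     # cat7
    print(brute_force_attack(md5_hex("cat"), string.ascii_lowercase, 3))  # cat
```

The `None` at the end of `brute_force_attack` is where the brute force guarantee breaks down: the search only finds passwords whose length and characters fall inside `max_len` and `charset`.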
What I want to do is combine the efficiency of a dictionary attack with the effectiveness of a brute force attack by performing a frequency analysis of previously cracked passwords. Skull Security maintains lists of previously cracked passwords, including passwords from phpbb, hotmail, and rockyou. I used a script to identify and count every character in each of the files. I then calculated the percentage of all characters that each character represents, along with a cumulative percentage. Using these numbers, I found the set of characters that accounts for approximately 90% of all the characters in each file.
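The counting step might look something like the sketch below. It assumes each file holds one password per line, and it adds characters in descending order of frequency until the running total reaches the threshold. The exact cut-off rule behind the table is not stated, so treat `load_passwords` and `character_set_for` as hypothetical helpers rather than the original script.

```python
from collections import Counter


def load_passwords(path):
    """Read one password per line (assumed file format)."""
    # latin-1 maps every byte to a character, so stray non-UTF-8 bytes cannot abort the read.
    with open(path, encoding="latin-1") as f:
        return [line.rstrip("\r\n") for line in f]


def character_set_for(passwords, percent=90):
    """Most frequent characters that together make up `percent`% of all characters."""
    counts = Counter()
    for pw in passwords:
        counts.update(pw)  # count every character occurrence
    total = sum(counts.values())

    selected, running = [], 0
    for char, count in counts.most_common():  # descending frequency
        # Integer comparison avoids floating-point drift right at the cut-off.
        if running * 100 >= percent * total:
            break
        selected.append(char)
        running += count
    return "".join(selected)
```

For example, `character_set_for(load_passwords("rockyou.txt"))` would return the 90% set for one list; the file name is a placeholder.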
Filename | Distinct Characters in File | Characters in 90% Set | 90% Character Set |
---|---|---|---|
Hotmail | 91 | 29 | aeoi1r0ln2st9m8c3756u4dbphgyv |
phpbb | 119 | 29 | aeoir1nstl02md3chpbu94k8576gy |
rockyou | 211 | 29 | ae10i2onrls938t45m67cdyhubkgp |
I was surprised by how much the three 90% sets overlap: the phpbb and rockyou sets contain exactly the same 29 characters, and the Hotmail set differs from them only by containing v instead of k. Next, I used a script to determine how many passwords in each file consist entirely of characters from that file's 90% set. Then I combined the three 90% sets into a single set and determined how many passwords in each file could be found using that combined set.
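A password counts as covered when every one of its characters is in the set, which is a subset test. The sketch below copies the 90% sets from the table above; their union works out to 30 characters, all ten digits plus twenty lowercase letters. Skipping blank lines is an assumption, since the post does not say how they were handled.

```python
# 90% character sets copied from the first table.
HOTMAIL_90 = "aeoi1r0ln2st9m8c3756u4dbphgyv"
PHPBB_90 = "aeoir1nstl02md3chpbu94k8576gy"
ROCKYOU_90 = "ae10i2onrls938t45m67cdyhubkgp"

# The combined set is the union of the three per-file sets.
COMBINED_90 = set(HOTMAIL_90) | set(PHPBB_90) | set(ROCKYOU_90)


def coverage(passwords, charset):
    """Percentage of passwords that use only characters from charset."""
    allowed = set(charset)
    passwords = [pw for pw in passwords if pw]  # skip blank lines (assumption)
    if not passwords:
        return 0.0
    hits = sum(1 for pw in passwords if set(pw) <= allowed)
    return 100 * hits / len(passwords)
```

Given a list of passwords from one file, `coverage(passwords, PHPBB_90)` and `coverage(passwords, COMBINED_90)` correspond to the two columns of the next table.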
Filename | % of Passwords Covered by File's 90% Set | % of Passwords Covered by Combined Set |
---|---|---|
Hotmail | 64 | 77 |
phpbb | 59 | 74 |
rockyou | 58 | 73 |
In all three files, the combined set covered more than 70% of the passwords. This suggests that frequency analysis could make brute force attacks more efficient, because a search over a small character set needs far fewer candidates than a search over every printable character. In another post I will put this theory to the test by attacking a different set of passwords.
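To put a number on that efficiency gain: if each position can be any of the 95 printable ASCII characters (an assumed baseline for a full brute force), restricting it to the 30-character combined set cuts the number of 8-character candidates by a factor of (95/30)^8, roughly 10,000. The sketch below counts all candidates of length 1 through `max_len`; `keyspace` is a hypothetical helper.

```python
import string

# Baseline alphabet: the 95 printable ASCII characters (an assumption about
# what a "full" brute force attack would search).
FULL_CHARSET = string.ascii_letters + string.digits + string.punctuation + " "

# Union of the three 90% sets from the first table.
COMBINED_CHARSET = "".join(sorted(
    set("aeoi1r0ln2st9m8c3756u4dbphgyv")
    | set("aeoir1nstl02md3chpbu94k8576gy")
    | set("ae10i2onrls938t45m67cdyhubkgp")
))


def keyspace(charset_size, max_len):
    """Count every candidate of length 1 through max_len over charset_size symbols."""
    return sum(charset_size ** length for length in range(1, max_len + 1))


if __name__ == "__main__":
    for max_len in (6, 8):
        full = keyspace(len(FULL_CHARSET), max_len)
        combined = keyspace(len(COMBINED_CHARSET), max_len)
        print(f"up to {max_len} chars: {full:,} vs {combined:,} candidates "
              f"({full / combined:,.0f}x fewer)")
```

The trade-off is coverage: a search limited to the combined set can never find the roughly one quarter of passwords in these files that use other characters.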
If you want an in-depth study of character frequency analysis of password files, check out Matt Weir's work at reusablesec.blogspot.com.
tags: password cracking