Password Cracking with CUDA 2 ways
A few weeks ago I decided to generate Rainbow Tables for LM hash password cracking. The RainbowCrack project provides Windows and Linux software that can be used to generate the tables and do the actual cracking. I also wanted to leverage the CUDA GPU support to make the cracking as fast as possible. The first thing I needed to do was to generate the actual rainbow tables. In my lab I have two Proliant ML350 servers running ESXi 5.1 (dual Xeon E5645 in each), so rather than running the table generation on my laptop I created a Windows VM on one of the servers, gave it 8 vCPUs and cut and pasted the table generation commands into a batch file. I set the batch file running and went to bed.
The next morning I checked on progress and calculated how long it was going to take to complete. With a bit of rough math I reckoned about six weeks!
Six Weeks Later...
After running at a near constant 100% CPU utilisation for the full six weeks, my rainbow tables were finally ready. It's worth noting that you can buy rainbow tables from the RainbowCrack Project if you don't have the inclination or processing power to create your own. It's not cheap, but on the other hand a lot of the time, effort and computing power has already been spent by someone else, and it could take years to generate some tables with limited processing power.
As well as buying or generating your own tables, you could download them from Free Rainbow Tables. If you have oodles of bandwidth this could be a great option. If, like me, you don't have great bandwidth, they also have a shop that will supply tables on a hard disk, like the RainbowCrack Project. Free Rainbow Tables also have a distributed generation app using the BOINC client software for creating the tables, allowing thousands of computers to participate in the creation of the tables just like the SETI@Home project.
Just a couple more steps and the tables would be ready to use: I needed to sort the tables and then compress them to *.rtc files. rt2rtc reduces the size from 64GB to a more reasonable 32GB that I can put on my laptop.
Next I transferred all the *.rtc files to my laptop. My laptop is a Dell XPS 15 L502X with 8GB RAM. It's had one important upgrade recently in that I replaced the internal HD with a Crucial M500 960GB SSD. Critically, this provides a much needed improvement in disk access speed and will speed up the rainbow table look-ups considerably.
Once the *.rtc files were transferred over to my laptop I needed some LM hashes to test. So I asked a passing stranger to provide me with some hashes.
Hmmm...who was that guy? :-0
Now to test it!
Password Cracking with oclhashcat in a VM
The second method of using a GPU to crack passwords that I wanted to look at uses oclhashcat to do brute force, dictionary and hybrid attacks accelerated by the GPU. The oclhashcat download contains both nVidia CUDA and AMD OpenCL executables. For a while I've been wanting to try using an old nVidia Quadro 2000D I had lying around as a dedicated graphics card for CUDA inside a VM running on a vSphere ESXi server. Then recently I saw a great post from Rob VandenBrink on the SANS ISC Community forums which inspired me to give it a go.
One of the weaknesses of the Rainbow Table approach is that it cannot cope with salted hashes. Another way of looking at that is that salt+hash is a more secure way to store passwords. The problem is that you would need to create rainbow tables for every salt, which is impractical. However, oclhashcat can brute force passwords that are stored that way using the acceleration of hundreds or thousands of GPU cores.
There is a useful guide available from VMware on how to configure pass-through and vDGA support on ESXi here. The shared version, vSGA, is not recommended for CUDA/OpenCL; it is designed for accelerated graphics capabilities in VDI environments. There are a number of things to pay attention to in this guide:
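To make the salt problem concrete, here is a minimal Python sketch. It is not a real rainbow table (there are no reduction chains, just a plain precomputed lookup of hash to password), and the three-letter keyspace, the use of MD5 and the salt value are all assumptions chosen to keep it small. The weakness it shows is the same one, though: a table built from unsalted hashes finds nothing once a salt is mixed in, while a brute force that knows the salt still works.

```python
import hashlib
import itertools
import string


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest; MD5 is used here only because it is quick to demo."""
    return hashlib.md5(data).hexdigest()


# Toy keyspace (assumption): every three-letter lowercase password.
CANDIDATES = ["".join(p) for p in itertools.product(string.ascii_lowercase, repeat=3)]

# Precomputed table of unsalted hash -> password, built once and reused.
TABLE = {md5_hex(c.encode()): c for c in CANDIDATES}


def table_lookup(hash_hex):
    """Look a hash up in the precomputed table; None if it is not there."""
    return TABLE.get(hash_hex)


def brute_force_salted(hash_hex, salt: bytes):
    """Try every candidate with the known salt prepended."""
    for candidate in CANDIDATES:
        if md5_hex(salt + candidate.encode()) == hash_hex:
            return candidate
    return None


unsalted = md5_hex(b"cat")
salted = md5_hex(b"x9" + b"cat")  # hypothetical two-byte salt

print(table_lookup(unsalted))              # found in the table
print(table_lookup(salted))                # not in the table
print(brute_force_salted(salted, b"x9"))   # recovered by brute force
```

Covering the salted case with tables would mean building a separate table for every possible salt value, which is the impractical part.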
- Ensure the BIOS is set to use the embedded VGA card as primary - the HP Proliants I have will use any additional graphics card as primary if this is not set
- Ensure you have adequate power in the PSU and the correct PCI-E 6/8 pin power connectors for the graphics card
- Follow the instructions to reserve all memory for the VM (all locked) and add pciHole.start = "2048" to the .vmx file if the VM has more than 2GB of RAM
- Ignore the instructions regarding VMware View driver installation - VMware View provides the Teradici PCoIP support needed to access the graphics card capabilities remotely
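The .vmx change from the list above can be sketched in a few lines of Python. This is only an illustration: the helper name `ensure_vmx_setting` is made up, it works on the file contents as a string rather than touching a real datastore, and treating .vmx keys as case-insensitive is my assumption.

```python
def ensure_vmx_setting(vmx_text, key, value):
    """Return vmx_text with `key = "value"` present exactly once.

    Hypothetical helper: any existing line for the same key is dropped
    (keys compared case-insensitively, which is an assumption) and the
    wanted setting is appended at the end.
    """
    wanted = f'{key} = "{value}"'
    kept = [
        line
        for line in vmx_text.splitlines()
        if line.split("=", 1)[0].strip().lower() != key.lower()
    ]
    kept.append(wanted)
    return "\n".join(kept) + "\n"


example = 'memsize = "8192"\n'
print(ensure_vmx_setting(example, "pciHole.start", "2048"))
```

Running it twice gives the same result as running it once, so it is safe to reapply.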
Once the card was configured and the VM restarted, all that was required was to install the nVidia video drivers in the VM. You do have to use the console in VMware vCenter to access the VM, as RDP won't have access to the graphics card hardware. The first thing I did was to run the benchmark using ./cudahashcat64.bin -b.
It is more than a little disappointing given the speeds Rob VandenBrink got using OpenCL on his AMD Radeon HD 7970. In fact the speeds are only marginally faster than what I get running the benchmark on my laptop with its nVidia GT 540M. The reason, it seems, is that the nVidia card does not have the integer crunching capabilities of the Radeon. In fact I have seen it reported that the nVidia GTX range of cards out-perform the Quadro range in terms of H/s and that the AMD Radeon will also do better than the more expensive Firepro in this respect (with the possible exception of the Firepro S10000). So guess what I bought?
It's an XFX AMD Radeon HD 7970. The cheapest I could find in the UK was around the £250 mark. The only problem is I'm now waiting on a power cable for my Proliant ML350 G6, which unfortunately HP have discontinued; eBay to the rescue, and it's now en route from the US. I'll update this post with the benchmarks once it's all installed and working.
Update...
I eventually got my extra power cables and installed the Radeon HD 7970 card in my ESXi server. I had a problem running the ./oclhashcat -b command to generate the benchmark for each of the hash types. This only seems to occur if you attempt to run all of the benchmarks at once, so I wrote a little bit of PowerShell that allowed me to run them individually, adding any exceptions to a list to exclude those benchmarks if they caused an error. It also tidies up the output for importing into a table or spreadsheet.
Here are the benchmarks.
oclhashcat algorithm # | Hash Type | Speed |
---|---|---|
0 | MD5 | 7886.7 MH/s |
10 | md5($pass.$salt) | 8016.3 MH/s |
20 | md5($salt.$pass) | 4393.2 MH/s |
30 | md5(unicode($pass).$salt) | 7977.6 MH/s |
40 | md5($salt.unicode($pass)) | 4322.7 MH/s |
50 | HMAC-MD5 (key = $pass) | 1181.1 MH/s |
60 | HMAC-MD5 (key = $salt) | 2235.5 MH/s |
100 | SHA1 | 2505.9 MH/s |
110 | sha1($pass.$salt) | 2494.4 MH/s |
120 | sha1($salt.$pass) | 1604.4 MH/s |
130 | sha1(unicode($pass).$salt) | 2486.0 MH/s |
140 | sha1($salt.unicode($pass)) | 1521.6 MH/s |
150 | HMAC-SHA1 (key = $pass) | 543.7 MH/s |
160 | HMAC-SHA1 (key = $salt) | 1071.2 MH/s |
190 | sha1(LinkedIn) | 2458.2 MH/s |
300 | MySQL | 1179.2 MH/s |
400 | phpass, MD5(Wordpress), MD5(phpBB3) | 2033.1 kH/s |
500 | md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5 | 3487.2 kH/s |
900 | MD4 | 15819.5 MH/s |
1000 | NTLM | 15261.7 MH/s |
1100 | DCC, mscash | 4162.7 MH/s |
1400 | SHA256 | 995.4 MH/s |
1410 | sha256($pass.$salt) | 996.8 MH/s |
1420 | sha256($salt.$pass) | 841.4 MH/s |
1430 | sha256(unicode($pass).$salt) | 995.4 MH/s |
1440 | sha256($salt.unicode($pass)) | 818.1 MH/s
1450 | HMAC-SHA256 (key = $pass) | 237.8 MH/s |
1460 | HMAC-SHA256 (key = $salt) | 496.1 MH/s |
1500 | descrypt, DES(Unix), Traditional DES | 84108.8 kH/s |
1600 | md5apr1, MD5(APR), Apache MD5 | 3491.7 kH/s |
1700 | SHA512 | 74276.2 kH/s |
1710 | sha512($pass.$salt) | 72691.9 kH/s |
1720 | sha512($salt.$pass) | 70962.0 kH/s |
1730 | sha512(unicode($pass).$salt) | 72364.1 kH/s |
1740 | sha512($salt.unicode($pass)) | 70036.3 kH/s |
1750 | HMAC-SHA512 (key = $pass) | 17852.9 kH/s |
1760 | HMAC-SHA512 (key = $salt) | 34321.7 kH/s |
1800 | sha512crypt, SHA512(Unix) | 12527 H/s |
2100 | DCC2, mscash2 | 102.1 kH/s |
2400 | Cisco-PIX MD5 | 5281.3 MH/s |
2500 | WPA/WPA2 | 131.0 kH/s |
2600 | Double MD5 | 2094.6 MH/s |
3000 | LM | 1269.2 MH/s |
3100 | Oracle 7-10g | 351.5 MH/s |
3200 | bcrypt, Blowfish(OpenBSD) | 3531 H/s |
5000 | SHA-3(Keccak) | 141.5 MH/s |
5100 | Half MD5 | 4581.7 MH/s |
5200 | Password Safe SHA-256 | 467.8 kH/s |
5300 | IKE-PSK MD5 | 504.6 MH/s |
5400 | IKE-PSK SHA1 | 273.4 MH/s |
5500 | NetNTLMv1-VANILLA / NetNTLMv1+ESS | 7934.7 MH/s |
5600 | NetNTLMv2 | 492.6 MH/s |
5700 | Cisco-IOS SHA256 | 992.3 MH/s |
5800 | Samsung Android Password/PIN | 1547.7 kH/s |
6000 | RipeMD160 | 1613.0 MH/s |
6100 | Whirlpool | 30104.4 kH/s |
6211 | TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 + AES | 375.6 kH/s |
6221 | TrueCrypt 5.0+ PBKDF2-HMAC-SHA512 + AES | 36079 H/s |
6231 | TrueCrypt 5.0+ PBKDF2-HMAC-Whirlpool + AES | 1308 H/s |
6241 | TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 boot-mode + AES | 743.2 kH/s |
6300 | AIX {smd5} | 3478.9 kH/s |
6400 | AIX {ssha256} | 6213.2 kH/s |
6500 | AIX {ssha512} | 512.3 kH/s |
6600 | 1Password | 1063.5 kH/s |
6700 | AIX {ssha1} | 13294.8 kH/s |
6800 | Lastpass | 943.7 kH/s |
6900 | GOST R 34.11-94 | 98531.6 kH/s |
7100 | OSX v10.8 | 525 H/s |
7200 | GRUB 2 | 1837 H/s |
7400 | sha256crypt, SHA256(Unix) | 74996 H/s |
7500 | Kerberos 5 AS-REQ Pre-Auth etype 23 | 50997.8 kH/s |
11 | Joomla | 7975.6 MH/s |
21 | osCommerce, xt:Commerce | 4389.5 MH/s |
101 | SHA-1(Base64), nsldap, Netscape LDAP SHA | 2508.2 MH/s |
111 | SSHA-1(Base64), nsldaps, Netscape LDAP SSHA | 2493.3 MH/s |
112 | Oracle 11g | 2493.2 MH/s |
121 | SMF > v1.1 | 1602.3 MH/s |
122 | OSX v10.4, v10.5, v10.6 | 1587.5 MH/s |
131 | MSSQL(2000) | 2481.6 MH/s |
132 | MSSQL(2005) | 2485.1 MH/s |
141 | EPiServer 6.x < v4 | 1505.6 MH/s |
1441 | EPiServer 6.x > v4 | 827.5 MH/s |
1711 | SSHA-512(Base64), LDAP {SSHA512} | 72950.4 kH/s |
1722 | OSX v10.7 | 70361.1 kH/s |
1731 | MSSQL(2012) | 73323.9 kH/s |
2611 | vBulletin < v3.8.5 | 2122.1 MH/s |
2711 | vBulletin > v3.8.5 | 1529.8 MH/s |
2811 | IPB2+, MyBB1.2+ | 1516.4 MH/s |
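For anyone who wants to repeat the one-mode-at-a-time benchmark run, here is a rough Python sketch of the idea. It is not the PowerShell I used. The binary name, the `-b -m <mode>` invocation and the shape of the speed line in the output are all assumptions, so adjust them to match your oclhashcat version.

```python
import re
import subprocess

# Matches a speed such as "7886.7 MH/s" or "12527 H/s" (assumed output shape).
SPEED_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([kMG]?H/s)")


def parse_speed(output):
    """Return the last speed found in the benchmark output, or None."""
    matches = SPEED_RE.findall(output)
    if not matches:
        return None
    value, unit = matches[-1]
    return f"{value} {unit}"


def run_benchmark(mode, binary="./oclHashcat64.bin"):
    """Benchmark a single hash mode; binary name and flags are assumptions."""
    result = subprocess.run(
        [binary, "-b", "-m", str(mode)],
        capture_output=True,
        text=True,
    )
    return result.stdout


def benchmark_modes(modes, exclude=(), runner=run_benchmark):
    """Benchmark each mode on its own and return (rows, failed_modes)."""
    rows, failed = [], []
    for mode in modes:
        if mode in exclude:
            continue  # known-bad mode, skip it entirely
        try:
            speed = parse_speed(runner(mode))
        except OSError:
            speed = None  # binary missing or not executable
        if speed is None:
            failed.append(mode)
        else:
            rows.append(f"{mode} | {speed}")
    return rows, failed
```

Calling `benchmark_modes([0, 100, 1000], exclude={2500})` returns pipe-separated rows ready to paste into a table, plus a list of modes that produced no speed line, which can be added to the exclusion list for the next run.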
The end result is pretty similar to the speeds shown in Rob VandenBrink's post.
One thing to bear in mind is that the GPU draws considerable power once it's running at full capacity.
All of that power means a lot of heat is being generated and that in turn means the system fans must work harder as well.
Normally the four system fans don't go above 21%. With the system and GPU fans running it can be quite noisy.
As you can see below, it would take about 6 days to brute force an NTLM hash with a complex (mixed case alphanumeric and special) 8 character password with this setup.
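The rough arithmetic behind a figure like that is easy to check. This is a back-of-the-envelope sketch using the NTLM benchmark speed from the table above; the 95-character set (upper, lower, digits and the 33 printable ASCII specials including space) is my assumption about what "complex" means here, and it only counts passwords of exactly 8 characters.

```python
def brute_force_seconds(charset_size, length, hashes_per_second):
    """Worst-case time to try every password of exactly `length` characters."""
    return charset_size ** length / hashes_per_second


NTLM_RATE = 15261.7e6          # 15261.7 MH/s from the benchmark table
CHARSET = 26 + 26 + 10 + 33    # mixed case, digits, specials (assumed set)

seconds = brute_force_seconds(CHARSET, 8, NTLM_RATE)
days = seconds / 86400

print(f"keyspace: {CHARSET ** 8:,} candidates")
print(f"worst case: {days:.1f} days")
```

At the benchmark speed this comes out at roughly five days for the full keyspace, the same ballpark as the estimate above; a real run may be slower than the benchmark figure, and on average the password would turn up about halfway through.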
Of course it would be better to use Rainbow tables for unsalted hashes, if you have room for the 1TB table (ntlm_mixalpha-numeric-all-space#1-8) that would be required. If you haven't got the rainbow table, it's certainly worth trying oclhashcat dictionary or hybrid attacks before resorting to brute force. One area where oclhashcat scores is in its ability to crack salted password hashes, something that Rainbow tables just can't do.
Rob mentioned that the AMD Radeon HD 7970 would scale really well (certainly up to 3 cards) using CrossFire bridging (AMD's counterpart to nVidia's SLI). This got me thinking, and I started to wonder how you could fit three cards in a case and cope with all the power and cooling requirements - I'll live with the noise :-)
It looks like a lot of people are using the open rigs that Bitcoin miners use for multi-GPU card setups. However, I found a neater (and more expensive) solution in the NetStor NA255A-XGPU External PCIe Gen3 to GPU Desktop Enclosure. There are some interesting comparisons of the AMD Radeon HD 7990 GPU capabilities on Tom's Hardware here. Given the figures on there, I would estimate that using the aforementioned NetStor GPU enclosure and three AMD Radeon HD 7990 cards would push the NTLM hashes per second up to around the 100 billion mark.
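As a sanity check on that estimate, here is a small sketch that scales my own measured figure instead of the Tom's Hardware numbers. It assumes each of the two GPUs on an HD 7990 performs like the single HD 7970 I benchmarked and that throughput scales linearly across cards; both are simplifications, and the `efficiency` parameter is only there so you can plug in a less optimistic guess.

```python
MEASURED_7970_NTLM = 15261.7e6  # H/s, from the benchmark table


def estimated_rate(per_gpu_rate, cards, gpus_per_card, efficiency=1.0):
    """Naive linear scaling estimate; efficiency < 1.0 models scaling losses."""
    return per_gpu_rate * cards * gpus_per_card * efficiency


rate = estimated_rate(MEASURED_7970_NTLM, cards=3, gpus_per_card=2)
print(f"{rate / 1e9:.1f} GH/s")
```

That lands a little over 90 billion NTLM hashes per second, close to the 100 billion ballpark.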
Check out this monster password cracker from Norway.
Many Thanks to Rob VandenBrink for his help and advice.
Little disclaimer: Techniques described for cracking passwords are only used by me in pursuit of lawful, authorised, penetration testing activities or against my own systems for the purposes of testing & education. I would not encourage anyone to use these attacks unlawfully.