Password Cracking with CUDA 2 ways

A few weeks ago I decided to generate rainbow tables for LM hash password cracking. The RainbowCrack project provides Windows and Linux software that can be used to generate the tables and do the actual cracking. I also wanted to leverage its CUDA GPU support to make the cracking as fast as possible. The first thing I needed to do was generate the actual rainbow tables. In my lab I have two ProLiant ML350 servers running ESXi 5.1 (dual Xeon E5645 in each), so rather than running the table generation on my laptop I created a Windows VM on one of the servers, gave it 8 vCPUs and cut and pasted the commands for the table generation into a batch file. I set the batch file running and went to bed. The next morning I checked on progress and calculated how long it was going to take to complete. With a bit of rough math I reckoned about six weeks!

Six Weeks Later...

After running at a near constant 100% CPU utilisation for the full six weeks my rainbow tables were finally ready. It's worth noting that you can buy rainbow tables from the RainbowCrack Project if you don't have the inclination or processing power to create your own. It's not cheap, but on the other hand all the time, effort and computing power has been spent by someone else, and some tables could take years to generate with limited processing power. As well as buying or generating your own tables, you could download them from Free Rainbow Tables. If you have oodles of bandwidth this could be a great option; however, if like me you don't have great bandwidth, they also have a shop that will supply tables on a hard disk, as the RainbowCrack Project does.
Free Rainbow Tables also have a distributed generation app using the BOINC client software for creating the tables, allowing thousands of computers to participate in the creation of the tables just like the SETI@Home project.

Just a couple more steps and the tables would be ready to use. I needed to sort the tables and then compress them to *.rtc files. rt2rtc reduces the size from 64GB to a more reasonable 32GB that I can put on my laptop.

Next I transferred all the *.rtc files to my laptop. My laptop is a Dell XPS 15 L502X with 8GB RAM. It's had one important upgrade recently in that I replaced the internal HD with a Crucial M500 960GB SSD. Critically, this provides a much needed improvement in disk access speed and will speed up the rainbow table look-ups considerably.

Once the *.rtc files were transferred over to my laptop I needed some LM hashes to test. So I asked a passing stranger to provide me with some hashes.
hashdump

Hmmm...who was that guy? :-0
Now to test it!

Password Cracking with oclhashcat in a VM

The second method of using a GPU to crack passwords that I wanted to look at uses oclhashcat to do brute force, dictionary and hybrid attacks accelerated by the GPU. The oclhashcat download contains both nVidia CUDA and AMD OpenCL executables. For a while I'd been wanting to try using an old nVidia Quadro 2000D I had lying around as a dedicated graphics card for CUDA inside a VM running on a vSphere ESXi server. Then recently I saw a great post from Rob VandenBrink on the SANS ISC Community forums which inspired me to give it a go.
One of the weaknesses of the Rainbow Table approach is that it cannot cope with salted hashes. Another way of looking at that is that salt+hash is a more secure way to store passwords. The problem is that you would need to create rainbow tables for every salt which is impractical. However, oclhashcat can brute force passwords that are stored that way using the acceleration of hundreds or thousands of GPU cores.
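To make the salt point concrete, here is a minimal Python sketch, for illustration only and not taken from oclhashcat or RainbowCrack. It uses a plain dictionary as a stand-in for a rainbow table and the md5($salt.$pass) construction (oclhashcat algorithm 20). The word list, the salt value and all function names are made up for the example.

```python
import hashlib

def unsalted_md5(password):
    """Plain MD5 of the password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def salted_md5(salt, password):
    """md5($salt.$pass): the salt bytes are prepended before hashing."""
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()

# Hypothetical word list; a real rainbow table covers a whole keyspace.
WORDS = ["password", "letmein", "qwerty"]

# Precomputed once and reusable against every unsalted hash.
PRECOMPUTED = {unsalted_md5(word): word for word in WORDS}

def lookup(hash_hex):
    """Table lookup: only works if the hash was computed without a salt."""
    return PRECOMPUTED.get(hash_hex)

def crack_salted(target_hex, salt, candidates):
    """With a salt, every candidate has to be re-hashed for that salt."""
    for candidate in candidates:
        if salted_md5(salt, candidate) == target_hex:
            return candidate
    return None

if __name__ == "__main__":
    salt = b"x9Q2"  # made-up salt, normally stored next to the hash
    stored = salted_md5(salt, "letmein")
    print(lookup(unsalted_md5("letmein")))    # found in the precomputed table
    print(lookup(stored))                     # the salted hash is not in the table
    print(crack_salted(stored, salt, WORDS))  # found by hashing per salt
```

The precomputed table is useless against the salted hash, while the per-salt loop still finds the password. That loop is the kind of work a GPU can run in parallel.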
There is a useful guide available from VMware on how to configure pass-through and vDGA support on ESXi. The shared version, vSGA, is not recommended for CUDA/OpenCL; it is designed for accelerated graphics capabilities in VDI environments. There are a number of things to pay attention to in this guide:
  1. Ensure the BIOS is set to use the embedded VGA card as primary - the HP ProLiants I have will use any additional graphics card as primary if this is not set
  2. Ensure you have adequate power in the PSU and the correct PCI-E 6/8 pin power connectors for the graphics card
  3. Follow the instructions to reserve all memory for the VM (all locked) and add pciHole.start = "2048" to the .vmx file if the VM has more than 2GB of RAM
  4. Ignore the instructions regarding VMware View driver installation - VMware View provides the Teradici PCoIP support needed to access the graphics card capabilities remotely

Once the card was configured and the VM restarted, all that was required was to install the nVidia video drivers in the VM. You do have to use the console in VMware vCenter to access the VM, as RDP won't have access to the graphics card hardware. The first thing I did was run the benchmark using ./cudahashcat64.bin -b.
The result was more than a little disappointing given the speeds Rob VandenBrink got using OpenCL on his AMD Radeon HD 7970. In fact the speeds were only marginally faster than what I get running the benchmark on my laptop with its nVidia GT 540M. The reason, it seems, is that the nVidia card does not have the integer-crunching capabilities of the Radeon. In fact I have seen it reported that the nVidia GTX range of cards outperforms the Quadro range in terms of H/s, and that the AMD Radeon will also do better than the more expensive FirePro in this respect (with the possible exception of the FirePro S10000). So guess what I bought?

It's an XFX AMD Radeon HD 7970. The cheapest I could find in the UK was around the £250 mark. The only problem is I'm now waiting on a power cable for my ProLiant ML350 G6, which unfortunately HP have discontinued; eBay to the rescue, and one is now en route from the US. I'll update this post with the benchmarks once it's all installed and working.

Update...

I eventually got my extra power cables and installed the Radeon HD 7970 card in my ESXi server. I had a problem running the ./oclhashcat -b command to generate the benchmark for each of the hash types. This only seems to occur if you attempt to run all of the benchmarks at once. So I wrote a little bit of PowerShell that allowed me to run them individually, adding any that caused an error to an exclusion list so those benchmarks are skipped. It also tidies up the output for importing into a table or spreadsheet.
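The PowerShell itself isn't reproduced here, but the same idea can be sketched in Python. Treat this as a rough outline under stated assumptions rather than the script I used: the binary path is a placeholder, the combination of -b (benchmark) with -m (hash type) should be checked against your oclhashcat version's --help, and the parser simply looks for a "number unit" speed such as "7886.7 MH/s" anywhere in the output.

```python
import re
import subprocess

# Placeholder path: point this at your own oclhashcat/cudahashcat binary.
HASHCAT_BIN = "./oclHashcat64.bin"

# Algorithm numbers whose benchmark errored out; they are skipped on later runs.
EXCLUDED_MODES = set()

UNIT_FACTORS = {"H/s": 1.0, "kH/s": 1e3, "MH/s": 1e6, "GH/s": 1e9}
SPEED_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([kMG]?H/s)")

def benchmark_command(mode):
    """Command line for benchmarking a single algorithm number."""
    return [HASHCAT_BIN, "-b", "-m", str(mode)]

def parse_speed(text):
    """Return the first speed found in text as plain H/s, or None."""
    match = SPEED_RE.search(text)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * UNIT_FACTORS[unit]

def run_benchmarks(modes):
    """Benchmark each mode on its own so one failure doesn't stop the rest."""
    results = {}
    for mode in modes:
        if mode in EXCLUDED_MODES:
            continue
        proc = subprocess.run(benchmark_command(mode),
                              capture_output=True, text=True)
        if proc.returncode != 0:
            EXCLUDED_MODES.add(mode)  # remember the failure and move on
            continue
        results[mode] = parse_speed(proc.stdout)
    return results
```

Converting everything to H/s makes the rows directly comparable in a spreadsheet, since the raw figures mix H/s, kH/s and MH/s.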
Here are the benchmarks.
oclhashcat algorithm # Hash Type Speed
0 MD5 7886.7 MH/s
10 md5($pass.$salt) 8016.3 MH/s
20 md5($salt.$pass) 4393.2 MH/s
30 md5(unicode($pass).$salt) 7977.6 MH/s
40 md5($salt.unicode($pass)) 4322.7 MH/s
50 HMAC-MD5 (key = $pass) 1181.1 MH/s
60 HMAC-MD5 (key = $salt) 2235.5 MH/s
100 SHA1 2505.9 MH/s
110 sha1($pass.$salt) 2494.4 MH/s
120 sha1($salt.$pass) 1604.4 MH/s
130 sha1(unicode($pass).$salt) 2486.0 MH/s
140 sha1($salt.unicode($pass)) 1521.6 MH/s
150 HMAC-SHA1 (key = $pass) 543.7 MH/s
160 HMAC-SHA1 (key = $salt) 1071.2 MH/s
190 sha1(LinkedIn) 2458.2 MH/s
300 MySQL 1179.2 MH/s
400 phpass, MD5(Wordpress), MD5(phpBB3) 2033.1 kH/s
500 md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5 3487.2 kH/s
900 MD4 15819.5 MH/s
1000 NTLM 15261.7 MH/s
1100 DCC, mscash 4162.7 MH/s
1400 SHA256 995.4 MH/s
1410 sha256($pass.$salt) 996.8 MH/s
1420 sha256($salt.$pass) 841.4 MH/s
1430 sha256(unicode($pass).$salt) 995.4 MH/s
1440 sha256($salt.unicode($pass)) 818.1 MH/s
1450 HMAC-SHA256 (key = $pass) 237.8 MH/s
1460 HMAC-SHA256 (key = $salt) 496.1 MH/s
1500 descrypt, DES(Unix), Traditional DES 84108.8 kH/s
1600 md5apr1, MD5(APR), Apache MD5 3491.7 kH/s
1700 SHA512 74276.2 kH/s
1710 sha512($pass.$salt) 72691.9 kH/s
1720 sha512($salt.$pass) 70962.0 kH/s
1730 sha512(unicode($pass).$salt) 72364.1 kH/s
1740 sha512($salt.unicode($pass)) 70036.3 kH/s
1750 HMAC-SHA512 (key = $pass) 17852.9 kH/s
1760 HMAC-SHA512 (key = $salt) 34321.7 kH/s
1800 sha512crypt, SHA512(Unix) 12527 H/s
2100 DCC2, mscash2 102.1 kH/s
2400 Cisco-PIX MD5 5281.3 MH/s
2500 WPA/WPA2 131.0 kH/s
2600 Double MD5 2094.6 MH/s
3000 LM 1269.2 MH/s
3100 Oracle 7-10g 351.5 MH/s
3200 bcrypt, Blowfish(OpenBSD) 3531 H/s
5000 SHA-3(Keccak) 141.5 MH/s
5100 Half MD5 4581.7 MH/s
5200 Password Safe SHA-256 467.8 kH/s
5300 IKE-PSK MD5 504.6 MH/s
5400 IKE-PSK SHA1 273.4 MH/s
5500 NetNTLMv1-VANILLA / NetNTLMv1+ESS 7934.7 MH/s
5600 NetNTLMv2 492.6 MH/s
5700 Cisco-IOS SHA256 992.3 MH/s
5800 Samsung Android Password/PIN 1547.7 kH/s
6000 RipeMD160 1613.0 MH/s
6100 Whirlpool 30104.4 kH/s
6211 TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 + AES 375.6 kH/s
6221 TrueCrypt 5.0+ PBKDF2-HMAC-SHA512 + AES 36079 H/s
6231 TrueCrypt 5.0+ PBKDF2-HMAC-Whirlpool + AES 1308 H/s
6241 TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 boot-mode + AES 743.2 kH/s
6300 AIX {smd5} 3478.9 kH/s
6400 AIX {ssha256} 6213.2 kH/s
6500 AIX {ssha512} 512.3 kH/s
6600 1Password 1063.5 kH/s
6700 AIX {ssha1} 13294.8 kH/s
6800 Lastpass 943.7 kH/s
6900 GOST R 34.11-94 98531.6 kH/s
7100 OSX v10.8 525 H/s
7200 GRUB 2 1837 H/s
7400 sha256crypt, SHA256(Unix) 74996 H/s
7500 Kerberos 5 AS-REQ Pre-Auth etype 23 50997.8 kH/s
11 Joomla 7975.6 MH/s
21 osCommerce, xt:Commerce 4389.5 MH/s
101 SHA-1(Base64), nsldap, Netscape LDAP SHA 2508.2 MH/s
111 SSHA-1(Base64), nsldaps, Netscape LDAP SSHA 2493.3 MH/s
112 Oracle 11g 2493.2 MH/s
121 SMF > v1.1 1602.3 MH/s
122 OSX v10.4, v10.5, v10.6 1587.5 MH/s
131 MSSQL(2000) 2481.6 MH/s
132 MSSQL(2005) 2485.1 MH/s
141 EPiServer 6.x < v4 1505.6 MH/s
1441 EPiServer 6.x > v4 827.5 MH/s
1711 SSHA-512(Base64), LDAP {SSHA512} 72950.4 kH/s
1722 OSX v10.7 70361.1 kH/s
1731 MSSQL(2012) 73323.9 kH/s
2611 vBulletin < v3.8.5 2122.1 MH/s
2711 vBulletin > v3.8.5 1529.8 MH/s
2811 IPB2+, MyBB1.2+ 1516.4 MH/s
The end result is pretty similar to the speeds shown in Rob VandenBrink's post.
One thing to bear in mind is that the GPU draws considerable power once it's running at full capacity.

All of that power means a lot of heat is being generated and that in turn means the system fans must work harder as well.

Normally the four system fans don't go above 21%. With the system and GPU fans running it can be quite noisy.


As you can see below, it would take about 6 days to brute force an NTLM hash with a complex (mixed case alphanumeric and special) 8 character password with this setup.
Of course it would be better to use rainbow tables for unsalted hashes, if you have room for the 1TB table (ntlm_mixalpha-numeric-all-space#1-8) that would be required. It's certainly worth trying oclhashcat dictionary or hybrid attacks if you haven't got the rainbow table before resorting to brute force. One area where oclhashcat scores is in its ability to crack salted password hashes, something that rainbow tables just can't do.
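The brute force estimate can be sanity-checked with some rough arithmetic. The sketch below is a back-of-envelope calculation, not output from oclhashcat: it assumes a 95-character set (upper case, lower case, digits, 32 specials and space), exactly 8 characters, and the NTLM benchmark speed from the table above sustained for the whole run.

```python
# NTLM benchmark speed from the table (algorithm 1000), in hashes per second.
NTLM_SPEED_HS = 15261.7e6

# Assumed charset: 26 upper + 26 lower + 10 digits + 32 specials + space.
CHARSET_SIZE = 26 + 26 + 10 + 32 + 1

PASSWORD_LENGTH = 8
SECONDS_PER_DAY = 86400

def keyspace(charset_size, length):
    """Number of candidate passwords of exactly this length."""
    return charset_size ** length

def days_to_exhaust(candidates, hashes_per_second):
    """Worst-case time to try every candidate, in days."""
    return candidates / hashes_per_second / SECONDS_PER_DAY

if __name__ == "__main__":
    total = keyspace(CHARSET_SIZE, PASSWORD_LENGTH)
    days = days_to_exhaust(total, NTLM_SPEED_HS)
    print(f"{total:,} candidates, about {days:.1f} days worst case")
```

Under those assumptions the worst case comes out at roughly five days, which is in the same ballpark as the six-day figure. A real attack is unlikely to hold the benchmark speed for the entire run, and on average a password is found after searching about half the keyspace.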

Rob mentioned that the AMD Radeon HD 7970 would scale really well (certainly up to 3 cards) using the SLI bridging. This got me thinking, and I started to wonder how you could fit three cards in a case and cope with all the power and cooling requirements - I'll live with the noise :-)
It looks like a lot of people are using the open rigs that Bitcoin miners use for multi-GPU card setups. However, I found a neater (and more expensive) solution in the NetStor NA255A-XGPU External PCIe Gen3 to GPU Desktop Enclosure. There are some interesting comparisons of the AMD Radeon HD 7990 GPU capabilities on Tom's Hardware. Given the figures there, I would estimate that using the aforementioned NetStor GPU enclosure with three AMD Radeon HD 7990 cards would push the NTLM hashes per second up to around the 100 billion mark.

Check out this monster password cracker from Norway.

Many Thanks to Rob VandenBrink for his help and advice.

Little disclaimer: Techniques described for cracking passwords are only used by me in pursuit of lawful, authorised, penetration testing activities or against my own systems for the purposes of testing & education. I would not encourage anyone to use these attacks unlawfully.  
