Sunday, 27 July 2014

IDSUtil and Wireshark Alert plugin

I recently came across a really neat Wireshark plugin for displaying IDS alerts inside of Wireshark. I find this a really useful way of doing historical packet capture analysis as I have the complete detail of the alert right there inside of Wireshark. I installed the IDSUtil on a VM running the Centrych Linux distro which I have found to be one of the most pleasant to install and use. Centrych, the IDSUtil and the Wireshark Alert plugin were all created by Jack Radigan and I highly recommend them to anyone who needs to do historical packet analysis.
There is a great demonstration of the Wireshark plugin and the IDSUtil here.
Once everything is installed and configured all that is required is to update the rules and then run the ids-pcap command with the packet capture:
ids-rules ./snort/default --list
ids-pcap ./snort/default vrouter2.pcap
Once the pcap has been read by Snort or Suricata the alerts are available in Wireshark when the same pcap is opened.

Where there is more than one alert for a packet, each alert is shown. Setting a display filter of just 'alert' displays all the packets that have one or more alerts associated with them. I have set my display preferences to use only a single word for column names in Wireshark, which makes it easier to export packets as CSV.

It's easy to export the displayed packets by selecting 'Export Packet Dissections' from the File menu in Wireshark.

Leaving 'Displayed' selected, provide a filename and click OK to save the file.

As in my previous post I can use Log Parser Lizard to process the output to make a Treemap of the IDS alerts.

The query reduces multiple alerts on a single packet to the highest priority among them. It also inverts the priority so that a bigger number means a higher priority and is drawn as a larger block; otherwise priority '1' would be the highest priority but have the smallest block.
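The Log Parser query itself is not reproduced here, so the following is only a minimal Python sketch of the same idea. The function name, the packet labels and the assumption that priorities run from 1 to 4 are all hypothetical.

```python
def treemap_weight(priorities, max_priority=4):
    """Collapse a packet's alert priorities into a single Treemap weight.

    priorities: iterable of Snort/Suricata priorities (1 = most severe).
    max_priority: assumed largest priority value in the rule set.
    """
    most_severe = min(priorities)          # lowest number = highest priority
    return max_priority + 1 - most_severe  # invert so severe alerts get big blocks


# Hypothetical packets, each with the priorities of the alerts raised on it.
packets = {"pkt-1": [3, 1], "pkt-2": [2], "pkt-3": [3, 3]}
weights = {name: treemap_weight(p) for name, p in packets.items()}
print(weights)
```

A packet carrying a priority 1 alert ends up with the largest weight, which is what makes it the largest block in the Treemap.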

Export the file from Log Parser Lizard as TSV and use Notepad++ to edit the headers and add the data types for Treemap.

Then save as .tm3 extension and load the file into Treemap.

Sunday, 20 July 2014

From Bro to Log Parser Lizard to Security Visualisation

Recently I had to do some work with packet captures and system logs and decided to use Log Parser Lizard to examine the syslog files and the Bro logs I got from parsing the pcaps. Log Parser Lizard is a GUI for the brilliant MS Log Parser utility. I know a lot of us complain that Windows doesn't have our favourite text processing utilities like grep/sed/awk etc, but the addition of MS Log Parser more than makes up for the loss. Adding Log Parser Lizard provides a really cool way of analysing data for forensics and much more. For anyone new to MS Log Parser there is a great book entitled Microsoft Log Parser Toolkit available on Amazon. This is a great solution for ad-hoc data analysis when you don't have the data in ELSA or logstash, but more than that, it provides a minimal capability for exploratory data analysis without requiring the 'R' statistical language or Python with the SciPy stack. Even if ultimately you need to use either of those, these techniques could still be useful in curating and munging data before importing into either 'R' or Python SciPy.
To start with I need to use Bro to work its magic on the packet capture file and then set the timestamp in the file. Bro records the timestamp as an offset from the Unix epoch, but that's not going to work for me when I use Log Parser to examine the data, so here it's set to UTC on my Security Onion VM.
Now I have the conn.log file from Bro with a timestamp, I just need to tidy up the header section of the file to leave just the column headings.
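The commands I used for these two steps are not shown here, so this is just a rough Python sketch of the same clean-up, not the original method. It assumes the standard Bro ASCII log layout, where metadata lines start with '#' and the column names sit on the '#fields' line. The function name and the three-column sample are hypothetical.

```python
from datetime import datetime, timezone


def bro_to_tsv(lines):
    """Turn raw Bro log lines into a header row plus data rows with UTC timestamps."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Keep only the column names from the '#fields' metadata line.
            out.append("\t".join(line.split("\t")[1:]))
        elif line.startswith("#") or not line:
            continue  # drop the remaining metadata lines (#separator, #types, ...)
        else:
            cols = line.split("\t")
            ts = datetime.fromtimestamp(float(cols[0]), tz=timezone.utc)
            cols[0] = ts.strftime("%Y-%m-%d %H:%M:%S")
            out.append("\t".join(cols))
    return out


# Hypothetical three-column excerpt; a real conn.log has many more fields.
sample = [
    "#separator \\x09",
    "#fields\tts\tid.orig_h\tid.resp_p",
    "#types\ttime\taddr\tport",
    "1405857600.000000\t192.0.2.10\t3306",
    "#close\t2014-07-20-12-00-00",
]
for row in bro_to_tsv(sample):
    print(row)
```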

Now that I have the Bro connections log with a timestamp and a header row, it's ready to use with Log Parser Lizard. Next I need to fire up Log Parser Lizard (note that you need to install MS Log Parser first before installing Log Parser Lizard), then I created a new query group for my queries and then created a new query. Once that's done I set the Input Log Format to TSV.

The input format properties are set to the default values and I don't need to change anything here.

The first query is used to confirm that everything is working properly.
SELECT TOP 100 * FROM 'C:\Users\andy\Downloads\blog\bro_conn.log'

As you can see, the output is displayed as a series of columns with the Bro field name as the column header. MS Log Parser uses SQL-like syntax to extract and manipulate data in the file. There is a considerable amount of functionality and flexibility within MS Log Parser and I will only scratch the surface in this post. The 'TOP 100' part of the query just selects the first one hundred rows of the file without any aggregate functions. It's useful if you just want to get a view of the data to see what you have and check everything works. I usually save the first query I create for a given file and then use 'Save As' each time I want to create a new query, as it saves setting the Input Format each time.

I started off by just looking at how many times an IP address appears in the data set and producing a graph using Log Parser Lizard's built-in graphing capability.

I've used the COUNT() function to show how many times an IP address appears as a source address (id.orig_h). The graphing capability in Log Parser Lizard is extensive, but if it does not do what's needed it's really easy to export data for use with other graphing tools. The next query uses the ROLLUP operator to create a summary for each source IP address and an overall total.

In this next query I wanted to see all the failed, reset or rejected connections. There is a list of definitions for conn_state in the Bro documentation. In this query I'm only looking for ports below 1024 with the desired connection states. The GROUP BY ROLLUP operator again provides additional rows for the grand total and a summary total for each source IP.
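For anyone more at home in Python than in Log Parser's SQL dialect, here is a small sketch of what a COUNT with GROUP BY ROLLUP produces for this kind of filter. The set of conn_state values is my assumption of what counts as failed, reset or rejected (the exact list used in the query is not shown), and the rows are made up.

```python
from collections import defaultdict

# Assumed set of "failed, reset or rejected" states; the query's own list is not shown.
FAILED_STATES = {"S0", "REJ", "RSTO", "RSTR", "RSTOS0", "RSTRH"}

# Hypothetical rows: (id.orig_h, id.resp_h, id.resp_p, conn_state)
rows = [
    ("10.0.0.5", "10.0.0.20", 22, "REJ"),
    ("10.0.0.5", "10.0.0.21", 445, "S0"),
    ("10.0.0.7", "10.0.0.20", 80, "SF"),      # successful, ignored
    ("10.0.0.5", "10.0.0.20", 8080, "REJ"),   # port >= 1024, ignored
    ("10.0.0.9", "10.0.0.20", 23, "RSTO"),
]

detail = defaultdict(int)      # (source, destination) -> count, the GROUP BY rows
per_source = defaultdict(int)  # subtotal per source, the rows ROLLUP adds
for src, dst, port, state in rows:
    if port < 1024 and state in FAILED_STATES:
        detail[(src, dst)] += 1
        per_source[src] += 1
grand_total = sum(per_source.values())  # the final ROLLUP row

for src, count in sorted(per_source.items()):
    print(src, count)
print("TOTAL", grand_total)
```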

One source host in the list seems to have more failed connections to more destinations than any of the others. This was caused by a slow (T2) nmap scan originating from that host. The next query's output is going to be used to create a Treemap graph. The query looks at connection states where data was transferred and sets a lower limit for the number of bytes transferred. The output is formatted specifically to make it easier to load into Treemap.

One of the neat features of Log Parser Lizard is that you can put the displayed table into 'Edit Mode', and in this case I've used that to add the name of the service for Port 3306 (MySQL) as Bro didn't output that. It's easy to export data from inside Log Parser Lizard: select the Export tab and then click the 'Save Results to File' button. For Treemap the file needs to be formatted as tab-separated values (TSV). For Treemap to understand the file I will make a small adjustment to the header. Only the first two columns need the header row titles, so the rest can be deleted. Treemap also needs to know what the data types are for the first two columns. To do this I add a row under the headers and put STRING and INTEGER (uppercase is required) separated by a tab. The Port numbers are treated as a STRING; essentially they are a name, not a numeric value.
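The header edit described above can also be scripted. This is only a sketch under my own assumptions: that the first two columns are the port and a byte count (the names 'Port' and 'TotalBytes' are illustrative) and that the remaining, unlabelled columns are the hosts. The function name and sample rows are hypothetical.

```python
import csv
import io


def to_tm3(rows, names=("Port", "TotalBytes"), types=("STRING", "INTEGER")):
    """Render rows as Treemap text: names row, types row, then data rows.

    Each row is (port, total_bytes, *rest); only the first two columns
    are named and typed, the remaining columns are left unlabelled.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerow(types)          # type names must be uppercase
    for port, total_bytes, *rest in rows:
        writer.writerow([str(port), int(total_bytes), *rest])
    return buf.getvalue()


# Hypothetical query output: port, bytes, then source and destination hosts.
rows = [(3306, 52000, "10.0.0.5", "10.0.0.20"), (445, 9100, "10.0.0.7", "10.0.0.20")]
tm3_text = to_tm3(rows)
print(tm3_text)
```

Writing the result straight to a file with a .tm3 extension would also avoid the rename step.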

At this point the file has a .tsv extension which needs to be renamed to either .txt or .tm3 for Treemap to open it. The end result shown here has had colouring turned on for the Port so each port shown has its own unique colour.

One of the cool things about the Treemap program is that you can drill down, filter, set bin sizes for colouring, and set the variable used for scaling. I think Treemap works best when used as an interactive tool rather than just used for producing a static image. This next screenshot shows a drill down to a single source host and the connections it made. This time the sizing of the boxes uses the TotalBytes field to show comparative sizes of the traffic to each host.

It's also easy to export data to use in Afterglow. Afterglow uses three columns (Src IP, Event (Port), Dst IP) and optionally a fourth column for node sizes. One problem can be that one very large value in TotalBytes will mean that everything else looks so small it's virtually unreadable. To solve this I've used the LOG10() function to create a base 10 logarithmic scale in the query. Afterglow needs the output in CSV format, and the only change the file requires once exported is to have the column headers line at the top removed.
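To show what the logarithmic scaling does to the node sizes, here is a short Python sketch that builds the four-column, header-less CSV. The flows and byte counts are invented for illustration.

```python
import csv
import io
import math

# Hypothetical rows: source IP, destination port, destination IP, total bytes.
flows = [
    ("10.0.0.5", 445, "10.0.0.20", 1_000),
    ("10.0.0.20", 4444, "10.0.0.5", 10_000_000),
]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
for src, port, dst, total_bytes in flows:
    # log10 compresses the range: 1,000 bytes -> 3.0 and 10,000,000 -> 7.0
    size = round(math.log10(total_bytes), 2) if total_bytes > 0 else 0
    writer.writerow([src, port, dst, size])   # no header row is written

afterglow_csv = buf.getvalue()
print(afterglow_csv)
```

A ten-thousand-fold difference in bytes becomes a difference of 3.0 against 7.0, so the small nodes stay readable next to the large ones.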

The resulting graphic shows the ports scaled better for a comparative view. As you can see, there is a connection from the attacking host to a target system on port 445 and then a connection from that system back to the attacking host on port 4444. In this case that was a reverse TCP meterpreter connection.

And Finally...a couple of great books

While I was figuring out some of this stuff I found Applied Security Monitoring to be a really useful resource. They also have a really good site that seems to be adding ever more content. Highly recommended. I have also been working through Data-Driven Security: Analysis, Visualization and Dashboards, and although I have no background in statistics it seems like a good introductory text.

Sunday, 26 January 2014

Password Cracking with CUDA 2 ways

A few weeks ago I decided to generate Rainbow Tables for LM hash password cracking. The Rainbowcrack project provides Windows and Linux software that can be used to generate the tables and do the actual cracking. I also wanted to leverage the CUDA GPU support to make the cracking as fast as possible. The first thing I needed to do was to generate the actual rainbow tables. In my lab I have two Proliant ML350 servers running ESXi 5.1 (dual Xeon E5645 in each), so rather than running the table generation on my laptop I created a Windows VM on one of the servers, gave it 8 vCPUs and cut and pasted the commands for the table generation into a batch file. I set the batch file running and went to bed. The next morning I checked on progress and calculated how long it was going to take to complete. With a bit of rough math I reckoned about six weeks!

Six Weeks Later...

After running at a near constant 100% CPU utilisation for the full six weeks my rainbow tables were finally ready. It's worth noting that you can buy rainbow tables from the RainbowCrack Project if you don't have the inclination or processing power to create your own. It's not cheap, but on the other hand the time, effort and computing power have all been spent by someone else, and it could take years to generate some tables with limited processing power. As well as buying or generating your own tables you could download them from Free Rainbow Tables. If you have oodles of bandwidth this could be a great option; however, if like me you don't have great bandwidth, they also have a shop that will supply tables on a hard disk, like the RainbowCrack Project.
Free Rainbow Tables also have a distributed generation app using the BOINC client software for creating the tables, allowing thousands of computers to participate in the creation of the tables just like the SETI@Home project.

Just a couple more steps and the tables will be ready to use. Now I just needed to sort the tables and then compress them to *.rtc files. rt2rtc reduces the size from 64GB to a more reasonable 32GB that I can put on my laptop.

Next I transferred all the *.rtc files to my laptop. My laptop is a Dell XPS 15 L502X with 8GB RAM. It's had one important upgrade recently in that I replaced the internal HD with a Crucial M500 960GB SSD. Critically this provides a much needed improvement in disk access speed and will speed up the rainbow table look-ups considerably.

Once the *.rtc files were transferred over to my laptop I needed some LM hashes to test. So I asked a passing stranger to provide me with some hashes.

Hmmm...who was that guy? :-0
Now to test it!

Password Cracking with oclhashcat in a VM

The second method of using a GPU to crack passwords that I wanted to look at uses oclhashcat to do brute force, dictionary and hybrid attacks accelerated by the GPU. The oclhashcat download contains both nVidia CUDA and AMD OpenCL executables. For a while I've been wanting to try using an old nVidia Quadro 2000D I had lying around as a dedicated graphics card for CUDA inside a VM running on a vSphere ESXi server. Then recently I saw a great post from Rob VandenBrink on the SANS ISC Community forums which inspired me to give it a go.
One of the weaknesses of the Rainbow Table approach is that it cannot cope with salted hashes. Another way of looking at that is that salt+hash is a more secure way to store passwords. The problem is that you would need to create rainbow tables for every salt which is impractical. However, oclhashcat can brute force passwords that are stored that way using the acceleration of hundreds or thousands of GPU cores.
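A quick way to see why a precomputed table stops being useful once a salt is involved is to hash the same password with two different salts. The sketch below uses md5($salt.$pass) purely because it is short to demonstrate; the password and salt values are made up.

```python
import hashlib


def salted_md5(password: str, salt: str) -> str:
    """md5($salt.$pass), used here only because it is short to demonstrate."""
    return hashlib.md5((salt + password).encode("utf-8")).hexdigest()


password = "Passw0rd!"                      # hypothetical password
unsalted = hashlib.md5(password.encode("utf-8")).hexdigest()
hash_a = salted_md5(password, "x7Kq")       # hypothetical per-user salts
hash_b = salted_md5(password, "m2Zt")

# One precomputed table covers `unsalted` for every user, but the salted
# digests differ, so a separate table would be needed for each salt value.
print(unsalted, hash_a, hash_b, sep="\n")
```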
There is a useful guide available from VMware on how to configure pass-through and vDGA support on ESXi here. The shared version, vSGA, is not recommended for CUDA/OpenCL; it is designed for accelerated graphics capabilities in VDI environments. There are a number of things to pay attention to in this guide:
  1. Ensure the BIOS is set to use the embedded VGA card as primary - The HP Proliant's I have will use any additional graphics card as primary if this is not set
  2. Ensure you have adequate power in the PSU and the correct PCI-E 6/8 pin power connectors for the graphics card
  3. Follow the instructions to reserve all memory for the VM (all locked) and add pciHole.start = "2048" to the .vmx file if the VM has more than 2GB of RAM.
  4. Ignore instructions regarding VMware View driver installation - VMware View provides Teradici PCoIP support needed to access the graphics card capabilities remotely

Once the card was configured and the VM restarted all that is required is to install the nVidia video drivers in the VM. You do have to use the console in VMware vCenter to access the VM as RDP won't have access to the graphics card hardware. The first thing I did was to run the benchmark using ./cudahashcat64.bin -b.
It is more than a little disappointing given the speeds Rob VandenBrink got using OpenCL on his AMD Radeon HD 7970. In fact the speeds are only marginally faster than what I get running the benchmark on my laptop with its nVidia GT 540M. The reason, it seems, is that the nVidia does not have the integer crunching capabilities of the Radeon. In fact I have seen it reported that the nVidia GTX range of cards out-perform the Quadro range in terms of H/s and that the AMD Radeon will also do better than the more expensive Firepro in this respect (with the possible exception of the Firepro S10000). So guess what I bought?

It's an XFX AMD Radeon HD 7970. The cheapest I could find in the UK was around the £250 mark. The only problem is I'm now waiting on a power cable for my Proliant ML350 G6, which unfortunately HP have discontinued; eBay to the rescue, and it is now en route from the US. I'll update this post with the benchmarks once it's all installed and working.


I eventually got my extra power cables and installed the Radeon HD 7970 card in my ESXi server. I had a problem running the ./oclhashcat -b command to generate the benchmark for each of the hash types. This only seems to occur if you attempt to run all of the benchmarks at once. So I wrote a little bit of PowerShell that allowed me to run them individually, adding any exceptions to a list to exclude those benchmarks if they caused an error. It also tidies up the output for importing it into a table or spreadsheet.
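The PowerShell itself is not reproduced here. As a rough equivalent, the Python sketch below builds one benchmark command per hash mode and skips the excluded ones. It assumes that combining -m with -b restricts the benchmark to a single hash type; the function names and the mode 9999 are hypothetical, and nothing is executed until run_benchmarks is called.

```python
import subprocess


def benchmark_commands(modes, exclude=(), binary="./oclhashcat"):
    """Build one benchmark command per hash mode, skipping known-bad modes."""
    skip = set(exclude)
    return [[binary, "-b", "-m", str(m)] for m in modes if m not in skip]


def run_benchmarks(commands):
    """Run each command separately so one failure does not stop the others."""
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Keyed by hash mode; None marks a benchmark that returned an error.
        results[cmd[-1]] = proc.stdout if proc.returncode == 0 else None
    return results


# Hypothetical selection of modes; 9999 stands in for one that raised an error.
cmds = benchmark_commands([0, 1000, 3000, 9999], exclude=[9999])
print(cmds)
```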
Here are the benchmarks.
oclhashcat algorithm # Hash Type Speed
0 MD5 7886.7 MH/s
10 md5($pass.$salt) 8016.3 MH/s
20 md5($salt.$pass) 4393.2 MH/s
30 md5(unicode($pass).$salt) 7977.6 MH/s
40 md5($salt.unicode($pass)) 4322.7 MH/s
50 HMAC-MD5 (key = $pass) 1181.1 MH/s
60 HMAC-MD5 (key = $salt) 2235.5 MH/s
100 SHA1 2505.9 MH/s
110 sha1($pass.$salt) 2494.4 MH/s
120 sha1($salt.$pass) 1604.4 MH/s
130 sha1(unicode($pass).$salt) 2486.0 MH/s
140 sha1($salt.unicode($pass)) 1521.6 MH/s
150 HMAC-SHA1 (key = $pass) 543.7 MH/s
160 HMAC-SHA1 (key = $salt) 1071.2 MH/s
190 sha1(LinkedIn) 2458.2 MH/s
300 MySQL 1179.2 MH/s
400 phpass, MD5(Wordpress), MD5(phpBB3) 2033.1 kH/s
500 md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5 3487.2 kH/s
900 MD4 15819.5 MH/s
1000 NTLM 15261.7 MH/s
1100 DCC, mscash 4162.7 MH/s
1400 SHA256 995.4 MH/s
1410 sha256($pass.$salt) 996.8 MH/s
1420 sha256($salt.$pass) 841.4 MH/s
1430 sha256(unicode($pass).$salt) 995.4 MH/s
1440 sha256($salt.unicode($pass)) 818.1 MH/s
1450 HMAC-SHA256 (key = $pass) 237.8 MH/s
1460 HMAC-SHA256 (key = $salt) 496.1 MH/s
1500 descrypt, DES(Unix), Traditional DES 84108.8 kH/s
1600 md5apr1, MD5(APR), Apache MD5 3491.7 kH/s
1700 SHA512 74276.2 kH/s
1710 sha512($pass.$salt) 72691.9 kH/s
1720 sha512($salt.$pass) 70962.0 kH/s
1730 sha512(unicode($pass).$salt) 72364.1 kH/s
1740 sha512($salt.unicode($pass)) 70036.3 kH/s
1750 HMAC-SHA512 (key = $pass) 17852.9 kH/s
1760 HMAC-SHA512 (key = $salt) 34321.7 kH/s
1800 sha512crypt, SHA512(Unix) 12527 H/s
2100 DCC2, mscash2 102.1 kH/s
2400 Cisco-PIX MD5 5281.3 MH/s
2500 WPA/WPA2 131.0 kH/s
2600 Double MD5 2094.6 MH/s
3000 LM 1269.2 MH/s
3100 Oracle 7-10g 351.5 MH/s
3200 bcrypt, Blowfish(OpenBSD) 3531 H/s
5000 SHA-3(Keccak) 141.5 MH/s
5100 Half MD5 4581.7 MH/s
5200 Password Safe SHA-256 467.8 kH/s
5300 IKE-PSK MD5 504.6 MH/s
5400 IKE-PSK SHA1 273.4 MH/s
5500 NetNTLMv1-VANILLA / NetNTLMv1+ESS 7934.7 MH/s
5600 NetNTLMv2 492.6 MH/s
5700 Cisco-IOS SHA256 992.3 MH/s
5800 Samsung Android Password/PIN 1547.7 kH/s
6000 RipeMD160 1613.0 MH/s
6100 Whirlpool 30104.4 kH/s
6211 TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 + AES 375.6 kH/s
6221 TrueCrypt 5.0+ PBKDF2-HMAC-SHA512 + AES 36079 H/s
6231 TrueCrypt 5.0+ PBKDF2-HMAC-Whirlpool + AES 1308 H/s
6241 TrueCrypt 5.0+ PBKDF2-HMAC-RipeMD160 boot-mode + AES 743.2 kH/s
6300 AIX {smd5} 3478.9 kH/s
6400 AIX {ssha256} 6213.2 kH/s
6500 AIX {ssha512} 512.3 kH/s
6600 1Password 1063.5 kH/s
6700 AIX {ssha1} 13294.8 kH/s
6800 Lastpass 943.7 kH/s
6900 GOST R 34.11-94 98531.6 kH/s
7100 OSX v10.8 525 H/s
7200 GRUB 2 1837 H/s
7400 sha256crypt, SHA256(Unix) 74996 H/s
7500 Kerberos 5 AS-REQ Pre-Auth etype 23 50997.8 kH/s
11 Joomla 7975.6 MH/s
21 osCommerce, xt:Commerce 4389.5 MH/s
101 SHA-1(Base64), nsldap, Netscape LDAP SHA 2508.2 MH/s
111 SSHA-1(Base64), nsldaps, Netscape LDAP SSHA 2493.3 MH/s
112 Oracle 11g 2493.2 MH/s
121 SMF > v1.1 1602.3 MH/s
122 OSX v10.4, v10.5, v10.6 1587.5 MH/s
131 MSSQL(2000) 2481.6 MH/s
132 MSSQL(2005) 2485.1 MH/s
141 EPiServer 6.x < v4 1505.6 MH/s
1441 EPiServer 6.x > v4 827.5 MH/s
1711 SSHA-512(Base64), LDAP {SSHA512} 72950.4 kH/s
1722 OSX v10.7 70361.1 kH/s
1731 MSSQL(2012) 73323.9 kH/s
2611 vBulletin < v3.8.5 2122.1 MH/s
2711 vBulletin > v3.8.5 1529.8 MH/s
2811 IPB2+, MyBB1.2+ 1516.4 MH/s
The end result is pretty similar to the speeds shown in Rob VandenBrink's post.
One thing to bear in mind is that the GPU draws considerable power once it's running at full capacity.

All of that power means a lot of heat is being generated and that in turn means the system fans must work harder as well.

Normally the four system fans don't go above 21%. With the system and GPU fans running it can be quite noisy.

As you can see below, it would take about 6 days to brute force an NTLM hash with a complex (mixed case alphanumeric and special) 8 character password with this setup.
Of course it would be better to use Rainbow tables for unsalted hashes, if you have room for the 1TB table (ntlm_mixalpha-numeric-all-space#1-8) that would be required. It's certainly worth trying oclhashcat dictionary or hybrid attacks if you haven't got the rainbow table before resorting to brute force. One area where oclhashcat scores is in its ability to crack salted password hashes, something that Rainbow tables just can't do.
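The brute force estimate is easy to sanity-check with some back-of-the-envelope arithmetic. The sketch below assumes a 95 character set (upper, lower, digits and the printable ASCII specials including space) and the NTLM benchmark rate from the table above. At that rate the full 8 character keyspace comes out at roughly five days, the same order as the figure quoted above; the tool's own estimate may be based on a slightly different speed or character set.

```python
CHARSET = 26 + 26 + 10 + 33        # upper, lower, digits, printable ASCII specials incl. space
LENGTH = 8
NTLM_RATE = 15261.7e6              # hashes per second, from the benchmark table (mode 1000)

keyspace = CHARSET ** LENGTH       # candidates of exactly eight characters
seconds = keyspace / NTLM_RATE     # worst case: every candidate is tried
days = seconds / 86_400
print(f"{keyspace:.3e} candidates, about {days:.1f} days to exhaust")
```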

Rob mentioned that the AMD Radeon HD 7970 would scale really well (certainly up to 3 cards) using the SLI bridging. This got me thinking, and I started to wonder how you could fit three cards in a case and cope with all the power and cooling requirements - I'll live with the noise :-)
It looks like a lot of people are using the open rigs that Bitcoin miners use for multi-GPU card setups. However, I found a neater (and more expensive) solution in the NetStor NA255A-XGPU External PCIe Gen3 to GPU Desktop Enclosure. There are some interesting comparisons of the AMD Radeon HD 7990 GPU capabilities on Tom's Hardware here. Given the figures on there, I would estimate that using the aforementioned NetStor GPU enclosure and three AMD Radeon HD 7990 cards would push the NTLM hashes per second up to around the 100 billion mark.

Check out this monster password cracker from Norway.

Many Thanks to Rob VandenBrink for his help and advice.

Little disclaimer: Techniques described for cracking passwords are only used by me in pursuit of lawful, authorised, penetration testing activities or against my own systems for the purposes of testing & education. I would not encourage anyone to use these attacks unlawfully.  

Monday, 6 May 2013

Using CIF with SiLK

The Collective Intelligence Framework, or CIF for short, provides a variety of security intelligence feeds that you can use in your environment. CIF requires a server to collect the information from a variety of sources and a client program that can be used to access the intelligence data. CIF has feeds for malware, botnets, suspicious IP addresses, scanning IP addresses etc. Installing the CIF client on my SiLK server makes using CIF intelligence data with SiLK commands easy; simply follow the instructions for installing the client here. CIF has a number of ways you can output the intelligence data depending on how you want to use it. For example, the following command produces snort rules as output.
The -p snort part causes the output of CIF to be in snort rule format but all that's required for use with SiLK is CSV format. One of the clever things that CIF does is to provide IP addresses for source data even when the source data does not provide IP addressing such as for URLs.
As can be seen from the above command, I needed to remove some localhost IP addresses, headings, blank lines and other extraneous output. The -c parameter allows you to specify the degree of confidence for the intel data. See the CIF homepage for a more detailed description of command line options. SiLK has the ability to create an IP set, and these can be used with rwfilter to match source, destination or either address:
  • --anyset=IP_SET_FILENAME
  • --not-anyset=IP_SET_FILENAME
  • --dipset=IP_SET_FILENAME
  • --not-dipset=IP_SET_FILENAME
  • --sipset=IP_SET_FILENAME
  • --not-sipset=IP_SET_FILENAME

To create a SiLK IP set from the CIF output simply run the output through rwsetbuild.
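The clean-up of the CIF output before it goes to rwsetbuild can be sketched in Python as below. The CSV column layout in the sample is my assumption and the real CIF output may well differ; the function name is hypothetical. The result is one address per line, which is the text form rwsetbuild reads.

```python
import ipaddress


def extract_ips(lines):
    """Return unique, non-loopback IPv4 addresses found in the first CSV column."""
    seen = []
    for line in lines:
        field = line.split(",")[0].strip()
        try:
            ip = ipaddress.ip_address(field)
        except ValueError:
            continue                      # headings, blank lines, other noise
        if ip.version == 4 and not ip.is_loopback and str(ip) not in seen:
            seen.append(str(ip))
    return seen


# Hypothetical CIF-style CSV output; the real column layout may differ.
raw = [
    "address,confidence",
    "",
    "203.0.113.7,85",
    "127.0.0.1,95",
    "203.0.113.7,90",
    "198.51.100.4,75",
]
print("\n".join(extract_ips(raw)))
```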
The following output uses the previously created malware.set as destination IP addresses:
The output also shows another feature of SiLK: the ability to randomize IP addresses. The rwrandomizeip command randomizes all the IP addresses in a specific IP set. In this case I'm using it to obscure the external IP addressing of my test lab.
The Collective Intelligence Framework is an amazing addition to network security monitoring. CIF can also be integrated into ELSA and even into commercial SIEM platforms. Give it a try.

Thursday, 31 January 2013

ELSA with Sagan

Sagan is essentially a snort-like rule based detection engine for log data. Sagan is very easy to integrate with ELSA. All the logs sent to ELSA can be examined by Sagan rules. Every rule that fires produces an alert which is passed into ELSA. Sagan is easy to configure, build and install; I just followed the instructions here. As barnyard2 will be used to take the unified2 output from Sagan, I built Sagan without native database support. Create a Sagan user and the directories, and set the permissions. The Sagan user will be used by barnyard2 and Sagan. Configure syslog-ng to send events via the sagan.fifo to Sagan. Edit the /usr/local/syslog-ng-3.2.4/etc/syslog-ng.conf on ELSA to add the configuration that sends all the inbound logs through Sagan. This configuration sends all logs received over the network to Sagan via the fifo. Now we have
Network Syslog -> Syslog-ng -> Sagan
The logs are still received by ELSA of course, but now the Sagan rules can inspect the logs as well. Next edit the /usr/local/etc/sagan.conf. Download and install the Sagan rules in /usr/local/etc/sagan-rules. Once Sagan is configured, make and install barnyard2. Barnyard2 is used to read the unified2 output of Sagan and send any alerts to ELSA. It could also be used to send alerts to Sguil and Snorby if required. Configure /usr/local/etc/barnyard2.conf so that Barnyard2 sends alerts via the local syslog mechanism, which means these must be sent to syslog-ng to be viewed in ELSA. Edit the syslog-ng.conf file again and add the following. Now I have this
Network Syslog -> Syslog-ng -> Sagan
Sagan Alerts -> barnyard2 -> Syslog-ng -> ELSA
Add a startup script for Sagan and barnyard2 in /etc/init.d/. ELSA has a parser for the output but it may require a node and web update.

The alerts from Sagan are properly parsed in ELSA and fully searchable.
Create a graph showing Sagan events grouped by signature.
Using the snort class and interface=sagan makes it easy to view just Sagan alerts in ELSA.
Looking at the host for the Sagan alerts makes it easy to find the log entries that caused the Sagan alert.
Add a dashboard for Sagan in ELSA
This dashboard can be imported into ELSA by cut 'n' paste into the import dashboard box in ELSA.

Enjoy ELSA with Sagan! 

Monday, 14 January 2013

Creating a Vyatta parser for ELSA

ELSA, or Enterprise Log Search and Archive to give it its full title, is a centralised log management solution. Similar to Splunk in principle, it provides Google-like searching, graphing, dashboards and alerts. ELSA is extraordinarily easy to install and get up and running; just follow the quick-start guide.

Searching in ELSA is incredibly fast; Martin Holste, ELSA's creator, went to amazing lengths to ensure that it was not only fast but exceptionally scalable as well. In order to meet those design goals Martin knew that using regex to parse log data would be way too slow, so ELSA uses Syslog-ng and Pattern DB to receive and categorise events.

For more information watch Martin's presentation entitled Perl for Big Data on YouTube.

ELSA has numerous parsers built into it including Snort, Windows, Apache, Cisco, Check Point and many others, so you might find that your log source is already supported. However, I wanted to add Vyatta Community Edition firewall logs to ELSA and in order to do that I needed to create a new parser.

As I've mentioned in previous posts I use Vyatta community edition as a router and firewall in my test lab. I really like Vyatta for its comprehensive feature set and its easy command-line syntax. And it works brilliantly under VMware.

Pattern DB uses a simple syntax to break up the log data and parse it into fields so that it can be searched easily but before I get to that I need to understand which class I want to add the Vyatta firewall logs to in ELSA.

ELSA uses a class to categorise data feeds. You can use an existing class in ELSA or create your own. In this case I wanted to use two existing classes for firewall traffic that is denied and permitted. The two classes in ELSA are FIREWALL_ACCESS_DENY and FIREWALL_CONNECTION_END.

The screenshot shows the FIREWALL_ACCESS_DENY class and its associated fields. The field order is significant and you need to know this order to create the parser in Pattern DB. At the bottom of the screenshot an unparsed Vyatta log entry can be seen. Notice the program=kernel is shown; this is the identifying tag (the syslog facility) that ELSA has pulled from the log line. When I create the parser this is used to identify all the messages that should be parsed.
To summarise, so far I have:
  • Class FIREWALL_ACCESS_DENY fields:
    1. Proto
    2. srcip
    3. srcport
    4. dstip
    5. dstport
    6. o_int
    7. i_int
    8. access_group
  • Class FIREWALL_CONNECTION_END fields:
    1. Proto
    2. srcip
    3. srcport
    4. dstip
    5. dstport
    6. conn_bytes
    7. o_int
    8. i_int
    9. conn_duration
  • Program: kernel

It's easy to look in the MySQL ELSA tables to check that the information I have collected is correct and to gain one additional piece of information I need to create my parser. The following SQL statements show each of the fields used for the classes FIREWALL_ACCESS_DENY and FIREWALL_CONNECTION_END.
The output is shown below. The last piece of information I need is the field type, as the placeholders for the parser need to reflect the type in the DB table.

Paste the SQL output into a spreadsheet and add a Token column. In the Token column number each integer and string field type starting from the lowest value and numbered from 0. The first int field is i0 and the next i1 and so on, the same for each string starting from s0. This will be used in the parser so that the field parsed is assigned to the correct field in the database. The spreadsheet above shows the Token values I added.
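The Token numbering can also be worked out with a few lines of Python instead of a spreadsheet. This is a sketch only: the function name is hypothetical, and the int/string types listed for FIREWALL_ACCESS_DENY are assumed here so that they agree with the tokens used in the pattern described further down.

```python
def assign_tokens(fields):
    """fields: list of (name, field_type) already sorted by field order."""
    counters = {"int": 0, "string": 0}
    tokens = {}
    for name, field_type in fields:
        prefix = "i" if field_type == "int" else "s"
        tokens[name] = f"{prefix}{counters[field_type]}"   # i0, i1, ... / s0, s1, ...
        counters[field_type] += 1
    return tokens


# FIREWALL_ACCESS_DENY fields in order; the int/string types are assumed here
# to match the tokens used in the pattern later in the post.
deny_fields = [
    ("proto", "int"), ("srcip", "int"), ("srcport", "int"), ("dstip", "int"),
    ("dstport", "int"), ("o_int", "string"), ("i_int", "string"),
    ("access_group", "string"),
]
deny_tokens = assign_tokens(deny_fields)
print(deny_tokens)
```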

Edit PatternDB.XML

The final part is to create the parser. Each parser consists of a ruleset followed by a pattern which identifies the log lines that should be evaluated, in this case program=kernel. This is followed by rules and a rule which specifies a provider, class and ID. The class and ID are 2 for FIREWALL_ACCESS_DENY and 3 for FIREWALL_CONNECTION_END. The next section is Pattern, or Patterns if you have more than one to apply. The Patterns are then applied to each log line matching 'kernel'.
Take a look at the parsers already in the /usr/local/elsa/node/conf/patterndb.xml or if using ELSA on Security Onion /opt/elsa/node/conf/patterndb.xml. It's a good idea to examine the patterndb.xml to familiarise yourself with the structure and see some other examples. Each new parser needs to be added between the 'patterndb' Pattern DB start and end tags. Each line in the 'Patterns' section represents a potential Pattern match for various forms of the log line. Examining the first Pattern shows how strings are matched positionally using both string-literal matches and dynamically using QSTRING, ANYSTRING, NUMBER, IPv4 and ESTRING keywords. All keywords start and end with an @ symbol.

  • @QSTRING::[]@
Matches on begin and end characters. In this case [ and ]. I don't need to collect and store this part of the log, so the Token is missing between :: See the next example.
  • [@ESTRING:s2:-A]@
Matches on end string. In this case '-A]'. Note the start square brackets [ are string-literal in front of the variable data that I want to capture in s2 (See spreadsheet s2 = access_group)
  • IN=@ESTRING:s1: @
The next parameter follows the string ' IN=' and as it is ESTRING it just ends with a space before the @. This collects the inbound interface ' IN=eth0 ' Note the Token is s1.
  • OUT=@ESTRING:s0: @
The next part does the same as the above except for OUT this time using the Token s0. ' OUT=eth1 '
Next I just skip forward to the SRC= point in the log line and record the IPv4 address which follows @IPv4:i1:@ using the i1 Token.
  • DST=@IPv4:i3:@
Then capture the DST address and store it in Token i3.
Once again skip forward through the line to ' PROTO=' and store the protocol field @ESTRING:i0: @ with the Token i0.
  • SPT=@ESTRING:i2: @
Then source port and destination ports DPT=@ESTRING:i4: @ follow and are collected with the Token i2 and i4.
The last part just parses any text to the end of the line.
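To make the mapping from log line to Tokens easier to follow, here is an illustration in Python. It uses a regular expression, which is exactly what Pattern DB avoids for speed, so treat it purely as a teaching aid and not as how ELSA parses the line. The sample log line is made up and only loosely modelled on a Vyatta firewall entry.

```python
import re

PATTERN = re.compile(
    r"\[[^\]]*\] "                         # @QSTRING::[]@  bracketed text, not stored
    r"\[(?P<s2>.*?)-A\] "                  # [@ESTRING:s2:-A]@  access_group
    r"IN=(?P<s1>\S*) "                     # IN=@ESTRING:s1: @  inbound interface
    r"OUT=(?P<s0>\S*) "                    # OUT=@ESTRING:s0: @  outbound interface
    r".*?SRC=(?P<i1>\d+(?:\.\d+){3}) "     # skip forward, SRC=@IPv4:i1:@
    r"DST=(?P<i3>\d+(?:\.\d+){3}) "        # DST=@IPv4:i3:@
    r".*?PROTO=(?P<i0>\S+) "               # skip forward, PROTO=@ESTRING:i0: @
    r"SPT=(?P<i2>\d+) "                    # SPT=@ESTRING:i2: @
    r"DPT=(?P<i4>\d+)"                     # DPT=@ESTRING:i4: @
)

# Hypothetical log line, for illustration only.
sample = (
    "[ 1234.567890] [WAN_IN-10-A] IN=eth0 OUT=eth1 MAC=00:11:22:33:44:55 "
    "SRC=203.0.113.7 DST=192.0.2.10 LEN=60 PROTO=TCP SPT=51515 DPT=22 WINDOW=14600"
)
fields = PATTERN.match(sample).groupdict()
print(fields)
```

Each named group carries the same Token name as the Pattern DB placeholder it stands in for, so the printed dictionary shows which part of the line lands in which database field.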

More information on the syntax is available here.

If you need to create your own ELSA Class follow the instructions on 'Adding Parsers' on the Documentation Wiki.

Testing the parser

The parser can be tested using the 'pdbtool' command line tool. An example can be added to check each log line variant with the 'examples' section in the patterndb.xml above. Alternatively you can test a single log line with pdbtool using the -M flag. Note that the examples here are from Security Onion and that the path to the patterndb.xml is different from the regular ELSA install '/usr/local/elsa/node/conf/patterndb.xml'.

If there are no errors from the testing then it's good to go.

The screenshot below shows Vyatta firewall log data in ELSA.
ELSA now ships with support for Vyatta firewall logs, so you don't need to do anything, simply set the syslog host in Vyatta to point to your ELSA server.

Thursday, 15 November 2012

Netflow analysis with SiLK - Part 2 Detection

SiLK provides numerous command line tools used to query Netflow records in the data store. The primary query tool is rwfilter. The rwfilter command provides a way of partitioning the flows and selecting data according to your needs. SiLK can be used to get an understanding of normal traffic and interactions on the network. This is a useful exercise in itself as it provides contextual information and situational awareness, not only for information security but also for other parts of the business. An understanding of the network, its architecture and usage provides a foundation for troubleshooting and allows changes to be assessed in the light of solid information.

Why use Netflow data? Netflow and IPFIX data have one major advantage over full packet capture: they take very little space to store by comparison. They can provide a record of traffic in the network going back months or even years, which is nearly impossible for full packet capture. This could prove invaluable in the case of forensics, where a compromise could have started months prior to discovery.

In this example I looked at how to determine which machines are running database management software. Of course it is possible that databases and other applications might be running on non-standard ports, so it might be useful to determine this up front, using a packet capture and some traffic analysis. However, for the purpose of this exercise I’ll assume all DBMS are running on standard ports.

Using rwfilter I selected the sensor of interest and the ‘out’ direction looking for typical DBMS source ports for the dates in question.

As can be seen from the output I have found one MySQL server. Using the --pass=stdout option with rwfilter allows the binary records to be piped into another command from the SiLK suite; in this case rwstats. The rwstats command provides a statistical breakdown showing here the percentage of flows from the source ports I specified. The rwstats command can be used to produce top-x talkers on the network or summarise various data produced by rwfilter.
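A query of this kind could be put together along the following lines. This is a sketch rather than the exact command I ran: the sensor name and dates are placeholders, the port list is simply the usual defaults for a few common DBMS, and the helper names are hypothetical. The rwfilter and rwstats switches used are standard SiLK options.

```python
import subprocess

# Default listening ports for some common DBMS (an illustrative selection).
DBMS_PORTS = {
    1433: "Microsoft SQL Server",
    1521: "Oracle",
    3306: "MySQL",
    5432: "PostgreSQL",
}


def rwfilter_cmd(sensor, start, end):
    """Select outbound TCP flows whose source port is a typical DBMS port."""
    ports = ",".join(str(p) for p in sorted(DBMS_PORTS))
    return [
        "rwfilter",
        f"--sensor={sensor}",
        "--type=out",
        f"--start-date={start}",   # YYYY/MM/DD:HH
        f"--end-date={end}",
        "--protocol=6",
        f"--sport={ports}",
        "--pass=stdout",           # binary records go to the next command
    ]


def rwstats_cmd(field="sport", count=10):
    """Top-N breakdown of the piped records by the given field."""
    return ["rwstats", f"--fields={field}", f"--count={count}"]


def run_pipeline(first, second):
    """Equivalent of `first | second` in the shell; returns the text output."""
    p1 = subprocess.Popen(first, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second, stdin=p1.stdout,
                          stdout=subprocess.PIPE, text=True)
    p1.stdout.close()
    output, _ = p2.communicate()
    p1.wait()
    return output
```

Calling run_pipeline(rwfilter_cmd("S0", "2012/11/13:15", "2012/11/14:15"), rwstats_cmd()) on a machine with SiLK installed would print the breakdown by source port; "S0" is a made-up sensor name.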

It should be noted that I intended to specify a range of 1 day by setting --start-date and --end-date. However, because the time precision is to the hour (15=15:00), I have actually specified 1 day and 1 hour, because all the minutes in the end hour are included.
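The extra hour is easy to check with a little date arithmetic. The sketch below assumes hour-precision dates in the YYYY/MM/DD:HH form and treats the end hour as inclusive, as described above; the dates themselves are just examples and the function names are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d:%H"  # hour-precision date format


def hours_covered(start, end):
    """Number of whole hours selected when the end hour is inclusive."""
    s = datetime.strptime(start, FMT)
    e = datetime.strptime(end, FMT)
    return (e - s) // timedelta(hours=1) + 1


def exact_day_end(start):
    """End date that gives exactly 24 hours from the start hour."""
    s = datetime.strptime(start, FMT)
    return (s + timedelta(hours=23)).strftime(FMT)


if __name__ == "__main__":
    print(hours_covered("2012/11/13:15", "2012/11/14:15"))
    print(exact_day_end("2012/11/13:15"))
```

In other words, to cover exactly one day starting at 15:00 the end date should name the 14:00 hour of the following day, not 15:00.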

Now that I have identified the database server and the type (MySQL), it’s time to look at the flows in more detail.

Now that I know the server IP address and port I can set those as selection parameters in rwfilter and look at the percentage of traffic by client for just the 13th of November. Changing rwstats fields shows the percentage for the destination IP address as specified in the --fields=dip option. It’s easy to reverse this and look at the percentage of bytes sent to the server by the clients.
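To show what the percentage column represents, here is a small pure-Python analogue of the calculation: total the bytes per client and express each total as a share of the whole. The addresses and byte counts are invented, and this only illustrates the arithmetic, not how rwstats is implemented.

```python
from collections import Counter

# Hypothetical (client address, bytes) pairs, one per flow record.
FLOWS = [
    ("192.0.2.50", 6000),
    ("192.0.2.51", 3000),
    ("192.0.2.50", 1000),
]


def byte_share(flows):
    """Return (client, total bytes, percent of all bytes), largest first."""
    totals = Counter()
    for client, nbytes in flows:
        totals[client] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [
        (client, nbytes, 100.0 * nbytes / grand_total)
        for client, nbytes in totals.most_common()
    ]


if __name__ == "__main__":
    for client, nbytes, percent in byte_share(FLOWS):
        print(f"{client:15} {nbytes:10} {percent:6.2f}%")
```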

The --type=in is now specified to indicate the inbound flows. Similarly, --sport=3306 changes to --dport=3306 and the source address --saddress= changes to --daddress=. Flows are unidirectional, so it’s convenient to look at them as two halves of a single conversation. If I wanted to see both parts at once I could use --type=in,out, but I would have to use --aport=3306 to match the port in either direction and --any-address=172.31.253.102 to select both parts of the conversation.
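Because each flow is one half of a conversation, switching from the server's view to the clients' view is a mechanical swap of the selection switches. The sketch below shows that swap on a dictionary of rwfilter options; the server address is a placeholder and the helper names are hypothetical.

```python
# Switches that trade places when the direction of interest is reversed.
FIELD_SWAP = {
    "sport": "dport",
    "dport": "sport",
    "saddress": "daddress",
    "daddress": "saddress",
}
TYPE_SWAP = {"in": "out", "out": "in"}


def reverse_selection(options):
    """Return the options that select the other half of the conversation."""
    reversed_options = {}
    for name, value in options.items():
        if name == "type":
            reversed_options[name] = TYPE_SWAP.get(value, value)
        else:
            reversed_options[FIELD_SWAP.get(name, name)] = value
    return reversed_options


def to_args(options):
    """Render the options as rwfilter command line switches."""
    return [f"--{name}={value}" for name, value in options.items()]


if __name__ == "__main__":
    # Placeholder address standing in for the database server.
    from_server = {"type": "out", "saddress": "192.0.2.10", "sport": "3306"}
    print(to_args(from_server))
    print(to_args(reverse_selection(from_server)))
```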

This time I piped the output to rwuniq to see a more detailed view of the interactions between client and server. The rwuniq command summarises each distinct connection. Adding --packets=4- and --ack-flag=1 to the selection criteria of rwfilter ensures we view flows that exchanged at least 4 packets, the minimum required to establish a TCP connection, and that at least one ACK flag was set.
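The combined effect of the filter and the summary can be pictured with a short pure-Python analogue: drop flows with fewer than 4 packets or without an ACK, then total what remains per source and destination pair. The flow records below are invented, and this illustrates the logic only, not the SiLK tools themselves.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    sip: str
    dip: str
    packets: int
    nbytes: int
    flags: str  # TCP flags seen on the flow, e.g. "SAPF"


def summarise(flows, min_packets=4):
    """Total flows, packets and bytes per (sip, dip) for established flows."""
    totals = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for flow in flows:
        # Same idea as --packets=4- and --ack-flag=1 on rwfilter.
        if flow.packets < min_packets or "A" not in flow.flags:
            continue
        entry = totals[(flow.sip, flow.dip)]
        entry["flows"] += 1
        entry["packets"] += flow.packets
        entry["bytes"] += flow.nbytes
    return dict(totals)


# Hypothetical records: two real sessions, a lone SYN and a short exchange.
SAMPLE_FLOWS = [
    Flow("192.0.2.10", "192.0.2.50", 12, 4000, "SAPF"),
    Flow("192.0.2.10", "192.0.2.50", 8, 2500, "SAPF"),
    Flow("192.0.2.10", "192.0.2.51", 1, 40, "S"),
    Flow("192.0.2.10", "192.0.2.52", 3, 120, "SA"),
]

if __name__ == "__main__":
    for pair, entry in summarise(SAMPLE_FLOWS).items():
        print(pair, entry)
```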

Looking at the connections over a time period provides an indication of normal behaviour. I can use another rwfilter command piped to rwuniq to get an idea of how many bytes are sent from the server for each connection made by a client throughout the day.

In the above example I changed the column separator and omitted the column titles to make it easy to create a CSV file from the output for creating a graph. I also sorted the output so that I had it in time order. There is another command line tool for sorting called rwsort and it could be more efficient to pre-sort input to rwuniq in some circumstances.

Looking at the output, this would appear to be a fairly normal day's access of the database, but it’s probably worth taking a regular snapshot of stats and graphing activity over time. Before doing that, I want to examine how you can automate queries. In the previous example I put the commands into a shell script, but you could use Python to interrogate flow data. This Python script achieves exactly the same output, but turning it into a script provides a way of automating the collection process using a scheduled job, allows parameters to be passed for the date and time, and gives better control over the output format.
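As an illustration of the approach (not the script used for this post), the following sketch wraps an rwfilter and rwuniq pipeline in Python and turns the delimited output into CSV. The sensor name, server address and sample output are placeholders, the choice of hourly time bins is an assumption, and the function names are hypothetical; the rwfilter and rwuniq switches themselves are standard SiLK options.

```python
import csv
import subprocess
import sys


def build_commands(sensor, day, server_ip, port=3306):
    """rwfilter and rwuniq command lines for one day of server-to-client flows."""
    rwfilter = [
        "rwfilter", f"--sensor={sensor}", "--type=out",
        f"--start-date={day}",            # day precision selects the whole day
        f"--saddress={server_ip}", f"--sport={port}",
        "--protocol=6", "--pass=stdout",
    ]
    rwuniq = [
        "rwuniq", "--fields=stime,dip", "--values=bytes",
        "--bin-time=3600",                # hourly bins (an assumption)
        "--delimited=,", "--no-titles", "--sort-output",
    ]
    return rwfilter, rwuniq


def run_pipeline(first, second):
    """Equivalent of `first | second` in the shell; returns the text output."""
    p1 = subprocess.Popen(first, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second, stdin=p1.stdout,
                          stdout=subprocess.PIPE, text=True)
    p1.stdout.close()
    output, _ = p2.communicate()
    p1.wait()
    return output


def parse_rows(text):
    """Turn 'stime,dip,bytes' lines into (stime, dip, int(bytes)) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stime, dip, nbytes = line.split(",")
        rows.append((stime, dip, int(nbytes)))
    return rows


def write_csv(rows, out=sys.stdout):
    """Write the rows with a header line, ready for a spreadsheet."""
    writer = csv.writer(out)
    writer.writerow(["time", "client", "bytes"])
    writer.writerows(rows)
```

Run from a scheduled job, something like write_csv(parse_rows(run_pipeline(*build_commands("S0", "2012/11/13", "192.0.2.10")))) would produce one CSV per day, with "S0" and the address replaced by real values.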

A sample of the output is shown below.

Once the output is imported into a spreadsheet I can create a timeline graph for bytes transferred to the clients and hopefully any irregularities will show up.

Looking at the graph, it seems I have found some activity that does not fit with the norm: a huge spike from a previously unknown client is observed towards the end of the monitored period. Although it’s not conclusive evidence of wrongdoing, it’s certainly an indicator worthy of further investigation.
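Spotting this by eye works for a single day, but the same check can be roughed out in code. The sketch below flags rows that come from a client outside a known list, or whose byte count is far above the median for known clients. The data, the list of known clients and the factor of 10 are all invented for illustration; a real baseline would need more care.

```python
from statistics import median


def flag_rows(rows, known_clients, factor=10):
    """Return rows that look unusual, with the reasons they were flagged.

    rows: (time, client, bytes) tuples
    known_clients: set of client addresses seen in normal operation
    factor: how many times the typical byte count counts as a spike
    """
    known_bytes = [nbytes for _, client, nbytes in rows if client in known_clients]
    typical = median(known_bytes) if known_bytes else None
    flagged = []
    for time, client, nbytes in rows:
        reasons = []
        if client not in known_clients:
            reasons.append("unknown client")
        if typical is not None and nbytes > factor * typical:
            reasons.append("byte spike")
        if reasons:
            flagged.append((time, client, nbytes, reasons))
    return flagged


# Hypothetical hourly totals: three normal rows and one outlier.
SAMPLE_ROWS = [
    ("09:00", "192.0.2.50", 1200),
    ("10:00", "192.0.2.51", 1500),
    ("11:00", "192.0.2.50", 1100),
    ("16:00", "192.0.2.99", 250000),
]
KNOWN_CLIENTS = {"192.0.2.50", "192.0.2.51"}

if __name__ == "__main__":
    for row in flag_rows(SAMPLE_ROWS, KNOWN_CLIENTS):
        print(row)
```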

One problem area that I regularly work on is monitoring privileged users who have access to databases containing sensitive information such as PCI or PII data. Using netflow data to examine activity is one method I have used to get an indication of unusual behaviour.