logging

Nepenthes had awful logging: huge logfiles that were pretty useless for most people. Some people even started writing parsers for the logfiles to extract and convert the useful information for use in a database. For dionaea, I decided to stick with awful logging to text files, but to provide a useful alternative which is easy to set up and maintain, feature-rich, and lets you retrieve information in a useful way, so you don't have to grep.
Therefore, the logsql.py script uses SQLite to write useful information to disk.
I know, SQLite is not PostgreSQL. PostgreSQL is superior in many ways, but it takes a few more steps to set up, whereas SQLite just works out of the box. SQLite has only limited support for concurrent access, but as dionaea does not access the database simultaneously, there were no problems with database concurrency. On the other hand, if it works with SQLite, it will work with PostgreSQL too; all you'll have to do is adjust some things.
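
To make the idea concrete, here is a minimal sketch of logging one accepted connection to SQLite from Python. The table layout is an assumption: the column names are borrowed from the queries further down, but the real schema created by logsql.py may differ, and `log_connection` is a hypothetical helper, not a dionaea function.

```python
import sqlite3
import time

# Assumed, simplified schema; the real logsql.py schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS connections (
    connection           INTEGER PRIMARY KEY,
    connection_type      TEXT,
    connection_timestamp REAL,
    connection_parent    INTEGER,
    connection_tree      INTEGER,
    local_host           TEXT,
    local_port           INTEGER,
    remote_host          TEXT,
    remote_port          INTEGER
)
"""

def log_connection(dbh, ctype, local, remote, parent=None, tree=None):
    """Insert one connection row and return its id (hypothetical helper)."""
    cur = dbh.execute(
        "INSERT INTO connections (connection_type, connection_timestamp,"
        " connection_parent, connection_tree,"
        " local_host, local_port, remote_host, remote_port)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (ctype, time.time(), parent, tree,
         local[0], local[1], remote[0], remote[1]),
    )
    dbh.commit()
    return cur.lastrowid

dbh = sqlite3.connect(":memory:")  # use a file path for a persistent log
dbh.execute(SCHEMA)
root = log_connection(dbh, "accept", ("10.69.53.52", 445), ("10.65.34.231", 2010))
```

Because everything goes through parameterized statements, attacker-controlled strings such as hostnames never end up in the SQL text itself.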

There is no fixed definition of useful information, so for now I decided to go for the things I want to see.

SQLite rocks

After adjusting the code to be able to indicate a relation between connections, I had to recompile dionaea's Python to support SQLite, and I updated the docs.
After verifying 'import sqlite3' worked, I started hacking the code to log incidents to the database.
Once some things got into the database, I wrote some queries to make use of the data.

attacked ports

Which ports received the most incoming connections:

SELECT 
	COUNT(local_port) AS hitcount, 
	local_port AS port
FROM 
	connections  
WHERE 
	connection_type = 'accept' 
GROUP BY 
	local_port 
HAVING 
	COUNT(local_port) > 10
hitcount  port
75        80
995       135
3982      445
29        1957
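
The same query can be run from Python. The following is only a sketch against an in-memory database with made-up rows; just the two columns the query touches are created, which is an assumption about the schema, and an ORDER BY is added so the busiest port comes first.

```python
import sqlite3

dbh = sqlite3.connect(":memory:")
dbh.execute("CREATE TABLE connections (connection_type TEXT, local_port INTEGER)")
# Made-up sample data: 12 hits on 445, 11 on 135, 2 on 80, plus listen rows.
rows = [("accept", 445)] * 12 + [("accept", 135)] * 11 + [("accept", 80)] * 2
rows += [("listen", 445)] * 20
dbh.executemany("INSERT INTO connections VALUES (?, ?)", rows)

QUERY = """
SELECT COUNT(local_port) AS hitcount, local_port AS port
FROM connections
WHERE connection_type = 'accept'
GROUP BY local_port
HAVING COUNT(local_port) > 10
ORDER BY hitcount DESC
"""
hits = dbh.execute(QUERY).fetchall()
for hitcount, port in hits:
    print("%8i %5i" % (hitcount, port))
```

Port 80 is dropped by the HAVING clause, and the listen rows are filtered out by the WHERE clause before counting.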

attacks over a day

Distribution of the attacks over a day:

SELECT 
	ROUND((connection_timestamp%(3600*24))/3600) AS hour,
	COUNT(*) 
FROM 
	connections 
WHERE 
	connection_parent IS NULL 
GROUP BY 
	ROUND((connection_timestamp%(3600*24))/3600);
hour  hits
0.0   20
1.0   140
2.0   220
3.0   352
4.0   299
5.0   42
6.0   200
12.0  283
13.0  302
14.0  227
15.0  647
16.0  351
17.0  264
18.0  274
19.0  398
20.0  285
21.0  304
22.0  329
23.0  142
24.0  229

As I do not have enough data for a full day yet, the hours 7-11 are missing.
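
A side note on the hour column: the expression takes the timestamp modulo one day, so the hours are in UTC, and the result is rounded, which is presumably why the table contains both 0.0 and 24.0. Here is a minimal sketch in Python of the difference between rounding and truncating, assuming connection_timestamp holds Unix time in seconds. `hour_rounded` and `hour_bucket` are illustrative names, not part of dionaea.

```python
def hour_rounded(ts):
    """Round seconds-since-midnight to the nearest hour, giving 0..24."""
    return round((ts % (3600 * 24)) / 3600)

def hour_bucket(ts):
    """Truncate instead, giving 24 clean buckets, 0..23."""
    return int(ts % (3600 * 24)) // 3600

late = 23 * 3600 + 45 * 60  # 23:45:00 UTC on 1970-01-01
print(hour_rounded(late), hour_bucket(late))
```

With rounding, a connection at 23:45 lands in bucket 24, while truncation keeps it in hour 23.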

popular malware downloads

Which files got downloaded most:

SELECT 
	COUNT(download_md5_hash), 
	download_md5_hash
FROM 
	downloads
GROUP BY
	download_md5_hash
ORDER BY
	COUNT(download_md5_hash) DESC
count(download_md5_hash)  download_md5_hash
38                        df51e3310ef609e908a6b487a28ac068
19                        14a09a48ad23fe0ea5a180bee8cb750a
6                         32b0a00ffb8799a7782a19ee99a0214f
5                         6e0a868563ee18fe6569796c7ce169be
3                         2fa0e36b36382b74e6e6a437ad664a80
3                         329c9d3163d9bf8f4bb9f0d1b846b6aa
3                         4f4f2ebc395b0fbea708acfc3930ec6d
2                         1d419d615dbe5a238bbaa569b3829a23
1                         27f96b0496502104cf0bfc01391e1408
1                         6c9d56535401bbd9052f499d05816ced
1                         a8640b9698542689880b7619dbde1293

busy attackers

Which host attacked us most:

SELECT 
	COUNT(remote_host),
	remote_host 
FROM 
	connections 
WHERE 
	connection_type = 'accept' 
GROUP BY 
	remote_host 
ORDER BY 	
	COUNT(remote_host) 
	DESC 
LIMIT 
	10;
COUNT(remote_host)  remote_host
1655                10.204.202.23
420                 10.2.101.193
234                 10.246.93.128
224                 10.208.119.223
120                 10.54.151.201
120                 10.129.95.105
120                 10.174.16.255
120                 10.234.207.36
120                 10.133.39.52
120                 10.31.104.74

how often did an attacker ask us to download a file

Count per remote IP address of the parent connection, for each file offered:

SELECT 
	count(*),
	download_md5_hash,
	remote_host 
FROM 
	connections 
NATURAL JOIN 
	downloads 
GROUP BY 
	download_md5_hash,remote_host
ORDER BY 
	download_md5_hash
	DESC
count(*)  download_md5_hash                 remote_host
9         df51e3310ef609e908a6b487a28ac068  10.252.215.142
13        df51e3310ef609e908a6b487a28ac068  10.241.224.0
1         df51e3310ef609e908a6b487a28ac068  10.224.252.47
7         df51e3310ef609e908a6b487a28ac068  10.142.138.33
2         df51e3310ef609e908a6b487a28ac068  10.145.185.98
1         df51e3310ef609e908a6b487a28ac068  10.18.166.63
2         df51e3310ef609e908a6b487a28ac068  10.87.54.69
3         df51e3310ef609e908a6b487a28ac068  10.179.162.134
1         a8640b9698542689880b7619dbde1293  10.72.12.70
2         6e0a868563ee18fe6569796c7ce169be  10.92.48.170
1         6e0a868563ee18fe6569796c7ce169be  10.234.235.205
1         6e0a868563ee18fe6569796c7ce169be  10.10.88.64
1         6e0a868563ee18fe6569796c7ce169be  10.1.82.35
1         6c9d56535401bbd9052f499d05816ced  10.184.18.233
1         4f4f2ebc395b0fbea708acfc3930ec6d  10.193.184.162
1         4f4f2ebc395b0fbea708acfc3930ec6d  10.214.246.53
1         4f4f2ebc395b0fbea708acfc3930ec6d  10.221.108.13
6         32b0a00ffb8799a7782a19ee99a0214f  10.224.252.47
1         329c9d3163d9bf8f4bb9f0d1b846b6aa  10.179.215.73
1         329c9d3163d9bf8f4bb9f0d1b846b6aa  10.141.47.91
1         329c9d3163d9bf8f4bb9f0d1b846b6aa  10.76.228.187
3         2fa0e36b36382b74e6e6a437ad664a80  10.197.169.135
1         27f96b0496502104cf0bfc01391e1408  10.29.109.144
2         1d419d615dbe5a238bbaa569b3829a23  10.65.34.231
1         14a09a48ad23fe0ea5a180bee8cb750a  10.65.34.231
2         14a09a48ad23fe0ea5a180bee8cb750a  10.29.109.144
4         14a09a48ad23fe0ea5a180bee8cb750a  10.241.224.0
1         14a09a48ad23fe0ea5a180bee8cb750a  10.224.252.47
2         14a09a48ad23fe0ea5a180bee8cb750a  10.142.138.33
3         14a09a48ad23fe0ea5a180bee8cb750a  10.193.67.162
2         14a09a48ad23fe0ea5a180bee8cb750a  10.145.185.98
3         14a09a48ad23fe0ea5a180bee8cb750a  10.87.54.69
1         14a09a48ad23fe0ea5a180bee8cb750a  10.179.162.134
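
A word on NATURAL JOIN: it joins two tables on every column name they have in common, which here is presumably just `connection`. The sketch below shows this on a tiny made-up dataset. The two-table schema and the short fake hashes are assumptions for illustration only.

```python
import sqlite3

dbh = sqlite3.connect(":memory:")
dbh.execute("CREATE TABLE connections (connection INTEGER PRIMARY KEY,"
            " remote_host TEXT)")
dbh.execute("CREATE TABLE downloads (download INTEGER PRIMARY KEY,"
            " connection INTEGER, download_md5_hash TEXT)")
dbh.executemany("INSERT INTO connections VALUES (?, ?)",
                [(1, "10.0.0.1"), (2, "10.0.0.1"), (3, "10.0.0.2")])
# Fake hashes; each download row points at the connection it arrived on.
dbh.executemany("INSERT INTO downloads VALUES (?, ?, ?)",
                [(1, 1, "aaaa"), (2, 2, "aaaa"), (3, 3, "bbbb")])

QUERY = """
SELECT COUNT(*), download_md5_hash, remote_host
FROM connections NATURAL JOIN downloads
GROUP BY download_md5_hash, remote_host
ORDER BY download_md5_hash DESC
"""
per_host = dbh.execute(QUERY).fetchall()
for count, md5, host in per_host:
    print(count, md5, host)
```

If the two tables ever gained a second column with the same name, NATURAL JOIN would silently join on that one too, so an explicit `JOIN ... ON` is the more defensive choice.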

internal bugfixing

Due to a bug in the logging (I'll take care of it, this is just to prevent complaints from early adopters), the next query required a fix first:

UPDATE 
	connections 
SET 
	connection_tree = connection 
WHERE 
	connection_tree IS NULL

distinct attackers per file

Number of distinct attackers per distinct malware file:

SELECT 
	download_md5_hash, 
	COUNT(DISTINCT tree.remote_host) 
FROM 
	downloads 
	NATURAL JOIN connections AS parent 
	JOIN connections AS tree ON (parent.connection == tree.connection_tree) 
GROUP BY 
	download_md5_hash;
download_md5_hash                 COUNT(DISTINCT tree.remote_host)
14a09a48ad23fe0ea5a180bee8cb750a  1
27f96b0496502104cf0bfc01391e1408  1
2fa0e36b36382b74e6e6a437ad664a80  1
329c9d3163d9bf8f4bb9f0d1b846b6aa  3
32b0a00ffb8799a7782a19ee99a0214f  1
4f4f2ebc395b0fbea708acfc3930ec6d  3
6c9d56535401bbd9052f499d05816ced  1
6e0a868563ee18fe6569796c7ce169be  4
a8640b9698542689880b7619dbde1293  1
df51e3310ef609e908a6b487a28ac068  8

multiple infections/attacking hosts offering different malware files

Now, let's see if we have a host offering more than one malicious file:

SELECT 
	tree.remote_host,
	COUNT(DISTINCT download_md5_hash) 
FROM 
	downloads 
	NATURAL JOIN connections AS parent 
	JOIN connections AS tree ON (parent.connection == tree.connection_tree) 
GROUP BY 
	tree.remote_host
HAVING
	COUNT(DISTINCT download_md5_hash) > 1;
remote_host    COUNT(DISTINCT download_md5_hash)
10.224.252.47  2

We have one, let's see which files …

SELECT 
	DISTINCT download_url,
	offer_url,
	download_md5_hash
FROM 
	downloads
	NATURAL JOIN offers 
	NATURAL JOIN connections AS parent 
	JOIN connections AS tree ON (parent.connection == tree.connection_tree) 
WHERE
	tree.remote_host = '10.224.252.47';
download_url                  offer_url                      download_md5_hash
fxp://10.224.252.47/ssms.exe  tfxp://10.224.252.47/ssms.exe  32b0a00ffb8799a7782a19ee99a0214f
fxp://10.224.252.47/ssms.exe  tfxp://10.224.252.47/ssms.exe  df51e3310ef609e908a6b487a28ac068

Obviously there is a bug: download URLs are reported as ftp URLs when they were tftp URLs initially … But the two different files come from the same location, so it is likely that the tftp transfer broke.
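
A quick way to spot affected rows is to compare the URL schemes of the offer and the download. This is just a sketch using the two defanged URLs from the result above; `scheme_mismatch` is a hypothetical helper.

```python
from urllib.parse import urlsplit

def scheme_mismatch(offer_url, download_url):
    """Return True when offer and download use different URL schemes."""
    return urlsplit(offer_url).scheme != urlsplit(download_url).scheme

offer = "tfxp://10.224.252.47/ssms.exe"
download = "fxp://10.224.252.47/ssms.exe"
print(scheme_mismatch(offer, download))
```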

popular download locations

Number of downloads by location:

SELECT 
	COUNT(*),
	download_url 
FROM 
	downloads 
GROUP BY 
	download_url 
ORDER BY 
	COUNT(*) 
	DESC;

As mentioned previously, tftp downloads get reported as ftp downloads …

COUNT(*)  download_url
13        fxp:://10.241.224.0/ssms.exe
9         fxp:://10.252.215.142/ssms.exe
7         fxp:://10.224.252.47/ssms.exe
7         fxp:://10.142.138.33/ssms.exe
5         hxxp:://10.181.184.21/i/1i2.zip
3         fxp:://10.197.169.135/ssms.exe
3         fxp:://10.179.162.134/ssms.exe
3         fxp:://1:1@10.241.224.0:64459/ssms.exe
3         fxp:://1:1@10.87.54.69:42751/ssms.exe
3         hxxp:://zonetech.info/58.exe
3         hxxp:://zonetech.info/61.exe
2         fxp:://10.145.185.98/ssms.exe
2         fxp:://10.87.54.69/ssms.exe
2         fxp:://1:1@10.65.34.231:8218/ssms.exe
2         fxp:://1:1@10.193.67.162:61062/ssms.exe
1         fxp:://10.65.34.231/ssms.exe
1         fxp:://10.29.109.144/ssms.exe
1         fxp:://10.18.166.63/ssms.exe
1         fxp:://1:1@10.29.109.144:52275/ssms.exe
1         fxp:://1:1@10.29.109.144:53336/ssms.exe
1         fxp:://1:1@10.241.224.0:43153/ssms.exe
1         fxp:://1:1@10.224.252.47:38794/ssms.exe
1         fxp:://1:1@10.142.138.33:12654/ssms.exe
1         fxp:://1:1@10.142.138.33:64375/ssms.exe
1         fxp:://1:1@10.193.67.162:41851/ssms.exe
1         fxp:://1:1@10.145.185.98:10721/ssms.exe
1         fxp:://1:1@10.145.185.98:16110/ssms.exe
1         fxp:://1:1@10.179.162.134:10901/ssms.exe
1         hxxp:://gutgal.com/sh.php
1         hxxp:://privcash.cc/r2.exe

addressed dcerpc calls

Interested in which dcerpc calls get attacked most?

SELECT 
	COUNT(*),
	dcerpc_uuid,
	dcerpc_opnum 
FROM 
	dcerpcs 
GROUP BY 
	dcerpc_uuid,
	dcerpc_opnum 
ORDER BY 
	COUNT(*) 
DESC;
COUNT(*)  dcerpc_uuid                           dcerpc_opnum
1002      12345778-1234-abcd-ef00-0123456789ac  62
511       3919286a-b10c-11d0-9ba8-00c04fd92ef5  9
131       000001a0-0000-0000-c000-000000000046  4
34        4d9f4ab8-7d1c-11cf-861e-0020af6e7c57  0
30        367abb81-9844-35f1-ad32-98f038001003  27
21        a0010000-0000-0000-c000-000000000046  4
12        4b324fc8-1670-01d3-1278-5a47bf6ee188  31
11        78573412-3412-cdab-ef00-0123456789ac  62
4         8d9f4e40-a03d-11ce-8f69-08003e30051b  54
1         6a281939-0cb1-d011-9ba8-00c04fd92ef5  9
1         c84f324b-7016-d301-1278-5a47bf6ee188  31

fingerprints

Using p0f, dionaea can collect fingerprints for incoming attacks, so let's play with the data …

operating system genre

Count of operating system genre:

SELECT 
	COUNT(*),
	p0f_genre 
FROM 
	p0fs 
GROUP BY 
	p0f_genre 
ORDER BY 
	COUNT(*) 
	DESC;
COUNT(*)  p0f_genre
4491      Windows
419
3         Linux

Windows dominates, roughly 9% (419 of 4913) unknown …

operating system

Split by operating system version:

SELECT 
	COUNT(*),
	p0f_genre,
	p0f_detail 
FROM 
	p0fs 
GROUP BY 
	p0f_genre,
	p0f_detail 
ORDER BY 
	COUNT(*) 
	DESC;
COUNT(*)  p0f_genre  p0f_detail
1539      Windows    XP/2000 (RFC1323+, w+, tstamp-)
1457      Windows    2000 SP2+, XP SP1+ (seldom 98)
709       Windows    XP SP1+, 2000 SP3
615       Windows    2000 SP4, XP SP1+
419
120       Windows    XP/2000 (RFC1323+, w, tstamp-)
49        Windows    2000 SP4, XP SP1+ (2)
3         Linux      2.6 (newer, 3)
2         Windows    2003 (1)

Obviously we are missing Windows Vista and maybe even Windows 7. I'm sure at least Vista got its cut of the attacks, but as the p0f fingerprints are rather outdated, there is nothing I can do about it. Once there are more recent fingerprints that identify Vista and Windows 7, they will show up.

ports attacked from linux hosts

But let's see which ports got hit by the Linux hosts:

SELECT 
	COUNT(*),
	local_port 
FROM 
	connections 
	NATURAL JOIN p0fs 
WHERE 
	p0f_genre = 'Linux' 
GROUP BY 
	local_port;
COUNT(*)  local_port
3         80

So, something was accessing the webserver on port 80.

attacks are trees

Given the chance to improve a horrible situation from the beginning, I wanted to be able to correlate a malware download with the initiating connection. Unfortunately, more than one connection can be involved before the malware is downloaded. As an example, take an exploit which spawns a bindshell, which accepts a connection, which triggers a download via ftp. So you have at least 3 connections, and they all belong together hierarchically.

As I wanted to store the information in an RDBMS, namely SQLite, I knew that RDBMSs do not support hierarchical structures that well: you can store information hierarchically, but querying it is hard. For example, to get the initiating connection for the malware download in the previous example, you'd have to walk the parents until there is no parent left. That's no real problem if you can access the database with a cursor, but SQLite does not support cursors. Therefore, it sounded reasonable to store the id of the uppermost parent for each connection.
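
To illustrate the trade-off, here is a sketch of the parent walk done from Python, next to the stored root id that makes it unnecessary. The three-column table is an assumed subset of the real schema, `find_root` is a hypothetical helper, and the connection ids are made up to mirror the exploit, bindshell, ftp example.

```python
import sqlite3

def find_root(dbh, connection):
    """Follow connection_parent upwards until a row without a parent is reached."""
    while True:
        row = dbh.execute(
            "SELECT connection_parent FROM connections WHERE connection = ?",
            (connection,),
        ).fetchone()
        if row is None or row[0] is None or row[0] == connection:
            return connection
        connection = row[0]

dbh = sqlite3.connect(":memory:")
dbh.execute("CREATE TABLE connections (connection INTEGER PRIMARY KEY,"
            " connection_parent INTEGER, connection_tree INTEGER)")
# exploit -> bindshell listen -> shell accept -> ftp control connection
chain = [(610, None), (611, 610), (612, 611), (613, 612)]
for conn, parent in chain:
    # Storing the root id at insert time saves the walk at query time.
    tree = find_root(dbh, parent) if parent is not None else conn
    dbh.execute("INSERT INTO connections VALUES (?, ?, ?)", (conn, parent, tree))
```

The walk costs one query per level of the tree, whereas reading the stored connection_tree column is a single lookup regardless of depth.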

I'll try to explain how to accomplish this in Python.

The first snippet resolves a Python DB-API cursor result, so you get a list of dicts and can access the items by name instead of by index. It will break if you have multiple columns with the same name, but as long as you take care to form your queries well, it works fine.

def resolve_result(resultcursor):
	# the column name is the first item of each cursor.description entry
	names = [column[0] for column in resultcursor.description]
	# pair every row with the column names to get one dict per row
	resolvedresult = [dict(zip(names, row)) for row in resultcursor]
	return resolvedresult
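
As an aside, Python's sqlite3 module ships a row factory, `sqlite3.Row`, that gives similar name-based access without a helper. A brief sketch; the table and values are made up:

```python
import sqlite3

dbh = sqlite3.connect(":memory:")
dbh.row_factory = sqlite3.Row  # rows now support access by column name
dbh.execute("CREATE TABLE connections (connection INTEGER, local_port INTEGER)")
dbh.execute("INSERT INTO connections VALUES (610, 445)")

row = dbh.execute("SELECT * FROM connections").fetchone()
print(row["connection"], row["local_port"])
# Converting to a plain dict gives the same shape resolve_result() produces.
as_dict = dict(zip(row.keys(), row))
```

The snippets in this text keep using resolve_result(), so either approach works with them as long as the rows end up addressable by column name.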

Next, some formatting to print a connection, depending on type:

def print_connection(c, indent):
	if c['connection_type'] == 'accept':
		print("%*s connection %i %s %s %s %s:%i <- %s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port'], c['remote_host'], c['remote_port']) )
	elif c['connection_type'] == 'connect':
		print("%*s connection %i %s %s %s %s:%i -> %s/%s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port'], c['remote_hostname'], c['remote_host'], c['remote_port']) )
	elif c['connection_type'] == 'listen':
		print("%*s connection %i %s %s %s %s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port']) )

Now, we open the database and retrieve the root-connections. Root-connections initiate an attack, so they have no parent-connection.

import sqlite3

dbh = sqlite3.connect("/tmp/test.sqlite")
cursor = dbh.cursor()

result = cursor.execute("SELECT * FROM connections WHERE connection_tree = connection OR connection_tree IS NULL")
connections = resolve_result(result)

Now we can iterate through the connections and print each connection's information:

for c in connections:
	connection = c['connection']
	print_connection(c, 1)

But we want to print the child-connections too, and the possible child-child-connection, and possible child-child-child …, so we will use recursion:

def recursive_print(cursor, connection, indent):
	result = cursor.execute("SELECT * from connections WHERE connection_parent = ?", (connection, ))
	connections = resolve_result(result)
	for c in connections:
		if c['connection'] == connection:
			continue
		print_connection(c, indent)
		recursive_print(cursor, c['connection'], indent+2)

Then we change the loop over all root connections to print the child connections too:

for c in connections:
	connection = c['connection']
	print_connection(c, 1)
	recursive_print(cursor, c['connection'], 2)

This already gives good results:

  connection 610 smbd tcp accept 10.69.53.52:445 <- 10.65.34.231:2010
   connection 611 remoteshell tcp listen 10.69.53.52:1957
     connection 612 remoteshell tcp accept 10.69.53.52:1957 <- 10.65.34.231:2135
       connection 613 ftpctrl tcp connect 10.69.53.52:37065 -> 10.65.34.231/None:8218
         connection 614 ftpdata tcp listen 10.69.53.52:62087
           connection 615 ftpdata tcp accept 10.69.53.52:62087 <- 10.65.34.231:2308

After adding code to print all information we have for each connection, it looks like this:

  connection 610 smbd tcp accept 10.69.53.52:445 <- 10.65.34.231:2010
   dcerpc request: uuid '3919286a-b10c-11d0-9ba8-00c04fd92ef5' opnum 9
   p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos:'' dist:'11' nat:'0' fw:'0'
   profile: [{'return': '0x7c802367', 'args': ['', 'CreateProcessA'], 'call': 'GetProcAddress'}, ...., {'return': '0', 'args': ['0'], 'call': 'ExitThread'}]
   service: bindshell://1957
   connection 611 remoteshell tcp listen 10.69.53.52:1957
     connection 612 remoteshell tcp accept 10.69.53.52:1957 <- 10.65.34.231:2135
       p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos:'' dist:'11' nat:'0' fw:'0'
       offer: fxp://1:1@10.65.34.231:8218/ssms.exe
       download: 1d419d615dbe5a238bbaa569b3829a23 fxp://1:1@10.65.34.231:8218/ssms.exe
       connection 613 ftpctrl tcp connect 10.69.53.52:37065 -> 10.65.34.231/None:8218
         connection 614 ftpdata tcp listen 10.69.53.52:62087
           connection 615 ftpdata tcp accept 10.69.53.52:62087 <- 10.65.34.231:2308
             p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos:'' dist:'11' nat:'0' fw:'0'
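
Newer SQLite releases (3.8.3 and later) support recursive common table expressions, so the tree can also be fetched in a single statement. The following is a sketch of that alternative to recursive_print, not what dionaea does; the two-column table and the rows are made up.

```python
import sqlite3

dbh = sqlite3.connect(":memory:")
dbh.execute("CREATE TABLE connections (connection INTEGER PRIMARY KEY,"
            " connection_parent INTEGER)")
dbh.executemany("INSERT INTO connections VALUES (?, ?)",
                [(610, None), (611, 610), (612, 611), (613, 612), (700, None)])

TREE_QUERY = """
WITH RECURSIVE tree(connection, depth) AS (
    SELECT connection, 0 FROM connections WHERE connection = ?
    UNION ALL
    SELECT c.connection, tree.depth + 1
    FROM connections AS c JOIN tree ON c.connection_parent = tree.connection
)
SELECT connection, depth FROM tree ORDER BY depth, connection
"""
subtree = dbh.execute(TREE_QUERY, (610,)).fetchall()
for connection, depth in subtree:
    # indent by depth, in the same style as print_connection
    print("%*s connection %i" % (1 + 2 * depth, " ", connection))
```

Ordering by depth yields level order, which matches the nested printout only for a simple chain like this one; a branching tree would need an extra path column to sort children under their parents.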

So, if you want a GUI for the honeypot, you could use the internal http service and script the web interface in Python within the honeypot itself, or create static HTML pages using a cron job and serve them, or push them somewhere else.
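
As a rough sketch of the static-page variant, a script run from cron could render a query result as an HTML table. The function name `render_table`, the two-column table, and the sample rows are made up for illustration.

```python
import html
import sqlite3

def render_table(dbh, query, title):
    """Run a query and return the result as a small HTML page (a string)."""
    cur = dbh.execute(query)
    header = "".join("<th>%s</th>" % html.escape(d[0]) for d in cur.description)
    body = "".join(
        "<tr>%s</tr>" % "".join("<td>%s</td>" % html.escape(str(v)) for v in row)
        for row in cur
    )
    return "<html><body><h1>%s</h1><table><tr>%s</tr>%s</table></body></html>" % (
        html.escape(title), header, body)

dbh = sqlite3.connect(":memory:")  # in practice: the logsql database file
dbh.execute("CREATE TABLE connections (connection_type TEXT, remote_host TEXT)")
dbh.executemany("INSERT INTO connections VALUES (?, ?)",
                [("accept", "10.0.0.1"), ("accept", "10.0.0.1"),
                 ("accept", "10.0.0.2")])

page = render_table(
    dbh,
    "SELECT COUNT(*) AS hits, remote_host FROM connections"
    " WHERE connection_type = 'accept' GROUP BY remote_host ORDER BY hits DESC",
    "busy attackers",
)
# A cron job would write `page` to a file the web server can serve.
```

Every value is passed through html.escape, since fields such as hostnames and URLs come from the attacker and should not be trusted in a web page.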

Sidenote
I changed the addresses to protect the decoy and the attackers.