Nepenthes had awful logging, huge logfiles, pretty useless for most people. Some people even started writing parsers for the logfiles to extract&convert the usefull information for use in a database.
For dionaea, I decided to stick with awful logging to textfiles, but provide a useful alternative which is easy to setup and maintain, feature rich and allows retrieving information in a useful way, so you don't have to grep.
Therefore, SQLite is used to write usefull information down to disk in the logsql.py script.
I know, SQLite is not PostgreSQL, PostgreSQL is superior in many ways, but it requires some more steps to setup, where SQLite just works out of the box.
SQLite does not support concurrency, but as dionaea does not access the database simulaneaously, there were no problems with database-concurrency.
On the other hand, if it works with SQLite, it will work with PostgreSQL too, all you'll have to do is adjust some things.
The definition of useful information is undefined, therefore I decided to go for things I want to see for now:
connections
exploits
malware offers
malware downloads
After adjusting the code, to be able to indicate a relation between connections, I had to recompile dionaea's python to support sqlite, and updated the docs.
After verifying 'import sqlite3' worked, I started hacking the code to log incidents to the database.
Once some things got into the database, I wrote some queries to make use of the data.
Which ports got the most connections incoming:
SELECT
COUNT(local_port) AS hitcount,
local_port AS port
FROM
connections
WHERE
connection_type = 'accept'
GROUP BY
local_port
HAVING
COUNT(local_port) > 10
| hitcount | port |
| 75 | 80 |
| 995 | 135 |
| 3982 | 445 |
| 29 | 1957 |
Distribution of the attacks over a day:
SELECT
ROUND((connection_timestamp%(3600*24))/3600) AS hour,
COUNT(*)
FROM
connections
WHERE
connection_parent IS NULL
GROUP BY
ROUND((connection_timestamp%(3600*24))/3600);
| hour | hits |
| 0.0 | 20 |
| 1.0 | 140 |
| 2.0 | 220 |
| 3.0 | 352 |
| 4.0 | 299 |
| 5.0 | 42 |
| 6.0 | 200 |
| 12.0 | 283 |
| 13.0 | 302 |
| 14.0 | 227 |
| 15.0 | 647 |
| 16.0 | 351 |
| 17.0 | 264 |
| 18.0 | 274 |
| 19.0 | 398 |
| 20.0 | 285 |
| 21.0 | 304 |
| 22.0 | 329 |
| 23.0 | 142 |
| 24.0 | 229 |
As I do not have enough data for a single day yet, we miss the hours 7-11.
Which files got downloaded most:
SELECT
COUNT(download_md5_hash),
download_md5_hash
FROM
downloads
GROUP BY
download_md5_hash
ORDER BY
COUNT(download_md5_hash) DESC
| count(download_md5_hash) | download_md5_hash |
| 38 | df51e3310ef609e908a6b487a28ac068 |
| 19 | 14a09a48ad23fe0ea5a180bee8cb750a |
| 6 | 32b0a00ffb8799a7782a19ee99a0214f |
| 5 | 6e0a868563ee18fe6569796c7ce169be |
| 3 | 2fa0e36b36382b74e6e6a437ad664a80 |
| 3 | 329c9d3163d9bf8f4bb9f0d1b846b6aa |
| 3 | 4f4f2ebc395b0fbea708acfc3930ec6d |
| 2 | 1d419d615dbe5a238bbaa569b3829a23 |
| 1 | 27f96b0496502104cf0bfc01391e1408 |
| 1 | 6c9d56535401bbd9052f499d05816ced |
| 1 | a8640b9698542689880b7619dbde1293 |
Which host attacked us most:
SELECT
COUNT(remote_host),
remote_host
FROM
connections
WHERE
connection_type = 'accept'
GROUP BY
remote_host
ORDER BY
COUNT(remote_host)
DESC
LIMIT
10;
| COUNT(remote_host) | remote_host |
| 1655 | 10.204.202.23 |
| 420 | 10.2.101.193 |
| 234 | 10.246.93.128 |
| 224 | 10.208.119.223 |
| 120 | 10.54.151.201 |
| 120 | 10.129.95.105 |
| 120 | 10.174.16.255 |
| 120 | 10.234.207.36 |
| 120 | 10.133.39.52 |
| 120 | 10.31.104.74 |
Count of parent connections remote ip address for each file offerd:
SELECT
count(*),
download_md5_hash,
remote_host
FROM
connections
NATURAL JOIN
downloads
GROUP BY
download_md5_hash,remote_host
ORDER BY
download_md5_hash
DESC
| count(*) | download_md5_hash | remote_host |
| 9 | df51e3310ef609e908a6b487a28ac068 | 10.252.215.142 |
| 13 | df51e3310ef609e908a6b487a28ac068 | 10.241.224.0 |
| 1 | df51e3310ef609e908a6b487a28ac068 | 10.224.252.47 |
| 7 | df51e3310ef609e908a6b487a28ac068 | 10.142.138.33 |
| 2 | df51e3310ef609e908a6b487a28ac068 | 10.145.185.98 |
| 1 | df51e3310ef609e908a6b487a28ac068 | 10.18.166.63 |
| 2 | df51e3310ef609e908a6b487a28ac068 | 10.87.54.69 |
| 3 | df51e3310ef609e908a6b487a28ac068 | 10.179.162.134 |
| 1 | a8640b9698542689880b7619dbde1293 | 10.72.12.70 |
| 2 | 6e0a868563ee18fe6569796c7ce169be | 10.92.48.170 |
| 1 | 6e0a868563ee18fe6569796c7ce169be | 10.234.235.205 |
| 1 | 6e0a868563ee18fe6569796c7ce169be | 10.10.88.64 |
| 1 | 6e0a868563ee18fe6569796c7ce169be | 10.1.82.35 |
| 1 | 6c9d56535401bbd9052f499d05816ced | 10.184.18.233 |
| 1 | 4f4f2ebc395b0fbea708acfc3930ec6d | 10.193.184.162 |
| 1 | 4f4f2ebc395b0fbea708acfc3930ec6d | 10.214.246.53 |
| 1 | 4f4f2ebc395b0fbea708acfc3930ec6d | 10.221.108.13 |
| 6 | 32b0a00ffb8799a7782a19ee99a0214f | 10.224.252.47 |
| 1 | 329c9d3163d9bf8f4bb9f0d1b846b6aa | 10.179.215.73 |
| 1 | 329c9d3163d9bf8f4bb9f0d1b846b6aa | 10.141.47.91 |
| 1 | 329c9d3163d9bf8f4bb9f0d1b846b6aa | 10.76.228.187 |
| 3 | 2fa0e36b36382b74e6e6a437ad664a80 | 10.197.169.135 |
| 1 | 27f96b0496502104cf0bfc01391e1408 | 10.29.109.144 |
| 2 | 1d419d615dbe5a238bbaa569b3829a23 | 10.65.34.231 |
| 1 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.65.34.231 |
| 2 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.29.109.144 |
| 4 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.241.224.0 |
| 1 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.224.252.47 |
| 2 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.142.138.33 |
| 3 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.193.67.162 |
| 2 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.145.185.98 |
| 3 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.87.54.69 |
| 1 | 14a09a48ad23fe0ea5a180bee8cb750a | 10.179.162.134 |
Due to a bug in the logging (I'll take care, just to prevent complains for early adaptors), the next query required a fix:
UPDATE
connections
SET
connection_tree = connection
WHERE
connection_tree IS NULL
Number of distinct attackers per distinct malware file:
SELECT
download_md5_hash,
COUNT(DISTINCT tree.remote_host)
FROM
downloads
NATURAL JOIN connections AS parent
JOIN connections AS tree ON (parent.connection == tree.connection_tree)
GROUP BY
download_md5_hash;
| download_md5_hash | COUNT(DISTINCT tree.remote_host) |
| 14a09a48ad23fe0ea5a180bee8cb750a | 1 |
| 27f96b0496502104cf0bfc01391e1408 | 1 |
| 2fa0e36b36382b74e6e6a437ad664a80 | 1 |
| 329c9d3163d9bf8f4bb9f0d1b846b6aa | 3 |
| 32b0a00ffb8799a7782a19ee99a0214f | 1 |
| 4f4f2ebc395b0fbea708acfc3930ec6d | 3 |
| 6c9d56535401bbd9052f499d05816ced | 1 |
| 6e0a868563ee18fe6569796c7ce169be | 4 |
| a8640b9698542689880b7619dbde1293 | 1 |
| df51e3310ef609e908a6b487a28ac068 | 8 |
Now, lets see if we have a host offering more than one malicious file:
SELECT
tree.remote_host,
COUNT(DISTINCT download_md5_hash)
FROM
downloads
NATURAL JOIN connections AS parent
JOIN connections AS tree ON (parent.connection == tree.connection_tree)
GROUP BY
tree.remote_host
HAVING
COUNT(DISTINCT download_md5_hash) > 1;
| remote_host | COUNT(DISTINCT download_md5_hash) |
| 10.224.252.47 | 2 |
We have one, lets see which files …
SELECT
DISTINCT download_url,
offer_url,
download_md5_hash
FROM
downloads
NATURAL JOIN offers
NATURAL JOIN connections AS parent
JOIN connections AS tree ON (parent.connection == tree.connection_tree)
WHERE
tree.remote_host = '10.224.252.47';
| download_url | offer_url | download_md5_hash |
| fxp://10.224.252.47/ssms.exe | tfxp://10.224.252.47/ssms.exe | 32b0a00ffb8799a7782a19ee99a0214f |
| fxp://10.224.252.47/ssms.exe | tfxp://10.224.252.47/ssms.exe | df51e3310ef609e908a6b487a28ac068 |
Obviously there is a bug, reporting download urls as ftp urls, when they were tftp urls initially …
But, the 2 different files come from the same location, so it is likely the tftp transfer broke.
Number of downloads by location:
SELECT
COUNT(*),
download_url
FROM
downloads
GROUP BY
download_url
ORDER BY
COUNT(*)
DESC;
As mentioned previously, tftp downloads get reported as ftp downloads …
| COUNT(*) | download_url |
| 13 | fxp:://10.241.224.0/ssms.exe |
| 9 | fxp:://10.252.215.142/ssms.exe |
| 7 | fxp:://10.224.252.47/ssms.exe |
| 7 | fxp:://10.142.138.33/ssms.exe |
| 5 | hxxp:://10.181.184.21/i/1i2.zip |
| 3 | fxp:://10.197.169.135/ssms.exe |
| 3 | fxp:://10.179.162.134/ssms.exe |
| 3 | fxp:://1:1@10.241.224.0:64459/ssms.exe |
| 3 | fxp:://1:1@10.87.54.69:42751/ssms.exe |
| 3 | hxxp:://zonetech.info/58.exe |
| 3 | hxxp:://zonetech.info/61.exe |
| 2 | fxp:://10.145.185.98/ssms.exe |
| 2 | fxp:://10.87.54.69/ssms.exe |
| 2 | fxp:://1:1@10.65.34.231:8218/ssms.exe |
| 2 | fxp:://1:1@10.193.67.162:61062/ssms.exe |
| 1 | fxp:://10.65.34.231/ssms.exe |
| 1 | fxp:://10.29.109.144/ssms.exe |
| 1 | fxp:://10.18.166.63/ssms.exe |
| 1 | fxp:://1:1@10.29.109.144:52275/ssms.exe |
| 1 | fxp:://1:1@10.29.109.144:53336/ssms.exe |
| 1 | fxp:://1:1@10.241.224.0:43153/ssms.exe |
| 1 | fxp:://1:1@10.224.252.47:38794/ssms.exe |
| 1 | fxp:://1:1@10.142.138.33:12654/ssms.exe |
| 1 | fxp:://1:1@10.142.138.33:64375/ssms.exe |
| 1 | fxp:://1:1@10.193.67.162:41851/ssms.exe |
| 1 | fxp:://1:1@10.145.185.98:10721/ssms.exe |
| 1 | fxp:://1:1@10.145.185.98:16110/ssms.exe |
| 1 | fxp:://1:1@10.179.162.134:10901/ssms.exe |
| 1 | hxxp:://gutgal.com/sh.php |
| 1 | hxxp:://privcash.cc/r2.exe |
Interested which dcerpc calls get attacked most?
SELECT
COUNT(*),
dcerpc_uuid,
dcerpc_opnum
FROM
dcerpcs
GROUP BY
dcerpc_uuid,
dcerpc_opnum
ORDER BY
COUNT(*)
DESC;
| COUNT(*) | dcerpc_uuid | dcerpc_opnum |
| 1002 | 12345778-1234-abcd-ef00-0123456789ac | 62 |
| 511 | 3919286a-b10c-11d0-9ba8-00c04fd92ef5 | 9 |
| 131 | 000001a0-0000-0000-c000-000000000046 | 4 |
| 34 | 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57 | 0 |
| 30 | 367abb81-9844-35f1-ad32-98f038001003 | 27 |
| 21 | a0010000-0000-0000-c000-000000000046 | 4 |
| 12 | 4b324fc8-1670-01d3-1278-5a47bf6ee188 | 31 |
| 11 | 78573412-3412-cdab-ef00-0123456789ac | 62 |
| 4 | 8d9f4e40-a03d-11ce-8f69-08003e30051b | 54 |
| 1 | 6a281939-0cb1-d011-9ba8-00c04fd92ef5 | 9 |
| 1 | c84f324b-7016-d301-1278-5a47bf6ee188 | 31 |
Using p0f, dionaea can collect fingerprints for incoming attacks, lets play with the data …
Count of operating system genre:
SELECT
COUNT(*),
p0f_genre
FROM
p0fs
GROUP BY
p0f_genre
ORDER BY
COUNT(*)
DESC;
| COUNT(*) | p0f_genre |
| 4491 | Windows |
| 419 | |
| 3 | Linux |
Windows dominates, 10% unknown …
Split by operating system version:
SELECT
COUNT(*),
p0f_genre,
p0f_detail
FROM
p0fs
GROUP BY
p0f_genre,
p0f_detail
ORDER BY
COUNT(*)
DESC;
| COUNT(*) | p0f_genre | p0f_detail |
| 1539 | Windows | XP/2000 (RFC1323+, w+, tstamp-) |
| 1457 | Windows | 2000 SP2+, XP SP1+ (seldom 98) |
| 709 | Windows | XP SP1+, 2000 SP3 |
| 615 | Windows | 2000 SP4, XP SP1+ |
| 419 | | |
| 120 | Windows | XP/2000 (RFC1323+, w, tstamp-) |
| 49 | Windows | 2000 SP4, XP SP1+ (2) |
| 3 | Linux | 2.6 (newer, 3) |
| 2 | Windows | 2003 (1) |
Obviously we are missing Windows Vista and maybe even Windows 7, I'm sure at least Vista got it's cut on the attacks, but as p0f fingerprints are rather outdated, there is nothing I can do about it.
But, once there are more recent fingerprints, identifying Vista and Windows 7, they will show up.
But, lets see which ports got hit by the Linux hosts:
SELECT
COUNT(*),
local_port
FROM
connections
NATURAL JOIN p0fs
WHERE
p0f_genre = 'Linux'
GROUP BY
local_port;
So, something was accessing the webserver on port 80.
Given the possibility to improve a horrible situation from the beginning, I wanted to be able to correlate a malware download with the initiating connection.
Unfortunately there can be more than one connection involved before downloading the malware, as an example take an exploit, which spawns a bindshell, which accepts a connection, which triggers a download via ftp.
So, you have at least 3 connections, and all belong to each other hierarchical,
So, as I wanted to store the information in rdbms, namely SQLite, I knew rdbms do not support hierarchical structures that good, you can store information hierarchical, but querying is hard. For example to get the initiating connection for the malware download in previous example you'd have to walk the parents until there is no parent left.
Thats no real problem, if you can access the database with a cursor, but SQLite does not support cursors.
Therefore, it sounded reasonable to store the id of the upper-most parent for each connection.
I'll try to explain how to accomplish this in python.
The first snippet resolves a python dbi cursor result, so you get a list of dicts, and can access the items by name instead of an index.
It will break if you have multiple columns with the same name, but as long as you take care of forming your queries well, it works fine.
def resolve_result(resultcursor):
names = [resultcursor.description[x][0] for x in range(len(resultcursor.description))]
resolvedresult = [ dict(zip(names, i)) for i in resultcursor]
return resolvedresult
Next, some formatting to print a connection, depending on type:
def print_connection(c, indent):
if c['connection_type'] == 'accept':
print("%*s connection %i %s %s %s %s:%i <- %s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port'], c['remote_host'], c['remote_port']) )
elif c['connection_type'] == 'connect':
print("%*s connection %i %s %s %s %s:%i -> %s/%s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port'], c['remote_hostname'], c['remote_host'], c['remote_port']) )
elif c['connection_type'] == 'listen':
print("%*s connection %i %s %s %s %s:%i" % ( indent, " ", c['connection'], c['connection_protocol'], c['connection_transport'], c['connection_type'], c['local_host'], c['local_port']) )
Now, we open the database and retrieve the root-connections. Root-connections initiate an attack, so they have no parent-connection.
dbh = sqlite3.connect("/tmp/test.sqlite")
cursor = dbh.cursor()
result = cursor.execute("SELECT * from connections WHERE connection_tree = connection OR connection_tree IS NULL ")
connections = resolve_result(result)
Now we can iterate through the connections, and print each connections information:
for c in connections:
connection = c['connection']
print_connection(c, 1)
But we want to print the child-connections too, and the possible child-child-connection, and possible child-child-child …, so we will use recursion:
def recursive_print(cursor, connection, indent):
result = cursor.execute("SELECT * from connections WHERE connection_parent = ?", (connection, ))
connections = resolve_result(result)
for c in connections:
if c['connection'] == connection:
continue
print_connection(c, indent)
recursive_print(cursor, c['connection'], indent+2)
and change the loop on all root connections to print the child-connections too:
for c in connections:
connection = c['connection']
print_connection(c, 1)
recursive_print(cursor, c['connection'], 2)
This already gives good results:
connection 610 smbd tcp accept 10.69.53.52:445 <- 10.65.34.231:2010
connection 611 remoteshell tcp listen 10.69.53.52:1957
connection 612 remoteshell tcp accept 10.69.53.52:1957 <- 10.65.34.231:2135
connection 613 ftpctrl tcp connect 10.69.53.52:37065 -> 10.65.34.231/None:8218
connection 614 ftpdata tcp listen 10.69.53.52:62087
connection 615 ftpdata tcp accept 10.69.53.52:62087 <- 10.65.34.231:2308
After adding code to print all information we have for each connection, it looks like this:
connection 610 smbd tcp accept 10.69.53.52:445 <- 10.65.34.231:2010
dcerpc request: uuid '3919286a-b10c-11d0-9ba8-00c04fd92ef5' opnum 9
p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos:'' dist:'11' nat:'0' fw:'0'
profile: [{'return': '0x7c802367', 'args': ['', 'CreateProcessA'], 'call': 'GetProcAddress'}, ...., {'return': '0', 'args': ['0'], 'call': 'ExitThread'}]
service: bindshell://1957
connection 611 remoteshell tcp listen 10.69.53.52:1957
connection 612 remoteshell tcp accept 10.69.53.52:1957 <- 10.65.34.231:2135
p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos:'' dist:'11' nat:'0' fw:'0'
offer: fxp://1:1@10.65.34.231:8218/ssms.exe
download: 1d419d615dbe5a238bbaa569b3829a23 fxp://1:1@10.65.34.231:8218/ssms.exe
connection 613 ftpctrl tcp connect 10.69.53.52:37065 -> 10.65.34.231/None:8218
connection 614 ftpdata tcp listen 10.69.53.52:62087
connection 615 ftpdata tcp accept 10.69.53.52:62087 <- 10.65.34.231:2308
p0f: genre:'Windows' detail:'XP SP1+, 2000 SP3' uptime:'-1' tos:'' dist:'11' nat:'0' fw:'0'
So, if you want to have a gui for the honeypot, you could use the internal http service and script the webinterface in python, within the honeypot itself, or create static html pages using a cron job and serve them, or push them somewhere else.
Sidenote
I changed the addresses, protecting the decoy&attackers.
Hi,
I have problem on p0fs. I have enabled in config file /opt/dionaea/etc/dionaea/dionaea.conf
But I found error in log file if path config =un:///tmp/p0f.sock
However,if path config =/tmp/p0f.sock , I found some connection made.
Then I use sqlite3 to check the connections table, it show some records successfully sqlite> select * from connections where connection_protocol like '%p0f%';
But, I checked the p0fs tables, not record found. sqlite> select * from p0fs; not record found
Is any module or config missed, would you help me to solve it? Thx
Edited formatting