Software at carnivore.it

dionaea

nepenthes

libemu

nebula

liblcfg


git-daemon logfile processing

dionaea git usage
The git-daemon activity for the dionaea.git repository, shown as pulls and unique hosts per day. Basically, 5-10 users update their software daily.

Often the most complex part of data visualization is the processing required before you can provide the data in a format your visualization software understands.
I chose the git-daemon logs as an example of such a case.
One could have used sshd logs as an example too, but I chose these because I'm pretty sure there is no existing parser for git-daemon logfiles. If in doubt, I'm fairly confident one could adjust this git-daemon parser to deal with sshd logs too.
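A minimal sketch of this kind of preprocessing, assuming git-daemon logs through syslog and writes one `Connection from <ip>:<port>` message per client. The exact line layout depends on the syslog configuration, so the regular expression and the function name `daily_activity` are assumptions for illustration, not the parser described in the post. Connections are used here as a rough stand-in for pulls.

```python
import re
from collections import defaultdict

# Assumed line layout (hypothetical, adjust to your syslog setup):
#   2010-06-14T10:12:01 srv git-daemon[1234]: Connection from 192.0.2.10:51234
LINE_RE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2})T\S+\s+\S+\s+git-daemon\[\d+\]:\s+"
    r"Connection from (?P<host>\S+):\d+$"
)


def daily_activity(lines):
    """Return {day: (connections, unique_hosts)} for all matching lines."""
    connections = defaultdict(int)
    hosts = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a connection line, skip it
        day = match.group("day")
        connections[day] += 1
        hosts[day].add(match.group("host"))
    return {day: (connections[day], len(hosts[day])) for day in sorted(connections)}
```

The resulting dictionary can be written out as one row per day, which is the shape most plotting tools expect.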

python3 - ctypes

ctypes

If you want to use a native library in python, but there is no binding, you can 'try' to interface with the library using ctypes.

I wanted to play with bpf, which is part of libpcap, and libpcap lacks a python3 binding, so I decided to try ctypes.

What I wanted to do:

  • compile a bpf filter like dst port 445 and src net 127.0.0.0/8
  • match the bpf filter on a buffer
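The two steps above can be sketched with ctypes roughly as follows. This is not the code from the post: it assumes a libpcap that provides `pcap_open_dead`, `pcap_compile` and `pcap_offline_filter`, an Ethernet link type, and a `struct timeval` made of two C longs (true on common 64-bit Unix systems, but an assumption). The helper names `load_libpcap`, `compile_filter`, `matches` and `build_tcp_packet` are made up for the example.

```python
import ctypes
import ctypes.util
import socket
import struct

DLT_EN10MB = 1                     # Ethernet link type
PCAP_NETMASK_UNKNOWN = 0xFFFFFFFF  # fine here, the filter has no broadcast test


class bpf_insn(ctypes.Structure):
    _fields_ = [("code", ctypes.c_ushort), ("jt", ctypes.c_ubyte),
                ("jf", ctypes.c_ubyte), ("k", ctypes.c_uint32)]


class bpf_program(ctypes.Structure):
    _fields_ = [("bf_len", ctypes.c_uint),
                ("bf_insns", ctypes.POINTER(bpf_insn))]


class timeval(ctypes.Structure):
    # Assumption: both members are C longs on this platform.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


class pcap_pkthdr(ctypes.Structure):
    _fields_ = [("ts", timeval), ("caplen", ctypes.c_uint32),
                ("len", ctypes.c_uint32)]


def load_libpcap():
    """Load libpcap and declare the prototypes we use; None if unavailable."""
    name = ctypes.util.find_library("pcap")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    prog_p = ctypes.POINTER(bpf_program)
    lib.pcap_open_dead.argtypes = [ctypes.c_int, ctypes.c_int]
    lib.pcap_open_dead.restype = ctypes.c_void_p
    lib.pcap_compile.argtypes = [ctypes.c_void_p, prog_p, ctypes.c_char_p,
                                 ctypes.c_int, ctypes.c_uint32]
    lib.pcap_compile.restype = ctypes.c_int
    lib.pcap_geterr.argtypes = [ctypes.c_void_p]
    lib.pcap_geterr.restype = ctypes.c_char_p
    lib.pcap_offline_filter.argtypes = [prog_p, ctypes.POINTER(pcap_pkthdr),
                                        ctypes.c_char_p]
    lib.pcap_offline_filter.restype = ctypes.c_int
    lib.pcap_freecode.argtypes = [prog_p]
    lib.pcap_freecode.restype = None
    lib.pcap_close.argtypes = [ctypes.c_void_p]
    lib.pcap_close.restype = None
    return lib


def compile_filter(lib, expression):
    """Compile a filter expression; free the result with pcap_freecode."""
    handle = lib.pcap_open_dead(DLT_EN10MB, 65535)
    if not handle:
        raise RuntimeError("pcap_open_dead failed")
    prog = bpf_program()
    try:
        if lib.pcap_compile(handle, ctypes.byref(prog), expression.encode(),
                            1, PCAP_NETMASK_UNKNOWN) != 0:
            raise ValueError(lib.pcap_geterr(handle).decode(errors="replace"))
    finally:
        lib.pcap_close(handle)
    return prog


def matches(lib, prog, packet):
    """Run the compiled filter against a bytes buffer holding one frame."""
    hdr = pcap_pkthdr()
    hdr.caplen = hdr.len = len(packet)
    return lib.pcap_offline_filter(ctypes.byref(prog), ctypes.byref(hdr),
                                   packet) != 0


def build_tcp_packet(src_ip, dst_ip, dst_port, src_port=40000):
    """Build a bare Ethernet/IPv4/TCP frame (checksums left at zero)."""
    eth = b"\x00" * 12 + b"\x08\x00"
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64,
                     socket.IPPROTO_TCP, 0,
                     socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    tcp = struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0, 0x50, 0x02,
                      8192, 0, 0)
    return eth + ip + tcp
```

Declaring `argtypes` and `restype` for every function matters here: without them ctypes would treat the `pcap_t *` handle as a C int, which truncates the pointer on 64-bit systems.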

data visualisation - afterglow

We will create images showing the correlation between attacker host, vulnerability, and malware.
Basically, the image will look like this:
small version of an afterglow picture
I had to cheat to get the image to a valid size …
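AfterGlow is typically fed comma-separated rows with up to three columns, so the preprocessing step boils down to emitting one row per (attacker host, vulnerability, malware) combination. A minimal sketch, assuming the events are already available as tuples; the function name `write_afterglow_csv` and the sample values in the usage note are hypothetical.

```python
import csv


def write_afterglow_csv(events, out):
    """Write unique (attacker, vulnerability, malware) rows as CSV to out."""
    writer = csv.writer(out, lineterminator="\n")
    seen = set()
    for attacker, vulnerability, malware in events:
        row = (attacker, vulnerability, malware)
        if row in seen:
            continue  # repeated rows add nothing to the graph
        seen.add(row)
        writer.writerow(row)
```

Called with `sys.stdout` as `out`, the rows can be piped straight into the graphing tool. Dropping duplicate rows keeps the input small when the same host sends the same malware many times.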

data visualisation

new files
Presenting data in a human-compatible way is a problem; rumor has it that at this stage of evolution, pictures work best.
So here are some hints on how to create graphs from the dionaea logsql sqlite database.
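As one example of the kind of query that feeds such a graph, here is a minimal sketch that counts accepted connections per day. The table and column names (`connections`, `connection_timestamp`, `connection_type`) and the demo rows are assumptions made for illustration; check them against your own logsql.sqlite before relying on them.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with an assumed connections table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE connections ("
        "connection INTEGER PRIMARY KEY, "
        "connection_type TEXT, "
        "connection_timestamp INTEGER)"
    )
    rows = [
        ("accept", 1262736000),   # 2010-01-06 00:00:00 UTC
        ("accept", 1262739600),   # 2010-01-06 01:00:00 UTC
        ("connect", 1262739700),  # outgoing connection, not counted
        ("accept", 1262822400),   # 2010-01-07 00:00:00 UTC
    ]
    db.executemany(
        "INSERT INTO connections (connection_type, connection_timestamp) "
        "VALUES (?, ?)",
        rows,
    )
    db.commit()
    return db


def attacks_per_day(db):
    """Return [(day, hits), ...] for accepted connections, ordered by day."""
    query = """
        SELECT date(connection_timestamp, 'unixepoch') AS day,
               COUNT(*) AS hits
        FROM connections
        WHERE connection_type = 'accept'
        GROUP BY day
        ORDER BY day
    """
    return db.execute(query).fetchall()
```

The list of (day, hits) pairs can be written to a two-column text file and handed to whatever plotting tool you prefer.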

rapidshare anyone?

There is malware that downloads files from rapidshare to install on your drive. Nothing new; I've had shellcode downloading files from rapidshare before:

first       last        hits  url
2010-01-06  2010-01-07  2     hxtp://rapidshare.com/files/331049304/hitman1
2010-01-08  2010-01-10  2     hxtp://rapidshare.com/files/332058885/two
2010-01-12  2010-01-12  1     hxtp://rapidshare.com/files/333804484/roo
2010-01-17  2010-01-17  1     hxtp://rapidshare.com/files/335701706/uhit
2010-01-20  2010-01-20  1     hxtp://rapidshare.com/files/337582552/newtom
2010-01-20  2010-01-20  1     hxtp://rapidshare.com/files/337582552/newtom
2010-01-21  2010-01-21  1     hxtp://rapidshare.com/files/338398794/tomhas
2010-01-21  2010-01-21  1     hxtp://rapidshare.com/files/338403156/farhas
2010-01-25  2010-01-25  1     hxtp://rapidshare.com/files/340552045/tomd
2010-01-27  2010-01-27  1     hxtp://rapidshare.com/files/341701463/tsa
2010-01-27  2010-01-27  1     hxtp://rapidshare.com/files/341737994/xc
2010-01-29  2010-01-30  2     hxtp://rapidshare.com/files/342702954/dams

But the shellcode downloads the files directly.

paris mission pack avs

As promised, I uploaded virustotal results for *every* file in the paris db.

The packed sql data is about 600k. To use it:

bunzip2 paris-20091207-missionpack_avs.sql.bz2
sqlite3 logsql.sqlite < paris-20091207-missionpack_avs.sql

I can recommend sqliteman for playing with the database.

virustotal fun

I hacked together a script to retrieve the virustotal results for the files mentioned in the paris database and store the results in the paris database so I could query them. Unfortunately, dionaea does not submit to virustotal.com (yet), so signatures are missing for 'some' (75%) of the files. Afterwards I designed some queries to retrieve stats about different things.

Conficker/Kido domination

As I was interested in the share of Conficker attacks, I decided to retrieve some numbers from the paris database.
As I don't know which files count as Conficker, I had to rely on av vendor signatures.
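Relying on av vendor signatures amounts to matching signature names against a pattern. A minimal sketch, assuming a `downloads` table with one row per download and a hypothetical `av_results` table holding one scanner verdict per file hash; both table layouts and the name patterns are illustrative, not the actual paris schema or the queries used for the post.

```python
import sqlite3


def conficker_share(db):
    """Fraction of downloads whose file any scanner labels Conficker/Kido."""
    total = db.execute("SELECT COUNT(*) FROM downloads").fetchone()[0]
    flagged = db.execute(
        """
        SELECT COUNT(*) FROM downloads
        WHERE download_md5_hash IN (
            SELECT md5_hash FROM av_results
            WHERE lower(result) LIKE '%conficker%'
               OR lower(result) LIKE '%kido%'
        )
        """
    ).fetchone()[0]
    return flagged / total if total else 0.0
```

Files without any stored scan result count as "not Conficker" here, so with many unscanned files the number is a lower bound.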

sqlite performance

Andrew Waite downloaded the sqlite datasets and blogged about his results running his mimic-nepstats.py script. As I was surprised by the time it took for the paris dataset, I had to investigate. For me, the paris dataset took more than 30 minutes, even after I rewrote some of the queries to make them faster, but he said it was done in about 3 minutes. So I gave it a shot, and he was right: it was even faster than the 3 minutes he claimed, and I could do it in about 2 minutes.

The only difference I could figure out was that my initial test did not use the anonymized database. I gave it a shot, and the non-anonymized database was rather slow compared to the anonymized db. The steps to create the anonymized db involved dumping the original db and restoring the dump to a new database.
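The dump-and-restore step can be reproduced from python with the standard sqlite3 module. This is a minimal sketch rather than the procedure used for the anonymized db, and the function name `rebuild_database` is made up. It holds the whole dump in memory, so it only suits small databases; for something the size of the paris dataset, piping the dump between two sqlite3 command line processes is the more sensible route.

```python
import sqlite3


def rebuild_database(src_path, dst_path):
    """Dump src_path as SQL and restore the dump into a new file dst_path."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # iterdump() yields the SQL statements that recreate schema and data.
        dst.executescript("\n".join(src.iterdump()))
        dst.commit()
    finally:
        src.close()
        dst.close()
```

Timing the same queries against the original and the rebuilt file is an easy way to check whether the rebuild makes a difference for your data.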

Post it yourself

Microsoft Malware Protection Center recently published a news item about Do and don’ts for p@$$w0rd$, but they only released some statistics about the data gathered. That's common: raw data is dangerous for the decoys, nobody wants to reveal their honeypot's address, and raw data is pretty large.
But as current technology allows data compression, and we are confident our anonymization protects both decoys and attackers, we decided to release the raw data.

We offer two sqlite databases 1):

  • berlin (~39MB)
    • ~ 280MB uncompressed
    • ~ four weeks 2)
    • ~ 600.000 attacks
    • ~ 2700 malware downloads
    • conficker free zone
    • includes p0f fingerprints
    • includes information about rejected connections
  • paris (391MB)
    • ~ 4.1GB uncompressed
    • ~ one week 3)
    • ~ 7.8 million attacks
    • ~ 750.000 malware downloads
    • insane amount of conficker attacks

Reports

Please let us know, if you post/blog about it, so we can link it here. A simple mail to nepenthesdev@gmail.com, or the still virgin #dionaea hashtag on twitter will do the trick.

1) please be aware that both archives contain a file named logsql.sqlite, so don't unpack both into the same directory
2) some downtimes during dionaea updates
3) 15 minutes downtime at given point

logging

Nepenthes had awful logging: huge logfiles, pretty useless for most people. Some people even started writing parsers for the logfiles to extract and convert the useful information for use in a database. For dionaea, I decided to stick with awful logging to textfiles, but to provide a useful alternative which is easy to set up and maintain, feature rich, and allows retrieving information in a useful way, so you don't have to grep.
Therefore, the logsql.py script uses SQLite to write the useful information to disk.
I know SQLite is not PostgreSQL. PostgreSQL is superior in many ways, but it requires more steps to set up, where SQLite just works out of the box. SQLite has only limited support for concurrent writers, but as dionaea does not access the database simultaneously, there were no problems with database concurrency. On the other hand, if it works with SQLite, it will work with PostgreSQL too; all you'll have to do is adjust some things.

What counts as useful information is not well defined, so for now I decided to go for the things I want to see:

  • connections
  • exploits
  • malware offers
  • malware downloads
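To illustrate the idea of logging to SQLite instead of to text, here is a minimal sketch covering two of the items above, connections and malware downloads. It is not the logsql.py script: the class name `SqliteLogger`, the table layout and the column names are all hypothetical.

```python
import sqlite3
import time


class SqliteLogger:
    """Tiny event logger that writes connections and downloads to SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS connections (
                connection INTEGER PRIMARY KEY,
                timestamp REAL,
                remote_host TEXT,
                local_port INTEGER
            );
            CREATE TABLE IF NOT EXISTS downloads (
                download INTEGER PRIMARY KEY,
                connection INTEGER,
                url TEXT,
                md5_hash TEXT
            );
            """
        )

    def log_connection(self, remote_host, local_port, timestamp=None):
        """Store one incoming connection and return its row id."""
        when = time.time() if timestamp is None else timestamp
        cur = self.db.execute(
            "INSERT INTO connections (timestamp, remote_host, local_port) "
            "VALUES (?, ?, ?)",
            (when, remote_host, local_port),
        )
        self.db.commit()
        return cur.lastrowid

    def log_download(self, connection, url, md5_hash):
        """Store one malware download that belongs to a logged connection."""
        cur = self.db.execute(
            "INSERT INTO downloads (connection, url, md5_hash) VALUES (?, ?, ?)",
            (connection, url, md5_hash),
        )
        self.db.commit()
        return cur.lastrowid
```

Because every download row carries the id of its connection, a single JOIN answers questions such as "which hosts delivered which files" without grepping through logfiles.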

start.txt · Last modified: 2010/06/15 14:07 by common