
As you can see in the picture, there was a sudden decrease in attacks on the epmapper service at my sensor during September 2010.
I've been messing with virustotal some time ago, and at that point in time you had to scrape the html output. Things have changed: virustotal now offers a free http api to interface with their services. All you have to do is sign up and get the api key, which is hidden deep in the web2.0 interface.
The api offers three calls: get_file_report, scan_file and make_comment.
smb protocol stats for 2010 so far
gnuplotsql.py is a script I wrote to visualize the sqlite database in a useful way.
You can browse my statistics here while you use the script to create stats for your own database:

the git-daemon activity for the dionaea.git repository, pull and uniq hosts/day, basically 5-10 users update their software daily.
Often the most complex part in data visualization is the processing before you can provide the data in a format your visualization software understands.
I chose the git-daemon logs as an example of such a case.
One could have used sshd logs as an example too, but I chose this one, as I'm pretty sure there is no parser for the git-daemon logfiles.
If in doubt, I'm pretty confident one could adjust this git-daemon parser to deal with sshd too.
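To give an idea of the preprocessing involved, here is a minimal sketch of such a parser. It is not the script from the post. The line layout (date, time, pid in brackets, message) is an assumption, as the timestamp prefix depends on how the daemon is logged; only the "Connection from" and "Request upload-pack for" messages are what git-daemon itself prints in verbose mode. The sample data and the name pulls_per_day are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical log layout: "<date> <time> [<pid>] <message>".
# Addresses are documentation-range examples, not real hosts.
SAMPLE_LOG = """\
2010-09-01 08:00:01 [4711] Connection from 192.0.2.10:51000
2010-09-01 08:00:01 [4711] Request upload-pack for '/dionaea.git'
2010-09-01 09:30:12 [4712] Connection from 192.0.2.11:51001
2010-09-01 09:30:12 [4712] Request upload-pack for '/dionaea.git'
2010-09-01 17:45:40 [4713] Connection from 192.0.2.10:51002
2010-09-01 17:45:40 [4713] Request upload-pack for '/dionaea.git'
2010-09-02 10:02:03 [4714] Connection from 192.0.2.12:51003
2010-09-02 10:02:03 [4714] Request upload-pack for '/other.git'
"""

LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) \S+ \[(\d+)\] (.*)$")
CONNECT_RE = re.compile(r"^Connection from (.+):(\d+)$")
REQUEST_RE = re.compile(r"^Request upload-pack for '(.+)'$")


def pulls_per_day(lines, repo="/dionaea.git"):
    """Return {day: (pulls, unique_hosts)} for one repository."""
    host_by_pid = {}            # pid -> remote host of that connection
    pulls = defaultdict(int)    # day -> number of upload-pack requests
    hosts = defaultdict(set)    # day -> set of remote hosts
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue            # skip lines that do not fit the layout
        day, pid, message = match.groups()
        connect = CONNECT_RE.match(message)
        if connect:
            host_by_pid[pid] = connect.group(1)
            continue
        request = REQUEST_RE.match(message)
        if request and request.group(1) == repo:
            pulls[day] += 1
            host = host_by_pid.get(pid)
            if host is not None:
                hosts[day].add(host)
    return {day: (pulls[day], len(hosts[day])) for day in sorted(pulls)}


stats = pulls_per_day(SAMPLE_LOG.splitlines())
for day, (count, unique) in stats.items():
    # One whitespace-separated row per day, a layout gnuplot can read.
    print(day, count, unique)
```

The connect and request messages are matched up by pid, so the same idea should carry over to other per-process logs if the regular expressions are adjusted.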
If you want to use a native library in python, but there is no binding, you can 'try' to interface the library with ctypes.
As I wanted to play with bpf, which is part of libpcap, which lacks a python3 binding, I decided to try ctypes.
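As a rough illustration of the ctypes approach (a sketch, not the code from the post), the following loads libpcap if it is installed and compiles a filter expression into bpf instructions. The libpcap functions pcap_open_dead, pcap_compile, pcap_geterr, pcap_freecode and pcap_close are real; the helper name compile_filter and the choice to return None when the library is missing are my own assumptions.

```python
import ctypes
import ctypes.util


class bpf_insn(ctypes.Structure):
    # Mirrors struct bpf_insn from the pcap headers.
    _fields_ = [("code", ctypes.c_ushort),
                ("jt", ctypes.c_ubyte),
                ("jf", ctypes.c_ubyte),
                ("k", ctypes.c_uint32)]


class bpf_program(ctypes.Structure):
    # Mirrors struct bpf_program from the pcap headers.
    _fields_ = [("bf_len", ctypes.c_uint),
                ("bf_insns", ctypes.POINTER(bpf_insn))]


DLT_EN10MB = 1                      # Ethernet link type
PCAP_NETMASK_UNKNOWN = 0xFFFFFFFF


def compile_filter(expression, snaplen=65535):
    """Compile a filter expression; return a list of (code, jt, jf, k)
    tuples, or None if libpcap cannot be loaded on this machine."""
    name = ctypes.util.find_library("pcap")
    if name is None:
        return None
    try:
        pcap = ctypes.CDLL(name)
    except OSError:
        return None

    # Declare signatures so pointers are not truncated on 64-bit systems.
    pcap.pcap_open_dead.restype = ctypes.c_void_p
    pcap.pcap_open_dead.argtypes = [ctypes.c_int, ctypes.c_int]
    pcap.pcap_compile.restype = ctypes.c_int
    pcap.pcap_compile.argtypes = [ctypes.c_void_p,
                                  ctypes.POINTER(bpf_program),
                                  ctypes.c_char_p, ctypes.c_int,
                                  ctypes.c_uint32]
    pcap.pcap_geterr.restype = ctypes.c_char_p
    pcap.pcap_geterr.argtypes = [ctypes.c_void_p]
    pcap.pcap_freecode.restype = None
    pcap.pcap_freecode.argtypes = [ctypes.POINTER(bpf_program)]
    pcap.pcap_close.restype = None
    pcap.pcap_close.argtypes = [ctypes.c_void_p]

    # A "dead" handle is enough to compile filters without capturing.
    handle = pcap.pcap_open_dead(DLT_EN10MB, snaplen)
    if not handle:
        return None
    program = bpf_program()
    try:
        rc = pcap.pcap_compile(handle, ctypes.byref(program),
                               expression.encode("ascii"), 1,
                               PCAP_NETMASK_UNKNOWN)
        if rc != 0:
            error = pcap.pcap_geterr(handle) or b"unknown error"
            raise ValueError(error.decode("ascii", "replace"))
        instructions = [(program.bf_insns[i].code, program.bf_insns[i].jt,
                         program.bf_insns[i].jf, program.bf_insns[i].k)
                        for i in range(program.bf_len)]
        pcap.pcap_freecode(ctypes.byref(program))
        return instructions
    finally:
        pcap.pcap_close(handle)


result = compile_filter("tcp port 135")
if result is None:
    print("libpcap not available")
else:
    print(len(result), "bpf instructions")
```

The 'try' is deserved: ctypes trusts the declared structures and signatures blindly, so a mistake there crashes the interpreter instead of raising an exception.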
What I wanted to do:
We will create images showing the correlation of attacker-host, vulnerability, malware.
Basically, the image will look like this:

I had to cheat to get the image to a valid size …

Presenting data in a human compatible way is a problem; rumors say at this stage of evolution pictures work best.
Therefore, here are some hints on how to create graphs using the dionaea logsql sqlite database.
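As a starting point, a minimal sketch of getting plottable numbers out of sqlite. The table and column names (connections, connection_timestamp, connection_protocol) follow the logsql schema as I remember it, reduced to the columns needed here, so treat them as assumptions and check them against your own database. The helper attacks_per_day and the sample rows are made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def ts(year, month, day, hour):
    """Unix timestamp (UTC) for the sample rows."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


def attacks_per_day(db, protocol):
    """Return [(day, hits), ...] for one protocol, ordered by day."""
    query = """
        SELECT date(connection_timestamp, 'unixepoch') AS day,
               COUNT(*) AS hits
        FROM connections
        WHERE connection_protocol = ?
        GROUP BY day
        ORDER BY day
    """
    return db.execute(query, (protocol,)).fetchall()


# Stand-in for logsql.sqlite: an in-memory database with an assumed,
# simplified connections table and a few invented rows.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE connections (
        connection INTEGER PRIMARY KEY,
        connection_timestamp INTEGER,
        connection_protocol TEXT
    )
""")
db.executemany(
    "INSERT INTO connections (connection_timestamp, connection_protocol) "
    "VALUES (?, ?)",
    [
        (ts(2010, 9, 1, 8), "epmapper"),
        (ts(2010, 9, 1, 20), "epmapper"),
        (ts(2010, 9, 1, 21), "smbd"),
        (ts(2010, 9, 2, 3), "epmapper"),
    ],
)

for day, hits in attacks_per_day(db, "epmapper"):
    # Whitespace-separated columns, one row per day, ready for gnuplot.
    print(day, hits)
```

Letting sqlite do the grouping keeps the plotting side trivial: the output is already one row per day.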
There is malware downloading files from rapidshare to install on your drive.
Nothing new, I've had shellcode downloading files from rapidshare before
| first | last | hits | url |
| 2010-01-06 | 2010-01-07 | 2 | hxtp://rapidshare.com/files/331049304/hitman1 |
| 2010-01-08 | 2010-01-10 | 2 | hxtp://rapidshare.com/files/332058885/two |
| 2010-01-12 | 2010-01-12 | 1 | hxtp://rapidshare.com/files/333804484/roo |
| 2010-01-17 | 2010-01-17 | 1 | hxtp://rapidshare.com/files/335701706/uhit |
| 2010-01-20 | 2010-01-20 | 1 | hxtp://rapidshare.com/files/337582552/newtom |
| 2010-01-20 | 2010-01-20 | 1 | hxtp://rapidshare.com/files/337582552/newtom |
| 2010-01-21 | 2010-01-21 | 1 | hxtp://rapidshare.com/files/338398794/tomhas |
| 2010-01-21 | 2010-01-21 | 1 | hxtp://rapidshare.com/files/338403156/farhas |
| 2010-01-25 | 2010-01-25 | 1 | hxtp://rapidshare.com/files/340552045/tomd |
| 2010-01-27 | 2010-01-27 | 1 | hxtp://rapidshare.com/files/341701463/tsa |
| 2010-01-27 | 2010-01-27 | 1 | hxtp://rapidshare.com/files/341737994/xc |
| 2010-01-29 | 2010-01-30 | 2 | hxtp://rapidshare.com/files/342702954/dams |
but the shellcode downloads the files directly.
As promised, I uploaded virustotal results for *every* file in the paris db.
The packed sql data is about 600k. To use it:
bunzip2 paris-20091207-missionpack_avs.sql.bz2
sqlite3 logsql.sqlite < paris-20091207-missionpack_avs.sql
I can recommend sqliteman for playing with the database.
I hacked a script to retrieve the virustotal results for the files mentioned in the paris database, and store the results in the paris database so I could query them.
Unfortunately dionaea does not submit to virustotal.com (yet), therefore there are signatures missing for 'some' (75%) files.
Afterwards I designed some queries to retrieve stats about different things.
As I was interested in the share of Conficker attacks, I decided to retrieve some numbers from the paris database.
As I don't know which files count as Conficker, I had to rely on av vendor signatures.
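Relying on signatures could look roughly like the sketch below. The table and column names (downloads, virustotals, virustotalscans) are assumed from the logsql schema and reduced to what the query needs. The signature patterns are common vendor aliases for Conficker, not the list used for the actual numbers, and all rows and vendor names are invented.

```python
import sqlite3

# Vendor names for the same family differ; these aliases are an assumption.
CONFICKER_PATTERNS = ("%conficker%", "%downadup%", "%kido%")


def conficker_share(db, patterns=CONFICKER_PATTERNS):
    """Return (conficker_downloads, total_downloads)."""
    likes = " OR ".join("s.virustotalscan_result LIKE ?" for _ in patterns)
    query = f"""
        SELECT COUNT(*) FROM downloads
        WHERE download_md5_hash IN (
            SELECT v.virustotal_md5_hash
            FROM virustotals AS v
            JOIN virustotalscans AS s ON s.virustotal = v.virustotal
            WHERE {likes}
        )
    """
    flagged = db.execute(query, patterns).fetchone()[0]
    total = db.execute("SELECT COUNT(*) FROM downloads").fetchone()[0]
    return flagged, total


# In-memory stand-in with an assumed, simplified schema and invented rows.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE downloads (
        download INTEGER PRIMARY KEY, download_md5_hash TEXT);
    CREATE TABLE virustotals (
        virustotal INTEGER PRIMARY KEY, virustotal_md5_hash TEXT);
    CREATE TABLE virustotalscans (
        virustotalscan INTEGER PRIMARY KEY, virustotal INTEGER,
        virustotalscan_scanner TEXT, virustotalscan_result TEXT);

    INSERT INTO downloads (download_md5_hash)
        VALUES ('aaa'), ('aaa'), ('bbb'), ('ccc');
    INSERT INTO virustotals VALUES (1, 'aaa'), (2, 'bbb');
    INSERT INTO virustotalscans
        (virustotal, virustotalscan_scanner, virustotalscan_result)
        VALUES (1, 'VendorA', 'Worm.Win32.Conficker'),
               (1, 'VendorB', 'W32.Downadup.B'),
               (2, 'VendorA', 'Trojan.Generic');
""")

flagged, total = conficker_share(db)
print(f"{flagged} of {total} downloads carry a Conficker-like signature")
```

A file counts as soon as one scanner reports a matching name, and files without any virustotal report (like 'ccc' here) silently count as not Conficker, so the result is a lower bound.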
Andrew Waite downloaded the sqlite datasets and blogged about his results running his mimic-nepstats.py script. As I was surprised by the time it took for the paris dataset, I had to investigate.
For me, the paris dataset took more than 30 minutes, even after I rewrote some of the queries to make it faster, but he said it was done in about 3 minutes.
So I gave it a shot, and he was right: it was even faster than the 3 minutes he claimed, I could do it in about 2 minutes.
The only difference I could figure out was that my initial test did not use the anonymized database.
I gave it a shot, and the non-anonymized database was rather slow compared to the anonymized db.
The steps to create the anonymized db involved dumping the original db and restoring the dump to a new database.
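The dump-and-restore step itself can be sketched with the sqlite3 module from the python standard library, roughly what piping `.dump` from one sqlite3 shell into another does. This is an illustration on a toy in-memory database, not the procedure used for the paris dataset; the helper name dump_and_restore and the sample table are made up.

```python
import sqlite3


def dump_and_restore(source, target):
    """Replay the full SQL dump of `source` into the empty `target`."""
    # iterdump() yields the statements needed to recreate schema and data.
    target.executescript("\n".join(source.iterdump()))


# Toy source database; with real data this would be
# sqlite3.connect("logsql.sqlite") and a new file as the target.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE connections (
        connection INTEGER PRIMARY KEY, remote_host TEXT);
    CREATE INDEX connections_remote_host_idx
        ON connections (remote_host);
    INSERT INTO connections (remote_host)
        VALUES ('192.0.2.1'), ('192.0.2.2'), ('192.0.2.1');
""")

target = sqlite3.connect(":memory:")
dump_and_restore(source, target)

count = target.execute("SELECT COUNT(*) FROM connections").fetchone()[0]
print(count, "rows restored")
```

The restored database holds the same tables, indexes and rows, but its file is written from scratch in one go. Whether that alone accounts for the difference in query time is not something this sketch shows.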