
Presenting data in a human compatible way is a problem, rumors say at this stage of evolution pictures work best.
Therefore some hints how to create graphs using the dionaea logsql sqlite database.
As dionaea embeds sqlite, it is rather easy to extract the data.
We want to graph the number of new uniq files a day, I used the following query to make sqlite create the dataset which can be used to graph.
SELECT
strftime('%Y-%m-%d',connection_timestamp,'unixepoch','localtime') AS date,
MAX(downloads.download) AS total,
COUNT(downloads.download) AS today,
(
SELECT
COUNT(DISTINCT x.download_md5_hash)
FROM
downloads AS x
NATURAL JOIN
connections AS c
WHERE
strftime('%Y-%m-%d',c.connection_timestamp,'unixepoch','localtime')
<= strftime('%Y-%m-%d',connections.connection_timestamp,'unixepoch','localtime')
) AS uniq,
(
SELECT
COUNT(*)
FROM
(
SELECT
a.download_md5_hash
FROM
downloads AS a
JOIN
connections AS b ON(a.connection = b.connection)
GROUP BY
a.download_md5_hash
HAVING
strftime('%Y-%m-%d',MIN(b.connection_timestamp),'unixepoch','localtime')
= strftime('%Y-%m-%d',MAX(connections.connection_timestamp),'unixepoch','localtime')
)
)AS uniq_this_day,
(
SELECT
COUNT(*)
FROM
(
SELECT
MIN(a.download) AS download
FROM
downloads AS a
JOIN
connections AS b ON(a.connection = b.connection)
GROUP BY
a.download_md5_hash
HAVING
strftime('%Y-%m-%d',MIN(b.connection_timestamp),'unixepoch','localtime')
= strftime('%Y-%m-%d',connections.connection_timestamp,'unixepoch','localtime')
) AS newdownloads
NATURAL JOIN downloads
WHERE
download_url LIKE 'logxmpp://%'
)AS uniq_this_day_via_logxmpp
FROM
downloads
NATURAL JOIN
connections
GROUP BY
date
ORDER BY
date DESC;
uniqfiles.sql
The queries performance is horrible, for each day we run 3 subqueries, 2 of these subqueries have another subquery. The grouping by the generated date-string is suboptimal too, still the query takes about a second on a moderate sized database. Running the query on the paris and berlin databases took ~90s for berlin, and ~679s for paris.
When working with really large databases, it might be usefull to create a temp table with the data which is required in the representation which is used.
sqlite3 /opt/dionaea/var/dionaea/logsql.sqlite
.timer on
.output uniqfiles.txt
.read uniqfiles.sql
CPU Time: user 1.850000 sys 0.000000
As the “new” files distribution is more or less random, the graphs look like a saw, which makes them hard to get, therefore I decided to smooth the graphs for these numbers, rendering the graph inexact, but giving a good overview of the tendency.
set terminal png size 600,120 nocrop butt font "/usr/share/fonts/truetype/ttf-liberation/LiberationSans-Regular.ttf" 8
set output "newfiles.png"
set xdata time
set timefmt "%Y-%m-%d"
set format x "%b %d"
set ylabel "total"
set y2label "new"
set y2tics
set datafile separator "|"
plot 'uniqfiles.txt' using 1:4 title "uniq" with lines, "" using 1:6 smooth bezier title "new uniq" with lines axes x1y2, "" using 1:5 smooth bezier title "new xmpp" with lines axes x1y2
set terminal png size 800,160 nocrop butt font "/usr/share/fonts/truetype/ttf-liberation/LiberationSans-Regular.ttf" 8
set output "totalfiles.png"
set xdata time
set timefmt "%Y-%m-%d"
set format x "%b %d"
set ylabel "total"
set y2label "downloads"
set y2tics
set datafile separator "|"
plot 'uniqfiles.txt' using 1:2 title "total" with lines, "" using 1:3 smooth bezier title "downloads" with lines axes x1y2
Nice work as usual :D
You always manage to beat me to the punch, had Daviz on my radar for a while for similar jobs but haven't managed to find the time to get anything workable yet.