During the weekend I got some backtraces via email, indicating a problems in dionaea, the common thing in all of them was a reference to trigger_cb in the backtrace:
./dionaea(sigsegv_backtrace_cb+0x26)[0x418686]
/lib/libc.so.6(+0x33c20)[0x7faf9cdbbc20]
/lib/libglib-2.0.so.0(g_hash_table_destroy+0x9)[0x7faf9dbfa0a9]
/opt/dionaea/lib/dionaea/emu.so(emulate_ctx_free+0x170)[0x7faf98d528ea]
./dionaea(trigger_cb+0x44)[0x419128]
/opt/dionaea/lib/libev.so.3(ev_invoke_pending+0x61)[0x7faf9e4be101]
/opt/dionaea/lib/libev.so.3(ev_loop+0x66f)[0x7faf9e4c2fff]
./dionaea(main+0x13b4)[0x40a897]
/lib/libc.so.6(__libc_start_main+0xfe)[0x7faf9cda6d8e]
./dionaea[0x408b09]
/opt/dionaea/bin/dionaea(sigsegv_backtrace_cb+0x20)[0x805d776]
[0xb784b400]
/opt/dionaea/bin/dionaea(trigger_cb+0x2e)[0x805e0b7]
/opt/dionaea/lib/libev.so.3(ev_invoke_pending+0x54)[0xb783e554]
/opt/dionaea/lib/libev.so.3(ev_loop+0x67f)[0xb78439ff]
/opt/dionaea/bin/dionaea(main+0x15e7)[0x8050788]
/lib/libc.so.6(__libc_start_main+0xe7)[0xb7594ce7]
/opt/dionaea/bin/dionaea[0x804e8d1]
/opt/dionaea/bin/dionaea(sigsegv_backtrace_cb+0x20)[0x805d776]
[0xb771f400]
/opt/dionaea/bin/dionaea(trigger_cb+0x2e)[0x805e0b7]
/opt/dionaea/lib/libev.so.3(ev_invoke_pending+0x54)[0xb7712554]
/opt/dionaea/lib/libev.so.3(ev_loop+0x67f)[0xb77179ff]
/opt/dionaea/bin/dionaea(main+0x15e7)[0x8050788]
/lib/libc.so.6(__libc_start_main+0xe7)[0xb7468ce7]
/opt/dionaea/bin/dionaea[0x804e8d1]
./dionaea(sigsegv_backtrace_cb+0x20)[0x805e3de]
[0xa0f400]
./dionaea(tempfile_close+0x1e)[0x80524f5]
/opt/dionaea/lib/dionaea/emu.so(emulate_ctx_free+0x12d)[0x5aacc2]
./dionaea(trigger_cb+0x2e)[0x805ed1f]
/opt/dionaea/lib/libev.so.3(ev_invoke_pending+0x54)[0x5bb9d4]
/opt/dionaea/lib/libev.so.3(ev_loop+0x6dc)[0x5c10ac]
./dionaea(main+0x15e3)[0x80513a9]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0x126bd6]
./dionaea[0x804f4f1]
trigger_cb is a multiplexer function, it serves as a trampoline for code which runs in a thread from the thread-pool which is used in dionaea for shellcode detection and emulation and needs to interact with the event loop 1).
So, my first suspect was libemu, I was noticed libemu would leak memory during shellcode detection, and I took care of it. So my first idea was I was free'ing too much memory to early.
Given the bistreams I got from the people reporting problems, I tried to reproduce and failed, it always worked fine for me.
I extracted the shellcode from the bistreams, ran them in sctest using valgrind, but according to valgrind everything was fine.
I ran dionaea in valgrind, which gives a lot of false positives due to python's PyMalloc allocator, but everything was fine.
It just worked.
So I went on …
I noticed the occurence of tempfile_close, tempfile got changed over the weekend too, so it would was leaking memory too, maybe free'ing these bytes caused problems?
I found nothing.
I starred at the code changes and related code, but apart from spotting old&unrelated memory leaks, I found nothing at all.
For all segfaults, the shellcode would require emulation, so it would run within the libemu vm and some api calls would be run on the real underlying operating system to satisfy requirements emulating and interpreting the profile of api calls can not.
I got an idea when I found this in a logfile provided by a user:
emulate emulate.c:310-debug: cpu error error accessing 0x7c80c058 not mapped
emulate emulate.c:325-debug: shellcode took 0.037658 seconds on cpu, 171833 steps
emulate emulate.c:310-debug: cpu error error accessing 0x7c80c058 not mapped
emulate emulate.c:325-debug: shellcode took 0.037671 seconds on cpu, 171834 steps
thread threads.c:52-debug: Thread fn 0xb6effa4d con (nil) data 0xb505dbf0 took 0.020237 ms
thread threads.c:52-debug: Thread fn 0xb6effa4d con (nil) data 0xb505dbf0 took 0.014905 ms
Something was wrong with the shellcode emulation threads, there were two threads finishing th emulation of the same shellcode - 0xb505dbf0.
The following explanation of how dionaea runs shellcode on the host operating system is taken from a mail I sent to nepenthes-devel some time ago:
Next thing is attack detection, where amun does the same as nepenthes,
pattern matching, dionaea does not use pattern matching (yet?), it
detects shellcode using emulation, independent from the protocol used.
Once a shellcode is detected, a profile is created and evaluated - if
possible actions are taken to do what the shellcode would do.
Easy example of such shellcode profile:
I decided to censor the url, else mail filters and other crap may interfere.
[{'return': '0x7df20000', 'args': ['urlmon'], 'call': 'LoadLibraryA'},
{'return': '0', 'args': ['', 'hxxp://gut________gal.com/newfido.exe',
'85.scr', '0', '0'], 'call': 'URLDownloadToFile'}, {'return': '32',
'args': ['85.scr', '895'], 'call': 'WinExec'}, {'return': '0', 'args':
['-1'], 'call': 'Sleep'}]
All API calls are recorded, including their arguments, serialized as json.
In this case, emu.py had a look on it, and decided the shellcode would
download the file hxxp://gut________gal.com/newfido.exe.
If not, the shellcode is run within a small vm, libemu, so it can do
whatever it wants to do - we monitor the shellcode behaviour to make
sure it is not … malicious - harming the honeypot by eating up
resources.
Example:
shellcode is detected, can not be profiled, thread is used to run the shellcode.
Once io has to be done, the socket is moved into the main-loop, and
the thread goes back into the pool.
The mainloop calls, once there action, and we request a thread from
the pool to continue emulation.
This shellcode is the famous linkbot shellcode, which is called
blink:// in nepenthes.
> proc_emu_on_io_in con 0x7fa7fc0009d0 pd 0x7fa7fc0016e0
> shellcode found offset 75
> profile recv, emulating instead
As mentioned previously …
> user_hook_socket env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4580 ...
> user_hook_bind env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4430 ...
> user_hook_listen env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d44e0 ...
The shellcode bound a socket listening on a port
> ll_win_hook_accept env 0x7fa7fc076360 hook 0x7fa7fc0d4420
This accept call would block, blocking the thread is bad
> async_connection_accept data 0x7fa7fc0def60
The thread sends the main-event-loop a message to include this socket
into his loop, and goes back to the thread-pool, availible to do
something more usefull than waiting for an incoming connection.
> proto_emu_ctx_new con 0x7fa7fc006ab0 ctx 0x7fa7fc0df8b0
> proto_emu_accept_established con 0x7fa7fc006ab0 ctx 0x7fa7fc0df8b0
The main-event-loop got action on the socket.
We accept the new connection and continue in a thread
> hook_connection_accept_cb con 0x7fa7fc006ab0
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> recv connection state connected
> polling for io in ...
> async_connection_io_in data 0x7fa7fc006ab0
The shellcode calls recv, which is blocking …, same story, thread
goes back to pool, after asking the mainloop to watch the socket for
action.
> proto_emu_io_in con 0x7fa7fc006ab0 ctx 0x7fa7fc00acf0 data 0x7fa7fc007580 size 380
main-event-loop reported action, data is received and stored in a
buffer, thread from pool is requested
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (380 bytes)
Now, we can 'recv', as we have data in our buffer, waiting to get used
by the shellcode.
> user_hook_send env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4540 ...
Shellcode wants to send something, thread goes back to the pool,
waiting for the mainloop to signal the data was sent
> proto_emu_io_out con 0x7fa7fc006ab0 ctx 0x7fa7fc00acf0
Data was sent, thread is requested
> user_hook_CreateFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc099a50 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> recv connection state connected
> polling for io in ...
> async_connection_io_in data 0x7fa7fc006ab0
Shellcode creates a file, and wants to receive data, we have nothing
in our buffer left, so the thread goes back to the pool, after asking
the main-event-loop to look for action on the socket.
> proto_emu_io_in con 0x7fa7fc006ab0 ctx 0x7fa7fc00acf0 data 0x7fa7fc00f420 size 1024
we got action, received 1024 bytes, requested a thread
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (1024 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (512 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> recv connection state connected
> polling for io in ...
> async_connection_io_in data 0x7fa7fc006ab0
Again, shellcode wants to receive data …, it is getting boring
writing the same over and over …
> proto_emu_io_in con 0x7fa7fc006ab0 ctx 0x7fa7fc00acf0 data 0x7fa7fc011c40 size 4096
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (4096 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (3584 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (3072 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (2560 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (2048 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (1536 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (1024 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> data availible (512 bytes)
> user_hook_WriteFile env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09ce10 ...
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> recv connection state connected
> polling for io in ...
> async_connection_io_in data 0x7fa7fc006ab0
Shellcode wants to receive again, no data in the buffer, thread goes
back into the pool, socket is put into the main-event-loop.
> proto_emu_disconnect con 0x7fa7fc006ab0 ctx 0x7fa7fc00acf0
Action!, the connection got disconnected, request thread
> ll_win_hook_recv env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4510 ...
> recv connection state close
return '0' for recv, to indicate the shellcode the connection was closed.
> user_hook_CloseHandle env 0x7fa7fc076360 emu_env_hook 0x7fa7fc099870 ...
> user_hook_CreateProcess env 0x7fa7fc076360 emu_env_hook 0x7fa7fc099b80 ...
> user_hook_WaitForSingleObject env 0x7fa7fc076360 emu_env_hook 0x7fa7fc09cd00 ...
> user_hook_CloseHandle env 0x7fa7fc076360 emu_env_hook 0x7fa7fc099870 ...
> user_hook_close env 0x7fa7fc076360 emu_env_hook 0x7fa7fc0d4440 ...
Shelllcode closes all sockets, creates a thread to execute the downloaded file.
Detection using emulation is faster than using pattern matching.
Currently detection rate is great, of course there are things libemu
can not detect, but there are more things where pattern matching is
meant to fail.
For pattern matching, you have to create a pattern for every
shellcode, and even worse, assembly can translate to different machine
code bytes, doing the same, looking different.
Today I've had some shellcode revealing a bug in libemu's detection, I
figured out what was wrong and fixed it, this fix will work for all
other shellcodes as well, for pattern matching you need a pattern for
every shellcode.
So emulation is a more generic approach to detect and deal with shellcode.
In short
shellcode is run in a thread from thread pool
once there is network io involved, the thread takes care of getting the socket into the global even loop, and goes back into the pool
action on the socket will queue the shellcode-emulation-job into the pool for a worker thread from the pool, and a thread from the pool will finally continue to run the shellcode
This shellcode somehow managed to have two threads running his emulation, and I was wondering why.
So I had a look on the libemu hooked win32 api functions which were able to queue shellcode into the thread pool, this turned out to be
Stumbling upon this comment
/**
* Callback for protocol.disconnect
* This callback can only occur if the connection is waiting for io,
* like recv.
* Therefore we can continue the emulation
*
I was glad the emu/hooks.c file is the best commented file, and I wondered if this assumption was wrong and may cause problems.
So I had a look on the logfile where I got the first hint:
hooks hooks.c:225-debug: proto_emu_io_in con 0x91329e0 ctx 0x90abe10 data 0x91a6070 size 1275
connection connection.c:1581-debug: connection_stats_accounting_limit_exceeded stats 0x9132eac
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
hooks hooks.c:934-debug: data availible (1275 bytes)
hooks hooks.c:1469-debug: user_hook__lwrite env 0xb505dc28 emu_env_hook 0xb5079570 ...
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
hooks hooks.c:934-debug: data availible (1020 bytes)
hooks hooks.c:1469-debug: user_hook__lwrite env 0xb505dc28 emu_env_hook 0xb5079570 ...
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
hooks hooks.c:934-debug: data availible (765 bytes)
hooks hooks.c:1469-debug: user_hook__lwrite env 0xb505dc28 emu_env_hook 0xb5079570 ...
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
hooks hooks.c:934-debug: data availible (510 bytes)
hooks hooks.c:1469-debug: user_hook__lwrite env 0xb505dc28 emu_env_hook 0xb5079570 ...
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
hooks hooks.c:934-debug: data availible (255 bytes)
the shellcode would receive 1275 bytes, and drain this buffer
hooks hooks.c:1469-debug: user_hook__lwrite env 0xb505dc28 emu_env_hook 0xb5079570 ...
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
hooks hooks.c:950-debug: recv connection state established
hooks hooks.c:975-debug: polling for io in ...
the shellcode tries to read more data, but there is no data buffered, so we poll for it
hooks hooks.c:833-debug: async_connection_io_in data 0x91329e0
thread threads.c:52-debug: Thread fn 0xb6effa4d con (nil) data 0xb505dbf0 took 0.000155 ms
hooks hooks.c:843-warning: at 118.046510
hooks hooks.c:844-warning: repeat 118.260707
connection connection.c:1282-debug: connection_sustain_timeout_set 0x91329e0 118.046510
connection connection.c:1915-debug: connection_tcp_io_in_cb con 0x91329e0
connection connection.c:1929-debug: can recv 16384 bytes
connection connection.c:1934-debug: io_in: throttle can 16384 want 16384
connection connection.c:1684-debug: connection_throttle_update con 0x91329e0 thr 0x9132ebc bytes 706
connection connection.c:1962-debug: remote closed connection
hooks hooks.c:225-debug: proto_emu_io_in con 0x91329e0 ctx 0x90abe10 data 0x91a6070 size 706
now, there is data, emulation of the shellcode continues
connection connection.c:2069-debug: connection_tcp_disconnect con 0x91329e0
connection connection.c:3861-message: connection 0x91329e0 accept/tcp/established [1.2.3.4:9988->6.7.8.9:31817] state: established->close
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
connection connection.c:1145-debug: connection_disconnect con 0x91329e0
hooks hooks.c:934-debug: data availible (706 bytes)
hooks hooks.c:1469-debug: user_hook__lwrite env 0xb505dc28 emu_env_hook 0xb5079570 ...
hooks hooks.c:888-debug: ll_win_hook_recv env 0xb505dc28 emu_env_hook 0x91b3330 ...
hooks hooks.c:934-debug: data availible (451 bytes)
hooks hooks.c:281-debug: proto_emu_disconnect con 0x91329e0 ctx 0x90abe10
and, as this is the last chunk of data, the shellcode gets continued again by the disconnect callback
Ouch, this was bad, but given this knowledge I was able to modify the timing in the code using the good old sleep(1);, which finally allowed me to reproduce this.
But why would this happen, this has not happend before …, and the shellcode was not new or even special.
I noticed these lines in the log:
connection connection.c:1929-debug: can recv 16384 bytes
connection connection.c:1934-debug: io_in: throttle can 16384 want 16384
connection connection.c:1684-debug: connection_throttle_update con 0x91329e0 thr 0x9132ebc bytes 706
the logfile claimed dionaea was able to receive 16kbyte, but only received 706 bytes instead.
I remembered messing with this part of code during some adjustments for OpenBSD compatibility.
Previously dionaea would query the kernel to get an idea how much data was available on the socket to receive, so it could consume all the data in a single operation.
As OpenBSD lacks this very nice SIOCINQ feature, I decided to go for 16kbyte by default on OpenBSD, if there is less data, we'll receive less and nobody is harmed.
After adding the required autoconf lines to detect the availibility of SIOCINQ I forgot to include “config.h” into connection.c, and linux defaulted to a 16kbyte buffer too.
So, the initial code was like this:
int size, buf_size;
/* determine how many bytes we can recv */
if( ioctl(con->socket, SIOCINQ, &buf_size) != 0 )
buf_size=1024;
unsigned char buf[buf_size];
GString *new_in = g_string_sized_new(buf_size);
while( (size = recv(con->socket, buf, recv_size, 0)) > 0 )
{
g_string_append_len(new_in, (gchar *)buf, size);
recv_size -= size;
if( recv_size <= 0 )
break;
}
// append
g_string_append_len(con->transport.tcp.io_in, new_in->str, new_in->len);
if( size==0 )
{
g_debug("remote closed connection");
if( new_in->len > 0 )
con->protocol.io_in(con, con->protocol.ctx, (unsigned char *)con->transport.tcp.io_in->str, con->transport.tcp.io_in->len);
connection_tcp_disconnect(con);
}
and got modified into this:
int size, buf_size;
buf_size = 16*1024;
unsigned char buf[buf_size];
GString *new_in = g_string_sized_new(buf_size);
while( (size = recv(con->socket, buf, recv_size, 0)) > 0 )
{
g_string_append_len(new_in, (gchar *)buf, size);
recv_size -= size;
if( recv_size <= 0 )
break;
}
// append
g_string_append_len(con->transport.tcp.io_in, new_in->str, new_in->len);
if( size==0 )
{
g_debug("remote closed connection");
if( new_in->len > 0 )
con->protocol.io_in(con, con->protocol.ctx, (unsigned char *)con->transport.tcp.io_in->str, con->transport.tcp.io_in->len);
connection_tcp_disconnect(con);
}
The difference is, in case of a 16kbyte buffer and less than 16kbyte available on the socket, the while loop would run again, if the remote host closed the connection in the meantime, as he was done sending his data, size would be 0 and we would run the io_in and disconnect callbacks on the underlying procotol.
A SIOCINQ sized buffer would not have received the EOF in the same iteration as data.
In case of the emu protocol which is used to emulate network io, this caused trouble.
The shellcode would be queued twice in the thread pool, once for io_in and once for disconnect.
After the first thread ran the code and the shellcode emulation finished, the thread would release the mutex on the data and queue the context for free, then the second thread would pick up, and try to access free'd data.
The fix …
The old&unrelated memory leak I found was in the curl module when uploading files.
to all those who sent in traces & bistreams which allowed me to understand and fix this bug.