Kernel 2.6.38 introduced AF_ALG, a socket-based interface for accessing the kernel crypto API from userspace. A port of BSD's cryptodev for Linux provides basically the same functionality, but the cryptodev code never made it into the mainline kernel.
Accessing the kernel's crypto API from userspace makes it possible to use crypto hardware which can't be accessed from userspace directly. Hardware accelerated cryptography as provided by VIA Padlock 1) and Intel AES-NI 2) is exposed as CPU instructions and can be used from userspace directly, so you do not need AF_ALG for those at all. The AES unit of AMD Geode processors, however, is not an instruction 3), contrary to Padlock and AES-NI, and therefore can't be accessed from userspace.
You may also be interested in the discussion of af_alg vs cryptodev performance 4), and in the raw numbers and benchmarking software 5).
As I own AMD Geode powered hardware (ALIX), I decided to play with AF_ALG.
For a smooth start with a 2.6.38 kernel I installed Ubuntu 11.04 Beta2 and wrote an engine plugin for OpenSSL which uses AF_ALG for AES cryptography.
You can grab the code for the plugin here; refer to the README for instructions on how to compile and install it and how to enable the engine by default.
OpenSSH >= 5.4p1 honors openssl.cnf; for OpenVPN you can pass the --engine option to choose af_alg.
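To give an idea of what talking to AF_ALG looks like, here is a minimal sketch in Python (3.6 or later, Linux only). It is not the plugin's code, which is written in C against the OpenSSL engine API; it only illustrates the socket sequence: bind to an algorithm, set the key, accept an operation socket, send the data, read the result. The function name `af_alg_aes_cbc_encrypt` is made up for this example, and it returns None when AF_ALG is not available on the machine.

```python
import socket


def af_alg_aes_cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes):
    """Encrypt with the kernel's cbc(aes) through AF_ALG.

    Hypothetical helper for illustration; returns None if AF_ALG
    cannot be used (non-Linux system, missing kernel support, ...).
    """
    if len(plaintext) % 16 != 0:
        raise ValueError("cbc(aes) needs a multiple of the 16 byte block size")
    if not hasattr(socket, "AF_ALG"):
        return None
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            # Select the algorithm type and name known to the kernel crypto API.
            alg.bind(("skcipher", "cbc(aes)"))
            # The key is set on the bound socket, before accept().
            alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
            op, _ = alg.accept()
            with op:
                # Operation type and IV travel as ancillary data with the payload.
                op.sendmsg_afalg([plaintext], op=socket.ALG_OP_ENCRYPT, iv=iv)
                return op.recv(len(plaintext))
    except OSError:
        return None


if __name__ == "__main__":
    # First block of the AES-128-CBC example from NIST SP 800-38A.
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    iv = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
    block = bytes.fromhex("6bc1bee22e409f96e93d7e117393172a")
    result = af_alg_aes_cbc_encrypt(key, iv, block)
    print(result.hex() if result is not None else "AF_ALG not available")
```

Every request is a round trip through the kernel, which is where the per-call overhead in the numbers below comes from.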
On my desktop, which lacks crypto hardware acceleration, the plugin does not provide any value, as it slows down cryptography overall:
| | type | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
| SW | aes-128-cbc | 114159.97k | 161353.93k | 179392.98k | 184250.03k | 185925.63k |
| AF | aes-128-cbc | 9052.54k | 30872.41k | 75081.98k | 114684.93k | 136803.67k |
| SW | aes-192-cbc | 101844.31k | 138453.91k | 151127.38k | 154833.92k | 156150.44k |
| AF | aes-192-cbc | 8887.01k | 29380.39k | 70381.14k | 104782.51k | 122953.73k |
| SW | aes-256-cbc | 92980.04k | 122068.95k | 131564.20k | 134272.00k | 135140.69k |
| AF | aes-256-cbc | 8997.12k | 28859.43k | 65344.34k | 93743.10k | 109021.87k |
On my ALIX with its AMD Geode processor, things look different:
| | type | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
| SW | aes-128-cbc | 5740.48k | 7370.72k | 7986.28k | 8141.89k | 8197.45k |
| AF | aes-128-cbc | 531.45k | 2023.54k | 7248.08k | 18955.76k | 41894.75k |
| SW | aes-192-cbc | 5073.19k | 6314.20k | 6756.99k | 6905.86k | 6937.57k |
| AF | aes-192-cbc | 541.98k | 1745.74k | 4059.81k | 5969.25k | 7003.41k |
| SW | aes-256-cbc | 4608.07k | 5578.57k | 5900.87k | 6000.83k | 6030.18k |
| AF | aes-256-cbc | 536.58k | 1690.61k | 3786.41k | 5417.83k | 6229.84k |
Tests were run using:
openssl speed -evp aes-XXX-cbc -elapsed [-engine af_alg]
With AF_ALG, AES-128-CBC throughput is about 5.1 times the software figure for 8192 byte blocks (an increase of roughly 410%) and about 2.3 times for 1024 byte blocks (roughly 130%). Everything else suffers from the same kind of slowdown as on my desktop: the Geode CPU only supports AES-128-CBC in hardware, and for small block sizes the penalty for using AF_ALG is larger than the gain.
Despite the overhead of AF_ALG, aes-256-cbc on 8192 byte blocks is slightly faster (about 3%) than the OpenSSL software version. I assume the kernel's aes-256-cbc code is slightly faster than the OpenSSL code.
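The ratios can be checked from the ALIX table with a few lines of Python. The values are copied from the table (openssl speed reports them in 1000s of bytes per second); the dictionary layout and the helper names `speedup` and `percent_increase` are just chosen for this sketch.

```python
# (cipher, block size in bytes) -> (SW throughput, AF throughput),
# in 1000s of bytes per second, copied from the ALIX table.
ALIX = {
    ("aes-128-cbc", 16): (5740.48, 531.45),
    ("aes-128-cbc", 256): (7986.28, 7248.08),
    ("aes-128-cbc", 1024): (8141.89, 18955.76),
    ("aes-128-cbc", 8192): (8197.45, 41894.75),
    ("aes-192-cbc", 8192): (6937.57, 7003.41),
    ("aes-256-cbc", 8192): (6030.18, 6229.84),
}


def speedup(sw: float, af: float) -> float:
    """AF_ALG throughput as a multiple of the software throughput."""
    return af / sw


def percent_increase(sw: float, af: float) -> float:
    """Relative change in percent; negative values mean a slowdown."""
    return (af / sw - 1.0) * 100.0


if __name__ == "__main__":
    for (cipher, size), (sw, af) in ALIX.items():
        print(f"{cipher} {size:5d} bytes: "
              f"x{speedup(sw, af):.2f} ({percent_increase(sw, af):+.0f}%)")
```

A factor above 1 means AF_ALG wins. For aes-128-cbc the break-even point lies somewhere between 256 and 1024 byte blocks.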
I just added support for the SHA1 digest. As I lack silicon which can calculate SHA1 (or any other digest) in hardware, all I can offer are numbers showing the decrease in performance when using the af_alg plugin on my desktop:
| | type | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
| SW | sha1 | 36312.51k | 108659.26k | 239334.14k | 338129.58k | 386804.39k |
| AF | sha1 | 1762.78k | 6886.23k | 25046.27k | 69683.20k | 160432.13k |
So, if you own hardware which accelerates SHA1 digests and want to add your numbers, mail me or use the comments to post the results of:
openssl speed sha1 -elapsed [-engine af_alg]
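If you just want to check that the kernel's sha1 is reachable through AF_ALG on your box before benchmarking, a minimal Python sketch (3.6 or later, Linux only) could look like this. Again, this is not the plugin's code; `af_alg_sha1` is a hypothetical helper that returns None when AF_ALG is unavailable, and it sends the data in a single call, so it is only meant for small inputs.

```python
import hashlib
import socket


def af_alg_sha1(data: bytes):
    """Hash non-empty data with the kernel's sha1 through AF_ALG.

    Hypothetical helper for illustration; returns None if AF_ALG
    cannot be used on this system.
    """
    if not hasattr(socket, "AF_ALG"):
        return None
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha1"))  # digests need no key
            op, _ = alg.accept()
            with op:
                op.sendall(data)
                return op.recv(20)  # a SHA1 digest is 20 bytes long
    except OSError:
        return None


if __name__ == "__main__":
    message = b"abc"
    kernel_digest = af_alg_sha1(message)
    if kernel_digest is None:
        print("AF_ALG not available")
    else:
        # The userspace implementation serves as the reference.
        print(kernel_digest.hex(), kernel_digest == hashlib.sha1(message).digest())
```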
Cool! Looking forward to using your stuff on my own ALIX. Hope your patch is included into OpenSSL soon.