Force kernel AES-NI usage on a VPS without the aes CPU flag
First of all, thanks to @rm_ for his brilliant blog post on forcing OpenSSL to use the AES-NI instruction set when the CPU of a VPS does not report its existence while it is actually supported. This is a counterpart that forces the Linux kernel to use AES-NI when QEMU does not pass through that flag, which is useful for IPSec, disk encryption, etc.
It turns out to be fairly simple with a kernel module. Just shove these two lines into any hello world boilerplate that you can find in a "how to write Linux kernel modules" tutorial.
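#include <linux/bitops.h>
#include <asm/processor.h> /* declares boot_cpu_data */
/* call from the module's init function; 153 == X86_FEATURE_AES */
set_bit(153, (unsigned long *)(boot_cpu_data.x86_capability));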
The magic number 153 is X86_FEATURE_AES, which arch/x86/include/asm/cpufeatures.h defines as (4*32+25), i.e. bit 25 of capability word 4. It is trivial to force the use of another CPU feature (e.g., AVX) with another magic number.
After inserting your own module, manually running modprobe aesni_intel should do the trick.
On one of my KVM servers, the result of cryptsetup benchmark increased from
#  Algorithm |  Key |  Encryption |  Decryption
     aes-cbc   128b   169.8 MiB/s   167.3 MiB/s
... to ...
#  Algorithm |  Key |  Encryption |  Decryption
     aes-cbc   128b   678.2 MiB/s  2201.4 MiB/s
Comments
Just curious -- would your VPS get nuked if you did this on a node that doesn't have AES-NI support?
@psb777 Cheers! Added a link to this post to my original one.
Your VPS kernel would just crash with an "Invalid opcode" exception.
... and then you’d have to go into the Rescue Disk of Shame.
The fact that the vCPU passthrough exposes these at the KVM level makes me a bit unhappy about the state of KVM. I had an f00f flashback for a moment; there has to be some magical feature set you can hit to crash the upstream kvm kernel module. Someone will find it soon enough.
I entered "rescue disk of shame" into Google image search; see what I found.
You can expect anything with optional code paths depending on this flag to crash on illegal instructions, and if said stuff is in the kernel, it might spell reboot time (not that a constantly crashing ssh daemon wouldn't cause the same).
VMX too? @WSS, going to try to make a VPS a root server by adding AES and VMX? Would be fun to see if this works...
@Falzo You're the German; go for it.
Sums it up pretty well :I
I'm a little curious what you get with your custom module doing this after your assertion:
I'd play around a bit, myself, but I'm too damn lazy to crash my own shit.
It won't change the output of the cpuid instruction, and it won't modify /proc/cpuinfo either. By the way, as OpenSSL directly calls the cpuid instruction to check the availability of AES-NI, you still need @rm_'s OpenSSL trick to force your userspace programs to use it.
So, it just sets it as an active bit which may or may not actually do anything for most software, then?
So, anything in userspace still has to be forced to run code that may or may not execute properly, and there's no change actually set in the subkernel, other than it works? So, how does flipping the bit do anything at all, other than set up a path for at least this instance to follow any code which may work with the AES-NI subset? And how the hell does it do that when you can't test for it?
I get the override for OpenSSL; I'm just wondering how/where this might actually be useful in common use, like speeding up ffmpeg et al.
Right, it won't do anything for most software. It just enables the kernel to load the aesni_intel module, which provides accelerated AES functions that only the kernel uses. This is possibly only useful for IPSec and dm-crypt, where en-/decryption is done in kernel space.
That makes sense. I'm assuming there's something in module space that sets that bit only for modules, then, so my snippet above would likely assert true, whereas anywhere outside the module-level ring it'd just be ignored, as you suggested.
Interesting find. You're more bored than I am!
There is an interesting set of patches on LKML about emulating CPUID through a flag in an MSR. These patches did add support for the ENABLES_CPUID_FAULT bit in the KVM-emulated MSR, but they were only merged into mainline in 4.12. Ideally, with these patches we can make most userspace software automatically detect and use AES-NI or other fancy features.
Well, this is good to know. I'm going to have a nose around at this for other CPU instructions as well. Maybe I can finally get some good performance out of some of my VPSes at hosts that refuse to pass through.
Any help for crypto mining?
shoo.
Does that actually work though? AES-NI doesn't introduce new registers, so it should be safe as long as the underlying CPU supports it, but AVX does change register definitions.
ffmpeg does have a -cpuflags flag which you could use.
Good point. I suppose the host machine can handle YMM registers or whatnot during context switches (regardless of AVX support in its KVM guests), but I'm not sure if the guest kernel can handle them. Probably not, as the xsave flag is also missing. But you control the guest kernel, so you can always do extra hacks to make it work.
Theoretically, if the KVM guest has only one vCPU and no userspace program is using AVX (which may or may not be the case), I think it is safe to assume nobody is going to touch your YMM registers, and thus one kernel module can hold exclusive use of those exotic registers...
cryptsetup benchmark got about 6x faster, but OpenVPN is still slow af (aes-128-cbc). Why?
Didn't know @rm_ had a blog - thanks for that link.
I don't think a virtualized kernel will protect you from something like that. IANAVE (I am not a virtualization engineer) but ultimately the CPU is going to execute an x86 opcode, no? So if there's a bug due to opcodes, you're going to see it regardless.
KVM doesn't evaluate and wrap every opcode, does it? Things like bochs are "virtualized processors" while other more performant virtualization methods are more like "virtualized environments" but ultimately it's still opcodes on the metal.
Not as such, which is what speeds up the emulation when it's a passthrough, like using kvm_amd/kvm_intel. That said, I haven't audited (or understood) all of the code, so I'm sure not running in Ring 0 will do quite a bit to help keep this from being an issue. But the fact that you can arbitrarily set this in the CPU from the module level (I assumed there'd be more, e.g. a preload for microcode) makes me question just how difficult the next nasty crash issue might be. Then again, you still need to load modules to get down to that sublayer, so far.
This method only forces the Linux kernel to use AES-NI. It applies to VPNs that use the kernel IPSec stack, but not OpenVPN.
OpenVPN does encryption in the userspace, and it uses OpenSSL by default. So you should check out this blog post to get a boost.
Yes, I already read that blog entry and added
export OPENSSL_ia32cap="+0x200000200000000"
to /etc/init.d/openvpn and rebooted, but it didn't work.
I have just run a series of tests with openvpn and iperf.
See, there's a performance boost, albeit a puny one.
Edit: here's the result for IPSec
I'd say this is more a measuring inaccuracy than a real boost.
It turns out I'm not too sharp at adapting the linked tutorial...
I'd appreciate a pointer, and once I've got it working I'll write it up as a full walkthrough that even someone like me could follow.
There's a reason why it wasn't.
I knew you'd be along shortly, @WSS, but don't be so discouraging :P