Your Intel x86 CPU is Deeply Flawed (Meltdown/Spectre)

Comments

  • perennate Member, Host Rep
    edited January 2018

    Maounique said: I can also assume KVM can implement a filter so the instructions necessary to trigger this bug will not pass, or will be emulated and filtered only, effectively filtering out this and possible future attacks at a much smaller performance cost, still keeping the out-of-order benefits of the older CPUs without applying the updates.

    I don't believe there is any way for software to enforce a filter like what you are describing without emulating each instruction (which would incur an order-of-magnitude performance decrease). KVM and other hypervisors use hardware features to isolate guests; if the hardware doesn't support some form of isolation, then it's not possible to implement it.

    Maounique said: TL;DR: in KVM, select a CPU without out-of-order features and you are safe without applying any patch, at least in theory.

    Branch prediction has been around since before the Pentium 4 :P I'm not sure what level of speculative execution the attacks depend on, but I doubt that the CPU can support some mode that would prevent the attacks. I don't know enough about how CPU passthrough works to say why, though (although these passthrough modes do use some hardware feature, I think VT-d).

    Edit: from https://wiki.openstack.org/wiki/LibvirtXMLCPUModel it actually sounds like all this does is tell the guest some metadata about what the CPU does and does not support. But the guest is free to use additional features. In fact the document says that you could tell the guest that the CPU supports more features than it does, which would cause the guest to kernel panic.
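
    For reference, a virtual CPU definition of the kind that page describes looks roughly like this in libvirt's domain XML (illustrative snippet; the model and feature names are just examples):

        <cpu match='exact'>
          <model fallback='allow'>core2duo</model>
          <feature policy='require' name='pni'/>
          <feature policy='disable' name='lahf_lm'/>
        </cpu>

    The guest sees this as CPUID metadata; as noted above, nothing here physically prevents it from issuing other instructions the host CPU happens to support.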

    Edit2: actually, out-of-order execution usually refers to the CPU reordering instructions to minimize stalls when execution units in the CPU are occupied. For example, if you have (1) a := x + y, (2) b := a + z, (3) c := m + n, then it might be faster to execute (1), then (3), then (2), because (2) depends on the output of (1), whereas (3) is independent of the other two. The CPU can detect these "data dependencies" and reschedule instructions. I didn't think the exploits actually rely on this reordering to work, but I could be wrong.
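
    The same example in C, purely to make the dependency chain visible (illustrative only):

        /* An out-of-order core can begin (3) while (2) is still
           waiting on the result of (1): */
        int demo(int x, int y, int z, int m, int n) {
            int a = x + y;   /* (1) */
            int b = a + z;   /* (2) depends on (1): must wait for a */
            int c = m + n;   /* (3) independent: may run before (2) */
            return b + c;
        }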

    Branch prediction only occurs when the CPU encounters a conditional jump instruction (e.g. jz, jne). These instructions either continue to the next instruction or jump to a specified location in the code, depending on e.g. whether a specified register is zero or not. Here we say there are two possible branches that the program may follow: the next instruction or the far-away one. Oftentimes one of these branches ends up being followed much more often than the other. The CPU can achieve SIGNIFICANT performance improvements if it follows the branch that it thinks is more likely to be eventually chosen.

    This leaves a question -- why is it faster, if the CPU is executing one instruction at a time? The CPU should be able to simply wait until the branch instruction gets executed, and then decide, based on the current state of the specified register, which branch to take. In reality, though, the CPU can have tens or hundreds of instructions in flight at once. If it waits until the branch to make the decision, then right before the branch it can only be executing one instruction at a time. If it predicts which branch will be taken, though, then it can keep many instructions in flight, as it can read instructions from the predicted branch location. Of course, if the branch prediction ends up being incorrect (the CPU thought it would continue executing the for loop, but really the for loop terminated), it has to undo any changes made by these extra instructions.
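
    A quick way to feel that cost yourself (a classic demo, not from this thread; compile with low optimization so the branch survives): run the same loop over random and then sorted data. The branch below is mispredicted roughly half the time on random data and almost always predicted correctly on sorted data, and the timing difference is dramatic:

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define N 65536

        static int cmp(const void *a, const void *b) {
            return *(const int *)a - *(const int *)b;
        }

        int main(void) {
            static int data[N];
            long long sum = 0;
            for (int i = 0; i < N; i++) data[i] = rand() % 256;
            qsort(data, N, sizeof data[0], cmp);  /* comment out to compare */
            clock_t t0 = clock();
            for (int pass = 0; pass < 1000; pass++)
                for (int i = 0; i < N; i++)
                    if (data[i] >= 128)   /* predictable only once sorted */
                        sum += data[i];
            printf("sum=%lld time=%.2fs\n", sum,
                   (double)(clock() - t0) / CLOCKS_PER_SEC);
            return 0;
        }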

    ANYWAY, branch prediction, out-of-order execution, and parallel instruction execution are all things that have been around since before the Pentium 4.
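
    For what it's worth, the bounds-check-bypass variant in the published Spectre paper hinges on exactly the branch prediction described above; its core gadget looks roughly like this (simplified from the paper's example):

        #include <stddef.h>
        #include <stdint.h>

        uint8_t array1[16];
        uint8_t array2[256 * 4096];

        void victim(size_t x, size_t array1_size) {
            /* Train the predictor with in-bounds x, then pass an
               out-of-bounds x: the body still runs speculatively, and
               the secret byte array1[x] selects which page of array2
               gets cached -- recoverable afterwards by timing loads. */
            if (x < array1_size)
                (void)array2[array1[x] * 4096];
        }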

    Thanked by: mksh, PieHasBeenEaten
  • Maounique Host Rep, Veteran

    mksh said: It's how the host cpu handles the virtualized instructions and you can't control that.

    You can control that by filtering. If you insert another layer between the attacker and the vulnerable CPU, some instructions are virtualized, and those could be exactly the ones the exploit needs to run; remember, this needs specific ways to access the CPU, so filter those out and it won't work. It is one thing to craft your code to do exactly what you want, and another to have parts of it go through a filter and be executed as a set of completely different instructions which achieve the same thing on the emulated CPU, but not on the real one; it would be like trying to read the memory through the emulated CPU, and it would fail.

  • LjL Member

    @bsdguy said:
    Oh, btw, before I forget it again: This vulnerability has been known for more than half a year.

    But it has only now reached the level of hype and very serious concern. Why? I guess the answer is very ugly and frightening.

    Isn't that because it was communicated privately to Intel and other big players when it was discovered and embargoed, like most serious vulnerabilities when they are found by actors that follow industry standards?

    Thanked by: maverickp
  • perennate Member, Host Rep
    edited January 2018

    Maounique said: You can control that by filtering.

    @mksh means you can't control that unless you want to emulate each instruction, which again would incur a 10x+ performance penalty. Even when you run JS in your browser it is not emulating each line of JS code; modern browsers compile it to native code and run it in the same high-level way as running a VM or any process. But with JS, you control the compiler, so you can actually do what you're saying about filtering instructions; with a VM, the user can e.g. create a file in the VM with arbitrary content (arbitrary instructions) and then execute it; there is no way to filter it without emulation.
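
    To illustrate what "you control the compiler" buys you: a JS engine can emit the filter inline in the code it generates, e.g. masking every index before an array access. This is the general shape of the index-masking mitigations JS engines shipped for Spectre (an illustrative sketch, not any engine's actual code):

        #include <stddef.h>
        #include <stdint.h>

        /* Clamp the index with arithmetic instead of a branch, so even a
           mispredicted bounds check cannot read out of bounds. Assumes
           the table size is a power of two. */
        static uint8_t masked_load(const uint8_t *table,
                                   size_t size_pow2, size_t index) {
            return table[index & (size_pow2 - 1)];
        }

    A hypervisor has no equivalent hook: it never sees the guest's instructions at this granularity unless it emulates or translates them.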

    Thanked by: mksh
  • Maounique Host Rep, Veteran
    edited January 2018

    LjL said: Isn't that because it was communicated privately to Intel and other big players when it was discovered and embargoed,

    I think the whole industry knew this could and would happen; it's just that the fix would have been to come up with another method of pseudo-executing code during otherwise idle time spent waiting for data, something that yields usable results without reading usable data, which is more or less impossible.
    So it was put on the back burner -- future implementations would solve this issue naturally... Well, they didn't, and now action is unavoidable.

  • LjL Member
    edited January 2018

    @LjL said:
    I posted an inquiry about any plan of action from my host Scaleway

    I guess this email I just got, with subject "Scaleway - Emergency security update required on all hypervisors" and linking to https://blog.online.net/2018/01/03/important-note-about-the-security-flaw-impacting-arm-intel-hardware/, kinda answers my inquiry. They do express dissatisfaction with their communication with Intel on the matter.

  • Maounique Host Rep, Veteran

    perennate said: but with a VM, the user can e.g. create a file in the VM with arbitrary content (arbitrary instructions) and then execute it; there is no way to filter it without emulation.

    I understand this. I presume the user runs the exploit and intends to read memory which does not belong to his process, i.e. the code is as nasty as it can be. I also understand that full emulation would be extremely costly; I am not proposing that.
    The idea is that, for every type of CPU, the hypervisor will pass most of the instructions through unchanged and will only emulate those that are not supported on a LESSER CPU model, while, if the CPU selected in the hypervisor is a lesser version of the real one, all code will be executed as on that kind of CPU and passed through entirely unmodified, as the guest OS will refuse to run anything it knows the CPU will not support.
    I presume an attacker can somehow craft a kernel for the guest that passes on instructions not supported by the CPU presented to it, but the emulated CPU in the hypervisor will not actually execute them, nor pass them through unmodified.

    Thanked by: perennate
  • bsdguy Member
    edited January 2018

    @LjL said:

    @bsdguy said:
    Oh, btw, before I forget it again: This vulnerability has been known for more than half a year.

    But it has only now reached the level of hype and very serious concern. Why? I guess the answer is very ugly and frightening.

    Isn't that because it was communicated privately to Intel and other big players when it was discovered and embargoed, like most serious vulnerabilities when they are found by actors that follow industry standards?

    So? What's the difference? Either way the relevant players were informed early - about half a year ago - and did ... not much. Until now.

    @Maounique

    Virtualization is a complex field and there are many misconceptions.

    Today's KVM does not emulate the CPU if host and guest are the same architecture. One major reason is that CPU emulation brings severe performance loss. What it (and some others) does is allow one to define a virtual CPU model. Only the differences are handled by KVM (the kernel module).

    Note, though, that this comes down to privilege-level switches, as KVM handles anything (e.g. some special instructions) that the virtual CPU doesn't do or is forbidden to do. It's important to understand that the guest code still runs on the host CPU. Also note that one classical mechanism for this is processor traps/exceptions (which is why I mentioned those as important in the current fuckup).

    Of course, both the implicit executable rewriting and the much-increased switching between the guest and the hypervisor considerably increase the performance cost.

    Btw, even a fully virtualized CPU would not be secure. Reason: the virtualization, typically in qemu, would just be an additional barrier; a smart Eve could find out about it (it's documented) and construct guest instructions so that they would end up being the desired host instructions. That said, I think that's irrelevant anyway because, again, it would be prohibitively slow and you'd be very hard pressed to find a provider running his virtualization that way.

    For practical purposes we can say that x86 guests on x86 hosts will virtually always run on the physical host CPU.
    Secondly, we can state - which explains the "secure" Xen mode - that a true hypervisor, i.e. a bare-metal hypervisor as opposed to a host-kernel-module-based one, is safer with respect to the attack group we're discussing here. Note, however, that this is just the current state and not somehow intrinsic, i.e. it might change, because the underlying problem is independent of OS or hypervisor; it just so happened that kernel-based hypervisors were the preferred target (almost certainly because that target field is much, much larger than the relatively small field of bare hypervisors).

    Finally, it's worth mentioning that KVM (and most other virt.) guests are basically just threads. If a provider provisions a KVM machine with x vCores, then the guest will be a virtual machine that is, in effect, x threads on the host.
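
    To make "the guest code still runs on the host CPU" concrete: the userspace side of KVM is essentially a loop that enters the guest and only regains control on a VM exit. A minimal sketch against the real /dev/kvm API (all setup and error handling omitted):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        /* vcpu_fd comes from the KVM_CREATE_VCPU ioctl; `run` is the
           mmap'ed struct kvm_run for that vCPU. */
        void run_vcpu(int vcpu_fd, struct kvm_run *run) {
            for (;;) {
                ioctl(vcpu_fd, KVM_RUN, 0);  /* guest runs natively here */
                switch (run->exit_reason) {  /* back in the host */
                case KVM_EXIT_IO:            /* trapped port I/O */
                case KVM_EXIT_MMIO:          /* trapped memory-mapped I/O */
                    break;                   /* emulate the device, loop */
                case KVM_EXIT_HLT:
                    return;                  /* guest halted */
                }
            }
        }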

    Thanked by: mksh, WSS, perennate
  • Maounique Host Rep, Veteran

    Yes, as a guest OS I see a CPU and I will execute code I know it can support. An attacker can force me to execute code which I know will not run, but the hypervisor will not pass it unchanged. The idea that an attacker can craft code using regular Core 2 Duo instructions and somehow assemble them in a way that triggers the exploit on the host CPU -- on a thread together with other stuff, possibly interrupted by another running sequence which changes the memory in the meantime -- and still be able to get anything usable is a bit far-fetched. Albeit, with extremely accurate and well-timed, knowledgeable attacks, it will probably not be impossible, but it is impractical; any exploits will probably be run automatically by scripters, and the situation in the field in a guest OS will be unlikely to match the lab conditions in which the exploit was crafted.

    TL;DR: low-hanging fruit. Virtualization significantly complicates the issue, making it not worth it beyond a somewhat forced PoC.

  • @bsdguy said:
    Btw, even a fully virtualized CPU would not be secure. Reason: the virtualization, typically in qemu, would just be an additional barrier; a smart Eve could find out about it (it's documented) and construct guest instructions so that they would end up being the desired host instructions. That said, I think that's irrelevant anyway because, again, it would be prohibitively slow and you'd be very hard pressed to find a provider running his virtualization that way.

    That is a very valid point. Admittedly, it would allow something like the proposed filter to run, though. I'm deliberately saying run, not work, here, because on one hand it would impose even more (probably drastic) performance loss than the CPU emulation alone, and on the other hand it would be very hard to actually trust, since as I understand it there are quite a lot of possibilities to trigger the bug, with many of them being perfectly harmless code if not abused. Trying to catch all the bad apples seems like a very tough task. Probably so hard I'd be very sceptical if someone came out and claimed he got it all covered.

  • perennate Member, Host Rep
    edited January 2018

    Maounique said: The idea is that, for every type of CPU, the hypervisor will pass most of the instructions through unchanged and will only emulate those that are not supported on a LESSER CPU model, while, if the CPU selected in the hypervisor is a lesser version of the real one, all code will be executed as on that kind of CPU and passed through entirely unmodified, as the guest OS will refuse to run anything it knows the CPU will not support.

    @bsdguy answered this, but put another way, virtualization is black-and-white: you can either run the VM on the CPU, or emulate each instruction. There is no in-between that allows you to efficiently filter each instruction before it runs.

    Maybe this example will help to see this: suppose you were writing a hypervisor in C. You are given a char* containing the VM kernel, i.e. an application binary.

    To implement the "run VM on CPU", you can execute an instruction that tells the CPU to jump to the beginning of the *char, but after the jump your hypervisor no longer has any control over the code that the kernel can execute.

    To implement the "emulate each instruction", you can loop through the instructions in the binary, maybe do something like:

    for (instruction in binary) { if (instruction is bad) continue; else execute instruction; }

    But the if statement alone is already going to be ~10 instructions, thus a 10x performance decrease (probably much more, but idk).

    These approaches are fundamentally different. In the first, we are letting the CPU run the entire binary (but we can use CPU modes and other hardware features to restrict what the binary can do). In the second, we only let the CPU run one instruction at a time. So there is not really a middle ground.
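
    Putting the second approach into (hypothetical) code -- decode_insn, is_forbidden and exec_insn are made-up names, not any real API -- shows where the cost comes from: every guest instruction pays for a fetch, a decode, the filter check and an interpreted execute, i.e. dozens of host instructions each:

        #include <stddef.h>

        typedef struct { int opcode; /* operands, length, ... */ } insn_t;

        size_t decode_insn(const unsigned char *p, insn_t *out);
        int    is_forbidden(const insn_t *in);
        void   exec_insn(const insn_t *in);

        void emulate(const unsigned char *binary, size_t len) {
            size_t ip = 0;
            while (ip < len) {
                insn_t in;
                size_t n = decode_insn(binary + ip, &in);
                if (!is_forbidden(&in))   /* the hypothetical "filter" */
                    exec_insn(&in);
                ip += n;
            }
        }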

    Edit: apparently you can do this, see @bsdguy's post below o.o

  • perennate Member, Host Rep
    edited January 2018

    Maounique said: Yes, as a guest OS I see a CPU and I will execute code I know it can support. An attacker can force me to execute code which I know will not run, but the hypervisor will not pass it unchanged. The idea that an attacker can craft code using regular Core 2 Duo instructions and somehow assemble them in a way that triggers the exploit on the host CPU -- on a thread together with other stuff, possibly interrupted by another running sequence which changes the memory in the meantime -- and still be able to get anything usable is a bit far-fetched. Albeit, with extremely accurate and well-timed, knowledgeable attacks, it will probably not be impossible, but it is impractical; any exploits will probably be run automatically by scripters, and the situation in the field in a guest OS will be unlikely to match the lab conditions in which the exploit was crafted.

    TL;DR: low-hanging fruit. Virtualization significantly complicates the issue, making it not worth it beyond a somewhat forced PoC.

    Well, I can't argue that it isn't far-fetched :P The whole three-phase attack thing to read host RAM from a KVM guest sounded pretty complicated. But would you really be willing to risk it?

    Edit: BTW, if you want to test the performance drop of doing something like filtering, it's easy: just run qemu without kvm (plain qemu falls back to TCG, its software translator).
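
    For example (standard qemu-system-x86_64 flags; disk.img is a placeholder):

        # software translation (TCG) -- the mode that could, in principle, filter:
        qemu-system-x86_64 -m 1024 -hda disk.img
        # hardware virtualization, for comparison:
        qemu-system-x86_64 -enable-kvm -m 1024 -hda disk.img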

  • @Maounique

    Yes and no. For one, qemu and friends are well documented. Moreover, you make the error of focusing only on the given scenario. However, who and what tells you, let alone guarantees, that the qemu is the real thing and not a changed and infiltrated one? After all, that attack class has hundreds and hundreds of working incarnations ...

    And no, virtualization does not "significantly complicate the issue". It just complicates things right now, at the current stage, because some few hypervisors simply haven't been researched yet.

    Again, keep in mind that what we're talking about is happening way below the radar of a hypervisor or OS. You would also be well advised not to simply ignore my repeated hint at bypassed processor exceptions (which are, in fact, one of the very few ways for a hypervisor to know about the attempts).

    Btw, I think it is largely a coincidence that some researchers happened to develop an interest in having yet another, and this time deeper, look at that minefield. We would be very ill advised to assume that nobody else (think: state players like the NSA) has done some research too, and considering their means and resources I do not at all feel safe and cool.

  • mksh Member
    edited January 2018

    @Maounique said:
    Yes, as a guest OS I see a CPU and I will execute code I know it can support. An attacker can force me to execute code which I know will not run, but the hypervisor will not pass it unchanged. The idea that an attacker can craft code using regular Core 2 Duo instructions

    Not sure why you think the instruction set matters that much. It's hardly about particular instructions but about how they are executed. Besides, as @perennate said above, CPU capabilities aren't enforced. It's merely a hint to the virtualized OS about what's available. You are free to run code that uses instructions not officially supported and it will be fine as long as the host CPU supports them. Executing an instruction is nothing more than pointing IP at a bunch of bytes. Hell, you can jump into the middle of some multibyte instruction so that it gains a completely different meaning, and it's perfectly legal code.
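
    A tiny concrete case of that last trick (classic x86; the byte values are standard opcodes):

        /* The same bytes decode as different instructions depending on
           where execution enters them: */
        unsigned char code[] = {
            0xEB, 0xFF,   /* entered at offset 0: jmp short -1, i.e. a jump
                             into the second byte of this very instruction */
            0xC0          /* entered at offset 1: 0xFF 0xC0 = inc eax */
        };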

    TL;DR: low-hanging fruit. Virtualization significantly complicates the issue, making it not worth it beyond a somewhat forced PoC.

    I wouldn't be so sure about the "significantly" part. Automation is key, and as long as you can sanity-check your result, what stops you from just throwing a million tries per second at it? Sorry, no offense, but it seems you underestimate exploit writers a lot, and if this has any chance of taking control of the host node it is very, very juicy. Expect some really smart guys to invest tons of time into making this work.

    Side question: Do you have any hands-on experience working with machine code?

    Edit: Clarity.

  • perennate Member, Host Rep

    Does anyone know of any performance benchmarks specifically testing virtualization software?

    Also @bsdguy do you think that virtio and similar features could avoid some of the overhead you mentioned? I don't know much about how that works.

  • @mksh

    @perennate said:
    @bsdguy answered this, but put another way, virtualization is black-and-white: you can either run the VM on the CPU, or emulate each instruction. There is no in-between that allows you to efficiently filter each instruction before it runs.

    Sorry, no. In fact, that is what some emulators do (more or less well); that's how it works. It's, however, done on a lower level: it's the binary that is "filtered", and forbidden (or, on a given guest architecture, non-existent) binary instructions are trapped out (deviated to the hypervisor). And btw, modern CPUs also translate the binaries; that's (well, something very similar) what microcode was mainly about.

    However, there are obviously two problems. One is that this mechanism considerably slows things down (those deviations cause a privilege-level switch). And the second is that we must rely on and trust that the software (e.g. kqemu) does it well and correctly, which is a problem in itself because software by its very nature can be hacked and changed more easily than hardware. That's why I also brought that problem up above. The whole thing is based on an assumption set (my hypervisor software does it, does it correctly, and is the correct software and not a hacked version) that may or may not hold true.

    Summary: It is strongly preferable to have properly working CPUs over emulating and filtering in software.

  • @perennate said:
    Also @bsdguy do you think that virtio and similar features could avoid some of the overhead you mentioned? I don't know much about how that works.

    Virtio and friends are about quite a different thing, namely virtualizing I/O and hardware, PCIe access, etc.

    As for benchmarks, yes and no. I know that there are benchmarks, but I don't know them in detail. What I do know is that most decent virtualization nowadays is in the single-digit performance-loss range, and that executable filtering and instruction trapping and simulating quickly gets into the double-digit range.

  • mksh Member
    edited January 2018

    @bsdguy said:
    @mksh

    @perennate said:
    @bsdguy answered this, but put another way, virtualization is black-and-white: you can either run the VM on the CPU, or emulate each instruction. There is no in-between that allows you to efficiently filter each instruction before it runs.

    Sorry, no. In fact, that is what some emulators do (more or less well); that's how it works. It's, however, done on a lower level: it's the binary that is "filtered", and forbidden (or, on a given guest architecture, non-existent) binary instructions are trapped out (deviated to the hypervisor). And btw, modern CPUs also translate the binaries; that's (well, something very similar) what microcode was mainly about.

    However, there are obviously two problems. One is that this mechanism considerably slows things down (those deviations cause a privilege-level switch). And the second is that we must rely on and trust that the software (e.g. kqemu) does it well and correctly, which is a problem in itself because software by its very nature can be hacked and changed more easily than hardware. That's why I also brought that problem up above. The whole thing is based on an assumption set (my hypervisor software does it, does it correctly, and is the correct software and not a hacked version) that may or may not hold true.

    Summary: It is strongly preferable to have properly working CPUs over emulating and filtering in software.

    I agree. Also, not entirely sure why this was directed at me. Might be because I said emulated OS when I really meant virtualized. If that was the case I apologize. My bad, I should have been more clear. I have edited my post to avoid further confusion. Also, you are right that there probably is a mechanism to emulate just a set of instructions, but as you said it is very costly to do this more than absolutely needed.

  • perennate Member, Host Rep
    edited January 2018

    bsdguy said: It's, however, done on a lower level: it's the binary that is "filtered", and forbidden (or, on a given guest architecture, non-existent) binary instructions are trapped out (deviated to the hypervisor).

    I'd call that "running the VM on the CPU". You're merely adding some preprocessing to the binary. This sounds similar to NaCl, where they analyze the binary prior to allowing functions in the binary to be called from JS. You can't do that with full VMs or most programs because they need to be able to create executable pages dynamically.

    Edit: huh, didn't realize qemu supports this while still supporting dynamic executable code. That's actually pretty cool :O. I guess that's pretty much in-between. Although I think my explanation still serves its purpose.

    Edit2: actually, yeah, you're right, that would pretty much be what Maounique was suggesting. Thanks for the insight -- this translation while supporting code generation is very interesting o.o
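
    For the curious, the shape of such a translator (in the spirit of QEMU's TCG; the names here are illustrative, not QEMU's actual API) is: look the current guest PC up in a cache of already-translated blocks, translate -- and, if desired, filter -- the block on a miss, then jump into the generated host code until the block ends. Because translation happens lazily at runtime, dynamically generated guest code is handled too:

        #include <stdint.h>

        typedef uint64_t (*tb_fn)(void);       /* generated host code */

        void *tb_lookup(uint64_t guest_pc);    /* translation cache (stub) */
        void *tb_translate(uint64_t guest_pc); /* decode one guest block,
                                                  emit host code; a "filter"
                                                  would run during this step */

        void run_guest(uint64_t guest_pc) {
            for (;;) {
                void *code = tb_lookup(guest_pc);
                if (!code)
                    code = tb_translate(guest_pc);  /* translate on miss */
                guest_pc = ((tb_fn)code)();         /* run block natively;
                                                       returns next guest PC */
            }
        }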

  • First this one:

    @mksh said:

    ... Also, not entirely sure why this was directed at me.

    Nope, totally innocent. When I quote and reply to someone it's by no means always in opposition. In this case, for instance, I included you simply because I was under the impression that the answer might be relevant to, and interesting for, you too.

    @mksh said:
    I wouldn't be so sure about the "significantly" part. Automation is key, and as long as you can sanity-check your result, what stops you from just throwing a million tries per second at it? Sorry, no offense, but it seems you underestimate exploit writers a lot, and if this has any chance of taking control of the host node it is very, very juicy. Expect some really smart guys to invest tons of time into making this work.

    Side question: Do you have any hands-on experience working with machine code?

    First - I mention this because it underlines the level we're talking about: it's not just about machine code but also about microcode, and in particular about internal details of a given CPU, e.g. the way (and the amount of state with which) a given CPU keeps track of prediction hits and misses.

    The main point (at least on the surface) that makes me worry is a) the size of "robbed" data: about 2KB. That's a fucking lot at that level. And b) the "setup" times for the attacks (before one can extract data at useful speed): from some hours down to about 30 seconds! Considering that those are the result of an "early afternoon walk" by the researchers, it seems highly probable that this can be brought down to the sub-second range, and all in all something like "2 MB of robbed data" in a second seems not at all out of reach.

    Plus, of course: this is but the "first generation" of that kind of attack. Looking at the extreme complexity of the hardware those attacks target, and the fact that Intel is very limited in how many screws it can fine-tune, it seems to me that we see but the dawn of a very ugly and painful time.

    Thanked by: mksh
  • perennate Member, Host Rep

    bsdguy said: Virtio and friends are about quite a different thing, namely virtualizing I/O and hardware, PCIe access, etc.

    I guess I'm just confused about how AWS/Google Compute seem to have a way to offer the same level of performance for KVM and Xen HVM (although there were complaints that Xen PV machines are now slower).

  • WSS Member

    @perennate said:

    bsdguy said: Virtio and friends are about quite a different thing, namely virtualizing I/O and hardware, PCIe access, etc.

    I guess I'm just confused about how AWS/Google Compute seem to have a way to offer the same level of performance for KVM and Xen HVM (although there were complaints that Xen PV machines are now slower).

    You're aware that Google makes their own BIOS/et al, right? For their resources, creating a proprietary setup that they can access through a higher level interface to lower level hardware isn't a very big deal.

  • perennate Member, Host Rep
    edited January 2018

    WSS said: You're aware that Google makes their own BIOS/et al, right? For their resources, creating a proprietary setup that they can access through a higher level interface to lower level hardware isn't a very big deal.

    But what would an architecture that allows a VM to perform e.g. network/disk I/O without a context switch, while closing the exploit, look like? Is it as simple as minimizing the kernel code that is mapped into the VM address space, while including support for functions like I/O that the VM needs to perform? (I'm sure it's not simple in implementation, but at a high level, conceptually, that is simple.)

    If so, it seems like this answers @Maounique's question about how all this will actually affect hosts, i.e., hypervisors will implement something like this to resolve the vulnerability while retaining performance, and hosts can simply upgrade.

  • Maounique Host Rep, Veteran
    edited January 2018

    Clarifications for all:

    1. I agree exploits will get better and have the potential to grow worse security-wise;
    2. The lab conditions, where you run one VM on a node and that is all there is on it, differ extremely from the field conditions, where you run hundreds of VMs (on a public node, i.e. one which holds VMs of "customers", not some secret agency running only internal stuff). The noise introduced by a real hardware node loaded with more than one VM is deafening, so, while the PoC exists and it does what it was supposed to do, in reality it will have a lot more problems, therefore it will need to evolve a lot before it would be able to do the same thing in a real shared environment;
    3. I am not contesting the PoC, nor that it can work, but while the attacks will develop, so will the microcode and hypervisors. It is like saying: OMG, I am able to read text from the mail you send unencrypted, alone in a network, from one computer to another, without other traffic. Naturally, I may be able to encrypt it or pad it with garbage, fragment it and send it through various routers out of order, or it will fragment itself and get padded with other data automatically in a busy network. Email is still a mess today, decades after the first one was sent, but things have evolved a lot. It is highly unlikely that prediction, speculation, statistics, etc. will cease to be used in CPUs from now on; we have to live with this, not throw the baby out with the bathwater;
    4. A microcode update should be able to fix this, if not in a logical way, at least by cutting some useful mechanism and lowering performance, but nowhere near the 30% predicted, most likely single digits;
    5. I stand by my statement that virtualization, while not protecting 100% in theory, makes attacks impractical and is much easier to stop in the hypervisor one way or the other, as the hypervisor already does some fencing and can be tweaked through well-documented calls and checks to do it -- with some performance penalty, no doubt, but still not 30% or anywhere close to that;
    6. While the PoC is in its early days, so are the patches; expect much better ones later on, once the people at Intel get involved (as they already are deeply involved in kernel development), and expect some interesting new ideas to come out of this. Intel claims at most 5% performance loss; I think that is an exaggeration, made so it does not leave them open to attacks once again. I would expect at most 1%; anything over that will open them up to major headaches, and they will probably sacrifice some security for performance, at least on the older models;
    7. I coded for Spectrum clones (Z80) and modified a machine so it held RAM instead of ROM, with the purpose of cracking and modifying games. It seems like the stone age today, but it was a lot of work back then, with a lot of guessing, as documentation was almost non-existent for me. This is why, back in the day, confronted with huge limitations, people discovered bugs and turned them into features, so they were able to squeeze even more out of the meager RAM and CPU of the time.
  • WSS Member

    @perennate said:

    WSS said: You're aware that Google makes their own BIOS/et al, right? For their resources, creating a proprietary setup that they can access through a higher level interface to lower level hardware isn't a very big deal.

    But what would an architecture that allows a VM to perform e.g. network/disk I/O without a context switch, while closing the exploit, look like? Is it as simple as minimizing the kernel code that is mapped into the VM address space, while including support for functions like I/O that the VM needs to perform? (I'm sure it's not simple in implementation, but at a high level, conceptually, that is simple.)

    I'm stating that their setup, et al., is not going to be as easily manipulated -- unless you're going to infer that ticking the drivers just right could cause bad things to happen. As discussed by Google, they were able to alter the code being run under very specific conditions, which are none of the above. Just because it emulates talking to hardware at a very fast rate doesn't mean that it's using the host CPU for any more than it would for any other device.

    Thanked by: perennate
  • @Maounique said:

    2. The lab conditions, where you run one VM on a node and that is all there is on it, differ extremely from the field conditions, where you run hundreds of VMs (on a public node, i.e. one which holds VMs of "customers", not some secret agency running only internal stuff). The noise introduced by a real hardware node loaded with more than one VM is deafening, so, while the PoC exists and it does what it was supposed to do, in reality it will have a lot more problems, therefore it will need to evolve a lot before it would be able to do the same thing in a real shared environment;

    Not even arguing with this, but you realize there is a lot of money to be made here, so if it's in any way possible, there is more or less infinite manpower available to make said evolution happen?

    3. I am not contesting the PoC, nor that it can work, but while the attacks will develop, so will the microcode and hypervisors. It is like saying: OMG, I am able to read text from the mail you send unencrypted, alone in a network, from one computer to another, without other traffic. Naturally, I may be able to encrypt it or pad it with garbage, fragment it and send it through various routers out of order, or it will fragment itself and get padded with other data automatically in a busy network. Email is still a mess today, decades after the first one was sent, but things have evolved a lot. It is highly unlikely that prediction, speculation, statistics, etc. will cease to be used in CPUs from now on; we have to live with this, not throw the baby out with the bathwater;

    I don't think anyone is saying speculation is per se bad or shouldn't be used. What people are saying is that, as far as hypervisors are concerned, the number of options to mitigate this is limited, and at least at first glance they seem very taxing performance-wise and also quite fragile, if possible at all.

    4. A microcode update should be able to fix this, if not in a logical way, at least by cutting some useful mechanism and lowering performance, but nowhere near the 30% predicted, most likely single digits;

    If you have the knowledge to make this claim I will not argue with it, as I sure don't. All I can say is that the referenced information saying microcode won't help seems plausible, as an internal optimization like speculative execution is likely to be very time-critical, and I wouldn't be surprised if it was implemented purely in hardware.

    5. I stand by my statement that virtualization, while not protecting 100% in theory, makes attacks impractical and is much easier to stop in the hypervisor one way or the other, as the hypervisor already does some fencing and can be tweaked through well-documented calls and checks to do it -- with some performance penalty, no doubt, but still not 30% or anywhere close to that;

    Time will tell. Still, "not likely to happen" is not something that enhances my sleep quality.

    6. While the PoC is in its early days, so are the patches; expect much better ones later on, once the people at Intel get involved (as they already are deeply involved in kernel development), and expect some interesting new ideas to come out of this. Intel claims at most 5% performance loss; I think that is an exaggeration, made so it does not leave them open to attacks once again. I would expect at most 1%; anything over that will open them up to major headaches, and they will probably sacrifice some security for performance, at least on the older models;

    Why on earth would you expect Intel to overestimate the impact? If anyone has a real interest in downplaying the problem, it's them.

    7. I coded for Spectrum clones (Z80) and modified a machine so it held RAM instead of ROM, with the purpose of cracking and modifying games. It seems like the stone age today, but it was a lot of work back then, with a lot of guessing, as documentation was almost non-existent for me. This is why, back in the day, confronted with huge limitations, people discovered bugs and turned them into features, so they were able to squeeze even more out of the meager RAM and CPU of the time.

    That doesn't really directly answer my question about machine-code experience, but I guess I'll take it as a yes.

  • CHILL. Let the big boys handle it.

  • Maounique Host Rep, Veteran

    mksh said: Why on earth would you expect Intel to overestimate the impact? If anyone has a real interest in downplaying the problem, it's them.

    At first glance, yes. But I think this figure was carefully supervised by some bosses, not just general PR thrown out to the media.
    If they say 5 and it turns out to be 4, 3, 2, or 1, people will let them off the hook. If they say 5 and it is 10, it will be pandemonium.
    Also, it is based on available data. They believe they can do it within at most a 5% margin; in reality, new ways to mitigate it may be found -- slightly altering the timing or the order, shrinking the window of opportunity and skewing it, translating some instructions and breaking them down differently (after all, this is what microcode was supposed to implement) -- making it impossible to use the current flaw to read memory, while not really patching it out of existence completely, until someone manages to bypass the mitigation in place.
    In short, the cost of not delivering on that promise, after they lashed out at the "bad media", would be huge, and they are not Trump, to be able to afford it; their audience is educated and will not take it in stride -- hey, the bad media is out to get us, we screwed up because of them -- nor believe any conspiracy theory, no matter how phantasmagorical, with alternative facts and stuff. No, they clearly said 5%; they can't go over that, and it must be a conservative estimate: AWS, Azure, etc. may ask for damages and replacement CPUs, both hugely costly, and the rivals will profit once they smell blood, as they already tried to.
    TL;DR: Intel should be aware they cannot play with these numbers without huge consequences; if they are not... well, it will be interesting, but not very likely.

    mksh said: That doesn't really directly answer my question about machine-code experience, but I guess I'll take it as a yes.

    When I think of patching the microcode to do something that was not intended, I remember those times; this is why I gave that example. Where there is a will, and people who know their craft, there is a way. Maybe not orthodox, maybe not 100% bullet-proof, but it will do.

    mksh said: Still, "not likely to happen" is not something that enhances my sleep quality.

    I know a nice cliff in the mountains; we should visit one day and jump from it. The number of flaws in the most basic drivers and building blocks of the digital infrastructure controlling nuclear weapons and generators, as well as the knowledge about them kept under wraps by not-so-benevolent entities, is staggering. We are certainly doomed and should not sleep until the end, because it is nigh.

    mksh said: the number of options to mitigate this is limited

    There are not many, indeed, and most seem to be taxing, but a smart and imaginative guy may find a single point of failure which breaks the whole, pretty long, chain of events needed for successful exploitation through a hypervisor. Once it is known and many people look for a cure, one is likely to be found and then spread/adapted to other scenarios.

    mksh said: you realize there is a lot of money to be made here, so if it's in any way possible, there is more or less infinite manpower available to make said evolution happen?

    As if poor AWS and Microsoft, as well as many others, will not be able to afford some research to mitigate it instead of changing the whole infrastructure overnight. I can see the headlines in the tabloids: the "cloud" is in danger, haxxorz will steal yer money, don't host anywhere outside your premises, don't keep your money in the bank, don't use a phone, computer, ATM, plane, car, etc... A lot of money will be put into mitigation, and I really hope this will bring good things too.

  • jar Patron Provider, Top Host, Veteran
    edited January 2018

    Step 1: Remember that things often sound worse than they are.
    Step 2: Admit I know nothing about anything relating to how this functions anyway.
    Step 3: Rejoice in knowing that I don't personally manage any VPS nodes, and that if I have servers enabling anyone to execute code on them right now then I really have bigger problems anyway.
    Step 4: Drink.

    How I feel reading this thread. Enjoying reading the words of brighter minds.
