Andy Glew's comp-arch.net wiki, http://semipublic.comp-arch.net

If you are reading this elsewhere, e.g. at site waboba.info, it is an unauthorized copy, and probably a malware site.
comp-arch.net wiki on hold from October 17, 2011

SIE


SIE - what it can be good for

I think IBM's SIE (Start Interpretive Execution) instruction is a very good thing. It can be used, as originally intended, for IBM-style virtual machines, but it also enables the following:

  • user level virtual machines, with no need for an OS component. (with hardware support for concatenated page tables)
  • multilayer virtual machines, to almost arbitrary nesting level
  • lightweight thread forking and joining, without OS intervention
  • instruction set extensions and mode switches, e.g. switching to a completely different instruction set, such as VLIW, in the middle of a legacy instruction set
  • coprocessors

It is not clear whether IBM realized all of this potential when they implemented SIE.

--

TBD: discuss the above. Show how it can be done.

Joke?

I have long wanted to write a paper "SIE Sr: IBM did virtual machines better in 1985 than we do now", or "SIE Sr: the Abstract Virtual Machine Interface that enables (all of the above)."

The Basic Idea behind SIE

The basic idea behind SIE is this:

  • You are executing in a first mode, call it the host.
  • You encounter the SIE instruction. Typically, it has as an argument a pointer to a memory buffer. Let's call that the SIEBK.
    1. Note: the SIEBK is *not* at a fixed address in a hardwired register. I.e. NOT like Intel VMX's VMCS. Or at least not if you want many of the good things that SIE brings to the table, like multilayer.
      1. Fixed address SIEBK/VMCS is much like taking interrupts on a single stack, or saving interruptee state to a fixed area: it impedes reentrancy and nesting.
  • The SIEBK in memory contains the state of a new virtual machine that is to be loaded.
    1. In IBM's SIE, the SIEBK contained mainly only the privileged state. The caller could load much of the unprivileged state, the ordinary registers. WLOG we will just talk as if all of the state is in the SIEBK.
  • The state of the host is saved somewhere.
    1. The where is not necessarily defined.
    2. For the simplest example, let us assume that it is saved in the SIEBK itself. I.e. let us assume that the SIEBK is large enough to hold both the old state to be saved and the new state to be loaded.
  • The SIE instruction therefore saves the old host state (into the SIEBK), and loads the new, guest, state (from the SIEBK)
  • The guest is therefore running.

So far, so good. SIE starts the guest. Now, how do you handle virtual machine intercepts, things that the host needs to do for the guest?

  • When the guest encounters an instruction or an event (such as an interrupt, or an access to a memory location that the host virtual machine disallows), the guest performs an SIE EXIT.
  • Guest state is saved back into the SIEBK. Typically, into exactly the same area that the guest state was loaded from in the first place.
  • The old host state is restored. In our current example, from the old host state save area in the SIEBK.
  • Typically, some register or condition code or memory area is set to indicate the status of the guest virtual machine.
    1. I.e. it is not necessary to restore all of the host state - or, actually, any except for the privileged parts.
  • The host resumes executing, typically at the instruction after the SIE itself.
    • The code after the SIE instruction checks the status of the guest. It branches to whatever code handles the guest virtual machine event.
    1. E.g. if the guest is performing a privileged instruction that the host VMM needs to emulate, the host does so. And then, once that is done, it returns to the guest.
  • How does the host VMM return to the guest VM? By re-executing the same SIE instruction, reloading the guest's saved state.

That's it.
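
The entry/exit loop described above can be modeled in a few lines of Python. This is a toy sketch, not IBM's SIE: the `Siebk` and `Cpu` classes, the two-register state, the tuple instruction encoding, and the effect of the "priv" operation are all assumptions made for illustration. As in the simplest example above, the host save area is assumed to live inside the SIEBK.

```python
from dataclasses import dataclass, field


@dataclass
class Siebk:
    guest: dict                                    # guest state to load / save back
    host_save: dict = field(default_factory=dict)  # host save area (assumed to be in the SIEBK)
    exit_reason: str = ""                          # status reported to the host on SIE EXIT


class Cpu:
    def __init__(self):
        self.state = {"pc": 0, "acc": 0}           # currently loaded state (the host's at first)

    def sie(self, bk, program, intercepts):
        bk.host_save = dict(self.state)            # save old host state into the SIEBK
        self.state = dict(bk.guest)                # load guest state from the SIEBK
        while True:                                # the guest is running
            op, arg = program[self.state["pc"]]
            if op in intercepts:                   # event that the host must handle
                bk.exit_reason = op
                break
            if op == "add":
                self.state["acc"] += arg
            self.state["pc"] += 1
        bk.guest = dict(self.state)                # SIE EXIT: guest state goes back where it came from
        self.state = dict(bk.host_save)            # restore host state
        return bk.exit_reason


def run_guest(program):
    cpu = Cpu()
    bk = Siebk(guest={"pc": 0, "acc": 0})
    emulated = []
    while True:
        reason = cpu.sie(bk, program, intercepts={"priv", "halt"})
        if reason == "halt":
            return bk.guest["acc"], emulated
        # The host VMM emulates the intercepted instruction, then re-executes SIE.
        emulated.append(program[bk.guest["pc"]][1])
        bk.guest["acc"] *= 2                       # hypothetical effect of "priv"
        bk.guest["pc"] += 1                        # step past the emulated instruction


result = run_guest([("add", 3), ("priv", "x"), ("add", 1), ("halt", None)])
```

The host code after `cpu.sie(...)` plays the role of "the instruction after the SIE": it checks the status, handles the event, and loops back to the same SIE to resume the guest.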

Issue: Pinning the SIEBK

Actually, that's not quite it.

So,

1. we have read guest state from the SIEBK, and at least conceptually stored host state into the SIEBK.
2. we have executed in the guest
3. now we have encountered an event such as a page fault in the guest
4. we want to save the current guest state in the SIEBK, and restore the saved host state (which, again, may be imagined to be stored in the SIEBK, but in reality may be somewhere else)

Q: what happens if, at this point, we want to take a page fault on the SIEBK?

After all, although the OS should not have removed the SIEBK virtual address from the page tables while the guest was running, that assumes a well behaved OS. In industry, in the mass market, we are not always allowed to make that assumption. In particular, it would be very nice to be able to add an instruction like SIE to the instruction set while requiring minimal changes to the OS. After all, the OS and the CPU may be implemented by differing companies, with differing goals and, even when goals are aligned, differing schedules.

The naive solution, proposed umpteen times, is to say "The OS must pin the SIEBK". I.e. lock the SIEBK down, prevent it from being paged. This is hardly OS transparent.

Here's another solution: pseudo-pinning.

At time 1, when executing the SIE instruction, save the translation, the virtual to physical address of the SIEBK, along with its permissions. Ensure that it is writeable.

At time 4, when trying to suspend the SIE by saving guest state into the SIEBK, use the translation saved from time 1.
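
The two steps above can be sketched as follows. This is a minimal model under stated assumptions: `Mmu`, `PseudoPinnedSie`, and the dictionary-based page table and physical memory are hypothetical names invented for the sketch, and pages are treated as opaque numbers.

```python
class PageFault(Exception):
    pass


class Mmu:
    def __init__(self, page_table):
        self.page_table = page_table      # virtual page -> (physical page, writable)

    def translate(self, vpage, write=False):
        entry = self.page_table.get(vpage)
        if entry is None or (write and not entry[1]):
            raise PageFault(vpage)
        return entry[0]


class PseudoPinnedSie:
    def __init__(self, mmu, memory):
        self.mmu = mmu
        self.memory = memory              # physical page -> SIEBK contents
        self.saved_ppage = None

    def enter(self, siebk_vpage):
        # Time 1: translate once, insisting on write permission, and keep the result.
        # Any page fault is taken here, at the SIE instruction itself.
        self.saved_ppage = self.mmu.translate(siebk_vpage, write=True)
        return dict(self.memory[self.saved_ppage])

    def exit(self, guest_state):
        # Time 4: store through the saved translation. The page tables are not
        # consulted again, so the save cannot take a page fault.
        self.memory[self.saved_ppage] = dict(guest_state)
```

For example, if the OS removes the SIEBK page from the page tables while the guest runs, `exit` still writes to the physical page recorded by `enter`.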

Think about it: this old translation is guaranteed to be still accurate. TLBs are essentially non-coherent caches for the page tables. On machines, such as Intel x86, that do not have operations that invalidate TLB entries in other processors' TLBs, once a virtual to physical address translation has been read, it may reside in the TLB essentially forever. It may be thrashed out of the TLB by other translations. But the only way you can guarantee that a TLB entry is not present is to perform a TLB shootdown, sending an interrupt to the other processor(s) where the interrupt handler performs a local TLB invalidation (x86 INVLPG).

So, if no interrupt has occurred between time 1 and time 4, you can use the saved TLB translation from time 1.

What if an interrupt occurs between time 1 and time 4? Well, you *could* save the interrupted guest state in the SIEBK using the translation saved at time 1. The instruction pointer pushed on the interrupt stack would point to the SIE instruction. On return from the interrupt handler to the interrupted code, the SIE instruction would re-execute, reload the SIEBK translation, and the guest would resume.

You *could* do the above. It seems simple enough. Unfortunately, it is a virtualization hole, in particular in the presence of multithreading: other threads of the host VMM would be able to observe that the guest SIEBK area had been written. This may not seem bad if the host VMM is actually part of the OS, and aware of interrupts. But it is a violation of virtualization if the host VMM is, e.g., a user process, or an OS below which yet another virtual machine is running.

You fix this virtualization hole by saving the guest SIE state somewhere else. The interrupt handler, of course, always has to arrange to save the state of the code it is interrupting. Using pinning if necessary.

Nothing terribly challenging here. The key idea is that pseudo-pinning allows you to save state without having to modify the OS to pin the SIEBK. At the cost of some minor virtualization issues. Which can always be fixed ... by modifying the OS, or VMM, or ...

What about if there are hardware multiprocessor TLB invalidation / shootdown instructions? If you can tell that the virtual addresses involved do not involve the SIEBK, no worries. If they do involve the SIEBK, or if you are too lazy to figure out, treat it like an interrupt: save the guest state, either to the SIEBK or to the magic other place that preserves virtualization, perform the TLB invalidation. After it is done, try to restore the SIEBK, almost as if reexecuting the SIE instruction. If it takes a page fault, fault - with the instruction pointer pointing to the SIE instruction, and the SIEBK containing the guest state.

I.e. almost exactly what you would do for an interrupt.

This simple idea, pseudo-pinning, enables many instruction set innovations. It requires little OS support. However, with full and proper OS support it can be made fully transparent.


Why I deprecated SIE when I first learned about it, and why I came to love it.

I was something of a RISC bigot when I first learned about SIE (in the late 1980s?). It seemed to me that SIE was the ultimate CISC instruction - Start Interpretive execution?!? I imagined some sort of huge microcode loop.

  1. Turns out that SIE is a largely non-microcoded interface. You load the new state, and execute in hardware. There are a few new control register bits, aka the SIE interception bitmask. When the guest VM tries to execute an instruction that you want to emulate in the host VMM, you detect that in hardware, and perform the SIE EXIT.
    1. The only microcode involved is that which saves the state to the SIEBK, and which restores the state. I.e. basically the equivalent of an exception or interrupt, albeit with a slightly different set of state.
    2. As a RISC bigot, I was also trying to eliminate *that* sort of microcode, by creating register only interrupt sequences, e.g. swapping register files. Experience has shown that, while this can be done, it has reentrancy issues.

I also thought that SIE would be inefficient. E.g. if performing an SIE EXIT for one of several known reasons, why not vector directly to a handler for that reason? E.g. like interrupt vectoring? Why have the guest code

    1. While this can be done - and, indeed, must be done in certain circumstances to preserve virtualization - it presents certain issues with respect to securely validating and protecting the host state save areas that are to be loaded on such an event.
    2. It is possibly worth working out the details of such a directly vectored approach.
    3. But, the single SIEBK approach is so simple, that it allows us to avoid many thorny details.

For me, the final straw with respect to SIE was realizing that it could be used to support multilayer, user mode, virtual machines.


SIE allows user mode VMMs

With some extra support for multilayer virtual memory address translation, such as Intel EPT (Extended Page Tables).

    • As a RISC bigot I was similarly opposed to multilayer virtual memory translation. Heck, I was seriously interested in software TLB miss handling.
  1. But I realized that a user level host VMM could execute SIE. The guest VM could enter any privilege level it wanted.
    1. The only really privileged operation was translating guest VM virtual and physical addresses back to host VMM unprivileged user level virtual addresses, and hence to host physical addresses.
    2. Given "hardware" support for the concatenated (gV->gP), (gP->hV), (hV->hP) translations, the guest really doesn't require any special privilege.
      1. In the concatenated or idealized virtual machine address translation, (gV->gP), (gP->hV), (hV->hP), you could have 3 arbitrary mappings. However, you might also have a simplified mapping such as address offsets for the (gP->hV) mapping. Thus, you might be able to get away with only 2 levels of arbitrary mappings.

Therefore, a user level host VMM can use SIE to create a guest VM that supports any privilege level.
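
A minimal sketch of the concatenated translation, assuming page-granular dictionary page tables and the simplified fixed-offset (gP->hV) mapping mentioned above. The function names and the 4 KiB page size are assumptions of the sketch, not part of any real architecture's definition.

```python
PAGE = 4096


def translate(addr, table):
    """One page-granular mapping step; raises KeyError if the page is unmapped."""
    page, offset = divmod(addr, PAGE)
    return table[page] * PAGE + offset


def guest_virtual_to_host_physical(gv, guest_pt, gp_to_hv_offset, host_pt):
    gp = translate(gv, guest_pt)      # (gV->gP): the guest's own page table
    hv = gp + gp_to_hv_offset         # (gP->hV): simplified to a fixed address offset
    return translate(hv, host_pt)     # (hV->hP): the host OS's page table for the VMM process


hp = guest_virtual_to_host_physical(
    0x123, guest_pt={0: 3}, gp_to_hv_offset=0x10000, host_pt={0x13: 7}
)
```

Only the two dictionary lookups are arbitrary mappings here; the middle step is plain arithmetic, which is why two levels of hardware translation can be enough.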

Furthermore, a guest VM can itself execute SIE, whether in kernel or user space. And, at least conceptually, the state save works the same way, in this second, nested, virtual machine.

    1. If hardware only supports two levels of virtual machine address translation, the original host VMM (user mode code in our example) needs to intercept the nested SIE. It may then need to perform a concatenation, in software, of the virtual machine address translations. I.e. it can create shadow page tables, eagerly or lazily.
    2. But, from the point of view of the guest that is executing the SIE instruction - the SIE is executed. As before. No matter that it might be emulated. The guest cannot see any artifacts of the nested VM emulation.
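
The software concatenation can be sketched as composing two page-granular maps into one shadow table, either eagerly or lazily. This is an illustration only: `build_shadow` and `LazyShadow` are hypothetical names, and real shadow page tables must also track permissions and invalidations, which are omitted here.

```python
def build_shadow(inner, outer):
    """Eager: shadow[p] = outer[inner[p]] for every page both maps can resolve."""
    return {page: outer[mid] for page, mid in inner.items() if mid in outer}


class LazyShadow:
    """Lazy: compose an entry only when it is first looked up."""

    def __init__(self, inner, outer):
        self.inner = inner
        self.outer = outer
        self.cache = {}

    def lookup(self, page):
        if page not in self.cache:                 # shadow miss: compose on demand
            self.cache[page] = self.outer[self.inner[page]]
        return self.cache[page]
```

Here `inner` stands for the nested guest's mapping and `outer` for the mapping of the VM that executed the nested SIE; the composed table is what the two-level hardware would actually walk.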

The special value of this is that it can be done entirely by user mode code in the user mode host VMM. There is no need for OS level code. (Unlike Intel and AMD's virtual machine architectures, which require that the VMM be in privileged Ring 0.)

    1. With this hardware support, for example, you could emulate Windows on top of Linux, and/or Linux on top of Windows. And neither OS would need to be modified. No device drivers would need to be installed. Nothing.

This is a good thing.

Virtualization holes in SIE - SIE is an abstract virtual machine interface

On a single threaded machine, with no other processors or DMA, SIE is secure. You can actually save the host privileged state to the SIEBK. Since nobody else is running, nobody can look at that privileged state, or modify it. You would need to scrub, clear out, the host privileged state from its save area in the SIEBK on an SIE EXIT, since the caller of SIE might not be allowed to look at the actual state.

When you start having DMA or channel processors, this is not good enough. The I/O processors, or other SMP processors, may be able to look at and/or modify the state saved in the SIEBK. This would be a virtualization hole.

You therefore need to protect the host saved state, certainly against writing, and probably also against reading.

  1. You might save it somewhere other than the SIEBK host save area. (But then you have to manage that other save area. TBD, elaborate.)
  2. You might also consider cryptographic protection - encrypting the saved state so that it cannot be read, hashing it so that attempts to modify it can be detected.
  3. Or you might wave a magic wand and create a new hardware protection mechanism - something like a last level of page granular protection, that applies to all physical memory accesses, whether from the CPU or via I/O. (This creates issues on paging.)
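
As a rough illustration of the cryptographic option in the list above, the saved host state can be tagged with a keyed hash so that tampering is detected on restore. This sketch covers only the integrity half (it does not encrypt), and it assumes a secret key that is visible only to the processor; `seal`, `unseal`, and the JSON serialization are hypothetical choices for the example.

```python
import hashlib
import hmac
import json


def seal(state, key):
    """Serialize the host state and compute a keyed hash over it."""
    blob = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(key, blob, hashlib.sha256).digest()
    return blob, tag


def unseal(blob, tag, key):
    """Recompute the keyed hash; refuse to restore state that was modified."""
    expected = hmac.new(key, blob, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("host save area was modified")
    return json.loads(blob)
```

In this model `blob` is what sits in the host save area, where DMA or other processors might reach it. Where the tag is kept matters: if an attacker could also replay an old blob together with its old tag, a hash alone would not detect that.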

Let's assume that you have solved this issue one way or another.

There is a further virtualization hole that occurs in multithreaded virtual machine environments. Recall how we said that, on an SIE exit, the guest state is saved in the SIEBK? This is a nice concept, but it is a virtualization hole.

  1. e.g. if VM0 -sie01-> VM1 -sie12-> VM2, and if a virtual machine event occurs that causes an SIE EXIT all the way back to VM0, but which is supposedly invisible to VM1, then we cannot simply store VM2's state in sie12's SIEBK, and VM1's state in sie01's SIEBK. Other threads running in the VM1 level may be able to observe the modification of sie12's SIEBK. They will therefore be able to observe that VM2 has SIE EXITed, and, since they know that VM1 is not intercepting, they can tell that a lower level virtual machine has intercepted.

It is therefore necessary to save the VM2 guest state not in its parent sie12 SIEBK guest area, but in the grandparent sie01 SIEBK guest area. It is necessary to SIE EXIT directly from VM2 to VM0. Conceptually, VM0 eventually returns to VM1 by reexecuting sie01, but that actually jumps back to VM2.

Remember how I wanted to be able to vector directly to an interception handler? Multilayer SIE requires such vectoring. The implementation needs to be able to determine which VM layer to SIE EXIT to. This *could* be done by walking the tree of SIE blocks, each of which conceptually points to a parent SIEBK (although that assumes the sort of magic protection mechanism and/or SIEBK shadowing mentioned at the start of this section). A more efficient implementation might be to have registers pointing to the SIEBKs that an interception should vector to, for some (but not necessarily all) interception causes. Or... you could just exit all the way to the base level VMM for all interception events, and let its software determine where you actually need to go.
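
The "walk the SIE blocks" option can be sketched as follows. The `Siebk` fields, the event names, and the policy of exiting to the nearest enclosing host that asked for the event are assumptions of this sketch; a real design might consult the outermost VMM first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Siebk:
    host: str                            # name of the VM that executed this SIE
    intercepts: frozenset                # events that this host asked to intercept
    parent: Optional["Siebk"] = None     # SIEBK under which the host itself is running


def exit_target(innermost, event):
    """Walk outward from the running guest's SIEBK to find the layer to SIE EXIT to."""
    bk = innermost
    while bk is not None:
        if event in bk.intercepts:
            return bk.host
        bk = bk.parent
    return None                          # nobody intercepts: the guest handles it itself


sie01 = Siebk("VM0", frozenset({"host_page_fault"}))
sie12 = Siebk("VM1", frozenset({"new_insn"}), parent=sie01)
```

With VM2 running under `sie12`, a "new_insn" event exits to VM1, while a "host_page_fault" event skips VM1 and exits directly to VM0. The register-based alternative amounts to caching the result of this walk per interception cause.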

This is the key insight. SIE is an abstract virtual machine interface. It is not necessarily the lowest level, actual, implementation of the virtual machine mechanism.

You might want to have a base level virtual machine that uses interception vectoring, or one to which all SIE EXITS branch. But it may want to emulate the abstract SIE virtual machine interface.

It is possible for such a non-SIE virtual machine interface to efficiently emulate any number of nested SIE virtual machines. It is also possible for a non-SIE virtual machine interface to emulate another non-SIE virtual machine. However, it is not clear if it can be done efficiently; and it certainly is more complex than emulating an SIE based virtual machine.

The key insight is that most uses of multilayer virtual machines do not care about all interceptions. The user might, for example, only need to emulate one or two new instructions, and may not care about anything else. SIE is a nice interface for such simple layering. A more complicated VM architecture may be needed for the base level virtual machine, or for certain other purposes; but it is good to have a simple interface for the common case.

All of this fancy footwork is needed only to create multilayer, multithreaded, fully transparent, virtual machines. That may be true for real IBM-style VM architectures. But many jobs do not need full virtualization. That is the other key benefit of SIE.

SIE. IBM really came up with a winner. It is not clear to me if IBM ever knew about all of the potential goodness of SIE, or if it has exploited this for anything other than virtual machines. There are reports that many IBM VMs have restrictions that SIE can solve. However, IBM certainly provides 3 or 4 levels of nested VM support, something that newcomers to the field may not understand the need for. My guess is that whoever invented SIE at IBM understood much, if not all, of its potential, but that not all was immediately implemented. I wish I knew who invented SIE at IBM. (Hmm, now I am not at Intel I am allowed to do patent research. I should try to find out.)

Certainly, IBM has often noted that SIE was one of the major improvements in the IBM VM architecture.

non-VM uses for SIE

You want to execute a new instruction set - e.g. a VLIW on a RISC? SPARC on a MIPS? Vice versa?

  1. Traditional proposals to do this involve a lot of OS work.
  • Execute a flavor of SIE where the guest state includes a new instruction set mode.
  • No need to change the OS, to save modes: the SIE EXIT saves the new ISA mode, restores the old, and then allows the existing interrupt handling mechanisms to run.
  • Note that this can be arbitrarily nested. ISA0 can call a function, that SIEs to ISA1. That ISA1 function can call another function, that SIEs to ISA0, or to a new ISA2.

You want to spawn thousands of lightweight threads?

  1. Traditional proposals to do this involve a lot of OS work.
  2. Instead, use a flavor of SIE:
  • Create a flavor of SIE where the SIEBK is a data structure that contains multiple guest thread states. E.g. an array of guest thread states.
  • Load up the threads, and run them. No need to tell the host OS.
    • If there are sufficient hardware threads available, run em all.
    • If this user wants more threads than hardware has, then you may want to consider a microcode scheduler. Or not...
  • When any of the threads encounters a problem
    • If the OS is not aware of the threads, stop em all. Save all of their state to the SIEBK guest state areas. Then resume the host, and invoke the OS.
    1. You can do better with OS involvement. But it is nice if the OS involvement is optional, not required.
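
The thread-forking flavor above can be sketched as a SIEBK that is simply a list of guest thread states. Everything here is hypothetical: the round-robin loop stands in for hardware threads, and `step` is a made-up stand-in for executing one instruction that returns a problem name or None.

```python
def run_threads(siebk_threads, step, max_steps=1000):
    """Load every guest thread state, run them, and stop them all on the first problem."""
    live = [dict(t) for t in siebk_threads]       # load all guest thread states
    for _ in range(max_steps):
        for index, thread in enumerate(live):
            problem = step(thread)
            if problem is not None:
                siebk_threads[:] = live           # save all state back to the SIEBK array
                return index, problem             # resume the host, which invokes the OS
    siebk_threads[:] = live
    return None, None


def step(thread):
    # Hypothetical "instruction": count up, and fault when the limit is reached.
    thread["n"] += 1
    return "fault" if thread["n"] == thread["limit"] else None


threads = [{"n": 0, "limit": 5}, {"n": 0, "limit": 3}]
stopped = run_threads(threads, step)
```

The host sees a single SIE that returned with a status; it never has to be told how many threads were running.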

The only real difference in most of these ideas for non-VM SIE flavors is that the OS may receive something like a page fault, where the saved instruction pointer points to the SIE-like instruction, rather than the actual instruction that caused the page fault. Ditto illegal instructions, debuggers, etc. The OS needs to be modified so that, if it needs to look at the actual instruction, it can say "I am stopped at an SIE - I need to look in the SIE blocks, possibly a chain thereof, to find the actual instruction". However, we mostly try to avoid the OS having to look at or emulate the instruction. E.g. we provide CR2 page fault address registers, etc. Having the OS have to know about all new instruction formats is an impediment to adding new instructions.
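
The "look in the SIE blocks, possibly a chain thereof" step might look like the sketch below. The representation is invented for the example: code is a dictionary from address to instruction tuple, an SIE instruction is `("sie", siebk)`, and each SIEBK records the guest's code and saved instruction pointer.

```python
def actual_instruction(code, pc):
    """Follow SIE instructions down the chain until a non-SIE instruction is found."""
    insn = code[pc]
    while insn[0] == "sie":
        siebk = insn[1]
        code, pc = siebk["code"], siebk["pc"]     # descend into the guest's saved state
        insn = code[pc]
    return insn


inner = {"code": {0: ("load", 0x1000)}, "pc": 0}
middle = {"code": {4: ("sie", inner)}, "pc": 4}
host_code = {8: ("sie", middle)}
found = actual_instruction(host_code, 8)
```

An OS that only needs the faulting address, rather than the instruction itself, never has to do this walk.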

You want to use an external coprocessor from user mode, as opposed to having to hide it in an OS device driver?

  • Use a flavor of SIE where the guest state is the coprocessor.
  • Naively, have the processor go idle while the coprocessor runs. I.e. synchronous coprocessing. Although you can do better than that, with OS support. (TBD talk about asynchronous SIE support...)

You want to use a chunk of reconfigurable logic to extend your processor?

  • Use a flavor of SIE where the guest state is the FPGA state.
  • Use the old DEC / Rajiv virtualization of reconfigurable logic for efficiency.
  • TBD: hmm, this may actually be a new invention...

Note that many of the above non-VM SIE flavors do not need full closure of the virtualization holes that VM SIE requires.

Amdahl BIE a predecessor to SIE?

It has been reported to me that IBM SIE (Start Interpretive Execution) follows, is derived from, comes after Amdahl BIE (Begin Interpretive Execution).

However, I have seen no documentation about Amdahl BIE.
