Wednesday, July 20, 2011

VMX and SMM – Dual monitor mode


Dual monitor mode involves two monitors: the executive monitor and the SMM monitor. The executive monitor is analogous to the vmx-root hypervisor that exists outside of SMM. The SMM monitor is a special hypervisor that operates only in SMM. Under the dual monitor treatment, SMIs cause vmexits, and the exit information is recorded in a separate vmcs called the SMM transfer vmcs. This enables the SMM monitor to take control of vmexits caused by SMI# assertion.

Normal  vmx transitions (without dual monitor):
1.        vmxon executed by the executive monitor.
2.       vmptrld is executed.  The guest vmcs is then initialized via vmwrites.
3.       vmlaunch is executed. The guest virtual machine is launched.
4.       A vmexit from the guest traps back to the executive monitor. The executive monitor reads exit_reason, exit_qual and other vmcs fields to extract more details on the vmexit.  After handling the vmexit, the executive monitor resumes the guest by executing vmresume.
5.       If an SMI# arrives while in the guest, vmx is turned off and the processor enters SMM. Upon an RSM, the processor returns to the vmx guest.

Dual Monitor vmx transitions:
0.       Enable Dual Monitor. [see section on enabling dual  monitor below].
1.       A. vmxon executed by the executive monitor.
         B. Dual monitor treatment is activated [see section on activating dual monitor below].
2.       vmptrld is executed.  The guest vmcs is then initialized via vmwrites.
3.       vmlaunch is executed. The guest virtual machine is launched.
4.       A vmexit from the guest traps back to the executive monitor. The executive monitor reads exit_reason, exit_qual and other vmcs fields to extract more details on the vmexit.  After handling the vmexit, the executive monitor resumes the guest by executing vmresume.
5.       If an SMI# arrives while in the guest, then an SMM VMexit occurs. Control is transferred to the SMM monitor (instead of the executive monitor).  The SMM monitor handles the SMM vmexit by reading the relevant fields in the SMM transfer vmcs.  After handling the SMM vmexit, it resumes the guest by executing a vmresume.
Steps 1A, 2, 3 and 4 are identical for normal and dual-monitor vmx transitions.
The only difference between normal vmx transitions and dual monitor vmx transitions is in the handling of SMIs. In the dual monitor case, there is a new vmexit (the SMM VMexit) that traps to the SMM monitor. All other vmexits continue to trap to the executive monitor.
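
The routing rule described above can be written down as a small decision function. This is only an illustrative model, not processor or hypervisor code: the enum names and the function `route_event` are made up for this sketch, and the events are reduced to "SMI" versus "any other vmexit".

```c
#include <stdbool.h>

/* Hypothetical names, introduced only for this sketch. */
enum event  { EVENT_SMI, EVENT_OTHER_VMEXIT };
enum target { TARGET_EXECUTIVE_MONITOR, TARGET_SMM_MONITOR, TARGET_LEGACY_SMI_HANDLER };

/* Who gets control when something happens while the guest is running? */
static enum target route_event(bool dual_monitor_active, enum event e)
{
    if (e == EVENT_SMI) {
        /* Dual monitor: SMM VMexit to the SMM monitor.
           Normal case: vmx is turned off and the plain SMI handler runs. */
        return dual_monitor_active ? TARGET_SMM_MONITOR
                                   : TARGET_LEGACY_SMI_HANDLER;
    }
    /* Every other vmexit traps to the executive monitor in both cases. */
    return TARGET_EXECUTIVE_MONITOR;
}
```

Only the SMI row changes between the two treatments, which is the point made in the paragraph above.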

When the machine is in the SMM monitor, it is considered to be in SMM. An SMM VMexit is one that begins outside of SMM and ends in SMM. This means that SMM VMexits are also accompanied by the SMI_ACK special cycle. Similarly, a vmresume from the SMM monitor that resumes the guest is also accompanied by an SMI_ACK special cycle (since this vmresume takes the machine from SMM to outside of SMM).


The next two sections cover steps 0 and 1B of the Dual Monitor vmx transitions.


Enabling Dual Monitor Treatment:
Intel provides a new msr for this (msr 0x9b, the SMM_MONITOR_CTL msr).  Bit 0 of this msr is the valid bit. Bits 31:12 hold the physical address (4K aligned) of the monitor segment (also called MSEG) that initializes the SMM transfer vmcs. This msr can be written only in SMM.  Here is some sample code:
mov ecx, 0x9B        ; msr index: SMM_MONITOR_CTL
mov eax, 0x00009001  ; bits 31:12 = MSEG base (0x9000), bit 0 = valid
xor edx, edx         ; bits 63:32 are reserved. Clear edx
wrmsr
rsm                  ; get out of SMM

The valid bit is set to 1.  Bits 31:12 = 0x9, which means that the physical address of the MSEG is 0x9000. Note that the above code snippet must run in SMM (SMI handlers that are dual-monitor aware may add the above code to initialize the MSEG).
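
To make the bit layout concrete, here is a minimal sketch in C that builds and decodes the low 32 bits of the msr value. The macro and function names are invented for this example; only the msr index, the valid bit and the bits 31:12 layout come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMM_MONITOR_CTL_MSR        0x9Bu        /* msr index from the text */
#define SMM_MONITOR_CTL_VALID      0x1u         /* bit 0 */
#define SMM_MONITOR_CTL_MSEG_MASK  0xFFFFF000u  /* bits 31:12 */

/* Build the value that goes into EAX before the wrmsr. */
static uint32_t smm_monitor_ctl_encode(uint32_t mseg_base, bool valid)
{
    return (mseg_base & SMM_MONITOR_CTL_MSEG_MASK)
         | (valid ? SMM_MONITOR_CTL_VALID : 0u);
}

/* Extract the 4K-aligned MSEG physical address. */
static uint32_t smm_monitor_ctl_mseg_base(uint32_t eax)
{
    return eax & SMM_MONITOR_CTL_MSEG_MASK;
}

/* Test the valid bit. */
static bool smm_monitor_ctl_is_valid(uint32_t eax)
{
    return (eax & SMM_MONITOR_CTL_VALID) != 0;
}
```

With an MSEG at 0x9000 and the valid bit set, `smm_monitor_ctl_encode(0x9000, true)` gives 0x00009001, the constant used in the assembly snippet.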

A sample MSEG header looks like the one shown below.  In our example, the header is at physical address 0x9000 (what we wrote in msr 0x9b).

revision_identifier     dd 0    ; offset 0
smm_monitor_features    dd 0    ; offset 4
gdtr_limit              dd      ; offset 8
gdtr_baseoffset         dd      ; offset 12
cs_sel                  dd      ; offset 16
eip_offset              dd      ; offset 20
esp_offset              dd      ; offset 24
cr3_offset              dd      ; offset 28

The format of the MSEG header above matches the one described in Table 26.10 (vol 3b, System Management Mode, chapter 26).  Note: table numbers vary between versions of the Intel manual, but the table is always found in the SMM chapter.
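
For software that builds or inspects the MSEG from C, the same header can be expressed as a struct. This is a sketch: the struct name and the helper `mseg_entry_point` are made up here, the field names simply mirror the listing above, and the helper assumes that eip_offset is relative to the MSEG base (which is what the "offset" naming suggests; check the manual before relying on it).

```c
#include <stddef.h>
#include <stdint.h>

/* Eight consecutive 32-bit fields, in the order of the listing above. */
struct mseg_header {
    uint32_t revision_identifier;   /* offset 0  */
    uint32_t smm_monitor_features;  /* offset 4  */
    uint32_t gdtr_limit;            /* offset 8  */
    uint32_t gdtr_baseoffset;       /* offset 12 */
    uint32_t cs_sel;                /* offset 16 */
    uint32_t eip_offset;            /* offset 20 */
    uint32_t esp_offset;            /* offset 24 */
    uint32_t cr3_offset;            /* offset 28 */
};

/* Assumption: the entry point is the MSEG base plus eip_offset. */
static uint32_t mseg_entry_point(uint32_t mseg_base, const struct mseg_header *h)
{
    return mseg_base + h->eip_offset;
}
```

Since every field is a `uint32_t`, the compiler inserts no padding and the struct is exactly 32 bytes, so it can be overlaid on the memory at the MSEG base.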


Activating Dual Monitor Treatment:
After enabling the dual-monitor treatment, software can activate it by executing the vmcall instruction in VMX_ROOT mode.  (Execution of vmcall in vmx_non_root mode always causes a vmexit. Execution of vmcall in vmx_root mode thus has a special meaning: it activates the dual monitor treatment.)  Here is some sample code that accomplishes this:

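; enable cr4.vmxe (bit 13); this value also sets bit 4 (PSE)
mov eax, 0x00002010
mov cr4, eax
; do vmxon
vmxon [vmxon_ptr]
jbe fail            ; CF=1 or ZF=1 means the vmx instruction failed
; load smm transfer vmcs pointer
vmclear [vmcs_smm_ptr]
jbe fail
vmptrld [vmcs_smm_ptr]
jbe fail
; now do vmcall (in vmx_root this activates dual monitor treatment)
vmcall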


When the processor executes the vmcall instruction in vmx_root mode, it internally does the following:
vmcall_flow:
if (vmx_root) {
    if (dual_monitor_active) {
        Perform SMM_VMEXIT;
    } else if (SMM_MONITOR_CTL_VALID) {
        Activate_Dual_Monitor_SMM_VMexit;
    }
}

In the above code snippet, SMM_MONITOR_CTL_VALID comes directly from bit 0 of the SMM_MONITOR_CTL msr (msr 0x9b).  If neither condition is met, vmcall fails. Also note that vmcall performs additional checks on the SMM transfer vmcs which are not discussed here. For those details, reading vol 3b of the manual is recommended, as is the vmcall pseudo-code provided in Vol 2b (under vmx instructions).
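
The same decision can be written as a small C function, which makes the failure case explicit. This is a simplified model of the flow described above, not the processor's actual algorithm: the enum and function names are invented, the processor is assumed to already be in vmx operation, and the extra checks on the SMM transfer vmcs are left out.

```c
#include <stdbool.h>

/* Hypothetical outcome names for this sketch. */
enum vmcall_result {
    VMCALL_VMEXIT_TO_EXECUTIVE,    /* vmcall in vmx_non_root: ordinary vmexit */
    VMCALL_SMM_VMEXIT,             /* dual monitor already active */
    VMCALL_ACTIVATE_DUAL_MONITOR,  /* valid bit set, not yet active */
    VMCALL_FAIL                    /* none of the conditions met */
};

static enum vmcall_result vmcall_flow(bool vmx_root,
                                      bool dual_monitor_active,
                                      bool smm_monitor_ctl_valid)
{
    if (!vmx_root)
        return VMCALL_VMEXIT_TO_EXECUTIVE;
    if (dual_monitor_active)
        return VMCALL_SMM_VMEXIT;
    if (smm_monitor_ctl_valid)
        return VMCALL_ACTIVATE_DUAL_MONITOR;
    return VMCALL_FAIL;
}
```

For the activation sequence shown earlier (vmx_root, dual monitor not yet active, valid bit written from SMM), the function returns `VMCALL_ACTIVATE_DUAL_MONITOR`.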

In the process of ‘Activate_Dual_Monitor_SMM_VMexit’, the processor does the following:
a.      Enters SMM (issues a SMI_ACK bus cycle).
b.      Reads the MSEG revision identifier (offset 0). If it does not match the revision identifier supported by the processor, then VMCALL fails. [The MSEG revision id supported by the processor is obtained by a rdmsr of IA32_VMX_MISC_MSR (msr 0x485, bits 63:32).]
c.       Reads the MSEG features field and performs checks on that field.
d.      After all checks pass, the processor starts executing instructions at the address indicated by the eip_offset field of the MSEG header.
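
Step b can be checked in software before attempting the vmcall. The sketch below takes the 64-bit value already read from msr 0x485 as a plain argument (no rdmsr is performed here), and the function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 63:32 of IA32_VMX_MISC hold the MSEG revision id the processor expects. */
static uint32_t vmx_misc_mseg_revision(uint64_t vmx_misc)
{
    return (uint32_t)(vmx_misc >> 32);
}

/* True when the revision_identifier at offset 0 of the MSEG header would pass step b. */
static bool mseg_revision_matches(uint64_t vmx_misc, uint32_t header_revision)
{
    return vmx_misc_mseg_revision(vmx_misc) == header_revision;
}
```

A monitor loader could call `mseg_revision_matches` with the rdmsr result and the first dword of the MSEG, and refuse to set the valid bit if it returns false.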

Sample MSEG code:
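; set the vm-entry controls
mov eax, 0x11ff
mov ebx, VMX_ENTRY_CONTROLS
vmwrite ebx, eax

; guest TR access rights: present, busy 32-bit TSS
mov eax, 0x008B
mov ebx, VMX_GUEST_TR_ATTR
vmwrite ebx, eax

; eax = length of the instruction that caused the exit (the vmcall)
mov ebx, VMX_EXIT_INSTR_LEN
vmread eax, ebx

; ebx = guest rip at the time of the exit
mov ebx, VMX_GUEST_RIP
vmread ebx, ebx

; advance guest rip past the vmcall and write it back
add eax, ebx
mov ebx, VMX_GUEST_RIP
vmwrite ebx, eax

vmlaunch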

The code above does only the bare minimum (in reality, it would initialize the entire SMM_VMCS): it initializes the entry controls, updates the guest rip and does a vmlaunch.  Where does this vmlaunch take the machine?  The answer is VMX_ROOT.  This is a special type of VMentry that takes the machine back to VMX_ROOT; Intel calls it a ‘VMentry that returns from SMM’.  Remember that the machine performed an SMM VMexit when VMCALL was executed in VMX_ROOT mode, so this VMLAUNCH in the MSEG code takes us back to VMX_ROOT.  At this point, we are in the executive monitor. This completes step 1B in the dual monitor flow.

Thursday, January 6, 2011

VMX and System Management Mode - Part 1

There are two different modes of operation of VMX within SMM:
1. Normal Mode
2. Dual monitor mode


Normal Mode:

Under Normal mode, an SMI# assertion causes the processor to turn off vmx and enter SMM. Upon an RSM, the processor automatically re-enables VMX if it was in either VMX-ROOT or VMX-GUEST prior to the SMI#. Since the processor turns off VMX, CR4.VMXE is treated as a reserved bit and must be 0 during RSM.

Algorithmically,

if (smi) {
    if (vmx_root or vmx_guest) {
        save cr4.vmxe internally;
        if (vmx_root) internal_state = vmx_root;
        if (vmx_guest) internal_state = vmx_guest;
        turn_off_vmx;
    }
    save cr4 to smm_ram;
}

during rsm:

if (rsm) {
    read cr4_val from smm_ram;
    if (cr4_val.vmxe == 1) jump_to_shutdown;
    retrieve internal cr4.vmxe;
    cr4 <- cr4_val | (cr4.vmxe << 13);
    read internal_state;
    if (internal_state == vmx_root) put_cpu_in_vmx_root;
    if (internal_state == vmx_guest) put_cpu_in_vmx_guest;
}


Notice the jump_to_shutdown during RSM. Since the processor saves CR4.VMXE internally on SMM entry, the value saved in SMRAM for CR4.VMXE is always 0. During RSM, the CR4 value is first loaded from SMRAM and bit 13 is checked. It must be 0; if not, the cpu will jump to shutdown. The processor then retrieves the value of VMXE from an internal register and updates CR4 with this value. The state of the processor (whether it was in vmx-root, vmx-guest or normal ia32 operation) is also retrieved and the cpu is put in that state after the completion of RSM.

This process is the default treatment of SMIs with VMX.
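
The two pseudocode fragments above can be turned into a small C model that can actually be stepped through. Everything here is illustrative: the struct, its field names and the two functions are invented for this sketch, the "internal" fields stand in for processor state that software cannot see, and as a simplification the model saves VMXE on every SMI rather than only when vmx is on.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_VMXE (1u << 13)

enum vmx_state { VMX_OFF, VMX_ROOT, VMX_GUEST };

struct cpu {
    uint32_t       cr4;
    enum vmx_state state;
    bool           internal_vmxe;   /* hidden copy of CR4.VMXE */
    enum vmx_state internal_state;  /* hidden copy of the vmx state */
    uint32_t       smram_cr4;       /* CR4 image in the SMRAM save area */
    bool           shutdown;
};

/* SMI# under the default treatment: remember vmx state, turn vmx off. */
static void smi_enter(struct cpu *c)
{
    c->internal_vmxe  = (c->cr4 & CR4_VMXE) != 0;
    c->internal_state = c->state;
    c->state = VMX_OFF;
    c->cr4  &= ~CR4_VMXE;
    c->smram_cr4 = c->cr4;          /* VMXE is always 0 in the saved image */
}

/* RSM: reject a CR4 image with VMXE set, otherwise restore everything. */
static void rsm(struct cpu *c)
{
    if (c->smram_cr4 & CR4_VMXE) {
        c->shutdown = true;
        return;
    }
    c->cr4   = c->smram_cr4 | (c->internal_vmxe ? CR4_VMXE : 0u);
    c->state = c->internal_state;
}
```

Running `smi_enter` followed by `rsm` on a cpu in VMX_GUEST restores both CR4 and the guest state, while an SMM handler that sets bit 13 in `smram_cr4` before `rsm` drives the model into shutdown.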

Notes on System Management Mode [SMM]

SMM:
SMM [System Management Mode] is an operating mode entered through the assertion of the SMI# pin. Upon detecting an SMI#, the processor saves its state in SMRAM [the base address of the SMRAM is obtained from an internal SMBASE register. The reset value of the SMBASE register is 0x30000]. The processor saves several architectural values into the SMRAM (like the values of CR0, CR3, CR4 etc) when it enters SMM. To exit SMM, software executes an RSM (resume) instruction. During the RSM instruction, the processor reloads the architectural state from SMRAM and returns to the state it was in prior to the SMI#.
Here is a loosely defined algorithm for entering and exiting SMM:
1. Processor is executing a task (say T).
2. SMI# is detected by the processor.
3. Processor saves all information pertaining to task T in the SMRAM. It issues the SMI_ENTER_ACK bus cycle and enters SMM.
4. Processor executes code from the SMM space [starting at SMBASE + 0x8000, which is address 0x38000 with the reset value of SMBASE].
5. When it executes the RSM instruction, the processor reloads the prior architectural state from SMRAM, then issues the SMI_EXIT_ACK bus cycle and exits SMM.
6. Processor resumes executing the task T.
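
The address in step 4 follows from SMBASE. A tiny sketch, with macro and function names invented for this example:

```c
#include <stdint.h>

#define SMBASE_RESET     0x30000u  /* reset value of SMBASE, from the text */
#define SMM_ENTRY_OFFSET 0x8000u   /* the SMI handler starts at SMBASE + 0x8000 */

/* First instruction executed after entering SMM. */
static uint32_t smm_entry_point(uint32_t smbase)
{
    return smbase + SMM_ENTRY_OFFSET;
}
```

With the reset SMBASE, `smm_entry_point(SMBASE_RESET)` is 0x38000, the address quoted in step 4. If SMBASE has been relocated, the entry point moves with it.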

During step 5, while the processor loads the architectural state, it performs a few checks on the state being loaded:
1. It checks the reserved bits of CR4.
2. It checks the CR0 register for illegal combinations, e.g. CR0.PG=1 with CR0.PE=0, or CR0.CD=0 with CR0.NW=1.
If any of the checks above fail, the processor enters shutdown.
[Note: there may be additional checks performed. CR0 and CR4 values in SMRAM should be left untouched by the SMM handler. These checks exist to make sure that the handler does not modify values in a way that would put the processor in an incompatible state after the execution of RSM].
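
The two checks listed above are easy to express in C. This is a sketch of just those checks, not the full set the processor performs: the function name is made up, and the set of reserved CR4 bits depends on the processor, so it is passed in as a hypothetical `cr4_reserved_mask` parameter.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)   /* protection enable */
#define CR0_NW (1u << 29)  /* not write-through */
#define CR0_CD (1u << 30)  /* cache disable */
#define CR0_PG (1u << 31)  /* paging */

/* Returns false when RSM would send the processor to shutdown. */
static bool rsm_state_ok(uint32_t cr0, uint32_t cr4, uint32_t cr4_reserved_mask)
{
    if (cr4 & cr4_reserved_mask)
        return false;                       /* a reserved CR4 bit is set */
    if ((cr0 & CR0_PG) && !(cr0 & CR0_PE))
        return false;                       /* paging without protected mode */
    if (!(cr0 & CR0_CD) && (cr0 & CR0_NW))
        return false;                       /* CD=0 with NW=1 is illegal */
    return true;
}
```

Under the default treatment of SMIs with VMX, bit 13 (VMXE) would be part of `cr4_reserved_mask`, which ties this check back to the jump_to_shutdown case in the Normal Mode section.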