Wednesday, July 20, 2011

VMX and SMM – Dual monitor mode


Dual monitor mode involves two monitors: the executive monitor and the SMM monitor. The executive monitor is analogous to the vmx-root hypervisor that exists outside of SMM. The SMM monitor is a special hypervisor that operates only in SMM. Under dual monitor treatment, SMIs cause vmexits, and the exit information is recorded in a separate vmcs called the SMM transfer vmcs. This enables the SMM monitor to assume control of vmexits caused by SMI# assertion.

Normal  vmx transitions (without dual monitor):
1.        vmxon executed by the executive monitor.
2.       vmptrld is executed.  The guest vmcs is then initialized via vmwrites.
3.       vmlaunch is executed. The guest virtual machine is launched.
4.       A vmexit from the guest traps back to the executive monitor. The executive monitor reads the exit_reason, exit_qual and other vmcs fields to extract more details on the vmexit.  After handling the vmexit, the executive monitor resumes the guest by executing vmresume.
5.       If an SMI# arrives while in the guest, the processor leaves vmx operation and enters SMM. Upon an RSM, the processor restores vmx operation and returns to the vmx guest.

Dual Monitor vmx transitions:
0.       Enable dual monitor treatment [see the section on enabling dual monitor below].
1.       A. vmxon is executed by the executive monitor.
         B. Dual monitor treatment is activated [see the section on activating dual monitor below].
2.       vmptrld is executed.  The guest vmcs is then initialized via vmwrites.
3.       vmlaunch is executed. The guest virtual machine is launched.
4.       A vmexit from the guest traps back to the executive monitor. The executive monitor reads the exit_reason, exit_qual and other vmcs fields to extract more details on the vmexit.  After handling the vmexit, the executive monitor resumes the guest by executing vmresume.
5.       If an SMI# arrives while in the guest, an SMM VMexit occurs. Control is transferred to the SMM monitor (instead of the executive monitor).  The SMM monitor handles the SMM VMexit by reading the relevant fields in the SMM transfer vmcs.  After handling the SMM VMexit, it resumes the guest by executing vmresume.
Steps 1A, 2, 3 and 4 are identical for normal and dual-monitor vmx transitions.
The only difference between Normal vmx transitions and the Dual monitor vmx transitions is in the handling of SMI. In the dual monitor case, there is a new vmexit (SMM VMexit) that traps to the SMM monitor. All other vmexits continue to trap to the executive monitor.
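The routing rule described above can be written down as a small model. This is only an illustrative Python sketch, not processor behaviour in full: the function names and the returned strings are made up for this example.

```python
def smi_destination(dual_monitor_active: bool) -> str:
    """Where an SMI# that arrives while the guest is running gets delivered."""
    if dual_monitor_active:
        # Dual monitor treatment: the SMI# becomes an SMM VMexit.
        return "SMM monitor"
    # Normal treatment: the processor leaves vmx operation and runs the SMI handler.
    return "SMI handler"


def vmexit_destination(cause: str, dual_monitor_active: bool) -> str:
    """Pick the monitor that receives control for a given exit cause."""
    if cause == "smi":
        return smi_destination(dual_monitor_active)
    # Every other vmexit still traps to the executive monitor.
    return "executive monitor"


if __name__ == "__main__":
    for active in (False, True):
        for cause in ("cpuid", "smi"):
            print(active, cause, "->", vmexit_destination(cause, active))
```

Only the "smi" row changes when dual monitor treatment is switched on, which is the whole difference between the two lists.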

When the machine is in the SMM monitor, it is considered to be in SMM. An SMM VMexit is one that begins outside of SMM and ends in SMM. This means that SMM VMexits are also accompanied by the SMI_ACK special cycle. Similarly, a vmresume from the SMM monitor that resumes the guest is also accompanied by an SMI_ACK special cycle (since this vmresume takes the machine from SMM to outside of SMM).


The next two sections cover steps 0 and 1B of the Dual Monitor vmx transitions.


Enabling Dual Monitor Treatment:
Intel provides a new msr (msr 0x9b, the SMM_MONITOR_CTL msr) for this.  Bit 0 of this msr is the valid bit. Bits 31:12 hold the physical address (4K aligned) of the monitor segment (also called MSEG) that initializes the SMM transfer vmcs. This msr can be written only in SMM.  Here is some sample code:
mov ecx, 0x9B        ; SMM_MONITOR_CTL msr
mov eax, 0x00009001  ; MSEG base 0x9000 (bits 31:12) + valid bit (bit 0)
xor edx, edx         ; bits 63:32 are reserved. Clear edx
wrmsr
rsm                  ; get out of SMM

The valid bit is set to 1.  Bits 31:12 = 0x9 – This implies that the physical address of the MSEG segment is 0x9000. Note that the above code snippet must run in SMM (SMI handlers that are dual-monitor aware may add the above code to initialize the MSEG). 
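To double-check the value loaded into eax, here is a small Python sketch of the bit layout described above (valid bit in bit 0, MSEG base in bits 31:12). The helper names are hypothetical, and any other bits of the msr are ignored here.

```python
from typing import Tuple

VALID_BIT = 1 << 0            # bit 0: valid
MSEG_BASE_MASK = 0xFFFFF000   # bits 31:12: 4K-aligned MSEG physical address


def encode_smm_monitor_ctl(mseg_base: int, valid: bool = True) -> int:
    """Build the low 32 bits of the msr value from an MSEG base address."""
    if mseg_base & ~MSEG_BASE_MASK:
        raise ValueError("MSEG base must be 4K aligned and fit in bits 31:12")
    return mseg_base | (VALID_BIT if valid else 0)


def decode_smm_monitor_ctl(value: int) -> Tuple[bool, int]:
    """Split an msr value back into (valid, mseg_base)."""
    return bool(value & VALID_BIT), value & MSEG_BASE_MASK


if __name__ == "__main__":
    print(hex(encode_smm_monitor_ctl(0x9000)))  # the value used in the sample code
    print(decode_smm_monitor_ctl(0x00009001))
```

Encoding 0x9000 with the valid bit set gives 0x9001, the constant used in the wrmsr sample.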

A sample MSEG header looks like the one shown below.  In our example, the header is at physical address 0x9000 (what we wrote in msr 0x9b).

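revision_identifier     dd 0    ; offset 0x00
smm_monitor_features    dd 0    ; offset 0x04
gdtr_limit              dd      ; offset 0x08
gdtr_baseoffset         dd      ; offset 0x0C
cs_sel                  dd      ; offset 0x10
eip_offset              dd      ; offset 0x14
esp_offset              dd      ; offset 0x18
cr3_offset              dd      ; offset 0x1C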

The format of the MSEG_HEADER above matches the one described in Table 26.10 (vol 3b, System Management Mode, chapter 26).  Note: depending on the version of the Intel manual, table numbers may vary, but the table will be found in the SMM chapter regardless of the manual version.
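Since the header is just eight consecutive dwords, it can be built or parsed with a few lines of Python, which is handy when preparing a test image offline. This is a sketch only: the field names follow the listing above, and every value in the example is a made-up placeholder, not a real revision identifier or offset.

```python
import struct
from collections import namedtuple

# Eight little-endian 32-bit fields, in the order of the listing above.
MSEG_HEADER_FORMAT = "<8I"
MSEG_HEADER_SIZE = struct.calcsize(MSEG_HEADER_FORMAT)  # 32 bytes

MsegHeader = namedtuple(
    "MsegHeader",
    "revision_identifier smm_monitor_features gdtr_limit gdtr_baseoffset "
    "cs_sel eip_offset esp_offset cr3_offset",
)


def pack_mseg_header(header: MsegHeader) -> bytes:
    """Serialize the header exactly as it would sit at the MSEG base."""
    return struct.pack(MSEG_HEADER_FORMAT, *header)


def unpack_mseg_header(raw: bytes) -> MsegHeader:
    """Parse the first 32 bytes of an MSEG image."""
    return MsegHeader(*struct.unpack_from(MSEG_HEADER_FORMAT, raw, 0))


if __name__ == "__main__":
    # Placeholder values chosen only for illustration.
    example = MsegHeader(0, 0, 0x17, 0x100, 0x08, 0x200, 0xFF0, 0x1000)
    raw = pack_mseg_header(example)
    print(len(raw), unpack_mseg_header(raw))
```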


Activating Dual Monitor Treatment:
After enabling the dual-monitor treatment, software can activate it by executing the vmcall instruction.  This execution of vmcall is in VMX_ROOT mode.  (Execution of vmcall in vmx_non_root mode always causes a vmexit. Executing vmcall in vmx_root mode thus has a special meaning: to activate dual monitor treatment.)  Here is some sample code that accomplishes this:

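; enable cr4.vmxe (bit 13); bit 4 (PSE) is also set
mov eax, 0x00002010
mov cr4, eax
; do vmxon
vmxon [vmxon_ptr]
jbe fail          ; CF=1 or ZF=1 means the vmx instruction failed
; load smm transfer vmcs pointer
vmclear [vmcs_smm_ptr]
jbe fail
vmptrld [vmcs_smm_ptr]
jbe fail
; now do vmcall (in vmx_root, this activates dual monitor treatment)
vmcall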


When the processor executes the vmcall instruction in vmx_root mode, it internally does the following:
vmcall_flow:
if (vmx_root) {
    if (dual_monitor_active) {
        Perform SMM_VMEXIT;
    } else if (SMM_MONITOR_CTL_VALID) {
        Activate_Dual_Monitor_SMM_VMexit;
    } else {
        VMfail;
    }
}

In the above code snippet, SMM_MONITOR_CTL_VALID comes directly from bit 0 of the SMM_MONITOR_CTL_MSR (msr 0x9b).  If these conditions are not met, vmcall fails. Also note that vmcall performs additional checks on the SMM transfer vmcs which are not discussed here. For those details, reading vol 3b of the manual is recommended.  [Looking at the vmcall pseudo-code provided in vol 2b (under vmx instructions) is also recommended.]

In the process of ‘Activate_Dual_Monitor_SMM_VMexit’, the processor does the following:
a.      Enters SMM (issues an SMI_ACK bus cycle).
b.      Reads the MSEG revision identifier (offset 0). If it does not match the revision identifier supported by the processor, then VMCALL fails. [The MSEG revision id supported by the processor is obtained by a rdmsr of IA32_VMX_MISC_MSR (msr 0x485, bits 63:32).]
c.       Reads the MSEG features field and performs checks on that field.
d.      After all checks pass, the processor starts executing instructions from the RIP indicated in the eip_offset field of the MSEG.
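The revision check in step b can be sketched in Python as follows. The function names are hypothetical, and the msr value in the example is invented for illustration; on real hardware it would come from a rdmsr of msr 0x485.

```python
import struct


def supported_mseg_revision(ia32_vmx_misc: int) -> int:
    """Extract bits 63:32 of IA32_VMX_MISC: the MSEG revision id the cpu accepts."""
    return (ia32_vmx_misc >> 32) & 0xFFFFFFFF


def mseg_revision_matches(mseg_image: bytes, ia32_vmx_misc: int) -> bool:
    """Compare the dword at MSEG offset 0 against the supported revision id."""
    (header_revision,) = struct.unpack_from("<I", mseg_image, 0)
    return header_revision == supported_mseg_revision(ia32_vmx_misc)


if __name__ == "__main__":
    fake_msr = (0x11 << 32) | 0x5          # invented value, not from real hardware
    good_mseg = struct.pack("<I", 0x11)    # header starts with revision id 0x11
    bad_mseg = struct.pack("<I", 0x0)
    print(mseg_revision_matches(good_mseg, fake_msr))
    print(mseg_revision_matches(bad_mseg, fake_msr))
```

A mismatch here corresponds to the case where VMCALL fails in step b, so the MSEG image has to be built with the revision id read from the target processor.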

Sample MSEG code:
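; vm-entry controls: 0x11ff leaves 'entry to SMM' (bit 10) clear
mov eax, 0x11ff
mov ebx, VMX_ENTRY_CONTROLS
vmwrite ebx, eax

; guest TR access rights: present, busy 32-bit TSS
mov eax, 0x008B
mov ebx, VMX_GUEST_TR_ATTR
vmwrite ebx, eax

; eax = length of the instruction that caused the vmexit (the vmcall)
mov ebx, VMX_EXIT_INSTR_LEN
vmread eax, ebx

; ebx = guest rip at the time of the vmexit
mov ebx, VMX_GUEST_RIP
vmread ebx, ebx

; advance guest rip past the vmcall
add eax, ebx
mov ebx, VMX_GUEST_RIP
vmwrite ebx, eax

vmlaunch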

The code above does only the bare minimum (in reality, it would initialize the entire SMM_VMCS): it initializes the entry_controls and the guest TR attributes, updates the guest_rip and does a vmlaunch.  Where does this vmlaunch take the machine?  The answer is VMX_ROOT.  This is a special type of VMentry that takes the machine back to VMX_ROOT. Intel calls this a 'VMentry that returns from SMM'.  Remember that the machine performed an SMM_VMexit when VMCALL was executed in VMX_ROOT mode, so this VMLAUNCH in the MSEG code takes us back to VMX_ROOT.  At this point, we are in the executive monitor. This completes step 1B in the dual monitor flow.
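As a cross-check on the two values the sample MSEG code computes, here is a small Python sketch. It assumes the vm-entry control layout from the Intel manual, where bit 10 is 'entry to SMM' and bit 11 is 'deactivate dual-monitor treatment'. It also assumes a 32-bit guest_rip, since the sample uses 32-bit registers. The helper names are made up for this example.

```python
ENTRY_TO_SMM = 1 << 10             # assumed: vm-entry control bit 10
DEACTIVATE_DUAL_MONITOR = 1 << 11  # assumed: vm-entry control bit 11


def decode_entry_controls(value: int) -> dict:
    """Report the two SMM-related vm-entry controls in a control word."""
    return {
        "entry_to_smm": bool(value & ENTRY_TO_SMM),
        "deactivate_dual_monitor": bool(value & DEACTIVATE_DUAL_MONITOR),
    }


def advance_guest_rip(guest_rip: int, exit_instr_len: int) -> int:
    """Mirror the 'add eax, ebx' step: skip the instruction that caused the exit."""
    return (guest_rip + exit_instr_len) & 0xFFFFFFFF


if __name__ == "__main__":
    print(decode_entry_controls(0x11FF))
    # vmcall is a 3-byte instruction (0F 01 C1); 0x1000 is a placeholder address.
    print(hex(advance_guest_rip(0x1000, 3)))
```

With 0x11ff both bits are clear, which is consistent with the vmlaunch leaving SMM while keeping dual monitor treatment active.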