Monday, October 19, 2009
VMEXIT on INVLPG
(a) The virtual machine is configured to vmexit on INVLPG (bit 9 of the PROCESSOR_EXECUTION_CONTROLS is 1).
(b) The virtual-machine has GS BASE = 0xFFFF8000_00000000
(c) Virtual machine executes: invlpg [gs:0-1]
(d) Execution of invlpg causes vmexit.
(e) The linear-address operand of INVLPG is recorded in the exit qualification. A vmread of EXIT_QUALIFICATION returns:
=> FFFF7FFF_FFFFFFFF
Notice that the value recorded is a non-canonical address, i.e., address[63:48] is not a sign extension of address[47]. The GS base 0xFFFF8000_00000000 plus the displacement -1 wraps to 0xFFFF7FFF_FFFFFFFF. This is the only case I have encountered where a non-canonical address shows up in the exit qualification.
The only explanation I can come up with for this behavior is that INVLPG, unlike other instructions, does not fault in 64-bit mode with a non-canonical operand. According to the instruction spec, INVLPG behaves as a NOP in such cases.
When writing a vmexit handler for INVLPG, this case must be taken into consideration (i.e., a non-canonical address might show up in the exit-qualification field).
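A handler could test for this case before acting on the address. The C sketch below assumes 48-bit linear addresses (4-level paging), where bits 63:47 must be all 0s or all 1s. `is_canonical` and `invlpg_should_flush` are hypothetical helper names. In a real handler, the input would come from a vmread of EXIT_QUALIFICATION (encoding 0x6400).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit linear addresses: bits 63:47 must be all 0s or all 1s. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63:47 */
    return top == 0 || top == 0x1FFFF;
}

/* Hypothetical INVLPG-exit helper: returns true if the guest's INVLPG
 * should be emulated (flush the page), false if it should be treated
 * as a NOP, which is what INVLPG does natively for a non-canonical
 * operand in 64-bit mode. */
static bool invlpg_should_flush(uint64_t exit_qualification)
{
    return is_canonical(exit_qualification);
}
```

With the values from the experiment above, 0xFFFF7FFF_FFFFFFFF is rejected while the GS base itself, 0xFFFF8000_00000000, is canonical.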
Saturday, July 25, 2009
A full blown initialization of VMCS - Assembly code
Prior to looking at the assembly code, here is a step-by-step description of what is being done:
The reader must know that:
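A) This code runs only in ring 0.
B) Paging is already enabled in CR0 (bit 31).
C) The code is 32-bit protected-mode code: it uses 32-bit operands with vmwrite and leaves the 64-bit host/guest control bits clear.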
(1) First make sure that the processor supports VMX by executing CPUID (leaf 1, ecx[5]), then enable VMXE (bit 13) in CR4.
(2) Initialize the revision-id (msr 0x480, bits 30:0) in the vmxon region and in the guest-vmcs region.
(3) Execute VMXON with the pointer to the vmxon region. If the BIOS has not set bits 0 and 2 of FEATURE_CONTROL_MSR (msr 0x3a), VMXON will fail with a #GP.
(4) Execute VMCLEAR with the pointer to the guest-vmcs region.
(5) Execute VMPTRLD with the pointer to the guest-vmcs region.
(6) Now initialize the guest-vmcs:
(a) First initialize the vmx controls. These include the following controls:
1. PIN_BASED
2. PROC_BASED
3. ENTRY_CONTROLS
4. EXIT_CONTROLS
(b) Next initialize the host-state and guest-state.
(c) Now execute VMLAUNCH. If VMLAUNCH is successful, the processor starts executing guest code from the GUEST_CS:GUEST_RIP value specified in the VMCS.
Here comes the code:
////////////////////////////////////////////////////
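mov eax, cr4
bts eax, 13             ; CR4.VMXE = 1
mov cr4, eax
mov ecx, 0x480          ; IA32_VMX_BASIC
rdmsr                   ; eax = revision-id
mov edx, [vmxon_ptr]    ; physical address of the vmxon region (assumes identity-mapped memory)
mov [edx], eax          ; write revision-id into the vmxon region
mov edx, [guest_ptr]    ; physical address of the guest vmcs region
mov [edx], eax          ; write revision-id into the guest vmcs region
VMXON [vmxon_ptr]
jbe fail                ; CF=1 or ZF=1 means the instruction failed
vmclear [guest_ptr]
jbe fail
vmptrld [guest_ptr]
jbe fail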
call initialize_vmx_controls
call initialize_vmx_host_guest_state
call do_vmlaunch
;ideally a hypervisor would read the VMX-MSRS
; to determine what values to write.
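initialize_vmx_controls:
mov ebx, ENTRY_CONTROLS ;0x4012
mov eax, 0x11ff         ; reserved bits that must be 1; bit 9 clear (32-bit guest)
vmwrite ebx, eax
mov ebx, PIN_CONTROLS; 0x4000
mov eax, 0x1f           ; must-be-1 bits + external-interrupt (bit 0) and NMI (bit 3) exiting
vmwrite ebx, eax
mov ebx, PROC_CONTROLS ; 0x4002
mov eax, 0x0401E9F2     ; must-be-1 bits + HLT (bit 7) and RDPMC (bit 11) exiting
vmwrite ebx, eax
mov ebx, EXIT_CONTROLS ; 0x400C
mov eax, 0x36dff        ; reserved bits that must be 1; bit 9 clear (32-bit host)
vmwrite ebx, eax
ret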
initialize_vmx_host_guest_state:
mov eax, cr3
mov ebx, HOST_CR3 ;0x6C02
mov edx, GUEST_CR3 ;0x6802
VMWRITE EBX,EAX
mov eax, pdebase_guest
VMWRITE EDX,EAX
mov ebx, HOST_RSP ;0x6c14
mov eax, tos ;top-of-stack
vmwrite ebx, eax
mov ebx, HOST_CR0 ; 0x6C00
mov eax, cr0
vmwrite ebx, eax
mov ebx, GUEST_CR0 ;0x6800
vmwrite ebx, eax
mov ebx, HOST_CR4 ; 0x6C04
mov eax, cr4
vmwrite ebx, eax
mov ebx, GUEST_CR4; 0x6804
vmwrite ebx, eax
mov ebx, HOST_CS_SEL ; 0x0c02
mov eax, cs
vmwrite ebx, eax
mov ebx,HOST_DS_SEL ; 0x0c06
mov eax, ds
vmwrite ebx, eax
mov ebx, HOST_SS_SEL ; 0x00000c04
mov eax, 0x18
vmwrite ebx, eax
mov ebx, HOST_TR_SEL; 0x00000c0c
mov eax, 0x18
vmwrite ebx, eax
mov ebx, GUEST_TR_SEL ;0x0000080e
mov eax, 0x18
vmwrite ebx, eax
mov ebx, GUEST_TR_ATTR ;0x00004822
mov eax, 0x8b
vmwrite ebx, eax
mov ebx, GUEST_TR_LIMIT ;0x0000480e
mov eax, 0xff
vmwrite ebx, eax
mov ebx, GUEST_LDTR_ATTR ;0x00004820
mov eax, 0x00010000
vmwrite ebx, eax
mov ebx, GUEST_SS_ATTR ;0x00004818
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_DS_ATTR ;0x0000481a
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_ES_ATTR ;0x00004814
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_FS_ATTR ;0x0000481c
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_GS_ATTR ;0x0000481e
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_SS_LIMIT ;0x00004804
mov eax, 0xffffffff
vmwrite ebx, eax
mov ebx, GUEST_DS_LIMIT ;0x00004806
vmwrite ebx, eax
mov ebx, GUEST_ES_LIMIT ;0x00004800
vmwrite ebx, eax
mov ebx, GUEST_FS_LIMIT ;0x00004808
vmwrite ebx, eax
mov ebx, GUEST_GS_LIMIT ;0x0000480a
vmwrite ebx, eax
mov ebx, VMCS_LINK_PTR_FULL ;0x00002800
vmwrite ebx, eax            ; eax is still 0xffffffff: link pointer not in use
mov ebx, VMCS_LINK_PTR_HIGH ;0x00002801
vmwrite ebx, eax            ; full pointer = 0xFFFFFFFF_FFFFFFFF
mov ebx, GUEST_GDTR_BASE ;0x00006816
mov eax, gdt32t
vmwrite ebx, eax
mov ebx, HOST_GDTR_BASE ;0x00006c0c
vmwrite ebx, eax
mov ebx, GUEST_CS_LIMIT ;0x00004802
mov eax, 0xffffffff
vmwrite ebx, eax
mov ebx, GUEST_CS_ATTR ;0x00004816
mov eax, 0xc09b
vmwrite ebx, eax
mov ebx, GUEST_RSP ;0x0000681c
mov eax, tos
vmwrite ebx, eax
mov ebx, GUEST_IDTR_BASE ;0x00006818
mov eax, idt32t
vmwrite ebx, eax
mov ebx, HOST_IDTR_BASE ;0x00006c0e
vmwrite ebx, eax
mov ebx, GUEST_CS_SEL ;0x00000802
mov eax, guest_sel
vmwrite ebx, eax
mov ebx, GUEST_CS_BASE ;0x00006808
mov eax, guest_base
vmwrite ebx, eax
mov ebx, GUEST_RIP ;0x0000681e
mov eax, 0
vmwrite ebx, eax
mov ebx, HOST_RIP ;0x00006c16
mov eax, after_vmexit
vmwrite ebx, eax
mov ebx, GUEST_RFLAGS ;0x00006820
mov eax, 2              ; only bit 1 (reserved, always 1) is set
vmwrite ebx, eax
mov ebx, EXCEPTION_BITMAP ;0x4004
mov eax,0xdeadfeef      ; bit n = 1: guest exception vector n causes a vmexit
vmwrite ebx, eax
ret
do_vmlaunch:
VMLAUNCH
jbe fail                ; reached only if VMLAUNCH itself failed
after_vmexit:
;read EXIT_REASON and figure out what caused the vmexit.
///////////////////////////////////////////////////////////////
The HOST_RIP is where control is transferred after a vmexit. The hypervisor can determine the appropriate course of action by reading the vmexit fields from the vmcs.
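The first step after a vmexit is to read EXIT_REASON, and the C fragment below sketches how to interpret it. It splits the 32-bit EXIT_REASON field (vmcs encoding 0x4402) into its basic exit reason (bits 15:0) and the VM-entry-failure flag (bit 31). Only a few exit-reason numbers from Intel PRM Vol 3b are listed. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few basic exit reasons (Intel PRM Vol 3b, exit-reason appendix). */
enum basic_exit_reason {
    EXIT_REASON_EXCEPTION_NMI = 0,
    EXIT_REASON_EXTERNAL_INTR = 1,
    EXIT_REASON_TRIPLE_FAULT  = 2,
    EXIT_REASON_CPUID         = 10,
    EXIT_REASON_HLT           = 12,
    EXIT_REASON_INVLPG        = 14,
    EXIT_REASON_RDPMC         = 15,
    EXIT_REASON_VMCALL        = 18,
};

/* Bits 15:0 of EXIT_REASON hold the basic exit reason. */
static uint16_t basic_exit_reason(uint32_t exit_reason)
{
    return (uint16_t)(exit_reason & 0xFFFF);
}

/* Bit 31 of EXIT_REASON is set when the VM entry itself failed. */
static bool vm_entry_failed(uint32_t exit_reason)
{
    return (exit_reason >> 31) & 1;
}
```

A hypervisor would typically check `vm_entry_failed` first and then switch on `basic_exit_reason` to pick a handler.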
Wednesday, June 17, 2009
VMCS Guest State Area
Wednesday, May 6, 2009
VMX EXECUTION_CONTROLS
a) PIN_BASED execution controls
b) PROC_BASED execution controls
PIN_BASED Controls:
The vmcs encoding for this field is 0x4000. There are 2 bits in this 32-bit field that are interesting:
Bit 0 – External Interrupt Exiting
Bit 3 – NMI Exiting
If bit 0 is 1, an external interrupt that arrives while the vmx-guest is running causes a vmexit (exit reason: external interrupt).
Bit 3 controls the processor's response to an NMI while running as a vmx-guest: if bit 3 is 1 and an NMI is received in the guest, a vmexit occurs.
The other bits are reserved or not covered here. Their required settings (0 or 1) are obtained by reading msr 0x481. The low 32 bits (eax after rdmsr) give the allowed-0 settings, where a 1 means the bit must be 1. The high 32 bits (edx) give the allowed-1 settings, where a 0 means the bit must be 0. To initialize this field in the vmcs:
mov ecx, 0x481      ; IA32_VMX_PINBASED_CTLS
rdmsr               ; eax = allowed-0 settings (must-be-1 bits), edx = allowed-1 settings
bts eax, 0          ; external-interrupt exiting = 1
bts eax, 3          ; nmi exiting = 1
and eax, edx        ; clear any bit this processor does not allow to be 1
mov ebx, 0x4000     ; encoding for pin-based controls
vmwrite ebx, eax
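The same combination rule can be written as a small C helper. It may be easier to reuse for the other control fields, because msrs 0x482, 0x483 and 0x484 use the same layout. `adjust_controls` is a hypothetical name, and the MSR value in the usage note is made up for illustration.

```c
#include <stdint.h>

/* Combine desired control bits with a VMX capability MSR value.
 * Low 32 bits:  allowed-0 settings (a 1 means the control must be 1).
 * High 32 bits: allowed-1 settings (a 0 means the control must be 0). */
static uint32_t adjust_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);
    return (desired | must_be_one) & may_be_one;
}
```

Take a made-up MSR value of 0x0000003F_00000016: bits 1, 2 and 4 must be 1, and bits 0 to 5 may be 1. Requesting bits 0 and 3 then yields 0x1F, the pin-based value used in the full VMCS initialization example.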
PROC_BASED Controls:
The vmcs encoding for this field is 0x4002. It is a 32 bit field that determines the behavior of the processor when certain instructions are executed in the vmx-guest.
For example:
Bit 7 of this vector controls the processor behavior upon execution of the HLT instruction in the vmx-guest. If 1, execution of HLT causes a vmexit. If 0, the instruction executes normally without any vmexit.
Similarly, bit 9 controls the behavior of the processor on INVLPG, bit 19 controls the behavior on mov-to-cr8, bit 20 controls the behavior on mov-from-cr8, etc.
The bit positions are described below:
INTRWINDOW 2
TSCOFFSET 3
HLT 7
INVLPG 9
MWAIT 10
RDPMC 11
RDTSC 12
CR8LOAD 19
CR8STORE 20
TPRSHADOW 21
MOVDR 23
IOUNCOND 24
IOBITMAP 25
MSRBITMAP 28
MONITOR 29
PAUSE 30
MSR 0x482 indicates the allowed-0 and allowed-1 settings of these controls.
Note: Newer processors have additional bits defined for these controls. For more details see PRM Vol 3b.
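The table above can be turned into named constants. This C sketch rebuilds the PROC_CONTROLS value 0x0401E9F2 used in the full VMCS initialization (HLT and RDPMC exiting). The must-be-1 mask 0x0401E172 is an assumption based on the processors described here; a hypervisor should take it from msr 0x482 instead. All names are hypothetical.

```c
#include <stdint.h>

/* Bit positions from the table above (processor-based controls). */
enum proc_based_bit {
    PROC_INTRWINDOW = 2,  PROC_TSCOFFSET = 3,  PROC_HLT      = 7,
    PROC_INVLPG     = 9,  PROC_MWAIT     = 10, PROC_RDPMC    = 11,
    PROC_RDTSC      = 12, PROC_CR8LOAD   = 19, PROC_CR8STORE = 20,
    PROC_TPRSHADOW  = 21, PROC_MOVDR     = 23, PROC_IOUNCOND = 24,
    PROC_IOBITMAP   = 25, PROC_MSRBITMAP = 28, PROC_MONITOR  = 29,
    PROC_PAUSE      = 30,
};

#define BIT(n) (1u << (n))

/* Exit on HLT and RDPMC. The must-be-1 mask is an assumed constant;
 * read it from msr 0x482 instead of hard-coding it. */
static uint32_t proc_controls_hlt_rdpmc(void)
{
    const uint32_t must_be_one = 0x0401E172u;
    return must_be_one | BIT(PROC_HLT) | BIT(PROC_RDPMC);
}
```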
Monday, May 4, 2009
VMX Exit Control fields
This field is used by the processor during a vmexit. It is a 32-bit field (just like the entry controls), and the 2 bits of interest here are:
Bit 9 – Host address space size – This value is loaded into EFER.LME, EFER.LMA and CS.L on a vmexit, i.e., it selects whether the host runs in 64-bit mode after the vmexit.
Bit 15 – Acknowledge Interrupt on Exit – If there is a vmexit due to an external interrupt, this bit determines whether the processor acknowledges the interrupt. If it does, the interrupt vector is recorded in the vmcs.
All other bits are either reserved or not covered here. They must be 0 or 1 as determined by the EXIT_CTLS_MSR (msr 0x483). Newer processors define additional exit controls; see PRM Vol 3b.
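A 64-bit host that wants interrupts acknowledged on exit sets both bits on top of the required bits. This C sketch assumes the required-bit value 0x36DFF, which is the EXIT_CONTROLS value from the full VMCS initialization (bit 9 clear, so a 32-bit host). The macro and helper names are hypothetical, and the required bits should really come from msr 0x483.

```c
#include <stdint.h>

#define EXIT_HOST_ADDR_SPACE_SIZE (1u << 9)   /* host is 64-bit after vmexit */
#define EXIT_ACK_INTR_ON_EXIT     (1u << 15)  /* acknowledge interrupt, record vector */

/* Hypothetical helper: add the two bits described above to the
 * required settings (assumed, not read from msr 0x483 here). */
static uint32_t exit_controls_64bit_host(uint32_t required_bits)
{
    return required_bits | EXIT_HOST_ADDR_SPACE_SIZE | EXIT_ACK_INTR_ON_EXIT;
}
```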
EXIT_CONTROL FOR MSR:
This works exactly like ENTRY_CONTROL FOR MSR; only the vmcs encodings differ. They are tabulated below:
EXIT_MSR_STORE_ADDR EQU 0x2006
EXIT_MSR_STORE_COUNT EQU 0x400E
The guest MSRs are saved in the MSR-store area during a vmexit. If the VM-entry MSR-load address (0x200A) points to the same area, these MSRs are loaded back from it on the next VM entry.
EXIT_MSR_LOAD_ADDR EQU 0x2008
EXIT_MSR_LOAD_COUNT EQU 0x4010
The Host MSRs are loaded from the physical address specified in EXIT_MSR_LOAD_ADDR.
The format of the msr-load/msr-store areas is identical to that of the msr-load area used for vmentry.
VMX Entry Control fields
Control fields are of 3 types:
a) Entry Control fields
b) Exit Control fields
c) Execution Control fields.
Entry Control fields:
Used during VMEntry (Vmentry is the process by which CPU transitions from HOST state to the Guest state).
VMENTRY_CONTROLS:
This is a 32-bit field that sets up some critical information used by the processor during vmentry. Most of the bits in this 32-bit field are reserved.
Among the bits that are defined, the following 3 are interesting:
bit 9 - Guest is in long mode
bit 10 - Guest is in SMM
bit 11 - Deactivate Dual monitor treatment
For normal vmentries, bit 10 and bit 11 are always 0. Bit 9 can be 0 or 1 depending on whether the guest is in long-mode or protected mode.
Note:
(A) If a guest will be in compatibility mode, bit 9 must be set to 1. When the processor loads state during VM entry, if the GUEST_CS.L bit is 0 and bit 9 of the entry controls is 1, the guest will be in compatibility mode after vmentry.
(B) During VM entry, the value of bit 9 is loaded into EFER.LMA. Since the guest CR0.PG is 1 (it is a fixed bit in VMX operation), the value is also loaded into EFER.LME.
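Note (A) amounts to a small truth table. This C sketch assumes the guest CS.L bit is bit 13 of the GUEST_CS_ATTR access-rights field (0x4816). The enum and function names are hypothetical.

```c
#include <stdbool.h>

enum guest_mode { GUEST_LEGACY, GUEST_COMPAT, GUEST_64BIT };

/* Mode after vmentry, from entry-control bit 9 ("IA-32e mode guest")
 * and the L bit (bit 13) of the guest CS access rights. */
static enum guest_mode guest_mode_after_entry(unsigned entry_controls,
                                              unsigned guest_cs_attr)
{
    bool ia32e = (entry_controls >> 9) & 1;
    bool cs_l  = (guest_cs_attr >> 13) & 1;
    if (!ia32e)
        return GUEST_LEGACY;           /* not in long mode (e.g. protected mode) */
    return cs_l ? GUEST_64BIT : GUEST_COMPAT;
}
```

For instance, the 32-bit code-segment attributes 0xc09b used in the full VMCS example have L=0. With bit 9 set, they would put the guest in compatibility mode.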
Sample code to set up entry controls:
To set up this field, software should consult msr 0x484 and extract the allowed-0 and allowed-1 settings of this field.
mov ecx, 0x484      ; IA32_VMX_ENTRY_CTLS
rdmsr               ; eax = allowed-0 settings (must-be-1 bits), edx = allowed-1 settings
btr eax, 10         ; clear the SMM bit
btr eax, 11         ; clear the deactivate dual monitor bit
; bts eax, 9        ; set bit 9 here for a long-mode guest
and eax, edx        ; clear any bit this processor does not allow to be 1
mov rbx, 0x4012     ; encoding for entry controls
vmwrite rbx, rax
VMENTRY_CONTROL_MSR:
This field is used when msrs are to be loaded as part of vmentry. This is sometimes required for the hypervisor to present the guest with an msr value different from the host value.
Sample code:
%define MSR_LOAD_ADDR  0x200a
%define MSR_LOAD_COUNT 0x4014
mov rax,
mov rbx, MSR_LOAD_ADDR
vmwrite rbx, rax
mov rax, 1
mov rbx, MSR_LOAD_COUNT
vmwrite rbx, rax
my_msr_address:
dd
dd 0              ; reserved, must be 0
dd msr_data_lo    ; msr data, bits 31:0
dd msr_data_hi    ; msr data, bits 63:32
Note:
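my_msr_address is the physical address of the msr-load area in memory.
Each 16-byte entry holds three parts, in this order:
1. the MSR index (bits 31:0)
2. a reserved doubleword that must be 0 (bits 63:32)
3. the 64-bit MSR data (bits 127:64)
The layout of my_msr_address must match this, and my_msr_address must be 16-byte aligned.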
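The 16-byte entry layout can be expressed as a C struct with compile-time size checks. The struct, array and helper names are hypothetical. Converting the array's address to the physical address written to MSR_LOAD_ADDR is left to the hypervisor.

```c
#include <stddef.h>
#include <stdint.h>

/* One 16-byte entry of an msr-load or msr-store area. */
struct vmx_msr_entry {
    uint32_t index;     /* MSR number, as used by rdmsr/wrmsr */
    uint32_t reserved;  /* must be 0 */
    uint64_t data;      /* value loaded into (or stored from) the MSR */
};

_Static_assert(sizeof(struct vmx_msr_entry) == 16, "entry must be 16 bytes");
_Static_assert(offsetof(struct vmx_msr_entry, data) == 8, "data starts at byte 8");

/* Hypothetical one-entry load area, 16-byte aligned as required. */
static _Alignas(16) struct vmx_msr_entry msr_load_area[1];

static void set_msr_entry(struct vmx_msr_entry *e, uint32_t msr, uint64_t value)
{
    e->index = msr;
    e->reserved = 0;
    e->data = value;
}
```

With one entry filled in, MSR_LOAD_COUNT would be written as 1, matching the sample code above.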
VMENTRY_CONTROL_EVENT_INJECTION:
This field is used when delivering an event/exception to the guest during vmentry. For example, if the hypervisor wants control to be transferred to the guest's #GP handler, it would do the following:
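mov rax, 0x4016     ; VM-entry interruption-information encoding
mov rbx, 0x80000B0D ; bit 31 = valid, bit 11 = deliver error code, bits 10:8 = 3 (HW exception), bits 7:0 = 0x0d (vector 13)
vmwrite rax, rbx
mov rax, 0x4018     ; VM-entry exception error code encoding
xor rbx, rbx        ; error code delivered with the #GP (0 here)
vmwrite rax, rbx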
Vol 3b has more details on this vmcs field. The hypervisor might use this technique to handle a vmexit from the guest due to an exception.
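The interruption-information value can be built from its parts rather than written as a magic number. This C sketch covers four parts: the vector (bits 7:0), the event type (bits 10:8), deliver-error-code (bit 11) and valid (bit 31). The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event types for bits 10:8 of the VM-entry interruption-information field. */
enum intr_type {
    INTR_TYPE_EXT_INTR  = 0,
    INTR_TYPE_NMI       = 2,
    INTR_TYPE_HW_EXCEPT = 3,
    INTR_TYPE_SW_INTR   = 4,
};

static uint32_t make_entry_intr_info(uint8_t vector, enum intr_type type,
                                     bool deliver_error_code)
{
    return (1u << 31)                              /* valid */
         | ((uint32_t)deliver_error_code << 11)    /* push an error code */
         | ((uint32_t)type << 8)                   /* event type */
         | vector;                                 /* vector number */
}
```

`make_entry_intr_info(13, INTR_TYPE_HW_EXCEPT, true)` reproduces the 0x80000B0D value from the #GP example above.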
Initializing the VMCS
(a) Host Area
(b) Guest Area
(c) VMX Control fields
(d) VMX Exit Information fields
Each VMCS field is identified by an encoding which is used by the processor to write into the appropriate place in the vmcs.
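The encodings are not arbitrary: each one packs the field's width, type and index. The C sketch below decodes an encoding according to the layout in Intel PRM Vol 3b (VMCS field encoding appendix). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Parts of a VMCS field encoding. */
struct vmcs_encoding {
    unsigned access_high; /* bit 0: 1 = high 32 bits of a 64-bit field */
    unsigned index;       /* bits 9:1 */
    unsigned type;        /* bits 11:10: 0 control, 1 exit info, 2 guest state, 3 host state */
    unsigned width;       /* bits 14:13: 0 16-bit, 1 64-bit, 2 32-bit, 3 natural width */
};

static struct vmcs_encoding decode_vmcs_encoding(uint32_t enc)
{
    struct vmcs_encoding d;
    d.access_high = enc & 1;
    d.index       = (enc >> 1) & 0x1FF;
    d.type        = (enc >> 10) & 0x3;
    d.width       = (enc >> 13) & 0x3;
    return d;
}
```

For example, 0x0C0C (Host TR selector) decodes to a 16-bit host-state field with index 6. 0x2801 (VMCS link pointer, high) has the high-access bit set.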
Host Area:
Host selector fields:
------------------------
Host ES selector 0xC00
Host CS selector 0xC02
Host SS selector 0xC04
Host DS selector 0xC06
Host FS selector 0xC08
Host GS selector 0xC0A
Host TR selector 0xC0C
As an example, say the hypervisor wants to initialize the Task register selector with a value of 0x18:
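mov rax, 0x0C0C   ; encoding for Host TR selector
mov rbx, 0x18     ; value to write
vmwrite rax, rbx  ; first operand = encoding, second operand = value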
To read a value from the vmcs, vmread is used:
mov rax, 0x0C0C
vmread rcx, rax ; Read from Host TR selector
Other Host state fields:
Host CR0 0x6C00
Host CR3 0x6C02
Host CR4 0x6C04
Host FS base 0x6C06
Host GS base 0x6C08
Host TR base 0x6C0A
Host GDTR base 0x6C0C
Host IDTR base 0x6C0E
Host IA32_SYSENTER_ESP 0x6C10
Host IA32_SYSENTER_EIP 0x6C12
Host RSP 0x6C14
Host RIP 0x6C16
As an example to write to host_cr0 in the vmcs, the following code snippet may be used:
mov rbx, cr0
mov rax, 0x6c00 ; encoding for host CR0
vmwrite rax,rbx
Similarly, the other host state fields are to be initialized. For a complete list of the vmcs fields see Intel PRM Vol 3b.
Guest Area
The technique to initialize the guest-state area is the same as for the host-state area. Hypervisors use the vmwrite instruction with guest-state encodings as the field operand. Here are a few examples:
Guest CR0 0x6800
Guest CR3 0x6802
Guest CR4 0x6804
Guest ES base 0x6806
Guest CS base 0x6808
Follow the same approach as before to write to these vmcs fields. For example, to write a guest CR4 value that has PAE=1, PGE=1, OSFXSR=1 and OSXMMEXCPT=1, do the following:
mov rbx, 0x6A0 ; required value in cr4
mov rax, 0x6804 ; GUEST_CR4 encoding
vmwrite rax, rbx
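A similar approach is adopted for initializing other GUEST_STATE fields. Note that the guest CR4 value must also satisfy the fixed bits reported by IA32_VMX_CR4_FIXED0/FIXED1 (msrs 0x488/0x489). CR4.VMXE (bit 13) is fixed to 1 in VMX operation, so a VM entry with 0x6A0 in GUEST_CR4 would fail; 0x26A0 would be used instead. The CR4 read shadow can hide VMXE from the guest.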
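The GUEST_CR4 value can be composed from named bits, which makes it easier to audit. In this C sketch the macro and constant names are hypothetical. CR4_VMXE is included to show the value with the fixed VMXE bit added.

```c
#include <stdint.h>

#define CR4_PAE        (1u << 5)
#define CR4_PGE        (1u << 7)
#define CR4_OSFXSR     (1u << 9)
#define CR4_OSXMMEXCPT (1u << 10)
#define CR4_VMXE       (1u << 13)

/* The value used in the GUEST_CR4 example above. */
static const uint32_t guest_cr4_example =
    CR4_PAE | CR4_PGE | CR4_OSFXSR | CR4_OSXMMEXCPT;

/* The same value with the VMX fixed-1 bit set. */
static const uint32_t guest_cr4_vmx =
    CR4_PAE | CR4_PGE | CR4_OSFXSR | CR4_OSXMMEXCPT | CR4_VMXE;
```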
Thursday, April 23, 2009
VMPTRLD - Load VMCS pointer
vmptrld [vmcs_ptr]
vmcs_ptr dq vmcs_region
vmcs_region:
rev_id dd 0
As with vmxon, the revision id of the vmcs_region should be updated with the revision-id supported by the processor (contained in msr 0x480) prior to executing vmptrld. As with vmxon, the vmcs_region must be located on a 4K boundary.
The only other thing worth mentioning is that if you pass vmxon_ptr as the operand to vmptrld, vmptrld will fail. That is, a code sequence like the one shown below is guaranteed to fail at vmptrld:
vmxon [vmxon_ptr]
jbe vmxon_failed
vmptrld [vmxon_ptr]
jbe vmptrld_failed
When the processor executes vmptrld, it detects that vmptrld's pointer refers to the same region as the vmxon pointer. vmptrld then fails (VMfailValid, with a "VMPTRLD with VMXON pointer" VM-instruction error).
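The two vmptrld requirements described above (4K alignment, and not reusing the vmxon region) can be checked in software before executing the instruction. This C sketch is a hypothetical pre-flight check with a hypothetical name. It does not model the processor's other checks, such as the physical-address width or the revision-id.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check on the physical addresses of the vmcs and vmxon regions. */
static bool vmptrld_operand_ok(uint64_t vmcs_pa, uint64_t vmxon_pa)
{
    if (vmcs_pa & 0xFFF)
        return false;        /* not on a 4K boundary */
    if (vmcs_pa == vmxon_pa)
        return false;        /* same region as vmxon: vmptrld fails */
    return true;
}
```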
It is also good practice to execute vmclear on a new vmcs region before loading it with vmptrld. vmclear initializes the region and sets its launch state to "clear", which VMLAUNCH requires. So the hypervisor may want to do this:
vmclear [vmcs_ptr]
jbe vmclear_failed
vmptrld [vmcs_ptr]
jbe vmptrld_failed
At this point we have executed vmxon, entered VMX_ROOT mode, initialized the virtual-machine vmcs with vmclear, and loaded the virtual-machine vmcs pointer into the processor by executing vmptrld. The next step is to initialize the vmcs with the virtual machine's data (the virtual machine is henceforth referred to as the guest) and then launch the guest.
Wednesday, April 22, 2009
More on VMXON
The code may look like this:
vmxon [vmxon_ptr]
vmxon_ptr dq vmxon_region_begin
vmxon_region_begin: vmxon_rev_id dd 0
Key things to note in the above snippet:
a. The operand to vmxon is vmxon_ptr, which holds the physical address (not the virtual address) of the vmxon_region.
b. The vmxon_region begins with a 4-byte field called 'rev_id'. The hypervisor is expected to write the revision-id reported by the processor into this field.
How does the hypervisor determine the revision-id?
On Intel processors, the revision-id is contained in VMX_BASIC_MSR (0x480). Bits 30:0 of this MSR contain the revision-id of the processor (bit 31 always reads 0).
So, for the example above, the hypervisor may want to do:
///////////////////////////////////////////////////
xor eax, eax
xor edx, edx
mov ecx, 0x480
rdmsr ; after rdmsr eax has the revision id
mov dword [vmxon_rev_id], eax ; write the rev-id into vmxon region
vmxon [vmxon_ptr]
////////////////////////////////////////////////
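Besides the revision-id, IA32_VMX_BASIC also reports how large the vmxon and vmcs regions must be (bits 44:32, in bytes). This C sketch extracts both fields from a 64-bit msr value. Masking off bit 31 is harmless because that bit always reads 0. The function names and the sample value in the usage note are hypothetical.

```c
#include <stdint.h>

/* Revision-id: bits 30:0 of IA32_VMX_BASIC (bit 31 always reads 0). */
static uint32_t vmx_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0x7FFFFFFF);
}

/* Size of the vmxon/vmcs region in bytes: bits 44:32 of IA32_VMX_BASIC. */
static uint32_t vmx_region_size(uint64_t vmx_basic)
{
    return (uint32_t)((vmx_basic >> 32) & 0x1FFF);
}
```

With a made-up value of 0x00DA0400_00000004, the revision-id is 4 and the region size is 1024 bytes.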
Note: Intel PRM also specifies that the vmxon_region must be aligned on a 4K boundary. If it is not 4K aligned, VMXON is guaranteed to fail.
It is worth repeating that the operand to vmxon is a pointer to the vmxon_region, which is identified by its physical address. Hence the vmxon region should reside in memory that is never paged out.
Successful completion of vmxon will cause the processor to enter the VMX_ROOT operation.
Hypervisors must also check whether vmxon succeeded by examining eflags.ZF and eflags.CF. If eflags.ZF=0 and eflags.CF=0, vmxon was successful.
Continuing our previous code:
///////////////////////////////////////////////////
xor eax, eax
xor edx, edx
mov ecx, 0x480
rdmsr
mov dword [vmxon_rev_id], eax
vmxon [vmxon_ptr]
jbe vmxon_failed
vmxon_pass:
; if we get here, eflags.ZF=0 and eflags.CF=0, so vmxon was successful.
vmxon_failed:
; either eflags.ZF=1 or eflags.CF=1
; handle the failure here
//////////////////////////////////////////////////
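The ZF/CF convention is shared by all VMX instructions (vmxon, vmclear, vmptrld, vmread, vmwrite, vmlaunch/vmresume). This C sketch classifies a saved RFLAGS value; the enum and function names are hypothetical. When ZF=1 (VMfailValid), the error number can be read from the VM-instruction error field (encoding 0x4400). When CF=1 (VMfailInvalid), there is no current vmcs to hold an error number.

```c
#include <stdint.h>

enum vmx_status { VM_SUCCEED, VM_FAIL_INVALID, VM_FAIL_VALID };

#define RFLAGS_CF (1u << 0)
#define RFLAGS_ZF (1u << 6)

/* Interpret RFLAGS captured right after a VMX instruction. */
static enum vmx_status vmx_status_from_rflags(uint64_t rflags)
{
    if (rflags & RFLAGS_CF)
        return VM_FAIL_INVALID;  /* no current vmcs to record an error */
    if (rflags & RFLAGS_ZF)
        return VM_FAIL_VALID;    /* error number in VM-instruction error field */
    return VM_SUCCEED;
}
```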
First look at VMXON
1. Hypervisor must turn on the CR4.VMXE bit. The VMX enable (VMXE) bit is bit 13 of CR4. A typical code sequence would be:
mov eax, cr4
or eax, 0x2000
mov cr4, eax
Executing VMXON without CR4.VMXE=1 will cause the processor to generate a #UD(undefined opcode) exception.
2. The hypervisor must set the fixed bits of CR0. CR0.NE, PG and PE are fixed bits in VMX operation (as reported by msrs 0x486/0x487). They must stay 1 as long as the processor is in VMX_ROOT operation.
Any attempt to clear the fixed bits of CR0 after executing vmxon will cause the processor to generate a #GP exception.
3. A20M: A20M# must be off prior to the execution of vmxon (violating this will result in a #GP).
4. The hypervisor must ensure that prior to execution of vmxon, the processor is not in V86 mode (eflags.vm must be 0) or in compatibility mode (efer.lma && !cs.l must be false).
The above 4 conditions must be satisfied for vmxon to execute without faulting. For vmxon to actually succeed, a few other things must also be done (e.g., providing a properly initialized, 4K-aligned vmxon region).
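The CR0 requirement in condition 2 (and the matching check on CR4) follows a fixed-bit rule. Bits set in the FIXED0 msr must be 1, and bits clear in the FIXED1 msr must be 0. This C sketch applies that rule to a value; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0: fixed0/fixed1 from msrs 0x486/0x487; CR4: from msrs 0x488/0x489. */
static bool cr_value_allowed(uint64_t value, uint64_t fixed0, uint64_t fixed1)
{
    return (value & fixed0) == fixed0   /* every fixed-1 bit is set */
        && (value & ~fixed1) == 0;      /* no fixed-0 bit is set */
}
```

Take made-up values of FIXED0 = 0x80000021 (PE, NE, PG) and FIXED1 = 0xFFFFFFFF. A CR0 of 0x80000031 is then allowed, while 0x80000011 (NE clear) is not.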
Note that:
Assertions of INIT# are not recognized by the CPU after the execution of vmxon (INIT is blocked in VMX root operation). I think INIT# just stays pending until it gets unblocked.
VMX instructions in x86
To enable virtual-machine architecture, Intel provides new instructions as part of its Virtual Machine Extensions (abbreviated VMX) instruction set. This instruction set is different from the one AMD provides for SVM.
Here is a quick look at the instructions:
(a) VMXON - enter vmx operation
(b) VMXOFF - leave vmx operation
(c) VMREAD - read from the vmcs (vmcs will be discussed later)
(d) VMWRITE - write to the vmcs
(e) VMPTRLD - load vmcs pointer
(f) VMPTRST - store vmcs pointer
(g) VMLAUNCH/VMRESUME - launch or resume virtual machine
(h) VMCALL - call to the hypervisor
(i) VMCLEAR - clear a vmcs (initialize it and set its launch state to clear)
Processor/Firmware settings for VMX:
1. To make sure your processor supports VMX, execute CPUID with eax=1 (leaf 1) and check for bit 5 of ecx. If the bit is set the CPU supports VMX else it is not supported.
2. In addition, the BIOS must enable VMX by writing to FEATURE_CONTROL_MSR (address 0x3a). If the msr value is initialized to 0x5 (bit0=1 and bit2=1), VMX is enabled.
Bit 0 of the msr is the lock bit. Once it is set, the msr is locked: any wrmsr to it causes a #GP exception until a power-up reset. Bit 2 is the VMXON_ENABLE bit (it enables VMXON outside SMX operation). Executing VMXON with bit 2 clear, or with the lock bit clear, causes the processor to generate a #GP exception.
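Both checks can be combined in one predicate. This C sketch takes two inputs: the CPUID leaf 1 ECX value and the value of msr 0x3a. Reading those values requires CPUID and a ring-0 rdmsr, which are outside the sketch. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_VMX           (1u << 5)
#define FEATURE_CONTROL_LOCK      (1u << 0)
#define FEATURE_CONTROL_VMXON_OUT (1u << 2)  /* VMXON outside SMX */

/* True when both the CPU (CPUID) and the BIOS (msr 0x3a) permit VMXON. */
static bool vmx_usable(uint32_t cpuid1_ecx, uint64_t feature_control)
{
    if (!(cpuid1_ecx & CPUID_1_ECX_VMX))
        return false;
    return (feature_control & FEATURE_CONTROL_LOCK) &&
           (feature_control & FEATURE_CONTROL_VMXON_OUT);
}
```

A value of 0x5 in msr 0x3a passes. A value of 0x4 (unlocked) or 0x1 (locked without VMXON enabled) would make VMXON raise #GP.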