Monday, October 19, 2009

VMEXIT on INVLPG

A boundary case observed on Intel Merom:

(a) The virtual-machine is configured to vmexit on INVLPG (bit 9 of the PROCESSOR_EXECUTION_CONTROLS is 1).

(b) The virtual-machine has GS BASE = 0xFFFF8000_00000000

(c) Virtual machine executes: invlpg [gs:0-1]

(d) Execution of invlpg causes vmexit.

(e) The linear address of the invlpg operand is recorded in the exit-qualification. Upon a vmread of EXIT_QUALIFICATION the value obtained is:
=> FFFF7FFF_FFFFFFFF


Notice that the value recorded is a non-canonical address, i.e. bits 63:48 are not all equal to bit 47. This is the only case I have encountered where a non-canonical address shows up in the exit-qualification.

The only explanation I can come up with for this behavior is that INVLPG, unlike other instructions, does not fault in 64-bit mode on a non-canonical operand. According to the instruction spec, INVLPG behaves as a NOP in such cases.

When a vmexit handler for INVLPG is written, this case must be taken into consideration (i.e. a non-canonical address might show up in the exit-qualification field).
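
A handler can guard against this case with a canonical-address test before it acts on the exit-qualification. The sketch below is in C rather than assembly so that it can be tried outside a hypervisor. It assumes 48-bit linear addresses, and the helper names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bits 63:47 are all equal, i.e. the address is canonical.
 * Assumes 48-bit linear addressing. */
static inline bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;            /* bits 63:47, 17 bits wide */
    return upper == 0 || upper == 0x1FFFF;
}

/* Hypothetical handler helper: a non-canonical operand means the guest's
 * INVLPG was effectively a NOP, so there is nothing to invalidate. */
static inline bool invlpg_exit_needs_work(uint64_t exit_qualification)
{
    return is_canonical48(exit_qualification);
}
```

For the value observed above, is_canonical48(0xFFFF7FFFFFFFFFFF) is false, so the handler can simply skip the instruction and resume the guest.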

Saturday, July 25, 2009

A full blown initialization of VMCS - Assembly code

The code below will outline the general steps prior to executing a VMLAUNCH or VMRESUME.
Prior to looking at the assembly code, here is a step-by-step description of what is being done:

The reader must know that:
(A) this code will run only in ring0.
(B) paging is already enabled in CR0 (bit 31).

(1) First enable VMXE (bit 13) in CR4. Make sure that the processor supports VMX by executing CPUID (leaf 1, ecx[5]).

(2) Initialize the revision-id (msr 0x480, bits 31:0) in the vmxon region and in the guest-vmcs region.

(3) Execute VMXON with the pointer to vmxon region. In some cases, if BIOS has not enabled bits 0, 2 of FEATURE_CONTROL_MSR (msr 0x3a) this will fail.

(4) Execute VMCLEAR with the pointer to the guest-vmcs region.

(5) Execute VMPTRLD with the pointer to the guest-vmcs region.

(6) Now initialize the guest-vmcs:
(a) First initialize the vmx controls. These include the following controls:
1. PIN_BASED
2. PROC_BASED
3. ENTRY_CONTROLS
4. EXIT_CONTROLS

(b) Next initialize the host-state and guest-state.

(c) Now do vmlaunch. If VMLAUNCH is successful, then the processor will start executing code
from the GUEST_CS:GUEST_RIP value specified in the VMCS.


Here comes the code:
////////////////////////////////////////////////////
mov eax, cr4
bts eax, 13
mov cr4, eax

mov ecx, 0x480
rdmsr
mov edx, [vmxon-ptr]
mov [edx], eax
mov edx, [guest-ptr]
mov [edx], eax

VMXON [vmxon-ptr]
jbe fail

vmclear [guest-ptr]
jbe fail

vmptrld [guest-ptr]
jbe fail


call initialize_vmx_controls
call initialize_vmx_host_guest_state
call do_vmlaunch

;ideally a hypervisor would read the VMX-MSRS
; to determine what values to write.
initialize_vmx_controls:
mov ebx, ENTRY_CONTROLS ;0x4012
mov eax, 0x11ff
vmwrite ebx, eax
mov ebx, PIN_CONTROLS; 0x4000
mov eax, 0x1f
vmwrite ebx, eax
mov ebx, PROC_CONTROLS ; 0x4002
mov eax, 0x0401E9F2
vmwrite ebx, eax
mov ebx, EXIT_CONTROLS ; 0x400C
mov eax, 0x36dff
vmwrite ebx, eax
ret


initialize_vmx_host_guest_state:
mov eax, cr3
mov ebx, HOST_CR3 ;0x6C02
mov edx, GUEST_CR3 ;0x6802
VMWRITE EBX,EAX
mov eax, pdebase_guest
VMWRITE EDX,EAX

mov ebx, HOST_RSP ;0x6c14
mov eax, tos ;top-of-stack
vmwrite ebx, eax

mov ebx, HOST_CR0 ; 0x6C00
mov eax, cr0
vmwrite ebx, eax
mov ebx, GUEST_CR0 ;0x6800
vmwrite ebx, eax
mov ebx, HOST_CR4 ; 0x6C04
mov eax, cr4
vmwrite ebx, eax
mov ebx, GUEST_CR4; 0x6804
vmwrite ebx, eax
mov ebx, HOST_CS_SEL ; 0x0c02
mov eax, cs
vmwrite ebx, eax
mov ebx,HOST_DS_SEL ; 0x0c06
mov eax, ds
vmwrite ebx, eax
mov ebx, HOST_SS_SEL ; 0x00000c04
mov eax, 0x18
vmwrite ebx, eax
mov ebx, HOST_TR_SEL; 0x00000c0c
mov eax, 0x18
vmwrite ebx, eax
mov ebx, GUEST_TR_SEL ;0x0000080e
mov eax, 0x18
vmwrite ebx, eax
mov ebx, GUEST_TR_ATTR ;0x00004822
mov eax, 0x8b
vmwrite ebx, eax
mov ebx, GUEST_TR_LIMIT ;0x0000480e
mov eax, 0xff
vmwrite ebx, eax
mov ebx, GUEST_LDTR_ATTR ;0x00004820
mov eax, 0x00010000
vmwrite ebx, eax
mov ebx, GUEST_SS_ATTR ;0x00004818
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_DS_ATTR ;0x0000481a
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_ES_ATTR ;0x00004814
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_FS_ATTR ;0x0000481c
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_GS_ATTR ;0x0000481e
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_SS_LIMIT ;0x00004804
mov eax, 0xffffffff
vmwrite ebx, eax
mov ebx, GUEST_DS_LIMIT ;0x00004806
vmwrite ebx, eax
mov ebx, GUEST_ES_LIMIT ;0x00004800
vmwrite ebx, eax
mov ebx, GUEST_FS_LIMIT ;0x00004808
vmwrite ebx, eax
mov ebx, GUEST_GS_LIMIT ;0x0000480a
vmwrite ebx, eax
mov ebx, LINK_PTR_FULL ;0x00002800
vmwrite ebx, eax
mov ebx, VMS_LINK_PTR_HIGH ;0x00002801
vmwrite ebx, eax
mov ebx, GUEST_GDTR_BASE ;0x00006816
mov eax, gdt32t
vmwrite ebx, eax
mov ebx, HOST_GDTR_BASE ;0x00006c0c
vmwrite ebx, eax
mov ebx, GUEST_CS_LIMIT ;0x00004802
mov eax, 0xffffffff
vmwrite ebx, eax
mov ebx, GUEST_CS_ATTR ;0x00004816
mov eax, 0xc09b
vmwrite ebx, eax
mov ebx, GUEST_RSP ;0x0000681c
mov eax, tos
vmwrite ebx, eax
mov ebx, GUEST_IDTR_BASE ;0x00006818
mov eax, idt32t
vmwrite ebx, eax
mov ebx, HOST_IDTR_BASE ;0x00006c0e
vmwrite ebx, eax
mov ebx, GUEST_CS_SEL ;0x00000802
mov eax, guest_sel
vmwrite ebx, eax
mov ebx, GUEST_CS_BASE ;0x00006808
mov eax, guest_base
vmwrite ebx, eax
mov ebx, GUEST_RIP ;0x0000681e
mov eax, 0
vmwrite ebx, eax
mov ebx, HOST_RIP ;0x00006c16
mov eax, after_vmexit
vmwrite ebx, eax
mov ebx, GUEST_RFLAGS ;0x00006820
mov eax, 2
vmwrite ebx, eax
mov ebx, EXCEPTION_BITMAP ;0x4004
mov eax,0xdeadfeef
vmwrite ebx, eax
ret

do_vmlaunch:
VMLAUNCH

after_vmexit:
;read EXIT_REASON and figure out what caused the vmexit.

///////////////////////////////////////////////////////////////

The HOST_RIP is where control is transferred after a vmexit. The hypervisor can determine the appropriate course of action by reading the vmexit fields from the vmcs.
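
As a rough illustration of that first step, the C sketch below splits a value read from the EXIT_REASON field (vmcs encoding 0x4402) into its basic exit reason (bits 15:0) and the VM-entry failure flag (bit 31). Only a handful of reason codes are listed, and the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* A few basic exit reasons (not a complete list). */
enum basic_exit_reason {
    EXIT_EXTERNAL_INTERRUPT = 1,
    EXIT_HLT                = 12,
    EXIT_INVLPG             = 14,
    EXIT_VMCALL             = 18,
};

/* Bits 15:0 of the EXIT_REASON field hold the basic exit reason. */
static inline uint16_t basic_exit_reason(uint32_t exit_reason)
{
    return (uint16_t)(exit_reason & 0xFFFF);
}

/* Bit 31 is set when the exit reports a failed VM entry. */
static inline bool exit_is_entry_failure(uint32_t exit_reason)
{
    return (exit_reason >> 31) & 1;
}

/* Printable name for the reasons listed above. */
static inline const char *exit_reason_name(uint32_t exit_reason)
{
    switch (basic_exit_reason(exit_reason)) {
    case EXIT_EXTERNAL_INTERRUPT: return "external interrupt";
    case EXIT_HLT:                return "HLT";
    case EXIT_INVLPG:             return "INVLPG";
    case EXIT_VMCALL:             return "VMCALL";
    default:                      return "other";
    }
}
```

A dispatcher at after_vmexit would typically switch on the basic exit reason and then read further fields, such as EXIT_QUALIFICATION, as needed.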

Wednesday, June 17, 2009

VMCS Guest State Area

The VMCS has an area for guest register state and guest non-register state. The guest register state contains fields for control registers (CR0, CR3, CR4), segment registers (cs, ds etc.), segment selectors, segment attributes (also called ARBytes), segment limits, the debug register (dr7) etc.

The guest non-register state has the following fields:

(a) VMCS GUEST ACTIVITY STATE: This is a 32-bit field that describes the state of the guest. The field holds an enumerated value, not a bit mask. The Intel manual defines four values; all others are reserved:
(i) 0 - The guest is in the Active state.
(ii) 1 - The guest is in the HLT state.
(iii) 2 - The guest is in the Shutdown state.
(iv) 3 - The guest is in the Wait-for-SIPI state.

Here is a scenario showing how the vmcs might indicate a guest in halt state:

The guest executes HLT and the execution of HLT does not cause a vmexit (assume proc_based_ctl[hlt] is 0). At this point the guest is in HLT state. Now an interrupt arrives and the processing of the interrupt causes a vmexit (assume pin_based_ctl for intr is 1). The vmexit causes the processor to update the GUEST ACTIVITY STATE in the VMCS as HLT. A subsequent VMRESUME will read this field from the vmcs and launch the guest in HLT state.
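
The Intel manual encodes the activity state as a plain value from 0 to 3. A hypervisor that logs or inspects the field might use something like the C sketch below; the enum and function names are made up for illustration:

```c
#include <stdint.h>

/* Activity-state values defined by the Intel manual. */
enum guest_activity_state {
    ACTIVITY_ACTIVE        = 0,
    ACTIVITY_HLT           = 1,
    ACTIVITY_SHUTDOWN      = 2,
    ACTIVITY_WAIT_FOR_SIPI = 3,
};

/* Printable name for a value read from the GUEST ACTIVITY STATE field. */
static inline const char *activity_state_name(uint32_t state)
{
    switch (state) {
    case ACTIVITY_ACTIVE:        return "active";
    case ACTIVITY_HLT:           return "hlt";
    case ACTIVITY_SHUTDOWN:      return "shutdown";
    case ACTIVITY_WAIT_FOR_SIPI: return "wait-for-sipi";
    default:                     return "reserved";
    }
}
```

In the scenario above, a vmread of the field after the interrupt-induced vmexit would return 1 (HLT).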

(b) VMCS GUEST INTR INFO : The INTR INFO describes the interruptibility state of the guest. Four bits are defined in this field.
(i) Bit 0 - Guest has STI blocking active
(ii) Bit 1 - Guest has MOV-SS/POP-SS blocking active
(iii) Bit 2 - Blocking by SMI
(iv) Bit 3 - Blocking by NMI

Here is a scenario where the vmcs might indicate STI blocking:

Guest executes STI. Following STI, guest executes HLT which vmexits (assume proc_based_ctl[hlt] is 1). The vmexit due to hlt will cause the processor to update the interruptibility-info with STI blocking. A subsequent VMRESUME will put the guest back in STI blocking state. In this case, during a vmresume, the processor also checks that EFLAGS.IF is 1 and fails VMEntry if it detects the following condition:

if(sti_blocking==TRUE && EFLAGS.IF==0) {
fail_vmentry();
}
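
A hypervisor that edits the guest flags or the interruptibility field by hand can run the same test itself before resuming. This is a minimal C sketch of that one check only (the processor performs many others); the macro and function names are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define INTR_STATE_STI_BLOCKING (1u << 0)   /* bit 0 of the interruptibility field */
#define RFLAGS_IF               (1ull << 9) /* interrupt-enable flag */

/* Returns false for the combination that makes VMEntry fail:
 * STI blocking reported while the guest has interrupts disabled. */
static inline bool sti_blocking_consistent(uint32_t interruptibility,
                                           uint64_t guest_rflags)
{
    bool sti_blocking = (interruptibility & INTR_STATE_STI_BLOCKING) != 0;
    bool if_set       = (guest_rflags & RFLAGS_IF) != 0;
    return !(sti_blocking && !if_set);
}
```

If the hypervisor clears IF in GUEST_RFLAGS, it should also clear bit 0 of the interruptibility field so that this check passes.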

(c) Pending Debug Exceptions Field: This field in the VMCS indicates if there are any debug exceptions that are pending in the guest. The meaningful bits in this field are:

Bit 12 -> Enabled Breakpoint
Bit 14 -> Single Step
Bits 3-0 -> Correspond to B3,B2,B1,B0 (meaningful only if bit 12 is 1).

Here is a scenario showing how the guest may end up with a single-step exception pending during a vmexit:

pushfd ; push flags
or dword [esp], 0x100 ; set eflags.TF
popfd ; pop flags now eflags.TF=1
mov ss, ax ; will delay recognition of single step until the end of the next instruction
vmcall ; cause vmexit

In the above code, at the time of vmexit , single-step is pending (it is delayed because of mov-ss blocking). This vmexit will cause the processor to record a single-step pending in the debug-exceptions field of the vmcs. Incidentally, the above vmexit will also cause the processor to record mov-ss blocking in the interruptibility field.
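
To make the bit layout concrete, here is a small C sketch that unpacks a value read from the pending debug exceptions field into the three pieces listed earlier. The struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

struct pending_debug {
    unsigned breakpoints;      /* bits 3:0, one bit per B3..B0 */
    bool enabled_breakpoint;   /* bit 12 */
    bool single_step;          /* bit 14 */
};

/* Unpack the pending debug exceptions field. */
static inline struct pending_debug decode_pending_debug(uint64_t field)
{
    struct pending_debug d;
    d.breakpoints        = (unsigned)(field & 0xF);
    d.enabled_breakpoint = ((field >> 12) & 1) != 0;
    d.single_step        = ((field >> 14) & 1) != 0;
    return d;
}
```

For the mov-ss/vmcall sequence above, the hypervisor would expect single_step to be true and breakpoints to be 0.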

Wednesday, May 6, 2009

VMX EXECUTION_CONTROLS

Two types of execution controls are defined:
a) PIN_BASED execution controls
b) PROC_BASED execution controls



PIN_BASED Controls:
The vmcs encoding for this field is 0x4000. There are 2 bits in this 32-bit field that are interesting:
Bit 0 – External Interrupt Exiting
Bit 3 – NMI Exiting
After launching a vmx-guest, if an external interrupt is received in the guest and bit 0 is 1, a vmexit due to external interrupt occurs.
Bit 3 controls the behavior of the processor in response to an NMI while running as a vmx-guest. If bit 3 is 1 and an NMI is received in the guest, a vmexit occurs.
The other bits are reserved. The settings of the reserved bits (0 or 1) are obtained by reading msr 0x481. To initialize this field in the vmcs:


mov ecx, 0x481
rdmsr ; eax = bits that must be 1, edx = bits that may be 1
bts eax, 0 ; set bit0 to vmexit due to interrupts
bts eax, 3 ; nmi exiting bit = 1
and eax, edx ; drop any requested bit that the processor does not allow
mov ebx, 0x4000 ; encoding for pin-based controls
vmwrite ebx, eax
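
The usual rule for combining a wanted setting with a capability MSR is: force on the bits reported in the low half (they must be 1) and mask with the high half (only those may be 1). The C sketch below expresses that rule as a pure function; the function name is made up, and the MSR value in the note after it is a made-up sample, not a reading from real hardware:

```c
#include <stdint.h>

/* capability_msr: value of msr 0x481 (or 0x482..0x484 for the other controls).
 *   bits 31:0  - a 1 means the control bit must be 1
 *   bits 63:32 - a 1 means the control bit is allowed to be 1
 * Returns the closest legal value to 'desired'. */
static inline uint32_t adjust_controls(uint32_t desired, uint64_t capability_msr)
{
    uint32_t must_be_one = (uint32_t)capability_msr;
    uint32_t may_be_one  = (uint32_t)(capability_msr >> 32);
    return (desired | must_be_one) & may_be_one;
}
```

With a sample capability value of 0x0000003F00000016, requesting bits 0 and 3 (0x9) gives 0x1F. A careful hypervisor would also compare the result with what it asked for, since a wanted bit may have been dropped.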



PROC_BASED Controls:
The vmcs encoding for this field is 0x4002. It is a 32-bit field that determines the behavior of the processor when certain instructions are executed in the vmx-guest.
For example:
Bit 7 of this vector controls the processor behavior upon execution of the HLT instruction in the vmx-guest. If 1, execution of HLT will cause a vmexit. If 0, the instruction will be executed normally without any vmexit.


Similarly, bit 9 controls the behavior of the processor on INVLPG, bit 19 controls the behavior on mov-to-cr8, bit 20 controls the behavior on mov-from-cr8, and so on.

The bit positions are described below:

INTRWINDOW 2
TSCOFFSET 3
HLT 7
INVLPG 9
MWAIT 10
RDPMC 11
RDTSC 12
CR8LOAD 19
CR8STORE 20
TPRSHADOW 21
MOVDR 23
IOUNCOND 24
IOBITMAP 25
MSRBITMAP 28
MONITOR 29
PAUSE 30



MSR 0x482 indicates the allowed-0 and allowed-1 settings of these controls.
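
Before relying on a particular control, a hypervisor can ask the capability MSR whether that bit may be set or cleared at all. This is a hedged C sketch of such a query; the enum lists only a few of the bit positions from the table above, the function names are hypothetical, and the MSR value used in the note is a made-up sample:

```c
#include <stdbool.h>
#include <stdint.h>

/* A few bit positions from the table above. */
enum proc_based_control {
    PROC_HLT       = 7,
    PROC_INVLPG    = 9,
    PROC_CR8LOAD   = 19,
    PROC_CR8STORE  = 20,
    PROC_MSRBITMAP = 28,
};

/* High half of msr 0x482: a 1 means the control is allowed to be 1. */
static inline bool control_can_be_set(uint64_t capability_msr, unsigned bit)
{
    return ((capability_msr >> (32 + bit)) & 1) != 0;
}

/* Low half of msr 0x482: a 1 means the control must be 1. */
static inline bool control_can_be_cleared(uint64_t capability_msr, unsigned bit)
{
    return ((capability_msr >> bit) & 1) == 0;
}
```

With the sample value 0xFFF9FFFE0401E172, HLT exiting (bit 7) can be both set and cleared, while bit 1 must stay set.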

Note: Newer processors have additional bits defined for these controls. For more details see PRM Vol 3b.

Monday, May 4, 2009

VMX Exit Control fields

EXIT_CONTROLS:
This field is used by the processor during a vmexit. This is a 32-bit field (just like the entry controls) but only 2 bits are defined:
Bit 9 – Host address-space size – This value is loaded into EFER.LME, EFER.LMA and CS.L on a vmexit.
Bit 15 – Acknowledge Interrupt on Exit – If there is a vmexit due to an external interrupt, this bit determines whether the interrupt is acknowledged or not. If it is acknowledged, the interrupt vector is recorded in the vmcs.
All other bits are reserved. They are either 0s or 1s as determined by the EXIT_CTLS_MSR (msr 0x483).


EXIT_CONTROL FOR MSR:
This works the same way as ENTRY_CONTROL FOR MSR. The only difference is in the vmcs encodings. They are tabulated below:

EXIT_MSR_STORE_ADDR EQU 0x2006
EXIT_MSR_STORE_COUNT EQU 0x400E
The guest MSRs are saved in the MSR store area during a vmexit. On a subsequent VMEntry, these MSRs will be loaded from the same area, provided the vmentry msr-load address points to it.


EXIT_MSR_LOAD_ADDR EQU 0x2008
EXIT_MSR_LOAD_COUNT EQU 0x4010
The host MSRs are loaded from the physical address specified in EXIT_MSR_LOAD_ADDR.
The format of the msr-load/msr-store areas is identical to that of the msr-load area used for vmentry.

VMX Entry Control fields

VMX Control fields

Control fields are of 3 types:
a) Entry Control fields
b) Exit Control fields
c) Execution Control fields.

Entry Control fields:
Used during VMEntry (Vmentry is the process by which CPU transitions from HOST state to the Guest state).


VMENTRY_CONTROLS:
This is a 32-bit field that sets up some critical information that is used by the processor during vmentry. Most of the bits in this 32-bit field are reserved.
Among the bits that are defined, the following 3 are interesting:
bit 9 - Guest is in long mode
bit 10 - Guest is in SMM
bit 11 - Deactivate Dual monitor treatment
For normal vmentries, bit 10 and bit 11 are always 0. Bit 9 can be 0 or 1 depending on whether the guest is in long-mode or protected mode.

Note:

(A) If a guest will be in compatibility-mode, bit 9 must be set to 1. When the processor loads state during Vmentry, if the GUEST_CS.L bit is 0 and bit 9 of entry_control is 1, then the guest will be in compatibility-mode after vmentry.

(B) During Vmentry the value of bit 9 is copied into EFER.LME. Since CR0.PG is fixed to 1, the value also propagates to EFER.LMA.

Sample code to set up entry controls:

To set up this field, software should consult msr 0x484 and extract the allowed-0 and allowed-1 settings of this field.
mov ecx, 0x484
rdmsr ; eax = bits that must be 1, edx = bits that may be 1
bts eax, 9 ; guest is in long mode (skip this for a protected-mode guest)
and eax, edx ; drop any requested bit that the processor does not allow
btr eax, 10 ; clear the SMM bit
btr eax, 11 ; clear the deactivate dual monitor bit
mov rbx, 0x4012 ; encoding for entry controls
vmwrite rbx, rax



VMENTRY_CONTROL_MSR:
This field is used when msrs are to be loaded as part of vmentry. This is sometimes required when the hypervisor wants to present the guest with an msr value different from the host value.


Sample code:
MSR_LOAD_ADDR EQU 0x200a
MSR_LOAD_COUNT EQU 0x4014
mov rax, my_msr_address ; physical address of the msr-load area
mov rbx, MSR_LOAD_ADDR
vmwrite rbx, rax
mov rax, 1
mov rbx, MSR_LOAD_COUNT
vmwrite rbx, rax
my_msr_address:
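dd msr_index ; bits 31:0 - index of the msr to load (placeholder name)
dd 0 ; bits 63:32 - reserved
dd msr_data_lo ; bits 95:64 - low half of the msr data
dd msr_data_hi ; bits 127:96 - high half of the msr data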
Note:
my_msr_address is the Physical Address of the msr-load area in memory.
The layout of my_msr_address must match the layout described above. my_msr_address must be 16B aligned.
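
For readers who prefer to see the 16-byte entry layout as a data structure, here is a C sketch of one entry. The struct and helper names are hypothetical, and the MSR index mentioned in the note (0xC0000080, IA32_EFER) is only an example:

```c
#include <stddef.h>
#include <stdint.h>

/* One 16-byte entry of an msr-load or msr-store area. */
struct vmx_msr_entry {
    uint32_t index;     /* bits 31:0   - msr index */
    uint32_t reserved;  /* bits 63:32  - must be 0 */
    uint64_t value;     /* bits 127:64 - msr data  */
};

_Static_assert(sizeof(struct vmx_msr_entry) == 16,
               "each msr area entry is 16 bytes");

/* Build one entry with the reserved field cleared. */
static inline struct vmx_msr_entry make_msr_entry(uint32_t index, uint64_t value)
{
    struct vmx_msr_entry e;
    e.index    = index;
    e.reserved = 0;
    e.value    = value;
    return e;
}
```

An area with N entries is simply an array of N such structs placed at a 16-byte aligned physical address, with N written to the matching count field. For example, make_msr_entry(0xC0000080, 0x500) describes loading IA32_EFER with LME and LMA set.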



VMENTRY_CONTROL_EVENT_INJECTION:
This field is used when delivering an event/exception to the guest during vmentry. For example, if the hypervisor wants control to be transferred to the guest's #GP handler, it would do the following:


mov rax, 0x4016; vmcs encoding
mov rbx, 0x80000B0D ; bits 10:8 = 3 -> HW exception, bits 7:0 = 0x0d (vector 13)
vmwrite rax, rbx



Vol 3b has more details on this vmcs field. The hypervisor might use this technique to handle a vmexit from the guest due to an exception.
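
The constant 0x80000B0D used above packs four pieces: the vector, the event type, the deliver-error-code bit (bit 11) and the valid bit (bit 31). The C sketch below assembles the same value from those pieces; the function and macro names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define EVENT_TYPE_HW_EXCEPTION 3u   /* bits 10:8 = 3 */

/* Assemble the 32-bit value written to vmcs field 0x4016. */
static inline uint32_t make_entry_intr_info(uint8_t vector, unsigned type,
                                            bool deliver_error_code)
{
    uint32_t info = vector;                   /* bits 7:0   */
    info |= (type & 7u) << 8;                 /* bits 10:8  */
    info |= (deliver_error_code ? 1u : 0u) << 11;
    info |= 1u << 31;                         /* valid bit  */
    return info;
}
```

make_entry_intr_info(13, EVENT_TYPE_HW_EXCEPTION, true) produces 0x80000B0D. When the error-code bit is set, the hypervisor also has to supply the error code itself through the separate vmentry exception error code field.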

Initializing the VMCS

Software initializes the vmcs by using the vmwrite instruction. It can read the value from the vmcs using the vmread instruction. The VMCS is divided into four areas:

(a) Host Area
(b) Guest Area
(c) VMX Control fields
(d) VMX Exit Information fields


Each VMCS field is identified by an encoding which is used by the processor to write into the appropriate place in the vmcs.
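
The encodings are not arbitrary: the Intel manual splits each one into an access type (bit 0), an index (bits 9:1), a field type (bits 11:10) and a width (bits 14:13). The C sketch below decodes an encoding along those lines; the struct and function names are hypothetical:

```c
#include <stdint.h>

struct vmcs_field_info {
    unsigned high_access; /* bit 0: 1 = upper 32 bits of a 64-bit field */
    unsigned index;       /* bits 9:1 */
    unsigned type;        /* bits 11:10: 0 control, 1 exit info, 2 guest, 3 host */
    unsigned width;       /* bits 14:13: 0 16-bit, 1 64-bit, 2 32-bit, 3 natural */
};

/* Split a vmcs field encoding into its components. */
static inline struct vmcs_field_info decode_vmcs_encoding(uint32_t encoding)
{
    struct vmcs_field_info f;
    f.high_access = encoding & 1;
    f.index       = (encoding >> 1) & 0x1FF;
    f.type        = (encoding >> 10) & 3;
    f.width       = (encoding >> 13) & 3;
    return f;
}
```

Decoding 0x0C0C (Host TR selector) gives a 16-bit host field with index 6, and 0x6C00 (Host CR0) gives a natural-width host field with index 0.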

Host Area:

Host selector fields:
------------------------
Host ES selector 0xC00
Host CS selector 0xC02
Host SS selector 0xC04
Host DS selector 0xC06
Host FS selector 0xC08
Host GS selector 0xC0A
Host TR selector 0xC0C
As an example, say the hypervisor wants to initialize the Task register selector with a value of 0x18:
mov rax, 0x0C0C ; encoding for Host TR selector
mov rbx, 0x18 ; value to write
vmwrite rax, rbx


To read a value from the vmcs, vmread is used:
mov rax, 0x0C0C
vmread rcx, rax ; Read from Host TR selector


Other Host state fields:
Host CR0 0x6C00
Host CR3 0x6C02
Host CR4 0x6C04
Host FS base 0x6C06
Host GS base 0x6C08
Host TR base 0x6C0A
Host GDTR base 0x6C0C
Host IDTR base 0x6C0E
Host IA32_SYSENTER_ESP 0x6C10
Host IA32_SYSENTER_EIP 0x6C12
Host RSP 0x6C14
Host RIP 0x6C16


As an example to write to host_cr0 in the vmcs, the following code snippet may be used:
mov rbx, cr0
mov rax, 0x6c00 ; encoding for host CR0
vmwrite rax,rbx



Similarly, the other host state fields are to be initialized. For a complete list of the vmcs fields see Intel PRM Vol 3b.


Guest Area
The technique to initialize the guest-state area is the same as for the host-state area. Hypervisors use the vmwrite instruction to initialize the guest-state area. The encodings used as operands to the vmwrite instruction reflect the guest-state encodings. Here are a few examples:


Guest CR0 0x6800
Guest CR3 0x6802
Guest CR4 0x6804
Guest ES base 0x6806
Guest CS base 0x6808


Follow the same approach as before to write to these vmcs fields. For example, to write a guest CR4 value that has PAE=1, PGE=1, OSFXSR=1 and OSXMMEXCPT=1, do the following:

mov rbx, 0x6A0 ; required value in cr4
mov rax, 0x6804 ; GUEST_CR4 encoding
vmwrite rax, rbx
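
The constant 0x6A0 is just the OR of the four CR4 bits named above. A short C sketch makes the arithmetic explicit; the macro and function names are assumptions, while the bit positions are the architectural ones:

```c
#include <stdint.h>

#define CR4_PAE        (1ull << 5)
#define CR4_PGE        (1ull << 7)
#define CR4_OSFXSR     (1ull << 9)
#define CR4_OSXMMEXCPT (1ull << 10)

/* Compose the guest CR4 value used in the example above. */
static inline uint64_t example_guest_cr4(void)
{
    return CR4_PAE | CR4_PGE | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

Building the value from named bits makes it easier to add or remove a feature later without recomputing the hex constant by hand.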

A similar approach is adopted for initializing other GUEST_STATE fields.

Thursday, April 23, 2009

VMPTRLD - Load VMCS pointer

vmptrld will load the vmcs pointer for the virtual-machine to be launched. The vmcs stands for Virtual Machine Control Structure. The vmcs is a region in memory which holds all the data for the virtual-machine to be launched. The instruction usage is similar to vmxon:

vmptrld [vmcs_ptr]
vmcs_ptr dq vmcs_region

vmcs_region:
rev_id dd 0

As with vmxon, the revision id of the vmcs_region should be updated with the revision-id supported by the processor (contained in msr 0x480) prior to executing vmptrld. As with vmxon, the vmcs_region must be located on a 4K boundary.


The only other thing worth mentioning is if you try to load the vmxon_ptr as an operand to vmptrld, then execution of vmptrld will fail. Meaning, a code sequence like the one shown below is guaranteed to fail vmptrld:
vmxon [vmxon_ptr]
jbe vmxon_failed
vmptrld [vmxon_ptr]
jbe vmptrld_failed

When the processor executes vmptrld, it detects that vmptrld's pointer points to the same region as vmxon. This will cause vmptrld to fail.

It may also be a good practice to execute vmclear before executing vmptrld to load the vmcs-pointer. So the hypervisor may want to do this:
vmclear [vmcs_ptr]
jbe vmclear_failed
vmptrld [vmcs_ptr]
jbe vmptrld_failed

At this point we have executed vmxon, entered VMX_ROOT mode, initialized the virtual-machine-vmcs with vmclear and loaded the virtual-machine-vmcs pointer into the processor by executing vmptrld. The next step is to initialize the vmcs with the virtual-machine's (henceforth referred to as guest) data and then launch the guest.

Wednesday, April 22, 2009

More on VMXON

vmxon takes as its operand a pointer to the vmxon region.

The code may look like this:

vmxon [vmxon_ptr]


vmxon_ptr dq vmxon_region_begin
vmxon_region_begin: vmxon_rev_id dd 0

Key things to note in the above snippet:

a. The operand to vmxon is vmxon_ptr. vmxon_ptr holds the physical address of the vmxon_region.

b. The vmxon_region begins with a 4-byte field, called vmxon_rev_id in the snippet. The hypervisor is expected to write the revision-id reported by the processor into this field.

How does the hypervisor determine the revision-id?

On Intel processors, the revision-id is contained in VMX_BASIC_MSR (0x480). Bits 31:0 of this MSR contain the revision-id of the processor.

So, for the example above, the hypervisor may want to do:

///////////////////////////////////////////////////

xor eax, eax

xor edx, edx

mov ecx, 0x480

rdmsr ; after rdmsr eax has the revision id

mov dword [vmxon_rev_id], eax ; write the rev-id into vmxon region

vmxon [vmxon_ptr]

////////////////////////////////////////////////

Note: Intel PRM also specifies that the vmxon_region must be aligned on a 4K boundary. If it is not 4K aligned , VMXON is guaranteed to fail.
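
The two requirements discussed so far (revision-id in the first 4 bytes, 4K alignment) are easy to express as small helpers. The C sketch below treats the region as an ordinary buffer so that it can be tried outside ring0; the function names are hypothetical, and a real hypervisor would apply them to the physical address and mapped view of its vmxon region:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A region is 4K aligned when the low 12 address bits are zero. */
static inline bool is_4k_aligned(uint64_t physical_address)
{
    return (physical_address & 0xFFF) == 0;
}

/* Copy bits 31:0 of the VMX_BASIC_MSR value into the start of the region. */
static inline void write_revision_id(void *region, uint64_t vmx_basic_msr)
{
    uint32_t rev_id = (uint32_t)vmx_basic_msr;
    memcpy(region, &rev_id, sizeof rev_id);
}
```

The same two helpers apply unchanged to a guest vmcs region, which has the same alignment and revision-id requirements.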

It is worth repeating that the operand to vmxon is a pointer holding the physical address of the vmxon_region. Hence the vmxon region should reside in memory that stays resident at that physical address (i.e. memory that is never paged out).

Successful completion of vmxon will cause the processor to enter the VMX_ROOT operation.

Hypervisors must also check to see if the execution of vmxon was successful. That can be done by checking the state of eflags.ZF and eflags.CF. If eflags.ZF =0 and eflags.CF=0 then vmxon was successful.

Continuing our previous code:

///////////////////////////////////////////////////
xor eax, eax
xor edx, edx
mov ecx, 0x480
rdmsr
mov dword [vmxon_rev_id], eax
vmxon [vmxon_ptr]

jbe vmxon_failed

vmxon_pass:

; if I am here then eflags.ZF=0 and eflags.CF=0, so vmxon was successful.

vmxon_failed:

; either eflags.ZF=1 or eflags.CF=1

; handle the failure here
//////////////////////////////////////////////////
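
The flag convention is the same for every VMX instruction: CF=1 means the instruction failed without a usable current vmcs (VMfailInvalid), ZF=1 means it failed and left an error number in the vmcs (VMfailValid), and both clear means success. A C sketch of that classification, working on a saved flags value, could look like this (the enum and function names are made up):

```c
#include <stdint.h>

#define RFLAGS_CF (1ull << 0)
#define RFLAGS_ZF (1ull << 6)

enum vmx_status {
    VMX_SUCCEED,
    VMX_FAIL_INVALID,   /* CF = 1 */
    VMX_FAIL_VALID,     /* ZF = 1, error number available in the vmcs */
};

/* Classify the outcome of a VMX instruction from the flags it left behind. */
static inline enum vmx_status vmx_status_from_flags(uint64_t rflags)
{
    if (rflags & RFLAGS_CF)
        return VMX_FAIL_INVALID;
    if (rflags & RFLAGS_ZF)
        return VMX_FAIL_VALID;
    return VMX_SUCCEED;
}
```

The single jbe used in the assembly above covers both failure cases at once, since jbe is taken when CF=1 or ZF=1.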

First look at VMXON

Hypervisors should first begin with the execution of the vmxon instruction. VMXON enables vmx operation. Execution of VMXON puts the processor in VMX_ROOT mode. There are a few things that the hypervisor must ascertain before executing vmxon:

1. Hypervisor must turn on the CR4.VMXE bit. The VMX enable (VMXE) bit is bit 13 of CR4. A typical code sequence would be:

mov eax, cr4
or eax, 0x2000
mov cr4, eax

Executing VMXON without CR4.VMXE=1 will cause the processor to generate a #UD(undefined opcode) exception.

2. Hypervisor must set the fixed bits of CR0. CR0.NE, CR0.PG and CR0.PE are all fixed bits in vmx operation and they should always be 1 as long as the processor is in VMX_ROOT operation.

Any attempt to clear the fixed bits of CR0 after executing vmxon will cause the processor to generate a #GP exception.

3. A20M: A20M# must be off prior to the execution of vmxon (violating this will result in a #GP).

4. The hypervisor must ensure that prior to execution of vmxon, the processor is not in V86 mode (eflags.vm must be 0) or in compatibility mode (efer.lma && !cs.l must be false).

The above 4 conditions must be satisfied for vmxon to work. (For it to be successful, a few other things need to be done.)
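
Conditions 1, 2 and 4 depend only on register values, so they can be checked in software before attempting vmxon. The C sketch below does that on values the hypervisor has already read; A20M (condition 3) is a pin state and is not modeled here. All names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE    (1ull << 0)
#define CR0_NE    (1ull << 5)
#define CR0_PG    (1ull << 31)
#define CR4_VMXE  (1ull << 13)
#define EFLAGS_VM (1ull << 17)
#define EFER_LMA  (1ull << 10)

/* Check conditions 1, 2 and 4 from the list above. */
static inline bool vmxon_preconditions_met(uint64_t cr0, uint64_t cr4,
                                           uint64_t eflags, uint64_t efer,
                                           bool cs_l)
{
    const uint64_t cr0_fixed = CR0_PE | CR0_NE | CR0_PG;
    bool vmxe_on    = (cr4 & CR4_VMXE) != 0;
    bool cr0_ok     = (cr0 & cr0_fixed) == cr0_fixed;
    bool in_v86     = (eflags & EFLAGS_VM) != 0;
    bool in_compat  = (efer & EFER_LMA) != 0 && !cs_l;
    return vmxe_on && cr0_ok && !in_v86 && !in_compat;
}
```

A failed check here points at a #UD or #GP waiting to happen, which is usually easier to diagnose before the instruction is executed than from inside an exception handler.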

Note that:
Assertions of INIT# will not be recognized by the cpu after the execution of vmxon. I think INIT# just stays pending until it gets unblocked.

VMX instructions in x86

Note: Intel PRM Vol 3b has a lot of details on VMX. If you want a quick snapshot read this blog and then Vol 3b will seem tractable.


To enable virtual machine architecture, Intel provides new instructions as part of their Virtual Machine Extensions(abbreviated VMX) instruction set. This instruction set is different from the one that AMD provides for SVM.

Here is a quick look at the instructions:

(a) VMXON - enter vmx operation

(b) VMXOFF - leave vmx operation

(c) VMREAD - read from the vmcs (vmcs will be discussed later)

(d) VMWRITE - write to the vmcs

(e) VMPTRLD - load vmcs pointer

(f) VMPTRST - store vmcs pointer

(g) VMLAUNCH/VMRESUME - launch or resume virtual machine

(h) VMCALL - call to the hypervisor

Processor/Firmware settings for VMX:

1. To make sure your processor supports VMX, execute CPUID with eax=1 (leaf 1) and check for bit 5 of ecx. If the bit is set the CPU supports VMX else it is not supported.

2. In addition to the above the BIOS must enable VMX by a write to the FEATURE_CONTROL_MSR (address 0x3a). If the msr value is initialized to 0x5 (bit0=1 and bit2=1), then vmx is enabled.
Bit 0 of the msr is the lock bit. If set, the msr is protected. This means the processor will throw a #GP exception when a wrmsr is attempted with the lock bit = 1. Bit 2 is the VMXON_ENABLE bit. Executing VMXON without bit2 set will cause the processor to generate a #GP exception.
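
Both checks reduce to simple bit tests once the CPUID result and the MSR value are in hand. The C sketch below takes those values as plain arguments, so it does not execute CPUID or RDMSR itself; the function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCK         (1ull << 0)
#define FEATURE_CONTROL_VMXON_ENABLE (1ull << 2)

/* cpuid_1_ecx: ecx returned by CPUID with eax=1. Bit 5 reports VMX support. */
static inline bool cpu_supports_vmx(uint32_t cpuid_1_ecx)
{
    return ((cpuid_1_ecx >> 5) & 1) != 0;
}

/* feature_control: value of msr 0x3a. VMX is usable when the BIOS has set
 * both the lock bit and the VMXON_ENABLE bit. */
static inline bool bios_enabled_vmx(uint64_t feature_control)
{
    const uint64_t needed = FEATURE_CONTROL_LOCK | FEATURE_CONTROL_VMXON_ENABLE;
    return (feature_control & needed) == needed;
}
```

If the lock bit is still clear, a hypervisor may be able to write 0x5 to the msr itself; once the lock bit is set with VMXON_ENABLE clear, only a reset gives the BIOS another chance to enable it.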