Monday, October 4, 2010

Software injection into V86 guest with interrupt redirection - What must be the IDT VECTOR INFO?

The following observation was made while launching a V86 guest on Intel Merom. As part of vmlaunch or vmresume, a software interrupt is injected into the V86 guest (the entry interruption info field reads 0x800004vv, where vv is the vector number). The V86 virtual machine has:

a.  GUEST_RFLAGS.VM = 1 (indicating the guest is in V86 mode).
b. CR4.VME=1 (enables interrupt redirection provided the redirection bitmap says so in TSS).
c. The exception_bitmap in the guest is configured to vmexit on a #PF.

At the end of vmlaunch, the software interrupt is injected. The guest is in V86 mode and has CR4.VME=1, so the cpu consults the TSS to read the interrupt redirection bitmap. The TSS page is not present and the cpu takes a #PF. Because the guest is configured to vmexit on #PF, a vmexit follows. After the vmexit, a vmread of the following vmcs fields returns:
a. Exit reason (reads 0)
b. Exit Interruption Info (0x80000B0E - indicates a #PF)
c. IDT Vector  Info (reads 0)
d. Exit Qualification (0x - address that caused #PF).

What stands out in the above results is the value of idt-vector-info. It should have read 0x800004vv (vv = vector), since the vmexit was encountered in the process of injecting an event. This behavior appears to violate what is stated in vol3b.
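The field values quoted above can be pulled apart with a small helper. This is only a sketch: the struct and function names are made up for illustration, and it assumes the usual layout of these fields (bits 7:0 vector, bits 10:8 type, bit 11 error-code valid, bit 31 valid).

```c
#include <stdint.h>
#include <stdbool.h>

/* Decoded view of a VMX interruption-information style field.
 * Names are hypothetical; layout assumed: bits 7:0 vector,
 * bits 10:8 type, bit 11 error-code valid, bit 31 valid. */
struct intr_info {
    uint8_t vector;
    uint8_t type;
    bool    has_error_code;
    bool    valid;
};

static struct intr_info decode_intr_info(uint32_t raw)
{
    struct intr_info d;
    d.vector         = (uint8_t)(raw & 0xFFu);
    d.type           = (uint8_t)((raw >> 8) & 0x7u);
    d.has_error_code = ((raw >> 11) & 1u) != 0;
    d.valid          = ((raw >> 31) & 1u) != 0;
    return d;
}

/* True if the IDT-vectoring field says the exit happened
 * while an event was being delivered (valid bit set). */
static bool exit_during_event_delivery(uint32_t idt_vectoring_info)
{
    return decode_intr_info(idt_vectoring_info).valid;
}
```

With the values observed above, decode_intr_info(0x80000B0E) gives vector 14, type 3, with an error code, while exit_during_event_delivery(0) is false, which is exactly the surprising part.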

Monday, April 26, 2010

Injecting software interrupt into a V86 guest

To inject an interrupt or exception into a guest, a hypervisor uses the ENTRY_INTERRUPTION_INFO field in the vmcs. For example, if there is a need to inject a #GP exception into the guest as part of vmentry, the entry_interruption_info field would look like this: 0x80000B0D.

1. Bits 7:0 of this field represent the vector (0x0D, which is vector 13).
2. Bits 10:8 indicate the type (in this case type = 0x3, which is a hardware exception).
3. Bit 11 is the error-code valid bit, which is set in the example above.
4. Bit 31 is the valid bit for the ENTRY_INTERRUPTION_INFO field.
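The bit layout listed above can be turned into a small encoder. This is a sketch with made-up names (make_entry_intr_info and the enum are not from any real header); it only packs the four fields the list describes.

```c
#include <stdint.h>

/* Event types used in this post (values go in bits 10:8). */
enum intr_type {
    INTR_TYPE_HW_EXCEPTION = 3,
    INTR_TYPE_SW_INTERRUPT = 4
};

/* Build an entry_interruption_info value from its parts.
 * Hypothetical helper: bits 7:0 vector, bits 10:8 type,
 * bit 11 error-code valid, bit 31 valid. */
static uint32_t make_entry_intr_info(uint8_t vector, enum intr_type type,
                                     int error_code_valid)
{
    uint32_t v = vector;
    v |= ((uint32_t)type & 0x7u) << 8;
    if (error_code_valid)
        v |= 1u << 11;
    v |= 1u << 31;   /* mark the field as valid */
    return v;
}
```

For the #GP example, make_entry_intr_info(0x0D, INTR_TYPE_HW_EXCEPTION, 1) yields 0x80000B0D.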

To inject a software interrupt (say vector 0x8), the hypervisor would program the entry_interruption_info field as follows: 0x80000408 (type=0x4 and vector=0x8). If the guest is in V86 mode (GUEST_RFLAGS[VM]=1), the processor behaves according to Table 15.2, Intel SDM, vol 3A.


Given below is a summary of the processor behavior during normal software-interrupt execution in V86 and during an event injection into a V86 guest:

1. EFLAGS.VM = 1 , CR4.VME=1, EFLAGS.IOPL=3
=> In this case the bit in the redirection bitmap of the TSS is consulted.
=> if bit in the redirection bitmap=0, the software interrupt is redirected to the 8086-style handler.
=> if bit in the redirection bitmap=1, the software interrupt is redirected to protected-mode handler.

2.  EFLAGS.VM = 1 , CR4.VME=1, EFLAGS.IOPL<3
=> In this case the bit in the redirection bitmap of the TSS is consulted.

=> if bit in the redirection bitmap=0, the software interrupt is redirected to the 8086-style handler. Notice that this is the same behavior as with EFLAGS.IOPL=3. The difference is in the value of eflags pushed on the stack: the IOPL of the eflags image is forced to 3 and the value of VIF is copied to IF.

=> Normal behavior: if bit in the redirection bitmap=1, the interrupt is directed to the #GP handler.
=> During VMX event injection: if bit in the redirection bitmap=1, the processor will *NOT* raise #GP due to IOPL < CPL.

3. EFLAGS.VM = 1 , CR4.VME=0, EFLAGS.IOPL=3
=> Normal behavior: Interrupt directed to a protected mode handler (No #GP).
=> During event injection: Same as above.

4. EFLAGS.VM = 1 , CR4.VME=0, EFLAGS.IOPL<3
=> Normal behavior: Interrupt directed to a #GP handler .
=> During Event Injection:  No #GP can occur due to IOPL< CPL. The behavior will be the same as with IOPL=3.

Summary:
From the above discussion, there will be no #GP due to IOPL < CPL during the injection of a software interrupt into a V86 guest. If the hypervisor wants this #GP to occur, it needs to inject a #GP directly into the guest instead of a software interrupt. This can be achieved by programming the entry_interruption_info field to 0x80000B0D.
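The four cases can be condensed into one decision function. This is a sketch, not a model of the hardware: the names are invented, and it assumes that an injected software interrupt that skips the IOPL check ends up at the protected-mode handler (the post states this for case 4 and only says "no #GP" for case 2).

```c
#include <stdbool.h>

enum v86_outcome {
    V86_8086_HANDLER,       /* redirected to the 8086-style handler */
    V86_PROTECTED_HANDLER,  /* delivered to the protected-mode handler */
    V86_GP_FAULT            /* #GP because IOPL < 3 */
};

/* vme:       CR4.VME
 * iopl:      EFLAGS.IOPL (0..3)
 * redir_bit: bit for this vector in the TSS redirection bitmap
 * injected:  true for VMX event injection, false for a normal INT n */
static enum v86_outcome v86_soft_int(bool vme, int iopl, bool redir_bit,
                                     bool injected)
{
    if (vme && !redir_bit)
        return V86_8086_HANDLER;       /* cases 1 and 2, bitmap bit = 0 */
    if (iopl == 3 || injected)
        return V86_PROTECTED_HANDLER;  /* assumption: no IOPL check applies */
    return V86_GP_FAULT;               /* normal INT n with IOPL < 3 */
}
```

The only inputs for which the injected and normal paths differ are those with IOPL < 3 and no redirection to the 8086-style handler, which matches the summary.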

Monday, October 19, 2009

VMEXIT on INVLPG

A boundary case observed on Intel Merom:

(a) The virtual-machine is configured to vmexit on INVLPG(bit 9 of the PROCESSOR_EXECUTION_CONTROLS is 1).

(b) The virtual-machine has GS BASE = 0xFFFF8000_00000000

(c) Virtual machine executes: invlpg [gs:0-1]

(d) Execution of invlpg causes vmexit.

(e) The address of invlpg is recorded in exit-qualification. Upon a vmread of EXIT_QUALIFICATION the value obtained is:
=> FFFF7FFF_FFFFFFFF


Notice that the value recorded is a non-canonical address, i.e. address[63:48] != address[47]. This is the only case I have encountered where a non-canonical address shows up in the exit-qualification.

The only explanation I can come up with for this behavior is that INVLPG, unlike other instructions, does not fault in 64-bit mode with a non-canonical operand. According to the instruction spec, INVLPG morphs into a NOP for such cases.

When a vmexit handler for INVLPG is written, this case must be taken into consideration (i.e. a non-canonical address might show up in the exit-qualification field).
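A handler that wants to guard against this case could test the exit-qualification before using it. The check below is a minimal sketch assuming 48-bit linear addresses, as on the processor discussed here; the function name is made up.

```c
#include <stdint.h>
#include <stdbool.h>

/* A 48-bit canonical address has bits 63:48 all equal to bit 47,
 * so bits 63:47 must be either all zeros or all ones. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;   /* bits 63:47, 17 bits wide */
    return upper == 0 || upper == 0x1FFFFu;
}
```

For the case above, the GS base 0xFFFF8000_00000000 is canonical, while GS base minus 1 (0xFFFF7FFF_FFFFFFFF) is not.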

Saturday, July 25, 2009

A full blown initialization of VMCS - Assembly code

The code below will outline the general steps prior to executing a VMLAUNCH or VMRESUME.
Prior to looking at the assembly code, here is a step-by-step description of what is being done:

The reader must know that:
A) this code will run only in ring0.
B) paging is already enabled in CR0 (bit 31).

(1) First make sure that the processor supports VMX by executing CPUID (leaf 1, ecx[5]), then enable VMXE (bit 13) in CR4.

(2) Initialize the revision-id (msr 0x480, 31:0) in the vmxon region and in the guest-vmcs region.

(3) Execute VMXON with the pointer to vmxon region. In some cases, if BIOS has not enabled bits 0, 2 of FEATURE_CONTROL_MSR (msr 0x3a) this will fail.

(4) Execute VMCLEAR with the pointer to the guest-vmcs region.

(5) Execute VMPTRLD with the pointer to the guest-vmcs region.

(6) Now initialize the guest-vmcs:
(a) First initialize the vmx controls. These include the following controls:
1. PIN_BASED
2. PROC_BASED
3. ENTRY_CONTROLS
4. EXIT_CONTROLS

(b) Next initialize the host-state and guest-state.

(c) Now do vmlaunch. If VMLAUNCH is successful, then the processor will start executing code
from the GUEST_CS:GUEST_RIP value specified in the VMCS.
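The two preconditions mentioned in steps (1) and (3) can be expressed as plain bit tests. This is a sketch with hypothetical names: it takes the CPUID and MSR values as arguments (reading them needs cpuid and ring0 rdmsr, which are not shown) and checks only the bits named above.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPUID1_ECX_VMX     (1u << 5)    /* CPUID leaf 1, ecx[5] */
#define FEATURE_CTL_BIT0   (1ull << 0)  /* FEATURE_CONTROL_MSR bit 0 */
#define FEATURE_CTL_BIT2   (1ull << 2)  /* FEATURE_CONTROL_MSR bit 2 */

/* True if CPUID reports VMX support. */
static bool vmx_supported(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_VMX) != 0;
}

/* True if BIOS has set both bits 0 and 2 of msr 0x3a,
 * the condition the post gives for VMXON not to fail. */
static bool vmxon_permitted(uint64_t feature_control)
{
    uint64_t need = FEATURE_CTL_BIT0 | FEATURE_CTL_BIT2;
    return (feature_control & need) == need;
}
```

A caller would check both before attempting step (3), for example `if (!vmx_supported(ecx) || !vmxon_permitted(msr)) give_up();` where give_up is whatever error path the hypervisor uses.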


Here comes the code:
////////////////////////////////////////////////////
mov eax, cr4
bts eax, 13
mov cr4, eax

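mov ecx, 0x480 ; msr 0x480, eax = revision-id
rdmsr
mov edx, [vmxon_ptr] ; address of the vmxon region
mov [edx], eax ; write revision-id to the vmxon region
mov edx, [guest_ptr] ; address of the guest-vmcs region
mov [edx], eax ; write revision-id to the guest-vmcs region

vmxon [vmxon_ptr]
jbe fail ; CF=1 or ZF=1 indicates failure (fail handler not shown)

vmclear [guest_ptr]
jbe fail

vmptrld [guest_ptr]
jbe fail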


call initialize_vmx_controls
call initialize_vmx_host_guest_state
call do_vmlaunch

;ideally a hypervisor would read the VMX-MSRS
; to determine what values to write.
initialize_vmx_controls:
mov ebx, ENTRY_CONTROLS ;0x4012
mov eax, 0x11ff
vmwrite ebx, eax
mov ebx, PIN_CONTROLS; 0x4000
mov eax, 0x1f
vmwrite ebx, eax
mov ebx, PROC_CONTROLS ; 0x4002
mov eax, 0x0401E9F2
vmwrite ebx, eax
mov ebx, EXIT_CONTROLS ; 0x400C
mov eax, 0x36dff
vmwrite ebx, eax
ret


initialize_vmx_host_guest_state:
mov eax, cr3
mov ebx, HOST_CR3 ;0x6C02
mov edx, GUEST_CR3 ;0x6802
VMWRITE EBX,EAX
mov eax, pdebase_guest
VMWRITE EDX,EAX

mov ebx, HOST_RSP ;0x6c14
mov eax, tos ;top-of-stack
vmwrite ebx, eax

mov ebx, HOST_CR0 ; 0x6C00
mov eax, cr0
vmwrite ebx, eax
mov ebx, GUEST_CR0 ;0x6800
vmwrite ebx, eax
mov ebx, HOST_CR4 ; 0x6C04
mov eax, cr4
vmwrite ebx, eax
mov ebx, GUEST_CR4; 0x6804
vmwrite ebx, eax
mov ebx, HOST_CS_SEL ; 0x0c02
mov eax, cs
vmwrite ebx, eax
mov ebx,HOST_DS_SEL ; 0x0c06
mov eax, ds
vmwrite ebx, eax
mov ebx, HOST_SS_SEL ; 0x00000c04
mov eax, 0x18
vmwrite ebx, eax
mov ebx, HOST_TR_SEL; 0x00000c0c
mov eax, 0x18
vmwrite ebx, eax
mov ebx, GUEST_TR_SEL ;0x0000080e
mov eax, 0x18
vmwrite ebx, eax
mov ebx, GUEST_TR_ATTR ;0x00004822
mov eax, 0x8b
vmwrite ebx, eax
mov ebx, GUEST_TR_LIMIT ;0x0000480e
mov eax, 0xff
vmwrite ebx, eax
mov ebx, GUEST_LDTR_ATTR ;0x00004820
mov eax, 0x00010000
vmwrite ebx, eax
mov ebx, GUEST_SS_ATTR ;0x00004818
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_DS_ATTR ;0x0000481a
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_ES_ATTR ;0x00004814
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_FS_ATTR ;0x0000481c
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_GS_ATTR ;0x0000481e
mov eax, 0xc093
vmwrite ebx, eax
mov ebx, GUEST_SS_LIMIT ;0x00004804
mov eax, 0xffffffff
vmwrite ebx, eax
mov ebx, GUEST_DS_LIMIT ;0x00004806
vmwrite ebx, eax
mov ebx, GUEST_ES_LIMIT ;0x00004800
vmwrite ebx, eax
mov ebx, GUEST_FS_LIMIT ;0x00004808
vmwrite ebx, eax
mov ebx, GUEST_GS_LIMIT ;0x0000480a
vmwrite ebx, eax
mov ebx, LINK_PTR_FULL ;0x00002800
vmwrite ebx, eax
mov ebx, VMS_LINK_PTR_HIGH ;0x00002801
vmwrite ebx, eax
mov ebx, GUEST_GDTR_BASE ;0x00006816
mov eax, gdt32t
vmwrite ebx, eax
mov ebx, HOST_GDTR_BASE ;0x00006c0c
vmwrite ebx, eax
mov ebx, GUEST_CS_LIMIT ;0x00004802
mov eax, 0xffffffff
vmwrite ebx, eax
mov ebx, GUEST_CS_ATTR ;0x00004816
mov eax, 0xc09b
vmwrite ebx, eax
mov ebx, GUEST_RSP ;0x0000681c
mov eax, tos
vmwrite ebx, eax
mov ebx, GUEST_IDTR_BASE ;0x00006818
mov eax, idt32t
vmwrite ebx, eax
mov ebx, HOST_IDTR_BASE ;0x00006c0e
vmwrite ebx, eax
mov ebx, GUEST_CS_SEL ;0x00000802
mov eax, guest_sel
vmwrite ebx, eax
mov ebx, GUEST_CS_BASE ;0x00006808
mov eax, guest_base
vmwrite ebx, eax
mov ebx, GUEST_RIP ;0x0000681e
mov eax, 0
vmwrite ebx, eax
mov ebx, HOST_RIP ;0x00006c16
mov eax, after_vmexit
vmwrite ebx, eax
mov ebx, GUEST_RFLAGS ;0x00006820
mov eax, 2
vmwrite ebx, eax
mov ebx, EXCEPTION_BITMAP ;0x4004
mov eax,0xdeadfeef
vmwrite ebx, eax
ret

do_vmlaunch:
VMLAUNCH

after_vmexit:
;read EXIT_REASON and figure out what caused the vmexit.

///////////////////////////////////////////////////////////////

The HOST_RIP is where control is transferred after a vmexit. The hypervisor can determine the appropriate course of action by reading the vmexit fields from the vmcs.
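As a rough illustration of the first step at after_vmexit, the sketch below splits an EXIT_REASON value into its basic reason and the entry-failure flag. The function names are made up, and the assumptions are mine rather than the post's: that the basic reason sits in bits 15:0, that bit 31 flags a failed VM entry, and that the handful of reason numbers listed match the SDM.

```c
#include <stdint.h>
#include <stdbool.h>

/* A few basic exit reasons (assumed from the SDM, not from this post). */
enum {
    EXIT_EXCEPTION_OR_NMI  = 0,
    EXIT_EXTERNAL_INTR     = 1,
    EXIT_HLT               = 12,
    EXIT_INVLPG            = 14,
    EXIT_VMCALL            = 18
};

/* Bits 15:0 hold the basic exit reason. */
static uint16_t basic_exit_reason(uint32_t raw)
{
    return (uint16_t)(raw & 0xFFFFu);
}

/* Bit 31 set means the exit reports a failed VM entry. */
static bool is_entry_failure(uint32_t raw)
{
    return ((raw >> 31) & 1u) != 0;
}

static const char *exit_reason_name(uint32_t raw)
{
    switch (basic_exit_reason(raw)) {
    case EXIT_EXCEPTION_OR_NMI: return "exception or NMI";
    case EXIT_EXTERNAL_INTR:    return "external interrupt";
    case EXIT_HLT:              return "HLT";
    case EXIT_INVLPG:           return "INVLPG";
    case EXIT_VMCALL:           return "VMCALL";
    default:                    return "other";
    }
}
```

A real handler would dispatch on basic_exit_reason() and then read the exit qualification and interruption fields that apply to that reason.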

Wednesday, June 17, 2009

VMCS Guest State Area

The VMCS has an area for guest register state and guest non-register state. The guest register state contains fields for the control registers (CR0, CR3, CR4), the segment registers (cs, ds, etc.) with their selectors, attributes (also called ARBytes) and limits, the debug register (dr7), and so on.

The guest non-register state has the following fields:

(a) VMCS GUEST ACTIVITY STATE: This is a 32-bit field that describes the state of the guest. It holds an encoded value rather than a set of independent flags. The Intel manual defines four values; all others are reserved:
(i) 0 - The guest is in Active state.
(ii) 1 - The guest is in HLT state.
(iii) 2 - The guest is in Shutdown state.
(iv) 3 - The guest is in Wait-for-SIPI state.

Here is a scenario in which the vmcs might indicate a guest in halt state:

The guest executes HLT and the execution of HLT does not cause a vmexit(assume proc_based_ctl[hlt] is 0). At this point the guest is in HLT state. Now an interrupt arrives and the processing of the interrupt causes a vmexit(assume pin_based_ctl for intr is 1). The vmexit causes the processor to update the GUEST ACTIVITY STATE in the VMCS as HLT. A subsequent VMRESUME will read this field from the vmcs and launch the guest in HLT state.

(b) VMCS GUEST INTR INFO : The INTR INFO describes the interruptibility state of the guest. Four bits are defined in this field.
(i) Bit 0 - Guest has STI blocking active
(ii) Bit 1 - Guest has MOV-SS/POP-SS blocking active
(iii) Bit 2 - Blocking by SMI
(iv) Bit 3 - Blocking by NMI

Here is a scenario where the vmcs might indicate STI blocking:

Guest executes STI. Following STI, guest executes HLT, which vmexits (assume proc_based_ctl[hlt] is 1). The vmexit due to hlt will cause the processor to update the interruptibility-info with STI blocking. A subsequent VMRESUME will put the guest back in STI blocking state. In this case, during a vmresume, the processor also checks that EFLAGS.IF = 1 and will fail VMEntry if it detects the following condition:

if(sti_blocking==TRUE && EFLAGS.IF==0) {
fail_vmentry();
}
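The same condition can be written against the actual bit positions. This is a sketch with invented macro and function names; it uses the four interruptibility bits listed above and assumes IF is bit 9 of the flags image.

```c
#include <stdint.h>
#include <stdbool.h>

/* Interruptibility-state bits as listed in this post. */
#define INTR_STATE_STI     (1u << 0)
#define INTR_STATE_MOV_SS  (1u << 1)
#define INTR_STATE_SMI     (1u << 2)
#define INTR_STATE_NMI     (1u << 3)

#define RFLAGS_IF          (1ull << 9)

/* Returns false for the combination that fails VM entry:
 * STI blocking recorded while the guest flags have IF = 0. */
static bool sti_blocking_consistent(uint32_t interruptibility, uint64_t rflags)
{
    if ((interruptibility & INTR_STATE_STI) && !(rflags & RFLAGS_IF))
        return false;
    return true;
}
```

A hypervisor that edits GUEST_RFLAGS between exit and re-entry could run such a check before VMRESUME and clear the STI-blocking bit if it has also cleared IF.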

(c) Pending Debug Exceptions Field: This field in the VMCS indicates if there are any debug exceptions that are pending in the guest. The meaningful bits in this field are:

Bit 12 -> Enabled Breakpoint
Bit 14 -> Single Step
Bits 3-0 -> Correspond to B0,B1,B2,B3 (meaningful only if bit 12 is 1).

Here is a scenario in which the guest may end up with a single-step exception pending during a vmexit:

pushfd ; push flags
or dword [esp], 0x100 ; set eflags.TF
popfd ; pop flags now eflags.TF=1
mov ss, ax ; will delay recognition of single step until the end of the next instruction
vmcall ; cause vmexit

In the above code, at the time of vmexit , single-step is pending (it is delayed because of mov-ss blocking). This vmexit will cause the processor to record a single-step pending in the debug-exceptions field of the vmcs. Incidentally, the above vmexit will also cause the processor to record mov-ss blocking in the interruptibility field.
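After such an exit, the hypervisor could inspect the pending-debug-exceptions value as sketched below. The macro and function names are hypothetical; only the bit positions listed above are used.

```c
#include <stdint.h>
#include <stdbool.h>

#define PENDING_DBG_B_MASK      0xFull        /* bits 3:0, B0..B3 */
#define PENDING_DBG_ENABLED_BP  (1ull << 12)  /* enabled breakpoint */
#define PENDING_DBG_SINGLE_STEP (1ull << 14)  /* single step */

/* True if a single-step trap is waiting to be delivered. */
static bool single_step_pending(uint64_t pending_dbg)
{
    return (pending_dbg & PENDING_DBG_SINGLE_STEP) != 0;
}

/* Returns the B3..B0 bits when bit 12 is set, 0 otherwise
 * (the post treats them as meaningful only with bit 12 = 1). */
static unsigned breakpoint_hits(uint64_t pending_dbg)
{
    if (!(pending_dbg & PENDING_DBG_ENABLED_BP))
        return 0;
    return (unsigned)(pending_dbg & PENDING_DBG_B_MASK);
}
```

In the scenario above one would expect single_step_pending() to be true for the value read back, with bit 1 (mov-ss blocking) set in the interruptibility field.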

Wednesday, May 6, 2009

VMX EXECUTION_CONTROLS

Two types of execution controls are defined:
a) PIN_BASED execution controls
b) PROC_BASED execution controls



PIN_BASED Controls:
The vmcs encoding for this field is 0x4000. There are 2 bits in this 32-bit field that are interesting:
Bit 0 – External Interrupt Exiting
Bit 3 – NMI Exiting
After launching a vmx-guest, if an external interrupt arrives while the guest is running and bit 0 is 1, a vmexit occurs due to the external interrupt.
Bit 3 controls the behavior of the processor in response to an NMI while running as a vmx-guest. If bit 3 is 1 and an NMI is received in the guest, a vmexit occurs.
The other bits are reserved. The required settings of the reserved bits (0 or 1) are obtained by reading msr 0x481. To initialize this field in the vmcs:


mov ecx, 0x481 ; capability msr for the pin-based controls
rdmsr ; eax = bits that must be 1, edx = bits that may be 1
bts eax, 0 ; set bit0 to vmexit due to interrupts
bts eax, 3 ; nmi exiting bit = 1
and eax, edx ; drop any bit the processor does not allow to be 1
mov ebx, 0x4000 ; encoding for pin-based controls
vmwrite ebx, eax
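The same adjustment can be written as a small function that works for any of the control fields. This is a sketch under my reading of the capability MSRs (low 32 bits report controls that must be 1, high 32 bits report controls that may be 1); the function names and the sample MSR value in the usage note are invented.

```c
#include <stdint.h>
#include <stdbool.h>

/* Combine the bits the hypervisor wants with what the
 * capability msr (e.g. 0x481) requires and permits. */
static uint32_t adjust_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;          /* bits 31:0  */
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);  /* bits 63:32 */
    return (desired | must_be_one) & may_be_one;
}

/* True if every desired bit is allowed to be 1. */
static bool controls_supported(uint32_t desired, uint64_t cap_msr)
{
    uint32_t may_be_one = (uint32_t)(cap_msr >> 32);
    return (desired & ~may_be_one) == 0;
}
```

With a made-up capability value of 0x0000001F_00000016, asking for bits 0 and 3 gives 0x1F, the value the initialization code writes to PIN_CONTROLS. Checking controls_supported() first avoids silently losing a control that the hypervisor depends on.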



PROC_BASED Controls:
The vmcs encoding for this field is 0x4002. It is a 32 bit field that determines the behavior of the processor when certain instructions are executed in the vmx-guest.
For example, bit 7 of this field controls the processor behavior upon execution of the HLT instruction in a vmx-guest. If it is 1, execution of HLT will cause a vmexit. If it is 0, the instruction will be executed normally without any vmexit.


Similarly bit9 controls the behavior of the processor on INVLPG, bit19 controls the behavior on mov-to-cr8 and bit20 controls the behavior on mov-from-cr8 etc.

The bit positions are described below:

INTRWINDOW 2
TSCOFFSET 3
HLT 7
INVLPG 9
MWAIT 10
RDPMC 11
RDTSC 12
CR8LOAD 19
CR8STORE 20
TPRSHADOW 21
MOVDR 23
IOUNCOND 24
IOBITMAP 25
MSRBITMAP 28
MONITOR 29
PAUSE 30



MSR 0x482 indicates the allowed-0 and allowed-1 settings of these controls.

Note: Newer processors have additional bits defined for these controls. For more details see PRM Vol 3b.
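The bit positions above map naturally onto an enum. The sketch below (names invented for illustration) tests individual controls in a PROC_BASED value, using 0x0401E9F2, the value written to PROC_CONTROLS in the initialization code of the July 2009 post, as the sample.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions from the table above. */
enum proc_ctl_bit {
    PROC_INTRWINDOW = 2,
    PROC_TSCOFFSET  = 3,
    PROC_HLT        = 7,
    PROC_INVLPG     = 9,
    PROC_MWAIT      = 10,
    PROC_RDPMC      = 11,
    PROC_RDTSC      = 12,
    PROC_CR8LOAD    = 19,
    PROC_CR8STORE   = 20,
    PROC_TPRSHADOW  = 21,
    PROC_MOVDR      = 23,
    PROC_IOUNCOND   = 24,
    PROC_IOBITMAP   = 25,
    PROC_MSRBITMAP  = 28,
    PROC_MONITOR    = 29,
    PROC_PAUSE      = 30
};

/* True if the given control bit is set in a PROC_BASED value. */
static bool proc_ctl_enabled(uint32_t controls, enum proc_ctl_bit bit)
{
    return ((controls >> bit) & 1u) != 0;
}
```

For 0x0401E9F2 this reports HLT and RDPMC exiting as enabled and INVLPG and RDTSC exiting as disabled; the remaining set bits of that value fall on positions not listed in the table.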

Monday, May 4, 2009

VMX Exit Control fields

EXIT_CONTROLS:
This field is used by the processor during a vmexit. This is a 32-bit field (just like the entry controls) but only 2 bits are defined:
Bit 9 – Host address space size – This value is loaded into EFER.LME and CS.L on a vmexit.
Bit 15 – Acknowledge Interrupt on Exit – If there is a vmexit due to interrupt this bit determines whether the interrupt is acknowledged or not. The interrupt vector is recorded in the vmcs.
All other bits are reserved. They are either 0s or 1s as determined by the EXIT_CTLS_MSR (msr 0x483).


EXIT_CONTROL FOR MSR:
This is identical to ENTRY_CONTROL FOR MSR. The only difference is in the vmcs encodings. They are tabulated below:

EXIT_MSR_STORE_ADDR EQU 0x2006
EXIT_MSR_STORE_COUNT EQU 0x400E
The guest MSRs are saved in the MSR store area during a vmexit. On a subsequent VMEntry, these MSRs can be loaded back from the same area if the VMEntry MSR-load address points to it.


EXIT_MSR_LOAD_ADDR EQU 0x2008
EXIT_MSR_LOAD_COUNT EQU 0x4010
The host MSRs are loaded from the physical address specified in EXIT_MSR_LOAD_ADDR.
The format of the msr-load/msr-store areas is identical to that of the msr-load area used for vmentry.
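For reference, here is how one entry of such an area might be declared. The layout is not given in this post; the sketch assumes my reading of the SDM (16 bytes per entry: a 32-bit MSR index, 32 reserved bits, then the 64-bit MSR data), and the struct and function names are invented.

```c
#include <stdint.h>
#include <stddef.h>

/* One entry of an msr-load or msr-store area (assumed layout). */
struct vmx_msr_entry {
    uint32_t index;     /* MSR number */
    uint32_t reserved;  /* must be zero */
    uint64_t value;     /* MSR contents to load, or slot for the stored value */
};

/* Fill one entry; the area itself is an array of these, and the
 * matching *_COUNT field holds the number of entries. */
static void fill_msr_entry(struct vmx_msr_entry *e, uint32_t index,
                           uint64_t value)
{
    e->index    = index;
    e->reserved = 0;
    e->value    = value;
}
```

A hypervisor would allocate an array of these entries, write its physical address to EXIT_MSR_LOAD_ADDR (0x2008) or EXIT_MSR_STORE_ADDR (0x2006), and write the entry count to the corresponding count field.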