Why Intel GPU Can't Run LLM Inference in WSL2: A Layer-by-Layer Diagnosis

In the previous post, Shimmy failed to run on WSL2 with Intel integrated graphics. This post digs into why. The fault lies with neither Shimmy nor llama.cpp: it is a protocol-level incompatibility between WSL2’s DXG kernel module and the Windows Intel GPU driver.

Part 2 of the “Local LLM Inference Environment” series.

Core Finding

On this setup, Intel integrated graphics cannot be used for LLM inference in WSL2. The problem is not in IGC or the OpenCL driver: WSL2’s DXG kernel module fails to communicate with the Windows driver for the Intel Arc 140T.

WSL2 GPU Stack

Four layers from hardware to Python:

Python (llama-cpp-python, OpenCL)
         ↓
[3] IGC — Intel Graphics Compiler
         ↓
[2] OpenCL ICD (intel-opencl-icd)
         ↓
[1] WSL2 DXG kernel module (/dev/dxg)
         ↓
[0] Windows Intel GPU driver
         ↓
     Physical GPU (Intel Arc 140T / Arrow Lake H)

Per-Layer Diagnosis

Layer	Component	Status	Evidence
[0]	Windows driver (32.0.101.6554)	⚠️ Outdated	2025-01-15, Intel shipped many Arrow Lake updates since
[1]	WSL2 DXG (kernel 6.6.87.2)	🔴 Massive ioctl failures	dmesg errors from 5 seconds after boot
[2]	OpenCL ICD (23.43.27642.40)	✅ Enumerates GPU	`clGetDeviceIDs` returns Intel Graphics
[3]	IGC (1.0.15468.25)	✅ Library intact	Compilation OK, but kernel submission goes through DXG → crash

Layer 1 is the real failure point.
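
As a first, very coarse check of the lower layers, the two device nodes mentioned in this post can be probed from inside WSL2. This is a minimal sketch, not a health check: the function names are hypothetical, and a present `/dev/dxg` only shows that the DXG module is loaded, not that its ioctls succeed (on this machine the node exists and the calls still fail).

```python
import os

# Device nodes named in this post. /dev/dxg is the WSL2 DXG interface;
# /dev/dri is what native Mesa drivers expect to find.
DEVICE_NODES = {
    "/dev/dxg": "WSL2 DXG kernel module (layer 1)",
    "/dev/dri": "native DRM nodes used by Mesa",
}


def probe_device_nodes(nodes=DEVICE_NODES, exists=os.path.exists):
    """Return {path: True/False} depending on whether each node exists."""
    return {path: bool(exists(path)) for path in nodes}


def describe(results, nodes=DEVICE_NODES):
    """Format the probe results as one human-readable line per node."""
    lines = []
    for path, present in results.items():
        state = "present" if present else "missing"
        lines.append(f"{path:<10} {state:<8} {nodes[path]}")
    return lines


for line in describe(probe_device_nodes()):
    print(line)
```

The `exists` parameter is only there so the probe can be exercised without touching the real filesystem.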

dmesg Evidence

dmesg is flooded with DXG ioctl errors from boot — not triggered by running llama.cpp:

[    5.459659] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22  (EINVAL)
[    5.460357] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -2   (ENOENT)
[    5.460880] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22
...
[78509.821263] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22
[78509.825964] misc dxg: dxgkio_reserve_gpu_va:     Ioctl failed: -75  (EOVERFLOW)

errno	Meaning
-2 (ENOENT)	No such file or directory; here, the requested adapter was not found
-22 (EINVAL)	Invalid argument; the DXG module sends parameters the driver rejects
-75 (EOVERFLOW)	Value too large for defined data type; here, the GPU virtual address reservation fails

WSL2’s dxg module and the Windows Intel GPU driver are incompatible at the protocol level.
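
To see how widespread the failures are, the dmesg lines can be tallied by function and errno. The sketch below assumes the log lines follow the `misc dxg: <function>: Ioctl failed: -<errno>` shape shown above; the regular expression and function name are illustrative, and errno names come from Python's `errno` module on the Linux side.

```python
import errno
import re
from collections import Counter

# Matches lines such as:
# [    5.459659] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22
DXG_FAILURE = re.compile(r"misc dxg: (\w+):\s*Ioctl failed: -(\d+)")


def summarize_dxg_failures(lines):
    """Count DXG ioctl failures by (function name, errno name)."""
    counts = Counter()
    for line in lines:
        match = DXG_FAILURE.search(line)
        if match:
            func, code = match.group(1), int(match.group(2))
            # Fall back to the raw number if the platform has no name for it.
            name = errno.errorcode.get(code, f"errno {code}")
            counts[(func, name)] += 1
    return counts


# Sample lines copied from the dmesg excerpt in this post.
sample = [
    "[    5.459659] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22",
    "[    5.460357] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -2",
    "[    5.460880] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22",
    "[78509.825964] misc dxg: dxgkio_reserve_gpu_va:     Ioctl failed: -75",
]

for (func, name), count in summarize_dxg_failures(sample).most_common():
    print(f"{count:>4}  {func}  {name}")
```

On a real system the input would be the output of `dmesg` split into lines, which may require elevated privileges depending on kernel settings.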

Root Cause

The Intel Arc 140T is the integrated GPU of Arrow Lake H, which reached laptops in early 2025. Windows driver 32.0.101.6554 (2025-01-15) is an early Arrow Lake driver, and its WSL2 DXG support appears to be incomplete. When the DXG module sends requests, the driver either does not understand them or mishandles them, and returns EINVAL or EOVERFLOW.

Debugging Reflection

The first time you see IGC segfault, intuition says “IGC is the problem.” So you check IGC versions, look for known bugs, search Reddit — IGC is the latest, no known issues. Shift direction. Test Vulkan (WSL2 doesn’t support native Vulkan passthrough). Test DirectML (Windows-only, unusable in WSL2). Test Shimmy (same llama.cpp backend — completely unrelated variable).

Four hours went into tracing upward and sideways. Tracing downward takes one command, `dmesg`, and five minutes.

Crash log formatting naturally draws your attention to the crashing layer. When the log says IGC: Internal Compiler Error, IGC’s name is in the message and your attention gets pinned there. But when a component crashes while calling into an external layer, the root cause is almost always in the layer being called. Only when the failure happens in self-contained computation is the crashing component itself the likely culprit.
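
The heuristic can be written down as a tiny routine: run the checks from the lowest layer upward and stop at the first one that fails. This is a sketch with hypothetical names; the lambdas stand in for real checks and simply mirror the outcome described in this post.

```python
def first_failing_layer(checks):
    """Return the name of the first failing check, or None if all pass.

    `checks` is a list of (name, callable) pairs ordered from the lowest
    layer upward. A check fails if it returns a falsy value or raises.
    """
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            return name
    return None


# Placeholder checks mirroring the per-layer diagnosis in this post.
checks = [
    ("[1] WSL2 DXG", lambda: False),    # dmesg shows ioctl failures
    ("[2] OpenCL ICD", lambda: True),   # GPU enumerates
    ("[3] IGC", lambda: True),          # library intact
]

print(first_failing_layer(checks))
```

Ordering the list bottom-up is the whole point: the layer named in the crash message is checked last, not first.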

Five Approaches Tested

Approach	GPU Recognition	Inference	Rating	Notes
Vulkan (WSL2)	❌ Only llvmpipe	CPU fallback	⭐	Mesa needs native `/dev/dri`
OpenCL (WSL2)	✅ Enumerates OK	❌ DXG ioctl crash	⭐	Retry after driver update
SYCL (WSL2)	❌ Confirmed dead	Needs ABI patch	⭐	Community has given up
Windows Ollama	✅	✅ Works out of box	⭐⭐⭐	Only option verified on this machine
IPEX-LLM (Win)	✅	✅ 17-18 t/s	⭐⭐⭐	Verified on Lunar Lake

Details

Vulkan (WSL2): No /dev/dri device, Mesa Intel Vulkan ICD can’t initialize. GPU count = 0.

OpenCL (WSL2): GPU recognized (Intel(R) Graphics [0x7d51]), but IGC crashes on kernel submission — root cause is DXG layer.

SYCL (WSL2): Reddit r/IntelArc users confirmed (2026-04) that WSL2 can’t detect an Intel GPU via the standard SYCL runtime. Community consensus: use Windows native.

Windows Ollama: Native Intel GPU support, WSL2 calls via HTTP API. Verified working, zero config.

IPEX-LLM (Windows native): Replacing IPEX-LLM’s bundled Ollama binary achieved 17-18 tokens/s on Lunar Lake Arc 140V with qwen3:8b at 100% GPU utilization.
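
For the Windows Ollama route, calling the server from WSL2 can look roughly like the sketch below, which uses Ollama's `/api/generate` endpoint with streaming disabled. The host address and model name are assumptions: `localhost:11434` is Ollama's default port, but whether `localhost` inside WSL2 reaches the Windows host depends on the WSL networking mode, and `qwen3:8b` is only a placeholder for whatever model has been pulled.

```python
import json
import urllib.request


def build_generate_request(prompt, model="qwen3:8b", host="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text.

    Requires a reachable Ollama server; raises urllib.error.URLError otherwise.
    """
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

A call such as `generate("Say hello in one sentence.")` returns the model's reply as a string. If the connection is refused, the `host` argument is the first thing to adjust.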

Next Steps

Priority	Action
1	🔑 Update Windows Intel GPU driver (6554 → 8801)
2	Update WSL2: `wsl --update`
3	Return to stable Windows (exit Insider Preview Canary)
4	Try Level Zero backend (`sudo apt install intel-level-zero-gpu`)