In the previous post, Shimmy failed to run on WSL2 with Intel integrated graphics. This post digs into why — it’s not Shimmy’s fault, not llama.cpp’s fault, but a fundamental incompatibility between WSL2’s DXG kernel module and the Intel GPU driver at the protocol level.

Part 2 of the “Local LLM Inference Environment” series.

Core Finding

Intel integrated graphics cannot be used for LLM inference in WSL2. The issue isn’t in IGC or OpenCL drivers — it’s WSL2’s DXG kernel module failing to communicate with the Intel Arc 140T Windows driver.

WSL2 GPU Stack

Four layers from hardware to Python:

Python (llama-cpp-python, OpenCL)
[3] IGC — Intel Graphics Compiler
[2] OpenCL ICD (intel-opencl-icd)
[1] WSL2 DXG kernel module (/dev/dxg)
[0] Windows Intel GPU driver
     Physical GPU (Intel Arc 140T / Arrow Lake H)

Per-Layer Diagnosis

LayerComponentStatusEvidence
[0]Windows driver (32.0.101.6554)⚠️ Outdated2025-01-15, Intel shipped many Arrow Lake updates since
[1]WSL2 DXG (kernel 6.6.87.2)🔴 Massive ioctl failuresdmesg errors from 5 seconds after boot
[2]OpenCL ICD (23.43.27642.40)✅ Enumerates GPUclGetDeviceIDs returns Intel Graphics
[3]IGC (1.0.15468.25)✅ Library intactCompilation OK, but kernel submission goes through DXG → crash

Layer 1 is the real failure point.

dmesg Evidence

dmesg is flooded with DXG ioctl errors from boot — not triggered by running llama.cpp:

[    5.459659] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22  (EINVAL)
[    5.460357] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -2   (ENOENT)
[    5.460880] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22
...
[78509.821263] misc dxg: dxgkio_query_adapter_info: Ioctl failed: -22
[78509.825964] misc dxg: dxgkio_reserve_gpu_va:     Ioctl failed: -75  (EOVERFLOW)
errnoMeaning
-2 (ENOENT)Adapter not found
-22 (EINVAL)Invalid parameter — WSL2 DXG sends params the driver doesn’t understand
-75 (EOVERFLOW)GPU virtual address reservation overflow

WSL2’s dxg module and the Windows Intel GPU driver are incompatible at the protocol level.

Root Cause

Intel Arc 140T is Arrow Lake H architecture, released late 2024. Windows driver 32.0.101.6554 (2025-01-15) is an early Arrow Lake driver with incomplete WSL2 DXG support. When DXG sends requests, the driver doesn’t understand them or has bugs, returning EINVAL/EOVERFLOW.

Debugging Reflection

The first time you see IGC segfault, intuition says “IGC is the problem.” So you check IGC versions, look for known bugs, search Reddit — IGC is the latest, no known issues. Shift direction. Test Vulkan (WSL2 doesn’t support native Vulkan passthrough). Test DirectML (Windows-only, unusable in WSL2). Test Shimmy (same llama.cpp backend — completely unrelated variable).

Four hours went into tracing upward and sideways. Downward takes one command, five minutes.

Crash log formatting naturally draws your attention to the crashing layer. IGC: Internal Compiler Error — IGC’s name is in the error message, your attention gets pinned there. But when any component crashes while calling into an external layer, the root cause is almost always in the layer being called. Self-contained computation failures are your own problem.

Five Approaches Tested

ApproachGPU RecognitionInferenceRatingNotes
Vulkan (WSL2)❌ Only llvmpipeCPU fallbackMesa needs native /dev/dri
OpenCL (WSL2)✅ Enumerates OK❌ DXG ioctl crashRetry after driver update
SYCL (WSL2)❌ Confirmed deadNeeds ABI patchCommunity has given up
Windows Ollama✅ Works out of box⭐⭐⭐Only viable option now
IPEX-LLM (Win)✅ 17-18 t/s⭐⭐⭐Verified on Lunar Lake

Details

Vulkan (WSL2): No /dev/dri device, Mesa Intel Vulkan ICD can’t initialize. GPU count = 0.

OpenCL (WSL2): GPU recognized (Intel(R) Graphics [0x7d51]), but IGC crashes on kernel submission — root cause is DXG layer.

SYCL (WSL2): Reddit r/IntelArc users confirmed (2026-04) that WSL2 can’t detect Intel GPU via standard SYCL runtime. Community consensus: use Windows native.

Windows Ollama: Native Intel GPU support, WSL2 calls via HTTP API. Verified working, zero config.

IPEX-LLM (Windows native): Replacing IPEX-LLM’s bundled Ollama binary achieved 17-18 tokens/s on Lunar Lake Arc 140V with qwen3:8b at 100% GPU utilization.

Next Steps

PriorityAction
1🔑 Update Windows Intel GPU driver (6554 → 8801)
2Update WSL2: wsl --update
3Return to stable Windows (exit Insider Preview Canary)
4Try Level Zero backend (sudo apt install intel-level-zero-gpu)

References