Build Intel® Extension for PyTorch from source

This is a record of how to build Intel® Extension for PyTorch (IPEX) from source. After three days of struggling, it is finally done, and in memory of that I am writing the steps down here. Most steps follow the official document, but there are some hidden pitfalls it does not mention. I recommend reading the official document thoroughly first, since some pitfalls are only warned about in later steps. You can find a TL;DR version in the final section as a supplement to the official document.

My compilation environment is listed as follows:

Term    | Environment
--------|------------------------------------------------------------
OS      | Windows 11 23H2 64-bit
CPU     | 13th Gen Intel(R) Core(TM) i7-1360P @ 2.611GHz
GPU     | Intel(R) Iris(R) Xe Graphics
Python  | 3.11.12 (Intel® Distribution for Python) in a conda environment
GCC     | 11.2.0 (x86_64-posix-sjlj-rev1, built by MinGW-W64 project)

System Requirements

First, follow the instructions to install GPU drivers, Intel® Deep Learning Essentials/Intel® oneAPI Base Toolkit, Python and the compiler.

  • Make sure the versions of your oneAPI tools match those given in the document, to avoid version mismatches or dependency issues. For example, if the document uses 2025.0, the first two version numbers of your oneAPI tools should be 2025.0 as well; the last one may differ, e.g. 2025.0.2 and 2025.0.4 are both acceptable.
  • Note that the Python version you build with is exactly the version the result is built for. So plan ahead, or you will need to rebuild the whole thing again.
  • Use Python in a conda environment.
  • GCC 11 is recommended, since higher versions such as GCC 14 may fail to compile. (A quick check of the tool versions is sketched after this list.)
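
As a minimal sanity check before starting (assuming the compiler, Python, and oneAPI tools are already reachable from the terminal you will build in), you can confirm the tools are present and print their versions:

gcc --version
python --version
rem icx is the oneAPI DPC++/C++ compiler driver on Windows; this only confirms it is on PATH
where icx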

Build Process

Open a cmd terminal and set up the oneAPI, conda, and proxy environment variables (the proxy is needed because there are git operations during setup). For example, I use the init.bat file below to do this automatically.

@echo off
rem Replace the path with your own oneAPI installation path
call "D:\\Intel\\oneAPI\\setvars.bat"
rem Replace the env name with your own conda env name
conda activate ipex
rem Copied from Clash for Windows
set "http_proxy=http://127.0.0.1:1169"
set "https_proxy=http://127.0.0.1:1169"
echo "setup oneAPI vars, conda env-ipex, and proxy"

rem [OPTIONAL SETTINGS]
rem If your hardware is limited, a stack overflow may happen while compiling PyTorch,
rem because the default stack size is small and the default parallel build uses too many threads.
rem You can set the following env vars to limit the number of threads and memory usage.
set MAX_JOBS=2
set CL=/Zm800
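
After running init.bat, I do a quick check in the same cmd session to confirm the environment actually took effect (just a sketch of what to look for, not part of the official flow):

rem Should resolve to a path inside the oneAPI installation
where icx
rem Should print the activated conda env name (ipex in my case)
echo %CONDA_DEFAULT_ENV%
rem Should print the proxy address set above
echo %http_proxy%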

Remember to run the following command to install the required dependencies before building (they are not handled properly by the official compile_bundle.bat).

conda install -y -c conda-forge libjpeg-turbo libpng json5
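
If you want to double-check that the packages actually landed in the active environment, conda list filters by package name:

conda list libjpeg-turbo
conda list libpng
conda list json5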

Then call compile_bundle.bat (download it from GitHub; see the official document for reference) with the following command, and just wait.

rem compile_bundle.bat <DPCPP_ROOT> <ONEMKL_ROOT> <AOT>
compile_bundle.bat "D:\Intel\oneAPI\compiler\latest" "D:\Intel\oneAPI\mkl\latest" "xe-lpg,mtl-u"
  • Here I specify DPCPP_ROOT and ONEMKL_ROOT explicitly to avoid unnecessary errors.
  • The valid AOT targets can be found in the official document and in the csrc/gpu/aten/operators/xetla/kernels/CMakeLists.txt file in IPEX’s GitHub repository. Please check your hardware type carefully; you can use sycl-ls --verbose in a oneAPI-enabled terminal to do so (see the sketch after the regex listing below).
  • The AOT target regexes in the CMakeLists.txt file mentioned above (latest commit as of 2025-04-23) are as follows:
set(AOT_REGEX_XE_HPC "(xe-hpc.*|pvc|bmg|bmg-.*|lnl-.*|xe2-.*)")
set(AOT_REGEX_XE_HPG "(xe-hpg.*|ats-m.*|acm-.*|dg2|dg2-.*|arl-h)")
set(AOT_REGEX_XE_LPG "(xe-lpg.*|mtl|mtl-.*|arl-u|arl-s|0x7d55|0x7dd5|0x7d57|0x7dd7)")
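
The hex values in AOT_REGEX_XE_LPG are PCI device IDs. Besides sycl-ls --verbose, one way to read your GPU’s device ID from the same cmd session is sketched below (standard Windows tooling; the DEV_xxxx part of PNPDeviceID is the hex ID to compare against the regexes):

rem List SYCL devices (names and architectures), as mentioned above
sycl-ls --verbose
rem Print the GPU name and PNP device ID; compare the DEV_xxxx part with the hex IDs in the regexes
powershell -Command "Get-CimInstance Win32_VideoController | Select-Object Name, PNPDeviceID"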

The compile process is quite time- and storage-consuming. The main parts are PyTorch and IPEX: compiling PyTorch requires about 30~40 GB of storage (build cache included) and about 6~7 hours, and compiling IPEX requires about 10 GB of storage and about 4 hours. So make sure you are fully prepared in advance.
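
Once the build finishes, a quick sanity check in the same environment (adapted from the verification step in the official document; the exact command there may differ) is to import the freshly built packages and count the XPU devices. A non-zero device count means the XPU backend sees your GPU.

python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__); print(torch.xpu.device_count())"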

Summary of Encountered Problems

Here I record the major problems I encountered during the build process. Their fixes are already integrated into the steps above, but I list them here again as a reference.

  • Used GCC 14 instead of GCC 11 at first, and it failed every time (also mentioned in IPEX issue #790).
  • Used oneAPI 2025.1 instead of 2025.0.
  • Used the oneAPI command prompt, only to find the instruction “Use individual component-specific activation scripts to activate required components listed below one-by-one” later in the document.
  • Forgot to set up the proxy or install dependencies after creating a brand-new conda environment. (That is why I wrote init.bat and modified compile_bundle.bat to do this automatically.)
  • Did not set DPCPP_ROOT and ONEMKL_ROOT explicitly and relied on ONEAPI_ROOT instead, which may cause some errors.
  • Set unsupported AOT targets, such as rpl-p (listed in the official document yet absent from the CMakeLists.txt) or xe_lpg (a spelling error; it should be xe-lpg).
  • Not enough disk space.
  • Stack overflow during compilation. (Set MAX_JOBS to limit ninja’s parallel jobs and CL to expand the stack size. The former is more important: at first I only set CL to /Zm800 and it still overflowed. Even worse, I tried twice before realizing this.)
  • Ran part of compile_bundle.bat manually but commented out the wrong lines (more specifically, I commented out some necessary lines while forgetting to comment out some useless ones).
This post is licensed under CC BY-NC-SA 4.0 by the author.