Fix building on Windows with presence of Triton #6749

Merged: 4 commits, Jan 8, 2025

woct0rdho
Contributor

This fixes two errors that occur when installing DeepSpeed on Windows with Triton present.

I think we can assume the NFS warning is not needed on Windows for now. I have not investigated how to detect an NFS path on Windows, but we could detect UNC paths, which start with \\, if needed.
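If UNC detection is ever wanted, it could look roughly like the sketch below. The helper name looks_like_unc_path is hypothetical and is not part of DeepSpeed or this PR. It is only a string-prefix heuristic: a UNC path indicates a network share (commonly SMB), not necessarily NFS, and the \\?\ and \\.\ prefixes are treated as local unless they use the \\?\UNC\ form.

```python
def looks_like_unc_path(path: str) -> bool:
    """Hypothetical helper: heuristically detect a Windows UNC (network share) path."""
    # Treat forward slashes like backslashes so //server/share is also recognized.
    normalized = path.replace("/", "\\")
    # \\?\UNC\server\share is the extended-length spelling of a UNC path.
    if normalized.upper().startswith("\\\\?\\UNC\\"):
        return True
    # \\?\C:\... and \\.\device are local namespaces, not network shares.
    if normalized.startswith("\\\\?\\") or normalized.startswith("\\\\.\\"):
        return False
    return normalized.startswith("\\\\")


if __name__ == "__main__":
    for candidate in (r"\\server\share\cache", r"C:\Users\me\.triton"):
        print(candidate, looks_like_unc_path(candidate))
```

The check is a pure string test, so it only makes sense to call it when running on Windows.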

os.rename does not allow overwriting an existing file on Windows, whereas os.replace overwrites the destination on every platform.
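A minimal sketch of the write-then-replace pattern, assuming a pickle cache file with a ".tmp" sibling as in the traceback later in this thread. The function names put_atomically and read_table and the sample table contents are illustrative and are not DeepSpeed's actual code.

```python
import os
import pickle
import tempfile


def put_atomically(file_path: str, table: dict) -> None:
    """Write the table to a temporary sibling file, then move it over file_path."""
    tmp_path = file_path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(table, handle)
    # os.replace overwrites an existing destination on both Windows and POSIX.
    # os.rename raises FileExistsError on Windows when file_path already exists.
    os.replace(tmp_path, file_path)


def read_table(file_path: str) -> dict:
    """Load a previously written table."""
    with open(file_path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        path = os.path.join(cache_dir, "example_kernel.pickle")
        put_atomically(path, {"block_size": 64})
        put_atomically(path, {"block_size": 128})  # second write overwrites the first
        print(read_table(path))
```

Note that os.replace can still fail on Windows (for example with PermissionError) if another process has the destination file open.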

@woct0rdho woct0rdho requested a review from tohtana as a code owner November 14, 2024 13:42
@loadams loadams self-requested a review November 14, 2024 17:20
@loadams
Contributor

loadams commented Nov 15, 2024

Hi @woct0rdho - thanks for your contribution. I'll review soon and repro/test on my side as well. But can you share more info about your setup, the error you hit, and whether things build after the fix?

@woct0rdho
Contributor Author

woct0rdho commented Nov 16, 2024

Hi @loadams, for the setup, I use Windows 11 23H2 build 22631.4460, MSVC and Windows SDK from VS Build Tools 2022 17.12.0, CUDA 12.5, and Python 3.10.10.

As a minimal example, I create a fresh venv (not using conda) and install pip 24.3.1, setuptools 75.5.0, torch 2.5.1+cu124, triton 3.1.0, and py-cpuinfo 9.0.0. Triton on Windows is currently not merged into the official repo and is a community-driven project. You can install my wheel with: pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.1.0-windows.post5/triton-3.1.0-cp310-cp310-win_amd64.whl

Then I clone the master branch of this repo and run build_win.bat in an administrator cmd prompt.

After some compiling, it shows:

Traceback (most recent call last):
  File "C:\DeepSpeed\setup.py", line 40, in <module>
    from op_builder import get_default_compute_capabilities, OpBuilder
  File "C:\DeepSpeed\op_builder\__init__.py", line 18, in <module>
    import deepspeed.ops.op_builder  # noqa: F401 # type: ignore
  File "C:\DeepSpeed\deepspeed\__init__.py", line 25, in <module>
    from . import ops
  File "C:\DeepSpeed\deepspeed\ops\__init__.py", line 11, in <module>
    from . import transformer
  File "C:\DeepSpeed\deepspeed\ops\transformer\__init__.py", line 7, in <module>
    from .inference.config import DeepSpeedInferenceConfig
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\__init__.py", line 7, in <module>
    from ....model_implementations.transformers.ds_transformer import DeepSpeedTransformerInference
  File "C:\DeepSpeed\deepspeed\model_implementations\__init__.py", line 6, in <module>
    from .transformers.ds_transformer import DeepSpeedTransformerInference
  File "C:\DeepSpeed\deepspeed\model_implementations\transformers\ds_transformer.py", line 18, in <module>
    from deepspeed.ops.transformer.inference.triton.mlp import TritonMLP
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\__init__.py", line 10, in <module>
    from .ops import *
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\ops.py", line 6, in <module>
    import deepspeed.ops.transformer.inference.triton.matmul_ext as matmul_ext
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 461, in <module>
    fp16_matmul = Fp16Matmul()
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 210, in __init__
    __class__._read_autotune_table()
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 444, in _read_autotune_table
    TritonMatmul._read_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 165, in _read_autotune_table
    cache_manager = AutotuneCacheManager(cache_key)
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 87, in __init__
    TritonCacheDir.warn_if_nfs(self.cache_dir)
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 44, in warn_if_nfs
    if is_nfs_path(cache_dir) and not TritonCacheDir._warning_printed:
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 27, in is_nfs_path
    output = subprocess.check_output(['df', '-T', path], encoding='utf-8')
  File "C:\Python310\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Python310\lib\subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Python310\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1440, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

This happens because the function is_nfs_path calls the df command, which does not exist on Windows.
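One way to guard such a check is sketched below. This is an illustration under stated assumptions, not necessarily the change made in this PR: the names df_output_reports_nfs and is_nfs_path_guarded are hypothetical, and the parsing assumes the GNU df -T layout in which the filesystem type is the second column of the first data row.

```python
import os
import shutil
import subprocess
import sys


def df_output_reports_nfs(df_output: str) -> bool:
    """Parse `df -T` output and report whether the filesystem type contains 'nfs'."""
    lines = df_output.strip().splitlines()
    if len(lines) < 2:  # the first line is the header row
        return False
    columns = lines[1].split()
    return len(columns) > 1 and "nfs" in columns[1].lower()


def is_nfs_path_guarded(path: str) -> bool:
    """Return False instead of raising when df cannot be used on this platform."""
    # Assumption: no NFS warning is needed on Windows, and a missing df means "unknown".
    if sys.platform == "win32" or shutil.which("df") is None:
        return False
    try:
        output = subprocess.check_output(
            ["df", "-T", os.path.abspath(path)],
            encoding="utf-8",
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, OSError):
        return False  # df failed or could not be started
    return df_output_reports_nfs(output)


if __name__ == "__main__":
    print(is_nfs_path_guarded(os.path.expanduser("~")))
```

Keeping the parsing in its own function makes it possible to test the logic without running df at all.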

After fixing this, I run build_win.bat again, and it shows:

Traceback (most recent call last):
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 477, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 454, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 183, in _update_autotune_table
    cache_manager.put(autotune_table)
  File "C:\DeepSpeed\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 102, in put
    os.rename(self.file_path + ".tmp", self.file_path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\wocto\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle.tmp' -> 'C:\\Users\\wocto\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle'

This happens because os.rename refuses to overwrite an existing destination file on Windows.

With both errors fixed in this PR, the wheel builds successfully.

@loadams loadams enabled auto-merge January 8, 2025 17:31
@loadams
Contributor

loadams commented Jan 8, 2025

Thanks @woct0rdho, and apologies for the delayed review. We will get this merged.

@loadams loadams added this pull request to the merge queue Jan 8, 2025
Merged via the queue into microsoft:master with commit b62c84d Jan 8, 2025
11 checks passed