sudo apt-get install packages for me, and so forth. Setting up projects took days. Since then, the NLP community has converged around the more unified transformers ecosystem, and the broader AI community is now mostly PyTorch-centric and Docker-free.
These are great steps forward, but we are still a long way from "frustration-free packaging". Even recently, I tried cloning someone's project and found myself unsuccessfully debugging their (now out-of-date) conda installation for several hours. Eventually, I relented and rewrote the installation using the methods that I will describe in this post. I was successful after just 10 more minutes. By using these approaches, I've also encountered far fewer issues in my own projects over the last two years.
In general: use a separate virtual environment for every project, stored inside the project directory ([my_project]/.venv), which is easier to maintain than a faraway ~/envs/[my_project].
(a) Don't rely on sudo or software already installed on your server. Your machine may have python3.12 and CUDA 12.3; other machines may not. List these in your project dependencies and rely entirely on the virtual environment instead. (Refer to the appendices later for instructions about including non-Python dependencies, like CUDA.)
(b) Declare your dependencies in a pyproject.toml. This is the current standard for Python project metadata (so don't use requirements.txt, setup.py, or setup.cfg). To start, list all dependencies that you plan to directly import (+ anything else not implicitly covered) and pin their exact versions (e.g. torch==2.5.1). These are the dependencies you are guaranteeing your codebase will work with.
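Note that each of these direct dependencies pulls in transitive dependencies of its own. Once the project is set up with uv (introduced below), uv tree prints the fully resolved dependency tree; for the pins above, it looks something like: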
my-project v0.1.0
├── numpy v2.2.2
├── torch v2.5.1
│ ├── filelock v3.16.1
│ ├── fsspec v2024.12.0
│ ├── jinja2 v3.1.5
│ │ └── markupsafe v3.0.2
│ ├── networkx v3.4.2
│ ├── nvidia-cublas-cu12 v12.4.5.8
│ ├── nvidia-cuda-cupti-cu12 v12.4.127
│ ├── nvidia-cuda-nvrtc-cu12 v12.4.127
│ ├── nvidia-cuda-runtime-cu12 v12.4.127
│ ├── nvidia-cudnn-cu12 v9.1.0.70
│ │ └── nvidia-cublas-cu12 v12.4.5.8
│ ├── nvidia-cufft-cu12 v11.2.1.3
│ │ └── nvidia-nvjitlink-cu12 v12.4.127
│ ├── nvidia-curand-cu12 v10.3.5.147
│ ├── nvidia-cusolver-cu12 v11.6.1.9
│ │ ├── nvidia-cublas-cu12 v12.4.5.8
│ │ ├── nvidia-cusparse-cu12 v12.3.1.170
│ │ │ └── nvidia-nvjitlink-cu12 v12.4.127
│ │ └── nvidia-nvjitlink-cu12 v12.4.127
│ ├── nvidia-cusparse-cu12 v12.3.1.170 (*)
│ ├── nvidia-nccl-cu12 v2.21.5
│ ├── nvidia-nvjitlink-cu12 v12.4.127
│ ├── nvidia-nvtx-cu12 v12.4.127
│ ├── setuptools v75.8.0
│ ├── sympy v1.13.1
│ │ └── mpmath v1.3.0
│ ├── triton v3.1.0
│ │ └── filelock v3.16.1
│ └── typing-extensions v4.12.2
└── torchvision v0.20.1
├── numpy v2.2.2
├── pillow v11.1.0
└── torch v2.5.1 (*)
(*) Package tree already displayed
(c) Keep project-specific environment variables in a .env file.
Modern tools like uv (as opposed to de facto tools like pip) are actually more holistic, like project managers, and can automate the above steps for us. I'll just discuss a small set of features. One thing you'll notice is that uv is "10-100x faster than pip". uv is also becoming extremely popular.
Install uv (and restart your shell):
curl -LsSf https://astral.sh/uv/install.sh | sh
uv init my-project --package --python "3.12"
cd my-project
uv add "torch==2.5.1" "torchvision==0.20.1" "numpy==2.2.2" # or "uv sync" if not adding deps
Also create a .env file: it's empty for now, but you can add any useful environment variables here, e.g. export TORCH_HOME=.cache/torch. We will load these every time we activate our project environment.
touch .env
Our package code lives in src/my_project, and Python files that you plan to run from the terminal go in scripts:
mkdir scripts
You can then import my_project into your scripts. For example, we can create a scripts/hello_world.py that calls main from src/my_project/__init__.py. We'll run this later.
import my_project
if __name__ == "__main__":
my_project.main()
my-project
├── .env
├── .git
├── .gitignore
├── pyproject.toml
├── .python-version
├── README.md
├── scripts
│ └── hello_world.py
├── src
│ └── my_project
│ └── __init__.py
├── uv.lock
└── .venv
src/my_project/__init__.py
def main() -> None:
print("Hello from my-project!")
.python-version
3.12
pyproject.toml
[project]
name = "my-project"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
authors = [ ... ]
requires-python = ">=3.12"
dependencies = [
"numpy==2.2.2",
"torch==2.5.1",
"torchvision==0.20.1",
]
[project.scripts]
my-project = "my_project:main"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
Lockfile (uv.lock)
uv resolves every transitive dependency and records its exact version in uv.lock. Commit this file: it is what lets others reproduce your environment precisely.
Virtual Environment
.venv
├── bin
│ ├── activate
│ ├── activate.bat
│ ├── activate.csh
│ ├── activate.fish
│ ├── activate.nu
│ ├── activate.ps1
│ ├── activate_this.py
│ ├── convert-caffe2-to-onnx
│ ├── convert-onnx-to-caffe2
│ ├── deactivate.bat
│ ├── f2py
│ ├── isympy
│ ├── numpy-config
│ ├── proton
│ ├── proton-viewer
│ ├── pydoc.bat
│ ├── python -> ~/.local/share/uv/python/cpython-3.12.7-linux-x86_64-gnu/bin/python3.12
│ ├── python3 -> python
│ ├── python3.12 -> python
│ ├── torchfrtrace
│ └── torchrun
├── CACHEDIR.TAG
├── lib
│ └── python3.12
│ └── site-packages
│ ├── _distutils_hack
│ ├── distutils-precedence.pth
│ ├── filelock
│ ├── filelock-3.16.1.dist-info
│ ├── fsspec
│ ├── fsspec-2024.12.0.dist-info
│ ├── functorch
│ ├── isympy.py
│ ├── jinja2
│ ├── jinja2-3.1.5.dist-info
│ ├── markupsafe
│ ├── MarkupSafe-3.0.2.dist-info
│ ├── mpmath
│ ├── mpmath-1.3.0.dist-info
│ ├── networkx
│ ├── networkx-3.4.2.dist-info
│ ├── numpy
│ ├── numpy-2.2.2.dist-info
│ ├── numpy.libs
│ ├── nvidia
│ ├── nvidia_cublas_cu12-12.4.5.8.dist-info
│ ├── nvidia_cuda_cupti_cu12-12.4.127.dist-info
│ ├── nvidia_cuda_nvrtc_cu12-12.4.127.dist-info
│ ├── nvidia_cuda_runtime_cu12-12.4.127.dist-info
│ ├── nvidia_cudnn_cu12-9.1.0.70.dist-info
│ ├── nvidia_cufft_cu12-11.2.1.3.dist-info
│ ├── nvidia_curand_cu12-10.3.5.147.dist-info
│ ├── nvidia_cusolver_cu12-11.6.1.9.dist-info
│ ├── nvidia_cusparse_cu12-12.3.1.170.dist-info
│ ├── nvidia_nccl_cu12-2.21.5.dist-info
│ ├── nvidia_nvjitlink_cu12-12.4.127.dist-info
│ ├── nvidia_nvtx_cu12-12.4.127.dist-info
│ ├── PIL
│ ├── pillow-11.1.0.dist-info
│ ├── pillow.libs
│ ├── pkg_resources
│ ├── setuptools
│ ├── setuptools-75.8.0.dist-info
│ ├── sympy
│ ├── sympy-1.13.1.dist-info
│ ├── torch
│ ├── torch-2.5.1.dist-info
│ ├── torchgen
│ ├── torchvision
│ ├── torchvision-0.20.1.dist-info
│ ├── torchvision.libs
│ ├── triton
│ ├── triton-3.1.0.dist-info
│ ├── typing_extensions-4.12.2.dist-info
│ ├── typing_extensions.py
│ ├── _virtualenv.pth
│ └── _virtualenv.py
├── lib64 -> lib
├── pyvenv.cfg
└── share
└── man
└── man1
└── isympy.1
How should you pick versions? I think it's usually safe to pin dependencies to their latest available version. And, you should choose a new-ish Python version supported by all your dependencies. For example, the latest PyTorch (torch==2.5.1) supports up to Python 3.12 (but not the very latest 3.13). So I usually go with --python "3.12".
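As an aside, if you later want to switch Python versions, uv can re-pin the project interpreter (this rewrites the .python-version file from earlier):
uv python pin 3.12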
Now, activate the environment:
source .venv/bin/activate && source .env
You need to run this in every new terminal. The python command will then exclusively refer to our virtual environment. For example:
python -c "import torch; print(torch.__file__)"
# [...] .venv/lib/python3.12/site-packages/torch/__init__.py
And try running our script:
python scripts/hello_world.py
# Hello from my-project!
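As another aside: uv run can execute commands inside the project environment without activating it first. uv also accepts an --env-file flag, so a rough equivalent of the activate-then-run workflow (treat this exact invocation as a sketch, not a prescription) is:
uv run --env-file .env python scripts/hello_world.py
# Hello from my-project!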
Finally, you can use uv add or uv remove to update dependencies. Or edit pyproject.toml and run uv sync.
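For example (tqdm here is purely illustrative, not a dependency this project needs):
uv add "tqdm==4.67.1"   # pins tqdm in pyproject.toml and updates uv.lock
uv remove tqdm          # drops the pin and re-syncs the environment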
Most packages that you pip install are pre-built and packaged as "wheels". These are usually easy to download and extract. Packages without wheels, however, must be compiled on your machine at install time, which can fail in environment-specific ways. For example, I once wanted easynmt for machine translation. It depends on fasttext, which builds from C++ code at install time. The build failed on my system, because my system's compiler was too old to support C++17. Instead, I installed the pre-built fasttext-wheel package (uv add easynmt fasttext-wheel) while excluding fasttext via my pyproject.toml:
[tool.uv]
override-dependencies = [
"fasttext ; sys_platform == 'never'",
]
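(The environment marker sys_platform == 'never' can never be true, so this override tells uv to skip installing fasttext everywhere; the pre-built fasttext-wheel package provides the same fasttext module and satisfies easynmt at runtime.)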
Some packages are even trickier. flash-attn is a good example: its standard installation (uv pip install flash-attn --no-build-isolation) is supposed to install the corresponding pre-built wheel (if one exists) or build the package otherwise. However, if the CUDA toolkit is not installed, the setup fails even when building is not necessary. This is not ideal, because installing the pre-built wheel does not actually need this toolkit (and neither does PyTorch). Instead, you can determine the CUDA version of your PyTorch installation (torch.version.cuda):
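For example, this one-liner prints the CUDA version that your torch build ships with:
python -c "import torch; print(torch.version.cuda)"
# e.g. 12.4
Then pick the matching pre-built wheel from the flash-attention GitHub releases and simply run: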
# observe keywords in the file name: cu12 ... torch2.5 ... cp312
uv add "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abiFALSE-cp312-cp312-linux_x86_64.whl"
Now, say a colleague also has uv and wants to use your codebase. They'll simply clone your repo and run:
uv sync
uv will reproduce your virtual environment on their machine. After they activate the environment (source .venv/bin/activate && source .env), they will be able to run your python scripts.
That's it! Please reach out (mail [at] apoorvkh.com) if you have any questions or suggestions!
Appendix: CUDA and other non-Python dependencies
PyTorch wheels bundle the CUDA libraries they need, so a system-wide toolkit is usually unnecessary. You can check which CUDA toolkit version (if any) is installed on your system with nvcc -V, and the maximum CUDA version your driver supports with nvidia-smi.
One library, deepspeed, actually does need the whole CUDA toolkit and also the gcc & gxx compilers. In this case, I recommend using pixi instead. Like uv, it can install Python packages from PyPI; but it can also install non-Python dependencies from conda repositories into the virtual environment.
Install pixi with:
curl -fsSL https://pixi.sh/install.sh | bash
Like uv, pixi stores the environment inside your project directory ([my_project]/.pixi).
Since PyTorch 2.5.1 from PyPI is built with CUDA 12.4, we will install that version of the toolkit.
pixi init my-project --format pyproject
cd my-project
pixi project channel add "conda-forge" "nvidia/label/cuda-12.4.0"
pixi add "python=3.12.7" "cuda=12.4.0" "gcc=11.4.0" "gxx=11.4.0"
pixi add --pypi "torch==2.5.1" "torchvision==0.20.1" "deepspeed==0.16.2" "numpy==2.2.2"
Like before, make a scripts directory (mkdir scripts) and add the following environment variables to your .env file, so that libraries and builds resolve against the pixi environment (exposed via $CONDA_PREFIX) rather than the system:
export PYTHONNOUSERSITE="1"
export LIBRARY_PATH="$CONDA_PREFIX/lib"
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib"
export CUDA_HOME="$CONDA_PREFIX"
Finally, activate your environment (in every new terminal) with:
pixi shell
source .env
That is, for pixi projects, you enter the environment with pixi shell instead of sourcing .venv/bin/activate.
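Once inside, you can sanity-check the toolchain; the expected outputs below are assumptions that follow from the versions pinned above:
nvcc -V         # should report "release 12.4"
gcc --version   # should report 11.4.0
python -c "import deepspeed; print(deepspeed.__version__)"   # 0.16.2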