diff --git a/.gitignore b/.gitignore
index 34e23272b2..77254e2567 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,6 @@
+# agent memory
+.memsearch/*
+
 # pixi environments
 .pixi
 *.egg-info
diff --git a/.memsearch/memory/2026-05-14.md b/.memsearch/memory/2026-05-14.md
deleted file mode 100644
index 2c35cd48be..0000000000
--- a/.memsearch/memory/2026-05-14.md
+++ /dev/null
@@ -1,3 +0,0 @@
-
-## Session 17:45
-
diff --git a/docs/hpc/14_tutorial_apptainer/01_intro_apptainer.mdx b/docs/hpc/14_tutorial_apptainer/01_intro_apptainer.mdx
new file mode 100644
index 0000000000..8cb30c204d
--- /dev/null
+++ b/docs/hpc/14_tutorial_apptainer/01_intro_apptainer.mdx
@@ -0,0 +1,32 @@
+# Introduction to Apptainer on Torch
+
+## Why Containers?
+Researchers often rely on complex software stacks that include programming languages and libraries. Installing and maintaining these dependencies can be challenging when different projects require different software versions.
+
+Containers provide a way to package software together with the environment it needs to run. Instead of manually installing every dependency on a system, users can run software inside a container that already includes the required libraries and tools.
+
+
+## What Is Apptainer?
+Apptainer is a container platform that allows users to run software inside isolated environments without requiring administrator privileges on the host system.
+
+Apptainer is the continuation of the Singularity project. The open-source Singularity project was renamed to Apptainer and continues to be developed under the Linux Foundation.
+
+Unlike Docker, Apptainer does not require a privileged daemon running on the system. This makes it well suited for shared HPC environments where security and multi-user access are important considerations.
+
+Torch uses Apptainer as its supported container platform. Many container images distributed through Docker registries can be used directly with Apptainer, allowing researchers to take advantage of existing software environments while working on the cluster.
+
+## What Are `.sif` and `.sqf` Files?
+
+Torch provides container images with both `.sif` and `.sqf` extensions.
+
+`.sif` is the standard Apptainer image format and can be used directly with commands such as `apptainer exec`, `apptainer run`, and `apptainer shell`.
+
+Some Torch-provided application images use `.sqf` (SquashFS) files and are typically accessed through wrapper scripts such as `run-anaconda3-2024.10-1.bash`. These scripts handle the details of launching the application environment and are often the recommended interface for users.
+
+When available, use the wrapper script documented for that application. For general Apptainer examples in this tutorial, we use `.sif` images because they can be launched directly with standard Apptainer commands.
+
+## Apptainer Image Cache
+
+Apptainer does not have a local image repository in the same way as Docker, but it does cache downloaded images. A pulled image is simply a `.sif` file stored on disk.
+
+If you delete a local `.sif` image that you pulled from a remote registry and then pull it again, Apptainer copies the image from its local cache instead of downloading it again, provided the remote image has not changed since your previous pull. This avoids unnecessary network transfers, which is particularly useful for large images that take a long time to download.
diff --git a/docs/hpc/14_tutorial_apptainer/02_running_containers.mdx b/docs/hpc/14_tutorial_apptainer/02_running_containers.mdx
new file mode 100644
index 0000000000..2628e27dcb
--- /dev/null
+++ b/docs/hpc/14_tutorial_apptainer/02_running_containers.mdx
@@ -0,0 +1,115 @@
+# Using Apptainer to Run Commands on Torch
+
+:::warning
+
+Container workloads should be run on compute nodes rather than login nodes.
+
+While simple commands may work on a login node, pulling images, launching software, installing packages, or building environments can consume significant CPU, memory, and storage resources. These activities should be performed within an interactive Slurm allocation or a batch job.
+:::
+
+Torch provides many prebuilt container images in `/share/apps/images/`. To see what is available, run:
+
+```bash
+ls /share/apps/images/
+```
+
+:::note
+
+Torch provides container images with both `.sif` and `.sqf` extensions.
+
+`.sif` is the standard Apptainer image format. Some Torch-provided application images use `.sqf` and may be intended to be launched through wrapper scripts such as `run-anaconda3-2024.10-1.bash`.
+
+When available, use the wrapper script documented for that application. For general Apptainer examples in this tutorial, we use `.sif` images because they work directly with `apptainer exec`, `apptainer run`, and `apptainer shell`.
+
+:::
+
+For this tutorial, we will use the Ubuntu 24.04 image that is already available on the cluster.
+
+## Running Your First Container
+
+Apptainer images can define a default action that runs when the container starts.
+
+To launch a container and execute its default action, use `apptainer run`:
+
+```bash
+apptainer run /share/apps/images/ubuntu-24.04.3.sif
+```
+
+Depending on how the image was built, this command may produce output, launch an application, or simply start and exit.
+
+The important point is that with `apptainer run`, Apptainer executes the default action defined by the image creator.
+
+To run a specific command, use `apptainer exec`.
+
+## Running Specific Commands Within a Container
+
+`apptainer exec` lets you specify exactly which command runs inside the container.
+
+For example:
+
+```bash
+apptainer exec /share/apps/images/ubuntu-24.04.3.sif /bin/echo "Hello World!"
+```
+
+Output:
+
+```text
+Hello World!
+```
+
+## The Difference Between `apptainer run` and `apptainer exec`
+
+Both `apptainer run` and `apptainer exec` start a container, but they serve different purposes.
+
+`apptainer run` executes the default action defined by the image creator. Depending on how the image was built, this may launch an application, run a script, or perform another predefined task.
+
+`apptainer exec` allows you to specify exactly which command should run inside the container. Rather than relying on the image's default behavior, you provide the command directly.
+
+## Opening an Interactive Shell Within a Container
+
+There are many reasons to use a container interactively, including:
+- debugging (software, bind mounts, hardware integrations, etc.)
+- testing software upgrades
+- rapid software prototyping
+- data exploration
+
+Apptainer provides the `apptainer shell` command for this purpose.
+
+Launch a shell inside the Ubuntu container:
+
+```bash
+apptainer shell /share/apps/images/ubuntu-24.04.3.sif
+```
+
+You should see a prompt similar to:
+
+```text
+Apptainer>
+```
+
+You can now run commands inside the container:
+
+```bash
+whoami
+pwd
+cat /etc/os-release
+```
+
+Example output (abbreviated to the `/etc/os-release` contents):
+
+```text
+PRETTY_NAME="Ubuntu 24.04.3 LTS"
+NAME="Ubuntu"
+VERSION_ID="24.04"
+...
+```
+
+Notice that the prompt changes to indicate that you are working inside the container environment.
+
+When you are finished, leave the container with:
+
+```bash
+exit
+```
+
+This returns you to your normal shell on Torch.
diff --git a/docs/hpc/14_tutorial_apptainer/03_apptainer_files.mdx b/docs/hpc/14_tutorial_apptainer/03_apptainer_files.mdx
new file mode 100644
index 0000000000..8361ae17e8
--- /dev/null
+++ b/docs/hpc/14_tutorial_apptainer/03_apptainer_files.mdx
@@ -0,0 +1,53 @@
+# Files in Apptainer Containers
+Apptainer is designed to work closely with the host filesystem. In most cases, your home directory and current working directory remain accessible from within the container.
+
+## Accessing Your Files
+
+While inside a container, check your current directory:
+
+```bash
+pwd
+```
+
+You can also list files in your home directory:
+
+```bash
+ls ~
+```
+
+The files and directories you see should match those available outside the container.
+
+:::note
+Files created in these mounted directories remain available after the container exits.
+:::
+
+## Binding Additional Directories
+
+Sometimes you may need a host directory that is not mounted by default, or you may want a host directory to appear at a different path inside the container.
+
+Use the `-B` (`--bind`) option to mount additional directories, in the form `source:destination`:
+
+```bash
+apptainer shell \
+    -B /scratch/:/data \
+    /share/apps/images/ubuntu-24.04.3.sif
+```
+
+In this example, your scratch directory on the host system is made available as `/data` inside the container.
+
+After entering the container, you can verify the mount:
+
+```bash
+ls /data
+```
+
+Any files stored in `/scratch/` on the host system will be accessible through `/data` inside the container.
+
+The source and destination paths do not need to be the same. This can be useful when an application expects data to be located in a specific directory within the container.
+
+:::info
+
+Container images are typically read-only. Research data, scripts, notebooks, and output files usually remain outside the container.
+
+By making host directories available inside the container, Apptainer allows applications to access data stored on Torch while maintaining a reproducible software environment.
+:::
diff --git a/docs/hpc/14_tutorial_apptainer/04_docker_images.mdx b/docs/hpc/14_tutorial_apptainer/04_docker_images.mdx
new file mode 100644
index 0000000000..9896f07333
--- /dev/null
+++ b/docs/hpc/14_tutorial_apptainer/04_docker_images.mdx
@@ -0,0 +1,47 @@
+# Using Docker Images with Apptainer
+
+So far, we have used container images that are already available on Torch under `/share/apps/images`.
+
+In practice, you may also want to run software that is not provided by the cluster. Apptainer can pull images directly from Docker registries and convert them into the Apptainer `SIF` format.
+
+:::warning
+Pulling container images can require significant network bandwidth and disk space. On Torch, image downloads should be performed on compute nodes rather than login nodes.
+:::
+
+For example, we can pull an official PyTorch image from Docker Hub:
+
+```bash
+apptainer pull pytorch.sif docker://pytorch/pytorch:latest
+```
+
+During the pull, Apptainer prints progress messages similar to the following:
+
+```text
+INFO:    Converting OCI blobs to SIF format
+INFO:    Starting build...
+INFO:    Fetching OCI image...
+...
+INFO:    Creating SIF file...
+```
+
+These messages show Apptainer downloading the Docker image layers and converting them into a single `SIF` image. When the command completes, a new image named `pytorch.sif` is created in the current directory, and it can be used without Docker.
+
+You can verify that the image exists:
+
+```bash
+ls -lh pytorch.sif
+```
+
+The image can now be used like any other Apptainer image.
+
+For example:
+
+```bash
+apptainer exec pytorch.sif python --version
+```
+
+or
+
+```bash
+apptainer exec pytorch.sif python -c "import torch; print(torch.__version__)"
+```
diff --git a/docs/hpc/14_tutorial_apptainer/05_customize_container_environments.mdx b/docs/hpc/14_tutorial_apptainer/05_customize_container_environments.mdx
new file mode 100644
index 0000000000..5e666d11af
--- /dev/null
+++ b/docs/hpc/14_tutorial_apptainer/05_customize_container_environments.mdx
@@ -0,0 +1,77 @@
+# Why Customize a Container Environment?
+Prebuilt container images provide a convenient starting point, but they may not contain all of the software required for a particular project.
+
+Most Apptainer images are distributed as `.sif` files, which are typically read-only. To install additional software or make persistent changes, you can use an overlay: a writable layer on top of the original image that leaves the `.sif` file unchanged.
+
+## Example: Installing Packages with an Overlay
+
+
+Pull a Python container image from Docker Hub:
+
+```bash
+apptainer pull python.sif docker://python:3.10
+```
+
+Create a 1 GiB writable overlay (the `--size` value is given in MiB):
+
+```bash
+apptainer overlay create --size 1024 python_overlay.ext3
+```
+
+Launch the container with the overlay attached in read-write mode:
+
+```bash
+apptainer shell \
+    --overlay python_overlay.ext3:rw \
+    python.sif
+```
+
+Verify that `numpy` is not installed in the base image:
+
+```bash
+python -c "import numpy"
+```
+
+The command should fail with an error ending in:
+
+```text
+ModuleNotFoundError: No module named 'numpy'
+```
+
+Install `numpy`:
+
+```bash
+pip install numpy
+```
+
+Verify that the package is now available:
+
+```bash
+python -c "import numpy; print(numpy.__version__)"
+```
+
+Exit the container:
+
+```bash
+exit
+```
+
+The next time you want to use the customized environment, launch the container with the same overlay. Because you are only using the installed packages, attach the overlay in read-only mode:
+
+```bash
+apptainer shell \
+    --overlay python_overlay.ext3:ro \
+    python.sif
+```
+
+This demonstrates that software installed within the overlay persists across sessions, while the original `.sif` image remains unchanged. (If `pip` reports that it is defaulting to a user installation, the package was written to `~/.local` in your home directory rather than to the overlay.)
+
+::::info
+In the examples above, we attached the overlay in read-write mode (`:rw`) while installing packages.
+:::warning
+Only open the overlay in read-write mode when you are installing packages into it. At all other times, use read-only mode (`:ro`).
+
+If you leave the overlay open in read-write mode, no other process will be able to open it in either read-write or read-only mode.
+:::
+Read-only overlays can be opened by multiple processes at the same time, which makes them useful for parallel processing.
+::::
diff --git a/docs/hpc/14_tutorial_apptainer/_category_.json b/docs/hpc/14_tutorial_apptainer/_category_.json
new file mode 100644
index 0000000000..df501c11cf
--- /dev/null
+++ b/docs/hpc/14_tutorial_apptainer/_category_.json
@@ -0,0 +1,3 @@
+{
+  "label": "Tutorial: Apptainer on Torch"
+}
diff --git a/docs/hpc/14_support/01_support.md b/docs/hpc/15_support/01_support.md
similarity index 82%
rename from docs/hpc/14_support/01_support.md
rename to docs/hpc/15_support/01_support.md
index fe100daa67..5769d2bf96 100644
--- a/docs/hpc/14_support/01_support.md
+++ b/docs/hpc/15_support/01_support.md
@@ -3,8 +3,8 @@
 - Some of your questions may be already answered here
   - [Tutorial: Introduction to Using the Shell on Torch](../12_tutorial_intro_shell_hpc/01_intro.mdx)
   - [Tutorial: Introduction to High-Performance Computing](../13_tutorial_intro_hpc/01_intro_hpc.mdx)
-  - Consider to sign up for Training and Workshop. You can find the list of available HPC coruses [can be viewed at nyu.libcal.com](https://nyu.libcal.com/calendar?cid=1564&t=d&d=0000-00-00&cal=1564&ct=6016).
-  - Consider signing up for ACCESS workshops. As part of the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support program, NSF provides tutorials for HPC, OpenOnDemand, etc. Here's a list of upcoming workshops: [link](https://support.access-ci.org/events).
+  - Consider signing up for training and workshops. The list of available HPC courses [can be viewed at nyu.libcal.com](https://nyu.libcal.com/calendar?cid=1564&t=d&d=0000-00-00&cal=1564&ct=6016).
+  - Consider signing up for ACCESS workshops. As part of the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support program, NSF provides tutorials for HPC, `OpenOnDemand`, etc. See the list of [upcoming ACCESS workshops](https://support.access-ci.org/events).
   - [Introductory HPC Video Playlist](https://www.youtube.com/watch?v=0pP_TeKH1MI&list=PL5l6Qz3Xhfi9Jn9-iMKJisYsSW5tRzPSd&t=3s).
 
 - NYU HPC offers personalized help through **personal consultations** for simple and advanced cases:
diff --git a/docs/hpc/14_support/_category_.json b/docs/hpc/15_support/_category_.json
similarity index 100%
rename from docs/hpc/14_support/_category_.json
rename to docs/hpc/15_support/_category_.json