CRC NOTS Expansion (NOTSx)
Introduction
The NOTS expansion ("NOTSx") is an update to the existing NOTS cluster, with new hardware running updated versions of most software. We will maintain it as a separate cluster until testing is complete and we can migrate existing NOTS nodes into the new configuration. After that, we will rename the combined cluster back to "NOTS."
For the most part, using NOTSx is similar to using NOTS, with a few differences which are noted in this document. See CRC Getting Started on NOTS for complete details on NOTS, and compare against this document for NOTSx particulars. We will update that document soon to reflect the new cluster.
Hardware
Type | Nodes | Hardware | CPU | Cores | Hyperthreaded | Total RAM (GiB) | Available RAM (GiB) | Available Disk (GiB) | GPUs | High Speed Network | Storage Network
---|---|---|---|---|---|---|---|---|---|---|---
Compute | 60 | HPE ProLiant DL360 Gen11 | Intel(R) Xeon(R) Platinum 8468 | 96 | Yes | 256 | 251 | ~800 | - | InfiniBand HDR 100 | 25 GbE
GPU | 12 | HPE ProLiant DL380a Gen11 | Intel(R) Xeon(R) Platinum 8468 | 96 | Yes | 512 | 503 | ~800 | 4 x NVIDIA L40S (single-precision) | InfiniBand HDR 100 | 25 GbE
The new nodes have updated versions of software such as:
- the Operating System is Red Hat Enterprise Linux 9 (upgraded from RHEL 7 on NOTS)
- Slurm scheduler version 24 (upgraded from version 19 on NOTS)
- CUDA 12.2 (upgraded from 11.0 on NOTS)
Logging into the Cluster
Use the name "notsx.rice.edu" instead of "nots.rice.edu".
If you have an active NOTS account, you can log into NOTSx via SSH using your NetID and password.
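For example, from a terminal you can connect with a command like the following, replacing the placeholder yourNetID with your own NetID:
ssh yourNetID@notsx.rice.edu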
NOTSx has its own login node(s), currently separate from those of NOTS.
Data and Quotas
The filesystems are the same, and most are shared between NOTS and NOTSx. The main difference is that NOTSx stores home directories separately from NOTS, so you will start out on NOTSx with a nearly empty home directory.
NOTSx mounts your NOTS home directory read-only at /oldhome/$USER/, so that you can copy any files you need to your NOTSx home.
We recommend carefully considering which files to copy over to NOTSx. As an example, the following command would copy everything in your NOTS home directory to NOTSx. Be careful: it could overwrite any changes you have already made on NOTSx:
rsync -avx /oldhome/$USER/ /home/$USER
(Be sure to execute this command exactly as written, especially with regard to the placement of the / characters.)
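If you only need part of your old home directory, you can copy individual directories instead. For example, to copy a single project directory (the name myproject below is only a placeholder):
rsync -avx /oldhome/$USER/myproject/ /home/$USER/myproject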
Environment and Shells
Customizing Your Environment With the Module Command
We have installed many, but not all, of the modules that are available on NOTS; for most modules we installed only the newest version and skipped older ones.
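You can check what is installed with the usual module commands, for example (the module name GCC below is only an example):
module avail
module load GCC
module list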
If you need a module version or an application package that is not on NOTSx, please let us know.
Anaconda Alternatives
Note that Anaconda modules are not installed on NOTSx. Alternative modules are:
- Miniforge3/24.1.2-0 -- provides the same conda command as Anaconda, without the huge "base" environment
- Mamba/23.11.0-0 -- provides the mamba command, which works identically to conda, but can be faster at calculating package dependencies
Both commands obtain packages from the conda-forge channel by default. You can create Python environments with either module, using commands such as:
conda create -n myenvironment numpy pandas
or
mamba create -n myenvironment numpy pandas
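If you already have environments on NOTS, one possible way to move them (a sketch, not an official migration procedure) is to export an environment definition on NOTS with its Anaconda module, then recreate it on NOTSx with Miniforge3 or Mamba; the name myenvironment and the file name are only examples:
conda env export -n myenvironment --from-history > myenvironment.yml
conda env create -f myenvironment.yml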
If you have trouble porting your existing environments over to NOTSx with these modules, please let us know.
Running Jobs with Slurm
Available Partitions and System Load
Account Name | Partition Name | Maximum CPUs Per Node | Maximum CPUs Per Job | Maximum jobs running per user | Maximum run time (HH:MM:SS) | What changed on NOTSx?
---|---|---|---|---|---|---
commons | long | 192 | TBD | 5000 | 72:00:00 | new queue
commons | commons | 192 | TBD | 5000 | 24:00:00 | 
commons | debug | 192 | 384 | 100 | 00:30:00 | new name (was "interactive")
commons | scavenge | 192 | TBD | 5000 | 01:00:00 | new maximum run time
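For example, to start a short interactive session in the debug partition, you could run something like the following (the resource numbers are only illustrative; adjust them for your own work):
srun --account=commons --partition=debug --ntasks=1 --cpus-per-task=2 --time=00:30:00 --pty bash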
Slurm Batch Script Options
These are mostly similar to NOTS, with one operational difference:
The processors on NOTSx (as with many nodes on NOTS) support hyperthreading, where each compute core can run two threads of execution more-or-less at the same time. For memory or IO-bound computations, hyperthreading allows a single core to interleave the waiting time for data from memory or I/O between the two threads, and operate more efficiently. For CPU-bound computations, hyperthreading does not result in the expected 2x speed-up, and usually best performance is achieved by disabling hyperthreading.
By default, Slurm turns hyperthreading ON for each job, unless you specify --threads-per-core=1.
The --cpus-per-task=N flag actually requests Slurm to allocate N threads per task. If hyperthreading is ON, then Slurm allocates N/2 cores, allowing two threads to execute on each core. If hyperthreading is OFF, Slurm allocates N cores, distributing one thread to each.
The behavior of Slurm 19 on NOTS allows a job requesting --threads-per-core=1 to access both hyperthreads of each core, effectively allocating twice as many threads as requested. Slurm 24 on NOTSx limits a job requesting --threads-per-core=1 to using one thread per core. This new behavior is more accurate, but different from what you may be used to on NOTS.
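As an illustration, a minimal batch script for a CPU-bound job that disables hyperthreading might look like the following (the job name, resource numbers, and ./my_program are only placeholders):
#!/bin/bash
#SBATCH --job-name=cpu-bound-example     # placeholder job name
#SBATCH --account=commons
#SBATCH --partition=commons
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8                # 8 threads for this task
#SBATCH --threads-per-core=1             # disable hyperthreading: one thread per core
#SBATCH --time=01:00:00
srun ./my_program                        # hypothetical executable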
Compiling and Optimizing
If you run code on NOTS that you compiled yourself, especially code that uses MPI, we recommend you recompile it for NOTSx.
NOTSx has newer system libraries, MPI libraries, compilers, etc., so some executable programs compiled for NOTS may not work properly or efficiently on NOTSx.
Please note: When you recompile, you should not overwrite the compiler output files (*.o files, executable programs, etc.), but save the NOTSx files separately from the NOTS files. This way, you can still run your code on both systems.
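One simple way to keep the two builds separate (a sketch assuming a hypothetical single-file program myprogram.c) is to give the executables system-specific names:
# on NOTS:
gcc -O2 -o myprogram.nots myprogram.c
# on NOTSx:
gcc -O2 -o myprogram.notsx myprogram.c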
Everything Else
Everything else should work the same between the two clusters.
If you encounter something that does not work as expected, or if we need to clarify how something works on NOTSx, please let us know.