
CRC NOTS Expansion (NOTSx)

Details on the NOTS expansion, called NOTSx, and how it differs from the current NOTS cluster.

Introduction

The NOTS expansion ("NOTSx") is an update to the existing NOTS cluster, with new hardware running updated versions of most software. We will maintain it as a separate cluster until testing is complete and we can migrate existing NOTS nodes into the new configuration. After that, we will rename the combined cluster back to "NOTS."

For the most part, using NOTSx is similar to using NOTS, with a few differences noted in this document. See CRC Getting Started on NOTS for complete details on NOTS, and compare it against this document for NOTSx particulars. We will update that document soon to reflect the new cluster.

Hardware

NOTSx Nodes

Compute nodes (60):
  • Hardware: HPE ProLiant DL360 Gen11, Intel(R) Xeon(R) Platinum 8468
  • CPU cores: 96, hyperthreaded
  • RAM: 256 GiB total, 251 GiB available
  • Local disk: ~800 GiB available
  • GPUs: none
  • High speed network: InfiniBand HDR 100
  • Storage network: 25 GbE

GPU nodes (12):
  • Hardware: HPE ProLiant DL380a Gen11, Intel(R) Xeon(R) Platinum 8468
  • CPU cores: 96, hyperthreaded
  • RAM: 512 GiB total, 503 GiB available
  • Local disk: ~800 GiB available
  • GPUs: 4 x NVIDIA L40S (single-precision)
  • High speed network: InfiniBand HDR 100
  • Storage network: 25 GbE

The new nodes have updated versions of software such as:

  • Red Hat Enterprise Linux 9 operating system (upgraded from RHEL 7 on NOTS)
  • Slurm scheduler version 24 (upgraded from version 19 on NOTS)
  • CUDA 12.2 (upgraded from 11.0 on NOTS)
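
If you want to confirm these versions yourself on a NOTSx node, standard commands such as the following should work (the CUDA check assumes you have first loaded a CUDA module):

cat /etc/redhat-release
sinfo --version
nvcc --version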

Logging into the Cluster

Use the name "notsx.rice.edu" instead of "nots.rice.edu".

If you have an active NOTS account, you can log into NOTSx via SSH using your NetID and password.
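
For example, from a terminal on your own machine (replace "netid" with your own NetID):

ssh netid@notsx.rice.edu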

NOTSx has its own login node(s), currently separate from those of NOTS.

Data and Quotas

The filesystems are the same, and most are shared between NOTS and NOTSx. The main difference is that NOTSx stores home directories separately from NOTS, so you will start out on NOTSx with a nearly empty home directory.

NOTSx mounts your NOTS home directory read-only at /oldhome/$USER/, so that you can copy any files you need to your NOTSx home.

We recommend carefully considering which files to copy over to NOTSx.  As an example, the following command would copy everything in your NOTS home directory to NOTSx.  Be careful: it could overwrite any changes you have already made on NOTSx:

rsync -avx /oldhome/$USER/ /home/$USER

(Be sure to execute this command exactly as written, especially with regard to the placement of / characters.)
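
If you only need some of your files, you can instead copy one directory at a time.  For example (using a hypothetical "myproject" directory), the following copies just that directory into your NOTSx home:

rsync -avx /oldhome/$USER/myproject/ /home/$USER/myproject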

Environment and Shells

Customizing Your Environment With the Module Command

We have installed many, but not all, of the modules that are available on NOTS.  In most cases we skipped older versions and installed only the newest version of each module.

If you need a module version or application package that is not on NOTSx, please let us know.
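
As on NOTS, you can list and load software with the usual module commands, for example:

module avail
module load Miniforge3/24.1.2-0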

Anaconda Alternatives

Note that Anaconda modules are not installed on NOTSx.  Alternative modules are:

  • Miniforge3/24.1.2-0 -- provides the same conda command as Anaconda, without the huge "base" environment
  • Mamba/23.11.0-0 -- provides the mamba command, which works identically to conda but can be faster at resolving package dependencies

Both commands obtain packages from the conda-forge channel by default.  You can create Python environments with either module, using commands such as:

conda create -n myenvironment numpy pandas

or

mamba create -n myenvironment numpy pandas
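
Once created, an environment can be activated and used in the usual way, for example (a sketch with placeholder names; depending on how the module sets up your shell, you may need "source activate" instead of "conda activate"):

module load Miniforge3/24.1.2-0
conda activate myenvironment
python -c "import numpy; print(numpy.__version__)"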

If you have trouble porting your existing environments over to NOTSx with these modules, please let us know.
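
One approach that may help (a sketch, with "myenvironment" as a placeholder name) is to export an environment's package list on NOTS and recreate it on NOTSx.  Note that packages exported from Anaconda's default channels may not all resolve against conda-forge, so some editing of the exported file may be needed:

conda env export --name myenvironment --no-builds > myenvironment.yml    # run on NOTS
conda env create -f myenvironment.yml    # run on NOTSx after loading Miniforge3 or Mamba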

Running Jobs with Slurm

Available Partitions and System Load

NOTSx Partitions

All of the partitions below are under the "commons" account.

  • long -- up to 192 CPUs per node, maximum CPUs per job TBD, up to 5000 running jobs per user, maximum run time 72:00:00 (new queue on NOTSx)
  • commons -- up to 192 CPUs per node, maximum CPUs per job TBD, up to 5000 running jobs per user, maximum run time 24:00:00
  • debug -- up to 192 CPUs per node, up to 384 CPUs per job, up to 100 running jobs per user, maximum run time 00:30:00 (renamed from "interactive")
  • scavenge -- up to 192 CPUs per node, maximum CPUs per job TBD, up to 5000 running jobs per user, maximum run time 01:00:00 (new maximum run time)
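
You can check the partitions and current system load with standard Slurm commands, for example:

sinfo
squeue --partition=commons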

Slurm Batch Script Options

These are mostly similar to NOTS, with one operational difference:

The processors on NOTSx (like many nodes on NOTS) support hyperthreading, in which each compute core can run two threads of execution more or less at the same time. For memory- or I/O-bound computations, hyperthreading lets a single core interleave the two threads while one is waiting for data from memory or I/O, so the core operates more efficiently. For CPU-bound computations, hyperthreading does not provide the expected 2x speed-up, and the best performance is usually achieved by disabling it.

By default, Slurm turns hyperthreading ON for each job, unless you specify --threads-per-core=1.

The --cpus-per-task=N flag actually asks Slurm to allocate N threads per task. If hyperthreading is ON, Slurm allocates N/2 cores and runs two threads on each core. If hyperthreading is OFF, Slurm allocates N cores, one thread per core.

The behavior of Slurm 19 on NOTS allows a job requesting --threads-per-core=1 to access both hyperthreads of each core, effectively allocating twice as many threads as requested. Slurm 24 on NOTSx limits a job requesting --threads-per-core=1 to using one thread per core. This new behavior is more accurate, but different from what you may be used to on NOTS.
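
As a minimal sketch (the job name, resource numbers, and program are placeholders), a batch script for a CPU-bound job that disables hyperthreading might look like this:

#!/bin/bash
#SBATCH --job-name=ht-off-example
#SBATCH --partition=commons
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # 8 threads for this task
#SBATCH --threads-per-core=1     # one thread per core, so the job gets 8 physical cores
#SBATCH --time=01:00:00

srun ./my_program                # placeholder for your own executable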

Compiling and Optimizing

If you run code on NOTS that you compiled yourself, especially code that uses MPI, we recommend you recompile it for NOTSx.

NOTSx has newer system libraries, MPI libraries, compilers, etc., so some executable programs compiled for NOTS may not work properly or efficiently on NOTSx.

Please note:  When you recompile, you should not overwrite the compiler output files (*.o files, executable programs, etc.), but save the NOTSx files separately from the NOTS files.  This way, you can still run your code on both systems.
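
For example (a sketch only; the toolchain module name and file names are hypothetical), an MPI code could be rebuilt into a separate NOTSx executable like this:

module load foss/2023b                # hypothetical compiler + MPI toolchain module
mpicc -O2 -o mycode.notsx mycode.c    # keep the NOTSx executable separate from the NOTS one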

Everything Else

Everything else should work the same between the two clusters.

If you encounter something that does not work as expected, or if we need to clarify how something works on NOTSx, please let us know.


