CRC NOTS Expansion (NOTSx)
Introduction
The NOTS expansion ("NOTSx") is an update to the existing NOTS cluster, with new hardware running updated versions of most software. We will maintain it as a separate cluster until testing is complete and we can migrate existing NOTS nodes into the new configuration. After that, we will rename the combined cluster back to "NOTS."
For the most part, using NOTSx is similar to using NOTS, with a few differences which are noted in this document. See CRC Getting Started on NOTS for complete details on NOTS, and compare against this document for NOTSx particulars. We will update that document soon to reflect the new cluster.
Hardware
Type | Nodes | Hardware | CPU | Cores | Hyperthreaded | Total RAM (GiB) | Available RAM (GiB) | Available Disk (GiB) | GPUs | High Speed Network | Storage Network
---|---|---|---|---|---|---|---|---|---|---|---
Compute | 60 | HPE ProLiant DL360 Gen11 | Intel(R) Xeon(R) Platinum 8468 | 96 | Yes | 256 | 251 | ~800 | - | InfiniBand HDR 100 | 25 GbE
GPU | 12 | HPE ProLiant DL380a Gen11 | Intel(R) Xeon(R) Platinum 8468 | 96 | Yes | 512 | 503 | ~800 | 4 x NVIDIA L40S (single-precision) | InfiniBand HDR 100 | 25 GbE
The new nodes have updated versions of software such as:
- the Operating System is Red Hat Enterprise Linux 9 (upgraded from RHEL 7 on NOTS)
- Slurm scheduler version 24 (upgraded from version 19 on NOTS)
- CUDA 12.2 (upgraded from 11.0 on NOTS)
Logging into the Cluster
Use the name "notsx.rice.edu" instead of "nots.rice.edu".
If you have an active NOTS account, you can log into NOTSx via SSH using your NetID and password.
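For example, from a terminal you can connect with a command like the following, replacing the placeholder yourNetID with your own NetID:
ssh yourNetID@notsx.rice.edu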
NOTSx has its own login node(s), currently separate from those of NOTS.
Data and Quotas
The filesystems are the same, and most are shared between NOTS and NOTSx. The main difference is that NOTSx stores home directories separately from NOTS, so you will start out on NOTSx with a nearly empty home directory.
NOTSx mounts your NOTS home directory read-only at /oldhome/$USER/, so that you can copy any files you need to your NOTSx home.
We recommend carefully considering which files to copy over to NOTSx. As an example, the following command would copy everything in your NOTS home directory to NOTSx. Be careful: it could overwrite any changes you have already made on NOTSx:
rsync -avx /oldhome/$USER/ /home/$USER
(Be sure to execute this command exactly as written, especially with regard to the placement of the / characters.)
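If you only need part of your old home directory, you can copy individual directories instead. For example, to copy a single project directory (the name myproject below is only a placeholder):
rsync -avx /oldhome/$USER/myproject/ /home/$USER/myproject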
Environment and Shells
Customizing Your Environment With the Module Command
We have installed many, but not all, of the modules that are available on NOTS; for most modules we installed only the newest version and skipped older ones.
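You can check what is installed with the usual module commands, for example (the module name GCC below is only an example):
module avail
module load GCC
module list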
If you need a module version or an application package that is not on NOTSx, please let us know.
Anaconda Alternatives
Note that Anaconda modules are not installed on NOTSx. Alternative modules are:
- Miniforge3/24.1.2-0 -- provides the same conda command as Anaconda, without the huge "base" environment
- Mamba/23.11.0-0 -- provides the mamba command, which works identically to conda, but can be faster at calculating package dependencies
Both commands obtain packages from the conda-forge channel by default. You can create Python environments with either module, using commands such as:
conda create -n myenvironment numpy pandas
or
mamba create -n myenvironment numpy pandas
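If you already have environments on NOTS, one possible way to move them (a sketch, not an official migration procedure) is to export an environment definition on NOTS with its Anaconda module, then recreate it on NOTSx with Miniforge3 or Mamba; the name myenvironment and the file name are only examples:
conda env export -n myenvironment --from-history > myenvironment.yml
conda env create -f myenvironment.yml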
If you have trouble porting your existing environments over to NOTSx with these modules, please let us know.
Running Jobs with Slurm
Available Partitions and System Load
Account Name | Partition Name | Maximum CPUs Per Node | Maximum CPUs Per Job | Maximum jobs running per user | Maximum run time (HH:MM:SS) | What changed on NOTSx?
---|---|---|---|---|---|---
commons | long | 192 | TBD | 5000 | 72:00:00 | new queue
commons | commons | 192 | TBD | 5000 | 24:00:00 | 
commons | debug | 192 | 384 | 100 | 00:30:00 | new name (was "interactive")
commons | scavenge | 192 | TBD | 5000 | 01:00:00 | new maximum run time
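For example, to start a short interactive session in the debug partition, you could run something like the following (the resource numbers are only illustrative; adjust them for your own work):
srun --account=commons --partition=debug --ntasks=1 --cpus-per-task=2 --time=00:30:00 --pty bash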
Slurm Batch Script Options
These are mostly similar to NOTS, with one operational difference:
The processors on NOTSx (as with many nodes on NOTS) support hyperthreading, where each compute core can run two threads of execution more-or-less at the same time. For memory or IO-bound computations, hyperthreading allows a single core to interleave the waiting time for data from memory or I/O between the two threads, and operate more efficiently. For CPU-bound computations, hyperthreading does not result in the expected 2x speed-up, and usually best performance is achieved by disabling hyperthreading.
By default, Slurm turns hyperthreading ON for each job, unless you specify --threads-per-core=1.
The --cpus-per-task=N flag actually requests Slurm to allocate N threads per task. If hyperthreading is ON, then Slurm allocates N/2 cores, allowing two threads to execute on each core. If hyperthreading is OFF, Slurm allocates N cores, distributing one thread to each.
The behavior of Slurm 19 on NOTS allows a job requesting --threads-per-core=1 to access both hyperthreads of each core, effectively allocating twice as many threads as requested. Slurm 24 on NOTSx limits a job requesting --threads-per-core=1 to using one thread per core. This new behavior is more accurate, but different from what you may be used to on NOTS.
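As an illustration, a minimal batch script for a CPU-bound job that disables hyperthreading might look like the following (the job name, resource numbers, and ./my_program are only placeholders):
#!/bin/bash
#SBATCH --job-name=cpu-bound-example     # placeholder job name
#SBATCH --account=commons
#SBATCH --partition=commons
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8                # 8 threads for this task
#SBATCH --threads-per-core=1             # disable hyperthreading: one thread per core
#SBATCH --time=01:00:00
srun ./my_program                        # hypothetical executable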
Compiling and Optimizing
If you run code on NOTS that you compiled yourself, especially code that uses MPI, we recommend you recompile it for NOTSx.
NOTSx has newer system libraries, MPI libraries, compilers, etc., so some executable programs compiled for NOTS may not work properly or efficiently on NOTSx.
Please note: When you recompile, you should not overwrite the compiler output files (*.o files, executable programs, etc.), but save the NOTSx files separately from the NOTS files. This way, you can still run your code on both systems.
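One simple way to keep the two builds separate (a sketch assuming a hypothetical single-file program myprogram.c) is to give the executables system-specific names:
# on NOTS:
gcc -O2 -o myprogram.nots myprogram.c
# on NOTSx:
gcc -O2 -o myprogram.notsx myprogram.c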
Everything Else
Everything else should work the same between the two clusters.
If you encounter something that does not work as expected, or if we need to clarify how something works on NOTSx, please let us know.