This document describes the different filesystems accessible on the RANGE cluster and the quotas applied to them.
RANGE utilizes several different filesystems to store and process data, each with its own specialized function. A summary of all filesystems available to RANGE users is presented in the following table:
RANGE filesystem summary
| Filesystem | Accessed via environment variable | Physical path | Size | Quota | Type | Purge policy |
|---|---|---|---|---|---|---|
| Home directories | $HOME | /home | 256 TB | 50 GB | NFS | None |
| Shared scratch (high-performance I/O) | $SHARED_SCRATCH | /scratch | 600 TB | None | VAST | 14 days |
| Local scratch (on each node) | $LOCAL_SCRATCH | /tmp | 872 GB | None | Local | At the end of each job |
| RHF | N/A | /rhf/allocations | 5 PB | Varies by group | NFS | None |
Checking disk utilization
Use the `df -h` command on a given filesystem to see how much space is available and how much has been used.
For an individual user directory, for example:

```
[user@login1 ~]$ df -h $HOME
Filesystem                         Size  Used Avail Use% Mounted on
data190.nots.rice.edu:/range/home   50G     0   50G   0% /home
```
You can also use this command to check your group's RHF allocation capacity, e.g.:

```
[user@login1 ~]$ df -h /rhf/allocations/crc
Filesystem                              Size  Used Avail Use% Mounted on
data190.nots.rice.edu:/rhf/allocations   10T  4.9T  5.2T  49% /rhf/allocations
```
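Because `df` reports totals for the whole filesystem or allocation, it cannot tell you how much a single directory tree is consuming. The standard `du` command does that; a minimal sketch (the temporary directory and 5 MB file below are stand-ins for one of your own directories):

```shell
# df reports whole-filesystem totals; du -sh summarizes one directory tree.
# Substitute your own directory for the throwaway one created here.
d=$(mktemp -d)                                        # stand-in directory
dd if=/dev/zero of="$d/file.bin" bs=1M count=5 status=none
du -sh "$d"                                           # total size of the tree
```

Running `du -sh` against a directory inside a shared RHF allocation shows how much of the group's quota your files occupy.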
$SHARED_SCRATCH usage and quota
The clustered file system $SHARED_SCRATCH provides fast, high-bandwidth I/O for running jobs. Though not limited by quotas, $SHARED_SCRATCH is intended for in-flight data being used as input and output for running jobs, and may be periodically cleaned through voluntary and involuntary means as use and abuse dictate.
$SHARED_SCRATCH is not permanent storage
$SHARED_SCRATCH is to be used only for job I/O. At the end of the job, delete everything you do not need for another run, or move it to permanent storage (such as your group's RHF allocation) for analysis. Staff may periodically delete files from the $SHARED_SCRATCH file system even if files are less than 14 days old. A full file system inhibits use of the system for everyone. Using programs or scripts to actively circumvent the file purge policy will not be tolerated.
Volatility of $SHARED_SCRATCH
The $SHARED_SCRATCH filesystem is designed for speed rather than data integrity and therefore may be subject to catastrophic data loss! It is designed for input and output files of running jobs, not persistent storage of data and software.
Copy files to $SHARED_SCRATCH; don't move them
When bringing data into $SHARED_SCRATCH, always copy it in. A `cp` creates new files with fresh access times, whereas a move with `mv` preserves the original access times. This matters because the periodic purge targets files with old access times: a file moved in with `mv` can look stale immediately and be deleted.
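The difference is easy to see with `stat`. A small demonstration you can run in any throwaway directory (GNU `stat`/`touch` assumed, as on the cluster's Linux nodes):

```shell
# Show that cp gives the destination a fresh access time while mv keeps the old one.
cd "$(mktemp -d)"                    # work in a throwaway directory
touch demo.txt
touch -a -d '2020-01-01' demo.txt    # give the file a deliberately old atime
cp demo.txt copied.txt               # cp: the new copy gets a current atime
mv demo.txt moved.txt                # mv: the 2020 atime is carried over
stat -c '%n atime: %x' copied.txt moved.txt
```

The moved file reports the old 2020 access time, so a purge keyed on access time would consider it stale right away; the copied file reports the current time.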
Avoid I/O over NFS
$HOME should not be used for job I/O because file operations (reading and writing) are slower there than on $SHARED_SCRATCH, and will cause your jobs to take longer to run, resulting in a waste of computing resources. $HOME is also relatively small and will fill up quickly for jobs that produce any significant amount of data.
RHF allocations are visible to the compute nodes on RANGE; however, it is generally more efficient to use $SHARED_SCRATCH for job I/O rather than RHF. RHF allocations are ideal for staging data into, and out of, the $SHARED_SCRATCH filesystem.
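The stage-in / stage-out pattern can be sketched as follows. This demo uses throwaway local directories as stand-ins for the real RHF and $SHARED_SCRATCH paths so it can be run anywhere; on the cluster you would substitute your group's actual allocation and `$SHARED_SCRATCH/$USER`:

```shell
# Stand-ins for /rhf/allocations/<group> and $SHARED_SCRATCH/$USER (demo only).
RHF_DIR=$(mktemp -d)
SCRATCH=$(mktemp -d)

mkdir -p "$RHF_DIR/input" "$RHF_DIR/results"
echo "input data" > "$RHF_DIR/input/data.txt"

JOB_DIR="$SCRATCH/myjob"
mkdir -p "$JOB_DIR"
cp -r "$RHF_DIR/input" "$JOB_DIR/"           # stage in: copy, never move
cd "$JOB_DIR"
tr a-z A-Z < input/data.txt > output.txt     # placeholder for the real computation
cp output.txt "$RHF_DIR/results/"            # stage results back out
cd / && rm -rf "$JOB_DIR"                    # clean scratch at the end of the job
```

The job reads and writes only on the fast scratch filesystem; the slower NFS-backed RHF allocation is touched just twice, once at the start and once at the end.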
Use Variables Everywhere!
The physical paths for the above file systems are subject to change. You should always access the filesystems using environment variables, especially in job scripts.
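As a minimal sketch, a job-script fragment might build every path from variables rather than hard-coded mounts. The directory name `demo-job` is a placeholder, and the fallback defaults exist only so the snippet can be tried outside the cluster, where $SHARED_SCRATCH is not set:

```shell
# Build all paths from environment variables, never hard-coded physical paths.
# $SHARED_SCRATCH comes from the cluster environment; the fallbacks below are
# only so this snippet runs elsewhere for demonstration.
SCRATCH_BASE=${SHARED_SCRATCH:-$(mktemp -d)}
JOB_DIR="$SCRATCH_BASE/${USER:-$(id -un)}/demo-job"
mkdir -p "$JOB_DIR"
echo "job directory: $JOB_DIR"
```

If the physical path behind $SHARED_SCRATCH ever changes, a script written this way keeps working without modification.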