festus (under construction, 2024)
General remarks
The cluster "festus" (btrzx24) is expected to be available from November 2024 for the groups involved in the procurement. It consists of two management nodes, one virtualization server, two login nodes, several sotrage servers and 73 compute nodes which are connected by an 100G Infiniband Interprocess- and a 25G Sericenetwork. "festus" uses Slurm (24.05) as resource manager. The ITS file server (e.g., the ITS home directory) is not mounted on the cluster for performance reasons, every users has a separate home directory (5GB) which lies on the clusters own nfs-server.
Login
The login nodes of festus will be accessible with ssh via festus.hpc.uni-bayreuth.de, but only from university networks. If you are outside the university, a VPN connection is required. If your login shell is (t)csh or ksh, you have to change it to bash or zsh in the ITS self-service portal.
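A login from within the university network (or over VPN) then looks like this; <username> is a placeholder for your ITS user name:

  # connect to one of the festus login nodes
  ssh <username>@festus.hpc.uni-bayreuth.de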
Compute nodes
                        | typA      | typB       | typC        | typD        | typE      | typF      |
N                       | 62        | 4          | 1           | 1           | 3         | 2         |
CPU (x2)                | EPYC 9554 | EPYC 9684X | XEON® 8480+ | XEON® 8480+ | EPYC 9554 | EPYC 9554 |
cores total             | 128       | 192        | 112         | 112         | 128       | 128       |
CpuFreqMax              | 3.75GHz   | 3.42GHz    | 3.8GHz      | 3.8GHz      | 3.75GHz   | 3.75GHz   |
DDR5 (4.8GT/s)          | 24x 16GB  | 24x 64GB   | 16x 128GB   | 16x 128GB   | 24x 16GB  | 24x 16GB  |
local /tmp space (NVMe) | ~200GB    | ~200GB     | ~14TB       | ~14TB       | ~3.84TB   | ~3.84TB   |
GPU                     | -         | -          | 4x H100     | 4x MI210    | 2x L40    | 2x MI210  |
Partition               | normal    | HighMem    | AI          | AI          | normal    | normal    |
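Once logged in, the actual node inventory can be inspected with Slurm's sinfo; the format string below is just one way to list partition, core count, memory, and GPUs per node:

  # list nodes with partition, CPUs, memory (MB), and generic resources (GPUs)
  sinfo -N -o "%N %P %c %m %G"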
Queues / Partitions
- normal
  - Priority: multifactor, most weight on the group's financial share in the cluster and consumed resources
  - Wall time limit: 8 hours (default), 24 hours (max)
  - Restrictions: CPU nodes only
- HighMem
  - Priority: multifactor, most weight on the group's financial share in the cluster and consumed resources
  - Wall time limit: 8 hours (default), 24 hours (max)
  - Restrictions: HighMem nodes only
- AI
  - Priority: multifactor, most weight on the group's financial share in the cluster and consumed resources
  - Wall time limit: 8 hours (default), 24 hours (max)
  - Restrictions: typC and typD nodes only
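A minimal batch script sketch for the normal partition; the job name, task count, and program to run are placeholders:

  #!/bin/bash
  #SBATCH --partition=normal
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=128   # typA nodes provide 128 cores
  #SBATCH --time=08:00:00         # default wall time; up to 24:00:00 is allowed

  srun ./my_mpi_program           # placeholder executable

For the AI partition you would additionally select it and request GPUs, e.g. with "#SBATCH --partition=AI" and "#SBATCH --gres=gpu:1"; the exact gres names configured on festus are an assumption here.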
Network
- InfiniBand (100 Gbit/s)
  - 2-level fat tree (blocking factor 2)
- Ethernet (25 Gbit/s)
User file space (network and local)
- NFS file system
  - /groups/<org-id>: group directory (only for groups financially involved in the cluster)
  - /home: 5GB per user
  - /workdir:
    - no soft quota
    - no snapshots, no backup
    - data lifetime 60 days
- BeeGFS
  - /scratch:
    - only for large MPI-IO or parallel HDF5 (phdf5) workloads
    - data lifetime 10 days
- Local disk (/tmp):
  - typA/B: ~200GB
  - typC/D: ~14TB
  - typE/F: ~3.84TB
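Since /workdir and /scratch purge data after their lifetimes and /tmp is node-local, a common pattern is to stage input onto the fast local disk at job start and copy results back before the job ends. A sketch with placeholder paths (the /workdir/$USER layout is an assumption):

  # inside a batch job: stage input onto the node-local NVMe disk
  cp /workdir/$USER/input.dat /tmp/
  ./my_program /tmp/input.dat /tmp/output.dat   # placeholder program
  # copy results back before the job ends; /tmp is cleaned afterwards
  cp /tmp/output.dat /workdir/$USER/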
Administrative limitations
- system-wide: at most 500 jobs per scheduling cycle (30s) are queued
- per shareholder account: max. 1000 jobs submitted, 6192 cores in use simultaneously
- default account (overall): max. 1000 jobs submitted, 2048 cores in use simultaneously
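To see how many jobs you currently have in the queue (relevant for the 1000-job submission limit), a simple check is:

  # count your own pending and running jobs
  squeue -u $USER -h | wc -l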
Resource Manager & Scheduler
- Slurm 24.05
Operating system
- RHEL 9.4 / Rocky Linux 9.4