festus (2024)
General remarks
The cluster "festus" (btrzx24) is available from November 2024 for the groups involved in the procurement. It consists of two management nodes, one virtualization server, two login nodes, several storage servers, and 73 compute nodes, which are connected by a 100G InfiniBand interprocess network and a 25G service network. "festus" uses Slurm (24.05) as its resource manager. The ITS file server (e.g., the ITS home directory) is not mounted on the cluster for performance reasons. Every user has a separate home directory (10 GB) which resides on the cluster's own NFS server.
Acknowledging Festus / Publications
As with other DFG-funded projects, results must be made available to the general public in an appropriate manner. Publications must contain a reference to the DFG funding (a so-called “Funding Acknowledgement”) in the language of the publication, stating the project number.
Whenever festus has been used to produce results that appear in a publication or a poster, we kindly request citing the service in the acknowledgements:
Calculations were performed using the festus-cluster of the Bayreuth Centre for High Performance Computing (https://www.bzhpc.uni-bayreuth.de), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 523317330.
The funding acknowledgement is mandatory.
Login
The login nodes of festus are accessible via SSH at festus.hpc.uni-bayreuth.de, but only from university networks. If you are outside the university, a VPN connection is required. If your login shell is (t)csh or ksh, you have to change it to bash or zsh in the ITS self-service portal.
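A minimal login example; <username> is a placeholder for your university account:

```bash
# Connect to a festus login node (from the university network or via VPN)
ssh <username>@festus.hpc.uni-bayreuth.de
```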
Compute nodes
- 62x typA (CPU)
  - 2x AMD EPYC 9554 (64 cores @ 3.75 GHz)
  - 24x 16 GB RAM (384 GB)
  - ~200 GB local disk
- 4x typB (HighMem)
  - 2x AMD EPYC 9684X (96 cores @ 3.42 GHz)
  - 24x 64 GB RAM (1536 GB)
  - ~200 GB local disk
  - submit: --constraint=HighMem
- 1x typC (GPU)
  - 2x Intel Xeon 8480+ (56 cores @ 3.8 GHz)
  - 16x 128 GB RAM (2048 GB)
  - ~14 TB local disk
  - 4x NVIDIA H100
  - submit: --partition=gpu --gres=gpu:h100:<n> (see the example batch script below)
- 1x typD (GPU)
  - 2x Intel Xeon 8480+ (56 cores @ 3.8 GHz)
  - 16x 128 GB RAM (2048 GB)
  - ~14 TB local disk
  - 4x AMD MI210
  - submit: --partition=gpu --gres=gpu:mi210:<n>
- 3x typE (GPU)
- 2x typF
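A minimal sketch of a batch script requesting one H100 GPU on the typC node; the job name, CPU count, and application name are placeholders, and any module set-up is omitted:

```bash
#!/bin/bash
#SBATCH --job-name=h100-test          # placeholder job name
#SBATCH --partition=gpu               # GPU partition (typC-typF nodes)
#SBATCH --gres=gpu:h100:1             # one NVIDIA H100 on the typC node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8             # illustrative CPU count
#SBATCH --time=08:00:00               # default wall time of the gpu partition

srun ./my_gpu_application             # placeholder executable
```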
Queues / Partitions
- normal (see the example batch script after this list)
  - Priority: multifactor, most weight on the group's financial share in the cluster and consumed resources
  - Wall time limit: 8 hours (default), 24 hours (max)
  - Restrictions: typA nodes only
- gpu
  - Priority: multifactor, most weight on the group's financial share in the cluster and consumed resources
  - Wall time limit: 8 hours (default), 24 hours (max)
  - Restrictions: typC-typF nodes only
- dev
  - Priority: multifactor, most weight on the group's financial share in the cluster and consumed resources
  - Wall time limit: 15 minutes (default), 90 minutes (max)
  - Restrictions: typA nodes only, max. 2 nodes per user
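A sketch of a full-node MPI job in the normal partition; node and task counts are illustrative and my_mpi_application is a placeholder:

```bash
#!/bin/bash
#SBATCH --job-name=cpu-test           # placeholder job name
#SBATCH --partition=normal            # typA (CPU) nodes
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128         # 2x 64 cores per typA node
#SBATCH --time=24:00:00               # maximum wall time of the normal partition

srun ./my_mpi_application             # placeholder executable
```

For short tests, the same script can be submitted to the dev partition by setting --partition=dev and requesting at most 90 minutes and no more than 2 nodes.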
Limits
- shareholders
  - max. CPUs per group: unlimited (-1)
  - max. submitted jobs per group: 1000
- confirmed groups
  - max. CPUs per group: 1024
  - max. submitted jobs per group: 500
- default
  - max. CPUs for all users*¹ together: 1024
  - max. submitted jobs for all users*¹ together: 500

*¹ users who are not members of a shareholder group or a confirmed group
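The limits that apply to your own account can be inspected with the standard Slurm accounting tools; a sketch, where the exact fields shown depend on the site configuration:

```bash
# Show the association limits (CPUs, submitted jobs) for your account
sacctmgr show assoc where user=$USER format=Account,User,GrpTRES,MaxSubmit

# List your queued and running jobs
squeue -u $USER
```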
Network
- InfiniBand (100 Gbit/s)
  - 2-level fat tree
- Ethernet (25 Gbit/s)
User file space (network and local)
- NFS file system
  - /groups/<org-id>: group directory (only for groups financially involved in the cluster)
  - /home: 10 GB per user
  - /workdir: no soft quota
    - no snapshots, no backup
    - data lifetime 60 days
- BeeGFS
  - /scratch:
    - only for large MPI-IO or parallel HDF5 (phdf5) workloads
    - no snapshots, no backup, no redundancy
    - data lifetime 10 days
- Local disk (/tmp):
  - typA/B: ~200 GB
  - typC/D: ~14 TB
  - typE/F: ~3.84 TB
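Because files in /workdir and /scratch have a limited lifetime and the node-local disk is fastest for temporary data, staging inside the job script is a common pattern. A minimal sketch, assuming a per-user subdirectory under /workdir; paths and the application are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --time=08:00:00

# Stage the input from the NFS work directory to the node-local disk
cp /workdir/$USER/input.dat /tmp/input.dat          # hypothetical input file

# Run the application on the local copies
./my_application /tmp/input.dat /tmp/output.dat     # placeholder executable

# Copy results back before the job ends (node-local data should not be
# relied on after the job has finished)
cp /tmp/output.dat /workdir/$USER/
```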
Resource Manager & Scheduler
- Slurm 24.05
Operating system
- RHEL 9.4 / Rocky Linux 9.4