BZHPC - Bayreuth Centre for High Performance Computing

festus (2024)

General remarks

The cluster "festus" (btrzx24) is available from November 2024 for the groups involved in the procurement. It consists of two management nodes, one virtualization server, two login nodes, several sotrage servers and 73 compute nodes which are connected by an 100G Infiniband Interprocess- and a 25G Servicenetwork. "festus" uses Slurm (24.05) as resource manager. The ITS file server (e.g., the ITS home directory) is not mounted on the cluster for performance reasons.  Every users has a separate home directory (10GB) which lies on the clusters own nfs-server.

Acknowledging Festus / Publications

As with other DFG-funded projects, results must be made available to the general public in an appropriate manner. Publications must include a reference to the DFG funding (a so-called “Funding Acknowledgement”) in the language of the publication, stating the project number.

Whenever festus has been used to produce results that appear in a publication or a poster, we kindly request that the service be cited in the acknowledgements:

Calculations were performed using the festus-cluster of the Bayreuth Centre for High Performance Computing (https://www.bzhpc.uni-bayreuth.de), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 523317330.

The funding acknowledgement is mandatory.

Login

The login nodes of festus are accessible via ssh at festus.hpc.uni-bayreuth.de, but only from university networks. If you are outside the university, a VPN connection is required. If your login shell is (t)csh or ksh, you have to change it to bash or zsh in the ITS self-service portal.
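From within a university network (or with an active VPN connection), a login looks like this, where <username> is a placeholder for your university user ID:

    ssh <username>@festus.hpc.uni-bayreuth.de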

Compute nodes

  • 62x typA (CPU)
    • 2x AMD EPYC 9554 (64 cores @ 3.75 GHz)
    • 24x 16 GB RAM (384 GB)
    • ~200 GB local disk
  • 4x typB (HighMem)
    • 2x AMD EPYC 9684X (96 cores @ 3.42 GHz)
    • 24x 64 GB RAM (1.5 TB)
    • ~200 GB local disk
    • submit: --constraint=HighMem
  • 1x typC (GPU)
    • 2x Intel Xeon Platinum 8480+ (56 cores @ 3.8 GHz)
    • 16x 128 GB RAM (2 TB)
    • ~14 TB local disk
    • 4x NVIDIA H100
    • submit: --partition=gpu --gres=gpu:h100:<n> (see the example batch script after this list)
  • 1x typD (GPU)
    • 2x Intel Xeon Platinum 8480+ (56 cores @ 3.8 GHz)
    • 16x 128 GB RAM (2 TB)
    • ~14 TB local disk
    • 4x AMD MI210
    • submit: --partition=gpu --gres=gpu:mi210:<n>
  • 3x typE (GPU)
    • 2x AMD EPYC 9554 (64 cores @ 3.75 GHz)
    • 24x 16 GB RAM (384 GB)
    • ~3.84 TB local disk
    • 2x NVIDIA L40
    • submit: --gres=gpu:l40:<n>
  • 2x typF (GPU)
    • 2x AMD EPYC 9554 (64 cores @ 3.75 GHz)
    • 24x 16 GB RAM (384 GB)
    • ~3.84 TB local disk
    • 2x AMD MI210
    • submit: --gres=gpu:mi210:<n>
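As referenced above, the submit options select the node type. A minimal sketch of a batch script requesting one H100 GPU on the typC node; the job name, CPU count, and executable are placeholders, not site defaults:

    #!/bin/bash
    #SBATCH --job-name=h100-test        # placeholder job name
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:h100:1           # one of the four H100 GPUs
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=16          # adjust to your application
    #SBATCH --time=08:00:00             # default wall time of the gpu partition

    srun ./my_gpu_program               # placeholder executable

Submit the script with sbatch <scriptname>; for the other node types, exchange the submit options according to the list above.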

Queues / Partitions

  • normal
    Priority: multifactor, with the most weight on the group's financial share in the cluster and its consumed resources
    Wall time limit: 8 hours (default), 24 hours (max)
    Restrictions: typA nodes only
  • gpu
    Priority: multifactor, with the most weight on the group's financial share in the cluster and its consumed resources
    Wall time limit: 8 hours (default), 24 hours (max)
    Restrictions: typC - typF nodes only
  • dev
    Priority: multifactor, with the most weight on the group's financial share in the cluster and its consumed resources
    Wall time limit: 15 minutes (default), 90 minutes (max)
    Restrictions:
    • typA nodes only
    • max 2 nodes per user
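A short test run on the dev partition, staying within the limits above, could be submitted with a sketch like the following; the node and task counts and the program name are placeholders:

    #!/bin/bash
    #SBATCH --partition=dev
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=128       # a typA node provides 2x 64 cores
    #SBATCH --time=00:15:00             # default wall time of the dev partition

    srun ./my_program                   # placeholder executable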

Limits

  • shareholders
    • max CPUs per group: unlimited (-1)
    • max submitted jobs per group: 1000
  • confirmed groups
    • max CPUs per group: 1024
    • max submitted jobs per group: 500
  • default
    • max CPUs, all users*¹ together: 1024
    • max submitted jobs, all users*¹ together: 500

*¹: users who are not members of a shareholder group or a confirmed group

Network

  • InfiniBand (100 Gbit/s)
    • 2-level Fat Tree 
  • Ethernet (25 Gbit/s)

User file space (network and local)

  • NFS file system
    • /groups/<org-id>: group directory (only for groups financially involved in the cluster)
    • /home: 10 GB per user
    • /workdir: no soft quota
      • no snapshots, no backup
      • data lifetime 60 days
  • BeeGFS
    • /scratch:
      • only for large MPI-IO or parallel HDF5 workloads
      • no snapshots, no backup, no redundancy
      • data lifetime 10 days
  • Local disk (/tmp; see the staging example after this list):
    • typA/B: ~200 GB
    • typC/D: ~14 TB
    • typE/F: ~3.84 TB
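A common pattern is to stage temporary data on the node-local disk (/tmp) during a job and to copy the results back to /workdir at the end. A minimal sketch, assuming a /workdir/$USER layout and placeholder file names:

    #!/bin/bash
    #SBATCH --partition=normal
    #SBATCH --nodes=1
    #SBATCH --time=08:00:00

    # job-private directory on the node-local disk
    SCRATCH=/tmp/$SLURM_JOB_ID
    mkdir -p "$SCRATCH"

    cp /workdir/$USER/input.dat "$SCRATCH"/     # assumed input location
    cd "$SCRATCH"
    srun ./my_program input.dat                 # placeholder executable

    cp results.dat /workdir/$USER/              # copy results off the local disk
    rm -rf "$SCRATCH"                           # clean up /tmp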

Commissioning & Extension

  • January 2025

Resource Manager & Scheduler

  • Slurm 24.05

Operating system

  • RHEL 9.4 / RockyLinux 9.4
