BZHPC - Bayreuth Centre for High Performance Computing

festus (2024)

General remarks

The cluster "festus" (btrzx24) is available from November 2024 for the groups involved in the procurement. It consists of two management nodes, one virtualization server, two login nodes, several sotrage servers and 73 compute nodes which are connected by an 100G Infiniband Interprocess- and a 25G Servicenetwork. "festus" uses Slurm (24.05) as resource manager. The ITS file server (e.g., the ITS home directory) is not mounted on the cluster for performance reasons.  Every users has a separate home directory (10GB) which lies on the clusters own nfs-server.

Acknowledging Festus / Publications

As with other DFG-funded projects, results must be made available to the general public in an appropriate manner. Publications must include a reference to the DFG funding (a so-called “Funding Acknowledgement”) in the language of the publication, stating the project number.

Whenever festus has been used to produce results that appear in a publication or a poster, we kindly request that the service be cited in the acknowledgements:

Calculations were performed using the festus-cluster of the Bayreuth Centre for High Performance Computing (https://www.bzhpc.uni-bayreuth.de), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 523317330.

The funding acknowledgement is mandatory.

Login

The login nodes of festus are accessible via ssh at festus.hpc.uni-bayreuth.de, but only from university networks. If you are outside the university, a VPN connection is required. If your login shell is (t)csh or ksh, you have to change it to bash or zsh in the ITS self-service portal.
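From within a university network (or with an active VPN connection), a login looks like this, where <username> is a placeholder for your university user ID:

    ssh <username>@festus.hpc.uni-bayreuth.de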

Compute nodes

  • 62x typA (CPU)
    • 2x AMD EPYC 9554 (64 cores @ 3.75 GHz)
    • 24x 16 GB RAM (384 GB)
    • ~200 GB local disk
  • 4x typB (HighMem)
    • 2x AMD EPYC 9684X (96 cores @ 3.42 GHz)
    • 24x 64 GB RAM (1.5 TB)
    • ~200 GB local disk
    • submit: --constraint=HighMem
  • 1x typC (GPU)
    • 2x Intel Xeon Platinum 8480+ (56 cores @ 3.8 GHz)
    • 16x 128 GB RAM (2 TB)
    • ~14 TB local disk
    • 4x NVIDIA H100
    • submit: --partition=gpu --gres=gpu:h100:<n> (see the example batch script after this list)
  • 1x typD (GPU)
    • 2x Intel Xeon Platinum 8480+ (56 cores @ 3.8 GHz)
    • 16x 128 GB RAM (2 TB)
    • ~14 TB local disk
    • 4x AMD MI210
    • submit: --partition=gpu --gres=gpu:mi210:<n>
  • 3x typE (GPU)
    • 2x AMD EPYC 9554 (64 cores @ 3.75 GHz)
    • 24x 16 GB RAM (384 GB)
    • ~3.84 TB local disk
    • 2x NVIDIA L40
    • submit: --gres=gpu:l40:<n>
  • 2x typF (GPU)
    • 2x AMD EPYC 9554 (64 cores @ 3.75 GHz)
    • 24x 16 GB RAM (384 GB)
    • ~3.84 TB local disk
    • 2x AMD MI210
    • submit: --gres=gpu:mi210:<n>
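As referenced above, the submit options select the node type. A minimal sketch of a batch script requesting one H100 GPU on the typC node; the job name, CPU count, and executable are placeholders, not site defaults:

    #!/bin/bash
    #SBATCH --job-name=h100-test        # placeholder job name
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:h100:1           # one of the four H100 GPUs
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=16          # adjust to your application
    #SBATCH --time=08:00:00             # default wall time of the gpu partition

    srun ./my_gpu_program               # placeholder executable

Submit the script with sbatch <scriptname>; for the other node types, exchange the submit options according to the list above.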

Queues / Partitions

  • normal
    Priority: multifactor, with the most weight on the group's financial share in the cluster and its consumed resources
    Wall time limit: 8 hours (default), 24 hours (max)
    Restrictions: typA nodes only
  • gpu
    Priority: multifactor, with the most weight on the group's financial share in the cluster and its consumed resources
    Wall time limit: 8 hours (default), 24 hours (max)
    Restrictions: typC - typF nodes only
  • dev
    Priority: multifactor, with the most weight on the group's financial share in the cluster and its consumed resources
    Wall time limit: 15 minutes (default), 90 minutes (max)
    Restrictions:
    • typA nodes only
    • max 2 nodes per user
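A short test run on the dev partition, staying within the limits above, could be submitted with a sketch like the following; the node and task counts and the program name are placeholders:

    #!/bin/bash
    #SBATCH --partition=dev
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=128       # a typA node provides 2x 64 cores
    #SBATCH --time=00:15:00             # default wall time of the dev partition

    srun ./my_program                   # placeholder executable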

Limits

  • shareholders
    • max CPUs per group: unlimited (-1)
    • max submitted jobs per group: 1000
  • confirmed groups
    • max CPUs per group: 1024
    • max submitted jobs per group: 500
  • default
    • max CPUs, all users*¹ together: 1024
    • max submitted jobs, all users*¹ together: 500

*¹: users who are not members of a shareholder group or a confirmed group

Network

  • InfiniBand (100 Gbit/s)
    • 2-level Fat Tree 
  • Ethernet (25 Gbit/s)

User file space (network and local)

  • NFS file system
    • /groups/<org-id>: group directory (only for groups financially involved in the cluster)
    • /home: 10 GB per user
    • /workdir: no soft quota
      • no snapshots, no backup
      • data lifetime 60 days
  • BeeGFS
    • /scratch:
      • only for large MPI-IO or parallel HDF5 workloads
      • no snapshots, no backup, no redundancy
      • data lifetime 10 days
  • Local disk (/tmp; see the staging example after this list):
    • typA/B: ~200 GB
    • typC/D: ~14 TB
    • typE/F: ~3.84 TB
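A common pattern is to stage temporary data on the node-local disk (/tmp) during a job and to copy the results back to /workdir at the end. A minimal sketch, assuming a /workdir/$USER layout and placeholder file names:

    #!/bin/bash
    #SBATCH --partition=normal
    #SBATCH --nodes=1
    #SBATCH --time=08:00:00

    # job-private directory on the node-local disk
    SCRATCH=/tmp/$SLURM_JOB_ID
    mkdir -p "$SCRATCH"

    cp /workdir/$USER/input.dat "$SCRATCH"/     # assumed input location
    cd "$SCRATCH"
    srun ./my_program input.dat                 # placeholder executable

    cp results.dat /workdir/$USER/              # copy results off the local disk
    rm -rf "$SCRATCH"                           # clean up /tmp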

Commissioning & Extension

  • January 2025

Resource Manager & Scheduler

  • Slurm 24.05

Operating system

  • RHEL 9.4 / RockyLinux 9.4
