Interactive SLURM monitoring dashboards

A set of interactive SLURM monitoring dashboards is available for the PIK Cluster.

The dashboards are accessible only from inside the PIK internal network, at the following address: http://se80/

This platform is aimed at serving the following needs:

  1. To assist users in optimizing and improving their workloads based on the data collected by SLURM. This data has always been available to users through command-line utilities, but having it easily accessible may make optimization easier.
  2. To assist IT services by providing more context when investigating incidents related to SLURM jobs on the cluster.
  3. To provide a better overview of the live utilization of the system.
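
For comparison, the command-line route mentioned in item 1 can be sketched as follows. This is a minimal illustration, not the method the dashboards use: it assumes the standard SLURM accounting tool sacct is available on a login node, and the selection of fields and the helper names (build_sacct_command, parse_sacct_output, fetch_job_records) are hypothetical choices made for this example.

```python
import subprocess

# Fields requested from sacct. The selection is an assumption for this sketch;
# sacct offers many more (see "sacct --helpformat").
FIELDS = ["JobID", "JobName", "Elapsed", "MaxRSS", "State"]


def build_sacct_command(job_id):
    """Return the sacct argument list for a single job ID."""
    return [
        "sacct",
        "-j", str(job_id),
        "--format=" + ",".join(FIELDS),
        "--parsable2",  # pipe-delimited output without a trailing delimiter
        "--noheader",   # omit the header row
    ]


def parse_sacct_output(text):
    """Turn pipe-delimited sacct output into a list of dicts keyed by field name."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        values = line.split("|")
        rows.append(dict(zip(FIELDS, values)))
    return rows


def fetch_job_records(job_id):
    """Run sacct (only works where SLURM is installed) and parse its output."""
    result = subprocess.run(
        build_sacct_command(job_id),
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_output(result.stdout)
```

Calling fetch_job_records with one of your own job IDs on a login node would return one record per job step, each carrying the elapsed time, peak memory (MaxRSS) and final state as reported by SLURM accounting.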

Some usage hints you may find useful:

  • For each dashboard panel (or plot), you can click on its title bar and select "View" to get a full-screen view of that panel.
  • For panels with legends, you can click on an individual legend item to view its time series on its own. To select several items, hold down Shift or Ctrl while clicking on the legend entries you are interested in.
  • Aggregation fields in the plot legends (such as min, max, avg and current) are calculated over the selected time span, which is shown in the upper right corner of the dashboard. You can narrow the time interval to a region of interest by zooming into it with the mouse (click, hold and drag). Use your browser's back function to return to the previous time interval. Regular users cannot zoom out beyond a dashboard's default time interval.
  • Each plot has a small letter "i" in its top left corner; moving the mouse over it shows a tooltip with an explanation of the plot.

Feedback is welcome at the cluster-support e-mail address.

