Discussion
Christian Meesters
@rupdecat@fediscience.org · 2 weeks ago

@jannem

- Arbitrarily limiting the number of characters you can pass to `--wrap`.
- Overloading `sbatch` in various ways.
- Deviating from the documented GRES behaviour (read: every task/CPU combination regarding GPU reservation); see the sketch at the end of this post.

There is more. But I will eventually make a presentation about it.

Also frequent, albeit not related to code changes: setting up several separate physical clusters instead of one cluster with partitions for the different workloads.
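To make the `--wrap` and GRES points concrete, here is a minimal sketch of the kind of submissions they concern, assuming a stock Slurm installation with `sbatch` on the PATH; the command, script name, and resource values are placeholders, not our site's configuration:

```python
import subprocess

# Submit a one-line command via --wrap; a site-patched Slurm may cap the
# length of the string that --wrap accepts (the command below is a placeholder).
wrap_cmd = "srun hostname"
subprocess.run(
    ["sbatch", "--job-name=wrap-demo", f"--wrap={wrap_cmd}"],
    check=True,
)

# GRES/GPU reservation: the documented flags combine tasks, CPUs and GPUs.
# Whether e.g. 2 tasks with 2 GPUs yields the expected per-task binding is
# exactly the behaviour a site-local patch can change.
subprocess.run(
    [
        "sbatch",
        "--ntasks=2",
        "--cpus-per-task=4",
        "--gres=gpu:2",          # alternatively: --gpus-per-task=1
        "demo_job.sh",           # hypothetical batch script
    ],
    check=True,
)
```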

Alan Sill
@AlanSill@mast.hpc.social replied · 2 weeks ago

@rupdecat @jannem The #HPC community lost a lot when Slurm failed to implement the DRMAA standard. The API and the code that underpins it are messy as a result. But it was popular and free, so it gained a strong foothold. (Grid Engine was the definitive implementation, and for a while all schedulers supported DRMAA and were interoperable for codes that used it.)
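For readers who have not used DRMAA: the point of the standard was a scheduler-neutral submission API. A minimal sketch using the `drmaa` Python binding, assuming a scheduler whose DRMAA C library is installed and discoverable; the command is a placeholder:

```python
import drmaa  # the drmaa-python binding; needs the scheduler's libdrmaa

# The same code runs unchanged against Grid Engine, PBS Pro, or any other
# scheduler with a DRMAA library -- that portability is what the standard bought.
with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = "/bin/hostname"   # placeholder command
    jt.args = []
    job_id = session.runJob(jt)
    info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print(f"job {job_id} exited with status {info.exitStatus}")
    session.deleteJobTemplate(jt)
```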

There is support for multiple clusters within a given Slurm instance, but the clusters have to be configured for these features in advance.
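From the submit side, that multi-cluster support looks roughly like this, assuming the clusters have been registered in a shared slurmdbd; the cluster and script names are placeholders:

```python
import subprocess

# With multi-cluster support configured, --clusters/-M routes a submission to
# a named cluster; listing several lets Slurm pick the one expected to start
# the job earliest. Names below are placeholders.
subprocess.run(
    ["sbatch", "--clusters=gpu_cluster", "train.sh"],
    check=True,
)
subprocess.run(
    ["squeue", "--clusters=gpu_cluster,cpu_cluster"],  # query both clusters
    check=True,
)
```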

Christian Meesters
@rupdecat@fediscience.org replied · 2 weeks ago

@AlanSill

Indeed.

Even though I started to favour SLURM over LSF (and HTCondor), the transition to it (a few years ago) was a nightmare. And losing all compatibility between the systems still has so many repercussions ... too many for a single thread.

We had to remove SLURM's PBS compatibility layer because users held on to it, even though it was never feature complete, nor really usable.

@jannem
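A toy illustration (not Slurm's actual contributed Torque wrappers) of why such a compatibility layer is hard to make feature complete: every PBS option needs an explicit, often lossy, translation to an `sbatch` option, and anything unmapped silently falls through:

```python
import subprocess

# Hypothetical, heavily reduced qsub-to-sbatch option map for illustration only.
PBS_TO_SBATCH = {
    "-N": "--job-name",
    "-q": "--partition",
    "-o": "--output",
    "-e": "--error",
}

def qsub(argv):
    """Translate a small subset of qsub options and hand off to sbatch."""
    out = ["sbatch"]
    it = iter(argv)
    for opt in it:
        if opt in PBS_TO_SBATCH:
            out.append(f"{PBS_TO_SBATCH[opt]}={next(it)}")
        else:
            out.append(opt)  # unmapped options pass through -- or break the job
    return subprocess.run(out, check=True)
```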

Janne Moren
@jannem@fosstodon.org replied · 2 weeks ago

@rupdecat
Can't comment on the other ones without real examples, but the last one is unavoidable, I think. Different clusters are different, and it may not be feasible to have a single Slurm instance covering them all.

Especially when, as in our case, the use cases (GPU vs. CPU) and the upgrade cadences are completely separate. You can still automate running jobs on both, just not through Slurm alone.
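One way such automation can look: a small dispatcher that picks the right login node per job and submits over SSH, with each cluster keeping its own Slurm version and configuration. Hostnames, script names, and the GPU heuristic here are hypothetical:

```python
import subprocess

# Hypothetical login nodes for two independently managed Slurm instances.
LOGIN_NODES = {
    "gpu": "gpu-cluster.example.org",
    "cpu": "cpu-cluster.example.org",
}

def submit(script_path: str, needs_gpu: bool) -> None:
    """Submit a batch script to whichever cluster matches the job type."""
    host = LOGIN_NODES["gpu" if needs_gpu else "cpu"]
    # Only the submission step is shared; the script must exist on the
    # remote side, and each cluster schedules it with its own Slurm.
    subprocess.run(["ssh", host, "sbatch", script_path], check=True)

submit("train_model.sh", needs_gpu=True)   # placeholder script names
submit("preprocess.sh", needs_gpu=False)
```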
