User Story: As a researcher, I need to know that I can launch and use an interactive job at any moment of my choosing.
To handle this, we will need to develop a windfall (pre-emptible) partition where users may request an interactive job, and make it the default partition on Open OnDemand (OOD). We can call the partition "ondemand" to match the expectation set by the name "Open OnDemand": that jobs will start when demanded.
Thoughts to consider:
- Prepare the user community for windfall partitions under a buy-in model
- Build a hierarchical Slurm accounting model with actual user relationship information
- Understand and construct account-based QoS
- Understand, test, and implement a pre-emptible partition
- Use the virtual nodes, whose hardware resides on OpenStack, to build an "rcs-sandbox" partition for testing
- Be able to model relationships between hardware and partitions (visualizations would help)
- Reconfigure partitions
- Understand priority calculation
- Decide on a reasonable resource quota maximum
- Name the partition "ondemand"
- Document expectations and responsibilities carefully:
  - How windfall works
  - How pre-emption works
  - How requeuing works
  - What checkpointing is and how it can help longer-running jobs on pre-emptible partitions
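
To make the checkpointing item concrete for the documentation, here is a minimal sketch of a job that saves its progress when asked to stop and resumes from that point when it is requeued. It assumes the scheduler delivers SIGTERM to the job before killing it on pre-emption; whether and how long in advance that happens depends on how the partition is configured, so this should be verified on the "rcs-sandbox" partition. The names `CheckpointedCounter`, `run_as_batch_job`, and `checkpoint.json` are hypothetical and exist only for illustration; the "work" is a simple counter standing in for a real computation.

```python
import json
import os
import signal
import tempfile


class CheckpointedCounter:
    """Toy long-running job: counts up to `total`, resuming from a checkpoint file."""

    def __init__(self, path, total):
        self.path = path
        self.total = total
        self.stop_requested = False
        self.step = self._load()  # resume point (0 if no checkpoint exists)

    def _load(self):
        try:
            with open(self.path) as f:
                return json.load(f)["step"]
        except FileNotFoundError:
            return 0

    def save(self):
        # Write to a temporary file first, then rename it over the old
        # checkpoint, so a kill during the write cannot corrupt the old one.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"step": self.step}, f)
        os.replace(tmp, self.path)

    def request_stop(self, signum=None, frame=None):
        # Usable directly as a signal handler: only sets a flag, so the
        # main loop can finish its current step and save cleanly.
        self.stop_requested = True

    def run(self, on_step=None):
        """Work until done or asked to stop; return True if the job finished."""
        while self.step < self.total and not self.stop_requested:
            self.step += 1  # one unit of "work"
            if on_step is not None:
                on_step(self)
        self.save()
        return self.step >= self.total


def run_as_batch_job(path="checkpoint.json", total=1_000_000):
    """Hypothetical entry point for a job script on a pre-emptible partition."""
    job = CheckpointedCounter(path, total)
    # Assumption: the scheduler sends SIGTERM ahead of the final kill.
    signal.signal(signal.SIGTERM, job.request_stop)
    return job.run()
```

A job script would call `run_as_batch_job()`; if the job is pre-empted and requeued, the next run picks up from the step recorded in the checkpoint file instead of starting over. Real workloads would also checkpoint periodically, since a stop signal is not guaranteed to arrive with enough time to save.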