
Running an HPC Cluster for a Research Group

Ddebwashis
Apr 14, 2025 · 7 min read

Lessons from managing a SLURM cluster and containerized ML workloads for a research group.

What a small lab actually needs

Fair-share scheduling, reasonable defaults, and a container story that doesn't require sudo.
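To make the fair-share idea concrete, here is a minimal Python sketch of the classic fair-share factor described in SLURM's multifactor priority documentation, F = 2^(-(U/S)/d), where U is an account's fraction of recent usage, S is its fraction of allocated shares, and d is a damping factor. The account names and numbers are hypothetical, and a real cluster may be configured with a different algorithm (SLURM also ships Fair Tree), so treat this as an illustration of the intuition rather than a reproduction of what the scheduler computes.

```python
def fair_share_factor(usage_fraction: float, share_fraction: float,
                      damping: float = 1.0) -> float:
    """Classic fair-share factor: F = 2 ** (-(U / S) / d).

    Returns 1.0 for an account with no recent usage and 0.5 for an
    account that has used exactly its share (with damping = 1).
    """
    if damping <= 0:
        raise ValueError("damping must be positive")
    if share_fraction <= 0:
        # An account with no shares gets no fair-share priority.
        return 0.0
    return 2.0 ** (-(usage_fraction / share_fraction) / damping)


# Hypothetical lab: three users with equal shares but unequal recent usage.
shares = {"alice": 1 / 3, "bob": 1 / 3, "carol": 1 / 3}
usage = {"alice": 0.70, "bob": 0.25, "carol": 0.05}

factors = {user: fair_share_factor(usage[user], shares[user]) for user in shares}

# Users who have consumed less than their share rank first.
ranking = sorted(factors, key=factors.get, reverse=True)

if __name__ == "__main__":
    for user in ranking:
        print(f"{user}: {factors[user]:.3f}")
```

The light user ends up ahead of the heavy user in the queue, which is the behavior a small lab usually wants: nobody is blocked outright, but whoever has used the cluster least recently gets to go first.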

Tools

SLURM for scheduling, Podman for rootless containers, and Ansible to keep node configuration from drifting.
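As a rough sketch of how the scheduler and the container runtime can fit together, the Python helper below renders a SLURM batch script that runs a command inside a rootless Podman container. The function name, the default resource values, and the image name are assumptions made for illustration; they are not taken from the cluster described here. The `#SBATCH` options used (`--job-name`, `--cpus-per-task`, `--mem`, `--time`) and `podman run --rm` are standard, but any site-specific setup such as GPU access or bind mounts is left out.

```python
import shlex


def render_batch_script(image: str, command: list[str], job_name: str = "train",
                        cpus: int = 4, mem_gb: int = 16,
                        time_limit: str = "02:00:00") -> str:
    """Render an sbatch script that runs `command` in a Podman container.

    The defaults are illustrative "reasonable defaults", not recommendations.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        "",
        # Podman runs as the submitting user, so no sudo is involved.
        f"podman run --rm {shlex.quote(image)} {shlex.join(command)}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical image and training command.
script = render_batch_script(
    image="registry.example.org/lab/pytorch:latest",
    command=["python", "train.py", "--epochs", "10"],
)

if __name__ == "__main__":
    print(script)
```

Saving the output to a file and submitting it with `sbatch` gives users a single entry point with sane limits already filled in, while still letting them override CPU, memory, and time per job.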

Data science work is part craft and part discipline: the best models are simple, well-validated, and easy to explain. This post walks through the intuition, just enough math to be useful, and a clean implementation you can drop into your own pipeline.

