Domyn Swarm now available on GitHub, bringing scalable LLM inference to HPC

Running Large Language Models at scale just got easier: Domyn Swarm, a new open-source toolkit, is now live on GitHub, publicly available for the first time.
Designed for researchers, ML engineers, and HPC users, and built on Python and vLLM among other open-source tools, Domyn Swarm bridges the gap between lightweight scripts and heavyweight inference infrastructure, simplifying everything from model evaluation to synthetic data generation.
With minimal configuration and a single CLI command, users can deploy OpenAI-compatible endpoints to run LLM workloads across HPC clusters like Leonardo or NVIDIA’s DGX Cloud, without the hassle of cluster management or complex setup.
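Because the deployed endpoints speak the OpenAI API, any standard client can talk to them once they are up. The sketch below, using only the Python standard library, shows what a call against such an endpoint looks like; the base URL and model name are placeholders for illustration, not values prescribed by Domyn Swarm, which reports the actual endpoint details at deploy time.

```python
import json
import urllib.request

# Hypothetical values: the real URL and model name come from your own
# Domyn Swarm deployment, not from this article.
BASE_URL = "http://my-swarm-endpoint:8000/v1"

def build_chat_request(prompt: str, model: str) -> bytes:
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }).encode("utf-8")

def chat(prompt: str, model: str = "my-deployed-model") -> str:
    """POST the request to the endpoint and return the assistant's reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=build_chat_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Since the wire format is the standard Chat Completions schema, existing OpenAI client libraries work unchanged by pointing their base URL at the deployed endpoint.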
This toolkit is already proving valuable in real-world workflows. While evaluating LLMs, the Domyn team used Swarm to turn a slow, serial testing process into a parallel pipeline that runs multiple endpoints and baselines simultaneously.
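The article does not detail the team's pipeline, but the general pattern it describes, fanning evaluation prompts out across several endpoints at once instead of querying one after another, can be sketched with standard-library concurrency. The endpoint URLs and the `query` stub below are placeholders, not Domyn Swarm internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder endpoints standing in for independently deployed model servers.
ENDPOINTS = [
    "http://swarm-a:8000/v1",
    "http://swarm-b:8000/v1",
]

def query(endpoint: str, prompt: str) -> str:
    """Stub for an HTTP call to one OpenAI-compatible endpoint."""
    return f"{endpoint} -> answer for {prompt!r}"

def run_eval(prompts: list[str]) -> list[str]:
    """Distribute prompts across endpoints round-robin and run them in parallel."""
    with ThreadPoolExecutor(max_workers=len(ENDPOINTS) * 4) as pool:
        futures = [
            pool.submit(query, ENDPOINTS[i % len(ENDPOINTS)], prompt)
            for i, prompt in enumerate(prompts)
        ]
        # Collect in submission order so results line up with the prompts.
        return [f.result() for f in futures]

results = run_eval(["q1", "q2", "q3", "q4"])
```

Because each endpoint serves requests independently, throughput scales roughly with the number of endpoints deployed, which is what turns a serial evaluation into a parallel one.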
Now openly available on GitHub, Domyn Swarm lets any team spin up endpoints in minutes and run scalable workloads with just a few configuration steps, making LLM deployment faster, simpler, and more accessible in high-performance environments.