Domyn Small: A European 10B Reasoning Language Model
Domyn Small is a compact open-weight reasoning model suited for resource-constrained environments. With lower latency and reduced computational requirements, it offers an ideal compromise between the knowledge and reasoning of a larger model and the efficiency of a smaller one.
Thanks to its open weights, it also provides a versatile foundation for more specialized extensions: by fine-tuning Domyn Small on a specific domain of knowledge, an organization can unlock its full potential while retaining complete ownership and control of the model.
Just like Domyn Large, Domyn Small was built around a core principle of Domyn's LLM training approach: the ability to intervene at any stage of the lifecycle, whether pre-training, mid-training, or post-training, and to apply each stage selectively to produce targeted capabilities.
This made it possible to develop new capabilities without retraining from scratch, supporting the delivery of domain-specific AI for regulated industries operating under strict data sovereignty requirements.
In particular, Domyn Small underwent a comprehensive, multi-stage mid- and post-training pipeline, in which each phase was deliberately built on the previous one.
- First, a Continual Pre-Training stage exposed the model to a large volume of high-quality, technical content to extend its ability to process and reason over longer documents;
- This was followed by a Supervised Fine-Tuning stage that taught the model how to follow instructions across a wide range of tasks, and how to reason step by step when needed;
- A first Reinforcement Learning stage using Group Relative Policy Optimization (GRPO) then improved the model's mathematical reasoning skills by training it to produce correct, verifiable answers (a minimal sketch of GRPO's group-relative scoring follows the list);
- Next, a Direct Preference Optimization (DPO) stage brought the model's responses closer to what humans expect, improving its ability to follow instructions accurately and behave in a natural, helpful way (see the DPO loss sketch after the list);
- Finally, a second Reinforcement Learning stage used multi-environment GRPO to extend this optimization across five distinct task domains simultaneously, making the model more robust and capable across a broad set of real-world use cases.
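To make the GRPO stages more concrete: GRPO dispenses with a learned value model and instead scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that group-relative advantage computation; the binary correctness reward and the group size are illustrative assumptions, not Domyn's actual reward design.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Standardize each response's reward against the other responses
    # sampled for the same prompt; `rewards` has shape
    # (num_prompts, group_size).
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

# Toy example: one prompt, four sampled answers scored by an automatic
# checker (1.0 if the final answer is verifiably correct, else 0.0 --
# an illustrative reward, not Domyn's exact one).
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))  # correct answers get positive advantage
```

These advantages then weight a policy-gradient update, so the model is pushed toward the responses that outperformed their own group, with no separate critic network to train.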
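The DPO stage, by contrast, needs no reward signal at all: it optimizes pairwise human preferences directly against a frozen reference model. Below is the standard DPO loss (Rafailov et al., 2023) as a minimal sketch; the beta value and the toy log-probabilities are illustrative, not Domyn's settings.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Reward the policy for widening its log-prob margin between the
    # human-preferred and the rejected response, measured relative to
    # a frozen reference model that anchors the optimization.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy sequence log-probs: the policy already slightly prefers the
# chosen answer more than the reference does.
loss = dpo_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
                torch.tensor([-6.0]), torch.tensor([-6.5]))
print(loss)  # positive loss that shrinks as the preference margin grows
```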
Together, these five stages reflect a deliberate, end-to-end approach to mid- and post-training that prioritizes reasoning depth, behavioral alignment, and deployment efficiency.
The result is a model that thinks more carefully, handles complexity at scale, and strikes a favorable balance between token efficiency (fewer tokens required per task) and accuracy on reasoning tasks, delivering lower cost and latency in production without sacrificing capability.
Moreover, the pipeline was supported throughout by Domyn Swarm, a framework developed in-house for scalable LLM inference on HPC clusters and released as open source to support the community with synthetic data generation and large-scale model evaluation.
Co-authored by Alberto Veneri, Alessandro Rognoni, Andrea Valenti, Dario Salvati, Federico D’Ambrosio, Francesco Bertolotti, Martin Cimmino, Michele Resta, Nicolò Ruggeri, Simone Angarano, this work reflects Domyn's commitment to enabling enterprises and research institutions across Europe and beyond to build AI systems they can own, govern, and trust.
Want to know more about Domyn Small? Find out in the dedicated paper.