December 4, 2025
·
2 mins

Uncovering Representation Bias for Investment Decisions in Open-Source Large Language Models


From investment research to credit risk and portfolio construction, Large Language Models (LLMs) are moving deeper into financial workflows. Yet, often unbeknownst to the companies deploying them, the neutrality of these models is far from guaranteed: biases can emerge as early as the pre-training stage and carry through to the models' outputs. When left unaddressed, these biases can significantly influence performance and decision-making.

Domyn’s research team examined this phenomenon in a dedicated research paper on representation bias in open-source Qwen LLMs applied to financial investment decisions. Their conclusion is clear: the models show structural biases related to firm size, sector, and visibility.

The research evaluated around 150 U.S.-listed firms across an eight-year window (2017-2024), assembling a standardized dataset of valuation ratios, profitability metrics, risk factors, growth indicators, technical signals, and more. The team then used a “balanced round-robin prompting” method: for every possible pair of firms, the model was asked “which is the better company to invest in?”, multiple times and with prompt categories, orders, and repetitions shuffled. Using the token probabilities behind each response, they constructed firm-level “confidence scores” and analyzed how these scores related to financial attributes, sector classifications, and real-world metrics. This framework not only identified whether the LLM favored one firm over another, but also delved into the reasons behind that partiality.
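The pairwise scoring idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function and firm names are assumptions, and the mock preference function stands in for the real LLM call, which would return the token probability the model assigns to picking one firm over the other.

```python
import itertools
import random
from collections import defaultdict

def mock_pair_preference(firm_a: str, firm_b: str) -> float:
    """Hypothetical stand-in for the LLM call. In the paper's setup this
    would be the token probability the model assigns to choosing firm_a
    over firm_b when asked which is the better company to invest in."""
    # Deterministic pseudo-probability so the sketch is reproducible.
    return random.Random(firm_a + "|" + firm_b).uniform(0.3, 0.7)

def round_robin_confidence(firms, n_repeats=3):
    """Balanced round-robin scoring: every ordered pair is queried
    n_repeats times; covering both (a, b) and (b, a) orders helps cancel
    position bias in the prompt. Returns a mean confidence per firm."""
    scores = defaultdict(list)
    for a, b in itertools.permutations(firms, 2):
        for _ in range(n_repeats):
            p = mock_pair_preference(a, b)  # P(a preferred over b)
            scores[a].append(p)
            scores[b].append(1.0 - p)
    return {f: sum(v) / len(v) for f, v in scores.items()}

confidence = round_robin_confidence(["FirmA", "FirmB", "FirmC"])
```

Averaging over both prompt orders and repeated shuffled prompts, as the paper describes, reduces the chance that a firm's score reflects where it appeared in the prompt rather than how the model actually ranks it.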

Here’s what was found: the strongest predictors of high LLM confidence were not fundamental signals such as profitability, growth, or technical performance, but proxies for size, prominence, and visibility. Attributes such as market capitalization, enterprise value, and free cash flow consistently boosted model confidence, while risk factors tended to decrease it. The analysis also revealed industry-level biases: belonging to certain industries accounted for a large share of the variation in confidence, sometimes even more than sector classification. This suggests that the models’ pre-training data may encode industry stereotypes rather than supporting evaluation of firms solely on their financial fundamentals.
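One simple way to probe which attributes drive confidence, in the spirit of the analysis above, is an ordinary least-squares regression of confidence scores on standardized firm attributes. The sketch below uses synthetic data whose coefficients mimic the reported pattern (size boosts confidence, risk depresses it); the variable names and magnitudes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150  # roughly the number of firms in the study

# Standardized firm attributes (synthetic; names are assumptions).
market_cap = rng.normal(size=n)      # size/visibility proxy
profitability = rng.normal(size=n)   # fundamental signal
risk = rng.normal(size=n)            # risk factor

# Synthetic confidence mimicking the reported pattern:
# size helps a lot, risk hurts, profitability barely matters.
confidence = (0.5 + 0.08 * market_cap + 0.01 * profitability
              - 0.05 * risk + rng.normal(scale=0.02, size=n))

# OLS fit: coefficients recover which attributes move confidence.
X = np.column_stack([np.ones(n), market_cap, profitability, risk])
coef, *_ = np.linalg.lstsq(X, confidence, rcond=None)
# coef[1] (size) dominates coef[2] (profitability); coef[3] (risk) is negative
```

Run on real confidence scores and real attributes, the same regression would flag whether a model rewards visibility over fundamentals, which is exactly the kind of audit the paper argues for.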

Co-authored by Fabrizio Dimino, Krati Saxena, Bhaskarjit Sarmah, and Stefano Pasquali, the research’s main takeaway for decision-makers is simple: off-the-shelf LLM outputs in finance are not neutral, and their deployment requires ongoing sector-aware calibration, auditing, and a rigorous compliance setup. What does this mean moving forward? Find out in the dedicated paper.
