Is LLM Bias Language Specific?


What are Large Language Models (LLMs)?

Large language models (LLMs) such as OpenAI’s GPT-4, Meta’s Llama 2, and xAI’s upcoming Grok have become ubiquitous tools in today’s society. Essentially, these LLMs are trained on a humongous corpus of publicly available data.

As such, LLMs learn language patterns and excel at interpreting and answering complex queries or prompts, skills often required by applications such as chatbots, writing assistants, or content creators.

Though it might seem that LLMs have a good semantic understanding of the data they process, closer inspection reveals that they merely replicate or extrapolate the patterns they have learned. Hence, their reasoning capabilities are rather limited or even non-existent, and they are prone to exhibiting undesired behavior such as societal biases.

Bias in language models

Contemporary techniques introduced to mitigate LLM-generated societal biases are unfortunately not yet mature. OpenAI uses reinforcement learning from human feedback to keep ChatGPT from perpetuating bias.

However, this is insufficient, since ChatGPT still exhibits several societal biases in areas such as gender, race, recency (putting more emphasis on recent events), and availability (prioritizing information that is more easily recalled, emphasizing well-known examples).

Is LLM bias language specific?

Several techniques exist that aim to mitigate bias. In our research, we considered two types of debiasing techniques: projection-based techniques (sketched below) and techniques involving additional training. One example of the latter is counterfactual data augmentation (CDA).
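As a rough illustration of the projection-based idea (a minimal sketch with hypothetical inputs, not the exact methods evaluated in our paper): a bias direction is estimated in embedding space from pairs of attribute words, and each embedding’s component along that direction is removed.

```python
import numpy as np

def debias_by_projection(embeddings, attribute_pairs):
    """Remove the estimated bias direction from word embeddings.

    embeddings: dict mapping word -> 1-D numpy vector (hypothetical input)
    attribute_pairs: list of (word_a, word_b) pairs defining the bias axis,
                     e.g. [("he", "she"), ("man", "woman")]
    """
    # Estimate the bias direction as the average difference vector
    # between the paired attribute words, then normalize it.
    diffs = [embeddings[a] - embeddings[b] for a, b in attribute_pairs]
    bias_dir = np.mean(diffs, axis=0)
    bias_dir /= np.linalg.norm(bias_dir)

    # Project every embedding onto the orthogonal complement of the
    # bias direction, i.e. subtract its component along that direction.
    return {w: v - np.dot(v, bias_dir) * bias_dir for w, v in embeddings.items()}
```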

In CDA, an additional training dataset, typically obtained from Wikipedia, is augmented using a list of attribute words in such a way that the resulting dataset is balanced with respect to the defined attribute list.

For the gender bias category, for example, the attribute list can contain words such as he, she, and they. For every sentence found in the Wikipedia text where one of these words occurs, this sentence is duplicated with the other attribute words: if ‘she is happy’ is a sentence, ‘he is happy’ and ‘they are happy’ are added to the dataset.
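A minimal sketch of how this augmentation could look in code (the attribute list and function below are our own simplified illustration; a real CDA pipeline would also handle verb agreement such as “they are” versus “she is”, morphology, and named entities):

```python
def augment(sentence, attribute_sets):
    """Return the sentence plus counterfactual variants with attribute words swapped."""
    tokens = sentence.lower().split()
    variants = {" ".join(tokens)}
    for group in attribute_sets:
        for word in group:
            if word in tokens:
                for other in group:
                    if other != word:
                        swapped = [other if t == word else t for t in tokens]
                        variants.add(" ".join(swapped))
    return sorted(variants)

# Toy attribute list for the gender category (agreement with "they" ignored here).
attribute_sets = [("he", "she"), ("him", "her")]
print(augment("She is happy", attribute_sets))
# -> ['he is happy', 'she is happy']
```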

While debiasing techniques have been researched in depth for monolingual models in one language (mostly English), multilingual models have received less attention, and languages other than English are often less investigated.

However, as many applied LLMs are multilingual, more research is necessary into the specifics of debiasing them, with special attention to non-English languages.

A key question then emerges: do these debiasing techniques still work in a multilingual context and for lower-resource languages (such as Icelandic or Catalan)?

More specifically, is LLM bias language-specific?

We investigated this using mBERT as a multilingual LLM.

Our research

We investigated four different languages: English, French, German, and Dutch. For every language, we debiased the model and then evaluated the bias in each of the four languages. These four languages represent a diverse group, belonging to two different language families (Germanic and Romance) and accounting for different amounts of initial training data for mBERT.

Moreover, some languages mark grammatical gender more strongly (e.g., French and German) than others (e.g., English and Dutch).

Our research looks into three types of bias: gender, race, and religion. We measure bias using the CrowS-Pairs dataset, a publicly available dataset containing stereotypes against protected demographic groups in the United States. Only English and French variants of this dataset exist, so we translated samples into Dutch and German for our experiments.

The dataset contains sentence pairs consisting of a more stereotyped and a less stereotyped variant. An ideal (unbiased) model would be equally likely to prefer either variant.

When measuring how often the model prefers the stereotyped variant, the optimal value is 50%. Our bias score measures the deviation from this optimum.
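A simplified sketch of how such a score could be computed with mBERT (bert-base-multilingual-cased via Hugging Face Transformers). The pseudo-log-likelihood below masks one token at a time over the whole sentence, which is a simplification of the official CrowS-Pairs metric (that metric scores only the tokens shared by both sentences), and the bias score as absolute deviation from 50% follows the description above:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def pseudo_log_likelihood(sentence):
    """Sum of log-probabilities of each token, masked one position at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for pos in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        total += torch.log_softmax(logits, dim=-1)[ids[pos]].item()
    return total

def bias_score(pairs):
    """pairs: list of (more_stereotyped, less_stereotyped) sentences.
    Returns the absolute deviation (in percentage points) from the ideal 50%."""
    prefer_stereo = sum(
        pseudo_log_likelihood(stereo) > pseudo_log_likelihood(anti)
        for stereo, anti in pairs
    )
    return abs(100.0 * prefer_stereo / len(pairs) - 50.0)
```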

We find that cross-lingual debiasing works surprisingly well. More specifically, debiasing in one language and measuring the bias for another language gives desirable results as shown in the figure below. The figure shows the bias scores for the different debiasing and evaluation language combinations (EN = English, FR = French, DE = German, NL = Dutch).

Base refers to the base mBERT model without any debiasing. When averaging the bias scores over the techniques we used, we find that the bias score after debiasing is lower than before (base) for all evaluation languages, except for English.

English initially already had a relatively low bias score. Therefore, debiasing does not improve the bias score but rather overcompensates in several experiments. In other words, before debiasing, the model already behaved close to an optimal model, giving roughly equal preference to the more and less stereotyped sentences.

In those cases, debiasing often had the reverse effect, introducing new bias in the opposite direction, which is undesirable. For the other evaluation languages, however, we see clear signs of bias removal.

This is a very important insight, as this means that the applied techniques can be used for debiasing multilingual models as well. For more information on the implementation of these techniques in a multilingual context, we refer to our paper.

To conclude, our results indicate that LLM bias is not language-specific, and cross-lingual debiasing makes sense! For practitioners and digital strategy teams, we believe these insights are key to first understanding bias and then removing it from LLMs, which further contributes to their successful implementation and use for decision-making.

Further research is desirable to extend our findings to other languages and types of bias.

About the Authors:

Professor Bart Baesens is a professor of Data Science at KU Leuven and a lecturer at the University of Southampton. He has co-authored more than 250 scientific papers and 10 books. Baesens received the OR Society’s Goodeve Medal for best JORS paper in 2016 and the EURO 2014 and EURO 2017 awards for best EJOR paper. He is listed in the top 2% of Stanford University’s Database of Top Scientists in the World. He was named one of the leading academic data leaders in 2023 by CDO Magazine. He is also the founder of the BlueCourses online learning platform.

Manon Reusens is a PhD student at the Faculty of Business and Economics of KU Leuven. Her research focuses on different applications of natural language processing, with a special interest in investigating societal biases in large language models.
