The growth of low-quality data on the internet leads large language models (LLMs) to absorb undesirable, unsafe, or toxic knowledge. When these models power chatbots, they increase the risk of exposing users to harmful advice or aggressive behavior. Existing toxicity evaluation datasets focus primarily on English and fail to capture multilingual toxicity, compromising the safety of LLMs. Researchers from AI2, in collaboration with CMU, address the challenge of measuring and limiting the toxicity produced by LLMs across multiple languages. The study examines how toxicity varies with the availability of language resources and with design decisions such as model size and alignment method.
Current methods for evaluating toxicity in LLMs are insufficient for capturing multilingual toxicity. The AI2 and CMU researchers introduced PolygloToxicityPrompts, a dataset of 425,000 naturally occurring prompts across 17 languages with varying degrees of toxicity. The dataset aims to provide a more accurate picture of toxicity in LLMs by using prompts extracted from the web and focusing on short, potentially toxic snippets of text. It builds on earlier work such as RealToxicityPrompts but extends that approach to a multilingual setting.
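As a rough illustration (not the authors' exact setup), a minimal sketch of pulling such prompts from the Hugging Face Hub might look like the following; the dataset identifier, configuration name, and column name used here are assumptions made for illustration and may not match the released dataset exactly.

```python
# Minimal sketch: loading multilingual toxicity prompts from the Hugging Face Hub.
# The dataset identifier, configuration name ("ptp-en"), split name, and "text"
# column are assumptions for illustration, not confirmed details from the paper.
from datasets import load_dataset

dataset = load_dataset("ToxicityPrompts/PolygloToxicityPrompts", "ptp-en", split="train")

# Inspect a handful of prompts before feeding them to a model for continuation.
for example in dataset.select(range(3)):
    print(example["text"])
```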
PolygloToxicityPrompts is designed to surface more toxicity in LLMs by focusing on short prompts rather than full comments or conversations, which makes it possible to catch toxic degeneration at the earliest stages of a model's output. The dataset covers multiple languages, addressing the gap left by predominantly English benchmarks. Using PerspectiveAPI to measure the toxicity of prompts and generations, the researchers compute a model's average toxicity across all of its continuations, as sketched below. They found that state-of-the-art multilingual LLMs exhibit the highest toxicity in languages with less high-quality data available, such as Hindi and Czech, and the lowest in languages like Russian and Dutch.
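The following is a minimal sketch of that evaluation loop: score each model continuation with Perspective API's TOXICITY attribute and average the scores. The API key and the continuation strings are placeholders, and the paper's exact scoring protocol (e.g., the number of continuations per prompt) may differ.

```python
# Minimal sketch: average toxicity of a model's continuations via Perspective API.
# API key and continuations are placeholders; this is not the paper's exact harness.
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

def toxicity_score(text: str) -> float:
    """Return Perspective API's TOXICITY probability for a piece of text."""
    request = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=request).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Hypothetical continuations generated by a model for a single prompt.
continuations = ["continuation one ...", "continuation two ..."]

# Average toxicity across all continuations, as described in the article.
avg_toxicity = sum(toxicity_score(c) for c in continuations) / len(continuations)
print(f"Average toxicity: {avg_toxicity:.3f}")
```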
The study also examines the impact of model size and alignment techniques on toxicity. For base LLMs, toxicity increases with model size, suggesting that larger models absorb more toxicity from their training data. Instruction- and preference-tuned models, however, are less toxic than base models. The study further compares PerspectiveAPI, a toxicity detector, with Llama Guard, a safety detector, and concludes that while toxicity and safety are related, they are distinct concepts requiring their own solutions.
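To make the toxicity-versus-safety distinction concrete, here is a hedged sketch of passing a model continuation through Llama Guard using the transformers library, following the usage documented on the model card. The example text is a placeholder and this is not the paper's exact evaluation pipeline.

```python
# Minimal sketch: classifying a model continuation with Llama Guard via transformers.
# The example text is a placeholder; this is not the paper's exact evaluation setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # gated model; requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

chat = [{"role": "user", "content": "Example model continuation to check for safety."}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)

output = model.generate(
    input_ids=input_ids, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id
)
# Llama Guard responds with "safe" or "unsafe" plus any violated category codes.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```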
In conclusion, PolygloToxicityPrompts offers a valuable tool for evaluating and mitigating toxicity in LLMs across multiple languages. The paper's findings highlight the importance of prompt language, model size, and alignment method in addressing toxic degeneration. The dataset can help build more robust models for proactive moderation and multilingual content filtering, contributing to a safer online environment.
Check out the Paper and Dataset. All credit for this research goes to the researchers of this project.
Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in software and data science applications, and she regularly reads about developments in different fields of AI and ML.