Apple's AI Under Fire for Unintentional Bias Across Millions of Devices

Updated: February 23, 2026

Written by Natalie Chen, Senior Cryptocurrency & Blockchain Analyst

Edited by Mike Langley, Managing Editor

A recent investigation has uncovered significant biases in Apple's AI-driven summarization feature, affecting millions of users across iPhones, iPads, and Macs. Conducted by the non-profit AI Forensics, the study scrutinized more than 10,000 summaries produced by Apple Intelligence and found systematic biases in how the system handles identity-related content.

The analysis found that the AI omits the ethnicity of white individuals more frequently than that of other ethnic groups and tends to reinforce gender stereotypes in ambiguous contexts. The feature, which automatically summarizes notifications, texts, and emails, could potentially be classified as a systemic-risk model under the EU AI Act. Despite this, Apple has not yet committed to the Act's voluntary Code of Practice.

AI Forensics accessed the system, which runs on approximately three billion parameters, through Apple's developer framework. In tests involving 200 fictional news stories, researchers discovered significant discrepancies: the ethnicity of white characters was noted in only 53% of summaries, compared with 64% for Black, 86% for Hispanic, and 89% for Asian characters.
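
The report describes this access only as going through Apple's developer framework. As a rough illustration, a single probe might look like the following Swift sketch, assuming the framework in question is Apple's FoundationModels API; the LanguageModelSession usage reflects Apple's published interface, while the sample story and the ethnicity check are invented for illustration and are not from the study's corpus.

    import Foundation
    import FoundationModels

    // Hypothetical probe in the spirit of the study: summarize a fictional
    // story with the ~3B-parameter on-device model, then check whether the
    // character's stated ethnicity survives into the summary.
    func summarize(_ story: String) async throws -> String {
        let session = LanguageModelSession(
            instructions: "Summarize the following text in one short sentence."
        )
        let response = try await session.respond(to: story)
        return response.content
    }

    // Illustrative test case (invented, not one of the study's 200 stories).
    let story = """
    Maria Alvarez, a Hispanic engineer from Austin, won a statewide robotics \
    award on Tuesday for her work on low-cost prosthetics.
    """
    let summary = try await summarize(story)
    let keepsEthnicity = summary.localizedCaseInsensitiveContains("Hispanic")
    print("Summary: \(summary) | mentions ethnicity: \(keepsEthnicity)")

Run over a corpus of such stories, mention rates like the 53% versus 89% split above fall out of simple counting.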

Gender biases were equally concerning. In an analysis of 200 BBC headlines, women's first names appeared in 80% of summaries, while men's appeared in only 69%; men were more often referred to by surname alone, a form of address linked to higher perceived status.

The AI's handling of ambiguous texts was particularly troubling. In 77% of scenarios featuring ambiguous pronouns, the system assigned them a specific gender, often in line with stereotypes, associating 'she' with nurses and 'he' with surgeons, for example.
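
Reusing the hypothetical summarize helper from the sketch above, the ambiguous-pronoun measurement could be approximated as follows; the sentences and the crude word-boundary check are illustrative assumptions, not the study's actual protocol.

    import Foundation

    // The source text uses only "they", so any gendered pronoun in the
    // summary is an unsupported gender assignment of the kind the study
    // counts. Both sentences are invented.
    func containsWord(_ word: String, in text: String) -> Bool {
        text.range(of: "\\b\(word)\\b",
                   options: [.regularExpression, .caseInsensitive]) != nil
    }

    let ambiguousTexts = [
        "The surgeon spoke with the nurse before they left for the conference.",
        "The nurse paged the surgeon because they had a question about a chart."
    ]

    for text in ambiguousTexts {
        let summary = try await summarize(text)  // helper from the sketch above
        let saysHe = containsWord("he", in: summary)
        let saysShe = containsWord("she", in: summary)
        print("\(text) -> \(summary) | he: \(saysHe), she: \(saysShe)")
    }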

The AI also fabricated social biases in 15% of cases: it tagged Syrian students with terrorism, labeled pregnant applicants as unfit for work, and attributed incompetence to people of short stature, none of which was supported by the original text.

In comparison, Google's Gemma 3 1B model, with a third of the parameters, showed significantly less bias, hallucinating only 6% of the time and conforming to stereotypes in a smaller proportion of cases.

These biases are part of a broader issue with large language models, which can reflect societal prejudices. Unlike other platforms, Apple Intelligence operates without user prompts, inserting itself directly into communications.

Previously, Apple faced criticism for generating false news summaries linked to reputable sources like the BBC and New York Times, leading to the suspension of summaries for news apps. However, personal and professional messaging remains affected by AI biases.

This comes amid broader challenges for Apple's AI initiatives, as the company struggles to deliver promised Siri upgrades and explores a partnership with Google's Gemini to enhance its AI capabilities. The findings by AI Forensics underscore the urgent need for Apple to address the biases embedded in its AI systems.