Content Moderation in Under-Resourced Regions: A Case for Arabic and Hebrew

This blog post is part of a broader research project on online content moderation in under-resourced regions and languages.

Multiple media outlets have highlighted the challenges major U.S. social media platforms face in handling the surge of misinformation, disinformation and graphic violence on their services related to the violence in Israel and Palestine. Civil society groups and misinformation researchers have extensively documented instances of deliberate sharing of conspiracy theories, outdated videos and false claims about what is happening in the Gaza Strip. In response to the global outcry for action, platforms have issued statements on the events, albeit with varying degrees of detail. It is evident that their capabilities are being tested to unprecedented limits, in large part because historical underinvestment in specific languages and regions undermines real-time crisis content moderation.

See statements: Meta, X (formerly Twitter), YouTube, Telegram and TikTok

Much of each platform’s ability to effectively moderate content during the current crisis hinges on two factors: the regionalization of its global content policies and sustained investment in under-resourced languages such as Hebrew and Arabic. A holistic content moderation framework that is contextually and linguistically responsive encompasses, among other things, automated moderation across dialects and harmful content categories (terrorism, hate speech, graphic violence, etc.), human moderators with language capabilities and cultural context, fact-checking networks, civil society groups, media literacy, internal staffing and user-facing controls with meaningful accountability and transparency. In every one of these aspects, major social media platforms have consistently under-resourced the Middle East compared to the U.S. and dominant markets in the EU.

This essay critically examines each platform’s response, drawing on well-documented evidence of their efforts in the region, and identifies structural gaps that underpin their approach to content moderation. While language capabilities are pivotal, we also highlight, at a high level, several language-agnostic limitations in policies, enforcement, processes and products that are compounded by historical under-resourcing. We find that inadequate resource allocation in content moderation exacerbates conflict, a finding we plan to expand on in the final research report.

In the wake of the tech layoffs, which affected both internal content moderation teams and outsourced human reviewers, major social media platforms are relying more heavily on tools for automated enforcement, such as classifiers. According to the Israel-Palestine Due Diligence report commissioned by Meta and published in September 2022, the company committed to implementing the independent report’s recommendation to improve classifiers and language detection for both Hebrew and Arabic.

Because classifiers need to be trained and updated to remain effective, both Arabic and Hebrew classifiers suffer from under-resourcedness, albeit for different reasons. While Arabic is widely spoken (380 million people), it is not a homogenous language: Arabic dialects fall into six geographical groups, and classifiers need to be trained across those dialects to be effective. Hebrew is spoken by 15 million people, and Meta indicated in the 2022 report that while it had recently launched a Hebrew classifier, it is more challenging to maintain accurate classifiers in less widely spoken languages, because large amounts of language data are needed to train them. Publicly available analyses of both Arabic and Hebrew moderation, as well as comments from the Oversight Board, have repeatedly shown egregious gaps in content moderation.
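To make the dialect point concrete, here is a minimal sketch, not any platform’s actual pipeline, of why an aggregate accuracy number can hide dialect-level gaps. It uses scikit-learn, tiny English placeholder strings standing in for real Arabic-script data, and hypothetical dialect labels; the skew toward one dialect group in the training set mimics under-resourcing.

```python
# Minimal sketch: evaluate a toy harmful-content classifier per dialect group.
# The data, labels and dialect names below are illustrative assumptions only.
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical labelled examples: (text, is_harmful, dialect_group).
# Training data is heavily skewed toward one dialect group.
train = [
    ("threat phrase variant one", 1, "gulf"),
    ("benign phrase variant one", 0, "gulf"),
    ("threat phrase variant two", 1, "gulf"),
    ("benign phrase variant two", 0, "gulf"),
    ("threat phrase variant three", 1, "levantine"),
    ("benign phrase variant three", 0, "levantine"),
]
test = [
    ("threat phrase variant four", 1, "gulf"),
    ("benign phrase variant four", 0, "gulf"),
    ("harmful slang with different wording", 1, "maghrebi"),
    ("harmless slang with different wording", 0, "maghrebi"),
]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform([text for text, _, _ in train])
clf = LogisticRegression().fit(X_train, [label for _, label, _ in train])

# Score each dialect group separately rather than reporting one overall number.
by_dialect = defaultdict(list)
for text, label, dialect in test:
    by_dialect[dialect].append((text, label))

for dialect, rows in by_dialect.items():
    X = vectorizer.transform([text for text, _ in rows])
    acc = accuracy_score([label for _, label in rows], clf.predict(X))
    print(f"{dialect}: accuracy {acc:.2f} on {len(rows)} examples")
```

On data like this, the under-represented dialect group tends to score markedly worse, which is the gap a single platform-wide accuracy figure would conceal.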

The same human rights report, which preceded the tech layoffs, stated that Meta was assessing the feasibility of “hiring more content reviewers with diverse dialect and language capabilities” to address the aforementioned structural gaps.

Enforcement of content about proscribed groups and individuals is likely to be inconsistent, especially for under-resourced languages. In line with the U.S. Foreign Terrorist Organizations list, Hamas is a proscribed entity for Meta, YouTube and TikTok, which in broad brush strokes means that no one affiliated with the group is allowed to use their platforms and no content created by Hamas can be uploaded (see policies: Meta’s Dangerous Organizations and Individuals (DOI); YouTube’s Violent Extremist Organizations). However, the treatment of proscribed entities and persons can vary between, and even within, platforms, which poses significant risks of cross-platform and cross-product sharing and amplification. In practice, it is extremely challenging to differentiate between content that supports a proscribed entity and content that is critical of it, a challenge exacerbated by the absence of adequate linguistic capabilities in the middle of a rapidly evolving crisis. In other words, it is very likely that platforms adopting a more cautious approach will be inclined to over-enforce, often ignoring cultural and contextual cues, and thereby disproportionately removing content that condemns terrorist actions.

In 2020, the Oversight Board published an assessment of Meta’s DOI policy that found contextual cues, such as other users’ responses, the uploader’s location and other factors, were not taken into consideration when enforcing the policy, resulting in over-enforcement and suppression of legitimate political commentary. In some cases, accounts reporting news or sharing political commentary are disabled under these policies.

User-focused appeals systems are historically broken. Given the significant volumes of reported content, it is reasonable to expect false positives. Ordinarily, impacted individuals can appeal to restore incorrectly removed content; however, for most social media platforms, user-facing systems are not adequately resourced (staffing, product maintenance and language) and cannot handle high volumes. It is therefore likely that a large number of users in conflict areas will experience inaccurate disabling of their accounts or removal of their content, leading to harsh penalties, or conversely non-enforcement of the policies, without a transparent and responsive explanation of, or recourse against, the action taken.

Downranking content in the newsfeed can disproportionately exclude political commentary. The volume and range of harmful content mean some platforms are opting to reduce the visibility of, or avoid recommending, potentially borderline harmful content. As outlined above, with over- and under-enforcement being serious concerns for Arabic and Hebrew respectively, there is likely to be inconsistency in how the algorithm determines which content does or does not show up in people’s feeds. If numerous users express anger as a form of solidarity with political commentary that condemns terror, how would automation assess whether this is a positive or negative reaction and determine the post’s ranking? If recommended content is manually tweaked using linguistic signals in an under-resourced environment, the risk that non-contextualized signals result in over- or under-demotion is very high. This means fewer people will see the affected content.
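The following sketch illustrates that risk. It is a hypothetical demotion heuristic, not any platform’s actual ranking logic: the post contents, reaction counts and classifier score are invented, and the weighting is arbitrary. The point is that a signal built only from reaction volume and a borderline classifier score cannot tell solidarity from endorsement.

```python
# Hypothetical illustration of a non-contextual demotion signal treating
# two very different posts identically. All values below are invented.
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    anger_reactions: int
    total_reactions: int
    borderline_score: float  # 0..1, from an (imperfect) content classifier


def naive_demotion_score(post: Post) -> float:
    """Demote posts with a high share of 'anger' reactions and a borderline
    classifier score. Context (solidarity vs. endorsement) is invisible here."""
    anger_ratio = post.anger_reactions / max(post.total_reactions, 1)
    return 0.5 * anger_ratio + 0.5 * post.borderline_score


condemnation = Post("commentary condemning an attack", 900, 1000, 0.55)
glorification = Post("post praising the same attack", 900, 1000, 0.55)

# Both posts receive the same demotion score, so legitimate political
# commentary is suppressed as aggressively as genuinely harmful content.
print(naive_demotion_score(condemnation), naive_demotion_score(glorification))
```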

In 2021, Instagram introduced transparency measures to let users know if their account has been restricted; however, it does not provide sufficient information on the specific violations that led to the restriction. Meta provides general information on how ranking typically works and on account restrictions, but little actionable insight for users to act on. While Instagram provides an option to appeal, other platforms do not.

Hateful narratives, not only targeted attacks, pose serious risks of exacerbating violence and misinformation. Policies addressing hate speech and incitement to violence typically hinge on a targeted attack against individuals or groups belonging to protected categories, such as religious affiliation or national origin. The policies do not apply the same standards to speech that attacks a concept or ideology, or that shares false narratives about a specific religious, ethnic or national belief or practice, which is typical during a crisis across deeply rooted ethno-religious and political divides. Hateful narratives about a group’s beliefs, ideologies or practices that do not specify “a target” can similarly exacerbate harm against the communities who hold them. This gap was extensively documented by an international fact-finding mission in Myanmar, which found that inflammatory content on Facebook exacerbated hostility and violence against historically marginalized groups.

There is an uneven ratio of fact-checkers to misinformation. Meta’s newsroom post indicated there are three fact-checkers with Arabic and Hebrew capabilities, while public reporting shows an alarming volume of misinformation. Other platforms have not yet published information about their fact-checking resourcing. A handful of fact-checkers cannot realistically check every piece of misinformation across multiple dialects in an extremely volatile environment. There is no publicly available methodology on how misinformation is ranked for fact-checkers, or on how prioritization works when multiple pieces of harmful misinformation circulate on a platform within a very short span of time. In practice, the disproportionate burden on fact-checkers impacts the speed, scale and quality of debunking.
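Since no platform has published its prioritization methodology, the sketch below is purely illustrative: it assumes a capacity-limited priority queue, an invented per-reviewer throughput and an invented volume of flagged claims. Whatever ranking heuristic is used, the arithmetic is the same: a handful of reviewers leaves the vast majority of flagged claims unchecked each day.

```python
# Purely illustrative capacity model; throughput and claim volume are assumptions.
import heapq
import random

random.seed(0)

FACT_CHECKERS = 3          # per Meta's newsroom figure for Arabic and Hebrew
REVIEWS_PER_DAY_EACH = 10  # assumed throughput, for illustration only

# Simulated flagged claims, each with an assumed priority score
# (e.g. predicted harm weighted by reach). Negated for a max-priority heap.
claims = [(-random.random(), f"claim-{i}") for i in range(5000)]
heapq.heapify(claims)

capacity = FACT_CHECKERS * REVIEWS_PER_DAY_EACH
reviewed = [heapq.heappop(claims)[1] for _ in range(capacity)]

print(f"Reviewed {len(reviewed)} of {len(reviewed) + len(claims)} flagged claims; "
      f"{len(claims)} remain unchecked today.")
```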

Unilaterally applied content policies pose disproportionate risks to marginalized groups. Content policies disregard the power dynamics between and within groups, which results in disproportionate harms to people living on the margins. An independent assessment found that a unilateral application of hate speech policies, one that performs well on average, performed poorly on subcategories of content where an incorrect decision has a pronounced impact on historically marginalized and oppressed groups. Applying a monochromatic lens to a crisis can undermine both speech and safety. The humanitarian crisis in the region warrants additional measures from social media platforms that promote expression while safeguarding privacy and protecting vulnerable groups, such as women, children and front-liners, against incitement to violence and doxxing. This includes journalists and activists, including those in the diaspora, who are often excessively targeted and doxxed by bad actors, targeting that could lead to deaths.

Hateful content has a contagion effect in other parts of the world that can result in offline violence. The global architecture of social media platforms means that even if the prevalence of hateful content is mitigated in one geographic location, the same content circulates and amplifies hate across national borders and platforms. Content moderation during a crisis demands a holistic, worldwide and cross-industry approach, situated in contextual cues, to reduce the risk of exacerbating violence. In many instances, redistribution of internal resources inside companies comes at the cost of the same crisis proliferating conspiracy theories, misinformation and incitement to violence in other parts of the world, as evident in Pakistan, India and Turkey, all of which are comparably under-resourced regions.

Photo credit: Ad Age