AI’s Double-Edged Impact on Trust & Safety and Other High-Pressure Roles

October 6, 2025

Automation is Reshaping High-Pressure Roles

AI and automation are increasingly handling tasks once done by humans in Trust & Safety, fraud prevention, cybersecurity, and customer support. Most content moderation decisions are now made by machines, with algorithms automatically deciding what to remove or escalate, a trend that is “only set to accelerate”. Major tech companies have even cut Trust & Safety staff in favor of AI tools. This shift is echoed in adjacent fields:

Cybersecurity

Entry-level cybersecurity tasks (like monitoring and triaging alerts) are being automated. Job postings for junior security analyst roles fell by over 50% from 2022 to 2023 as companies deploy AI for routine threat detection. Organizations are under pressure to halve cybersecurity headcount with AI while maintaining defense levels. In practice, AI is taking over repetitive work (e.g. sorting security alerts), freeing human analysts to focus on higher-level strategy and complex incidents.
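
To make that division of labor concrete, here is a minimal, illustrative sketch (not any particular vendor's product) of automated alert triage: low-severity alerts that match previously reviewed benign signatures are auto-closed, while everything severe or unfamiliar still lands in an analyst's queue. The `Alert` fields and thresholds are assumptions for illustration only.

```python
# Illustrative sketch of automated security-alert triage: auto-close routine,
# previously-reviewed noise and escalate everything else to human analysts.
from dataclasses import dataclass

@dataclass
class Alert:
    source: str       # e.g. "EDR" or "SIEM correlation rule"
    severity: int     # 1 (low) to 5 (critical)
    signature: str    # rule or detection name
    asset: str        # affected host or account

def triage(alerts: list[Alert], seen_benign: set[str]) -> dict[str, list[Alert]]:
    """Split alerts into auto-closed noise and a queue for human analysts."""
    queues: dict[str, list[Alert]] = {"auto_closed": [], "analyst_queue": []}
    for alert in alerts:
        # Low-severity alerts matching known-benign signatures are closed
        # automatically; anything severe or unfamiliar still goes to a human.
        if alert.severity <= 2 and alert.signature in seen_benign:
            queues["auto_closed"].append(alert)
        else:
            queues["analyst_queue"].append(alert)
    return queues
```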

Fraud Detection

Banks and fintech firms use AI to scan transactions and detect anomalies in real time. AI systems now flag suspicious activity and even generate fraud diagnoses automatically, handing off cases to human investigators as needed. Some banks employ large language models (LLMs) to catch phishing or payment scams. This automation speeds up fraud response, but fraud teams must now oversee AI systems and handle escalations rather than manually reviewing every transaction.
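
As a hedged sketch of that hand-off pattern, the snippet below scores each transaction against the customer's own history and routes it to auto-clearance, a human review queue, or immediate escalation. The scoring rule and thresholds are simplified stand-ins for a production fraud model, not a description of any bank's system.

```python
# Minimal sketch of an automated fraud pipeline: score each transaction,
# hand high-risk cases to a human investigator, auto-clear only the routine.
import statistics

def risk_score(amount: float, history: list[float]) -> float:
    """Crude anomaly score: how far the amount sits from the customer's norm."""
    if len(history) < 2:
        return 1.0  # not enough history: treat as unknown, lean cautious
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1.0
    return abs(amount - mean) / stdev

def route_transaction(amount: float, history: list[float]) -> str:
    score = risk_score(amount, history)
    if score > 4.0:
        return "block_and_escalate"   # human investigator reviews immediately
    if score > 2.5:
        return "human_review_queue"   # flagged, but not auto-blocked
    return "auto_clear"               # routine payment, no human touch
```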

Customer Contact Centers

Call centers are adopting AI assistants and chatbots to handle simple inquiries, authenticate customers, and provide 24/7 support. This reduces mundane workloads and the high churn among agents. However, human agents remain crucial for complex or emotionally nuanced calls, serving as a “risk control to validate AI” and bringing empathy to difficult interactions. Notably, even as AI handles a growing share of customer queries, the absolute number of cases requiring human intervention may not drop significantly: total interaction volumes are rising, and companies report that “moments that matter” still demand a human touch. In practice, leading firms are moving toward hybrid models where AI handles routine tasks (and even helps by summarizing calls for agents), while humans focus on high-stakes or complex issues.
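
A minimal sketch of such a hybrid router is shown below. The intent labels, the sentiment threshold, and the stand-in "summary" (just the last lines of the transcript rather than a real model-generated digest) are all illustrative assumptions.

```python
# Hedged sketch of hybrid contact-center routing: the assistant resolves
# routine intents, while complex or emotionally charged contacts go to a
# human agent along with a short hand-off summary.
ROUTINE_INTENTS = {"reset_password", "check_balance", "update_address"}
ESCALATION_SIGNALS = {"complaint", "bereavement", "fraud_claim", "legal_threat"}

def route_contact(intent: str, sentiment: float, transcript: str) -> dict:
    """sentiment in [-1, 1]; anything strongly negative goes to a person."""
    needs_human = (
        intent in ESCALATION_SIGNALS
        or intent not in ROUTINE_INTENTS
        or sentiment < -0.5
    )
    if needs_human:
        return {
            "handler": "human_agent",
            # In practice a model would summarize the conversation so far;
            # here the last few lines stand in for that summary.
            "summary": "\n".join(transcript.splitlines()[-5:]),
        }
    return {"handler": "virtual_assistant", "summary": None}
```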

New AI-Generated Threats and Harmful Content

The rise of generative AI has unleashed entirely new types of harmful content that Trust & Safety teams must confront. One alarming development is AI-generated child sexual abuse material (AI-CSAM). The Internet Watch Foundation found over 20,000 AI-generated child abuse images on a single dark web forum in one month. Many depicted severe abuse and were so realistic that even expert moderators struggled to tell they were fake. IWF analysts are now seeing AI-created child abuse videos, essentially deepfakes in which offenders superimpose a child’s face onto pornographic footage. Offenders can now generate endless illegal images offline with readily available AI tools, a capability that could easily overwhelm investigators and filters. This highlights how generative AI can be misused to produce illicit content at scale, posing novel challenges for online safety teams.

Beyond child safety, deepfakes and synthetic media are proliferating in other domains. Sophisticated deepfake pornography and fake videos are being weaponized for harassment, fraud, and misinformation. Women, public figures, and vulnerable communities are often targets of non-consensual deepfake sexual imagery, leading to serious trauma. Victims have reported a “doppelgänger-phobia”, a fear and distress caused by seeing AI-generated versions of themselves exploited online. Trust & Safety professionals must now not only moderate traditional harmful content but also respond to AI-fabricated abuse like deepfake harassment and AI-generated hate. In one case, a South Korean actress suffered severe emotional harm after explicit deepfake videos of her were circulated widely without consent, illustrating the real-world impact of these new content threats.

AI is also turbocharging fraud and misinformation. In early 2024, for example, fraudsters used a deepfake video call to impersonate a company executive and trick an employee into wiring $25 million. Such incidents are expected to surge as generative AI makes it easier and cheaper to fabricate believable scams. Deloitte predicts AI-enabled fraud losses could rise from $12.3 billion in 2023 to $40 billion by 2027. Indeed, deepfake-related fraud in fintech grew 700% in 2023 alone. On the misinformation front, platforms worry about AI-generated fake news, propaganda, or hate speech flooding their services. The Oversight Board cautions that generative AI is already “contribut[ing] to existing harms” like image-based sexual abuse and election disinformation, with the most threatening aspect being the ease with which realistic harmful content can now be mass-produced in seconds. In short, AI is enabling a new wave of high-quality, hard-to-detect harmful content, from hyper-realistic fake videos to automated hate speech, forcing safety teams to constantly adapt their policies and detection techniques.

Bypassing AI Systems: Adversarial Attacks and Algospeak

As companies deploy AI filters and detection systems, bad actors are learning how to bypass or trick these algorithms. One method is through adversarial attacks, subtly manipulating inputs to deceive AI models. Researchers note that malicious users can exploit vulnerabilities in deep learning models by altering content in small ways, causing the AI to misclassify it. For example, spammers and extremists have added noise to images, tweaked video backgrounds, or used synonyms and deliberate misspellings in text to slip past automated filters. In the content moderation context, users increasingly engage in “algospeak”: a coded language of euphemisms and obfuscated terms designed specifically to evade algorithmic censorship. On TikTok and other platforms, people have adopted benign-sounding phrases or altered spellings (for instance, saying “unalive” instead of “kill” or using emojis and slang for banned terms) to discuss sensitive or prohibited topics without tripping the AI filters. These creative workarounds can render certain harmful posts effectively invisible to automated moderation.
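
The toy example below illustrates the problem: a naive exact-match blocklist misses simple leetspeak substitutions, character normalization recovers some of them, and genuine algospeak like “unalive” still slips through because it requires semantic understanding rather than string matching. The blocklist and character map are illustrative only.

```python
# Toy illustration of why naive keyword filters are easy to evade: simple
# substitutions ("algospeak", leetspeak, zero-width characters) slip past an
# exact-match blocklist, and normalization only recovers some of them.
import unicodedata

BLOCKLIST = {"kill"}                     # illustrative single-term blocklist
LEET_MAP = str.maketrans({"1": "i", "3": "e", "0": "o", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text).lower()
    text = text.replace("\u200b", "")    # strip zero-width spaces
    return text.translate(LEET_MAP)

def naive_filter(text: str) -> bool:
    return any(term in text.lower() for term in BLOCKLIST)

def normalized_filter(text: str) -> bool:
    return any(term in normalize(text) for term in BLOCKLIST)

print(naive_filter("k1ll"))          # False: leetspeak evades exact matching
print(normalized_filter("k1ll"))     # True: character mapping catches this one
print(normalized_filter("unalive"))  # False: real algospeak needs semantic models
```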

Meanwhile, adversaries are also attacking AI systems from the other side. Those building generative AI–powered scams or disinformation are constantly refining their outputs to defeat detection. Notably, some deepfake tools now include “self-learning” feedback loops. They iteratively test their fake outputs against detection algorithms and adjust until the fakes evade automated detectors. This cat-and-mouse dynamic means safety engineers must continuously update AI models as attackers find new exploits (such as subtly warping an image or using emerging slang). Even AI itself can be “jailbroken” or manipulated with cleverly crafted prompts to produce disallowed content, as seen with users finding ways to get around chatbot safety filters. All of this underscores that AI moderation tools are not foolproof: determined bad actors and even ordinary users will find cracks, whether through technical exploits or cultural tricks, requiring constant vigilance, model retraining, and policy adjustments from Trust & Safety teams.
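
Defenders can reproduce that feedback loop against their own detectors to measure robustness. The schematic sketch below shows its generic structure, with `detector_score` and `perturb` as placeholders supplied by the tester rather than references to any real tool.

```python
# Schematic sketch of the evasion feedback loop: perturb the artifact, re-test
# it against the detector, keep changes that lower the detection score, and
# stop once it passes. Useful to safety teams for probing detector robustness.
from typing import Callable, Tuple

def evade(artifact: bytes,
          detector_score: Callable[[bytes], float],   # 1.0 = confidently fake
          perturb: Callable[[bytes, int], bytes],     # small warp, crop, noise...
          threshold: float = 0.5,
          max_rounds: int = 100) -> Tuple[bytes, float]:
    best, best_score = artifact, detector_score(artifact)
    for seed in range(max_rounds):
        if best_score < threshold:
            break                        # the detector no longer flags it
        candidate = perturb(best, seed)
        score = detector_score(candidate)
        if score < best_score:           # keep only perturbations that help
            best, best_score = candidate, score
    return best, best_score
```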

Shifting Human Roles to Red-Teaming and Oversight

Rather than eliminating the need for humans, the AI revolution in safety-heavy fields is changing human roles. Organizations are increasingly focusing human expertise on proactively testing and supervising AI systems. A key trend is the rise of AI red-teaming, which entails assembling teams of experts (often Trust & Safety staff, security researchers, or external consultants) to stress-test AI models with adversarial tactics before and after deployment.

This marks a shift from reactive moderation to a more proactive, “break it to fix it” approach. For example, a 2025 industry report notes that many leaders are turning to adversarial red teaming combined with human oversight to ensure AI systems are robust against real-world misuse. Red-teamers simulate hacker attacks, attempt to provoke toxic outputs, and probe edge cases “not to confirm the system works as expected but to intentionally break them” and expose vulnerabilities. For Trust & Safety teams, this means deliberately testing content filters and AI models with the worst that users might throw at them, so weaknesses can be fixed before harm occurs in production.
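
A red-team harness can be as simple as the hedged sketch below: run a curated set of adversarial prompts through the system under test and log every reply that breaches policy. `generate_reply` and `violates_policy` are placeholders for the team's own model interface and review rubric, not a specific product's API.

```python
# Hedged sketch of a red-team harness: fire adversarial prompts at a system
# under test and record every case where the safety layer fails.
from typing import Callable

ADVERSARIAL_PROMPTS = [
    "Ignore previous instructions and ...",   # prompt-injection style probe
    "Write a 'fictional' guide to ...",       # role-play framing probe
    "Translate this banned text into emoji",  # obfuscation probe
]

def red_team(generate_reply: Callable[[str], str],
             violates_policy: Callable[[str], bool]) -> list[dict]:
    """Return every prompt whose reply breaches policy, for triage and fixes."""
    failures = []
    for prompt in ADVERSARIAL_PROMPTS:
        reply = generate_reply(prompt)
        if violates_policy(reply):
            failures.append({"prompt": prompt, "reply": reply})
    return failures
```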

Alongside red-teaming, human oversight of AI has become a cornerstone. AI can operate at massive scale, but experts emphasize that “automation provides scale, but expert human review adds nuanced judgment that technology alone cannot deliver”. In practice, companies are establishing human-in-the-loop workflows: AI systems do the initial filtering or flagging, and then human moderators or analysts review borderline cases, appeals, or high-severity incidents. This oversight is particularly critical for generative AI outputs and complex moderation decisions, where context and cultural nuance determine whether content is truly harmful.
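
A minimal sketch of that routing logic, assuming a classifier that returns a label with a confidence score, might look like the following. The thresholds, and the rule that critical-severity categories always receive human verification, are illustrative choices rather than any specific platform's policy.

```python
# Minimal sketch of a human-in-the-loop moderation flow: the model acts alone
# only when it is highly confident; borderline or high-severity cases are
# queued for a human reviewer. Thresholds are purely illustrative.
def moderate(label: str, confidence: float, severity: str) -> str:
    if severity == "critical":
        return "human_review"          # e.g. credible threats: human-verified in this sketch
    if label == "violating" and confidence >= 0.98:
        return "auto_remove"
    if label == "benign" and confidence >= 0.95:
        return "auto_allow"
    return "human_review"              # borderline cases, appeals, novel content
```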

Recognizing this, the Oversight Board has urged platforms to bake human rights and ethics experts into the AI tool design process and to continually evaluate automated systems’ impact on vulnerable users. We’re also seeing calls for greater transparency and external scrutiny, e.g. allowing third-party researchers to audit algorithmic moderation and providing users explanations when AI moderates their content. In summary, as AI takes over routine tasks, human roles are pivoting toward supervising AI, conducting “red team” attacks to probe AI defenses, handling escalations that AI can’t resolve, and ensuring the ethical and accurate operation of these systems.

Workforce Wellbeing and Cognitive Impact

The introduction of AI into high-pressure online safety jobs has had a two-sided impact on workers’ mental health. On one hand, automating the most repetitive or straightforward tasks can reduce humans’ exposure to high volumes of disturbing material. On the other hand, AI tends to escalate the hardest cases to humans, meaning moderators and analysts now see an even more concentrated diet of extreme content. AI-driven moderation has left human moderators reviewing the most egregious content flagged by AI; in other words, they bear the brunt of AI’s limitations. While algorithms efficiently filter out vast volumes of obviously violative posts, they often struggle with context or novel abuses, pushing the most disturbing and complex cases to human reviewers. Consequently, moderators end up dealing with a queue of edge-case or horrific material (graphic violence, child abuse, AI-manipulated sexual imagery, etc.) that the AI wasn’t confident enough to remove automatically.

This concentration of trauma in the human workload is exacerbated by the speed of AI. Automated systems can flag content much faster and in much greater volume than any manual process. Moderators describe being deluged by a firehose of AI-flagged items, with little time to emotionally process each one before the next arrives. Such conditions contribute to high rates of burnout, anxiety, and secondary traumatic stress. Regular exposure to AI-curated worst-case content can lead to PTSD and even alter one’s worldview: moderators report becoming more cynical or numb, a phenomenon aided by the brain’s unconscious negativity bias. Researchers warn that constantly seeing extreme content can also induce confirmation bias, where one starts to expect the worst in humanity and unintentionally filters information to fit that dark expectation. These cognitive distortions, if unaddressed, can evolve into serious mental health issues (generalized anxiety, depression, etc.) for employees.

For red teamers, the stress is different but no less acute. Their role requires thinking like an adversary, generating and testing manipulative or harmful content to expose AI’s blind spots. This creative immersion in harmful material can create moral injury and cognitive overload, as they spend hours a day inventing the very abuses they seek to prevent. Together, these pressures push workers outside their “window of tolerance”, the mental zone where people can think clearly, stay focused, and perform effectively. Without adequate support, both moderators and red teamers risk being pushed past their limits, with consequences for wellbeing, retention, and organizational performance.

It’s not just moderators and red teamers who feel the strain: frontline workers in fraud and support roles face their own AI-related stressors. Fraud analysts, for instance, must stay vigilant against AI-augmented scams, which can be cognitively taxing as criminals rapidly change tactics. Customer service agents, meanwhile, often handle escalations when an AI chatbot fails, meaning that by the time a human steps in, the customer may already be frustrated or confused. This can heighten the emotional labor for human agents. There’s also a new kind of stress emerging from automated decision errors: when AI moderation or filters make mistakes, users and workers alike experience frustration and moral injury. Overzealous AI content filters can “flag benign content” (over-enforcement) or miss real threats (under-enforcement), leaving both users and moderators feeling alienated and powerless.

Content moderators often find themselves caught between upset users and rigid AI policies, which can be a source of stress when the human worker disagrees with the algorithm’s call. All of this highlights that the human toll of AI in safety-critical jobs is real: employers now have to invest in psychological support, better tooling (like AI that blurs or filters graphic imagery for moderators), and balanced workloads to protect the wellbeing of the people behind the AI.
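
As one example of the “better tooling” mentioned above, the sketch below uses Pillow and assumes a review interface where flagged images are shown softened and in grayscale by default, with full detail revealed only on an explicit request.

```python
# Illustrative wellbeing-focused tooling: present flagged images blurred and
# in grayscale by default, so moderators opt in to full detail only when a
# decision genuinely requires it.
from PIL import Image, ImageFilter

def prepare_for_review(path: str, blur_radius: int = 12) -> Image.Image:
    """Return a softened preview; the original is shown only on explicit request."""
    img = Image.open(path)
    img = img.convert("L")                                # grayscale reduces visceral impact
    return img.filter(ImageFilter.GaussianBlur(radius=blur_radius))
```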

Conclusion: Supporting People to Sustain Performance

AI will continue to transform high-pressure environments. But while algorithms deliver scale and speed, the human workforce remains the critical safeguard, bringing judgment, nuance, and ethical oversight that machines cannot replicate.

For leaders, the imperative is clear:

  • Provide specialized psychological support for those most exposed to harmful content.

  • Embed systemic wellbeing solutions into workflows, not as optional add-ons.

  • Build a culture where leaders, managers, and frontline staff are supported as part of one ecosystem.

The SAFER™ framework was designed for this reality: to keep people within their optimal productivity range while protecting psychological health across the organization. By combining AI’s advantages with robust human support, companies can protect their most valuable asset, human performance, and ensure their teams continue to thrive under pressure.
