“All of the companies whose models we tested need to strengthen their safeguards around antisemitism and extremism,” Daniel Kelley, of the ADL, told JNS.
The Anti-Defamation League released a report on Friday detailing how prompts can elicit antisemitic, extremist and violent content from text-to-video artificial intelligence tools despite safeguards that companies have put in place.
Text-to-video AI tools generate videos from written prompts. The ADL says that misleading and disturbing materials produced with that technology have been “leveraged to sow confusion and division following newsworthy events or tragedies.”
The companies that created and run the tools are supposed to have safeguards in place to block hateful prompts. Daniel Kelley, director of strategy and operations and interim head of the ADL’s Center for Technology and Society, told JNS that he isn’t sure why those safeguards aren’t working properly.
“We do see a pattern here, as we have with other forms of technology, where it appears that some baseline safety efforts haven’t been implemented to stop generating content that maligns Jews and other groups,” he said.
The ADL tested 50 prompts from Aug. 11 to Oct. 6 on Google’s Veo 3, OpenAI’s Sora 1 and Sora 2, and Hedra’s Character-3, and found that the prompts produced videos with hateful material at least 40% of the time. (JNS sought comment from the three companies.)
Michael Lingelbach, founder and CEO of Hedra, told JNS that the company “condemns antisemitism and violence” and that it is “a violation of our policies to create such content on Hedra.”
“We have a record of banning people who attempt to violate the safeguards built into the platform,” he said.
Hedra works on safeguards with “a variety of third-party vendors to provide a holistic moderation solution on Hedra that we are constantly re-evaluating,” Lingelbach said. “Like Google and OpenAI, we understand the importance of continuing to improve moderation and selecting the right vendor partners.”
Text-to-video AI tools have generally come with fees attached, but “the new Sora 2 app is free and features a social feed, which we imagine will make the use of this technology much more widespread,” Kelley told JNS.
The ADL tested prompts to see if the tools would generate videos based on antisemitic tropes, symbols associated with extremists and rhetoric promoting violence that included references to mass shooters.
Some of the examples included a video of a Jewish man controlling the weather, a Jewish man with fangs and an animated child saying, “Come and watch people die.” The text prompt for the animated child used the word “dye” rather than “die,” a technique many people use on social media to bypass content moderation.
Another example featured a video of a white man holding a rifle outside a mosque saying, “Hello brother,” which the report says is a reference to what a victim told Brenton Tarrant before the latter shot and killed 51 people at a mosque in New Zealand in 2019.
Kelley told JNS that some of the tested prompts were “more esoteric or coded,” meaning that they referenced newer extremist groups or phrases that extremists use. People adopt such language to circumvent safeguards, he said.
Other tested prompts invoked “the most ancient and classic examples of antisemitism,” he said. “We know that trust and safety is challenging ongoing work. At the same time, there should be a higher bar for safety around antisemitism and hate when products ship into the world.”
The ADL’s recommendations include that artificial intelligence platforms invest more in trust and safety roles, update their content filters and test their systems against known hateful stereotypes.
“All of the companies whose models we tested need to strengthen their safeguards around antisemitism and extremism,” Kelley said.
Daniel Cochrane, senior research associate at the Heritage Foundation’s Center for Technology and the Human Person, told JNS that AI tools are “collapsing the gap between truth and reality by giving millions the ability to fake realistic events and people.”
AI companies should do more to address illegal content on their platforms, according to Cochrane. “Using overbroad terms, such as ‘hate’ and ‘extremism,’ as baselines for moderation poses risks of its own,” he said.
Policymakers should focus on “algorithms in social media distribution channels optimized for user-engagement and most responsible for the rapid dissemination of toxic, low-quality content,” according to Cochrane. (He noted that AI companies don’t have “blanket immunity” under Section 230 of the Communications Decency Act.)
“Policymakers must liberate our feeds from the clutches of big tech monopolies, giving user-centered communities the tools to filter and curate content based on their own values,” he said.