Two Brains Are Better Than One: AI and Humans Work to Fight Hate

By Glen Martin

It started with a conversation. About two years ago, Claudia von Vacano, executive director of UC Berkeley’s social science D-Lab, had a chat with Brittan Heller, the then-director of technology and society for the Anti-Defamation League (ADL). The topic: the harassment of Jewish journalists on Twitter. Heller wanted to kick the offending trolls off the platform, and Vacano, an expert in digital research, learning, and language acquisition, wanted to develop the tools to do it. Both understood that neither humans nor computers alone were sufficient to root out the offending language. So, in their shared crusade against hate speech and its malign social impacts, a partnership was born.

Hate speech, the stinking albatross around the neck of social media, has become increasingly linked to violence, even atrocity. Perhaps the most egregious recent example: Robert Bowers, the accused shooter in the October massacre at Pittsburgh’s Tree of Life Synagogue, was reportedly inflamed by (and shared his own) anti-Semitic tirades on Gab, a platform popular with the Alt-Right that was temporarily deactivated following the shooting.

Currently, Facebook and Twitter employ thousands of people to identify and jettison hateful posts. But humans are slow and expensive, and many find the work emotionally taxing—traumatizing, even. Artificial Intelligence and machine learning are the obvious solution: algorithms that can work effectively at both speed and scale. Unfortunately, hate speech is as slippery as it is loathsome. It doesn’t take a very smart AI to recognize an overtly racist or anti-Semitic epithet. But more often than not, today’s hate speech is deeply colloquial, or couched in metaphor or simile. The programs that have been developed to date simply aren’t up to the task.

That’s where Vacano and Heller come in. Under Vacano’s leadership, researchers at D-Lab are working in cooperation with the ADL on a “scalable detection” system—the Online Hate Index (OHI)—to identify hate speech. The tool learns as it goes, combining artificial intelligence, machine learning, natural language processing, and good old human brains to winnow through terabytes of online content. Eventually, developers anticipate major social media platforms will use it to recognize and eliminate hate speech rapidly and at scale, accommodating evolutions in both language and culture.

“The tools that were—and are—available are fairly imprecise and blunt,” says Vacano, “mainly involving keyword searches. They don’t reflect the dynamic shifts and changes of hate speech, the world knowledge essential to understanding it. [Hate speech purveyors] have become very savvy at getting past the current filters—deliberately misspelling words or phrases.” Current keyword algorithms, for example, can be flummoxed by something as simple as substituting a dollar sign ($) for an “S.”
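To make the dollar-sign problem concrete, here is a minimal Python sketch of a keyword filter with and without a normalization step. It is an illustration only, not D-Lab's method: the substitution table, the placeholder blocklist, and the function names are all assumptions made up for this example.

```python
# Hypothetical substitution table: common character swaps mapped back to letters.
SUBSTITUTIONS = str.maketrans(
    {"$": "s", "0": "o", "1": "i", "3": "e", "4": "a", "@": "a"}
)

# Placeholder token standing in for a real lexicon (assumption for this sketch).
BLOCKLIST = {"slur"}


def naive_match(text: str) -> bool:
    """Flag text only if a blocklisted word appears verbatim."""
    return any(word in BLOCKLIST for word in text.lower().split())


def normalized_match(text: str) -> bool:
    """Undo common character swaps, then check the blocklist."""
    cleaned = text.lower().translate(SUBSTITUTIONS)
    return any(word in BLOCKLIST for word in cleaned.split())
```

The naive filter misses "$lur" while the normalized one catches it. Neither would catch a metaphor or a newly coined code phrase, which is the gap the article describes.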

Another strategy used by hatemongers is metaphor: “Shrinky Dinks,” for example, the plastic toys that shrink when baked in an oven, allude to the Jews immolated in the concentration camps of the Third Reich. Such subtle references are hard for current AI to detect, Vacano says.

The OHI is intended to address these deficiencies. Already, the researchers' work has attracted the attention and financial support of the platforms that are most bedeviled by, and draw the most criticism for, hate-laced content: Twitter, Google, Facebook, and Reddit.

But no matter how well intentioned, any attempt to control speech raises Constitutional issues. And the First Amendment is clear on the matter, says Erwin Chemerinsky, the dean of Berkeley Law.

“First, the First Amendment applies only to the government, not to private entities,” Chemerinsky stated in an email to California. “Second, there is no legal definition of hate speech. Hate speech is protected by the First Amendment.” Unless it directly instigates violence, that is, an exception upheld in the 1942 Supreme Court decision, Chaplinsky v. New Hampshire.

In other words, the platforms can decide what goes up on their sites, whether it’s hateful or not. Vacano acknowledges this reality: D-Lab, she says, isn’t trying to determine the legality, or even appropriateness, of moderating hate speech.

“We are developing tools to identify hate speech on online platforms, and are not legal experts who are advocating for its removal,” Vacano stated in response to an email query. “We are merely trying to help identify the problem and let the public make more informed choices when using social media.” And, for now, the technology is still in the research and development stage.

“We’re approaching it in two phases,” says Vacano. “In the first phase, we sampled 10,000 Reddit posts that went up between May and October of 2017. Reddit hadn’t implemented any real means for moderating their community at that point, and the posts from those months were a particularly rich trove of hate speech.”

D-Lab initially enlisted ten students of diverse backgrounds from around the country to “code” the posts, flagging those that overtly, or subtly, conveyed hate messages. Data obtained from the original group of students were fed into machine learning models, ultimately yielding algorithms that could identify text that met hate speech definitions with 85 percent accuracy, missing or mislabeling offensive words and phrases only 15 percent of the time.

Though the initial ten coders were left to make their own evaluations, they were given survey questions (e.g., “…Is the comment directed at or about any individual or groups based on race or ethnicity?”) to help them differentiate hate speech from merely offensive language. In general, “hate comments” were associated with specific groups while “non-hate” language was linked to specific individuals without reference to religion, race, gender, etc. Under these criteria, a screed against the Jewish community would be identified as hate speech while a rant, no matter how foul, against an African-American celebrity might get a pass, as long as his or her race wasn’t cited.

Vacano emphasizes the importance of making these distinctions. Unless real restraint is exercised, free speech could be compromised by overzealous and self-appointed censors. D-Lab is working to minimize bias with proper training and online protocols that prevent operators from discussing codes or comments with each other. They have also employed Amazon’s Mechanical Turk—a crowdsourcing service that can be customized for diversity—to ensure that a wide range of perspectives, ethnicities, nationalities, races, and sexual and gender orientations were represented among the coding crew.

So then, why the initial focus on Reddit? Why not a platform that positively glories in hate speech, such as the white nationalist site Stormfront?

“We thought of using Stormfront,” says Vacano. “But we decided against it, just as we decided against accessing the Dark Web. We didn’t want to specialize in the most offensive material. We wanted a mainstream sample, one that would provide a more normal curve, and ultimately yield a more finely-tuned instrument.”

With proof of concept demonstrated by the Phase 1 analyses of Reddit posts, Vacano says, D-Lab is now moving on to Phase 2, which will employ hundreds of coders from Mechanical Turk to evaluate 50,000 comments from three platforms—Reddit, Twitter, and YouTube. Flags will go up if pejorative speech is associated with any of the following rubrics: race, ethnicity, religion, national origin or citizenship status, gender, sexual orientation, age or disabilities. Each comment will be evaluated by four to five people and scored on consistency.
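The article does not say how consistency among the four or five coders will be scored. One simple possibility, sketched here in Python purely as an assumption, is to take the majority label for each comment and report the share of coders who chose it. The function name and label strings are hypothetical.

```python
from collections import Counter


def majority_agreement(labels):
    """Return (majority_label, share of coders who chose that label).

    `labels` is a non-empty list with one label per coder for a single comment.
    In a tie, the label that appears first in the list wins (no special handling).
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)
```

For example, if four of five coders mark a comment "hate" and one marks it "not_hate", the function returns the label "hate" with an agreement share of 0.8. Comments with low agreement could then be set aside for review; real projects often use more formal inter-rater statistics instead.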

Vacano expects more powerful algorithms and enhanced machine learning methods to emerge from Phase 2, along with a more comprehensive lexicon that can differentiate between explicit, implicit, and ambiguous speech in the offensive-to-hate range. Ultimately computers, not humans, will be making these distinctions, a necessity given the scope of the issue. But is that cause for concern?

Erik Stallman, an assistant clinical professor of law and the faculty co-director of the Berkeley Center for Law and Technology, says that, because of the ubiquity of social media, attempts to moderate online hate speech are based on sound impulses.

“And considering the scale at which [the platforms] operate, automated content moderation tools are a necessity,” Stallman says. “But they also have inherent limitations. So platforms may have hate speech policies and still have hateful content, or they may have standards that aren’t well defined,” meaning posts are improperly flagged or users are unfairly excluded from the site.

In any attempt to control hate speech, says Stallman, “the community of users should have notice of, and ideally, input on the standards. There should also be transparency—users should know the degree of automated monitoring, and how many posts are taken down. And they should also have some form of redress if they feel their content has been inaccurately flagged or unfairly taken down.”

Stallman also cited a Center for Democracy report that noted the highest rate of accuracy for an automated monitoring system was 80 percent.

“That sounds good, but it still means that one out of five posts that was flagged or removed was inaccurate,” Stallman says.

(As noted, D-Lab’s initial Online Hate Index posted an accuracy rate of 85 percent, and Vacano expects improved accuracy with Phase 2’s increased number of coders and expanded vocabulary.)

Unsurprisingly, D-Lab has been targeted by some haters.

“So far it’s only amounted to some harassment by ignorant commenters,” says Vacano. “But yes, there is some calculated risk [e.g., doxxing or stalking of D-Lab members] involved. It’s a sad and difficult issue, but we’re a highly motivated group, and what’s particularly exciting for me is that we’re also a deeply interdisciplinary group. We have a linguist, a sociologist, a political scientist and a biostatistician involved. David Bamman, an expert in natural language processing and machine learning, is contributing. I’m in educational policy. It’s rare to have this much diversity in a research project like this.”

Though D-Lab’s hate speech products will be founded on sophisticated AI and machine learning processes, humans will always have a role in their development, says Vacano. Computers and programs, after all, don’t hate. That’s a uniquely human characteristic, and only humans can track hate’s rapidly evolving and infinitely subtle manifestations. The code phrase Shrinky Dinks may be flagged as potentially hateful today, but the lexicon of hate is always changing.

“We’ll need to continually update,” Vacano says. “Our work will never be done.”

CORRECTION: This article previously stated that Gab was a deactivated platform. It has since been reactivated.



Which is noted in the piece, Bill: But no matter how well intentioned, any attempt to control speech raises Constitutional issues. And the First Amendment is clear on the matter, says Erwin Chemerinsky, the dean of Berkeley Law. “First, the First Amendment applies only to the government, not to private entities,” Chemerinsky stated in an email to California. “Second, there is no legal definition of hate speech. Hate speech is protected by the First Amendment.” Unless it directly instigates violence, that is, an exception upheld in the 1942 Supreme Court decision, Chaplinksy v New Hampshire.
The great thing is that there’s no way this could be misused. And fortunately, governments would never be interested in this technology.
Now Berkeley joins Twitter, Google, Facebook to find more effective and insidious ways to silence speech that they don’t like. Is that really the direction we want to go?
I’m guessing governments that actively control speech are already far ahead in this game, Ed. But point taken. It could certainly be misused. But name a technology that can’t.
Looking forward to the version that detects blasphemy.
Yes. Progressives are bigoted totalitarians and always have been. Of course this is the way they want to go.
Yep, that’s the right question, I think. I think of myself as a free speech absolutist and so, like you, I’m leery of this effort. But let’s imagine (and it’s not that hard to imagine, I don’t think) hate speech that is created and proliferated through artificial intelligence, that floods the social media of those most likely to act on the message of hate. Would we then want such a tool as this? What would that scenario do to my free speech absolutism, I wonder?
It was eggheads like this that thought it would be cool to split the atom. You have to go to college to get this stupid.
Nothing could possibly go wrong with this. It’s a great idea through and through. We are scientists, trust us to do the right thing.
“D-Lab’s hate speech products” - Ready for export to a Benevolent Leader near you!
I’m betting the algorithms won’t flag the following, or anything that looks like it: “Ontologically speaking, white death will mean liberation for all … Until then, remember this: I hate you because you shouldn’t exist.” – Rudy Martinez, Your DNA is an Abomination, North Texas State University student newspaper, November, 2017. Tom
I hope the coders of this project read my comment. Do you do gooders recognise the hell your trying to unleash, guess not. Dont you understand the basis of your belief system is biggotry and hate? When you try and force entire populations to think like someone else who they abhor and use public shaming and legal defences to enforce group think and attempt in brain washing through public pressure like this, do you know what that will create? War you fools, civil war. People will go along for a while until they feel the hell youve unleashed and then youll be hanging in the streets by the people you tried to coerce, have you not read history? Take a look at France last week, that will be nothing compared to what your trying to cause, your trying to cause war. How have you grown up in a free country only to hate it so much that you want to end freedom of thought and speach for others? Do you not realize it will fail and fold back onto you and cause you to be coerced? If your smart you’ll get out of the posionous school your in that has brain washed you to believe that a communist state of control will work in America, it won’t . And if you continue down this road, no one will hire you once they know who you are, an enemy of freedom and an an enemy of America. We must end this hate speach witch hunting which is easy to see is just left wing political manuvering in removing right wing editorial and information online. Thats all this is, a fake ideology called hate speach, used to censor freedom for political gain. I’ll be asking the feds to interviene and stop this racketeering since its clearly designed to interfere with elections and democratic government. And someone please publich the names of all these people doing this so we know who to ask the Feds to kick out of America for trying to create civil war.
The “fightin words exception to the First Amendment” in Chaplinsky, much beloved by the Left, has been considerably weakened, if not completely overturned, in the later Skokie and Westboro Baptist decisions. “Hate speech” is still protected speech. is very much opperational for the ONE MILLION people who post about everything from dogs, Brazil, politics, family, and Christianity. It’s free speech for All. Pound sand internet killjoys of Berkeley.
It is a false narrative that social networks need to adjudicate hate speech. People are quite capable to decide for themselves based on their own definition of hate speech. The job of social networks is to do a better job delivering subscribed content versus content users feel is hateful and do not want to see.
Liberal puritanism. The intellectually weak minded progressives can have no dissent when it comes to ideas they propagate. The very presence of wrong think most be scrubbed from all platforms so they aren’t forced to defend their anti science and fascist ideological beliefs. Every venue and forum must accommodate the infantile professional victims and safe spaces are key for their mental wellness. What a pathetic depressing shift by the left in this country.
Its not a good sign that the article includes an error in the second paragraph. Gab is running just fine the attempt to deplatform it failed within two days. Unless you have a time machine that can eliminate people like Marx & Lenin you are always going to have antisemitam. Jews are smart people, blessed of God, but when they go bad millions die. It only takes a few generally atheist or occult Jews to fowl it up for everyone. Most antisemites do not believe that the Jews of today are related at all to the Jews of the bible. That’s wrong. It does not help that most Jewish synagogues reject any Messianic Jews as not being descended from biblical Jews. Again wrong. The greatest danger comes not from hate speech but from our desire not to learn the counter arguments so we can use them.
This is absolutely disgusting and those involved in this project should be ashamed of themselves. I struggle to imagine the reality that people who casually converse about automating the silencing of “trolls” or dissenting voices live in. I also note that this discussion was specifically geared toward silencing the opposition to groups supported by those working to create the tool, ie. trolls targeting Pro-Palestinian accounts are probably not targeted by this AI for example are they?
The obvious test is, can the algorithm accurately detect and sanction communist slurs on capitalist business people, private property, free market, the rich and enterprise. Antisemitam has gotten about 10 million killed in the last century while anti-capitalism has gotten about a billion killed in the same time frame.
Horrendous article that proves nothing but the fact that “hate speech” is simply any speech the Left hate. The Leftist ruling class have lost the debate time and time again so the only tactic you have is to cheat and silence anyone who dares question your narrative. There is only one “hate group” society needs to get rid of and that’s the ruling class who hate anyone daring to express themselves in a way you don’t like. A shameful article about shameful people wanting to do shameful things, freedom of speech is a GOD GIVEN RIGHT. That includes freedom of speech on public platforms that are social media, these are the public town square. The only sane response to the hatred and desire to take free speech away that this article demonstrates is “over my dead body.” PS - Gab isn’t going anywhere and this article is why it’s going to keep growing, there’s nothing you can do to stop the nationalist movement so you may as well make plans to save yourselves before it’s too late.
The key problem in social media is not just the hate speech but the problem of people falsely reporting non hate speech as hate speech or porn. If your algorithm does away with false reporting then that is a good thing. However your algorithm will probably be activated after a report sends a post to it. In that case you need to include code that allows the algorithm to pseudo-anonymously log the reporter and if it finds that the post has no hate speech at all and the reporter is a troll then you need to score then reporter ID to be ignored the troll. This would help greatly with social media. Generally those falsely reporting as hate speech or porn normal speech tend to be attacking conservatives. That is really only a consequence of the fact that conservatives themselves rarely report hate speech at all but flag it to their friends as evidence or “look what this fool said” warnings. If I can help in making this something that genuinely deals with the real problem please ask.
We have decided people are too dumb to filter things and we want to make a more ad friendly environment for our masters at google twitter and facebook. We have checked with our lawyer and he says the first amendment only applies to government so we are good. People can not do this so we are going to use machines to censor any ideas we find offensive. It is sad we have to deal with people that hate us for censoring free speech but we are dedicated and have letter after our names. This will be a usefull took for us to make loads of money. Hahah. Gab is not closed down we pay for it that is the difference. We control the mute button not some corporate lackey. We are free people that are allowed through our own volition to hate leftist that are completely dedicated to violating our rights . Keep pushing for your leftist ways and you will know real hate and violence.
Great points. But even more helpful is that it could be parameterized to easily add new categories of hate speech depending on whenever Political Correctness (or the SPLC) defines new categories of hate speech. Plus we could go back, at the touch of a button, and eliminate past posts that used to not be hate speech, but now are.
“on Gab, a now-deactivated platform popular with the Alt-Right.” You are so wrong! Gab is alive and well, no thanks to the snowflakes that peed on it.
Who gets to define “hate speech”? Ultimately somebody’s inputting tags, phrases, or traits for this program to search for. And going by the hate vs. non-hate chart, is this even needed?
D-Lab initially enlisted ten students of diverse backgrounds from around the country to “code” the posts, flagging those that overtly, or subtly, conveyed hate messages. 10 students of diverse backgrounds will not give you an unbiased opinion. First of all in choosing students, you are choosing those who have been indoctrinated by the left since starting school. P.C. refers to what a small very vocal minority of pseudo intellectuals think. You are not going to find pork recipes in a vegetarian cook book. Figures it’s an article from UC Berkley. Hey we can’t all be brilliant.
who defines hateful speech? is it something that makes someone feel bad - or attacks directly because of skin color race or sex and when ‘millenials ‘take offense’ to jokes I grew up with because they are sooooo soft where do they think they are qualified to set the bar for hate speech? Thanks to democrats - anything you say against the democratic goal - is hateful… and mean… and means your racist… I don’t trust them to be able to gauge hate vs makes these buttercups feel ‘icky’
Phew, for a moment there I was worried I might have to read something I disagree with. Glad the Students for a Democractic Society laid the groundwork for such progress and enlightenment through restriction of free speech.
The D-Lab and UC Berkeley are racing to achieve new lows in public discourse. How does the home of the Free Speech Movement descend into the morass of identifying so-called “hate speech”? Such efforts constitute an abomination to an informed society. Alinsky would be proud….
While the debate of if social media is not a public square so the 1st amendment applies despite being privately owned, much like shopping Malls. The dean seemed to miss an important point. Berkley is a public school. Public funds and public institutions cannot be engaged in a program specifically intended to limit what is constitutionally protected speech.
I find it particularly troubling that D-Lab is teaming up with the Anti Defamation League, an institution that is trying to define antizionism and calls for boycotting Israel as hate speech. Daniel Boyarin, Professor of Talmud, UC Berkeley
Who decides what is and what is not hate speech? The only hate I see is the liberals and joos trying to silent us Whites when we try to speak up
Africa for the Africans, Asia for the Asians, White countries for everyone? Everybody says there is this RACE problem. Everybody says this RACE problem will be solved when the third world pours into EVERY White country and ONLY into White countries. The Netherlands and Belgium are just as crowded as Japan or Taiwan, but nobody says Japan or Taiwan will solve this RACE problem by bringing in millions of third worlders and quote assimilating unquote with them. Everybody says the final solution to this RACE problem is for EVERY White country and ONLY White countries to “assimilate,” i.e., intermarry, with all those non-Whites. What if I said there was this RACE problem and this RACE problem would be solved only if hundreds of millions of non-Blacks were brought into EVERY Black country and ONLY into Black countries? How long would it take anyone to realize I’m not talking about a RACE problem. I am talking about the final solution to the BLACK problem? And how long would it take any sane Black man to notice this and what kind of psycho Black man wouldn’t object to this? But if I tell that obvious truth about the ongoing program of genocide against my race, the White race, Liberals and respectable conservatives agree that I am a naziwhowantstokillsixmillionjews. They say they are anti-racist. What they are is anti-White. Anti-racist is a code word for anti-White
anybody remember Mario Savio and the free speech movement? It’s true that we are born ignorant and then we go to school and become stupid!
So-called “hate” is in the eye of the beholder. As a matter of practical reality, “hate” is just content disliked by the sort of people who are particularly shrill and agitated in their opposition to the Trump administration. I’m OK with people getting banned from platforms for unironically calling for genocide (although I don’t really see why it’s particularly necessary, or actually useful to do), but 99 percent of bannings aren’t about anything anywhere nearly so severe as that.
The negative comments here are well deserved. “Hate speech” is an Orwellian term that enables censorship of dissenting views. As long as projects like this exist at Berkeley I am never donating to the university.
How many people in this generation and beyond thing sanitizing speech is going to do any good. I see a bunch of neurotic kids trying to live their lives and not having any recourse to letting go of the phones and devices they use. And of course the political left can get away with literal murder it seems whilst the political right is seen as the bane of all existence like Thanos or Cthulhu. I mean my God man, people from this generation ignorant of The Bible, use of hate words in the past think they can go back in time and tell everyone what to do…this is like the embodiment of the same legalistic murderous spirit of THe Pharisees in Jesus’s time or the wasteful legalism of the “Christian” Prohibition movement. You will have bottled up a tension that people will be obeying out of FEAR more than love. You cannot force something…forcing is RAPE…it is like a psychic RAPE of people RAPE is a strong strong word. Sanitized people are easily programmed but are not alive. TO a Christian like me, this might be a declaration of spiritual war on the hubris, fear mongering and legalism of the control freaks and their oponents in the world. You will merely drive people you don’t really know, know how to forgive underground. Eventually the Cross rises up and overcomes all hate, all fear, and all desires of man to control what people do. The whole deplatforming phenomenon, also unacceptable. The political left needs to learn a few things from history before it becomes the tool of an authoritarian regime as vile as the Stalins, Hitlers and fictional regimes of 1984 in the past People of course don’t hear lectures…they just want to hear their Narrative. If talking doesn’t work, prayer does. Lets see how your mind control program stands up to the might of God and maybe some of you will hear the Good News and realize, wait, what am i doing trying to play God who created me, just a flawed human being? Cheers
Seems like the only hate speech that will be allowed will be the political-ideological hate speech. I’m just finishing my work on it right now and, hey, don’t you think all those subreddit words are too tied to political identity? This is the main topic right now. The intolerance grew strong between right and left all over the place. And once you step outside United States, you see how often it sums up in for or against your own country. Doesn’t matter if there is Trump or Obama. Anyway, I would like to help.