Sometimes, people just need to be reminded they are still people before they try to start a war of words on social media.
A recently published Yale Law School study states that content moderation done prior to posting actually led more people to post less offensive tweets, which had the compounding effect of reducing the number of offensive replies in response.
The study showed that 6% of users who were prompted to revise tweets the system deemed offensive went on to post fewer offensive tweets than those who weren’t prompted. If there are 229 million daily active Twitter users, according to the company’s latest quarterly earnings report, and approximately 51% of those are tweeting in English, that could be over 7 million users who decided to post less negative content. And in a finding that is probably important to companies like Twitter that are obsessed with maintaining engagement, the total number of replies to the revised tweets didn’t decrease, and the discussion was less offensive overall.
Researchers go on to say that after users were asked to moderate themselves, they were less likely to act negatively in the future. They found that 20% fewer users posted five or more offensive tweets during the experiment.
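The "over 7 million" figure is a back-of-the-envelope estimate, and the arithmetic behind it can be checked with a short sketch. The three inputs are the numbers quoted above; the variable names are only illustrative, and applying the 6% figure to every English-language daily user is the article's own assumption rather than something the study measured.

```python
# Rough check of the estimate, using the figures quoted in the article.
# Integer arithmetic keeps the result exact.
daily_active_users = 229_000_000  # from Twitter's quarterly earnings report
english_percent = 51              # approximate share tweeting in English
prompted_effect_percent = 6       # share of prompted users in the study's finding

english_users = daily_active_users * english_percent // 100
affected_users = english_users * prompted_effect_percent // 100

print(f"English-language daily users: {english_users:,}")
print(f"Users who might post less offensive content: {affected_users:,}")
```

The result lands just above 7 million, which matches the article's estimate, though it assumes every English-language user would be shown a prompt.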
“This represents a broader and sustained change in user behaviour, and implies that receiving prompts may help users be more cognisant of avoiding potentially offensive content as they post future Tweets,” the study’s authors wrote.
How are Twitter users revising their tweets?
In a blog post last week, Twitter data scientists Kathy Yang and Lauren Fratamico cited the Yale study and further said that for every 100 tweets the company asked users to reconsider, 69 were sent without revision, nine were cancelled, and 22 were revised. Of those revisions, eight were considered less offensive than the original, while the rest were either similarly offensive or more offensive.
Revisions could be as simple as changing the word “shit” to a poop emoji or shortening “the fuck” to “tf.” Most of the amended tweets removed profanity altogether. A few added a further attack to the existing tweet.
In 2020, Twitter began a limited experiment that used algorithms to flag users’ tweets if they contained harsh language, insults, or hateful remarks. The system would display the message “we’re asking people if they want to revise replies that were detected as potentially harmful or offensive,” then show the flagged tweet before prompting the user to revise it, delete it, or send it anyway. In 2021, the company revised the system after users complained they were being prompted unnecessarily because the algorithm couldn’t understand the nuances of the conversation.
A year into its experiment, the company stated that around a third of those prompted chose to revise or delete their tweet, and that after being prompted once they were somewhat less likely to write offensive tweets.
Of course, there are still the roughly 70% of people who sent out their tweets anyway or even revised them to be more offensive (the cheeky buggers). But even a bare 15% of people revising tweets can still be a massive improvement, especially considering why some people post negative content.
Researchers cited several past studies that analysed why people post negative content and how often they regret posting it after the fact. People are usually venting frustration while in a rough emotional state, often in reaction to somebody who made them feel slighted. Social media, with its focus on feeding users content that actively generates anger to make them linger longer on the platform, only helps breed this behaviour.
Then there are the bad faith actors, such as the Russian disinformation accounts seen spreading falsities and insults before the 2016 and 2020 elections. The study makes it clear that asking users to be nicer won’t “solve the problem of offensive content online.” Twitter’s experiment was also conducted only with English-language tweets, though the company has created a similar feature for Portuguese-speaking populations. It would need to start producing systems for millions of users across the globe.
Most content moderation occurs after content is already posted, and while systems and human moderators work to flag and remove harmful content, those posts are being seen and spread both on the platform and off it. There won’t be any panacea for the ills of existing online, but more efforts like these could prove to be a new line of attack for making life online a little more harmonious. Let’s just hope that whoever next owns Twitter takes the ideas behind studies like these to heart.