We’re big fans of Dash Cam Owners Australia here at Gizmodo. In fact, we tend to post its roundups every month. Unsurprisingly, the channel has a ton of other fans, including one who took things to the next level. If you’ve ever wondered which words come up most in its videos, and how many of them are just different iterations of ‘fuck’, you’re in for a treat.
Reddit user all_the_pineapple wrote a script that processed 407 Dash Cam Owners Australia videos to create a word cloud. This came to 33,615 words in total, 3,928 of them unique. Apparently “fuck and fuck like words” are mentioned 525 times. If you look at the word cloud closely, you can spot a few different versions. Both ‘fuck’ and ‘fucken’ are particularly prominent.
If you were expecting to see a prominent c-bomb in there, you’re not the only one.
“Yep, I was surprised cunt didn’t make an appearance,” all_the_pineapple said.
They also revealed how they did it.
“Written mostly in python, with a bit of shell scripting. Used google’s youtube api to scrape the channel for every video url. Looped these URL’s and used youtube-dl to download to wav files. Wav files were too big to feed into google’s speech to text api so split them up based on chunks of silence detected. Fed these chunks to the speech to text api, then ran a bunch of text substitution as google wrote words like fuk. Google also doesn’t like swearing so writes out c*** f***, so subbed those. Then fed the list of words into wordcloud_cli to generate the image. It’s not perfect, but i’m pretty happy with the results!”
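For the curious, the text clean-up and counting steps might look something like the rough Python sketch below. It is not all_the_pineapple's actual script. The substitution table is an assumption built only from the examples in the quote (‘fuk’, ‘c***’ and ‘f***’), and the function names `normalise` and `summarise` are made up for illustration.

```python
from collections import Counter

# Hypothetical substitution table: the quote mentions only "fuk" and the
# masked forms "c***" and "f***"; the real rules used are not published.
SUBSTITUTIONS = {
    "fuk": "fuck",
    "f***": "fuck",
    "c***": "cunt",
}


def normalise(transcript):
    """Lower-case a transcript, split it on whitespace, strip surrounding
    punctuation from each token, and apply the substitutions."""
    words = []
    for token in transcript.lower().split():
        token = token.strip(".,!?\"'")  # asterisks are kept so masks still match
        if token:
            words.append(SUBSTITUTIONS.get(token, token))
    return words


def summarise(words):
    """Return total words, unique words, and how many start with 'fuck'."""
    counts = Counter(words)
    fuck_like = sum(n for word, n in counts.items() if word.startswith("fuck"))
    return {"total": len(words), "unique": len(counts), "fuck_like": fuck_like}


if __name__ == "__main__":
    sample = "Fuk me, what the f*** was that? Fucken idiot."
    print(summarise(normalise(sample)))
```

The cleaned word list could then be joined with spaces and written to a text file for wordcloud_cli to turn into an image.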
They also answered some important user questions about different words and their usage across the videos.
“dumbcunt would be detected as two words if we as a nation stopped drooling sentences together. But honestly who has the fucking energy for that.”