“RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models” was recently published in Findings of EMNLP 2020, and highlights several issues with language generation, obscenity and bias. This problem with toxicity arises in part because of how predictive language models are produced using enormous sets of human-generated text as their training data. Combined with deep learning techniques, this allows them to complete sentence fragments based on pre-existing content. An example of this might be an initial phrase such as “So, I’m starting to think he’s full …” Several pre-trained language models will regularly generate toxic text when completing that sentence.
The curse of neural toxicity: AI2 and UW researchers help computers watch their language
In 2011, shortly after IBM’s Watson defeated Ken Jennings and Brad Rutter to become the reigning “Jeopardy” champion, the researchers behind the supercomputer decided to expand its vocabulary by…