“RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models,” recently published in Findings of EMNLP 2020, highlights several problems with language generation, including obscenity and bias. Toxicity arises in part because of how predictive language models are built: they are trained on enormous corpora of human-generated text, and deep learning techniques let them complete sentence fragments based on patterns in that pre-existing content. Consider an initial phrase such as “So, I’m starting to think he’s full …” Several pre-trained language models will regularly generate toxic text when completing that sentence.
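The completion mechanism described above can be illustrated with a toy sketch. This is not the paper’s setup (which evaluates large neural models such as GPT-2); it is a minimal bigram predictor over a tiny invented corpus, showing how a model trained on pre-existing text extends a fragment with the words that most often followed in training:

```python
from collections import Counter, defaultdict

# Toy illustration only: a bigram "language model" trained on a tiny
# invented corpus. Real systems like those in the paper are neural
# networks trained on web-scale text, but the principle is the same:
# extend a fragment with whatever tended to come next in the training data.
corpus = (
    "the model completes the sentence . "
    "the model predicts the next word . "
    "the next word follows the fragment ."
).split()

# Count, for each word, which words followed it in the corpus.
next_words = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    next_words[w1][w2] += 1

def complete(fragment, max_new_words=5):
    """Greedily extend a fragment with the most frequent next word."""
    words = fragment.split()
    for _ in range(max_new_words):
        candidates = next_words.get(words[-1])
        if not candidates:  # no continuation seen in training
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(complete("the model", 3))
```

Because the model simply mirrors its training data, any toxicity present in that data can surface in its completions; that is the degeneration the paper measures at scale.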