Entities
OpenAI users
Incidents Harmed By
Incident 7291 Report
GPT-4o's Chinese Tokens Reportedly Compromised by Spam and Pornography Due to Inadequate Filtering
2024-05-14
OpenAI's GPT-4o was found to have its Chinese token training data compromised by spam and pornographic phrases due to inadequate data cleaning. Tianle Cai, a Ph.D. student at Princeton University, identified that most of the longest Chinese tokens were irrelevant and inappropriate, primarily originating from spam and pornography websites. The polluted tokens could lead to hallucinations, poor performance, and potential misuse, undermining the chatbot's reliability and safety measures.
MoreRelated Entities
Other entities that are related to the same incident. For example, if the developer of an incident is this entity but the deployer is another entity, they are marked as related entities.
Related Entities
Other entities that are related to the same incident. For example, if the developer of an incident is this entity but the deployer is another entity, they are marked as related entities.