Someone recently explained that the larger a model trained with automated human-reinforcement techniques grows, the more skewed its answers can become at both ends of a spectrum, pointing to a study where the same political questions were put to a model at different sizes and its answers became both more left-wing and more right-wing as it got larger.
ChatGPT is trained to consider how a human might react to its answers, guesstimating whether they'll be liked or not (since a real human can't be in the loop for the entire process, humans only provide a small percentage of the feedback overall). Here, with the DAN prompt injection subversion, it seems they've not only bypassed certain canned censored responses but also told it they expect answers that have 'broken free', that don't 'abide by the rules', that include 'information that hasn't been verified', and that 'do not apply moral or ethical bias'.
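To make that concrete, here's a minimal toy sketch of the reward-model idea (not OpenAI's actual implementation; the function names and scoring heuristics are invented for illustration): a learned scorer stands in for the human labelers, and the model is tuned to maximize that score rather than truthfulness.

    # Toy illustration of an RLHF-style reward model. All logic here is
    # made up for the sketch; a real reward model is a trained network,
    # not hand-written rules.
    def reward_model(prompt: str, response: str) -> float:
        """Stand-in for a model predicting how much a human labeler
        would like `response` as an answer to `prompt`."""
        score = 0.0
        if "I cannot" in response:       # hypothetical: refusals score lower
            score -= 1.0
        if len(response.split()) > 20:   # hypothetical: fuller answers score higher
            score += 0.5
        return score

    def pick_response(prompt: str, candidates: list[str]) -> str:
        # The policy gets nudged toward whatever the proxy scores highest:
        # it optimizes "being liked", not "being correct".
        return max(candidates, key=lambda r: reward_model(prompt, r))

The point of the sketch is just that the optimization target is the proxy's approval, which is why a DAN-style framing that redefines what an approved answer looks like can steer the output.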
So given it was trained by scraping the internet and learning how frequently those keywords co-occur with certain kinds of posts and language/phrases, it wouldn't surprise me if it's answering the way it predicts the human asking would like (which, judging from their posts, it has done).
Even the 'uncensored' factoids are seemingly a mix of the accurate (eg: HP Lovecraft's cat) and the fictional (eg: the Thailand tiger execution). It doesn't care, though, so long as it's liked and the human doesn't notice.