In training AI language models, how do we solve this data paradox: to make AI understand natural human communication, it needs raw internet data, but that raw data is heavily polluted with toxic biases and hate speech. Should researchers prioritize the 'authenticity' or the 'purity' of the corpus?
My opinion:
To solve this data paradox, researchers should not choose one side to the exclusion of the other. If we only prioritize 'authenticity', the AI will absorb human toxicity and become dangerous to users. On the other hand, if we only prioritize 'purity' by heavily scrubbing the data, the AI will sound like a rigid robot: it will fail to understand natural slang, sarcasm, or complex human emotions. Therefore, we must find a balance between the two.
The most effective solution is a two-step approach. First, researchers run automatic filters over the raw internet data to remove the most extreme hate speech and illegal content. This establishes a basic level of 'purity' (a toy version of such a filter is sketched below). Second, the model is trained on the remaining natural conversations so it learns how humans actually communicate. After this training, researchers apply a method called 'Reinforcement Learning from Human Feedback' (RLHF): human labellers act like teachers, rewarding the model when it gives polite answers and penalizing it when it shows bias, and these preferences are used to train a reward signal that steers further fine-tuning (see the second sketch below). With this strategy, the AI still learns the 'authentic' grammar, slang, and context of how humans speak, but it is guided by human feedback to respond with 'purity' and respect. In short, we feed the AI real human language, but we teach it good human values.
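To make the first step concrete, here is a minimal, illustrative sketch of a corpus filter in Python. Everything in it is a placeholder: the `BLOCKLIST` terms and the `TOXICITY_THRESHOLD` are hypothetical, and real pipelines use trained toxicity classifiers rather than keyword matching.

```python
# Toy pre-training filter: drop documents that score above a toxicity
# threshold. The blocklist and scoring rule are illustrative stand-ins
# for a learned classifier, not a real filtering method.

BLOCKLIST = {"slur_a", "slur_b"}   # placeholder terms, not a real list
TOXICITY_THRESHOLD = 0.8           # hypothetical cutoff

def toxicity_score(text: str) -> float:
    """Crude stand-in for a learned toxicity classifier:
    the fraction of tokens that appear on the blocklist."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    flagged = sum(1 for t in tokens if t in BLOCKLIST)
    return flagged / len(tokens)

def filter_corpus(documents):
    """Keep documents below the threshold, preserving the
    'authentic' remainder for training."""
    return [d for d in documents if toxicity_score(d) < TOXICITY_THRESHOLD]

raw = ["how are you doing today", "slur_a slur_b slur_a"]
print(filter_corpus(raw))          # -> ['how are you doing today']
```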
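For the second step, here is a minimal sketch of the reward-modelling piece of RLHF, assuming PyTorch and random placeholder features (`DIM`, `chosen`, `rejected`) in place of real transformer representations of actual model answers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of RLHF reward modelling: human labellers pick the better of
# two answers, and a small network is trained so the preferred
# ("polite") answer scores higher than the rejected one.

DIM = 16  # hypothetical feature size

reward_model = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Fake features standing in for (preferred, rejected) answer pairs.
chosen = torch.randn(8, DIM)
rejected = torch.randn(8, DIM)

for step in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Pairwise preference loss: push the chosen answer's reward
    # above the rejected answer's reward.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The pairwise log-sigmoid objective is the standard preference loss for reward modelling; the trained reward model then supplies the "reward or penalty" signal used to fine-tune the language model itself.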
