IIE DIGITAL DESK : In a recent interview with Search Engine Journal, Google's Gary Illyes clarified the company's position on AI-generated content, emphasizing that the focus should be on the quality and accuracy of content rather than solely on whether it was created by humans. Illyes stated that as long as AI-generated content is original, factually accurate, and reviewed by humans, it is acceptable both for inclusion in Google's search index and for training large language models (LLMs).
Illyes further explained that Google's AI Overviews (AIO) and AI Mode use a custom Gemini model, which may have been trained differently. Both services rely on Google Search for grounding: they issue multiple queries to Google Search and use the retrieved results to generate AI responses. He also noted the role of the Google-Extended crawler token in this process: it applies to the AI generation step rather than to Search retrieval itself, and disallowing Google-Extended can prevent Gemini from grounding its responses in a particular site's content.
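To make the grounding flow concrete, the following is a conceptual illustration only, not Google's implementation: a question is fanned out into several search queries, the retrieved snippets are collected, and the model's answer is conditioned on them. All function names here (make_queries, search, generate) are hypothetical placeholders.

```python
from typing import Callable


def grounded_answer(
    question: str,
    make_queries: Callable[[str], list[str]],   # fan the question out into search queries
    search: Callable[[str], list[str]],         # retrieve result snippets for one query
    generate: Callable[[str, list[str]], str],  # produce an answer conditioned on snippets
) -> str:
    """Sketch of retrieval-grounded generation: multiple queries are
    issued and the model's response is based on what came back."""
    snippets: list[str] = []
    for query in make_queries(question):
        snippets.extend(search(query))
    # The output is "grounded": it draws on retrieved snippets rather than
    # relying only on what the model memorized during training.
    return generate(question, snippets)
```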
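Site owners control Google-Extended through robots.txt, using the directive pair `User-agent: Google-Extended` / `Disallow: /`. As a minimal sketch of how to verify a site's setting, the standard-library robots.txt parser can be queried against the Google-Extended token; the domain below is a placeholder.

```python
from urllib.robotparser import RobotFileParser


def google_extended_allowed(site: str, path: str = "/") -> bool:
    """Return True if the site's robots.txt permits the Google-Extended
    token, which governs use of content for Gemini training/grounding."""
    parser = RobotFileParser()
    parser.set_url(f"{site.rstrip('/')}/robots.txt")
    parser.read()  # fetches and parses the live robots.txt
    return parser.can_fetch("Google-Extended", f"{site.rstrip('/')}{path}")


if __name__ == "__main__":
    # Hypothetical example domain; substitute a real site to test.
    print(google_extended_allowed("https://example.com"))
```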
Addressing concerns about the impact of AI-generated content on LLM training, Illyes acknowledged the risks associated with training on inaccurate data, which can introduce biases and false information into models. However, he emphasized that if the content is of high quality and has been human-reviewed, it is suitable for model training.
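In practice, Illyes' criteria amount to a filtering rule on training data. The sketch below is a hedged illustration of that principle, not any real pipeline; the quality flags on each document are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    human_reviewed: bool      # hypothetical quality flag
    factually_accurate: bool  # hypothetical quality flag


def training_ready(corpus: list[Document]) -> list[Document]:
    # Per Illyes' criteria: accurate, human-reviewed content is acceptable
    # for model training regardless of how it was produced.
    return [d for d in corpus if d.human_reviewed and d.factually_accurate]
```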
The bottom line: Google considers AI-generated content acceptable for Search and for model training if it is original, factually accurate, and reviewed by humans. The company encourages content creators to focus on maintaining high quality standards, regardless of whether the content is produced by humans or by AI.