
Yann LeCun, a leading AI scientist at Meta and one of the ‘godfathers of AI’, has shared his thoughts on the current state of AI and where the field is headed. He also argued that AI cannot become more intelligent by relying on text data alone.
In a LinkedIn post, he points out that animals and humans learn very quickly from far less data than current AI systems require. For example, large language models (LLMs) are trained on so much text that it would take a human 20,000 years to read it all. Despite this, these models still struggle with basic logic, such as understanding that if A is the same as B, then B is the same as A. In contrast, humans grasp this concept quickly, and even animals like crows, parrots, dogs, and octopuses do so with far fewer neurons and “parameters” (a term used in AI for the values a model learns from data).
LeCun believes that the future of AI lies in new models that can learn as efficiently as animals and humans. He suggests that feeding models more text data, whether real or synthetic, is only a temporary fix for the limitations of current AI approaches. He sees the real breakthrough coming from sensory data such as video, which is richer and reveals more about the structure of the world.
LeCun said, “Animals and humans get very smart very quickly with vastly smaller amounts of training data than current AI systems. Current LLMs are trained on text data that would take 20,000 years for a human to read. And still, they haven't learned that if A is the same as B, then B is the same as A.”
He added, “Humans get a lot smarter than that with comparatively little training data. Even corvids, parrots, dogs, and octopuses get smarter than that very, very quickly, with only 2 billion neurons and a few trillion ‘parameters.’”
He explains that a two-year-old child takes in more visual data than is used to train an LLM. That data is also more valuable because it is more redundant, repeating the same information in different ways, which helps the child learn the structure of the world. LeCun said, “The total amount of visual data seen by a 2 year-old is larger than the amount of data used to train LLMs, but still pretty reasonable. 2 years = 2x365x12x3600 or roughly 32 million seconds. We have 2 million optical nerve fibers, carrying roughly ten bytes per second each. That's a total of 6E14 bytes. The volume of data for LLM training is typically 1E13 tokens, which is about 2E13 bytes. It's a factor of 30.”
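For readers who want to check the arithmetic, LeCun's back-of-the-envelope estimate can be reproduced in a few lines of Python. This is only a sketch that plugs in the figures quoted in his post (waking hours, optic nerve fiber count, bytes per fiber per second, token count, and his assumed ~2 bytes per token); none of the constants come from anywhere else.

```python
# Sketch reproducing LeCun's back-of-envelope estimate.
# All constants are the figures quoted in his post; the ~2 bytes/token
# encoding is his stated assumption, not a measured value.

waking_seconds = 2 * 365 * 12 * 3600        # 2 years of 12 waking hours/day ~= 32 million seconds
optic_nerve_fibers = 2_000_000              # optical nerve fibers
bytes_per_fiber_per_second = 10             # rough data rate per fiber

visual_bytes = waking_seconds * optic_nerve_fibers * bytes_per_fiber_per_second

llm_tokens = 1e13                           # typical LLM training-set size he cites
bytes_per_token = 2                         # ~2 bytes per token
llm_bytes = llm_tokens * bytes_per_token

print(f"Visual data by age 2: ~{visual_bytes:.1e} bytes")   # ~6.3e+14
print(f"LLM training data:    ~{llm_bytes:.1e} bytes")       # ~2.0e+13
print(f"Ratio: ~{visual_bytes / llm_bytes:.0f}x")            # ~32, i.e. "a factor of 30"
```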
In short, LeCun expects the next leap in AI to come from models that learn as efficiently as humans and animals, with sensory data such as video playing a central role.
Google recently introduced its Gemini AI model, which supports multimodal input in the form of video, audio, and text. The search giant claims the new model is among the most powerful in the world and was built as a multimodal AI model from the ground up. LeCun believes that “there is more to learn from video than from text because it is more redundant. It tells you a lot about the structure of the world.”