David Silver and Richard Sutton, two renowned AI scientists, argue in a new paper that artificial intelligence is about to enter a new phase, the “era of experience.” This is where AI systems rely less and less on human-provided data and improve themselves by gathering data from, and interacting with, the world.

While the paper is conceptual and forward-looking, it has direct implications for enterprises that aim to build with, and for, future AI agents and systems.

Both Sutton and Silver are veteran scientists with a track record of making accurate predictions about the future of AI, and the validity of those predictions can be seen directly in today’s most advanced AI systems. In 2019, Sutton, a pioneer in reinforcement learning, wrote the famous essay “The Bitter Lesson,” in which he argues that general methods leveraging computation ultimately beat methods that incorporate complex, human-derived domain knowledge.

David Silver, a principal scientist at DeepMind, was a key contributor to AlphaGo, AlphaZero and AlphaStar, all important milestones in deep reinforcement learning. He also co-authored the 2021 paper “Reward Is Enough,” which argued that reinforcement learning and a well-designed reward signal would be sufficient to create very advanced AI systems.

Today’s most advanced large language models (LLMs) leverage both of these concepts. The wave of new LLMs that have taken over the AI scene since GPT-3 has relied primarily on scaling compute and data to internalize vast amounts of knowledge. The most recent wave of reasoning models, such as DeepSeek-R1, has shown that reinforcement learning and a simple reward signal are enough for models to learn complex reasoning skills.
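To make that idea concrete, here is a minimal, hypothetical sketch (plain Python, not anything from DeepSeek-R1’s actual pipeline) of how a simple verifiable reward can drive learning: a REINFORCE-style update on a one-step task where the only feedback is whether the sampled answer is correct.

```python
import math
import random

# The single toy task: the "model" must produce the verifiably correct answer.
CORRECT_ANSWER = "4"
CANDIDATES = ["3", "4", "5"]              # answers the toy policy can emit
logits = {a: 0.0 for a in CANDIDATES}     # the policy's learnable parameters

def sample_answer() -> str:
    """Sample an answer from the softmax policy over candidates."""
    weights = [math.exp(logits[a]) for a in CANDIDATES]
    return random.choices(CANDIDATES, weights=weights, k=1)[0]

def reward(answer: str) -> float:
    """The entire reward signal: 1 if the answer is verifiably correct, else 0."""
    return 1.0 if answer == CORRECT_ANSWER else 0.0

LR = 0.1
for _ in range(500):
    answer = sample_answer()
    r = reward(answer)
    total = sum(math.exp(logits[a]) for a in CANDIDATES)
    for a in CANDIDATES:
        prob = math.exp(logits[a]) / total
        # REINFORCE: d/d(logit_a) of log pi(answer) = 1[a == answer] - pi(a)
        logits[a] += LR * r * ((1.0 if a == answer else 0.0) - prob)

print({a: round(v, 2) for a, v in logits.items()})  # "4" should dominate
```

The key property, and the reason this scales, is that no human ever labels the intermediate steps; the learner only needs a checkable outcome.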

What is the era of experience?

The “era of experience” builds on the same concepts Sutton and Silver have been discussing for years, adapting them to recent advances in AI. The authors argue that “the pace of progress driven solely by supervised learning from human data is demonstrably slowing, signaling the need for a new approach.”

And that approach calls for a new source of data, one that is generated in a way that keeps improving as the agent becomes stronger. “This can be achieved by allowing agents to continually learn from their own experience, i.e., data that is generated by the agent interacting with its environment,” Sutton and Silver write. They argue that experience will become “the dominant medium of improvement” and ultimately eclipse the scale of human data used in current systems.
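As a rough illustration of what such an experience loop could look like, here is a minimal, hypothetical Python sketch. The environment, the agent’s value table and the update rule are stand-ins invented for this example, not anything proposed in the paper; the point is only that the training data is produced by the agent’s own actions.

```python
import random

class Environment:
    """Stub world: the agent guesses a hidden digit and is told how close it was."""
    def __init__(self):
        self.target = random.randint(0, 9)
    def step(self, action: int):
        observation = abs(action - self.target)        # distance feedback
        reward = 1.0 if observation == 0 else -0.1
        return observation, reward

class Agent:
    """Stub agent: keeps a value estimate per action, mostly exploits, sometimes explores."""
    def __init__(self):
        self.values = {a: 0.0 for a in range(10)}
    def act(self) -> int:
        if random.random() < 0.2:                      # explore
            return random.randrange(10)
        return max(self.values, key=self.values.get)   # exploit
    def learn(self, action: int, reward: float):
        # Incremental update driven entirely by self-generated experience.
        self.values[action] += 0.1 * (reward - self.values[action])

env, agent = Environment(), Agent()
for _ in range(1000):                                  # one continual stream, not a fixed dataset
    action = agent.act()
    observation, reward = env.step(action)
    agent.learn(action, reward)

print("preferred action:", max(agent.values, key=agent.values.get), "| target:", env.target)
```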

According to the authors, in addition to learning from their own experiential data, future AI systems “will break through the limitations of human-centric AI systems” in four dimensions:

  1. Streams: Instead of operating in disconnected episodes, AI agents “will have their own stream of experience that progresses, like humans, over a long time-scale.” This will allow agents to plan toward long-term goals and adapt to new behavioral patterns over time. We can see glimpses of this in AI systems with very long context windows and memory architectures that are continuously updated based on user interactions.
  2. Actions and observations: Instead of being limited to human-style actions and observations, agents in the era of experience will act autonomously in the real world. Examples of this are agentic systems that can interact with external applications and resources through tools such as computer use and the Model Context Protocol (MCP).
  3. Rewards: Current reinforcement learning systems mostly rely on human-designed reward functions. In the future, AI agents should be able to design their own dynamic reward functions that adapt over time, matching user preferences against real-world signals gathered from the agent’s actions and observations. We are seeing early versions of self-designed rewards in systems such as Nvidia’s DrEureka (a minimal sketch of a dynamic reward function follows this list).
  4. Planning and reasoning: Current reasoning models have been designed to imitate the human thought process. The authors argue that far more efficient mechanisms of thought surely exist, “using non-human languages that may, for example, utilize symbolic, distributed, continuous, or differentiable computations.” AI agents must interact with the world, observe, and use data to validate and update their reasoning process and develop a world model.

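The third point, dynamic rewards, is the easiest to see in code. Below is a minimal, hypothetical sketch of a reward function whose weights over real-world signals are adjusted over time from sparse user feedback. The signal names and the update rule are illustrative assumptions, not taken from the paper or from DrEureka.

```python
from dataclasses import dataclass, field

@dataclass
class DynamicReward:
    # Initial guesses at how much each (normalized, 0-to-1) signal should matter.
    weights: dict = field(default_factory=lambda: {
        "task_completed": 1.0, "speed": 0.3, "cost_efficiency": 0.3,
    })

    def score(self, signals: dict) -> float:
        """Combine observed real-world signals into a scalar reward."""
        return sum(self.weights[name] * value for name, value in signals.items())

    def adapt(self, signals: dict, user_feedback: float, lr: float = 0.05) -> None:
        """Nudge the weights so the scalar reward tracks sparse user feedback
        (user_feedback in [-1, 1], e.g. a thumbs up or down)."""
        error = user_feedback - self.score(signals)
        for name, value in signals.items():
            self.weights[name] += lr * error * value

reward_fn = DynamicReward()
episode = {"task_completed": 1.0, "speed": 0.8, "cost_efficiency": 0.9}
print("reward before:", round(reward_fn.score(episode), 3))
reward_fn.adapt(episode, user_feedback=-1.0)  # user unhappy despite task completion
print("reward after: ", round(reward_fn.score(episode), 3))
```

The design choice worth noting is that the reward is grounded in observed world signals, while the user’s preference only steers how those signals are weighted, rather than being hand-coded into a fixed function up front.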
The idea of AI agents that adapt to their environment through reinforcement learning is not new. But previously, these agents were limited to very constrained environments such as board games. Today, agents that can interact with complex environments (e.g., AI computer use), combined with advances in reinforcement learning, will overcome these limitations, bringing about the transition to the era of experience.

What does it mean for the enterprise?

Buried in Sutton and Silver’s paper is an observation that will have important implications for real-world applications: “The agent may use ‘human-friendly’ actions and observations, such as user interfaces, that naturally facilitate communication and collaboration with the user. The agent may also take ‘machine-friendly’ actions that execute code and call APIs, allowing the agent to act autonomously in service of its goals.”

The era of experience means developers will have to build their applications not only for humans but also with agents in mind. Machine-friendly actions require building secure, accessible APIs that can be reached easily, either directly or through MCP interfaces. It also means creating agents that can be discovered through protocols such as Google’s Agent2Agent. You will also need to design your APIs and agent interfaces to provide access to both actions and observations, allowing agents to gradually reason about and learn from their interactions with your applications.
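As a rough illustration, here is a minimal, hypothetical Python sketch of what a “machine-friendly” surface for an existing application capability could look like: a self-describing discovery endpoint plus a JSON invocation endpoint. The endpoint paths and manifest shape are loosely inspired by tool-description formats such as MCP’s, but they are illustrative assumptions, not the actual MCP specification.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# The capability the app already offers to humans through its UI.
def create_order(item: str, quantity: int) -> dict:
    return {"status": "created", "item": item, "quantity": quantity}

# Machine-readable description so agents can discover what actions exist.
TOOL_MANIFEST = {
    "tools": [{
        "name": "create_order",
        "description": "Create an order for an item.",
        "parameters": {"item": "string", "quantity": "integer"},
    }]
}

class AgentAPI(BaseHTTPRequestHandler):
    def _send(self, code: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Discovery endpoint: agents fetch this to learn the available actions.
        if self.path == "/tools":
            self._send(200, TOOL_MANIFEST)
        else:
            self._send(404, {"error": "unknown path"})

    def do_POST(self):
        # Invocation endpoint: agents call actions with JSON arguments.
        if self.path == "/tools/create_order":
            length = int(self.headers.get("Content-Length", 0))
            args = json.loads(self.rfile.read(length) or b"{}")
            self._send(200, create_order(args["item"], int(args["quantity"])))
        else:
            self._send(404, {"error": "unknown tool"})

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), AgentAPI).serve_forever()
```

The point of the discovery endpoint is that an agent never needs a human to read the documentation for it: it can fetch the manifest, see which actions exist and what parameters they take, and call them directly.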

If the vision Sutton and Silver present comes true, billions of agents will roam the web (and soon the physical world) to accomplish tasks. Their behaviors and needs will be very different from those of human users and developers, and having an agent-friendly way to interact with your application will improve your ability to leverage future AI systems (and to mitigate the harm they can cause).

“By building upon the foundations of RL and adapting its core principles to the challenges of this new era, we can unlock the full potential of autonomous learning and pave the way to truly superhuman intelligence,” Sutton and Silver write.

DeepMind declined to provide additional comment for this story.