Nvidia's Nemotron-Cascade 2 is a 30B-parameter MoE model that activates only 3B parameters at inference time, yet achieved gold-medal-level performance at the 2025 IMO, IOI, and ICPC World Finals. Nvidia has ...
Researchers at the University of Science and Technology of China have developed a new reinforcement learning (RL) framework that helps train large language models (LLMs) on complex agentic tasks ...