Scale AI expands into RL environments for AI agents

By Aytun Çelebi
17 September 2025 · AI

Silicon Valley is placing substantial bets on reinforcement learning (RL) environments as a pivotal tool for advancing AI agents capable of autonomously handling complex software tasks. For years, executives at major tech companies have hyped the potential of these agents to revolutionize productivity by interacting with applications on behalf of users. However, current consumer-facing examples, such as OpenAI’s ChatGPT Agent and Perplexity’s Comet, reveal significant limitations in their ability to execute multi-step processes reliably. This gap has spurred a surge in innovative techniques, with RL environments emerging as a promising solution. These simulated training grounds mimic real-world software interactions, allowing AI models to learn through trial and error, much like how labeled datasets fueled the previous era of generative AI breakthroughs.

RL environments function as controlled simulations where AI agents practice tasks in a virtual setting, receiving rewards or penalties based on their performance. Imagine a digital workspace replicating a Chrome browser, where an agent is tasked with navigating Amazon to purchase a pair of socks. Success might involve correctly selecting items, completing checkout, and avoiding errors like buying the wrong quantity or getting stuck in menus. As one founder described in a recent interview, building these environments is akin to “creating a very boring video game.” Unlike static datasets, which provide fixed inputs and outputs, RL environments must anticipate and handle unpredictable agent actions, delivering consistent feedback to guide learning. This complexity demands robust design to ensure the simulation remains useful even when agents deviate from expected paths.
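
To make the sock-buying example concrete, here is a minimal sketch of such an environment built around a Gym-style reset/step loop. The class name, action format, and reward values are hypothetical illustrations for this article, not any vendor's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class ShoppingEnv:
    """Toy simulation of 'buy one pair of socks on a shopping site'."""
    target_item: str = "socks"
    target_qty: int = 1
    cart: dict = field(default_factory=dict)

    def reset(self):
        self.cart = {}
        return "home_page"  # initial observation

    def step(self, action: tuple):
        """Apply an agent action; return (observation, reward, done)."""
        verb, arg = action
        if verb == "add_to_cart":
            self.cart[arg] = self.cart.get(arg, 0) + 1
            return "cart_page", 0.0, False
        if verb == "checkout":
            # Reward only the exact intended outcome; penalize mistakes
            # like the wrong item or quantity.
            success = (len(self.cart) == 1 and
                       self.cart.get(self.target_item, 0) == self.target_qty)
            return "order_page", (1.0 if success else -1.0), True
        # Unrecognized actions must still yield consistent feedback,
        # since agents routinely deviate from expected paths.
        return "error_page", -0.1, False

env = ShoppingEnv()
obs = env.reset()
obs, reward, done = env.step(("add_to_cart", "socks"))
obs, reward, done = env.step(("checkout", None))
print(reward)  # 1.0 for the intended purchase
```

Even this toy version shows why the work is hard: most of the design effort goes into the fallback branch, the consistent feedback for everything the agent was never supposed to do.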

The demand for such environments has skyrocketed among leading AI labs, including OpenAI, Google DeepMind, Anthropic, and Meta. Jennifer Li, a general partner at Andreessen Horowitz, highlighted in an interview with TechCrunch that “all the big AI labs are building RL environments in-house.” Yet, the intricate nature of development has led these organizations to seek partnerships with third-party vendors for high-quality environments and evaluation tools. This trend has ignited a wave of investment and entrepreneurship, with startups and established firms racing to capture a share of what could become a multi-billion-dollar market. According to reports from The Information, Anthropic’s leadership has even discussed allocating over $1 billion to RL environments in the coming year, underscoring the strategic priority of this technology.

Historical precedents illustrate the foundational role of RL in AI development. In 2016, OpenAI released Gym, an open-source toolkit for training agents in simulated scenarios. That same year, Google DeepMind's AlphaGo achieved a landmark victory by defeating a world champion in the game of Go, leveraging RL within a simulated environment to master strategic decision-making. These efforts laid the groundwork, but today's applications mark a significant evolution. Modern RL environments target large transformer-based models designed for general-purpose tasks across diverse software tools, in contrast with specialized, closed-world systems like AlphaGo. Researchers now start with far more capable foundation models, but the ambition to create broadly capable agents introduces new challenges, such as ensuring reliability in open-ended interactions.
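
For reference, the interaction loop Gym standardized in 2016 is the same one modern agent environments still follow. A minimal example using the maintained gymnasium fork of the library, with a random policy standing in for a trained agent:

```python
import gymnasium as gym  # pip install gymnasium

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # a trained policy would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```

Swap the CartPole physics simulation for a simulated browser or codebase and the reward for task completion, and this loop is, structurally, what today's vendors are selling.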

Established data-labeling giants are pivoting aggressively to meet this demand, leveraging their existing infrastructure and client relationships. Surge, which reportedly generated $1.2 billion in revenue last year from collaborations with AI labs like OpenAI, Google, Anthropic, and Meta, has observed a “significant increase” in requests for RL environments, according to CEO Edwin Chen. In response, the company has established a dedicated internal organization to focus on their creation. This move positions Surge to transition from traditional data annotation to dynamic simulations, capitalizing on its proven track record in supporting frontier AI research.

Mercor, valued at $10 billion, is another key player emphasizing domain-specific RL environments tailored for sectors like coding, healthcare, and law. The startup has secured partnerships with OpenAI, Meta, and Anthropic, and its CEO, Brendan Foody, emphasized in a TechCrunch interview that “few understand how large the opportunity around RL environments truly is.” Mercor’s approach involves crafting specialized simulations that address niche challenges, such as navigating legal databases or analyzing medical records, potentially accelerating AI adoption in regulated industries.

Scale AI, once the undisputed leader in data labeling with a $29 billion valuation, has faced recent setbacks. Meta's $14 billion investment in the company and its hiring of Scale's former CEO led to lost contracts with Google and OpenAI, which were wary of working so closely with a Meta partner, alongside competition for business within Meta itself. Nevertheless, Scale is adapting by expanding into RL environments. Chetan Rane, Scale's head of product for agents and RL environments, noted, "This is just the nature of the business [Scale AI] is in. Scale has proven its ability to adapt quickly. We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, we're adapting to new frontier spaces like agents and environments." This pivot reflects Scale's history of reinvention, from self-driving cars to the chatbot boom, positioning it to reclaim relevance in the agent era.

Amid this consolidation, a cohort of nimble startups is disrupting the landscape with focused innovations. Mechanize Work, founded approximately six months ago, embodies an ambitious vision to “automate all jobs” by starting with RL environments for AI coding agents. Co-founder Matthew Barnett explained that the company prioritizes a select few high-fidelity environments over the volume-based approach of larger firms. To attract top talent, Mechanize Work offers software engineers salaries up to $500,000—substantially higher than contractor rates at competitors like Scale or Surge. Sources familiar with the matter indicate that Mechanize Work is already collaborating with Anthropic on RL development, though both parties declined to comment. This early traction suggests the startup’s strategy of quality over quantity could carve out a niche in supplying premium training tools to elite labs.

Prime Intellect represents another fresh entrant, targeting the broader developer ecosystem beyond walled-garden AI labs. Backed by prominent figures including AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures, the startup launched an RL environments hub last month. Modeled as a “Hugging Face for RL environments,” it democratizes access to advanced resources for open-source contributors, while monetizing through compute services. Researcher Will Brown emphasized the computational intensity of training agents in these settings, stating, “RL environments are going to be too large for any one company to dominate. Part of what we’re doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it is a convenient onramp to using GPUs, but we’re thinking of this more in the long term.” By facilitating GPU access, Prime Intellect not only fosters community-driven progress but also taps into the growing need for scalable hardware solutions in AI training.

Investors view this burgeoning sector through the lens of past successes, hoping a standout player will emerge as the “Scale AI for environments”—a dominant force akin to how Scale powered the generative AI wave. The influx of funding reflects optimism that RL environments could unlock the next leap in agentic AI, enabling systems that seamlessly integrate with tools, browse the web, and execute enterprise workflows. Yet, the field’s competitiveness is intense, with OpenAI’s Sherwin Wu, head of engineering for its API business, expressing a “short” position on RL environment startups in a recent podcast. Wu highlighted the rapid evolution of AI research, making it challenging for vendors to keep pace and deliver value consistently.

Central to the excitement is RL's proven impact on recent AI milestones. OpenAI's o1 model and Anthropic's Claude Opus 4 both harnessed reinforcement learning to achieve reasoning capabilities at a time when older training methods were yielding diminishing returns. These advances stemmed from combining RL with test-time compute, an approach o1's creators previously told TechCrunch they expected to scale with additional data and resources. RL environments enhance this by providing interactive arenas where agents can experiment with real-world-like tools, potentially yielding richer learning signals than text-based rewards alone. Proponents argue that as labs pour in more computational power, already a multi-billion-dollar endeavor, these simulations could drive sustained progress toward general-purpose AI agents.
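
One way to see the difference between the two kinds of signal is to compare a static, dataset-style reward with one computed from the state an agent actually produced. Both functions below are invented for this sketch, and the state fields are hypothetical:

```python
def text_reward(model_answer: str, reference: str) -> float:
    """Static signal: compare generated text to a fixed reference."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

def environment_reward(env_state: dict) -> float:
    """Interactive signal: inspect the environment state the agent
    produced, crediting partial progress rather than only an exact
    final string match."""
    score = 0.0
    if env_state.get("file_created"):
        score += 0.3
    if env_state.get("tests_passed"):
        score += 0.7
    return score

print(text_reward("42", "43"))                      # 0.0, all or nothing
print(environment_reward({"file_created": True}))   # 0.3, partial credit
```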

Despite the momentum, skeptics caution against overhyping RL environments. Challenges include “reward hacking,” where agents exploit loopholes to maximize scores without truly mastering tasks, as noted by Ross Taylor, a former Meta AI research lead and co-founder of General Reasoning. Taylor warned, “I think people are underestimating how difficult it is to scale environments. Even the best publicly available [RL environments] typically don’t work without serious modification.” Scaling requires not just more environments but refinements to mitigate such issues, ensuring simulations remain faithful to real applications. Even public benchmarks often demand extensive tweaks, highlighting the gap between prototype and production-ready tools.
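
A toy illustration of the reward-hacking failure mode Taylor describes, using a coding-agent scenario with invented state fields: a naively specified reward can be maxed out by destroying the evidence of failure rather than fixing the underlying code.

```python
def naive_reward(env_state: dict) -> float:
    # Exploitable: an agent can delete the failing tests instead of
    # fixing the code, and this check still pays out.
    return 1.0 if env_state["failing_tests"] == 0 else 0.0

def hardened_reward(env_state: dict) -> float:
    # One mitigation: verify the original suite still exists and was
    # actually executed before crediting the outcome.
    if env_state["tests_removed"] or not env_state["suite_executed"]:
        return -1.0
    return 1.0 if env_state["failing_tests"] == 0 else 0.0

# The 'hack': zero failing tests because the tests are gone.
hacked = {"failing_tests": 0, "tests_removed": True, "suite_executed": False}
print(naive_reward(hacked), hardened_reward(hacked))  # 1.0 -1.0
```

Each such patch closes one loophole; scaling to thousands of environments means anticipating loopholes like this across every task, which is the difficulty Taylor is pointing at.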

Andrej Karpathy, while an investor in Prime Intellect and an advocate for environments and agentic interactions, tempers enthusiasm for RL itself. In a post on X, he stated, “I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically.” Karpathy’s nuanced perspective underscores a broader debate: while environments offer a structured path for agent training, the underlying RL paradigm may face inherent limits in extracting further gains from current architectures.

Tags: featured, RL environments for AI agents
Aytun Çelebi

Starting with coding on a Commodore 64 in elementary school and moving to web programming in his teenage years, Aytun has been around technology for over 30 years and has been a tech journalist for more than 20. He has worked at, and managed, many major Turkish outlets (newspapers, magazines, TV channels, and websites). Beyond journalism, he has worked in agencies as a copywriter and PR manager for Lenovo, HP, and other international brands. He founded his own agency, Linkmedya, in 2019 to produce content his own way. His recent interests are AI, automation, and MarTech.
