Reflection 70B does what the big boys can’t and learns from its own mistakes. This AI with open-source language models may have found a countermeasure against delusions.
Launched by HyperWrite, a startup led by co-founder and CEO Matt Shumer, Reflection 70B is based on Meta’s Llama 3.1-70B Instruct. What makes this model different from others is its self-correcting capability, a unique capability that has caught the attention of the AI community.
I'm excited to announce Reflection 70B, the world’s top open-source model.
Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.
405B coming next week – we expect it to be the best model in the world.
Built w/ @GlaiveAI.
Read on ⬇️: pic.twitter.com/kZPW1plJuo
— Matt Shumer (@mattshumer_) September 5, 2024
The standout feature of Reflection 70B
Shumer announced the arrival of Reflection 70B at X and emphasized its superiority over other models in the open-source market. He confidently stated that AI is currently the best open-source AI model, surpassing its predecessors, including Meta’s Llama series. Benchmarks such as MMLU and HumanEval also confirmed Reflection 70B’s high performance, showing better results than both open-source and commercial alternatives.
The reason Reflection 70B achieves these results, and a feature that its competitors lack, is its ability to detect and correct errors. This is a groundbreaking feature in AI because while language models often “hallucinate” or produce false information, Shumer’s AI can recognize errors before providing a final answer. Shumer has been thinking about this concept for months, and with this new model, it is now a reality.
Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).
It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.
Beats GPT-4o on every benchmark tested.
It clobbers Llama 3.1 405B. It’s not even close. pic.twitter.com/win7cHUOob
— Matt Shumer (@mattshumer_) September 5, 2024
Reflection 70B’s name represents its introspection capabilities as it “reflects” its reasoning to check model accuracy. The design includes special icons that enhance reasoning and error correction, allowing users to interact with the model more effectively. These reasoning processes can be seen in the form of special labels that allow for real-time corrections.
To demonstrate its effectiveness, users can interact with Reflection 70B on a demo website, but there are difficulties accessing the site due to heavy traffic. Some tasks, such as determining which number is larger between 9.11 and 9.9, are designed to challenge the model’s precision. While many AI systems struggle with such queries, Reflection 70B managed to get them right, despite taking more than 60 seconds. Despite the delay, this level of reasoning makes the model stand out for use cases where precision is critical.
Reflection 70B is just the beginning
Reflection 70B is the first model in an expanding series. An even larger model, the Reflection 405B, will be released soon. Shumer’s ambitious vision for Reflection models includes surpassing the capabilities of closed-source models such as OpenAI’s GPT-4. The Reflection 405B will reportedly outperform the best proprietary models and push the boundaries of what open-source AI can achieve.
But Reflection 70B is not just a standalone project. Shumer also announced plans to integrate the model into HyperWrite‘s writing assistant platform. HyperWrite, a Chrome extension that helps users draft emails, summarize text, and more, already has millions of users. Reflection 70B’s upcoming integration into this platform will offer users a new level of accuracy and customization, enabling even more advanced AI-powered typing.
A fast and efficient training process
Training Reflection 70B wasn’t a long, drawn-out process, thanks to a partnership with Glaive, a startup specializing in AI datasets. Glaive’s platform creates use-case-specific datasets, making it easier and faster to train language models. In the case of Reflection 70B, this approach led to the model being trained five times in just three weeks, a feat made possible through the use of Glaive’s synthetic data generation systems.
I want to be very clear — @GlaiveAI is the reason this worked so well.
The control they give you to generate synthetic data is insane.
I will be using them for nearly every model I build moving forward, and you should too. https://t.co/I789UIa5Yg
— Matt Shumer (@mattshumer_) September 5, 2024
Founded by Sahil Chaudhary, the company aims to provide high-quality datasets that can train models quickly and cost-effectively. Their success in training smaller models has been demonstrated in the past. A 3D parameterized model outperformed many large open-source competitors on certain tasks.
Reflection 70B in action
The development of the model shows how important this is for HyperWrite, which was founded in 2020 as Otherside AI. Originally based in Long Island, New York, HyperWrite has grown from a small AI writing assistant to a platform with more than two million users. Its early success earned Shumer and co-founder Jason Kuperberg a spot on Forbes’ “30 Under 30” list in 2023.
In March 2023, HyperWrite raised $2.8 million in funding from investors like Madrona Venture Group, which helped the company grow. New features like browser assistants that can perform tasks like booking flights or finding job candidates on LinkedIn have kept HyperWrite on the cutting edge of AI-powered personal assistance. The integration of Reflection 70B is expected to enhance these features and further solidify HyperWrite’s position in the AI industry.
Reflection 70B will change how we think about AI. It combines precision and reasoning like no other model does. The model is useful for tasks requiring high accuracy, but its potential applications are much broader. The upcoming release of Reflection 405B shows that Shumer and his team are still working hard. As HyperWrite makes its platform better and creates new features, the Reflection series will probably be very important. With Reflection 70B, it will be interesting to see how the AI community and users respond to the model’s unique capabilities and how future models will build upon its foundation.
Featured image credit: DC Studio / Freepik