Elon Musk’s xAI Corp. has launched Colossus, a powerful AI training system. Musk announced this in a recent post on X, formerly known as Twitter. The new system, which uses 100,000 Nvidia graphics cards, is a big step forward for xAI and the AI community.
The Colossus system uses Nvidia’s H100 graphics cards, which have been the industry standard for AI processing since their 2022 debut. The system is one of the most advanced AI training systems ever built. Musk calls it the “most powerful AI training system in the world,” and says it could surpass even the fastest supercomputers, such as the U.S. Energy Department’s Aurora.
This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days.
Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months.
Excellent…
— Elon Musk (@elonmusk) September 2, 2024
The driving force behind Colossus’ power
The Colossus system’s processing power comes from Nvidia’s H100 chips. These chips are among the most powerful in the AI industry and are designed to train large language models. The H100’s Transformer Engine module is a set of circuits optimized for running AI models based on the Transformer neural network architecture. This architecture is used in many top AI models, including Meta’s Llama 3.1 405B and OpenAI’s GPT-4.
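For readers unfamiliar with the architecture, the heart of every Transformer model is scaled dot-product attention. The sketch below is a minimal, illustrative pure-Python version of that operation (it is not xAI's or Nvidia's code; real models run this math as large matrix multiplications, which is exactly what the H100's Transformer Engine accelerates):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention, the core operation of the Transformer
    architecture. Q, K, V are lists of vectors (lists of floats)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # attention distribution over the keys
        # Weighted sum of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, V)) for i in range(len(V[0]))])
    return out

# Toy example: 2 tokens, 4-dimensional vectors.
Q = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
K = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
V = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(attention(Q, K, V))
```

Each query token ends up pulling mostly from the value vector whose key it matches, which is how Transformers let every token attend to the context around it.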
Musk’s xAI Corp. has raised $6 billion in funding, lifting its valuation to $24 billion. The investment supports Musk’s effort to compete with OpenAI, a company he is suing for breach of contract. Colossus, built on Nvidia’s H100 graphics cards, is reportedly faster than the U.S. Energy Department’s Aurora supercomputer. The newer H200 chips bring further improvements, including a shift from HBM3 to faster HBM3e memory and an increase in onboard memory capacity to 141 gigabytes. Some of the chips powering Colossus were initially intended for Tesla.
Video of the inside of Cortex today, the giant new AI training supercluster being built at Tesla HQ in Austin to solve real-world AI pic.twitter.com/DwJVUWUrb5
— Elon Musk (@elonmusk) August 26, 2024
The system contains more than 100,000 chips, and Musk plans to double that count to 200,000, with 50,000 of them the newer, faster H200 processors. The H200, an improved version of the H100, pairs faster memory with a larger capacity, making it easier for Colossus to handle complex AI models.
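A back-of-envelope calculation shows what the planned expansion means for the cluster's aggregate GPU memory. The per-chip figures below are Nvidia's published specs (80 GB for the H100 SXM, 141 GB for the H200); the 150,000/50,000 split follows from the article's stated plan of 200,000 total chips with 50,000 H200s, and none of this is confirmed detail about Colossus' exact configuration:

```python
# Assumed per-chip memory, from Nvidia's published specs (not confirmed
# for Colossus' exact SKUs): 80 GB HBM3 per H100 SXM, 141 GB HBM3e per H200.
H100_GB = 80
H200_GB = 141

current = 100_000 * H100_GB                       # today: 100k H100s
expanded = 150_000 * H100_GB + 50_000 * H200_GB   # planned 200k-chip cluster

print(f"current:  {current / 1e6:.2f} PB")   # 8.00 PB
print(f"expanded: {expanded / 1e6:.2f} PB")  # 19.05 PB
```

Even under these rough assumptions, the expansion would more than double the cluster's total high-bandwidth memory, since each H200 carries nearly twice the capacity of an H100.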
As xAI pushes AI technology forward, the new system will be central to developing its next-generation language models. The company’s flagship model, Grok-2, was trained on 15,000 GPUs; with Colossus’ 100,000 chips, xAI can train considerably more advanced models. The company plans to release a new model by the end of the year.
Beyond its own AI ambitions, xAI’s reliance on Nvidia hardware highlights the surging demand for AI processing power across industries. That some of Colossus’ chips were originally intended for Tesla further underscores how central this hardware is to Musk’s broader vision.
Featured image credit: Furkan Demirkaya / Dall-E