X’s engineering team released the code for its “for you” recommendation algorithm last month. Elon Musk described the publication as a victory for transparency, stating, “We know the algorithm is dumb and needs massive improvements, but at least you can see us struggle to make it better in real-time and with transparency.” Musk added, “No other social media companies do this.”

X is the only major social network to open source elements of its recommendation algorithm. However, researchers argue that the published code provides limited transparency for understanding the platform’s operations in 2024. The code resembles a redacted version released in 2023, according to John Thickstun, an assistant professor of computer science at Cornell University.

Thickstun told Engadget, “What troubles me about these releases is that they give you a pretense that they’re being transparent for releasing code and the sense that someone might be able to use this release to do some kind of auditing work or oversight work. And the fact is that that’s not really possible at all.”

Following the release, X users shared extensive threads interpreting the code to advise creators on boosting visibility. One post, viewed more than 350,000 times, stated that X “will reward people who conversate” and “raise the vibrations of X.” Another post, with over 20,000 views, claimed that posting video is key. A third post recommended sticking to a “niche” because “topic switching hurts your reach.”

Thickstun cautioned against deriving strategies for virality from the code. “They can’t possibly draw those conclusions from what was released,” he said. The code reveals minor operational details, such as filtering out content older than one day. Thickstun described much of the information as “not actionable” for content creators.

A significant structural change separates the current algorithm from the 2023 version. The new system uses a Grok-like large language model to rank posts. Ruggero Lazzaroni, a Ph.D. researcher at the University of Graz, explained the difference: “In the previous version, this was hard coded: you took how many times something was liked, how many times something was shared, how many times something was replied … and then based on that you calculate a score, and then you rank the post based on the score.”

“Now the score is derived not by the real amounts of likes and shares, but by how likely Grok thinks that you would like and share a post,” Lazzaroni continued. This shift increases opacity, according to Thickstun. “So much more of the decision-making … is happening within black-box neural networks that they’re training on their data,” he said. “More and more of the decision-making power of these algorithms is shifting not just out of public view, but actually really out of view or understanding of even the internal engineers that are working on these systems, because they’re being shifted into these neural networks.”
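The contrast Lazzaroni describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not X's implementation: the `Post` fields, the weights, and the `stub_model` probabilities are all hypothetical, and the real model has not been published.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical weights for illustration only; not taken from X's code.
WEIGHTS = {"like": 1.0, "share": 2.0, "reply": 3.0}


@dataclass(frozen=True)
class Post:
    post_id: str
    likes: int
    shares: int
    replies: int


def count_based_score(post: Post) -> float:
    """Older style: score from observed interaction counts."""
    return (
        WEIGHTS["like"] * post.likes
        + WEIGHTS["share"] * post.shares
        + WEIGHTS["reply"] * post.replies
    )


def prediction_based_score(
    post: Post, model: Callable[[Post], Dict[str, float]]
) -> float:
    """Newer style: score from a model's predicted probability of each action."""
    probabilities = model(post)
    return sum(WEIGHTS[action] * p for action, p in probabilities.items())


def rank(posts: List[Post], score: Callable[[Post], float]) -> List[Post]:
    """Order posts from highest to lowest score."""
    return sorted(posts, key=score, reverse=True)


def stub_model(post: Post) -> Dict[str, float]:
    """Stand-in for the opaque model; these probabilities are made up."""
    table = {
        "a": {"like": 0.1, "share": 0.0, "reply": 0.0},
        "b": {"like": 0.5, "share": 0.2, "reply": 0.1},
    }
    return table[post.post_id]


posts = [Post("a", likes=10, shares=0, replies=0),
         Post("b", likes=1, shares=1, replies=1)]

old_order = [p.post_id for p in rank(posts, count_based_score)]
new_order = [
    p.post_id for p in rank(posts, lambda p: prediction_based_score(p, stub_model))
]
```

In this toy setup the two approaches rank the same posts in opposite orders. The first ranking can be checked by hand from the counts, while the second depends entirely on what the model returns, which is the opacity Thickstun describes.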

The new release omits details previously disclosed in 2023 about weighting interactions for ranking. In 2023, X specified that a reply equaled 27 retweets, and a reply generating a response from the original author equaled 75 retweets. X redacted these weightings in the latest code, citing “security reasons.”
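To make the 2023 figures concrete, the following sketch converts a post's interactions into retweet equivalents using the two ratios reported above. The function name and the assumption that ordinary replies and author-answered replies are counted separately are illustrative choices, not details confirmed by X's code.

```python
# Ratios as reported for the 2023 release, expressed in retweet units.
RETWEET = 1.0
REPLY = 27.0
REPLY_ANSWERED_BY_AUTHOR = 75.0


def retweet_equivalents(
    retweets: int, replies: int, replies_answered_by_author: int
) -> float:
    """Sum interactions in retweet units.

    Assumption: `replies` excludes replies the original author answered,
    so the two reply counts do not overlap.
    """
    return (
        retweets * RETWEET
        + replies * REPLY
        + replies_answered_by_author * REPLY_ANSWERED_BY_AUTHOR
    )


# A post with 100 retweets and no replies.
widely_shared = retweet_equivalents(100, 0, 0)

# A post with 10 retweets, 2 ordinary replies, and 1 reply the author answered:
# 10 + 2 * 27 + 1 * 75 = 139
conversational = retweet_equivalents(10, 2, 1)
```

Under these ratios, a post with a handful of replies can outweigh one with ten times as many retweets. With the weightings redacted from the latest code, the same calculation cannot be done for the current system.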

The code provides no information on the training data for the model. Mohsen Foroughifar, an assistant professor of business technologies at Carnegie Mellon University, emphasized this gap: “One of the things I would really want to see is, what is the training data that they’re using for this model. If the data that is used for training this model is inherently biased, then the model might actually end up still being biased, regardless of what kind of things that you consider within the model.”

Lazzaroni, who works on an EU-funded project simulating social media platforms to test recommendation approaches, noted that the code lacks the model itself. “We have the code to run the algorithm, but we don’t have the model that you need to run the algorithm,” he said. This prevents researchers from reproducing X’s algorithm.

Studying the algorithm holds value beyond social media. Thickstun observed that challenges with social media recommendations mirror issues in AI chatbots. “A lot of these challenges that we’re seeing on social media platforms and the recommendation [systems] appear in a very similar way with these generative systems as well,” he said. “So you can kind of extrapolate forward the kinds of challenges that we’ve seen with social media platforms to the kind of challenges that we’ll see with interaction with GenAI platforms.”

Lazzaroni, who simulates toxic behaviors on social media, criticized priorities in AI development. “AI companies, to maximize profit, optimize the large language models for user engagement and not for telling the truth or caring about the mental health of the users,” he said. “And this is the same exact problem: they make more profit, but the users get a worse society, or they get worse mental health out of it.”

Taken together, the researchers say, these gaps undercut the release’s transparency claims. The move to neural network-based ranking replaces explicit interaction counts with model predictions that neither outsiders nor, by Thickstun’s account, X’s own engineers can fully inspect. The redacted weightings and undisclosed training data limit external analysis and leave questions about bias unanswered. Without the model itself, simulation-based research like Lazzaroni’s cannot reproduce X’s algorithm.

