In response to the popularity of Google Bard, Microsoft’s Bing Chat, and OpenAI’s ChatGPT, researchers have developed a new AI model with a somewhat more sinister twist: DarkBERT.
Unlike the large language models (LLMs) that power ChatGPT and Google Bard, which were trained on data from the open web, DarkBERT was trained purely on data from the dark web. Yes, you read that right: data from hackers, fraudsters, and other con artists was used to train this new AI model.
Using information from the Tor network, which is commonly used to access the dark web, a group of South Korean academics created DarkBERT and published a paper describing their method. They built a corpus of dark web pages by crawling the network through Tor, filtering the raw data, and then pretraining DarkBERT on the result.
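To give a rough idea of what "crawling through Tor" involves, here is a minimal Python sketch of fetching a single page over Tor's local SOCKS proxy. The proxy port, URL handling, and helper function are illustrative assumptions, not details from the researchers' actual pipeline.

```python
# Minimal sketch of fetching a page over Tor's SOCKS proxy.
# Assumes a local Tor client is listening on port 9050 and that the
# requests[socks] extra (PySocks) is installed. Hypothetical example only.
import requests

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h lets Tor resolve .onion hostnames
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_onion_page(url: str, timeout: int = 60) -> str:
    """Download raw HTML from a hidden-service URL through the Tor proxy."""
    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    return response.text
```

A real crawler would sit on top of something like this, following links, deduplicating pages, and filtering the raw text before any of it is used for training.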
Perhaps unexpectedly, DarkBERT has already outperformed other large language models, despite having been trained on data from such an unlikely source.
DarkBERT: The dark side of language models
While DarkBERT is a new AI model, it is based on the RoBERTa architecture, an approach developed back in 2019 by researchers at Facebook.
It is described as a “robustly optimized method for pretraining natural language processing (NLP) systems” in a research paper by Meta AI, and it builds on BERT (Bidirectional Encoder Representations from Transformers), which Google released back in 2018. According to Facebook’s researchers, Google’s decision to make BERT open source allowed them to run a replication study and improve how the model was trained, boosting its effectiveness.
That improved methodology is what produced RoBERTa, which achieved state-of-the-art scores on the General Language Understanding Evaluation (GLUE) NLP benchmark when Facebook published it.
As it turned out, RoBERTa was undertrained when it was first released, and the South Korean academics behind DarkBERT have now shown it is capable of far more. They built DarkBERT by feeding RoBERTa dark web data for nearly 16 days, using two versions of the corpus (one raw and one preprocessed).
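For the curious, continued pretraining of this kind is typically done with a masked language modeling objective, where the model learns to fill in randomly hidden words. The sketch below shows roughly what that might look like using the Hugging Face transformers library; the corpus file name and hyperparameters are placeholders, not the settings used for DarkBERT.

```python
# Illustrative sketch of continuing RoBERTa pretraining on a custom text corpus
# with masked language modeling (MLM), using Hugging Face transformers/datasets.
# File name and hyperparameters are hypothetical, not from the DarkBERT paper.
from transformers import (
    RobertaTokenizerFast,
    RobertaForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Plain-text corpus, one document per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "darkweb_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="darkbert-sketch",
        num_train_epochs=1,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

In practice, pretraining a model like this on a large corpus takes days of GPU time, which is consistent with the roughly 16 days the researchers report.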
Fortunately, the researchers don’t intend to make DarkBERT available to the general public, although, according to Dexerto, they do accept requests to use it for academic purposes. Either way, DarkBERT will probably give investigators and law enforcement a far better grasp of the dark web as a whole.
Tips for using AI chatbots safely
As with any other program or online service, you should use AI chatbots with caution: fraudulent ChatGPT apps can infect your devices with malware, and the chatbots themselves can leak sensitive information, as recently happened to Samsung workers.
- Only use AI chatbots from trusted and official sources, and make sure you are on the correct website.
- Official mobile apps for popular AI chatbots like ChatGPT, Bing Chat, and Google Bard are not yet widely available, so treat any app claiming to be one of them with suspicion.
- Avoid clicking on links in suspicious emails that direct you to AI chatbots or promise immediate access.
- Scammers are taking advantage of the AI chatbot trend, so be cautious of phishing attempts.
- Be wary of ads promoting AI chatbots, as scammers often use them to direct unsuspecting users to phishing websites.
- Install reliable antivirus software on your PC, Mac, and smartphone to enhance security while interacting with AI chatbots.
DarkBERT itself could serve as a prototype for future AI models trained on data from specific domains, and similar models tailored to other areas of expertise may well follow.
Stay informed about the advancements and potential risks associated with AI chatbots to protect yourself online.
You can also read our article on OpenAI’s response to the backlash: safety measures and collaboration with policymakers.