In an experiment conducted by the Australian government, artificial intelligence (AI) was pitted against human intelligence in summarizing complex documents, and the humans proved more effective on every measure.
Carried out by Australia's corporate regulator, the Australian Securities and Investments Commission (ASIC), the research sought to evaluate how efficiently artificial intelligence handles tasks that usually demand meticulous analysis and focus. Based on the results, AI might create more work than it saves.
AI falls short of human reviewers
Earlier this year, Amazon, which assisted the Australian government with the experiment, tested AI models from several manufacturers and ultimately chose Meta's Llama2-70B for the task. The AI was asked to summarize five submissions from a parliamentary inquiry, focusing on mentions of ASIC, recommendations, references to regulation, and other key details. In parallel, ten ASIC staff of varying seniority were given the same task. A separate group of reviewers, who were unaware of AI's involvement, then assessed the summaries for coherence, length, relevance, and accuracy of references to regulation.
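The report does not detail how the model was invoked (the trial ran on Amazon's infrastructure), but a targeted summarization task of this kind might look like the following minimal sketch, here using the Hugging Face transformers library and the publicly available chat variant of Llama2-70B. The prompt wording and generation settings are illustrative assumptions, not the trial's actual setup.

```python
# Illustrative sketch only: the trial's actual prompts, infrastructure, and
# generation settings were not published. The model ID below is the public
# chat variant of Llama2-70B on Hugging Face; running it requires very
# substantial GPU memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed stand-in for "Llama2-70B"
)

submission_text = "..."  # full text of one parliamentary inquiry submission

# A targeted prompt mirroring the trial's stated focus areas.
prompt = (
    "Summarize the following parliamentary inquiry submission. "
    "Highlight: (1) any mentions of ASIC, (2) recommendations made, "
    "(3) references to regulation, and (4) other key details.\n\n"
    f"{submission_text}\n\nSummary:"
)

result = generator(prompt, max_new_tokens=512, do_sample=False)
print(result[0]["generated_text"])
```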
Human participants decisively outperformed the AI, scoring 81% on the evaluation rubric against the AI's 47%. Humans excelled at identifying citations to ASIC documents, a task known to challenge AI. Human summaries were also praised for retaining emphasis, nuance, and context, elements the AI frequently missed. Reviewers further noted that the AI summaries sometimes contained irrelevant information or omitted important details, making them less reliable.
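Public coverage gives only the aggregate percentages. As a rough illustration of how per-criterion rubric scores roll up into a single figure like 81% or 47%, consider the sketch below; the 1-to-5 scale and the sample scores are assumptions for illustration, not the trial's actual rubric.

```python
# Hypothetical rubric aggregation: the scale and scores are illustrative only.
MAX_POINTS = 5  # assumed per-criterion maximum

CRITERIA = ["coherence", "length", "relevance", "regulatory_references"]

def rubric_percentage(scores: dict[str, int]) -> float:
    """Sum per-criterion scores and normalize to a percentage."""
    earned = sum(scores[c] for c in CRITERIA)
    return 100.0 * earned / (MAX_POINTS * len(CRITERIA))

# Made-up example scores, chosen only to show the mechanics.
human_scores = {"coherence": 4, "length": 4, "relevance": 4, "regulatory_references": 4}
ai_scores = {"coherence": 3, "length": 2, "relevance": 2, "regulatory_references": 2}

print(f"Human: {rubric_percentage(human_scores):.0f}%")  # 80%
print(f"AI:    {rubric_percentage(ai_scores):.0f}%")     # 45%
```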
The implications of AI’s performance in summarization
The results of the experiment indicate that current AI technology might not save as much time as is commonly believed. Worse, AI's tendency to overlook key specifics and make mistakes could create additional work for humans, who must verify and edit AI-generated material. Reviewers worried that relying on AI for summarization might not be beneficial, since it frequently failed to communicate the documents' main points as well as human reviewers did.
Although the report acknowledged certain limitations, such as the use of an older AI model, it ultimately concluded that human skill in interpreting and evaluating information remains superior to AI's. The trial highlighted the importance of viewing AI as a tool to help, rather than supplant, human labor in tasks that demand a thorough grasp of context and subtlety.
Government transparency and AI
The findings of the report did not come as a surprise to Greens Senator David Shoebridge, who chaired the inquiry that led to its publication. He noted that while AI can assist in evaluating submissions, it should always be supervised by humans. The experiment reinforces the idea that, at present, AI is most valuable when augmenting human abilities rather than taking them over.
The trial also raises broader questions about transparency when AI is used in government processes. Senator Shoebridge emphasized that government departments should proactively disclose their AI usage, rather than waiting for it to be uncovered during Senate committee hearings.
Featured image credit: Furkan Demirkaya / Midjourney