
Examinations "Turing Test"

AI and Exams Project

Peter Scarfe, Kelly Watcham, Alasdair Clarke, and Etienne Roesch. (2024). “A real-world test of artificial intelligence infiltration of a university examinations system: a ‘Turing Test’ case study”. PLOS ONE.

The project has had extensive international news coverage in around 248 news outlets, and feedback from educators shows that it is already helping to shape educational policy.

See examples of online coverage at the following links: BBC News, Times, Times HES, Guardian, Independent, Telegraph, Daily Mail, New Scientist, Yahoo News, Phys.org, MSN, Forbes, The Scotsman, Ars Technica.

Also featured on radio and television: the Today programme on BBC Radio 4, BBC News at One, BBC World, Times Radio, LBC, BBC World Service, BBC Radio 5 Live, Talk TV, Talk Radio, BBC Wales, BBC Foyle, Virgin Radio, BBC 6 Music, St Louis Public Radio, WWJ-AM Detroit, and BBC Radio Berkshire, as well as many other radio stations worldwide.

Further online coverage can be found at: Scienmag, The Herald, Interesting Engineering, The Standard, Aol., Medium, Express & Star, Science Daily, Press & Journal, FE News, Bucks Free Press, Oxford Mail, Glasgow Times, Swindon Advertiser, The Sunday Post, Norwich Evening News, News & Star, Basingstoke Gazette, 9News, Business Insider, Engineering and Technology, FirstPost, Free Press Journal, Pravda, NewsDrum, The Economic Times, zeenews, The Indian Express, Mahalsa, IZMU, whatsnew2day, SwiftTelecast, Press Bee, NDTV, Business Standard, Eurasia Review, Verdict, Tech Times, Welwyn Times, wion, LinkedIn, daily.ai, Times of India, Lancashire Post, Sussex World, The Star, National World, Sunderland Echo, The News, The Wakefield Press, Liverpool World, Edinburgh News, Fifteensquared, Futurism, Gigazine, Melton Times, MK Citizen, Funzilla, and many others.

Press Request Details

University of Reading Press Office

Email: pressoffice@reading.ac.uk

Phone: (+44) 0118 378 5757

 

University Press Release can be found here

The picture in the banner of this webpage is a modified version of an image generated by DALL·E from the prompt: “Picture of a lecturer marking an exam and imagining if the exam had been handed in by an old fashion 1950s style robot”. Please feel free to use it for any press. It can be downloaded here.

 

Alan Turing, AI and Exams

In 1950 Alan Turing published a paper called “Computing Machinery and Intelligence”. In it he described the “Imitation Game” (which came to be known as the “Turing Test”). In this test, a human interrogator has a text-based conversation with a man and a woman in a different room. These are known to the interrogator only as X and Y. The interrogator's task is to determine which of X and Y is the man and which is the woman. Turing asked what would happen if X or Y were replaced with a machine. Would the interrogator make the same number of errors as when both X and Y were humans?

Turing’s aim was to outline an operational definition of intelligence by considering the question "can machines think?". If the interrogator made as many errors when either X or Y was a machine as when both were humans, could the machine be said to be capable of thought and reasoning?

Portrait of Alan Turing at the Royal Society, London

Today we are living amid the dramatic rise of artificial intelligence (AI) models. The current poster child of this revolution is ChatGPT, produced by OpenAI. ChatGPT is a large language model designed to understand inputs such as text and generate human-like responses. A user can have extended conversations with GPT-4, probing its knowledge and understanding.

Whilst it is an open question as to whether ChatGPT exhibits intelligence, thought and reasoning, there is a growing worry in the higher education sector that AI models such as ChatGPT could be used by students to cheat in exams and assessments. Many anecdotal reports have arisen of educators running assessment questions through AI and it gaining excellent grades.

To rigorously investigate this, we took inspiration from Turing and designed a blind study to test AI’s ability to infiltrate a university examinations system. We asked (1) whether 100% AI-written exam submissions to an undergraduate BSc Psychology degree programme could be detected and (2) if not, what grades they would achieve.

We found that not only were AI submissions nearly undetectable, they also robustly gained higher grades than real student submissions.

Our study opens a discourse about the academic integrity of assessments and highlights the need for the global educational sector to accept a “new normal” which acknowledges both the risks and opportunities afforded by artificial intelligence.

This research was funded by

Product image