AI and Exams

Peter Scarfe, Kelly Watcham, Alasdair Clarke, and Etienne Roesch. (2024). “A real-world test of artificial intelligence infiltration of a university examinations system: a “Turing Test” case study”. PLOS ONE.

The project has had extensive international news coverage in around 248 news outlets, and feedback from educators shows that it is already helping shape educational policy.

See examples of online coverage at the following links: BBC News, Times, Times HES, Guardian, Independent, Telegraph, Daily Mail, New Scientist, Yahoo News, Phys.org, MSN, Forbes, The Scotsman, Ars Technica.

Also featured on radio and television: on the BBC Today program Radio 4, BBC News at One, BBC World, Times Radio, LBC, BBC World Service, BBC Radio 5 Live, Talk TV, Talk Radio, BBC Wales, BBC Foyle, Virgin Radio, BBC 6 Music, St Louis Public Radio, WWJ-AM Detroit, and BBC Radio Berkshire, as well as many other radio stations worldwide.

Further online coverage can be found at: Scienmag, IBM, The Herald, Interesting Engineering, The Standard, Aol., Medium, Express & Star, Science Daily, Press & Journal, FE News, Bucks Free Press, Oxford Mail, Glasgow Times, Swindon Advertiser, The Sunday Post, Norwich Evening News, News & Star, Basingstoke Gazette, 9News, Business Insider, Engineering and Technology, FirstPost, Free Press Journal, Pravda, NewsDrum, The Economic Times, zeenews, The Indian Express, Mahalsa, IZMU, whatsnew2day, SwiftTelecast, Press Bee, NDTV, Business Standard, Eurasia Review, Verdict, Tech Times, Welwyn Times, wion, LinkedIn, daily.ai, Times of India, Lancashire Post, Sussex World, The Star, National World, Sunderland Echo, The News, The Wakefield Press, Liverpool World, Edinburgh News, Fifteensquared, Futurism, Gigazine, Melton Times, MK Citizen, Funzilla and many others.

Press Request Details

University of Reading Press Office

Email: pressoffice@reading.ac.uk

Phone: (+44) 0118 378 5757

University Press Release can be found here.

The picture in the banner of this webpage is a modified version of an image generated by DALL·E from the prompt: “Picture of a lecturer marking an exam and imagining if the exam had been handed in by an old fashion 1950s style robot”. Please feel free to use it for any press. It can be downloaded here.

In 1950 Alan Turing published a paper called “Computing Machinery and Intelligence”. In it he described the “Imitation Game” (which came to be known as the “Turing Test”). In this test, a human interrogator has a text-based conversation with a man and a woman in a different room. These are known to the interrogator only as X and Y. The interrogator is to determine which of X and Y is the man and which the woman. Turing asked what would happen if X or Y were replaced with a machine. Would the interrogator make the same number of errors as when both X and Y were humans?

Turing’s aim was to outline an operational definition of intelligence by considering the question “can machines think?”. If the interrogator made as many errors when either X or Y was a machine as when both were humans, could the machine be said to be capable of thought and reasoning?

Today we are living amid the dramatic rise of artificial intelligence (AI) models. The current poster child of this revolution is ChatGPT, produced by OpenAI. ChatGPT is a large language model designed to understand inputs such as text and generate human-like responses. A user can have extended conversations with GPT-4, probing its knowledge and understanding.

Whilst it is an open question whether ChatGPT exhibits intelligence, thought and reasoning, there is a growing worry in the higher education sector that AI models such as ChatGPT could be used by students to cheat in exams and assessments. Many anecdotal reports have arisen of educators running assessment questions through AI and it gaining excellent grades.

To rigorously investigate this, we took inspiration from Turing and designed a blind study to test AI’s ability to infiltrate a university examinations system. We asked (1) whether 100% AI-written exam submissions to an undergraduate BSc Psychology degree program could be detected and (2) if not, what grades they would achieve.

We found that not only were AI submissions nearly undetectable, they also robustly gained higher grades than real student submissions.

Our study opens a discourse about the academic integrity of assessments and highlights the need for the global educational sector to accept a “new normal” which acknowledges both the risks and opportunities afforded by artificial intelligence.
