Research conducted by psychologists from UCLA has shown that the artificial intelligence language model GPT-3 can perform analogical reasoning tasks almost as well as college undergraduates, indicating a remarkable advancement in AI capabilities. Analogical reasoning, the process of solving new problems by drawing connections from familiar ones, has long been considered a uniquely human skill.
The study, published in Nature Human Behaviour, examined GPT-3’s performance on reasoning problems of the kind that typically appear in intelligence tests and standardized exams such as the SAT. The results revealed that GPT-3’s performance closely resembled that of college students. However, the researchers raised a critical question: Is GPT-3 mimicking human reasoning as a byproduct of its extensive language training, or is it employing a genuinely new form of cognitive processing?
The UCLA scientists noted that without access to GPT-3’s internal workings, which are kept confidential by OpenAI, the company that developed the AI, it is challenging to determine precisely how its reasoning abilities function. They also pointed out that while GPT-3 excels at some reasoning tasks beyond their expectations, it still fails remarkably at other challenges.
Taylor Webb, a postdoctoral researcher in psychology at UCLA and the study’s lead author, emphasized that despite GPT-3’s impressive results, the system has significant limitations. For instance, GPT-3 can perform analogical reasoning but struggles with simple problems that people, including children, accomplish effortlessly, such as using tools to carry out a physical task.
The research tested GPT-3’s capacity to solve problems based on Raven’s Progressive Matrices, a test where subjects are asked to predict the next image in a complex arrangement of shapes. To enable GPT-3 to “see” the shapes, the researchers converted the images into a text format that the AI could process, ensuring that it had never encountered the questions before. The results indicated that GPT-3 correctly solved 80% of the problems, surpassing the human subjects’ average score of nearly 60%.
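The paper does not publish its exact encoding, but the idea of rendering a visual matrix puzzle as text can be sketched as follows. This is a hypothetical illustration, not the researchers’ actual format: each cell of a 3×3 matrix is represented by a digit string standing in for a shape pattern, and the final cell is blanked out for the model to predict.

```python
# Hypothetical sketch of serializing a matrix-reasoning problem as text.
# The digit-based cell encoding is illustrative only; the study's actual
# prompt format is not reproduced here.

def matrix_to_prompt(grid):
    """Render a matrix of cell descriptions as a text puzzle,
    leaving the bottom-right cell blank for the model to fill in."""
    rows = []
    for r, row in enumerate(grid):
        cells = list(row)
        if r == len(grid) - 1:
            cells[-1] = "?"  # the cell the model must predict
        rows.append("[" + "] [".join(cells) + "]")
    return "\n".join(rows)

# Each row permutes the same three "shapes"; the missing cell is "2".
problem = [
    ["1", "2", "3"],
    ["2", "3", "1"],
    ["3", "1", "2"],
]
print(matrix_to_prompt(problem))
```

A text-only encoding like this also guarantees the model has never seen the specific puzzle strings before, which is how the researchers ruled out memorization from training data.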
The researchers also tested GPT-3 on SAT analogy questions that were unlikely to have been part of its training data. These questions ask test-takers to identify pairs of words that share the same relationship, and GPT-3 outperformed the average scores of human college applicants in solving them.
However, GPT-3 struggled with analogy problems based on short stories, lagging behind students’ performance in this aspect. The researchers observed that GPT-4, the latest version of OpenAI’s technology, improved upon GPT-3 in this domain.
The UCLA psychologists acknowledged that GPT-3 has not yet demonstrated an understanding of physical space. For instance, it provided bizarre solutions when presented with descriptions of tools to move gumballs from one bowl to another. This highlights the limitations of large language models, which were originally designed for word prediction, not reasoning tasks.
The researchers are eager to explore whether large language models like GPT-3 are genuinely evolving to “think” like humans or are merely mimicking human thought through a different mechanism. Answering that question would require access to the software and the training data behind these models. Determining the underlying cognitive processes could pave the way for significant advancements in AI research and development, but the researchers acknowledge that gaining such access remains a challenge.
In conclusion, GPT-3’s remarkable performance in analogical reasoning tasks raises new questions about the nature of AI’s cognitive abilities and its potential to develop genuinely human-like thinking. The researchers hope that further exploration will shed light on AI’s capabilities and push the boundaries of what AI can achieve.
Story Source:
Materials provided by University of California – Los Angeles. Original written by Holly Ober. Note: Content may be edited for style and length.
Journal Reference:
- Webb, T., Holyoak, K. J., & Lu, H. Emergent analogical reasoning in large language models. Nature Human Behaviour, 2023. DOI: 10.1038/s41562-023-01659-w