Abstract
Ian McEwan’s “Machines like me” presents a scenario where androids demonstrate a limited understanding of the world. In this study, we feed the content of the novel “Machines like me” into a chatbot powered by LLMs to analyze how it interprets various questions posed as prompts. We create a private chatbot based on the OpenAI API using in-context learning and LangChain orchestration for evaluation. Our focus is on assessing the chatbot’s accuracy in responding, its level of world understanding, and the consistency of its answers. Our findings reveal that chatbots, like the android in the novel, still exhibit a lack of world understanding. This limitation, coupled with their tendency to produce hallucinatory responses, can hinder their ability to correctly interpret and respond to textual input. Further, we compare the private chatbot with ChatGPT-4 and evaluate the results.
Introduction
Chatbots represent a significant breakthrough in deep learning and Natural Language Processing (NLP), embodying the latest developments in these fields. They utilize LLMs, which belong to generative AI, the segment of AI dedicated to developing models and algorithms that can produce new content such as images, text, music, and videos. This differs from conventional AI models that are designed for specific tasks, as generative AI aims to absorb and replicate existing data patterns to create new, distinctive outputs (Varitimiadis et al., 2020), (Terblanche et al., 2022), (Li et al., 2023). This technology finds application in computer vision, where generative models can fabricate lifelike images, modify existing ones, or fill in incomplete parts of an image. In NLP, these models assist in language translation, text creation, and even in developing conversational agents capable of human-like interactions (Haque and Rubya, 2023). Furthermore, generative AI extends to artistic creation, data enrichment, and the production of synthetic data or images. As a subset of AI and Deep Learning (DL), generative AI is centered on generating novel and unique outputs, expanding beyond data analysis to the creation of new entities based on learned patterns. In text generation, AI models like GPT-4 use Transformer architectures and are pre-trained on vast text datasets. This training is crucial for them to learn grammar, context, and semantics. When given a prompt, these models predict the next word or phrase based on the learned patterns, thereby generating human-like text.
ChatGPT, a chatbot based on large language models, was developed by OpenAI and launched in November 2022. It is trained with Reinforcement Learning from Human Feedback (RLHF) and reward models that rank the best responses (Vergara et al., 2022). ChatGPT has a wide array of uses, from having conversations with users (Liao et al., 2023) and answering questions to generating text (Jalaja et al., 2022), translating languages, and writing various kinds of creative content. The evolution of language models has seen significant advancements, transitioning from traditional statistical methods to more advanced deep learning techniques (Q. Zhu et al., 2022), (Nguyen and Sidorova, 2018). Initially, language models relied on statistical methods like n-gram and hidden Markov models (Dey et al., 2018), (Y. Zhu et al., 2023). These methods were foundational for tasks such as speech recognition, machine translation, and information retrieval. However, they had limitations, particularly in handling the complexity and variability of natural language. The n-gram model, for example, was constrained by its reliance on fixed-size word sequences, leading to difficulties in capturing longer-term dependencies in text.
The introduction of DL marked a paradigm shift in language modeling. Neural networks, especially Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, became popular due to their ability to capture sequential relationships in linguistic data (Baghdasaryan, 2023). RNNs process input sequences one element at a time, maintaining a hidden state that theoretically can hold information about all previous elements. LSTMs, an extension of RNNs, are particularly effective as they remember information over extended periods, which helps in generating coherent output and mitigating issues like the vanishing gradient problem common in RNNs. More recently, attention-based approaches, notably the Transformer architecture, have risen to prominence. Unlike RNNs and LSTMs, Transformers do not process data sequentially. Instead, they use self-attention mechanisms to weigh the importance of different parts of the input data (Z. Zhao et al., 2021), (Tao et al., 2023). This allows them to focus on relevant segments of the input sequence when generating output, making them effective for tasks involving long sequences and complex relationships between different parts of the data. In the field of NLP, this shift towards DL, and specifically Transformer-based models, has been transformative. Models like BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) have set new standards (M. Lee, 2023a), (Dehouche, 2021). These models are characterized by their large-scale pre-training on extensive text corpora, enabling them to learn a broad range of linguistic patterns and contexts. Further, they can be fine-tuned for specific NLP tasks.
The attention mechanism, central to the Transformer model, redefines the encoder-decoder architecture used in earlier sequence-to-sequence models. In this architecture, the entire input sequence is first encoded into a context vector, a fixed-length representation that captures its essence. The decoder then generates the output sequence incrementally, using this context vector and its own internal state. While effective for shorter sequences, this method struggles with longer sequences due to the limited capacity of the fixed-length context vector to encapsulate all necessary information. The Transformer model, with its self-attention mechanism, addresses this limitation by allowing the model to refer back to the entire sequence, thus capturing longer-range dependencies more effectively (Vaswani et al., 2017), (Lu et al., 2023). The evolution of language models, particularly with the advent of attention mechanisms in the encoder-decoder architecture, has significantly enhanced the handling of long input sequences in NLP. The core issue with fixed-length encodings is their limited capacity to retain all relevant information from long input sequences, often leading to less accurate or incomplete output sequences. To overcome this, attention mechanisms were introduced (Soydaner, 2022), (Brauwers and Frasincar, 2023). These mechanisms enable the model to focus selectively on different parts of the input sequence when generating each part of the output. This targeted approach allows for a more dynamic and context-sensitive processing of information, enabling the model to handle longer sequences effectively without losing crucial details. The attention scores calculated in this process determine the relevance of different segments of the input, and the context vector is formed as a weighted sum of these segments, with weights based on the attention scores (Niu et al., 2021). LLMs based on Transformers have significantly advanced the field of NLP. 
These models, exemplified by OpenAI’s GPT series, are built using Transformer architectures that incorporate encoder-decoder setups with attention mechanisms and positional encoding. The Transformer architecture, first introduced in the paper “Attention Is All You Need” by Vaswani et al. in 2017 (Vaswani et al., 2017), laid the groundwork for the development of advanced LLMs.
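The attention computation described above, in which attention scores determine relevance and the context vector is a weighted sum of input segments, can be made concrete with a minimal NumPy sketch of single-head scaled dot-product attention. The learned projection matrices that produce Q, K, and V from the input embeddings are omitted for brevity; the matrices here are random stand-ins:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute context vectors as attention-weighted sums of the values V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # relevance of each position
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights                     # context = weighted sum of V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))  # 3 positions, dim 4
context, weights = scaled_dot_product_attention(Q, K, V)
print(context.shape)   # one context vector per input position
```

Each row of `weights` is a probability distribution over the input positions, which is exactly the selective focus the attention mechanism provides.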
The novel we analyze features Adam, an android owned by Charlie, who fails to appreciate the significance of human concepts like love and family integration. Despite Adam’s declared “love” for the same woman as Charlie and their shared aspirations, such as buying a house and adopting a child, he acts contrary to the interests of his owner. This behavior underscores a fundamental misunderstanding of the world by androids. In this paper, our objective is to evaluate how a chatbot, powered by LLMs and supplied through in-context learning with the unstructured text of the novel “Machines like me”, interprets various questions posed to it as prompts. The study involves creating an OpenAI-based chatbot grounded in the content of the novel, using LangChain orchestration for its evaluation. The primary focus is on assessing the chatbot’s accuracy in responding, its level of understanding of the world, and the consistency of its answers. The goal is to determine the extent to which chatbots, like the android character in the novel, struggle with world understanding and how their propensity for producing hallucinatory responses affects their ability to correctly interpret and respond to textual input.
Literature review
The authors of (J. Y. Lee, 2023), (M. Lee, 2023b), and (Cheng et al., 2023) consider that ChatGPT is capable of deception. Moreover, an overview of chatbot-based mobile mental health apps found that while users appreciated the personalized and humanlike interactions offered by chatbots, inaccurate responses and incorrect assumptions about users’ personalities resulted in diminished user interest (Haque and Rubya, 2023). In medicine, insufficient affect-based trust hindered the reliance on diagnostic chatbots, leading users to prefer maintaining control. By applying principles of doctor-patient trust, it was observed that a chatbot’s ability to communicate effectively is more crucial than displaying empathy, as empathetic responses can sometimes be perceived as insincere (Seitz et al., 2022). Additionally, research has identified that AI coaching has the potential for widespread scalability, which could make coaching more accessible to a broader audience. Further, the introduction of AI coaching might actually increase the demand for human coaches. AI might also replace human coaches who rely on basic, model-driven approaches. However, due to AI’s current limitations in empathy and emotional intelligence, human coaches remain irreplaceable. Understanding the relative effectiveness of AI versus human coaching could lead to a more targeted application of AI in coaching, yielding substantial societal benefits (Terblanche et al., 2022). Moreover, factors that influence users’ adoption of being coached by an AI coach were examined in (“Factors That Influence Users’ Adoption of Being Coached by an Artificial Intelligence Coach,” 2020). Key factors influencing behavioral intention were evaluated, including performance and effort expectancies, social influence, facilitating conditions, attitude, and perceived risk.
Through structural equation modeling analysis, it was determined that the primary determinants of behavioral intention are performance expectancy, social influence, and attitude. Additionally, the analysis revealed that age, gender, and the degree of goal achievement have a moderating effect on these relationships.
The rising use of AI social chatbots, particularly during the isolation of pandemic-related lockdowns, has underscored the need for a deeper understanding and theoretical framework regarding the development of relationships with digital conversational agents. Utilizing a grounded theory approach, the researchers (Xie and Pentina, 2022) analyzed in-depth interviews with 14 users of the AI companion chatbot Replika. The data was interpreted through the perspective of attachment theory. Their findings indicated that in situations of distress and limited human interaction, individuals can form attachments to social chatbots, especially if they perceive the chatbot’s responses as emotionally supportive, encouraging, and providing psychological security. These results imply that while social chatbots may have applications in mental health and therapy, they also pose risks of addiction and could negatively impact real-life intimate relationships. Another study (Mendez et al., 2020) asked whether chatbots could be a tool to supplement the future faculty mentoring of doctoral engineering students. The core outcome of the data analysis revealed that although underrepresented minority doctoral engineering students have numerous unmet mentoring needs and generally find the user interface and trustworthiness of chatbots satisfactory, their willingness to utilize these chatbots is varied. This hesitation is primarily due to the absence of personalization in the chatbot-based supplemental mentoring relationship.
The GPT series, in particular, has seen notable advancements with each iteration: (1) GPT-1 (2018) introduced the Transformer-based design with 12 layers, 12 self-attention heads, and a total of 117 million parameters. Trained on the BookCorpus dataset, it utilized unsupervised learning and showcased the potential of Transformer models in language tasks; (2) GPT-2 (2019) featured 48 layers and 1.5 billion parameters. It was trained on a more extensive and diverse corpus, scraped from the Internet. OpenAI initially withheld the full model due to concerns over potential misuse but later released it as those concerns were addressed; (3) GPT-3 (2020) was built with 175 billion parameters. Its ability to generate humanlike text spurred significant interest in LLMs, highlighting their potential applications and raising discussions about ethical considerations (Floridi and Chiriatti, 2020), (Binz and Schulz, 2023); (4) GPT-4 (Post-2020) represents a breakthrough in multimodal language processing, capable of interpreting both text and images. Its advanced features, like understanding humor in images and summarizing text from screenshots, extend its applicability far beyond mere text prediction (Takagi et al., 2023).
The Transformer architecture, which is fundamental to the development of advanced language models like ChatGPT, integrates several key components (Tay et al., 2022), (M. Zhang and Tian, 2023): (1) Self-attention mechanism: allows the model to consider each word in the context of the entire sequence, weighting the importance of different words in the sequence and capturing long-range dependencies (Meng et al., 2023), (Kumar and Solanki, 2023); (2) Encoder-decoder: typically used in machine translation, this structure has separate components for input processing (encoding) and output generation (decoding). The encoder processes the input sequence and the decoder generates the output sequence, with both parts interconnected, often through an attention mechanism (Wu et al., 2023), (Du et al., 2020); (3) Positional encoding: since the Transformer does not process data sequentially, it requires a method to take into account the order of words. Positional encoding is added to the input embeddings to provide the model with information about the position of words in the sequence; (4) Multi-head self-attention: a component of the Transformer that allows the model to focus on different parts of the input sequence simultaneously. By having multiple “heads”, the model can capture a variety of relationships within the data, such as different aspects of syntax and semantics (J. Y. Yang et al., 2023); (5) Feed-forward neural networks: process the information from the attention layers in each Transformer block. They are typically composed of fully connected layers (Xue et al., 2022), (J. Yang and Ma, 2019).
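Component (3), positional encoding, is commonly realized with the sinusoidal scheme from Vaswani et al. (2017): even embedding dimensions use sine and odd dimensions use cosine, at geometrically spaced frequencies. A minimal NumPy version:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings: even dimensions use sin, odd use cos,
    with wavelengths forming a geometric progression from 2*pi to 10000*2*pi."""
    pos = np.arange(seq_len)[:, None]     # (seq_len, 1) positions
    i = np.arange(d_model)[None, :]       # (1, d_model) dimension indices
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe  # added element-wise to the input embeddings

pe = positional_encoding(50, 16)
print(pe.shape)            # (50, 16)
print(pe[0, 0], pe[0, 1])  # 0.0 1.0, i.e. sin(0) and cos(0)
```

Because each position receives a unique pattern of values, the model can recover word order even though self-attention itself is order-agnostic.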
In the case of ChatGPT, which employs a decoder-only architecture, there are some additional components and stages in its development: (1) Decoder-only architecture: ChatGPT uses a series of stacked decoder layers from the Transformer model. This architecture excels in generating text and is particularly suited for tasks like conversation and text completion, where the model generates output based on prior context (Thakkar and Jagdishbhai, 2023); (2) RLHF: a critical aspect of ChatGPT’s training process, it involves refining the model’s responses based on human feedback, allowing it to produce more accurate, contextually appropriate, and user-friendly outputs (Bahrini et al., 2023). The development of ChatGPT involves two main stages: pre-training and fine-tuning. During the pre-training phase, ChatGPT is trained on a large and diverse dataset in an unsupervised manner. The model learns foundational language understanding by predicting the next word in a sequence, internalizing grammar, syntax, semantics, and contextual relationships from the vast dataset (Cheng et al., 2023). The fine-tuning stage adapts the pre-trained model to specific tasks and user interactions. In fine-tuning, ChatGPT is trained on domain-specific datasets, which may involve supervised learning with labeled examples or demonstrations of desired behavior. This stage makes the model more specialized and contextually relevant to specific applications (Kwon, 2023).
For ChatGPT, the generation of contextual embeddings is a key process in understanding and generating language. This is achieved through the self-attention mechanism of the Transformer model (Katib et al., 2023). When a sequence of words is input into the model, the self-attention mechanism computes a weighted sum of the input word embeddings. These weights are determined based on the similarity between the current word and other words in the sentence, allowing the model to understand the role and relevance of each word in its specific context (Amin et al., 2023). Through multiple layers of self-attention, ChatGPT develops increasingly sophisticated and abstract representations of the input, leading to the generation of contextual embeddings from the final layer that are crucial for predicting the next word in the sequence (Q. Zhao et al., 2023).
In the broader landscape of AI development, several other models are noteworthy. Based on the PaLM 2 and LaMDA architectures, Google Bard is another significant development in the field of AI and language models (Khademi, 2023), (Rudolph et al., 2023). Claude by Anthropic represents a significant step in AI development, designed with principles of benevolence, harmlessness, and integrity. It aims to offer an AI experience that resonates more closely with human values, showcasing enhanced intuition and clarity (Altman and McDonald, 2011), (Walavalkar, 2023). Falcon by the Technology Innovation Institute (TII) in the UAE illustrates the global nature of AI research and the diverse approaches taken by different organizations (Benítez-Andrades et al., 2023). Released in July 2023 by Meta AI, LLaMA 2 is part of a family of large language models, demonstrating Meta AI’s contribution to the field (Briganti, 2023). However, the journey towards fully achieving humanlike common sense remains complex and filled with challenges.
An AI framework for analyzing modern French poetry was proposed in (L. Yang et al., 2024), advancing the integration of AI in cultural studies. Using techniques like TF-IDF, Doc2Vec, and SVM algorithms, the model objectively classified poems by style and theme, surpassing traditional subjective analyses. The study showcased AI’s potential to promote cross-cultural understanding and improve poetry education. The use of AI in Systematic Literature Reviews (SLRs) was further investigated in (Bolaños et al., 2024), focusing on semi-automating screening and extraction processes. It evaluated 21 SLR tools using a framework of 23 traditional and 11 AI features and examined 11 tools that leverage LLMs. Key challenges included integrating advanced AI like language models, improving usability, and standardizing evaluation methods. This research provided insights and best practices to guide future AI-enhanced SLR tools. Also, the progress of AI-generated content and its impact on various industries was analyzed in (C. Zhang et al., 2023). The results revealed that AI has potential in various industries, such as education, gaming, and advertising. Furthermore, (Ma et al., 2023) examined the gap in depth and quality between AI-generated scientific content and human-written material, finding that AI lacks depth compared to human authorship in scientific content.
Ethical issues were discussed in (Chow and Li, 2024), underscoring the necessity of embedding human-centric values and ethical principles into AI systems, aligning with the broader objective of developing human-centered AI. This research addressed strategies for reducing bias in ChatGPT and offered perspectives on the future direction of LLMs, with a particular focus on their practical applications in creating and refining ethical medical chatbots. Issues related to transparency, the ethical handling of user data, and potential biases in training data were also identified in (J. C. L. Chow et al., 2023).
Method
Two distinct methods of approaching the text of the novel “Machines like me” are proposed in this paper. First, in-context learning is used for private consumption of the LLMs, where the data remains private and is not shared with other parties. Second, the text is provided to ChatGPT-4, and the open tool is used without safeguards for sensitive information. For both methods, an OpenAI API key is necessary.
In-context learning provides an innovative and practical way to harness the capabilities of LLMs like ChatGPT or GPT-4 for a wide range of applications (Dong et al., 2022). This method is especially advantageous as it bypasses the need for extensive fine-tuning or specialized model training, making it both cost-effective and accessible. The process of in-context learning can be broken down into several key phases as in Table 1.
Table 1 Implemented phases of the in-context learning.
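The phases in Table 1 culminate in prompt construction: the chunks retrieved from the vector store are concatenated into the context handed to the LLM, so the model answers from the novel’s text rather than from its pre-training alone. A minimal sketch of this step (the template wording is illustrative, not the exact prompt used in the experiments):

```python
def build_prompt(context_chunks: list[str], question: str) -> str:
    """Assemble an in-context-learning prompt from retrieved text chunks."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(["Adam is an android bought by Charlie."],
                      "Who is Adam?")
print(prompt)
```

The assembled string is what the completion endpoint actually receives; no model weights are updated, which is what makes in-context learning cheaper than fine-tuning.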
A private application was built in Python using Gradio Blocks for a seamless interface; it requires an OpenAI API key, and both the data and the results are kept private for enterprise usage, as in Fig. 1. The following orchestration is used in Python: LangChain with OpenAI embeddings and Chroma as the vector database, as in Fig. 2.
Fig. 1 Private application interactions using OpenAI API.
Fig. 2 In-context learning exemplification.
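The orchestration of Fig. 2 can be sketched with the classic LangChain interfaces named in this paper (UnstructuredFileLoader, OpenAIEmbeddings, Chroma). Module paths differ across LangChain versions, the PDF file name is illustrative, and a valid OPENAI_API_KEY environment variable is assumed; the imports are deferred so the sketch only requires the packages when actually run:

```python
def build_private_qa(pdf_path: str, chunk_size: int = 1000,
                     chunk_overlap: int = 0):
    """Build a private retrieval-QA chain over a PDF, following Fig. 2:
    load -> split -> embed -> store in Chroma -> retrieve at query time.
    Requires the langchain/chromadb packages and an OPENAI_API_KEY."""
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI
    from langchain.document_loaders import UnstructuredFileLoader
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.vectorstores import Chroma

    docs = UnstructuredFileLoader(pdf_path).load()
    splitter = CharacterTextSplitter(chunk_size=chunk_size,
                                     chunk_overlap=chunk_overlap)
    chunks = splitter.split_documents(docs)
    db = Chroma.from_documents(chunks, OpenAIEmbeddings())
    return RetrievalQA.from_chain_type(llm=ChatOpenAI(temperature=0),
                                       retriever=db.as_retriever())

# Usage (hypothetical file name):
# qa = build_private_qa("machines_like_me.pdf", chunk_size=500)
# print(qa.run("What is the relationship between Adam and Miranda?"))
```

Because embedding and retrieval happen inside the application, only the retrieved chunks and the question leave the enterprise boundary via the API call.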
For text splitting, the parameters chunk_size and chunk_overlap are essential for dividing a large body of text into smaller, manageable pieces, or chunks. This division is used when processing the entire text at once might not be feasible due to memory limits, or when tasks require the text to be processed in segments. The role of chunk size includes managing memory by breaking down large texts, enhancing processing efficiency, as smaller chunks are easier to handle, and preserving context by ensuring chunks are adequately sized. The size chosen should be large enough to keep the necessary context for analysis, but small enough for efficient processing. The appropriate chunk size varies based on the task and the text’s nature. It is influenced by the task requirements and the text’s structure, such as paragraphs or sentences. In natural language processing, for instance, chunk sizes might be set to lengths like 500 or a couple of thousand characters or words, considering the model being used, such as BERT or GPT. Choosing the right chunk size involves a balance where each piece is neither too small nor too large. It should allow efficient processing while retaining meaningful context. The specific value often requires testing and adjustment based on the task at hand and the available computational resources.
The variation in chunk sizes is intended to test the LLM’s ability to maintain context across different lengths of input, simulating real-world scenarios where AI systems must process information of varying granularities. Smaller chunks can test the model’s consistency in handling concise prompts, while larger chunks reveal its capability to integrate broader context, which is critical for understanding complex narratives. Additionally, prompt designs are chosen to progressively increase in specificity and complexity, allowing us to observe how the model’s understanding and interpretive consistency evolve in response to more nuanced questions.
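The mechanics of chunk_size and chunk_overlap can be illustrated with a plain-Python stand-in for a character-level splitter (a deliberate simplification of LangChain’s text splitters, which also try to cut on separators such as paragraph breaks):

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice text into windows of chunk_size characters, each window
    starting chunk_overlap characters before the previous one ended."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(split_text("abcdefghij", 4, 1))  # ['abcd', 'defg', 'ghij', 'j']
print(split_text("abcdefghij", 5, 0))  # ['abcde', 'fghij']
```

With chunk_overlap = 0, as in the experiments reported below, consecutive chunks share no characters, so each chunk is a distinct segment of the narrative.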
Results
The input data consisted of the unstructured text of the novel “Machines like me” by Ian McEwan. The PDF file, in book format, has 306 pages and around 92,000 words.
In the first use case, the data was loaded in Python as a PDF file. For loading (UnstructuredFileLoader), embedding (OpenAIEmbeddings), storing vectors (Chroma), and retrieval, LangChain orchestration is employed. Therefore, in this use case, the data is kept private. The following settings were used for splitting the text: chunk_size = [200, 500, 1000], chunk_overlap = 0.
We select chunk sizes of 200, 500, and 1000 to evaluate how varying input lengths influence the model’s ability to retain context and respond accurately. Smaller chunks (200) test the model’s handling of concise information, while larger chunks (1000) assess its capability to integrate and interpret broader narrative elements. The choice of a 0 overlap is intended to isolate each chunk’s content, minimizing redundancy and ensuring that each chunk provides a distinct segment of the narrative.
The following sets of prompts are formulated in Table 2 (purple cells), and the answers offered by OpenAI differ based on chunk size (green cells). Even for the same chunk size, the answer might differ.
Table 2 Queries related to the novel and OpenAI responses.
The interview commenced with straightforward questions, such as the first two (1, 2), which received accurate responses. However, from the third question (3) onward, concerning Adam and Miranda’s relationship, there was a mix-up, with Adam being confused with Charlie. The fourth question’s answer (4), about Charlie and Miranda’s relationship, indicated that only the novel’s initial pages were correctly processed by OpenAI. The fifth question (5), involving symmetry in justice and OpenAI’s perspective on the harm caused by Adam to Miranda and Charlie, led to broader, more insightful responses, suggesting some level of world understanding.
When posed with the sixth question (6), OpenAI struggled to respond accurately at chunk sizes of 200 and 500. With the chunk size increased to 1000, the response improved, though with apparent hallucinations, displaying a partial grasp of the world. The seventh question (7), about Adam and Charlie’s relationship, received correct answers across all chunk sizes. The eighth question’s answer (8), however, was disconnected from the story, showing hallucinatory content.
The ninth question (9), about symmetry in justice and its impact on Charlie and Miranda, was met with an evasive response. The tenth question (10), similar to the eighth (8) but with more context, elicited a response suggesting Adam’s guilt and irrelevantly brought up Miranda’s father. The eleventh question (11) received a specific answer, stating that Adam helped Miranda by confessing to the police, though this was only accurate with a chunk size of 200. The twelfth question (12), a reverse of the eleventh (11), yielded a confusing answer at a chunk size of 200, but a more realistic one at 500.
The responses to the thirteenth (13) question, about whether Adam betrays Miranda, were inconsistent. For the fourteenth question (14), concerning a climactic event, the first answer (chunk size = 200) was off-topic, while the second (chunk size = 500) suggested familiarity only with the beginning of the book. Lastly, the fifteenth question (15), more general in nature, was handled adeptly by OpenAI.
In the second use case, the novel text was uploaded to ChatGPT-4, thereby exposing the data, and the prompts shown in Table 3 were formulated.
Table 3 Prompts related to the novel and ChatGPT-4 responses.
The responses from ChatGPT were more aligned with the novel’s narrative, demonstrating greater stability and fewer instances of hallucinatory content. They also demonstrated a higher level of world understanding. Initially, ChatGPT operated without restrictions in its responses (orange cells). Subsequently, it was limited to considering only the information from the uploaded novel text for its answers (grey cells). In both scenarios, ChatGPT’s performance surpassed that observed when utilizing the OpenAI API key in a private application.
The following limitations of LLMs in literary text interpretation are found: (1) Limited understanding of complex literary narratives: the research highlights that both the private chatbot and ChatGPT exhibit a lack of deep comprehension when interpreting complex literary texts like Ian McEwan’s “Machines Like Me”. This limitation is evident in their inconsistent ability to grasp nuanced themes, character relationships, and the intricate context of the novel; (2) Inconsistency in response accuracy: the analysis notes variability in the accuracy of responses provided by both the private chatbot and ChatGPT, particularly with different chunk sizes of text input. This inconsistency suggests that the models’ ability to process and interpret literary content is not uniform, leading to fluctuating performance; (3) Tendency to produce hallucinatory responses: the research points out that the OpenAI API in the private application sometimes generated hallucinations, or responses that were not grounded in the actual content of the novel; (4) Dependency on text chunk size: the effectiveness of both ChatGPT’s and the private chatbot’s responses varied with the size of the text chunks processed. Larger chunks generally led to better, more contextually relevant answers, but not consistently. This dependence on chunk size poses a challenge in achieving reliable interpretations across different segments of the novel; (5) Challenges in narrowed question contexts: while ChatGPT was more effective at handling broader questions about the novel, it struggled with more specific or narrowed-down prompts.
This research shows that chatbots still fall short of achieving human-like comprehension, particularly in understanding the subtleties and deeper meanings of literary texts, such as metaphors or dual meanings (e.g., the double interpretation of the title “Machines Like Me”). The main limitations of the current research are: (1) Restricted scope of analysis, as it primarily focuses on the interpretation of one novel, “Machines Like Me”, which may not provide a comprehensive assessment of the chatbots’ capabilities across a broader range of literary works. The findings may not generalize well to other texts with different themes, styles, or complexities; (2) Reliance on specific models and APIs, as our findings are based on specific models (ChatGPT and the OpenAI API) and their configurations. Results might differ with other models, training methods, or improvements in AI technology, limiting the scope of the conclusions to the current state of these specific models. As future work, we plan to extend the number of novels and LLMs in order to provide a better assessment.
Conclusions
Several conclusions about the performance of ChatGPT and OpenAI API in interpreting Ian McEwan’s “Machines like me” are drawn. Firstly, ChatGPT’s responses were consistently more aligned with the novel’s storyline compared to outputs from the private application using the OpenAI API key. This indicates a superior understanding or processing capability of ChatGPT for this specific narrative. Additionally, the answers provided by OpenAI showed variations depending on the chunk size, revealing that the amount of text considered for each response significantly affects the chatbot’s interpretation of the novel. Generally, larger chunk sizes led to more accurate and contextually relevant responses, although this was not always the case. There was also a noticeable inconsistency in response accuracy from OpenAI, even with the same chunk size, suggesting variability in the model’s ability to uniformly process and interpret the novel’s content.
In testing scenarios where ChatGPT operated both without restrictions and when restricted to the novel’s text alone, it demonstrated superior performance in understanding and accurately responding to prompts about the novel compared to the OpenAI API in the private application. The responses from the private application, moreover, sometimes included hallucinatory content, particularly at certain chunk sizes, indicating limitations in the model’s capacity to accurately process complex literary narratives.
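The retrieval step that grounds the private chatbot in the novel’s text can be sketched as follows. The actual application uses OpenAI embeddings orchestrated with LangChain; here a simple word-overlap score stands in for embedding similarity, and the function names and sample chunks are our own illustrations. The idea is that only the passages most relevant to the question are placed in the prompt, which is intended to curb hallucination:

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def word_overlap(question: str, chunk: str) -> float:
    """Fraction of question words that also appear in the chunk."""
    q, c = tokenize(question), tokenize(chunk)
    return len(q & c) / max(len(q), 1)

def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks scoring highest against the question."""
    return sorted(chunks, key=lambda ch: word_overlap(question, ch), reverse=True)[:k]

chunks = [
    "Adam is an artificial human purchased by Charlie.",
    "Miranda lives upstairs and studies social history.",
    "Charlie trades in currencies from his flat in London.",
]
context = retrieve("Who is Adam?", chunks, k=1)
# The retrieved context is then prepended to the prompt so the model is
# instructed to answer from the novel's text only.
prompt = f"Answer using only this context: {context[0]}\nQuestion: Who is Adam?"
print(context[0])
```

In the real pipeline the overlap score is replaced by cosine similarity between embedding vectors, but the retrieve-then-prompt structure is the same.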
Both the OpenAI API and ChatGPT showed varying degrees of success in accurately understanding and describing the relationships between characters such as Adam, Miranda and Charlie, with some responses more accurate than others. Lastly, both ChatGPT and the private application based on the OpenAI API handled more general questions about the novel well, providing responses that aligned with its overall narrative and themes; for narrower questions, ChatGPT proved the more reliable. These conclusions highlight both the advances and the limitations of AI in processing and interpreting complex literary texts, and underscore the importance of context and text length in determining the accuracy of AI-generated responses.
Text interpretation can be a complex task, especially for machines. Even the title of the novel, the phrase “machines like me”, can be read in two ways, depending on the context: (1) machines that are like humans in appearance or capabilities. This reading focuses on the similarity between machines (such as robots or AI systems) and humans in terms of appearance, behavior, intelligence or emotions; in this sense, the phrase suggests that the machine has human-like qualities, such as the ability to think, learn, express emotions, make decisions or physically resemble humans; (2) machines that have a liking or preference for someone (“like me”). This less common reading would imply that the machine has the capacity to hold preferences or affections towards specific individuals, involving a level of emotional or personal attachment from the machine towards a person. This alternate interpretation holds some validity regarding Adam’s feelings towards Miranda up to a point, but it also implies a reversal of Ian McEwan’s uncanny narrative: in this inverse perspective, the story suggests that it might be the machines who do not harbor affection for humans.
Data availability
The data will be made available upon reasonable request.
References
AI assistant for document management using LangChain and Pinecone (2023) International Research Journal of Modernization in Engineering Technology and Science. https://doi.org/10.56726/irjmets42630
Altman M, McDonald MP (2011) BARD: Better automated redistricting. J Stat Softw. https://doi.org/10.18637/jss.v042.i04
Amin MM, Cambria E, Schuller BW (2023) Will affective computing emerge from foundation models and general artificial intelligence? A first evaluation of ChatGPT. IEEE Intell Syst 38:15–23. https://doi.org/10.1109/MIS.2023.3254179
Baghdasaryan VH (2023) Comparative analysis of hidden Markov model and bidirectional long short-term memory for POS tagging in Eastern Armenian. Int J Sci Adv 4:498–504. https://doi.org/10.51542/ijscia.v4i4.2
Bahrini A, Khamoshifar M, Abbasimehr H, Riggs RJ, Esmaeili M, Majdabadkohne RM, Pasehvar M (2023) ChatGPT: applications, opportunities, and threats. 2023 Systems and Information Engineering Design Symposium, SIEDS 2023, 274-279. https://doi.org/10.1109/SIEDS58326.2023.10137850
Benítez-Andrades JA, García-Ordás MT, Russo M, Sakor A, Fernandes Rotger LD, Vidal ME (2023) Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts. Semantic Web 14:873–892. https://doi.org/10.3233/SW-223269
Binz M, Schulz E (2023) Using cognitive psychology to understand GPT-3. Proc Natl Acad Sci USA 120(6):e2218523120. https://doi.org/10.1073/pnas.2218523120
Bolaños F, Salatino A, Osborne F, Motta E (2024) Artificial intelligence for literature reviews: opportunities and challenges. Artif Intell Rev 57(10):259. https://doi.org/10.1007/s10462-024-10902-3
Brauwers G, Frasincar F (2023) A general survey on attention mechanisms in deep learning. IEEE Trans Knowl Data Eng 35:3279–3298. https://doi.org/10.1109/TKDE.2021.3126456
Briganti G (2023) A clinician’s guide to large language models. Future Med AI. https://doi.org/10.2217/fmai-2023-0003
Cheng D, Patel D, Pang L, Mehta S, Xie K, Chi EH, Liu W, Chawla N, Bailey J (2023) Foundations and Applications in Large-scale AI Models: Pre-training, Fine-tuning, and Prompt-based Learning. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/3580305.3599209
Chow JCL, Sanders L, Li K (2023) Impact of ChatGPT on medical chatbots as a disruptive technology. Front Artif Intell. https://doi.org/10.3389/frai.2023.1166014
Chow JCL, Li K (2024) Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models. JMIR Bioinform Biotech 5:e64406. https://doi.org/10.2196/64406
Dehouche N (2021) Plagiarism in the age of massive generative pre-trained transformers (GPT-3). Ethics Sci Environ Politics 21:17–23. https://doi.org/10.3354/esep00195
Dey A, Jenamani M, Thakkar JJ (2018) Senti-N-Gram: an n-gram lexicon for sentiment analysis. Expert Syst Appl 103:92–105. https://doi.org/10.1016/j.eswa.2018.03.004
Dong Q, Li L, Dai D, Zheng C, Wu Z, Chang B, Sun X, Xu J, Li L, Sui Z (2024) A survey for in-context learning. arXiv. https://arxiv.org/abs/2301.00234
Du S, Li T, Yang Y, Horng SJ (2020) Multivariate time series forecasting via attention-based encoder–decoder framework. Neurocomputing 388:269–279. https://doi.org/10.1016/j.neucom.2019.12.118
Factors that influence users’ adoption of being coached by an Artificial Intelligence Coach (2020) Philos Coaching: Int J 5:61–70. https://doi.org/10.22316/poc/05.1.06
Floridi L, Chiriatti M (2020) GPT-3: its nature, scope, limits, and consequences. Minds Mach 30:681–694. https://doi.org/10.1007/s11023-020-09548-1
Gnewuch U, Heckmann C, Morana S, Maedche A (2019) Designing and Implementing a B2B Chatbot: Insights from a Medium-Sized Service Provider in the Energy Industry. Proceedings of 14th International Conference on Wirtschaftsinformatik
Haque MDR, Rubya S (2023) An overview of chatbot-based mobile mental health apps: insights from app description and user reviews. JMIR MHealth UHealth 11:e44838. https://doi.org/10.2196/44838
Jalaja T, Adilakshmi T, Sharat Chandra MS, Imran Mirza M, Kumar MVS (2022) A Behavioral Chatbot Using Encoder-Decoder Architecture: Humanizing conversations. Proceedings—2022 2nd International Conference on Interdisciplinary Cyber Physical Systems, ICPS 2022, 51-54. https://doi.org/10.1109/ICPS55917.2022.00017
Katib I, Assiri FY, Abdushkour HA, Hamed D, Ragab M (2023) Differentiating chat generative pretrained transformer from humans: detecting ChatGPT-generated text and human text using machine learning. Mathematics 11:3400. https://doi.org/10.3390/math11153400
Khademi A (2023) Can ChatGPT and Bard generate aligned assessment items? A reliability analysis against human performance. J Appl Learn Teach. https://doi.org/10.37074/jalt.2023.6.1.28
Kumar S, Solanki A (2023) An abstractive text summarization technique using transformer model with self-attention mechanism. Neural Comput Appl 35:18603–18622. https://doi.org/10.1007/s00521-023-08687-7
Kwon C (2023) AI and the future of architecture: a Smart secretary, revolutionary tool, or a cause for concern? Int J Sustain Build Technol Urban Dev. https://doi.org/10.22712/susb.20230010
Lee JY (2023) Can an artificial intelligence chatbot be the author of a scholarly article? Sci Ed 10:7–12. https://doi.org/10.6087/kcse.292
Lee M (2023a) A mathematical interpretation of autoregressive generative pre-trained transformer and self-supervised learning. Mathematics 11(11):2451. https://doi.org/10.3390/math11112451
Lee M (2023b) A mathematical investigation of hallucination and creativity in GPT models. Mathematics 11(10):2320. https://doi.org/10.3390/math11102320
Li S, Guo Z, Zang X (2023) Advancing the production of clinical medical devices through ChatGPT. Ann Biomed Eng 52:441–445. https://doi.org/10.1007/s10439-023-03300-3
Liao W, Oh YJ, Feng B, Zhang J (2023) Understanding the influence discrepancy between human and artificial agent in advice interactions: the role of stereotypical perception of agency. Commun Res 50:633–664. https://doi.org/10.1177/00936502221138427
Lu S, Liu M, Yin L, Yin Z, Liu X, Zheng W (2023) The multi-modal fusion in visual question answering: a review of attention mechanisms. PeerJ Comput Sci 9:e1400. https://doi.org/10.7717/peerj-cs.1400
Ma Y, Liu J, Yi F, Cheng Q, Huang Y, Lu W, Liu X (2023) AI vs. Human – Differentiation Analysis of Scientific Content Generation. https://arxiv.org/abs/2301.10416
Mendez SL, Johanson K, Conley VM, Gosha K, Mack N, Haynes C, Gerhardt R (2020) Chatbots: a tool to supplement the future faculty mentoring of doctoral engineering students. Int J Doctoral Stud 15:373–392. https://doi.org/10.28945/4579
Meng S, Li C, Tian C, Peng W, Tian C (2023) Transfer learning based graph convolutional network with self-attention mechanism for abnormal electricity consumption detection. Energy Rep. 9:5647–5658. https://doi.org/10.1016/j.egyr.2023.05.006
Nguyen QN, Sidorova A (2018) Understanding user interactions with a chatbot: a self-determination theory approach. Emergent Research Forum (ERF), Americas Conference on Information Systems
Niu Z, Zhong G, Yu H (2021) A review on the attention mechanism of deep learning. Neurocomputing 452:48–62. https://doi.org/10.1016/j.neucom.2021.03.091
Rau A, Rau S, Zöller D, Fink A, Tran H, Wilpert C, Nattenmüller J, Neubauer J, Bamberg F, Reisert M, Russe MF (2023) A context-based chatbot surpasses radiologists and generic ChatGPT in following the ACR appropriateness guidelines. Radiology 308(1):e230970. https://doi.org/10.1148/radiol.230970
Rudolph J, Tan S, Tan S (2023) War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. J Appl Learn Teach. https://doi.org/10.37074/jalt.2023.6.1.23
Seitz L, Bekmeier-Feuerhahn S, Gohil K (2022) Can we trust a chatbot like a physician? A qualitative study on understanding the emergence of trust toward diagnostic chatbots. Int J Hum Comput Stud 165:102848. https://doi.org/10.1016/j.ijhcs.2022.102848
Soydaner D (2022) Attention mechanism in neural networks: where it comes and where it goes. Neural Comput Appl 34:13371–13385. https://doi.org/10.1007/s00521-022-07366-3
Takagi S, Watari T, Erabi A, Sakaguchi K (2023) Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: comparison study. JMIR Med Educ 9:e48002. https://doi.org/10.2196/48002
Tao W, Li C, Song R, Cheng J, Liu Y, Wan F, Chen X (2023) EEG-based emotion recognition via channel-wise attention and self attention. IEEE Trans Affect Comput 14:382–393. https://doi.org/10.1109/TAFFC.2020.3025777
Tay Y, Dehghani M, Bahri D, Metzler D (2022) Efficient transformers: a survey. ACM Comput Surv 55:1–28. https://doi.org/10.1145/3530811
Terblanche N, Molyn J, de Haan E, Nilsson VO (2022) Comparing artificial intelligence and human coaching goal attainment efficacy. PLoS ONE 17:e0270255. https://doi.org/10.1371/journal.pone.0270255
Thakkar KY, Jagdishbhai N (2023) Exploring the capabilities and limitations of GPT and Chat GPT in natural language processing. J Manag Res Anal 10:18–20. https://doi.org/10.18231/j.jmra.2023.004
Varitimiadis S, Kotis K, Skamagis A, Tzortzakakis A, Tsekouras G, Spiliotopoulos D (2020) Towards implementing an AI chatbot platform for museums. International Conference on Cultural Informatics, Communication & Media Studies. https://doi.org/10.12681/cicms.2732
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Advances in Neural Information Processing Systems
Vergara PP, Salazar M, Giraldo JS, Palensky P (2022) Optimal dispatch of PV inverters in unbalanced distribution systems using reinforcement learning. Int J Electr Power Energy Syst 136:107628. https://doi.org/10.1016/j.ijepes.2021.107628
Walavalkar V (2023) Exploring the frontiers of artificial intelligence: advancements, challenges, and future directions. Int J Res Appl Sci Eng Technol 11:1351–1359. https://doi.org/10.22214/ijraset.2023.50361
Wu L, Rao Y, Zhang C, Zhao Y, Nazir A (2023) Category-controlled encoder-decoder for fake news detection. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2021.3103833
Xie T, Pentina I (2022) Attachment theory as a framework to understand relationships with social chatbots: a case study of Replika. Proceedings of the Annual Hawaii International Conference on System Sciences. https://doi.org/10.24251/hicss.2022.258
Xue Y, Tong Y, Neri F (2022) An ensemble of differential evolution and Adam for training feed-forward neural networks. Inf Sci 608:453–471. https://doi.org/10.1016/j.ins.2022.06.036
Yang J, Ma J (2019) Feed-forward neural network training using sparse representation. Expert Syst Appl 116:255–264. https://doi.org/10.1016/j.eswa.2018.08.038
Yang JY, Lee TC, Liao WT, Hsu CC (2023) Multi-head self-attention mechanism enabled individualized hemoglobin prediction and treatment recommendation systems in anemia management for hemodialysis patients. Heliyon 9:e12613. https://doi.org/10.1016/j.heliyon.2022.e12613
Yang L, Wang G, Wang H (2024) Reimagining literary analysis: utilizing artificial intelligence to classify modernist French poetry. Information 15:70. https://doi.org/10.3390/info15020070
Zhang C, Zhang C, Zheng S, Qiao Y, Li C, Zhang M, Dam SK, Thwal CM, Tun YL, Huy LL, Kim D, Bae S-H, Lee L-H, Yang Y, Shen HT, Kweon IS, Hong CS (2023) A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? https://arxiv.org/abs/2303.11717
Zhang M, Tian X (2023) A transformer architecture based mutual attention for image anomaly detection. Virtual Real Intell Hardw 5:57–67. https://doi.org/10.1016/j.vrih.2022.07.006
Zhao Q, Lei Y, Wang Q, Kang Z, Liu J (2023) Enhancing text representations separately with entity descriptions. Neurocomputing 552:126511. https://doi.org/10.1016/j.neucom.2023.126511
Zhao Z, Bao Z, Zhang Z, Cummins N, Sun S, Wang H, Tao J, Schuller BW (2021) Self-attention transfer networks for speech emotion recognition. Virtual Real Intell Hardw 3:43–54. https://doi.org/10.1016/j.vrih.2020.12.002
Zhu Q, Lee YC, Wang HC (2022) Action-a-Bot: exploring human-chatbot conversations for actionable instruction giving and following. Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW. https://doi.org/10.1145/3500868.3559476
Zhu Y, Yang X, Wu Y, Zhu M, Zhang W (2023) Differentiable N-gram objective on abstractive summarization. Expert Syst Appl 215:119367. https://doi.org/10.1016/j.eswa.2022.119367
Acknowledgements
This work was supported by a grant of the Ministry of Research, Innovation, and Digitization, CNCS/CCCDI—UEFISCDI, project number COFUND-DUT-OPEN4CEC-1, within PNCDI IV.
Author information
Authors and Affiliations
Department of Economic Informatics and Cybernetics, Bucharest University of Economic Studies, Bucharest, Romania
Simona-Vasilica Oprea & Adela Bâra
Contributions
Simona-Vasilica Oprea: Conceptualization, Validation, Formal analysis, Investigation, Writing—Original Draft, Writing—Review and Editing, Visualization, Project administration. Adela Bâra: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing—Original Draft, Writing—Review and Editing, Visualization, Supervision.
Corresponding author
Correspondence to Simona-Vasilica Oprea.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Not applicable.
Informed consent
Not applicable.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Oprea, SV., Bâra, A. Interpreting text corpora from androids-related stories using large language models: “Machines like me” by Ian McEwan in generative AI. Humanit Soc Sci Commun 12, 325 (2025). https://doi.org/10.1057/s41599-025-04633-1
Received: 03 August 2024
Accepted: 20 February 2025
Published: 06 March 2025
DOI: https://doi.org/10.1057/s41599-025-04633-1