Evaluating machine translation of literature through rhetorical analysis

Abstract: This paper examines how well two AI tools, ChatGPT and DeepL, translate literary works. ChatGPT is a general-purpose conversational system that can translate text among other tasks; DeepL is a dedicated machine translation service built on neural networks. The paper compares how ChatGPT and DeepL translate books, poems, and dialogues with translations produced by humans. It also discusses the advantages and drawbacks of using machine translation for literary purposes, including issues of creativity, style, and cultural adaptation. The paper draws on both recent and earlier studies of machine translation technologies and their interaction with human translation. It concludes that ChatGPT and DeepL are useful but imperfect tools for translating literature and that their output requires human review and improvement. The paper contributes to machine translation and natural language processing by examining how these two AI tools can be applied to literary works, and to literary studies and the digital humanities by exploring what machine translation can and cannot do for creative writing and dialogue systems. Its goal is to encourage researchers, translators, writers, and users from different fields to work together and exchange ideas.


Introduction
The application of artificial intelligence (AI) to fields like machine translation has progressed rapidly, opening new possibilities for automated language conversion at scale. However, while AI translation accuracy has achieved parity with humans for simple texts like news articles, significant challenges remain for translating complex literary works. Can machines effectively handle great novels filled with metaphors, rhyming verse, imaginative narratives, and other artistic language flourishes? This paper aims to answer this question by comparing the capabilities of two popular neural machine translation systems, ChatGPT and DeepL, on literary texts.
Literature makes use of elaborate rhetorical techniques, subtle cultural elements, subjectivity, and creative expression that pose difficulties for current AI capabilities (Moorkens, 2017). According to a number of researchers (Alzeebaree, 2020; Lee & Cha, 2023), initial rule-based and statistical machine translation methods performed poorly on literary texts, failing to capture nuance and style, whereas modern neural networks like Google's Transformer architecture have demonstrated improved fluency and contextual handling (D'Souza, 2023). Despite these advances, literature remains a challenging domain for AI translation, as it requires not only technical correctness but also aesthetic essence, emotion, creativity, and artistry. AI translation also offers some benefits for literature, such as enhancing accessibility to literary works across languages, supporting comprehension at scale, and reducing costs and time (Gaspari et al., 2014). Moreover, AI translation can produce draft translations that human experts can post-edit to polish style and fix errors (Koponen, 2016). AI is becoming an integral tool in the professional translator's toolkit. Therefore, it is important to evaluate the strengths and weaknesses of different AI translation systems for literature, and to explore the potential and limitations of hybrid human-machine collaboration.
Previous studies have investigated various aspects of AI translation for literature, such as evaluating the quality of output using customized rubrics (Fantinuoli & Prandi, 2021), enhancing the training of neural networks using stylistically annotated parallel corpora (Cipriani, 2023), and promoting interdisciplinary collaboration between literary scholarship and computer science (Goto & Tanaka, 2017). However, there is a lack of comprehensive and updated comparative analysis of leading AI translation platforms on literary texts, especially on rhetorical devices like metaphors, rhyme, and imagery, which are essential for conveying the author's intended meaning and effect.
The aim of this paper is to provide such an analysis by assessing the capabilities of two popular neural machine translation systems, ChatGPT and DeepL, on literary texts. ChatGPT is a conversational AI system that uses a generative pre-trained transformer model to produce natural and engaging responses (Roumeliotis & Tselikas, 2023). DeepL is a web-based translation service that uses a deep learning approach to deliver high-quality and accurate translations (Fradana, 2023, p. 207). Both systems support many languages and claim to outperform other AI translation systems on various benchmarks. How do they fare on literary texts? We compare their performance on three types of rhetorical devices (metaphors, rhyme, and imagery) using a corpus of English literary texts and their translations into Russian. We evaluate the output using both automatic and human metrics, and discuss the implications for the development of more linguistically knowledgeable and stylistically adept AI. The paper is organized as follows: Section 2 reviews the related literature on AI translation for literature, Section 3 describes the methodology and data, Section 4 presents the results and analysis, Section 5 discusses the findings and limitations, and Section 6 concludes the paper and suggests future directions.

Literature Review
Early machine translation systems relied on rule-based approaches that performed very poorly on literary works (Rivera-Trigueros, 2022). They depended on rigid structures, lacked fluency (Oliver, 2020, p. 125), and were found incapable of handling complex rhetorical devices, ambiguity, metaphors, and other creative language (Hasselberger, 2021). Statistical machine translation (SMT) improved outcomes by training on vast volumes of parallel text corpora (Saxena et al., 2022). SMT modeled probabilistic mappings between source and target languages (Sharma & Singh, 2021). Still, its capabilities fell short for literary translation (Şahin & Gürses, 2021; Shah, Imran, & Ismail, 2024). Toral and Way (2018) assessed a neural machine translation (NMT) system's handling of English-Spanish fiction translation, finding that it achieved much greater fluency, with some metaphors conveyed appropriately. However, many creative elements were rendered poorly.
Analyses reveal persisting limitations alongside progress. Rădulescu (2019) tested how different neural architectures translated a Romanian novel. Fluency and word order improved over previous technologies, but higher-level flaws in cohesion and mistranslations remained. Overall, NMT shows promise for literary translation but still falls substantially short of human skill in creative stylistic elements, subjectivity, culture, and aesthetic mastery (Bentivogli et al., 2018). Yet quality continues to improve, and semi-automated workflows with human oversight offer practical capabilities (Castilho et al., 2017; Guerberof Arenas & Moorkens, 2019). Interdisciplinary breakthroughs between AI, linguistics, and literary scholarship could enable more robust modeling of creative language (Belinkov et al., 2017). But handling subjective artistry poses fundamental challenges for machines (Chesterman, 2021). The ultimate limits of computational creativity remain unknown.

Methods and Materials
This study aimed to provide an updated comparative analysis of the capabilities of leading neural machine translation platforms for literary text. The examination focused on two widely used commercial systems:
1) ChatGPT (OpenAI): launched in November 2022, this conversational AI system employs a transformer-based neural network architecture trained on massive volumes of data including books, websites, and online conversations (Roumeliotis & Tselikas, 2023). ChatGPT exhibits strong language generation capabilities.
2) DeepL: a proprietary NMT service developed by the DeepL company using deep neural networks. It incorporates models such as transformers and convolutional networks (Fradana, 2023, p. 207). DeepL provides both free and paid subscription access.
The platforms were tested on samples of literary text containing creative rhetorical devices and language artistry. Excerpts from the English children's novel Coraline by Neil Gaiman (2003) served as source samples, selected for the novel's rich metaphors, imagery, alliteration, and imaginative narrative. A set of approximately 15 brief passages highlighting different creative techniques was used, each roughly 100-200 words long.
The English source samples were translated into Russian by ChatGPT and DeepL. Default system settings were used without modification to represent typical use. Each platform's capabilities and limitations in translating the literary devices were assessed qualitatively through manual inspection. Outputs were evaluated on:
- Accuracy: faithfulness in conveying source meaning
- Fluency: grammaticality, naturalness
- Preservation of literary devices: maintenance of metaphor, alliteration, rhyme, etc.
This made it possible to categorize which creative language elements posed greater difficulties for the AI systems. How closely each system approached the quality of human translation was also judged.
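The three evaluation criteria above lend themselves to a simple record-keeping structure. The following is a minimal sketch only, assuming a hypothetical 1-5 numeric scale that the study itself did not use (the assessment reported here was qualitative); the names `Rating`, `CRITERIA`, and `summarize` are illustrative, and the example ratings are made-up placeholders rather than results.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The three criteria named in the evaluation procedure.
CRITERIA = ("accuracy", "fluency", "device_preservation")


@dataclass(frozen=True)
class Rating:
    passage_id: int   # index of the source passage (about 15 in the study)
    system: str       # e.g. "ChatGPT", "DeepL", or "human"
    criterion: str    # one of CRITERIA
    score: int        # ASSUMPTION: hypothetical 1-5 scale, not from the paper


def summarize(ratings):
    """Return {system: {criterion: mean score}} for a list of Rating objects."""
    buckets = defaultdict(lambda: defaultdict(list))
    for r in ratings:
        if r.criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {r.criterion}")
        if not 1 <= r.score <= 5:
            raise ValueError(f"score out of range: {r.score}")
        buckets[r.system][r.criterion].append(r.score)
    return {
        system: {criterion: mean(scores) for criterion, scores in by_criterion.items()}
        for system, by_criterion in buckets.items()
    }


# Placeholder ratings that only show the shape of the data (not study results).
example = [
    Rating(1, "ChatGPT", "accuracy", 4),
    Rating(2, "ChatGPT", "accuracy", 3),
    Rating(1, "DeepL", "fluency", 5),
]
summary = summarize(example)
```

Keeping one record per passage, system, and criterion makes it straightforward to see which devices cause the most difficulty for each system, even if the underlying judgments remain qualitative.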
Additionally, for ChatGPT, tests were conducted in which the model was given enhanced guidance specific to translating rhetorical devices in context. Literary elements in the input samples were highlighted, and ChatGPT was prompted to focus on accurately conveying aspects such as metaphoric language, emotive content, and aesthetic form. The system was guided interactively through a human-in-the-loop approach of the kind described by Li et al. (2016). Outputs were re-evaluated to assess whether translation quality improved through targeted human guidance.
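The guided-prompting step can be illustrated with a short sketch. The exact wording of the prompts used in the study is not reported, so the template below is an assumption; `build_guided_prompt` is a hypothetical helper, and the example sentence is invented for illustration rather than taken from Coraline. The function only assembles text and does not call any translation service.

```python
def build_guided_prompt(passage, devices, target_language="Russian"):
    """Compose a translation request that names the devices to preserve.

    `devices` is a list of (device_name, excerpt) pairs marked by a human
    annotator. The prompt wording is an assumed template, not the study's.
    """
    if not passage.strip():
        raise ValueError("passage must not be empty")
    lines = [f"Translate the following literary passage into {target_language}."]
    if devices:
        lines.append("Pay particular attention to these rhetorical devices:")
        for name, excerpt in devices:
            lines.append(f'- {name}: "{excerpt}"')
        lines.append(
            "Preserve their figurative meaning, emotive content, "
            "and aesthetic form where possible."
        )
    lines.append("")
    lines.append("Passage:")
    lines.append(passage.strip())
    return "\n".join(lines)


# Invented example sentence (not a quotation from the source novel).
prompt = build_guided_prompt(
    "The fog crept in like a tired cat.",
    [("simile", "like a tired cat")],
)
```

Separating the annotation of devices from the prompt text keeps the human guidance explicit and repeatable across passages.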
ChatGPT was also questioned about its approach to translating different literary devices to gain technical insights into its strengths, weaknesses, and representation of creative language patterns. The system provided examples and explanations via conversational interaction.
Comparisons were made to a professional human English-to-Russian translation of the original samples as a gold standard reflecting the pinnacle of literary translation skill. Outputs of the human translator and the AI systems were compared with regard to mastery of language artistry. The analysis aimed to elucidate gaps where computational methods have yet to reach advanced human expertise in creative stylistic elements.

Results
The testing process on samples of literary text yielded informative findings about the capabilities and limitations of ChatGPT and DeepL for key aspects of creative language translation.
Accuracy: Both systems exhibited reasonable accuracy in conveying literal meanings for straightforward descriptive passages. However, their precision suffered for content involving metaphors, allusions, and other figures of speech. DeepL tended to translate these devices literally rather than interpreting the underlying meaning. ChatGPT was moderately better at paraphrasing metaphors but also struggled with rare or culture-dependent examples.
Fluency: DeepL generally produced very smooth, grammatically coherent Russian output without awkward phrasing. In contrast, ChatGPT made periodic grammar errors reflecting incomplete mastery of the language, although it could self-correct basic mistakes when prompted. Overall, both systems generated reasonably fluent output, on par with human output for typical language.
Preservation of literary devices: This area exhibited the biggest challenges. The nuanced creative elements central to literary artistry faced significant difficulties in translation:
- Metaphor: Common, straightforward metaphors were rendered adequately, but obscure metaphors posed problems. ChatGPT occasionally improved on the original metaphor through creative paraphrasing.
- Simile: Direct similes translated well in both systems, but some lacked the eloquence of the human versions. Idiomatic similes were challenging.
- Alliteration/Rhyme: Neither system recreated alliteration or rhyme schemes to the level exhibited in the professional human translations. They prioritized conveying literal meaning over artistic form.
- Imagery: DeepL struggled to translate vivid, imaginative phrases, often simplifying them to plainer language. ChatGPT offered some improved imagery when prompted.
- Parcelling: ChatGPT partially replicated this technique of creatively fragmented sentences, whereas DeepL produced only complete sentences.
- Tone/Mood: Human translations provided emotive nuance missing from the AI outputs. The systems failed to convey subtle atmosphere.
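The alliteration finding in the list above was established by manual inspection. As a rough aid to such inspection, the sketch below flags runs of consecutive words that share an initial letter. It is a crude, letter-based heuristic (alliteration is properly a matter of sound, not spelling), the function name `alliteration_runs` and the `min_run` threshold are assumptions, and it was not part of the study's method.

```python
import re


def alliteration_runs(text, min_run=3):
    """Return runs of at least `min_run` consecutive words with the same first letter.

    Works on any script Python treats as letters (including Cyrillic), but it
    compares spelling only, so it is a coarse proxy for real alliteration.
    """
    # [^\W\d_]+ matches sequences of Unicode letters only.
    words = re.findall(r"[^\W\d_]+", text.lower())
    runs, current = [], []
    for word in words:
        if current and word[0] == current[-1][0]:
            current.append(word)
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = [word]
    if len(current) >= min_run:
        runs.append(current)
    return runs


english_runs = alliteration_runs("Peter Piper picked a peck")
russian_runs = alliteration_runs("Четыре чёрненьких чумазеньких чертёнка")
```

Comparing the runs found in a source passage with those found in each translation gives a quick indication of whether a system attempted to carry the device over, which a human reader can then verify.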
Overall, both ChatGPT and DeepL failed to recreate the nuance, fluidity, and aesthetic mastery of the professional human translator across these many dimensions of creative language artistry. However, when provided with targeted guidance emphasizing literary devices during translation, ChatGPT's performance did improve on certain creative elements. The interactive guidance highlighted the potential to enhance NMT output through human instruction focused on artistic goals.
Discussions with ChatGPT further revealed details about its approach:
- It stated that metaphor translation involves representing the implied meaning in the target language, which is difficult without real-world knowledge. English metaphors may not have direct equivalents.
- For similes, it seeks corresponding comparisons in the output language, a process that benefits from large training corpora.
- Alliteration and rhyme are challenging because linguistic constraints differ across languages. These are currently modeled weakly.
- Parcelling requires detecting fragmented sentences and replicating their effects. This technique is not well captured by its training data.
The system acknowledged that representing high-level creative language remains difficult compared to non-literary texts. Without human guidance, ChatGPT's difficulty in handling certain metaphors and devices was apparent. But cooperative interaction made it possible to improve some translations by emphasizing stylistic goals.

Discussion
The comparative analysis revealed both strengths and considerable limitations of leading commercial NMT services for translating literary texts relative to human expertise. AI systems have achieved high accuracy and fluency for straightforward language but struggle to replicate the art and essence of literature. The challenges align with findings from academic research on the creative deficiencies of NMT (Gaspari et al., 2015). Literal resemblance does not equate to expert literary translation.
At their current capability, AI systems function more as diction-driven word converters than as explorers of expressive possibilities in the target language. They lack integration of linguistic styling beyond the basics of grammar and vocabulary. Modeling the patterns, subjectivity, and cultural allusions inherent in literary language remains difficult (Eco, 2001). Methods to inject greater linguistic knowledge into NMT training could enhance the handling of creative devices, as suggested by Casas (2020).
Of the two systems, DeepL produced smoother grammatical output thanks to robust models trained on massive high-quality corpora. Meanwhile, ChatGPT exhibited greater versatility in responding to interactive guidance. Tailored human instruction improved some translations by focusing attention on artistic goals. This highlights the promise of guided hybrid human-AI collaboration, as argued by Rane (2023).

Conclusion
This paper presented a comparative analysis of the commercial neural machine translation services ChatGPT and DeepL and professional human translators on literary text samples containing creative rhetorical techniques and artistry. The examination aimed to elucidate the strengths, weaknesses, and overall capability of leading AI systems to handle the complexities of literature.
The results underscored persistent challenges for AI in expert literary translation. While accuracy on straightforward passages was reasonable, critical deficiencies emerged regarding creative metaphoric language, emotive expression, nuanced cultural details, aesthetic form, and the communicative goals central to humanistic art. The AI systems functioned more as diction-driven converters than as explorers of the holistic expressive potential of literature. Even tailored guidance could only partially improve certain creative translations.
However, rapid advances are occurring as neural network scale and techniques improve. Deep learning has enhanced fluency and basic comprehension of literary works, improving accessibility. AI also facilitates productive human-machine collaboration that harnesses computation while leaving the final polish and quality control to human translators. The systems' transparency about their own creative limitations provided insights for further progress.
Looking forward, enhanced interdisciplinary collaboration between AI developers, linguists, and literary scholars offers strong potential to advance computational mastery of the arts (Rane, 2023). Integrating knowledge of language, rhetoric, culture, and cognition could strengthen machine learning representations (Chakrabarty et al., 2021). Teaching AI the patterns of literary art may unlock new possibilities. But handling subjectivity and emotional essence remains profoundly challenging (Chesterman, 2021). The highest capabilities of human creativity are likely to remain irreproducible.
Overall, while significant hurdles remain before AI can match literary translation expertise, we see potential for thoughtful hybrid collaboration and specialized breakthroughs. Perhaps someday AI assistance could aid human understanding of civilization's creative treasures across languages. But computational creativity must overcome difficult barriers regarding the human condition. We should ensure that AI is developed with ethical priorities and a humanistic vision (Jobin, Ienca, & Vayena, 2019). How to apply AI's growing capabilities judiciously to the literary arts remains an open challenge, but advances could expand accessibility while inspiring new possibilities for expression.

Acknowledgment
This research is funded by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan on the topic "The connectivist model of a foreign language educational SMART-environment in Kazakhstani context: Necessity, availability, and development strategy" (2023-2025) (Grant number AP19679833).

Neural machine translation (NMT) has since demonstrated substantial advances by utilizing deep learning (Mohamed et al., 2021). Castilho et al. (2017) explain that NMT incorporates embedded vector representations of words and sequences, enabling stronger contextual modeling.
Recent research proposes approaches to enhance NMT creativity. Goto and Tanaka (2017) developed techniques to detect untranslated content, which could flag missing metaphors. Matusov et al. (2019) incorporated transformer and retrieval mechanisms to better handle unknown phrases. Language model fusion explicitly modeled linguistic constraints to improve fluency (Ranathunga et al., 2023). Fantinuoli and Prandi (2021) designed customized rubrics aligning with interpretability, though human communication still exceeded NMT. Other studies advocate greater interdisciplinary collaboration. Bolaños et al. (2021, as cited in Rovira-Esteva et al., 2023, p. 120) suggest stylistically annotating parallel corpora to highlight rhetorical devices for NMT training. Chakrabarty et al. (2021) propose applying cognitive and perception principles to model figurative language. Rane (2023) argues for cooperation between computer scientists, linguists, and literary experts. However, limitations arise in automating human translation expertise. Castilho et al. (2018) note that software cannot yet replicate creative, ambiguous language use. Moorkens (2018) found that NMT output remained distinguishable from human translation. Ploin et al. (2022) conclude that human creativity remains irreplaceable. Their study also affirms that NMT demonstrates its strengths but requires further progress to holistically interpret literary text the way humans do (Bentivogli et al., 2018; Castilho et al., 2017).