Exploring the Effectiveness of GPT-3 in Translating Specialized Religious Text from Arabic to English: A Comparative Study with Human Translation

https://doi.org/10.48185/jtls.v4i2.762

Authors

  • Maysaa Banat Rafik Hariri University
  • Yasmine Abu Adla American University of Beirut

Keywords:

GPT-3, Machine Translation, Arabic to English Translation, ROUGE Score, BERT Score, Natural Language Processing

Abstract

In recent years, Natural Language Processing (NLP) models such as Generative Pre-trained Transformer 3 (GPT-3) have shown remarkable improvements in various language-related tasks, including machine translation. However, most studies evaluating the performance of NLP models on translation tasks have focused on general-purpose text, leaving their effectiveness on specialized text relatively unexplored. This study therefore aimed to evaluate the effectiveness of GPT-3 in translating specialized Arabic text into English and to compare its performance with human translation.

To achieve this goal, the study selected ten chapters from a specialized book written in Arabic, covering topics in a specialized religious context. The chapters were translated by a professional human translator and by GPT-3 via its Application Programming Interface (API). The translation performance of GPT-3 was compared to human translation using qualitative measures, specifically the Direct Assessment method. Additionally, the translations were evaluated using two automatic metrics, the Bidirectional Encoder Representations from Transformers (BERT) score and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric, both of which measure the similarity between the translated text and a reference text.
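The study computes ROUGE with standard tooling; purely as an illustration of what the metric measures, the following is a minimal ROUGE-1 sketch in pure Python. The example sentences and whitespace tokenization are our own illustrative assumptions, not material from the study:

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """Minimal ROUGE-1: clipped unigram overlap between candidate and reference.
    Assumes non-empty inputs and simple whitespace tokenization."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token is credited at most as many times
    # as it appears in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    recall = overlap / sum(ref_counts.values())
    precision = overlap / sum(cand_counts.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical human reference vs. machine candidate translation:
scores = rouge1("the soul finds peace in remembrance",
                "the soul finds calm in remembrance")
```

A single synonym substitution ("calm" for "peace") already lowers the score even though the meaning is preserved, which hints at why surface-overlap metrics can understate adequate translations of nuanced religious text.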

The qualitative results show that GPT-3 produced generally understandable translations but failed to capture nuances and cultural context. The quantitative results, on the other hand, showed that GPT-3 achieved a relatively high level of accuracy in translating specialized religious text, with scores comparable to human translations in some cases. Specifically, the BERT score of the GPT-3 translations was 0.83. The study also found that the ROUGE score failed to fully reflect the capabilities of GPT-3 in translating specialized text.
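Unlike n-gram overlap metrics such as ROUGE, BERTScore matches tokens by embedding similarity, so synonyms still earn credit. The following toy sketch shows the recall direction of the metric, with made-up two-dimensional vectors standing in for contextual BERT embeddings; the words and vector values are illustrative assumptions, not the study's data:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings standing in for contextual BERT vectors (hypothetical values):
# near-synonyms "peace" and "calm" point in similar directions.
emb = {
    "peace": (0.9, 0.1), "calm": (0.85, 0.2),
    "soul": (0.1, 0.95), "anger": (-0.7, 0.3),
}

def bertscore_recall(ref_tokens, cand_tokens):
    # For each reference token, take its best cosine match in the candidate,
    # then average over the reference (BERTScore's recall direction).
    return sum(max(cosine(emb[r], emb[c]) for c in cand_tokens)
               for r in ref_tokens) / len(ref_tokens)

score = bertscore_recall(["soul", "peace"], ["soul", "calm"])
```

Because "calm" sits close to "peace" in embedding space, the score stays near 1.0 for a synonym substitution that ROUGE would penalize, which is consistent with the study's observation that ROUGE understated GPT-3's performance relative to the BERT score.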

Overall, the findings of this study suggest that GPT-3 has promising potential as a translation tool for specialized religious text, but further research is needed to improve its capabilities and address its limitations.


References

A. Hendy, M. Abdelrehim, A. Sharaf, V. Raunak, M. Gabr, H. Matsushita, Y. J. Kim, M. Afify, and H. H. Awadalla, "How good are GPT models at machine translation? A comprehensive evaluation," 2023.

Z. Tan, S. Wang, Z. Yang, G. Chen, X. Huang, M. Sun, and Y. Liu, "Neural machine translation: A review of methods, resources, and tools," AI Open, vol. 1, pp. 5–21, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666651020300024

T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, "BERTScore: Evaluating text generation with BERT," 2020.

C.-Y. Lin, "ROUGE: A package for automatic evaluation of summaries," in Text Summarization Branches Out, 2004, pp. 74–81.

L. Zhou, W. Hu, J. Zhang, and C. Zong, "Neural system combination for machine translation," arXiv preprint arXiv:1704.06393, 2017.

P. Koehn and R. Knowles, "Six challenges for neural machine translation," arXiv preprint arXiv:1706.03872, 2017.

X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, and J. Tang, "GPT understands, too," arXiv preprint arXiv:2103.10385, 2021.

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language models are few-shot learners," 2020.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," 2017.

A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al., "Improving language understanding by generative pre-training," 2018.

W. Jiao, W. Wang, J.-t. Huang, X. Wang, and Z. Tu, "Is ChatGPT a good translator? Yes with GPT-4 as the engine," 2023.

L. Wang, C. Lyu, T. Ji, Z. Zhang, D. Yu, S. Shi, and Z. Tu, "Document-level machine translation with large language models," 2023.

S. Castilho, C. Mallon, R. Meister, and S. Yue, "Do online machine translation systems care for context? What about a GPT model?" 2023.

W. Zhu, H. Liu, Q. Dong, J. Xu, S. Huang, L. Kong, J. Chen, and L. Li, "Multilingual machine translation with large language models: Empirical results and analysis," 2023.

E. Chatzikoumi, "How to evaluate machine translation: A review of automated and human metrics," Natural Language Engineering, vol. 26, no. 2, pp. 137–161, 2020.

M. Farshoukh, Soul Breezes. Beirut: Iijazforum, 2018.

T. K. Kim, "T test as a parametric statistic," Korean Journal of Anesthesiology, vol. 68, no. 6, pp. 540–546, 2015.

D. Williamson, R. Parker, and J. Kendrick, "The box plot: A simple visual method to interpret data," Annals of Internal Medicine, vol. 110, no. 11, pp. 916–921, 1989.

Published

2023-07-14

How to Cite

Banat, M., & Abu Adla, Y. (2023). Exploring the Effectiveness of GPT-3 in Translating Specialized Religious Text from Arabic to English: A Comparative Study with Human Translation. Journal of Translation and Language Studies, 4(2), 1–23. https://doi.org/10.48185/jtls.v4i2.762
