Evaluating the Effectiveness of Optical Character Recognition (OCR) in Preserving and Digitizing Nusantara Traditional Manuscripts

Authors

  • Violinita Anastasya, Faculty of Teacher Training and Education, Riau University, Indonesia

Keywords:

OCR, Nusantara manuscripts, Digital preservation, Cultural heritage, Script recognition.

Abstract

The preservation of Nusantara traditional literary texts is challenged by the fragility of the manuscripts and the complexity of local scripts such as Javanese, Lontara, Balinese, and Batak. Optical Character Recognition (OCR) has been widely applied in global digitization projects, but its effectiveness in recognizing Nusantara scripts remains underexplored. Previous studies highlight OCR's role in cultural heritage preservation and accessibility, but also note significant accuracy gaps for non-Latin scripts. This research evaluates the usability and limitations of OCR in processing Nusantara manuscripts. It takes a qualitative-descriptive approach: OCR tools were tested on selected digitized manuscripts, and the analysis focused on recognition accuracy, error patterns, and integration with digital library systems. The findings indicate that OCR can successfully convert texts into searchable formats, improving accessibility for researchers, educators, and the public. However, challenges remain in handling complex diacritics, script variations, and degraded texts. The study concludes that OCR is not fully reliable as an automated solution, but is a valuable tool for safeguarding Nusantara heritage when combined with manual verification and script-specific model development.
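
The abstract does not say how recognition accuracy was measured. As an illustration only, the sketch below computes character error rate (CER), a common OCR metric that divides the edit distance between a manual transcription and the OCR output by the length of the transcription. The function names are hypothetical, and the NFC normalization step is an assumption added so that composed and decomposed forms of the same diacritic are not counted as errors.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """CER = edit distance / reference length, after NFC normalization.

    Assumption: both strings are normalized to NFC so that equivalent
    encodings of a base character plus diacritic compare as equal.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", ocr_output)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

A lower CER means the OCR output is closer to the manual transcription. For scripts with stacked or combining marks, counting Unicode code points (as this sketch does) may over- or under-state errors relative to counting visual glyphs, so the unit of comparison would need to be chosen per script.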

Published

2025-08-30

How to Cite

Anastasya, V. (2025). Evaluating the Effectiveness of Optical Character Recognition (OCR) in Preserving and Digitizing Nusantara Traditional Manuscripts. L’Geneus : The Journal Language Generations of Intellectual Society, 14(2), 61–69. Retrieved from https://iocscience.org/ejournal/index.php/geneus/article/view/6657