nature.com

Digital preservation and access strategies for overseas Chinese documents: challenges and solutions

Abstract

The purpose of the study is to propose a framework for processing existing and incoming sources on the life of the overseas Chinese, their historical heritage, and their documents. The study addresses the fragmentation and relatively poor accessibility of individual collections or sets of documents, whose existence may be unknown to the very researchers looking for them. As an approach to solving this problem, it is proposed to use end-to-end processing of cached data, understood here as descriptions of digitized or material sources and artifacts that researchers have already created. Cached data stores information about the use of a service, program, or data in order to structure and facilitate future use of that data. This includes, in particular, data about the structure of catalog or directory topics, keywords, and database indexes. With the help of AI, human resources, and existing approaches to the algorithmic processing of various types of digital data (for example, images of artifacts or documents), the created descriptions of sources are gradually universalized and unified through the practice of database queries, and access to them is simplified using tags and ‘cloud’ data storage. This approach has great practical value because it does not require special agreements, data formats, or complex digital tools that would be difficult to implement in international research practice.

Introduction

The Chinese diaspora is very widespread and has a long history. Researchers point out that it has been observed in South Africa since the mid-1600s, and in Southeast Asia, the Chinese have been settled throughout recorded history (Chao, 2006; Chu, 2010). The Chinese overseas led an active economic and cultural life, but most often tried to avoid political and social conflicts and practiced ‘social invisibility’ in many countries of residence (Barabantseva, 2012; Ching-Hwang, 2017; Li, 2017). Today, this results in a paucity of documents of the Chinese overseas and difficulty in accessing them (Chao, 2001; Hsu, 2013; Nyíri, 2020; Zhao et al. 2020). Many research centers in the United States, Southeast Asia, and other regions are seeking to locate and organize such data and collect archives of the Chinese diaspora (Molenda, 2019; Tan and Chiu, 2007; Theng et al. 2010; Wegars, 2016). However, these efforts are, by all accounts, still fairly fragmented and localized. A researcher has to access each of these projects independently, process its data, or search each repository separately for the necessary information, and most researchers may not even know that many of these repositories exist (Chao, 2001; Wilson et al. 2019; Yun et al. 2021). The Chinese government and universities also make significant efforts to find and organize information about the lives of Chinese overseas in order to establish ties with the homeland and preserve historical heritage (Chu, 2010; Perez-Garcia and Diaz-Ordoñez, 2023; Wang et al. 2020). Another significant issue is the preservation and reintegration of Chinese national heritage that was exported abroad and now forms display archives, the volume of which, according to researchers, can exceed 10 million volumes (Su et al. 2022; Zhao et al. 2020).

The preservation of historical heritage includes both the creation of greater access opportunities and conservation, of which digitization can be considered a form (Mafrici and Giovannini, 2020; Poole, 2015; Terras et al. 2017). Preserving copies and models of objects in digital form provides greater opportunities for research and evaluation of the content of material culture, even if the object itself is lost or destroyed (Ng-He, 2022; Skublewska-Paszkowska et al. 2022; Wilcox, 2020). It is now critical to create accurate, scalable models of archeological and architectural sites that may be damaged by war or by natural and man-made disasters (Champion and Rahaman, 2020; Erturk, 2020). Text monuments (documents) are also most often saved as exact digital copies, models in various formats, images, etc. Creating such a copy of a document makes it possible to expand access to it for libraries, universities, and research centers anywhere in the world. Such copies are also part of the data to be mined and should be included in it along with the traditional symbolic representation of documents (Li et al. 2020; Ma, 2020; Wang, 2020).

The processing of civil documents plays an important role in studying the social and economic life of overseas Chinese (Hsu, 2013; Ke and Tseng, 2016; Shih, 2012). Contracts, biographies, notarial records, and tax records often make it possible to reconstruct kinship and to study the peculiarities of people’s lives and behavior (Khan et al. 2023; Khoir et al. 2015; Minjie et al. 2016). Such archives must be digitized to enable data mining and in-depth research. In addition, modern government and administrative records and web scraping provide the same opportunity to reveal the economic and social life of Chinese abroad (Blanke, 2024; Mukwevho and Ngoepe, 2019).
Such evidence is critical because it presents a more objective view of the lives of Chinese people through the prism of verifiable documentary data, free of subjective evaluation. The effectiveness of web scraping may depend on restrictions on access to data and on the legal and ethical aspects of using the extracted information. These aspects should be regulated by the researchers who use web scraping and provide access to their data through the proposed framework. It should be noted that web scraping is one of the modern research methods that does not form the basis of the platform methodology proposed below but can be used by researchers and connected to it where this is possible and legal access is available.

Several researchers point to the particular importance of ‘expressive’ types of documents, which include diaries, personal notes, letters, and lyrics (Chu, 2010; Lampert and Vaughan, 2018). These types of documents are specific to Chinese culture and behavior; they reflect the characteristics of ‘Chineseness’ across the seas and also have enduring artistic value (Chu, 2010; Wang et al. 2020). Their specificity for Chinese culture is due to the special styles of keeping such records that have developed in Chinese society and have unique features: reliance on traditional Confucian and Taoist canonical texts, the use of special forms of expression, poetry, and line forms established in ancient texts, etc. (Wegars, 2016; Wang and Zhan, 2020).

Literature overview

The preservation of historical heritage is considered by researchers from several points of view: preservation of material monuments (conservation, restoration), copying (modeling, digitalization), and expanding access to heritage, including for an international audience (Gao and Jones, 2021; Noh and Chang, 2020; Perushek and Smith, 1999).
In connection with the digitalization of archives and documents, and with the creation of digital copies or models of material heritage objects, researchers increasingly highlight the expanded opportunities for scientific research and data mining (Khan et al. 2023; Mafrici and Giovannini, 2020; Tripathi, 2018). Digitization of documents from past eras typically involves creating the most accurate, scalable digital copy of a physical object in one of the popular digital image formats (Gao and Jones, 2021; Ng-He, 2022; Wang et al. 2020). 3D modeling is primarily used to reconstruct historical architectural complexes or objects of material culture, but not documents (Champion and Rahaman, 2020; Mafrici and Giovannini, 2020; Theng et al. 2010). On the other hand, ease of use during data processing, comparison, and textual research requires transforming a graphical document object into a text format while reproducing the features of the document as accurately as possible (Shih, 2012; Skublewska-Paszkowska et al. 2022; Wang et al. 2023). The sheer volume of digital information held in state and local administrative archives already makes it difficult for researchers of overseas Chinese documents to locate such isolated digital archives and use them together. The problem is exacerbated when the heritage document is a picture, which cannot be analyzed by automatic algorithms along with text documents (Li et al. 2020; Zhou et al. 2012).

Chinese compatriots overseas have left a significant mark on the history of various countries, particularly the United States and Southeast Asian countries (Ching-Hwang, 2017; Chu, 2010; Molenda, 2019). A significant share of their documents belongs to the national document collections of these countries and needs to be extracted using data mining methods (Wang and Zhan, 2020; Wang, 2020).
Researchers have noted that certain types of such documents, such as those related to marriage, land contracts, and tax documents, provide an important basis for broader socio-economic inferences and for tracing the connection of overseas Chinese to kinship and the homeland (Ma, 2020; Tran and Chuang, 2020; Xu et al. 2021). Researchers list many projects by universities, research centers, and individual academics aimed at collecting, describing, and digitalizing Chinese documents (Hsu, 2013; Zhao et al. 2020). A common problem with this heterogeneous historical heritage is that it cannot be collected into one common and organized access frame for all international researchers (Gao and Jones, 2021; Li et al. 2020).

Creating easier access to documentary historical heritage has high social significance for uniting Chinese people on the continent and overseas, creating a unified cultural and research field, and creating greater openness of Chinese culture around the world (Guo et al. 2024; Su et al. 2022). Researchers have already proposed a number of projects to combine humanities technologies and digitalization tools to provide access to libraries in an international context (Chao, 2001; Tryon, 2017). Improved access to archival data and documents improves fact-checking and evidence-based research and requires the creation of specific complementary digital tools that rely on complex user behavior (Bharwani, 2006; Saadia and Naveed, 2024; Shahzad et al. 2024). Researchers pay considerable attention to the problems of indexing and describing documents in databases. Indexing can be carried out through an expert’s description of the document, through indexing by keywords or content, or as a result of machine text analysis (Blanke, 2024; Erturk, 2020; Skublewska-Paszkowska et al. 2022).
The solution to this problem is closely related to preserving the document and providing access to it, so that it can be easily located and recognized in response to users’ requests (Tripathi, 2018; Wanyan and Hu, 2020).

Problem statement

The purpose of this research is to create a common generalized framework that makes it possible to gradually combine documents of overseas Chinese, facilitate the integration of digitized and born-digital documents, and facilitate access to these documents. This will increase the motivation of researchers and will have a positive impact on the cultural and social development and integration of the Chinese diasporas with their homeland and with the societies where they live. The need for the research arises from the scattered nature, relatively low systematization, and difficulty of access to archives and databases containing documents of overseas Chinese. The challenges faced by researchers are related both to the digitization of physical heritage and to ensuring the proper integration of databases, the ease of processing heterogeneous data, and their accessibility.

The objectives of the study are:

To identify the main problems faced by researchers working with overseas Chinese documents;

To develop and propose possible solutions aimed at overcoming existing issues related to digitization processes and ensuring extended access to such documents;

To develop an approach and framework that ensures deeper integration and accessibility of digital documents of overseas Chinese to the public and researchers.

Methods

To achieve the set objectives, a methodology based on the analysis of methodological requirements and the structure of processing heterogeneous information is employed. This information is distributed among various archives, libraries, databases, and other sources located in different parts of the world and interconnected by digital links. Since the separate components of the digital tools used for processing data on documents of Chinese overseas have already been researched, described, and actively used in various databases, it is necessary to determine the structure of the optimal interaction of these tools. The study does not cover the detailed description of individual algorithms and tools, such as those used for recognition, processing, and cataloging of documents and images, creating thematic lists of sources, and forming indexes in databases, since these tools are described in detail and applied by other researchers. All of these tools can and should be improved in the future, but here the goal is to create a framework that can quickly begin to work with the entire amount of available data without making significant or costly changes and without requiring all participants to follow complex uniform procedures that would be difficult to implement and distribute.

The proposed framework, first of all, takes into account the possibility of using various technologies or tools in various data sources and does not require their unification. It proposes the creation of a unified system for processing thematic information on the documents and heritage of the Chinese overseas for all interested research parties.
The idea is that, as research and processing of sources continue, an accessible workspace well suited to the study of this topic gradually forms, with improved access for researchers and the public both in China and abroad.

Results

The basis of the framework should be a mechanism for interaction between existing sources of data about the life of the Chinese overseas and catalogs of accumulated documents. In this case, no distinction is made between different types and formats of documents: photocopies, microfilms, records in administrative databases, books and paper documents, or digitized models of things that can be exhibited on museum websites (see Fig. 1). Academic institutions through which users operate (researchers, the public, students, and other interested parties) act as the database from which data mining starts and from which a request is made to one or another source of information about the Chinese overseas (shown on the left in Fig. 1 as Archives, Catalogs, and Museums; this list is not exhaustive). Each data source describes its holdings in some way, creating its own indexes, sets of keywords, or topic structures for storage directories and data directories. This is the key information that the user gains access to when a request is processed from the home database. This set of descriptive data serves as a guide for further processing of data in heterogeneous sources.

Fig. 1 A generalized interface for a researcher’s thematic query.

The framework is a gateway and a place for storing cached data arrays describing other data.
At the same time, problems of an institutional nature, such as data ownership, regulation of access, and legal problems of using AI, do not concern the framework itself, because they are solved at the level of each individual document that the user wants to access and to which the framework directs the user during the search. The documents themselves, their digital “originals” so to speak, will be located on the servers of the universities, institutions, databases, or government agencies that created them.

Figure 2 presents a formal description of the model of how the above-mentioned source description system is formed and gradually becomes more uniform (as part of indexes, thematic structures, query keywords, etc., in various archival collections, databases, or other sources). The documents themselves are heterogeneous and can include database data, drawings, three-dimensional models of things or architectural units, text documents converted into a format accessible for digital processing, etc. All these sources, one way or another, go through a conditional processing unit, where they are revised, read, and described. This cycle of procedures can be carried out by people (researchers, archivists, librarians, collectors, and academic institutions when processing data and mining data from Internet sources; Human resources in Fig. 2). Sources can also be processed using already developed algorithms and schemes, as is the case with the International Image Interoperability Framework (IIIF), which is already successfully used by a number of institutions. Finally, sources can be described using numerous and ever-expanding AI tools. In the latter case, the more often a particular collection or set of documents is accessed, the more material the AI will have for machine learning, and the better it will become at describing sources. The result of processing the original sources in the conditional processing block (the central part of Fig. 2) is the same set of indexes, keywords, or thematic structure, which can then be visible to the user when searching for data (Fig. 1).

Fig. 2 A mechanism for recognizing and transforming document descriptions for the end user.

The complex process of constant data exchange and mutual requests for data between various databases is depicted schematically in Fig. 3. Academic institutes, universities, libraries, research centers, museums, and other such centers themselves store some part of the information and documents about the Chinese overseas, and their requests for new information are based on the data they already hold and on a description of its structure available to the user (Fig. 3). During mutual queries, a query cache is formed containing mutually overlapping and intersecting sets of data descriptions (keywords, indexes, topics, etc.). This cache, in turn, is processed by human experts, AI, and existing algorithmic models of data description, as a result of which these sets of source descriptions will gradually be universalized, and servers will respond more adequately to requests for similar information that is named slightly differently. Since all sources of information on the Chinese overseas cannot be expected or required to universalize the structure of their data or bring it to uniform requirements, the cache of the proposed system can simply be stored in free ‘cloud’ structures by general agreement and opened under a common, well-known, and easily understood tag, for example, ‘Chinese beyond the seas.’ ‘Cloud’ implies storing data and providing the ability to run executable program files on geographically remote and distributed servers, so that the user does not need to know where exactly and which machine is executing the instructions or providing memory.
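The gradual universalization of cached descriptions can be illustrated with a minimal Python sketch. It assumes that each requesting database contributes its own keyword set for a source and that a small synonym table, maintained by experts or AI tools, maps variant keywords onto one shared form. The names `SYNONYMS`, `normalize`, and `merge_descriptions`, as well as the sample source identifiers and keywords, are hypothetical and are not part of the original study.

```python
from collections import defaultdict

# Hypothetical synonym table: variant keyword -> shared canonical keyword.
# In the framework, such a mapping would emerge from expert and AI processing.
SYNONYMS = {
    "huaqiao": "overseas chinese",
    "chinese diaspora": "overseas chinese",
    "chinese beyond the seas": "overseas chinese",
    "land deed": "land contract",
}


def normalize(keyword: str) -> str:
    """Lower-case a keyword, collapse whitespace, and map known variants."""
    key = " ".join(keyword.lower().split())
    return SYNONYMS.get(key, key)


def merge_descriptions(descriptions):
    """Merge per-institution keyword sets into one description cache.

    `descriptions` is an iterable of (source_id, keywords) pairs.
    Returns a dict mapping each canonical keyword to the set of
    source ids that it describes.
    """
    cache = defaultdict(set)
    for source_id, keywords in descriptions:
        for keyword in keywords:
            cache[normalize(keyword)].add(source_id)
    return dict(cache)


cache = merge_descriptions([
    ("archive-A/letters", ["Huaqiao", "personal letters"]),
    ("museum-B/contracts", ["Chinese diaspora", "land deed"]),
])
print(sorted(cache["overseas chinese"]))
# prints ['archive-A/letters', 'museum-B/contracts']
```

In this sketch, two institutions that name the same topic differently end up under one cache entry, which is the effect the text attributes to repeated mutual queries; a real synonym table would be far larger and would change over time.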
A ‘tag’ is a lexical string marker assigned to specific data by the user to make it easier to find and use such marked data in large arrays. In the future, the servers of all institutions (referred to uniformly as Requesting databases) and AI will easily access the full variety of available collections and sources through a set of cache descriptions and this tag (Fig. 3).

Fig. 3 Mechanism for generating publicly accessible source indexes.

It should be noted why the framework assigns such a significant role to the cache of source descriptions (keywords, indexes, thematic structure, etc.). Such a description may also apply to non-digitized data. For example, a user looking for information about the cultural heritage of the Chinese overseas may, through research data and descriptions, learn of a collection of surviving personal letters and objects preserved by a private collector and described by someone. The cultural heritage of the Chinese overseas in the broad sense of the term may include any products of human cultural production: artistic and business texts, recorded folklore, personal correspondence, everyday ritual objects that carry elements of culture and tradition (for example, a teapot made in America, but in the style of traditional objects of the Song era), clothing and food recipes, artistic works of any type, architectural elements and plans for arranging a house or apartment that are characteristic only of the Chinese, etc. (Zhao et al. 2020; Zhou et al. 2012). These things are not digitized, but the researcher learns of their existence and gets an idea of how and where to become acquainted with them.

There is an objective problem of multiple languages, document types, and different contexts within this single field.
For now, they can be combined in a single structure through a commenting system in which specialists study and contextualize such documents as part of their research. In the future, translation support, where needed, will presumably be assigned to one or more AI systems integrated as a functional part of the overall structure of the platform.

Another important feature of the proposed framework model is the inclusion of new sources in processing and making them available (Fig. 4). This happens naturally, at the moment when one of the interested users (researchers) accesses a new source through the request server of their academic institution or another database. The user’s query or system of search queries will lead to the necessary key tags, and the found sources will begin to receive an academic description in the processing block. Accordingly, the work carried out by the researcher in the form of publications, links, server requests, and other digital traces will be available to the database and will lead to the formation of primary descriptions of a new source, which will be included in the general circulation of information described in Fig. 3. It would be better if the researcher, an AI tool commissioned by the researcher, or other actors described the source or attached its description to public databases themselves, since this would significantly speed up the integration process. However, from existing research, it can be concluded that such integration will still occur spontaneously after some time, even without such special efforts (Khan et al. 2023; Lampert and Vaughan, 2018; Perez-Garcia and Diaz-Ordoñez, 2023). These assumptions and hypotheses must be further tested.

Fig. 4 Incorporation of new thematic sources into the framework.

The framework is organized roughly like an open-source project: anyone can add and expand, but not destroy someone else’s contribution. Subsequent users, commenting on and examining the added parts, inevitably reveal their real value and usefulness, which will spontaneously lead to self-regulation of this base of mutual links. For example, false or counterfeit information will be identified by specialists and will receive corresponding tags from them, by which users can recognize materials without value. Testing of the framework will thus be carried out by the users themselves: if the structure is useful and effective, they will develop and control it; if not, it will lose users and wither, no matter how well thought out it was at the beginning.

It is envisaged that such a framework could operate with funds allocated by several large research centers, grantors, and possibly also government agencies of China, provided that the latter contribute no more than 40% of the funds necessary to support and maintain the framework, so as to ensure its institutional and academic independence. Here, 40% implies a fairly high share of costs, which is easier for the interested state to provide, but without the ability to own a controlling stake, have a decisive vote in decision-making, or control the appointment of management or operators of the platform.

The framework is expected to have two groups of consumers: academic researchers and institutions, and ordinary users interested in collecting or providing subject information. Both groups should have the right to connect new information and provide access to documents. For example, an individual Chinese user can open access to digitized family documents or to a digitized original of a rare book stored at home.
The process of verification, processing, and inclusion of such new information elements into the general field will occur naturally through the interest and requests of other users.

Ordinary users have the right to provide their information and to collect and access existing information, but cannot comment on, delete, or otherwise change the content. Academic researchers who are authorized as such through the institutional access mechanisms of their universities, libraries, and other institutions can describe and comment on existing electronic objects and create connections between them as part of their research, but cannot arbitrarily change the work of their colleagues. This method of processing information allows parallel studies to be conducted that are visible simultaneously and can complement each other, argue with each other, and thus more fully describe the cultural phenomena or artifacts to which the framework provides access.

Discussion

The digitalization of documents and archives increasingly confronts researchers with the task of combined or generalized search, when it is necessary to simultaneously examine several databases that do not have common interfaces or the same organization of access to documents (Ke and Tseng, 2016; Yun et al. 2021). This problem, which is the focus of the research presented here, is addressed by several works that discuss general indexing, the creation of common formats like the International Image Interoperability Framework (IIIF), etc. (Perez-Garcia and Diaz-Ordoñez, 2023; Su et al. 2022). Modern technologies such as blockchain have already been proposed for creating prototypes of a single accessible interface for exchange between libraries (Shahzad et al. 2024).
In the proposed study, the framework involves the use of already existing and proven technologies, which simplifies the solution to this problem.

The problem of evaluating and describing documents in a unified access structure remains central to the formation of common generalized databases for archival documents. The fastest method for solving this problem so far is to compare keywords, tags, and thematic descriptions. This approach has already been mentioned by researchers (Erturk, 2020; Skublewska-Paszkowska et al. 2022; Tryon, 2017). The issue of accessing historical documents of various formats, or those that are indexed but exist only in physical form (not digitized), remains unresolved (Gao and Jones, 2021; Wilson et al. 2019). It seems inevitable that, over time, as researchers work together in the framework proposed here, a common scheme of thematic division, keywords, and indexing will be developed that will improve access to documents based on existing experience with their use (Poole, 2015; Zhao et al. 2020).

Documents of the Chinese overseas have a heterogeneous form, presented simultaneously in electronic, documentary-text, or digitized pictographic form, as well as sign form (Perushek and Smith, 1999; Tan and Chiu, 2007; Wang, 2020). It can be argued that a significant part of the documents have not even been described yet, but are only contained in known archives that require research (Ching-Hwang, 2017; Chu, 2010; Wegars, 2016). It is therefore important to create a unified framework or generally accepted approach for thematic description and indexing of documents and archives on a given topic. Each subsequent researcher studying an archive, or each academic institution engaged in digitization, could then immediately make new documents visible and accessible to all researchers.

An important and extremely little-studied aspect of the problem is the study of the life and experience of Chinese minorities overseas.
These people are traditionally perceived both abroad and in China as ‘Chinese’, although they have a host of behavioral and cultural characteristics that remain as traces of their experiences abroad (Barabantseva, 2012; Ching-Hwang, 2017; Li, 2017). This topic requires separate archival and documentary research and data mining in government databases both in China and abroad to find family and cultural connections and restore the traditions of these people.

Many researchers associate the use of data mining to extract relevant thematic data about overseas Chinese from government databases with the legal problems of web scraping (Blanke, 2024; Erturk, 2020). This data may also be protected due to rising international tensions and declining trust between countries. Research activities may be perceived as attempts to gain access to confidential information of citizens of other countries. The joint efforts of scholars in different countries and the creation of a transparent and verifiable framework may be an adequate response to these pressing challenges, allowing the extraction of meaningful statistical data on the lives of Chinese communities in other countries (Khan et al. 2023; Perez-Garcia and Diaz-Ordoñez, 2023).

Conclusion

The Chinese overseas form a significant part of Chinese society, and this group has had a significant impact on the socio-economic history of the countries where they live. At the same time, documents and sources on the heritage of the Chinese overseas remain incompletely studied and often inaccessible and scattered. The objective of the study is to propose a framework that provides higher accessibility and integration of available sources of documents from the Chinese overseas and improves the interaction of researchers with them.
Based on an analysis of existing approaches to the formation of database indexing, thematic structuring of collections and catalogs, as well as experience in collecting and processing documents regarding various spheres of life of Chinese overseas, a framework has been formed. It assumes the possibility of using only the existing data structure and academic organizations and does not require the formation of special agreements or the implementation of complex data processing systems. The framework involves the exchange and accumulation of data on the description of existing sources and collections of digitalized and material sources, the natural inclusion of new sources in the processing of data, and the exchange of them between all users through a cache of processing descriptions of sources accumulated in ‘cloud’ structures with one or a number of the simplest and most accessible description tags (e.g. ‘Chinese overseas’). This approach practically speeds up and simplifies access to existing sources of documents on the life of the Chinese overseas, simplifies the discovery of unknown but previously cataloged information, and improves academic interaction. Future research should propose options for centralized processing of cached data to further speed up access and expand the available digitalized databases for international research on overseas Chinese.
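The cache of source descriptions summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory dictionary stands in for the ‘cloud’ storage, descriptions are add-only (reflecting the idea that users may add but not destroy), and the class names `SourceDescription` and `DescriptionCache`, the field names, and the sample entries are all hypothetical. The cache holds only descriptions and a pointer to the holder; the documents themselves stay with the institutions or collectors that keep them.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDescription:
    """A cached description of a source; the original stays with its holder."""
    source_id: str
    holder: str             # institution or collector keeping the original
    keywords: set
    digitized: bool = True  # descriptions may also cover non-digitized items
    tags: set = field(default_factory=set)


class DescriptionCache:
    """In-memory stand-in for the shared 'cloud' cache of descriptions."""

    def __init__(self):
        self._entries = {}

    def add(self, description: SourceDescription) -> None:
        # Assumption: entries are add-only, so an existing description
        # is never overwritten by a later contributor.
        if description.source_id in self._entries:
            raise ValueError(f"{description.source_id} is already described")
        self._entries[description.source_id] = description

    def search(self, tag: str, keyword: str):
        """Return descriptions that carry both the tag and the keyword."""
        return [
            d for d in self._entries.values()
            if tag in d.tags and keyword in d.keywords
        ]


cache = DescriptionCache()
cache.add(SourceDescription(
    "collector-7/letters", "private collector", {"personal letters"},
    digitized=False, tags={"Chinese overseas"},
))
cache.add(SourceDescription(
    "univ-X/tax-records", "University X archive", {"tax records"},
    tags={"Chinese overseas"},
))

hits = cache.search("Chinese overseas", "personal letters")
for hit in hits:
    print(hit.source_id, hit.holder, hit.digitized)
# prints: collector-7/letters private collector False
```

The search result tells the researcher that a non-digitized collection exists and who holds it, which corresponds to the discovery of ‘unknown but previously cataloged information’; access rights remain a matter between the researcher and the holder.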

Data availability

All data generated or analyzed during this study are included in this published article.

References

Barabantseva E (2012) Who are “overseas Chinese ethnic minorities”? China’s search for transnational ethnic unity. Mod China 38(1):78–109. https://doi.org/10.1177/0097700411424565

Bharwani S (2006) Understanding complex behavior and decision making using ethnographic knowledge elicitation tools (KnETs). Soc Sci Comput Rev 24(1):78–105. https://doi.org/10.1177/0894439305282346

Blanke T (2024) Reassembling digital archives—strategies for counter-archiving. Humanit Soc Sci Commun 11(1):201. https://doi.org/10.1057/s41599-024-02668-4

Champion E, Rahaman H (2020) Survey of 3D digital heritage repositories and platforms. Virtual Archaeol Rev 11(23):1–15. https://doi.org/10.4995/var.2020.13226

Chao SJ (2001) Library cooperation on overseas Chinese studies: from resource sharing to the development of library collections. Collect Build 20(3):123–130. https://doi.org/10.1108/eum0000000005499

Chao SYJ (2006) Sources on overseas Chinese studies: genealogical records. Libr Collect Acquis 30(1–2):18–46. https://doi.org/10.1016/j.lcats.2006.07.019

Ching-Hwang Y (2017) The overseas Chinese and the 1911 Revolution. In: Reid A (ed) The Chinese Diaspora in the Pacific. Routledge, London, pp. 345–369

Chu RT (2010) Chinese overseas: migration, research and documentation. China Rev Int 17(1):165–171. https://doi.org/10.1353/cri.2010.0000

Erturk N (2020) Preservation of digitized intangible cultural heritage in museum storage. Milli Folk 16(128):100–110. https://dergipark.org.tr/en/pub/millifolklor/issue/58685/703813

Gao Q, Jones S (2021) Authenticity and heritage conservation: seeking common complexities beyond the ‘Eastern’ and ‘Western’ dichotomy. Int J Herit Stud 27(1):90–106. https://doi.org/10.1080/13527258.2020.1793377

Guo Y, Yuan Y, Li S, Guo Y, Fu Y, Jin Z (2024) Applications of metaverse-related technologies in the services of US urban libraries. Libr Hi Tech 42(5):1477–1495. https://doi.org/10.1108/LHT-10-2022-0486

Hsu H (2013) Strategies and achievements of overseas Taiwanese archives acquisition. Int J Humanit Arts Comput 7:72–83. https://doi.org/10.3366/ijhac.2013.0061

Ke HR, Tseng SH (2016) Digital curation for cultural and intellectual assets: a Taiwan perspective. Libres 26(1):64–72. https://doi.org/10.32655/2FLIBRES.2016.1.5

Khan N, Thelwall M, Kousha K (2023) Data sharing and reuse practices: disciplinary differences and improvements needed. Online Inf Rev 47(6):1036–1064. https://doi.org/10.1108/OIR-08-2021-0423

Khoir S, Du JT, Koronios A (2015) Everyday information behaviour of Asian immigrants in South Australia: a mixed-methods exploration. Inf Res 20(3):687. http://InformationR.net/ir/20-3/paper687.html

Lampert C, Vaughan J (2018) Preparing to preserve: three essential steps to building experience with long-term digital preservation. In: Plante J (ed.) IFLA WLIC 2018. IFLA, Librarians Association of Malaysia

Li A (2017) A history of overseas Chinese in Africa to 1911. Diasporic Africa Press, Brooklyn

Li J, Krishnamurthy S, Roders AP, Van Wesemael P (2020) Community participation in cultural heritage management: a systematic literature review comparing Chinese and international practices. Cities 96:102476. https://doi.org/10.1016/j.cities.2019.102476

Ma R (2020) Translational challenges in cross-cultural digitization ethics: the case of Chinese Marriage Documents, 1909–1997. Libri 70(4):269–277. https://doi.org/10.1515/libri-2020-0088

Mafrici N, Giovannini EC (2020) Digitalizing data: From the historical research to data modelling for a (digital) collection documentation. In: Lo Turco M, Giovannini EC, Mafrici N (eds) Digital & documentation, vol. 2—Digital strategies for Cultural Heritage. Pavia University Press, Pavia, pp. 38–51 2024

Minjie Y, Youneng P, Nan D (2016) Analysis of the knowledge backgrounds of library directors from top universities in mainland China, Hong Kong, and Taiwan. J Librariansh Inf Sci 48(4):373–381. https://doi.org/10.1177/0961000615623092

Molenda J (2019) Historical archaeologies of overseas Chinese laborers on the first transcontinental railroad. Columbia University

Mukwevho J, Ngoepe M (2019) Taking archives to the people. Libr Hi Tech 37(3):374–388. https://doi.org/10.1108/lht-11-2017-0228

Ng-He C (2022) Building a transnational image database: case study of the Dispersed Chinese Art Digitization Project digital collections. J Digit Media Manag 11(2):141–150. https://www.ingentaconnect.com/content/hsp/jdmm/2022/00000011/00000002/art00006

Noh Y, Chang R (2020) A study on the factors of public library use by residents. J Librariansh Inf Sci 52(4):1110–1125. https://doi.org/10.1177/0961000620903772

Nyíri P (2020) From class enemies to patriots: overseas Chinese and emigration policy and discourse in the People’s Republic of China 1. In: Nyíri P, Saveliev I (eds) Globalizing Chinese migration. Routledge, London, pp. 208–241

Perez-Garcia M, Diaz-Ordoñez M (2023) GECEM Project Database: a digital humanities solution to analyse complex historical realities in early modern China and Europe. Digit Scholarsh Humanit 38(1):296–312. https://doi.org/10.1093/llc/fqac046

Perushek D, Smith K (1999) Preserving Chinese historical resources: report on an international cooperative microfilming project. Asian Libr 8(8):289–296. https://doi.org/10.1108/10176749910287873

Poole AH (2015) Archival divides and foreign countries? Historians, archivists, information-seeking, and technology: retrospect and prospect. Am Arch 78(2):375–433. https://doi.org/10.17723/0360-9081.78.2.375

Saadia H, Naveed MA (2024) Effect of information literacy on lifelong learning, creativity, and work performance among journalists. Online Inf Rev 48(2):257–276. https://doi.org/10.1108/OIR-06-2022-0345

Shahzad K, Khan SA, Iqbal A (2024) Effects of blockchain technology (BT) on the university librarians and libraries: a systematic literature review (SLR). Libr Hi Tech. https://doi.org/10.1108/LHT-10-2023-0486

Shih VJY (2012) Library marketing for Southeast Asia Chinese overseas special collections: transnational discovery & delivery. In: Zheng L (ed) International conference of institutes and libraries for Chinese overseas studies. The University of British Columbia, Vancouver, pp. 1–20

Skublewska-Paszkowska M, Milosz M, Powroznik P, Lukasik E (2022) 3D technologies for intangible cultural heritage preservation—literature review for selected databases. Herit Sci 10(1):3. https://doi.org/10.1186/s40494-021-00633-x

Su R, Li Y, Yin X, Chen T (2022) Research on the digital humanistic path of overseas displaced archives-taking the application of the miss platform as an example. In: Wyld DC (ed) Computer Science & Information Technology (CS & IT). CACIT, pp. 153–162

Tan CB, Chiu AS (2007) Teaching and documentation of Chinese overseas studies. In: Tan CB, Storey C, Zimmerman J (eds) Chinese overseas: migration, research and documentation. Chinese University Press, Hong Kong, pp. 201–254

Terras M, Baker J, Hetherington J, Beavan D, Zaltz Austwick M, Welsh A, O’Neill H, Finley W, Duke-Williams O, Farquhar A (2017) Enabling complex analysis of large-scale digital collections: humanities research, high-performance computing, and transforming access to British Library digital collections. Digit Scholarsh Humanit 33(2):456–466. https://doi.org/10.1093/llc/fqx020

Theng YL, Luo Y, Sau-Mei GT (2010) QiVMDL-towards a socially constructed virtual museum and digital library for the preservation of cultural heritage: a case of the Chinese “Qipao”. Int J Digit Libr Syst 1(4):43–60. https://doi.org/10.4018/jdls.2010100103

Tran E, Chuang YH (2020) Social relays of China’s power projection? Overseas Chinese collective actions for security in France. Int Migr 58(3):101–117. https://doi.org/10.1111/imig.12627

Tripathi S (2018) Digital preservation: some underlying issues for long-term preservation. Libr Hi Tech N. 35(2):8–12. https://doi.org/10.1108/lhtn-09-2017-0067

Tryon JR (2017) The Rosarium Project: a case of merging traditional reference librarian skills with digital humanities technology. Coll Undergrad Libr 24(2-4):171–188. https://doi.org/10.1080/10691316.2017.1329043

Wang J, Zhan N (2020) Nationalism, overseas Chinese state and the construction of ‘Chineseness’ among Chinese migrant entrepreneurs in Ghana. In: Hodzi O (ed) Chinese in Africa. Routledge, London, pp. 8–29

Wang S, Duan Y, Yang X, Cao C, Pan S (2023) ‘Smart Museum’ in China: from technology labs to sustainable knowledgescapes. Digit Scholarsh Humanit 38(3):1340–1358. https://doi.org/10.1093/llc/fqac097

Wang X, Tan X, Li H (2020) The evolution of digital humanities in China. Libr Trends 69(1):7–29. https://doi.org/10.1353/lib.2020.0029

Wang Y (2020) Chinese in Dubai: money, pride, and soul-searching. Brill, Leiden

Wanyan D, Hu J (2020) How to provide public digital cultural services in China? Libr Hi Tech 38(3):504–521. https://doi.org/10.1108/lht-03-2019-0071

Wegars P (2016) Hidden Heritage: historical archaeology of the overseas Chinese. Routledge, New York

Wilcox E (2020) When folk dance was radical: cold war yangge, world youth festivals, and overseas Chinese leftist culture in the 1950s and 1960s. China Perspect 1:33–42. https://doi.org/10.4000/chinaperspectives.9947

Wilson K, Neylon C, Montgomery L, Huang CK (2019) Access to academic libraries: an indicator of openness? Inf Res 24(1):809. https://files.eric.ed.gov/fulltext/EJ1210893.pdf Accessed 15 Apr 2024

Xu J, Sun G, Cao W, Fan W, Pan Z, Yao Z, Li H (2021) Stigma, discrimination, and hate crimes in Chinese-speaking world amid Covid-19 pandemic. Asian J Criminol 16(1):51–74. https://doi.org/10.1007/s11417-020-09339-8

Yun B, Yue Z, Yaolin Z (2021) Topic structure and evolution patterns of documentary heritage preservation and conservation research in China. Libr Hi Tech 40(3):805–827. https://doi.org/10.1108/LHT-08-2020-0184

Zhao S, Tang M, Sun Y (2020) Digital projects of Chinese historical local private documents: database development and exploring of text mining. Libr Trends 69(1):164–176. https://doi.org/10.1353/lib.2020.0022

Zhou M, Geng G, Wu Z (2012) Digital preservation technology for cultural heritage. Higher Education Press, Beijing

Acknowledgements

This research was supported by the Wenzhounese Economy Research Institute, Key Research Center of Philosophy and Social Sciences of Zhejiang Province (No. 20JDZD077).

Author information

Authors and Affiliations

Library, Wenzhou University, Wenzhou, China

Mingwei Tian

Contributions

Mingwei Tian is a single author responsible for the article.

Corresponding author

Correspondence to Mingwei Tian.

Ethics declarations

Competing interests

The author declares no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tian, M. Digital preservation and access strategies for overseas Chinese documents: challenges and solutions. Humanit Soc Sci Commun 12, 423 (2025). https://doi.org/10.1057/s41599-025-04713-2

Received: 12 July 2024

Accepted: 06 March 2025

Published: 25 March 2025

DOI: https://doi.org/10.1057/s41599-025-04713-2
