ARKEY - A content-enriched user-centred access key to digital archives
The ARKEY project brings together the National Archives and State Archives in the Provinces (in short the State Archives, or AGR), a federal scientific institution, and the Université catholique de Louvain (UCLouvain).
The main objective of the ARKEY research profile is to improve the digital valorisation of archive collections through long-term tools. It involves (1) the research and development of an enhanced access key to digitized content, and (2) the improvement of the navigation experience within archive collections. It builds on the expertise of a multidisciplinary team from AGR and from several research groups within UCLouvain (MiiL, Cental, UCLouvain Archives Service, GEMCA). ARKEY aims to bring added value for society and public service, by improving the accessibility and intelligibility of archives: a priority for many researchers and a foundation of democratic states.
Currently, AGR and UCLouvain own a large number of digitized documents from a wide variety of sources and periods. This diversity poses a challenge to automated content analysis, particularly to optical character recognition (OCR) tools, which are not trained on such variation. Archives also face challenges in the storage, format, metadata, and navigation of digitized documents, and most of these documents are not sufficiently visible. To respond to these challenges, ARKEY proposes a three-step plan:
1. AI-aided text and layout recognition. ARKEY will develop and evaluate semi-automated content-analysis machine-learning techniques, specifically designed for handwritten documents and early printed books. They will rely on state-of-the-art OCR and Handwritten Text Recognition (HTR) methods, and focus on information extraction based on robust layout analysis.
2. Content-enriched digital archival representation. The data extracted from the content analysis will be used to enrich the representation of archive documents. This second challenge therefore aims to investigate and improve state-of-the-art Natural Language Processing methods to enrich Encoded Archival Description (EAD) files with automatically generated metadata based on semantic modeling, named entity recognition, and query expansion.
3. User-oriented and context-aware navigation. The third challenge of ARKEY is to allow archive users to benefit effortlessly from the content-enriched archival description introduced in the previous step and, more generally, to improve their navigation experience within the archives. This requires a user-oriented design method to develop efficient finding aids and visualization tools. In particular, we will help alleviate the following two issues: (1) the lack of understanding of the available archival representations and the way they relate to each other, and (2) the difficulty of translating an initial question into a specific search and navigation scenario.
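To make the second step more concrete, here is a minimal sketch of how entities produced by a named entity recognition pass might be written into an EAD component as access points. It is an illustration under stated assumptions, not the ARKEY implementation: the function name enrich_component, the label mapping, and the sample record are all hypothetical, the entities are supplied by hand rather than produced by a real NER model, and the toy fragment omits the XML namespace that real EAD files declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from NER labels to EAD access-point elements.
LABEL_TO_EAD = {"PER": "persname", "LOC": "geogname", "ORG": "corpname"}


def enrich_component(component, entities):
    """Append entities to the <controlaccess> block of an EAD component.

    component: an ElementTree element for an EAD <c> component.
    entities: iterable of (text, label) pairs, e.g. the output of an NER model.
    """
    control = component.find("controlaccess")
    if control is None:
        control = ET.SubElement(component, "controlaccess")

    # Remember access points already present so they are not repeated.
    existing = {(el.tag, el.text) for el in control}

    for text, label in entities:
        tag = LABEL_TO_EAD.get(label)
        if tag is None or (tag, text) in existing:
            continue  # skip unmapped labels and duplicates
        access_point = ET.SubElement(control, tag)
        access_point.text = text
        existing.add((tag, text))
    return component


# Invented sample record, for illustration only (no namespace).
SAMPLE = (
    '<c level="file"><did>'
    "<unittitle>Letter to the city council</unittitle>"
    "</did></c>"
)

component = ET.fromstring(SAMPLE)
enrich_component(
    component,
    [("Jan Peeters", "PER"), ("Leuven", "LOC"), ("Jan Peeters", "PER")],
)
print(ET.tostring(component, encoding="unicode"))
```

In this sketch the generated access points sit alongside the existing description rather than replacing it, so a finding aid could index them for search while leaving the archivist's original text untouched.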
Promoters: Pr. Antonin Descampe (UCLouvain) & Pr. Eddy Put (AGR)
Partners: Dr. Louise-Amélie Cougnon, Pr. Aurore François, Pr. Agnès Guiderdoni, Pr. Suzanne Kieffer, Dr. Patrick Watrin
Researcher: Dr. Xavier Gillard
Duration: 2023-2033