Team Digital Preservation
In June 2022 the IT Department established a team dedicated to preserving the National Library’s digital collection. This team handles all kinds of digital content, whether it’s digitized from physical sources or born digital. This includes media types like web pages, text documents, images, audio, and moving images.
The team’s responsibilities involve ingesting, checking, storing, preserving, and providing access to high-quality digital files. We work closely with several other specialized media teams in the library. In addition, we are members of the Digital Preservation Coalition (DPC).
Organisation
The Digital Preservation team consists of 6 members.
This team reports to a committee of leaders responsible for this area in the National Library. The members are:
- IT Director (Product owner)
- Director of Digitalizing Cultural Heritage
- Head of Metadata Standards Development Section
- Head of IT Platform Section
The National Library’s digital collection in numbers
- Over 2 billion files
- More than 90 different file formats
- 15 petabytes of data (that’s 15,000 terabytes!) stored in 3 copies
- The largest single file is 2.5 terabytes
- Daily ingest of new material averages over 4 terabytes
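To put these figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 petabyte = 1,000 terabytes, as in the list above) and that the 15 petabytes refers to a single copy of the collection, which the figures do not state explicitly. Because the file count and daily ingest are given as "over" a value, the results are approximate bounds rather than exact numbers.

```python
# Back-of-the-envelope figures derived from the collection numbers.
# Assumptions (not confirmed by the source): decimal units, and the
# 15 PB figure describes one copy of the collection.

TB_PER_PB = 1_000
MB_PER_TB = 1_000_000

collection_pb = 15            # size of one copy, in petabytes
copies = 3                    # number of stored copies
file_count = 2_000_000_000    # "over 2 billion files"
daily_ingest_tb = 4           # "over 4 terabytes" per day

# Raw storage needed to hold every copy.
raw_storage_pb = collection_pb * copies

# Average file size; an upper estimate, since there are more than 2 billion files.
average_file_mb = collection_pb * TB_PER_PB * MB_PER_TB / file_count

# Yearly growth; a lower estimate, since daily ingest averages more than 4 TB.
yearly_ingest_pb = daily_ingest_tb * 365 / TB_PER_PB

print(f"Raw storage across all copies: {raw_storage_pb} PB")
print(f"Average file size: about {average_file_mb:.1f} MB")
print(f"Ingest per year: at least {yearly_ingest_pb:.2f} PB")
```

Under these assumptions the three copies occupy roughly 45 petabytes, the average file is at most about 7.5 megabytes, and the collection grows by at least about 1.46 petabytes per year.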
Data volume by type
- Video and television: 22%
- Film: 21%
- Newspapers: 19%
- Web Archive: 16%
- Radio and audio: 12%
- Books: 8%
- Photos: 2%
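The percentages above can be turned into approximate volumes. The sketch below assumes that the shares apply to the 15 petabyte total quoted for the collection; the source does not say which total the percentages were measured against, so treat the results as estimates.

```python
# Approximate data volume per material type.
# Assumption: the percentages are shares of the 15 PB collection total.

total_pb = 15

shares_percent = {
    "Video and television": 22,
    "Film": 21,
    "Newspapers": 19,
    "Web Archive": 16,
    "Radio and audio": 12,
    "Books": 8,
    "Photos": 2,
}

# Sanity check: the listed shares should cover the whole collection.
assert sum(shares_percent.values()) == 100

# Convert each percentage share into petabytes.
volume_pb = {name: total_pb * pct / 100 for name, pct in shares_percent.items()}

for name, pb in volume_pb.items():
    print(f"{name}: about {pb:.2f} PB")
```

Under this assumption, video and television account for roughly 3.3 petabytes, while photos make up about 0.3 petabytes.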
Technologies used for digital preservation
- Apache Kafka for sending messages between systems
- Apache NiFi for running the data flows that validate, move, and package data
- MariaDB as the database engine
- DROID for identifying file formats
- Grafana for statistics and reporting
- IBM High Performance Storage System (HPSS) as the bit repository
- GlusterFS for shared temporary storage
- CentOS Linux as the server platform