- PRONOM is a registry of file format information maintained by The National Archives, UK (TNA). PRONOM has an open contribution model but is curated by the teams at TNA.
- First published on the web in 2004.
- Contains narrative information about each file format that is relevant to digital preservation.
- Its main strength, however, is that it provides a centralized source from which the tool DROID can download file format ‘signatures’.
- Signatures are encoded using PRONOM's regular expression syntax and allow DROID to be updated continuously.
- Tools such as Siegfried can also use these signatures, which makes PRONOM a de facto standard for this type of information in the digital preservation community.
- DROID stands for Digital Record and Object Identification
- Developed and maintained by The National Archives, UK
- Identifies the file formats in a collection of files based on their binary content
- Updated from the PRONOM registry, which means it can stay up to date with current signatures for identifying digital files
- Can be used from the command line
- Tools like Nanite wrap DROID so that it can be integrated with code more easily
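Identification from binary content can be pictured as matching known byte sequences at known offsets. The sketch below is a minimal illustration in Python and not DROID's implementation: the `SIGNATURES` table and the `identify` function are hypothetical names, the table holds only a few well-known magic numbers, and real PRONOM signatures are considerably richer (variable offsets, wildcards, end-of-file sequences).

```python
from typing import Optional

# (byte offset, magic bytes, format name).
# A tiny illustrative table, NOT data taken from PRONOM.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"GIF87a", "GIF image (87a)"),
    (0, b"GIF89a", "GIF image (89a)"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the name of the first signature found in data, or None."""
    for offset, magic, name in SIGNATURES:
        # Compare the bytes at the expected offset with the magic sequence.
        if data[offset:offset + len(magic)] == magic:
            return name
    return None
```

In practice only the first few bytes of a file would be read and passed to `identify`, for example `identify(open(path, "rb").read(16))`.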
- The Nanite project builds on DROID and Apache Tika to provide a rich format identification and characterization system.
- It aims to make it easier to run identification and characterization at scale, and helps compare and combine the results of different tools.
- Nanite provides an API (application programming interface) for DROID, which currently doesn’t have an open (easy to work with) API of its own.
- Nanite was developed by Andy Jackson of the British Library and the UK Web Archive.
- JHOVE uses the concepts of well-formedness and validity to return statistics about different file formats.
- Supports 14 different file formats, including PNG, TIFF, and WAVE audio.
- JHOVE can tell users whether their file is well-formed, valid, both, or neither.
- The rules that determine this are encoded in the tool and derived from the file format specifications for those file types.
- JHOVE is maintained by the Open Preservation Foundation.
- It was originally developed by Harvard University Libraries (HUL) and Gary McGath.
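The difference between well-formedness and validity can be illustrated with a small Python sketch for PNG. This is not JHOVE's code, and the rules are this sketch's own simplification of the PNG specification: a file is treated as well-formed if it starts with the PNG signature and splits cleanly into chunks, and as valid if, in addition, the first chunk is `IHDR`, the last is `IEND`, and every chunk CRC matches. The names `read_chunks` and `assess` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_chunks(data: bytes):
    """Return a list of (chunk_type, crc_ok) pairs, or None if the layout is broken."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    chunks = []
    while pos < len(data):
        if pos + 8 > len(data):
            return None  # truncated chunk header
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4  # header + data + 4-byte CRC
        if end > len(data):
            return None  # truncated chunk body or CRC
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        # The CRC covers the chunk type and the chunk data.
        chunks.append((ctype, zlib.crc32(ctype + body) == crc))
        pos = end
    return chunks


def assess(data: bytes) -> tuple:
    """Return (well_formed, valid) under this sketch's simplified rules."""
    chunks = read_chunks(data)
    well_formed = chunks is not None and len(chunks) > 0
    valid = (
        well_formed
        and chunks[0][0] == b"IHDR"
        and chunks[-1][0] == b"IEND"
        and all(ok for _, ok in chunks)
    )
    return well_formed, valid
```

A file that parses into chunks but lacks an `IHDR` chunk would come back as well-formed but not valid, which is the kind of distinction the bullets above describe.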
- Siegfried is a tool that performs a similar function to DROID.
- Developed independently by Richard Lehane of State Archives New South Wales (SANSW/SRNSW).
- Free and Open Source
- Incorporates several file format signature and identification mechanisms, not just PRONOM's.
- Developed initially using the PRONOM specification and can use all of PRONOM's signatures.
- A companion tool for Siegfried that allows the data source (signature file) used by Siegfried to be customized.
- Customizing includes the option of not using the entire corpus of signatures available to it.
- For example, a signature file could be created that identifies only images, e.g. for the output of a digitization workflow.
- Identification with a smaller signature file can, in theory, be quicker than with the entire corpus.
- A command on Linux that uses a combination of magic numbers and heuristics to determine a file’s format and version.
- The original File command was included in Unix version 4, 1973.
- The modern File command is a variant on those early versions, and is still updated and maintained.
- The primary point of difference between File and tools like DROID is the lack of a unique identifier for a format type.
- A unique identifier could potentially make it easier to use the tool's output in a digital preservation system.
- Utility for transferring data while maintaining key file system properties such as modified date and user permissions.
- Found on Linux but can also be used on Windows through various mechanisms.
- A good combination of flags to use in rsync to preserve important metadata may be:
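The flag list itself is not reproduced in this text. As an assumption, and not necessarily the combination the author had in mind, one commonly used starting point is rsync's `--archive` option (which preserves permissions, modification times, ownership, and symbolic links, and recurses into directories) together with `--hard-links` and `--verbose`. The Python sketch below only builds the command line; the function name `rsync_command` and the example paths are hypothetical.

```python
def rsync_command(source: str, destination: str, dry_run: bool = False) -> list:
    """Build an rsync command line intended to keep file system metadata."""
    command = [
        "rsync",
        "--archive",     # recurse; keep permissions, times, owner, group, symlinks
        "--hard-links",  # keep hard links, which --archive alone does not
        "--verbose",     # list each file as it is transferred
    ]
    if dry_run:
        command.append("--dry-run")  # report what would happen, copy nothing
    command.extend([source, destination])
    return command
```

The resulting list can be passed to `subprocess.run(rsync_command("/mnt/source/", "/mnt/archive/"), check=True)`. Running with `dry_run=True` first is a cheap way to confirm what would be copied.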
- Open-source digital repository designed for preservation.
- RODA is a complete digital repository solution that delivers functionality for all the main functional units of the OAIS reference model.
- Capable of ingesting, managing and providing access to the various types of digital content produced by large corporations or public bodies.
- RODA is based on open-source technologies
- Supported by existing standards such as the Metadata Encoding and Transmission Standard (METS), Encoded Archival Description (EAD), Dublin Core (DC) and PREMIS (Preservation Metadata).
- USB controller and write blocker for legacy floppy disk drives.
- Enables the use of 3.5-inch and 5.25-inch disk drives.
- Can read the flux of a disk, i.e. the magnetic signals that are then converted to binary information.
- 8-inch disk drive support is feasible but the community has yet to find a consistent and reliable solution.
- Alternatives to KryoFlux, such as the SuperCard Pro, are available.
- A project led by the Los Alamos National Laboratory and Old Dominion University.
- Rather than expecting people to know about the growing number of Web archives, and to guess which archive might hold an older version of the resource they’re looking for, Memento proposes to make archived content discoverable via the original URL that the searcher already knew about.
- Memento is an attempt to permit users to view any web page as it looked on a given date in the past.
- Memento does this by utilizing the time dimension built into the content negotiation capabilities of the HTTP protocol.
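Datetime negotiation in Memento (RFC 7089) works by sending an `Accept-Datetime` request header to a TimeGate for the original URL. The Python sketch below only builds that header and makes no network request; the function name `accept_datetime_headers` is hypothetical, and any TimeGate URL used with it would need to be supplied by the reader.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Return request headers asking for a resource as it looked at `when`.

    `when` should be timezone-aware; a naive datetime is interpreted as
    local time by astimezone().
    """
    # Accept-Datetime uses the HTTP date format, expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": stamp}
```

The returned dictionary can be passed as the `headers` argument of `urllib.request.Request` when contacting a TimeGate, which then redirects to the archived copy closest to the requested date.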
- BagIt is a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content.
- A "bag" consists of a "payload" (the arbitrary content) and "tags", which are metadata files intended to document the storage and transfer of the bag.
- A required tag file contains a manifest listing every file in the payload together with its corresponding checksum.
- A bag can potentially be used as a SIP (Submission Information Package) in a digital preservation workflow.
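The structure described above can be sketched in a few lines of Python. This is a minimal illustration and not a replacement for an established BagIt library: it writes the `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` payload manifest as laid out in the BagIt specification (RFC 8493). The function name `make_bag` is hypothetical, and the sketch assumes flat payload file names with no subdirectories.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict) -> None:
    """Write a minimal bag: payload maps file names to their bytes content."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        # Payload files live under data/.
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: checksum, then the file path.
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration tag file.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # The payload manifest tag file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
```

A receiver can verify the transfer by recomputing each checksum and comparing it with the manifest, which is what makes a bag a plausible submission package.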
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability.
Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata.
OAI-PMH is a set of six verbs or services that are invoked within HTTP.
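As a rough illustration, each of the six verbs is passed as a `verb` query parameter on an HTTP request to the repository's base URL. The Python sketch below builds such a request URL; the function name `oai_request_url` and the base URL in the usage note are hypothetical, while the verb names and the `metadataPrefix` argument come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build the URL for an OAI-PMH request against a repository base URL."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb and its arguments travel as ordinary query parameters.
    return base_url + "?" + urlencode({"verb": verb, **arguments})
```

For example, `oai_request_url("https://repository.example.org/oai", "ListRecords", metadataPrefix="oai_dc")` asks a Data Provider for its records in unqualified Dublin Core; a Service Provider would fetch that URL and parse the XML response.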