PRONOM is a registry of file formats that is maintained by The National Archives, UK.
PRONOM supplies new file format information to a tool called DROID, which uses it to identify files in collections and assign each one a unique identifier.
DROID is a tool that can be used to automatically identify a file's format using 'file format signatures' that it downloads from PRONOM.
DROID labels each identified file format with a unique identifier called a PUID (PRONOM Unique Identifier).
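The idea of signature-based identification can be sketched in a few lines of Python. This is a simplified illustration, not DROID's actual matching engine: it only compares the leading bytes of a file against two well-known byte sequences. The function name `identify` and the `SIGNATURES` table are hypothetical, and the PUID strings are included for illustration and should be checked against PRONOM before being relied upon.

```python
# Hypothetical signature table: (leading bytes, PUID).
# Real PRONOM signatures are richer (offsets, wildcards, end-of-file
# sequences) and distinguish between format versions.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "fmt/11"),   # PNG (PUID assumed, verify in PRONOM)
    (b"PK\x03\x04", "x-fmt/263"),       # ZIP (PUID assumed, verify in PRONOM)
]


def identify(data: bytes):
    """Return the PUID of the first signature matching the start of data."""
    for magic, puid in SIGNATURES:
        if data.startswith(magic):
            return puid
    return None  # no signature matched


png_like = b"\x89PNG\r\n\x1a\n" + b"remaining image bytes"
print(identify(png_like))
print(identify(b"plain text"))
```

A file that matches no signature returns `None`, which mirrors the situation where a tool cannot identify a format and the file needs manual investigation.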
Nanite is a programming library that wraps DROID in a way that makes it possible for software developers to incorporate file format identification in their programs.
JHOVE is a tool that checks whether a file conforms to the specification of its format. For example, it can check whether the date formats used in certain file types are standardized.
JHOVE can do this for approximately 12 formats, but the software makes it easy for more to be programmed.
Siegfried is a DROID-like command-line tool that also uses PRONOM information to identify file formats.
Siegfried is similar to DROID, but adds other mechanisms to identify file formats and alternative ways for users to interact with it.
File is a Linux-based tool for identifying file formats. Unlike DROID and Siegfried, it does not return unique identifiers for what it finds.
File uses a different mechanism and a different corpus of information to identify the format of a digital object.
An online resource of community-contributed ‘recipes’ (commands) for processing audiovisual files with ffmpeg, the open source audiovisual transcoding and characterization tool.
- A free and open source tool for working with audio and video.
- ffmpeg can characterize multimedia and even output visual analyses.
- ffmpeg can transcode multimedia into other file formats and perform many other manipulations.
- Developed and maintained by the ffmpeg team.
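As a minimal sketch of what a transcoding ‘recipe’ looks like, the snippet below assembles the most basic ffmpeg invocation (`ffmpeg -i INPUT OUTPUT`) as an argument list. The helper name `build_transcode_command` and the file names are hypothetical, and the command is only printed here, not executed; running it would require ffmpeg to be installed.

```python
import shlex


def build_transcode_command(source: str, target: str) -> list:
    """Build the argument list for a basic ffmpeg transcode.

    ffmpeg infers the output container from the target file extension.
    """
    return ["ffmpeg", "-i", source, target]


# Hypothetical file names, for illustration only.
cmd = build_transcode_command("input.mov", "output.mp4")
print(shlex.join(cmd))  # ffmpeg -i input.mov output.mp4
```

Keeping the command as a list rather than a single string means it can later be passed to `subprocess.run(cmd)` without shell quoting problems.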
A utility for transferring data across file systems while maintaining key file system properties such as last-modified dates and user permissions.
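The principle of carrying file system properties along with a copy can be illustrated with Python's standard library. This is not the utility described above, just a small sketch of the same idea: `shutil.copy2` copies the file content and then attempts to preserve metadata such as the last-modified time and permission bits. The helper name and file names are hypothetical.

```python
import os
import shutil
import tempfile


def copy_preserving_metadata(source: str, target: str) -> str:
    """Copy a file and try to keep its timestamps and permission bits."""
    shutil.copy2(source, target)
    return target


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "original.txt")
    target = os.path.join(workdir, "copy.txt")
    with open(source, "w") as handle:
        handle.write("example content")

    # Give the source a known last-modified time (2000-01-01 00:00:00 UTC).
    os.utime(source, (946684800, 946684800))

    copy_preserving_metadata(source, target)
    same_mtime = int(os.stat(target).st_mtime) == 946684800
    print("last-modified date preserved:", same_mtime)
```

A plain `shutil.copy` would instead stamp the copy with the time of copying, which is exactly the kind of silent metadata loss that matters in preservation work.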
Not strictly for digital preservation, but useful nonetheless, explainshell.com annotates Linux commands for users and lets those annotations be shared.
A free and open source tool for the validation of PDF/A files. veraPDF provides some support for other PDF variants.
A tool written in Python that characterizes JPEG2000 (JP2) files. Important in digitization workflows, where JP2 is gaining ground over TIFF because of its savings in storage space.
A tool by Martin Hoppenheit to reduce the number of signatures in the DROID signature file, e.g. for quicker identification in digitization workflows that handle only image formats.
A large-scale, OAIS (Open Archival Information System) compliant system that implements large parts of the digital preservation workflow, from ingest to delivery. Rosetta is maintained by the company Ex Libris.
RODA is an open-source digital repository designed for preservation and developed in Portugal. The repository supports all the main functional components of the OAIS model.
Open source digital preservation system maintained by Artefactual. Younger than Preservica and Rosetta, Archivematica has a growing user base and a different support model from those two.
Originally called Safety Deposit Box, Preservica is an OAIS-compliant digital preservation system maintained by Preservica in Abingdon, Oxfordshire, UK.
Safety Deposit Box
The first four implementations of the Preservica digital preservation system went under the name Safety Deposit Box. Organisations such as The National Archives, UK, and the Swiss Federal Archives were some of the first to adopt this system.
A tool maintained by the Apache Software Foundation capable of extracting metadata and content from a range of file formats including PDF, Microsoft Office, Rich Text Format, and XML.
A registry of digital forensics tools and training courses developed in 2016 that will prove useful for finding tools for dissecting and interpreting digital files for preservation and access.
Just Solve the File Format Problem
A wiki-style registry of file formats that can be edited by all users. It differs from PRONOM in that anyone can add information, so it is a good idea to submit something to this wiki first, or in concert with PRONOM, for the benefit of the community.
Just Solve It is an initiative of the Internet Archive.
Forensics hardware that blocks the ability to write to a storage device, thus protecting data and its evidentiary value. Write blocking tools are available from companies such as Tableau and Wiebetech.
USB controller and write blocker for legacy floppy disk drives. It allows us to use 3.5-inch and 5.25-inch disk drives on modern computer hardware.
A USB controller and write blocker for legacy floppy disk drives, specifically 3.5-inch and 5.25-inch disk drives. One of a handful of alternatives to KryoFlux.
A useful way for those in digital preservation to connect with the community. An active forum with lots of branches out to other resources.
- A portal, search engine, and API that connects metadata about content at Australian GLAM institutions.
- Trove makes this information findable.
- Trove is a collaboration between the National Library of Australia and Australia's State and Territory libraries.
- A technical registry that describes tools useful for long term digital preservation.
- Acts primarily as a finding and evaluation tool to help practitioners find the tools they need to preserve digital data.
- COPTR collates this knowledge in one place instead of organisations competing against each other with their own registries.
- Twitter archiving (twarc) is a command line tool and Python library for archiving Twitter JSON data.
- Each tweet is represented as a JSON object that is exactly what was returned from the Twitter API.
- In addition to letting you collect tweets, twarc can also help you collect data on users and trends.
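Because each tweet is kept as a JSON object, archived data can be read back with nothing more than a JSON parser. The sketch below assumes the data is stored as line-oriented JSON, one tweet per line. The sample records are invented, and the field names (`id_str`, `text`, `user.screen_name`) are assumptions about the API payload rather than something guaranteed for every collection. It does not use the twarc library itself.

```python
import json


def read_tweets(lines):
    """Yield one parsed tweet per non-empty line of line-oriented JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Invented sample data standing in for an archived collection.
sample = [
    '{"id_str": "1", "text": "first example", "user": {"screen_name": "alice"}}',
    '{"id_str": "2", "text": "second example", "user": {"screen_name": "bob"}}',
]

tweets = list(read_tweets(sample))
authors = [tweet["user"]["screen_name"] for tweet in tweets]
print(authors)
```

In practice `lines` would be an open file handle, so a large archive can be processed one tweet at a time without loading it all into memory.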
- Also known as ISO 28500:2009.
- A standardised file format for storing the result of a web crawl – the output of a web archiving effort.
- WARC files aggregate many WARC records.
- WARC can encode any other file format – as you’d expect of any potential digital object on the web.
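To make the record structure concrete, here is a minimal sketch of reading a single, uncompressed WARC record: a version line, a set of named header fields, a blank line, and then a content block whose size is given by `Content-Length`. The function name `parse_warc_record` and the sample record are hypothetical, and the parser is deliberately simplified (for instance, it treats field names as case-sensitive and does not handle compressed files or multiple records).

```python
def parse_warc_record(raw: bytes):
    """Split one uncompressed WARC record into version, fields and block."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only; values may contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]


# Invented sample record; the content block could be any file format.
record = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"WARC-Date: 2020-01-01T00:00:00Z\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

version, fields, block = parse_warc_record(record)
print(version, fields["WARC-Type"], block)
```

Because the block is just `Content-Length` bytes, the record does not care what format those bytes are in. For real collections, an established WARC library is a safer choice than a hand-written parser.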
A search engine and API for the archived web. Hosted by the Internet Archive, based in San Francisco.