For more on Fixity and checksums, please read the DPC Handbook section on Fixity.
Fixity is a term commonly used in digital preservation when talking about digital files and bitstreams. Fixity means the state of being unchanged or permanent. Confirming a digital file's fixity means verifying that it has remained the same over time. This process is often called fixity checking or integrity checking, and it verifies that a digital object has not been altered or corrupted.
The most common way to confirm the fixity of a digital object is to create what is known as a checksum or hash for each individual file or, in some cases, bitstream (mainly for audiovisual works). A checksum is a string of numbers and letters generated from the file's contents using a mathematical algorithm. A checksum is like a digital fingerprint for a file: any change to the contents, however small, produces a different checksum, and in practice it is extremely unlikely that two different files will share the same checksum by accident.
The most common checksum algorithms used in digital preservation are MD5, SHA-1 and SHA-256. However, there are others, and they go in and out of use over time. It is important to know which algorithm was used to generate the checksum for a digital file, because checksums produced by different algorithms cannot be compared with one another.
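As a rough illustration of how a checksum is generated, the sketch below uses Python's standard hashlib module. The function name file_checksum and the chunk size are assumptions made for this example, not part of any particular preservation tool.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hexadecimal checksum of the file at `path`.

    `algorithm` is any name hashlib accepts, e.g. "md5", "sha1", "sha256".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_checksum("Heubach_cat.jpg", "md5") on a local copy of a file (the path here is only a placeholder) would return its MD5 checksum as a string. Storing the algorithm name alongside the value avoids comparing checksums made with different algorithms.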
By monitoring a file's integrity from as early on as possible, any loss or corruption to that file may be detected. However, a checksum has its limits. While a mismatch of checksums during fixity checking will flag that a file has changed, it cannot diagnose the problem with the file. It can only say there was one. It will be up to you to investigate further.
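A fixity check can be sketched as recomputing the checksum and comparing it with the one recorded earlier. The example below is a minimal sketch using in-memory bytes rather than real files; the function name check_fixity and the sample data are assumptions for illustration only.

```python
import hashlib


def check_fixity(data, expected_checksum, algorithm="sha256"):
    """Compare a freshly computed checksum with the one recorded earlier."""
    current = hashlib.new(algorithm, data).hexdigest()
    expected = expected_checksum.lower()  # hex digests are case-insensitive
    return {
        "algorithm": algorithm,
        "expected": expected,
        "current": current,
        "match": current == expected,
    }


original = b"archival master file contents"
recorded = hashlib.sha256(original).hexdigest()  # checksum stored at ingest

corrupted = b"archival master file c0ntents"  # a single byte has changed

print(check_fixity(original, recorded)["match"])   # True
print(check_fixity(corrupted, recorded)["match"])  # False
```

Note that the result only reports whether the checksums match. It says nothing about which bytes changed or why, which is the limit described above.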
There are a number of programs listed in the COPTR tool registry that can generate checksums and verify file fixity. Some of the common tools are:
Aside from verifying that file fixity has been maintained while the file is being stored, checksums have three other main uses:
Below are some examples of what various checksums look like for the following image.
Image By Walter Heubach (German, 1865–1923) (Upload: User:Jarlhelm) [Public domain], via Wikimedia Commons
File name: Heubach_cat.jpg
MD5: 6d5b04d33455ac13a2291216e5b552a2
SHA-1: 1a26f9ce33857a5c742877aa8de982968d87f67b
SHA-256: 06a67229b29321064ab6b83cd3fce40bc8079666a1197d324e8f2ce28dd24dff
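The three example values above differ in length because each algorithm produces a digest of a fixed size, whatever the size of the file. The short sketch below shows this with Python's hashlib; the helper name hex_lengths and the sample bytes are assumptions, and the bytes are a stand-in rather than the real image.

```python
import hashlib


def hex_lengths(data):
    """Return the length in hex characters of each algorithm's checksum."""
    return {
        name: len(hashlib.new(name, data).hexdigest())
        for name in ("md5", "sha1", "sha256")
    }


sample = b"stand-in bytes, not the real Heubach_cat.jpg"
print(hex_lengths(sample))  # {'md5': 32, 'sha1': 40, 'sha256': 64}
```

The length of a checksum can hint at which algorithm produced it, but it is not proof, since other algorithms share these output sizes. Recording the algorithm name explicitly is the safer practice.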
Data integrity is important to digital objects. It is about ensuring the maintenance and consistency of the data throughout its lifecycle. Maintaining fixity is a critical part of data integrity.
Other aspects include managing relationships between data and maintaining metadata for contextual purposes.
Fixity can be recorded using the PREMIS metadata standard. In PREMIS, a checksum is referred to as a message digest, which is just another term for the same thing. PREMIS can record not only the checksum but also the algorithm that created it, as well as the software and version used. Any subsequent fixity checks can also be recorded using PREMIS, including the outcome of the check.
Recording this type of preservation metadata is crucial for confirming and establishing a digital object's "chain of custody".
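To give a feel for what such a record might hold, the sketch below builds plain Python dictionaries whose keys borrow the names of PREMIS semantic units (messageDigestAlgorithm, messageDigest, messageDigestOriginator, eventType, eventDateTime, eventOutcome). The function names, the dictionary layout, the originator string and the "pass"/"fail" outcome values are assumptions for illustration. This is not schema-valid PREMIS, and the PREMIS Data Dictionary should be consulted for the actual structure.

```python
import hashlib
from datetime import datetime, timezone


def premis_style_fixity(data, algorithm="sha256",
                        originator="hashlib (Python standard library)"):
    """Build a dict loosely modelled on the PREMIS fixity semantic units."""
    return {
        "messageDigestAlgorithm": algorithm,
        "messageDigest": hashlib.new(algorithm, data).hexdigest(),
        "messageDigestOriginator": originator,  # hypothetical value
    }


def premis_style_fixity_event(data, fixity):
    """Re-check the data and describe the result as a PREMIS-style event."""
    current = hashlib.new(fixity["messageDigestAlgorithm"], data).hexdigest()
    return {
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        # "pass" / "fail" are illustrative outcome values
        "eventOutcome": "pass" if current == fixity["messageDigest"] else "fail",
    }


content = b"example object bytes"
fixity = premis_style_fixity(content)
print(premis_style_fixity_event(content, fixity)["eventOutcome"])  # pass
```

Keeping one such event entry for every check, rather than overwriting the last result, is what builds up the history that a chain of custody relies on.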