Dennis Faas's picture

A checksum is a form of redundancy check, a simple measure for protecting the integrity of data by detecting errors in data sent through space (telecommunications) or time (storage). It works by adding up the basic components of a message, typically its bytes, and storing the resulting value. Later, anyone can perform the same operation on the data, compare the result with the stored checksum, and (if the sums match) conclude that the message was probably not corrupted.
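As a rough illustration of the idea, the Python sketch below sums the bytes of a message and later repeats the sum to check it. The function name simple_checksum and the choice to keep only the low 8 bits (modulo 256) are assumptions made for this example, not part of any particular standard.

```python
def simple_checksum(data: bytes) -> int:
    """Add up all byte values and keep the low 8 bits (assumed width)."""
    return sum(data) % 256


# Sender side: compute the checksum and store it alongside the message.
message = b"hello, world"
stored = simple_checksum(message)

# Receiver side: recompute and compare with the stored value.
received_ok = b"hello, world"
received_bad = b"hellp, world"  # one byte changed in transit

print(simple_checksum(received_ok) == stored)   # True
print(simple_checksum(received_bad) == stored)  # False
```

A match only suggests that the data is probably intact; a mismatch shows that something changed.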

The simplest form of checksum, which just adds up the bytes in the data, cannot detect several types of error. In particular, such a checksum is not changed by reordering the bytes in the message, by inserting or deleting zero-valued bytes, or by multiple errors that cancel each other out.
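These blind spots are easy to reproduce. The sketch below uses a hypothetical byte_sum helper (a plain sum of bytes, reduced modulo 256 as an assumed width) and four made-up messages that all produce the same value.

```python
def byte_sum(data: bytes) -> int:
    """Plain additive checksum: sum of byte values, modulo 256 (assumed)."""
    return sum(data) % 256


original   = b"\x01\x02\x03\x04"
reordered  = b"\x04\x03\x02\x01"      # same bytes, different order
padded     = b"\x01\x00\x02\x03\x04"  # a zero-valued byte inserted
cancelling = b"\x02\x02\x02\x04"      # first byte +1, third byte -1

for name, data in [("original", original), ("reordered", reordered),
                   ("padded", padded), ("cancelling", cancelling)]:
    print(name, byte_sum(data))  # every line prints the value 10
```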

More sophisticated types of redundancy check, including Fletcher's checksum, Adler-32, and cyclic redundancy checks (CRCs), are designed to address these weaknesses by considering not only the value of each byte but also its position. The cost of the ability to detect more types of error is the increased complexity of computing the checksum.
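To show how position can be taken into account, here is a minimal sketch of a 16-bit Fletcher checksum alongside the Adler-32 and CRC-32 functions from Python's standard zlib module. The fletcher16 function is an illustrative implementation written for this example; the second running sum is what makes the result depend on byte order.

```python
import zlib


def fletcher16(data: bytes) -> int:
    """Illustrative Fletcher-16: two running sums, each modulo 255."""
    sum1 = 0  # plain sum of the bytes
    sum2 = 0  # sum of the running sums, so earlier bytes count more often
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


a = b"\x01\x02\x03\x04"
b = b"\x04\x03\x02\x01"  # same bytes as a, reordered

print(sum(a) % 256 == sum(b) % 256)        # True: plain sum cannot tell them apart
print(fletcher16(a) == fletcher16(b))      # False
print(zlib.adler32(a) == zlib.adler32(b))  # False
print(zlib.crc32(a) == zlib.crc32(b))      # False
```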

These types of redundancy check are useful in detecting accidental modification, such as corruption of stored data or errors in a communication channel. However, they provide no security against a malicious agent, as their simple mathematical structure makes them trivial to circumvent. To provide this level of integrity, a cryptographic hash function such as SHA-256 is necessary. (Collisions have been found in SHA-1, currently the most popular choice, but there is no evidence as of 2005 that SHA-256 suffers from similar weaknesses.)
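For comparison, a SHA-256 digest can be computed with the hashlib module in Python's standard library. The sha256_hex wrapper and the sample messages below are hypothetical and exist only for this sketch.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


original = b"transfer 100 to account 12345"
tampered = b"transfer 900 to account 12345"

reference = sha256_hex(original)            # published by the sender
print(len(reference))                       # 64 hex characters (256 bits)
print(sha256_hex(original) == reference)    # True
print(sha256_hex(tampered) == reference)    # False
```

The comparison is only meaningful if the reference digest itself reaches the recipient through a channel the attacker cannot alter.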

On UNIX, the "cksum" tool generates both a 32-bit CRC and a byte count for any given input file.
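The sketch below shows how a cksum-style result could be reproduced in Python. It assumes the algorithm as described in the POSIX specification for cksum: a CRC with polynomial 0x04C11DB7, processed most significant bit first from an initial value of zero, with the input length fed in afterwards (least significant byte first) and the final value inverted. The name posix_cksum is hypothetical, and the code is a slow bit-by-bit illustration rather than the tool's actual implementation.

```python
POLY = 0x04C11DB7  # generator polynomial assumed from the POSIX description


def _feed(crc: int, byte: int) -> int:
    """Mix one byte into the CRC register, most significant bit first."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
        else:
            crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> tuple[int, int]:
    """Return (crc, byte_count) in the style of the cksum tool's output."""
    crc = 0
    for byte in data:
        crc = _feed(crc, byte)
    # Append the length, least significant byte first, using as few
    # bytes as needed to represent it.
    length = len(data)
    while length:
        crc = _feed(crc, length & 0xFF)
        length >>= 8
    return crc ^ 0xFFFFFFFF, len(data)


print(posix_cksum(b""))  # (4294967295, 0)
print(posix_cksum(b"hello\n"))
```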

This article is adapted from:
