MD5 - message digest (fingerprint, checksum)

Message Digest Functions

Message digest functions distill the information contained in a file (small or large) into a single large number, typically between 128 and 256 bits in length. The best message digest functions combine these mathematical properties:

  • Every bit of the message digest function's output is influenced by every bit of the function's input.

  • If any given bit of the function's input is changed, every output bit has a 50 percent chance of changing.

  • Given an input file and its corresponding message digest, it should be computationally infeasible to find another file with the same message digest value.

Message digests are also called one-way hash functions because they produce values that are difficult to invert, resistant to attack, mostly unique, and widely distributed.
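The fixed-size property can be sketched with Python's standard hashlib module (Python is used here purely for illustration; the choice of inputs is arbitrary). However long the input, each function returns a digest of the same length:

```python
import hashlib

# Inputs of very different sizes: empty, three bytes, and one million bytes.
inputs = [b"", b"abc", b"x" * 1_000_000]

for name in ("md5", "sha1", "sha256"):
    # digest_size is reported in bytes; multiply by 8 to get bits.
    bits = hashlib.new(name).digest_size * 8
    # Collect the digest lengths actually observed for every input.
    lengths = {len(hashlib.new(name, data).digest()) * 8 for data in inputs}
    print(f"{name}: {bits}-bit digest, observed lengths {sorted(lengths)}")
```

Each line of output should list a single observed length, matching the nominal digest size of that function.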

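Many message digest functions have been proposed and are in use today, among them MD2, MD4, MD5, SHA, and SHA-1. (HMAC, often listed alongside them, is a keyed construction built on top of a digest function rather than a digest function itself.) In this article, we concentrate on MD5, one of the most widely used digest functions.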

MD5

MD5 was developed by Ronald Rivest. It is a modification of MD4 that includes techniques designed to make it more secure, and it produces a 128-bit digest. Although MD5 is widely used, in the summer of 1996 a few flaws were discovered that allowed some kinds of collisions to be calculated, and full collisions were demonstrated in 2004. As a result, MD5 has fallen out of favor for uses that depend on collision resistance, such as digital signatures.

Message Digest Algorithms at Work

Message digest algorithms themselves are not used for encryption and decryption operations. Instead, they are used in the creation of digital signatures, message authentication codes (MACs), and the creation of encryption keys from passphrases.
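As one illustration of the MAC use case, the sketch below uses Python's standard hmac module to build an HMAC on top of MD5. The key and message are made-up placeholders, and MD5 is used only because it is the subject of this article; the same calls accept other digest functions.

```python
import hashlib
import hmac

# Hypothetical shared secret and message, for illustration only.
key = b"shared-secret"
message = b"There is CHF1500 in the blue box."

# The sender computes a tag over the message using the shared key.
tag = hmac.new(key, message, hashlib.md5).hexdigest()

# The receiver recomputes the tag and compares the two in constant time.
expected = hmac.new(key, message, hashlib.md5).hexdigest()
print(hmac.compare_digest(tag, expected))

# Someone who does not know the key cannot produce a matching tag.
forged = hmac.new(b"wrong-key", message, hashlib.md5).hexdigest()
print(hmac.compare_digest(tag, forged))
```

The first comparison succeeds because key and message are identical on both sides; the second fails because the key differs.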

The easiest way to understand message digest functions is to look at them at work. Consider the message digest algorithm MD5, developed by Ronald Rivest and distributed by RSA Data Security. The following example shows some inputs to the MD5 function and the resulting MD5 codes:

echo "There is CHF1500 in the blue box." | md5sum
2db1ff7a70245309e9f2165c6c34999d -

echo "The meeting last week was swell." | md5sum
050f3905211cddf36107ffc361c23e3d -

echo "There is CHF1100 in the blue box." | md5sum
462a8dfbfead80335053f2fa2988d276 -

Notice that all of these messages have dramatically different MD5 codes. Even the first and the third messages, which differ by only a single character (and, within that character, by only a single binary bit), have completely different message digests. The message digest appears almost random, but it's not.
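The single-bit claim can be checked directly. The sketch below, in Python with the standard hashlib module, counts how many bits differ between the first and third messages and between their digests. A trailing newline is appended to each message because echo adds one before md5sum sees the text; the helper name bit_difference is made up for this example.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# echo appends a newline, so it is part of what md5sum hashes.
first = b"There is CHF1500 in the blue box.\n"
third = b"There is CHF1100 in the blue box.\n"

input_bits = bit_difference(first, third)
digest_bits = bit_difference(hashlib.md5(first).digest(),
                             hashlib.md5(third).digest())

print(f"input bits changed:  {input_bits}")
print(f"digest bits changed: {digest_bits} of 128")
```

The characters "5" (0x35) and "1" (0x31) differ in exactly one bit, so the first number is 1. For a well-behaved digest the second number should land somewhere near 64, half of the 128 output bits.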

Let's look at a few more message digests:

echo "There is CHF1500 in the blue bo" | md5sum
e41a323bdf20eadafd3f0e4f72055d36 -

echo "There is CHF1500 in the blue box" | md5sum
7a0da864a41fd0200ae0ae97afd3279d -

echo "There is CHF1500 in the blue box." | md5sum
2db1ff7a70245309e9f2165c6c34999d -

echo "There is CHF1500 in the blue box.." | md5sum
86c524497a99824897ccf2cd74ede50f -

Consider the third line of MD5 code in the above example: you can see that it is exactly the same as the first line of MD5 code shown previously. This is because the same text always produces the same MD5 code.

The message digest function is a powerful tool for detecting very small changes in very large files or messages. Calculate the MD5 code for your message and set it aside. If you think that the file has been changed (either accidentally or on purpose), simply recalculate the MD5 code and compare it with the MD5 code that you originally calculated. If they match, there is an excellent chance that the file was not modified.
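The calculate, set aside, and recalculate procedure might look like the following in Python. This is a minimal sketch: md5_of_file is a made-up helper name, and a temporary file stands in for the file being watched.

```python
import hashlib
import os
import tempfile

def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Create a throwaway file to stand in for the message.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"There is CHF1500 in the blue box.\n")
    path = tmp.name

original = md5_of_file(path)        # calculate once and set aside

with open(path, "ab") as handle:    # simulate a later modification
    handle.write(b".")

current = md5_of_file(path)         # recalculate and compare
print("unchanged" if current == original else "modified")

os.remove(path)                     # clean up the throwaway file
```

Appending even a single byte changes the digest, so the comparison reports the file as modified.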

Checking Files

When you download files, you want to be sure that nobody has changed their content. Often, you can download (or copy and paste) an MD5 checksum and then check the downloaded files against it.

Suppose you have the following MD5 checksums and you have downloaded the files for bind-8.2.3. Paste the MD5 checksums into a file called md5-check.

Content of md5-check

316dab391275988232636eac9032e34e bind-8.2.3-1.i386.rpm
b773953a7959f24f7aca66a98df8b9bb bind-devel-8.2.3-1.i386.rpm
090380d4e3e1923ec033b5bfa42ce8bd bind-utils-8.2.3-1.i386.rpm

Check the downloaded files

md5sum -c md5-check

bind-8.2.3-1.i386.rpm: OK
bind-devel-8.2.3-1.i386.rpm: OK
bind-utils-8.2.3-1.i386.rpm: OK

The command md5sum -c reads filenames and checksum information from a single file (or from stdin if no file was specified) and reports whether each named file and the corresponding checksum data are consistent. The input to this mode of 'md5sum' is usually the output of a prior, checksum-generating run of 'md5sum'. Each valid line of input consists of an MD5 checksum followed by a filename. For each such line, 'md5sum' reads the named file and computes its MD5 checksum. If the computed message digest does not match the one on the line with the filename, the file is noted as having failed the test. Otherwise, the file passes the test. By default, for each valid line, one line is written to standard output indicating whether the named file passed the test. After all checks have been performed, if there were any failures, a warning is issued to standard error.
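The checking logic described above can be approximated in a few lines of Python. This is a rough sketch, not a reimplementation of md5sum: the function name check_md5_list, the wording of the warning, and the file name demo.txt are all invented for the example, and files are read in one piece for brevity.

```python
import hashlib
import os
import sys
import tempfile

def check_md5_list(lines, base_dir="."):
    """Check 'checksum filename' lines; return the number of failures."""
    failures = 0
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or len(parts[0]) != 32:
            continue                      # skip lines that are not valid entries
        expected, name = parts[0].lower(), parts[1]
        if name.startswith("*"):
            name = name[1:]               # drop the binary-mode marker if present
        try:
            with open(os.path.join(base_dir, name), "rb") as handle:
                actual = hashlib.md5(handle.read()).hexdigest()
        except OSError:
            actual = None                 # unreadable files count as failures
        ok = actual == expected
        failures += 0 if ok else 1
        print(f"{name}: {'OK' if ok else 'FAILED'}")
    if failures:
        print(f"WARNING: {failures} checksum(s) did not match", file=sys.stderr)
    return failures

# Demo with a throwaway file whose content is the three bytes "abc".
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.txt"), "wb") as handle:
        handle.write(b"abc")
    good = "900150983cd24fb0d6963f7d28e17f72  demo.txt"   # MD5 of b"abc"
    bad = "00000000000000000000000000000000  demo.txt"    # deliberately wrong
    result = check_md5_list([good, bad], base_dir=tmp)
```

The demo checks the same file against one correct and one deliberately wrong checksum, so it reports one OK line, one FAILED line, and a single failure in the return value.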