What is an MD5 or SHA1 checksum?

Q: What is an MD5 or SHA1 checksum? What is it used for?
A:

A good number of websites, especially those that provide large software downloads like Apple and Microsoft, have begun providing what looks like a long string of random numbers and letters labeled either MD5 or SHA1. MD5 (Message-Digest Algorithm 5) and SHA1 (Secure Hash Algorithm 1) are both commonly used One-way Cryptographic Hash Functions.


So what does that mean?

Let’s start by defining what a “function” is. At their core, all computers currently available for consumer use operate in the world of Binary. This means that every song and photo on your computer is nothing but a series of 1s and 0s. Computers are very good at doing calculations very quickly and, if you remember your algebra classes, numbers can be passed through functions that output a resulting value based on the characteristics of the function. For example, suppose I have a “file” whose contents are “20”, and I want to run it through a function called HASH that multiplies the file contents by 10. The function HASH could be viewed like this:

HASH(x) = 10 * x

…where “x” is our source file.

Running our file “20” through our HASH algorithm will result in the HASHED output “200”.

Cryptographically speaking, our example function is not a very good way of protecting data as it would be very simple to discover how the hashing function works with a very small sample set of data.
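To make that weakness concrete, here is a minimal Python sketch of the example function. The names toy_hash and toy_unhash are made up for this illustration; the second one simply shows that anyone who guesses the rule can undo it with a single division.

```python
def toy_hash(x: int) -> int:
    """The article's example function: multiply the input by 10."""
    return 10 * x


def toy_unhash(h: int) -> int:
    """Hypothetical helper: reverse toy_hash by dividing by 10."""
    return h // 10


print(toy_hash(20))     # 200
print(toy_unhash(200))  # 20, the original "file" is recovered
```

A real cryptographic hash function is designed so that no such shortcut back to the input is known.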

In the case of these “One-way” Cryptographic Hash Functions, the function is far more complicated, and its inner workings are well outside the scope of this article. The One-way part of a “One-way Hashing Algorithm” indicates that the function is constructed so that it only works in one direction: it is effectively impossible to figure out the source file “x” from the result of the function HASH(x).

There are all sorts of reasons why Cryptography like this is important. This sort of function is used all over the web. Most recognizably, when a user creates a password for an account on a website, the password itself (your pet’s name “spike”) doesn’t get stored as it is entered by the user. This would not be a good method of security because anyone who has access to maintain the database could get access to that password. Instead, it gets run through one of these Hashing algorithms that produces a bunch of seemingly random unintelligible alphanumeric characters:

73cc33b96ddcddc98995c569e3a0bca29451c8a8

Since these functions are so complicated, there is no practical way to work backward from the Hash to “spike”. To protect your password from anyone that might have access to the database, the Hash gets stored there along with your username rather than the word “spike”. This is why customer service representatives for web-based companies can’t provide your password to you: it isn’t possible for them to do so. They can only change your password to something else and send that new value to you. However, when you navigate to the site and enter your username followed by the password “spike”, the web server runs your entered password through the same function that calculated the SHA1 hash of the original password when you set up your account. If this newly calculated value matches what is stored in the database, then you are allowed to access the site. If, instead of entering “spike”, you accidentally enter “Spike”, the computer calculates the SHA1 hash of that as:

9af639c8fb1e08bdf232ce4d9bf46c0e73e41cd0

Clearly our original hash of “spike”:

73cc33b96ddcddc98995c569e3a0bca29451c8a8

…doesn’t match the hash of “Spike”:

9af639c8fb1e08bdf232ce4d9bf46c0e73e41cd0

As a result, you aren’t able to log in until you enter the password that produces a hash matching the one stored in the database.
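The login comparison described above can be sketched in a few lines of Python using the standard hashlib module. This is only an illustration under stated assumptions: the function names sha1_hex and check_password are invented here, the "database" is a single variable, and plain SHA1 is used purely to mirror the description in this article rather than to show how any particular site works.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA1 hash of a string as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# At sign-up, only the hash is kept (stand-in for the database record).
stored_hash = sha1_hex("spike")


def check_password(attempt: str, stored: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    return sha1_hex(attempt) == stored


print(check_password("spike", stored_hash))  # True
print(check_password("Spike", stored_hash))  # False: one capital letter changes the hash
```

Notice that the check never needs the original password, only its hash.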

In the case of files being downloaded, a simple text file called “hello.txt” with nothing but the word “hello” can be represented in binary like this:

01101000 01100101 01101100 01101100 01101111

If you don’t believe me, you can try it yourself. You can copy the above binary information, CLICK THIS LINK, paste that set of numbers into the box labeled “Binary Value”, click the “Convert” button and see that the output is “hello”.
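If you would rather check this without a web page, the same conversion can be sketched in Python. It assumes the text is plain ASCII, with each group of eight bits standing for one character.

```python
bits = "01101000 01100101 01101100 01101100 01101111"

# Turn each 8-bit group into a number, then the numbers into text.
text = bytes(int(group, 2) for group in bits.split()).decode("ascii")
print(text)  # hello

# Going the other way: text back to groups of eight bits.
back = " ".join(format(byte, "08b") for byte in text.encode("ascii"))
print(back == bits)  # True
```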

The file “hello.txt” when processed through SHA1 outputs this string:

aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
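You can reproduce that value yourself with Python's standard hashlib module. This sketch assumes the file contains exactly the five bytes of “hello”, with no trailing newline; an extra newline would produce a completely different hash.

```python
import hashlib

contents = b"hello"  # the exact bytes assumed to be in hello.txt

sha1_value = hashlib.sha1(contents).hexdigest()
md5_value = hashlib.md5(contents).hexdigest()

print(sha1_value)  # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
print(md5_value)   # 5d41402abc4b2a76b9719d911017c592
```

The MD5 value is shorter (32 characters instead of 40) because MD5 produces a 128-bit result while SHA1 produces a 160-bit one.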

If you were to click the above link and download the file to your desktop, you could use one of several hash calculators, including one that I created using “AppleScript”, to verify that my SHA1 hash as listed above is accurate for the file that you downloaded. If the two values did not match, it would mean that the file had been changed somewhere between the file server and your “Downloads” folder.

While this doesn’t sound like a big deal for a simple text file, the same combination of steps can be used when you download a large software installer. If the MD5 or SHA1 hash of the file that you downloaded matches the one published by a reputable developer, you can be confident that the file you received is identical to the one the developer provided and was not corrupted or altered along the way.
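For a large installer, the check might look something like the following Python sketch. The function names file_digest and matches_published are made up for this example, and a small temporary file containing “hello” stands in for the real download so the code can run on its own; in practice you would pass the path of the downloaded file and the hash shown on the developer's site.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so a large download need not fit in memory."""
    hasher = hashlib.new(algorithm)  # "sha1" or "md5"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published: str, algorithm: str = "sha1") -> bool:
    """Compare the file's hash with the value published by the developer."""
    return file_digest(path, algorithm) == published.strip().lower()


# Stand-in for a downloaded file: a temporary file containing "hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

published_sha1 = "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"
demo_ok = matches_published(demo_path, published_sha1)
demo_bad = matches_published(demo_path, "0" * 40)
os.remove(demo_path)

print(demo_ok)   # True: the file matches the published hash
print(demo_bad)  # False: a different hash means the file is not the same
```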