Cryptography hash method MD5 (Message Digest 5) explanation made easy with Python

10 min readMay 8, 2021

Stuart Lippincott — http://www.instagram.com/stuz0r — October Everyday Project https://www.behance.net/gallery/44523987/October-Everyday-Project

Continuing my serie of python step-by-step tutorials to understand MD hash algorithm i’m now exploring MD5, the success of my previous two tutorials on MD2 and MD4 keep me excited continuing the adventure.

Still, managing expectations, this occurrence will be much reduced to my previous article. The main reason behind it being the very high resemblance between MD4 and MD5 hence I’ll build mainly on the difference between the two and use the same ‘chassis’ as a code.

So for those who missed the two previous article, I recommend you to go through the two below ones:

Especially on the MD4 one as multiple references will be made to it.

MD5 an additional complication layer to MD4

Interestingly when you check the MD5 RFC (RFC 1321 available here https://www.ietf.org/rfc/rfc1321.txt), it’s the first MD algorithm that has a part stating differences between its previous version.

The reason is the very high similarities between MD5 and MD4, for that reason I will make reference to my previous article instead of creating a set of redundancies.

MD5 algorithm

As for its previous version 5 steps are followed:

1. padding
2. append message length
3. MD buffers initialization
4. 16-words block processing
5. output

For the step 1-2–3 I will ask you to check the details of it in my MD4 article (https://nickthecrypt.medium.com/cryptography-hash-method-md4-message-digest-4-explained-with-python-f201b74f51d)

The code will be as follow (just to mention, it’s by far the scene I love the most from the Matrix):

# define the messagemsg = 'You know, I know this steak doesn\'t exist. I know that when I put it in my mouth, the Matrix is telling my brain that it is juicy and delicious. After nine years, you know what I realize? Ignorance is bliss.'print('---\r\n'+msg+'\r\n---')# translate text message into array of msg's character ascii codesmsg = [ord(c) for c in msg]# =========================== STEP 1 - paddingmsglen = len(msg)index = msglen % 64if index < 56:msg = msg + PADDING[0:(56-index)]else:msg = msg + PADDING[0:(120-index)]# =========================== STEP 2 - append message length in bitsmsg = msg+list((msglen*8).to_bytes(8, 'little'))# =========================== STEP 3 - init wordsword_A = 0x67452301word_B = 0xefcdab89word_C = 0x98badcfeword_D = 0x10325476

Again, nothing new here, now let’s now look into how differently MD5 makes us treat the message.

MD5 step 4 — process each 512 bits block

It’s important to understand what a 16-word is, how is it treated, etc… => again MD4 article.

We will first change the fundamental fonctions we are using. In MD4 these are the function which were used as a base of the algorithm:

F: if X then Y else Z
G: if at least two of X, Y, Z then ‘1’ else ‘0’
H: X xor Y xor Z

In MD5 we move to the following base functions:

F: if X then Y else Z
G: if Z then X else Y / the change is meant to be less symmetric then it’s previous version (did not wanted the capture that big sorry for that), if looking at the old-G table we feel a sense of symmetry between the end value end z for example. It’s hard for me to determine what exactly ‘more’ symmetry means in that spirit (happy to get any insight)

Update of the G function for a less ‘symmetric’ one

H: X xor Y xor Z
I: Y xor (X or not Z) / new function here, in correlation with the addition of a new round as per below

I will let you look into the process of encoding the 512 bits (64 bytes or chars into an array of 16 long integers) on the MD4 article.

Adding as well a function to rotate the bits, (not standard as it’s a circular rotation — very common in cryptography — again article MD4 for more details).

Here is the code of the base functions:

def fun_F(x, y, z):return (((x) & (y)) | ((~x) & (z)))def fun_G(x, y, z):return (((x) & (z)) | ((y) & (~z)))def fun_H(x, y, z):return ((x) ^ (y) ^ (z))def fun_I(x, y, z):return ((y) ^ ((x) | (~z)))def fun_rotleft(x, n):return (((x) << (n)) | ((x) >> (32-(n))))def decode_block(raw):proecssdblock = []for i in range(0, len(raw), 4):proecssdblock.append((raw[i]) | ((raw[i+1]) << 8) | ((raw[i+2]) << 16) | ((raw[i+3]) << 24))return(proecssdblock)

Now let’s look into 512 bits (64 octets) block processing and let me highlight that here we have three major changes versus MD4:

1. addition of a new round which involve the newly created base function I
2. addition of a dedicated variable for each step, pasting here the table of constants (I didn’t go through the programmatic construction of the array myself but it should consist in the following T[i] = integer part of (4294967296 * abs(sin(i in radian))) )

T = [   0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,0xd62f105d, 0x2441453,  0xd8a1e681, 0xe7d3fbc8,0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x4881d05,0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391]

3. IMPORTANT each steps add the result of the previous step !!! ( to increase the avalanche effect).

So, apologies for the long indigest ( :-) ) code, but each 4 rounds on each block pass look like this, as for the MD4 the processing itself has low to none interest into in-depth detail — or its logic surpasses my abilities (which is possible, and then I’m happy to get the bigger picture from any reader that would have it):

for i in range(math.ceil(len(msg) / BLOCK_SIZE)):block = decode_block(msg[i*BLOCK_SIZE:(i+1)*BLOCK_SIZE])AA = word_ABB = word_BCC = word_CDD = word_D# STEP 4 - ROUND 1word_A = (word_B + fun_rotleft((word_A + fun_F(word_B, word_C, word_D) + block[0] + 0xd76aa478) % 0x100000000 , 7))word_D = (word_A + fun_rotleft((word_D + fun_F(word_A, word_B, word_C) + block[1] + 0xe8c7b756) % 0x100000000 , 12))word_C = (word_D + fun_rotleft((word_C + fun_F(word_D, word_A, word_B) + block[2] + 0x242070db) % 0x100000000 , 17))word_B = (word_C + fun_rotleft((word_B + fun_F(word_C, word_D, word_A) + block[3] + 0xc1bdceee) % 0x100000000 , 22))word_A = (word_B + fun_rotleft((word_A + fun_F(word_B, word_C, word_D) + block[4] + 0xf57c0faf) % 0x100000000 , 7))word_D = (word_A + fun_rotleft((word_D + fun_F(word_A, word_B, word_C) + block[5] + 0x4787c62a) % 0x100000000 , 12))word_C = (word_D + fun_rotleft((word_C + fun_F(word_D, word_A, word_B) + block[6] + 0xa8304613) % 0x100000000 , 17))word_B = (word_C + fun_rotleft((word_B + fun_F(word_C, word_D, word_A) + block[7] + 0xfd469501) % 0x100000000 , 22))word_A = (word_B + fun_rotleft((word_A + fun_F(word_B, word_C, word_D) + block[8] + 0x698098d8) % 0x100000000 , 7))word_D = (word_A + fun_rotleft((word_D + fun_F(word_A, word_B, word_C) + block[9] + 0x8b44f7af) % 0x100000000 , 12))word_C = (word_D + fun_rotleft((word_C + fun_F(word_D, word_A, word_B) + block[10] + 0xffff5bb1) % 0x100000000 , 17))word_B = (word_C + fun_rotleft((word_B + fun_F(word_C, word_D, word_A) + block[11] + 0x895cd7be) % 0x100000000 , 22))word_A = (word_B + fun_rotleft((word_A + fun_F(word_B, word_C, word_D) + block[12] + 0x6b901122) % 0x100000000 , 7))word_D = (word_A + fun_rotleft((word_D + fun_F(word_A, word_B, word_C) + block[13] + 0xfd987193) % 0x100000000 , 12))word_C = (word_D + fun_rotleft((word_C + fun_F(word_D, word_A, word_B) + block[14] + 0xa679438e) % 0x100000000 , 17))word_B = (word_C + fun_rotleft((word_B + fun_F(word_C, word_D, word_A) + block[15] + 0x49b40821) % 0x100000000 , 22))# STEP 4 - ROUND 2word_A = (word_B + fun_rotleft((word_A + fun_G(word_B, word_C, word_D) + block[1] + 0xf61e2562) % 0x100000000 , 5))word_D = (word_A + fun_rotleft((word_D + fun_G(word_A, word_B, word_C) + block[6] + 0xc040b340) % 0x100000000 , 9))word_C = (word_D + fun_rotleft((word_C + fun_G(word_D, word_A, word_B) + block[11] + 0x265e5a51) % 0x100000000 , 14))word_B = (word_C + fun_rotleft((word_B + fun_G(word_C, word_D, word_A) + block[0] + 0xe9b6c7aa) % 0x100000000 , 20))word_A = (word_B + fun_rotleft((word_A + fun_G(word_B, word_C, word_D) + block[5] + 0xd62f105d) % 0x100000000 , 5))word_D = (word_A + fun_rotleft((word_D + fun_G(word_A, word_B, word_C) + block[10] + 0x2441453) % 0x100000000 , 9))word_C = (word_D + fun_rotleft((word_C + fun_G(word_D, word_A, word_B) + block[15] + 0xd8a1e681) % 0x100000000 , 14))word_B = (word_C + fun_rotleft((word_B + fun_G(word_C, word_D, word_A) + block[4] + 0xe7d3fbc8) % 0x100000000 , 20))word_A = (word_B + fun_rotleft((word_A + fun_G(word_B, word_C, word_D) + block[9] + 0x21e1cde6) % 0x100000000 , 5))word_D = (word_A + fun_rotleft((word_D + fun_G(word_A, word_B, word_C) + block[14] + 0xc33707d6) % 0x100000000 , 9))word_C = (word_D + fun_rotleft((word_C + fun_G(word_D, word_A, word_B) + block[3] + 0xf4d50d87) % 0x100000000 , 14))word_B = (word_C + fun_rotleft((word_B + fun_G(word_C, word_D, word_A) + block[8] + 0x455a14ed) % 0x100000000 , 20))word_A = (word_B + fun_rotleft((word_A + fun_G(word_B, word_C, word_D) + block[13] + 0xa9e3e905) % 0x100000000 , 5))word_D = (word_A + fun_rotleft((word_D + fun_G(word_A, word_B, word_C) + block[2] + 0xfcefa3f8) % 0x100000000 , 9))word_C = (word_D + fun_rotleft((word_C + fun_G(word_D, word_A, word_B) + block[7] + 0x676f02d9) % 0x100000000 , 14))word_B = (word_C + fun_rotleft((word_B + fun_G(word_C, word_D, word_A) + block[12] + 0x8d2a4c8a) % 0x100000000 , 20))# STEP 4 - ROUND 3word_A = (word_B + fun_rotleft((word_A + fun_H(word_B, word_C, word_D) + block[5] + 0xfffa3942) % 0x100000000 , 4))word_D = (word_A + fun_rotleft((word_D + fun_H(word_A, word_B, word_C) + block[8] + 0x8771f681) % 0x100000000 , 11))word_C = (word_D + fun_rotleft((word_C + fun_H(word_D, word_A, word_B) + block[11] + 0x6d9d6122) % 0x100000000 , 16))word_B = (word_C + fun_rotleft((word_B + fun_H(word_C, word_D, word_A) + block[14] + 0xfde5380c) % 0x100000000 , 23))word_A = (word_B + fun_rotleft((word_A + fun_H(word_B, word_C, word_D) + block[1] + 0xa4beea44) % 0x100000000 , 4))word_D = (word_A + fun_rotleft((word_D + fun_H(word_A, word_B, word_C) + block[4] + 0x4bdecfa9) % 0x100000000 , 11))word_C = (word_D + fun_rotleft((word_C + fun_H(word_D, word_A, word_B) + block[7] + 0xf6bb4b60) % 0x100000000 , 16))word_B = (word_C + fun_rotleft((word_B + fun_H(word_C, word_D, word_A) + block[10] + 0xbebfbc70) % 0x100000000 , 23))word_A = (word_B + fun_rotleft((word_A + fun_H(word_B, word_C, word_D) + block[13] + 0x289b7ec6) % 0x100000000 , 4))word_D = (word_A + fun_rotleft((word_D + fun_H(word_A, word_B, word_C) + block[0] + 0xeaa127fa) % 0x100000000 , 11))word_C = (word_D + fun_rotleft((word_C + fun_H(word_D, word_A, word_B) + block[3] + 0xd4ef3085) % 0x100000000 , 16))word_B = (word_C + fun_rotleft((word_B + fun_H(word_C, word_D, word_A) + block[6] + 0x4881d05) % 0x100000000 , 23))word_A = (word_B + fun_rotleft((word_A + fun_H(word_B, word_C, word_D) + block[9] + 0xd9d4d039) % 0x100000000 , 4))word_D = (word_A + fun_rotleft((word_D + fun_H(word_A, word_B, word_C) + block[12] + 0xe6db99e5) % 0x100000000 , 11))word_C = (word_D + fun_rotleft((word_C + fun_H(word_D, word_A, word_B) + block[15] + 0x1fa27cf8) % 0x100000000 , 16))word_B = (word_C + fun_rotleft((word_B + fun_H(word_C, word_D, word_A) + block[2] + 0xc4ac5665) % 0x100000000 , 23))# STEP 4 - ROUND 4word_A = (word_B + fun_rotleft((word_A + fun_I(word_B, word_C, word_D) + block[0] + 0xf4292244) % 0x100000000 , 6))word_D = (word_A + fun_rotleft((word_D + fun_I(word_A, word_B, word_C) + block[7] + 0x432aff97) % 0x100000000 , 10))word_C = (word_D + fun_rotleft((word_C + fun_I(word_D, word_A, word_B) + block[14] + 0xab9423a7) % 0x100000000 , 15))word_B = (word_C + fun_rotleft((word_B + fun_I(word_C, word_D, word_A) + block[5] + 0xfc93a039) % 0x100000000 , 21))word_A = (word_B + fun_rotleft((word_A + fun_I(word_B, word_C, word_D) + block[12] + 0x655b59c3) % 0x100000000 , 6))word_D = (word_A + fun_rotleft((word_D + fun_I(word_A, word_B, word_C) + block[3] + 0x8f0ccc92) % 0x100000000 , 10))word_C = (word_D + fun_rotleft((word_C + fun_I(word_D, word_A, word_B) + block[10] + 0xffeff47d) % 0x100000000 , 15))word_B = (word_C + fun_rotleft((word_B + fun_I(word_C, word_D, word_A) + block[1] + 0x85845dd1) % 0x100000000 , 21))word_A = (word_B + fun_rotleft((word_A + fun_I(word_B, word_C, word_D) + block[8] + 0x6fa87e4f) % 0x100000000 , 6))word_D = (word_A + fun_rotleft((word_D + fun_I(word_A, word_B, word_C) + block[15] + 0xfe2ce6e0) % 0x100000000 , 10))word_C = (word_D + fun_rotleft((word_C + fun_I(word_D, word_A, word_B) + block[6] + 0xa3014314) % 0x100000000 , 15))word_B = (word_C + fun_rotleft((word_B + fun_I(word_C, word_D, word_A) + block[13] + 0x4e0811a1) % 0x100000000 , 21))word_A = (word_B + fun_rotleft((word_A + fun_I(word_B, word_C, word_D) + block[4] + 0xf7537e82) % 0x100000000 , 6))word_D = (word_A + fun_rotleft((word_D + fun_I(word_A, word_B, word_C) + block[11] + 0xbd3af235) % 0x100000000 , 10))word_C = (word_D + fun_rotleft((word_C + fun_I(word_D, word_A, word_B) + block[2] + 0x2ad7d2bb) % 0x100000000 , 15))word_B = (word_C + fun_rotleft((word_B + fun_I(word_C, word_D, word_A) + block[9] + 0xeb86d391) % 0x100000000 , 21))word_A = (word_A + AA) % 0x100000000word_B = (word_B + BB) % 0x100000000word_C = (word_C + CC) % 0x100000000word_D = (word_D + DD) % 0x100000000

Sorry for that, just to keep in mind, we keep word addition (hence I keep modulo %0x100000000 across additions to ensure we keep words as words. It’s a very common practice in cryptography as well (detailed in MD4).

MD5 — output

Here very common, we built the digest starting with word A finishing with word B and for each we adopt the little endian convention, meaning low orders bytes first.

As per below:

md_digest = word_A.to_bytes(8, 'little')
md_digest += word_B.to_bytes(8, 'little')
md_digest += word_C.to_bytes(8, 'little')
md_digest += word_D.to_bytes(8, 'little')print("".join(map(lambda x: hex(x).lstrip("0x"), md_digest)))

Conclusions

As per my previous tutorials, my implementations are not meant to be scalable or beautiful but to help understand the algorithm.

Remember, this algorithm has been proven unsafe, do not use it in critical applications.

Happy to reach out to further discuss if you have inputs, feedback nick@thecrypt.io

full code available here

https://github.com/NickRHR/hashfunctions/blob/aa00c9a75d9efa24c9b6e0857ea3a7b042657ded/MD5/MD5_nfc.py