php - Is MD5 really that bad? - Stack Overflow

I hear everywhere that MD5 is "broken" (in the context of password hashing).
There is much talk on the matter, but not a single positive proof.
So here I come with this simple question:

I have an MD5 hash dc3c2be8869ec299b5ab2748184c50ab
And a simple bit of code to produce it

$salt = "V4&jd3M3^s5eF";
$pass = "*hidden*"; // the real password is withheld
echo $hash = md5($salt . $pass); // the salt is prepended to the password

So, my question is:

What is the password?

I'm looking to get past theoretical reasoning, and come up with a simple example that we can use to demonstrate the problem(s) with MD5.

WELL
Due to a dishonest demand in a recent answer, I am going to refresh the conditions,
setting a new hash and a new salt.

Everyone is welcome to break the hash and post the actual password.

Also, because I am not bound by a bounty, the question remains open forever,
until it actually gets answered.

So we can have proof at last, showing everyone that MD5 is bad, bad and broken.

Most of us can't break even the most flawed encryption or hashing algorithms. Just because we can't break it, doesn't mean it's a good algorithm. If you are going to conclude that md5 is safe just because nobody will come up with your password, you are sticking your head in the sand. – Jacco May 4 '10 at 19:23
The formulation of this question and the "discussion" below seem excessively argumentative. "Broken" is in this case a term of art from cryptography (i.e. this is also not programming related), and this comes down to complaining that you don't like the way cryptographers use their own jargon. – dmckee May 4 '10 at 19:25
Just because some of us MIGHT be able to break it, doesn't mean we will. Collisions can take hours/days to generate. More time than most of us are willing to put in to answer a question. A hacker doesn't have any such time constraints. – Reece45 May 4 '10 at 19:27
MD5 isn't yet broken to the point where we can snap our fingers and find the answer. It is broken to the point where it would take a surprisingly small amount of resources - today, probably still measured in millions or greater - to crack it. It's also not like it's going to get less broken - it's just going to get worse as research is done and as technology improves. You're welcome to use it, and you will probably never be cracked, but that's partially due to lack of interest. Why make your security worse intentionally? – ZorbaTHut May 4 '10 at 19:56
My guess is that Col Shrapnel has stolen the hash to the password to a very valuable server. He can't figure out how to crack it himself, so he has cleverly worded it as a challenge on here to try and get someone to do it for him. – davr May 4 '10 at 20:23

closed as not constructive by meagar, Kev Nov 21 '11 at 19:25

This question is not a good fit to our Q&A format. We expect answers to generally involve facts, references, or specific expertise; this question will likely solicit opinion, debate, arguments, polling, or extended discussion. See the FAQ.

deleted by Mr.Wizard, awoodland, Jeff Mercado, meagar, interjay, yoda, Bo Persson, Tim Post Nov 24 '11 at 18:33

18 Answers

Will that be proof enough (code included):

Peter Selinger: MD5 Collision Demo

Update: Comments claim that this answer does not address preimage attacks (which the question didn't ask about at the time this answer was written). I would like to add that there have also been preimage attacks against MD5 (see, e.g., Finding Preimages in Full MD5 Faster Than Exhaustive Search). They are not feasible yet, but "Attacks never get worse, they only get better".

A nice summary on the current status of MD5 can be found at http://www.jcsecurity.co.uk/?p=215. The article concludes:

In conclusion, MD5 has ‘decent-enough’ preimage resistance but has some serious collision vulnerabilities which may or may not effect the application it’s deployed to. Even though MD5 might be safe under certain scenarios, unless you absolutely have to use it for reasons such as backwards compatibility or interoperability don’t. There is no point and as Schneier always says “Attacks never get worse, they only get better”. There are numerous alternatives available today, SHA256, SHA512 and Whirlpool for example are all sensible choices.

Update 2: There is another problem with MD5 in the context of being used as a password hash function. MD5 is fast. Although this article is already more than 2 years old, it still holds today:

Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes

As mentioned by ShuggyCoUk, this question is in essence a restatement of Does any published research indicate that preimage attacks on MD5 are imminent? I recommend reading on there.

deleted Nov 24 '11 at 18:33
+1 this is the only answer so far that addresses the OP's request for an example of cracking MD5. – Bill Karwin May 4 '10 at 19:32
Note to 0xA3: +1 for great example. This is exactly what the poster wanted. Note to OP (Col Shrapnel): Other answers are equally valid, they just didn't give you the answer in the format you wanted. This site (because it's well-designed) does not force users to give boolean responses. If it did, no one would ever get the BEST answer to their questions. If you have a problem with that, use another site that functions more like a poll. – NateDSaint May 4 '10 at 19:57
-1, you're not distinguishing between preimage attacks and collision attacks. The Peter Selinger demo demonstrates a collision attack, not a preimage attack. – Jason S May 4 '10 at 20:14
@Jason: Col. Shrapnel does not seem to understand that there are different ways for a hash to be broken. No one has claimed that there is a good (first) preimage attack, only that MD5 is broken, which this page very clearly shows that it is. – BlueRaja - Danny Pflughoeft May 4 '10 at 20:22
use Digest::MD5 qw(md5_hex);
my $salt='#bh35^&Res%';
my @var = ();
my @num = (1 .. 255);
push @var, chr($_) foreach @num;
foreach my $a (@var) {
  foreach my $b (@var) {
    foreach my $c (@var) {
      my $md = md5_hex($salt.$a.$b.$c);
      print "Found: pass ($a $b $c) md5($md)" if $md eq 'c1e877411f5cb44d10ece283a37e1668';
    }
  }
}
– Konerak May 4 '10 at 20:28
Accepted answer (score 128, +550)

Your password is hunter2

deleted Nov 24 '11 at 18:33
+ 1 for awesome geek reference – Rubys May 4 '10 at 21:27
Does he keep 1 2 3 4 5 for his luggage? – Donal Fellows May 4 '10 at 22:19
@Donal: amazing! That's the same combination as on my air shield! – Graham Lee Jun 9 '10 at 10:19
Did OP really accept this as best answer??? Yikes. There were some really good answers below. – webbiedave Jun 16 '10 at 16:31
@web I was just thinking that. This was amusing, but there were some excellent answers. It got 0xA3 a Populist badge, anyway – Michael Mrozek Jun 16 '10 at 18:12

To address your points directly:

1) Yes, many collision vulnerabilities have been documented. The bottom line is that MD5 is fundamentally broken. According to Wikipedia:

US-CERT of the U. S. Department of Homeland Security said MD5 "should be considered cryptographically broken and unsuitable for further use," and most U.S. government applications will be required to move to the SHA-2 family of hash functions after 2010.

Bruce Schneier has also weighed in with regard to one such attack on MD5 to forge SSL certificates:

I'm not losing a whole lot of sleep because of these attacks. But -- come on, people -- no one should be using MD5 anymore.

2) It's impossible to determine what is behind the asterisks. The point of "breaking" MD5 is not to determine the original password value - it is to find a collision. If I can find another value that will hash to the same MD5 string, then that is as good as finding your original password. Of course, salting makes MD5 stronger, but it is still "broken".

deleted Nov 24 '11 at 18:33
Okay okay, it is broken. Well 1. Find a collision please. 2. show me a scenario of using that collision, to get access to the admin area on my site – Col. Shrapnel May 4 '10 at 19:20
@Col. Shrapnel, I get the feeling you have already made up your mind... and are just looking for an excuse not to implement something safe. – Jacco May 4 '10 at 19:28
1. Find it yourself: woodmann.com/collaborative/tools/index.php/SnD_Reverser_Tool 2. It's not our job to break into your site, even if anyone here really cared enough to. If you want to rely on MD5, you can have fun trying to sleep at night knowing your security is fundamentally broken. – peachykeen May 4 '10 at 19:30
I break into your site and steal the hashed version of your passwds - the point of hashing is that those hashes are no use to me, I can't get the passwds from them. The flaw in MD5 is that I can find another word with the same MD5 hash without trying all possible combinations - I can then use that new word to log into your site. In practice it's not a problem - I still have to do a hell of a lot of tests, just not as many as you would think. And the collision I come up with may be 10k long which I can't use as a passwd anyway. – Martin Beckett May 4 '10 at 19:41
@Martin: Quite frankly, the scenario you gave isn't plausible. If the hacker has access to get the hashed version of the passwords, then they probably already have access to UPDATE one of those passwords. Considering the salt is typically stored in the same record as the password itself, it would be trivial to replace the user's password (and salt value) with my own. – Chris Lively May 4 '10 at 20:10

There seem to be two ways in which MD5 can be "broken," not at all equivalent:

  1. It is easy to find two inputs X and Y such that md5(X) == md5(Y).

  2. Given md5(X), it is easy to find a Y such that md5(X) == md5(Y).

It's my understanding that md5 has vulnerability 1, but not 2. Although I see why vulnerability 1 might be undesirable for some applications (e.g. digital signatures), the original poster is using it for password authentication. Why is md5 "broken" for this application?

deleted Nov 24 '11 at 18:33
+1 for the only response here that distinguishes between a preimage attack (your point #2, which is actually a "first preimage attack") and a collision attack (your point #1). – Jason S May 4 '10 at 20:03
"Why is md5 "broken" for this application?" - because hashes do not become "less" broken, only more. It takes less than brute-force to brute-force an MD5 hash, which is clearly not what you want... And why use a broken hash when it is just as easy to use a non-broken one? – BlueRaja - Danny Pflughoeft May 4 '10 at 20:27
The scenario is known as a Birthday Attack. When I'm searching for 1 of n answers, I can find it in substantially less than 1/n fraction of the effort needed to get 1 specific answer. Given a copy of the password file (Chris Lively in another answer thinks this is infeasible, but I wonder if he has never misplaced a backup tape in his life), I may not be able to get (or fake) YOUR credentials, but it's likely I could get several sets of credentials by this means. – Jason May 4 '10 at 21:24
Yes, that's basic Information Theory - the Pigeonhole principle. The idea with a cryptographically strong hash function is that you have enough pigeonholes to make finding a collision infeasible for the foreseeable future. That quality no longer applies sufficiently to MD5. Where opinions differ is whether the attacks are feasible now, or simply feasible soon. But the general consensus, for a couple of years now, is that there is no justification for deploying NEW systems that rely on MD5, and that old systems should review their use and have a plan to migrate away. – Jason May 4 '10 at 22:27
@theUnhandled: That's #2, not #1. – user168715 Nov 15 '10 at 21:51

md5() is broken because a malicious person can intentionally generate a collision. This is more of a problem for Certificate Authorities and the Public Key Infrastructure (PKI) than for passwords. By generating a hash collision it's possible to forge an SSL certificate.

Furthermore, a collision against md5() is generated by using a specially crafted prefix. If you prepend a salt to the message, then an attacker cannot control the prefix and a collision cannot be generated.

However, my biggest argument is that it is trivial to use a secure message digest function. You can install mhash or use one of the many SHA-2 PHP implementations.
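
As a rough illustration of how small that change can be, here is a minimal sketch that assumes PHP's built-in hash() function is available (PHP 5.1.2 or later). The helper name saltedDigest and the salt and password values are hypothetical placeholders, not anything from the question.

```php
<?php
// Hypothetical helper: same salt-then-password layout as the question,
// but with a selectable digest algorithm instead of a hard-coded md5().
function saltedDigest(string $algo, string $salt, string $password): string
{
    return hash($algo, $salt . $password);
}

$salt = 'example-salt';     // placeholder value
$pass = 'example-password'; // placeholder value

echo saltedDigest('md5', $salt, $pass), "\n";    // 32 hex characters (128 bits)
echo saltedDigest('sha256', $salt, $pass), "\n"; // 64 hex characters (256 bits)
```

The digest is twice as long, so whatever column stores it would need to be widened. Swapping the digest does not by itself make the scheme slow to brute-force.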

deleted Nov 24 '11 at 18:33
+1 for it is trivial to use a secure algorithm. – Jacco May 4 '10 at 19:44
I haven't been here long, but this seems to be a theme around here: People are violently determined to expend extra energy to accomplish something with (provably) worse results than the accepted practice. – Jason May 4 '10 at 22:29
@Jason I haven't been here long either and I find that people like to chit-chat about security more than actually proving whether or not a system can hold water. Have you ever written an exploit? – Rook May 4 '10 at 22:35
Your question has nothing to do with the current discussion. Did you think I was baiting you? Or are you baiting me... but the answer is yes. – Jason May 4 '10 at 23:26
@Jason well that makes 2 of us (milw0rm.com/author/677), because it's safe to say no one else on SO has. – Rook May 4 '10 at 23:29

What you have done here is epitomized a train of broken thinking into a question that seemingly challenges a logical fallacy.

In fact, your question is as much a statement as it is a fallacy.

If a one way hash collides, even once, don't use it, period, for the purposes that you are describing.

I really, really hope that you don't work for my bank.

deleted Nov 24 '11 at 18:33
Well, I didn't think of it from that point of view. I apologize for that. Let me explain what I mean. There are many statements in our life like "wash your hands before eating". I don't want to just believe these statements. I want to understand why. Yes, I lack the intellect to understand theoretical reasoning. But I suspect that any theory can be proved with a simple experiment. And I have never heard of a real example of breaking a salted MD5 hash, while everyone keeps talking of its weakness. I want to distinguish the rumors from the truth. In a practical way. – Col. Shrapnel May 4 '10 at 22:22
I think the fallacy boils down basically to this: The strength of a cryptographic hash is based on some mathematical properties of the algorithm. Why wouldn't one use math to disprove the strength of that same algorithm? In fact don't exploits against these systems often come as concrete implementations of the math? (ignoring for a moment the class of errors caused by bugs in the original implementation) – Jason May 5 '10 at 0:16
I remember that several years ago I wanted to deliver content in my web application based on a REQUEST_URI hash, and decided to use md5 as the hash function. I happened to run into a collision on about 300 URIs in my development environment and have not believed in md5 since that day =) – newtover May 5 '10 at 8:49
@newtover, you're one lucky guy... @Tim Post: You mean a hash that has a collision published, right? All fixed-length-output hashes have an infinite number of collisions, it's just a matter of being able to find them. – Longpoke May 6 '10 at 4:14
@Longpoke - Published or 'found' by a program breaking. Part of my point is, md5 had uses far beyond hashing passwords. – Tim Post May 6 '10 at 6:23

Finding a collision means I find two different inputs that result in the same hash.

Finding a preimage means finding an input that results in a single, specified hash. I.e., you hash something, and I know only the hash, and my challenge is to find a second input that results in the same hash as output.

The currently known attacks on MD5 allow generation of collisions, but do not allow the generation of a second preimage.

On the other hand, the fact that MD5 has been broken to the extent necessary to find collisions easily [1] gives a fairly strong indication that finding a second preimage is much more likely than with another hash that hasn't been broken to the same degree. It's possible that nobody will ever develop a preimage attack against MD5 -- but then again, it's also possible that somebody already has, and is putting it to use for their own nefarious ends rather than publishing it.

  1. There's a huge difference between the breaks of MD5 and SHA-1: the break against MD5 makes it trivial for almost anybody to create colliding inputs. If you have an old (e.g. Pentium IV) computer lying around mostly unused, it's sufficient to find colliding inputs at a rate of something like one pair per hour or so (and the "or so" mostly translates to "or very likely faster"). The break against SHA-1 is right at the border between theoretical and practical. As close as I can figure it, you'd need to spend something like a million (US) dollars to build a machine that could find one collision per week.
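
To make the difference between the two searches concrete, here is a toy sketch. It does not use any real attack on MD5. Instead it assumes a deliberately weakened hash (MD5 truncated to a few hex characters) so that plain trial and error finishes quickly. The function names and the "msg-N" inputs are made up for illustration. For an n-bit output, a generic collision search needs on the order of 2^(n/2) tries, while a generic second-preimage search needs on the order of 2^n.

```php
<?php
// Toy hash: keep only the first $hexChars hex characters of MD5.
function truncatedMd5(string $input, int $hexChars): string
{
    return substr(md5($input), 0, $hexChars);
}

// Collision: ANY two different inputs with the same digest.
// Returns [first input, second input, number of inputs tried].
function findCollision(int $hexChars): array
{
    $seen = [];
    for ($i = 0; ; $i++) {
        $input = "msg-$i";
        $digest = truncatedMd5($input, $hexChars);
        if (isset($seen[$digest])) {
            return [$seen[$digest], $input, $i + 1];
        }
        $seen[$digest] = $input;
    }
}

// Second preimage: a different input matching ONE given input's digest.
// Returns [matching input, number of inputs tried].
function findSecondPreimage(string $original, int $hexChars): array
{
    $target = truncatedMd5($original, $hexChars);
    for ($i = 0; ; $i++) {
        $input = "msg-$i";
        if ($input !== $original && truncatedMd5($input, $hexChars) === $target) {
            return [$input, $i + 1];
        }
    }
}

// 4 hex characters = 16 bits, small enough for both searches to finish fast.
[$a, $b, $collisionTries] = findCollision(4);
[$c, $preimageTries] = findSecondPreimage('hello', 4);
echo "collision: $a / $b after $collisionTries tries\n";
echo "second preimage of 'hello': $c after $preimageTries tries\n";
```

On a typical run the collision should turn up after far fewer tries than the second preimage, which is the gap the answer describes. The published MD5 attacks shorten the collision search further and leave the second-preimage search essentially untouched.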
deleted Nov 24 '11 at 18:33
Col. S: comments like that get this question closed – Jason S May 4 '10 at 20:59
@Jason it doesn't really matter as there are no answers anyway. – Col. Shrapnel May 4 '10 at 21:05

Assuming a password of a maximum of 20 characters and an alphabet of Latin characters and numbers, there are as many as 5e+35 possible strings.

I wrote a very simple program that attempts to generate every possible string and then compares the salted hash against the one you presented.

It will take roughly 16,1e+22 (16+22 zeros) years to calculate all hashes with this program. It's a lot of time, of course, so yeah, it seems to you that your hash is unbreakable. However, now, remember that there are collisions in MD5 hashes, right? So for every two strings that generate the same hash, you can reduce the total number of calculations by one. If it collides a lot -- then maybe we can find your pass in a feasible time... one week, maybe?

I will try running the application on my home desktop, which has more processing power, and maybe multi-thread it, even though I surely won't leave it running for a week.

UPDATE: Recalculated the total of years after improving the algorithm: 15e13 years, way below the last estimate. That's with a single thread (actually, two threads, one to do the calculations and another to give status reports every 30 sec...). Tomorrow I might try creating a thread model for this app.
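
For readers who want to see what such a program might look like, here is a minimal PHP sketch of an exhaustive search against a salted MD5 hash. It is not the program described above. The function name, the tiny alphabet and the example values are assumptions chosen so that it finishes instantly.

```php
<?php
// Try every string over $alphabet up to $maxLength characters and return the
// first one whose salted MD5 equals $targetHash, or null if none matches.
function bruteForceSaltedMd5(string $salt, string $targetHash, string $alphabet, int $maxLength): ?string
{
    $base = strlen($alphabet);
    if ($base === 0) {
        return null;
    }
    for ($length = 1; $length <= $maxLength; $length++) {
        $indices = array_fill(0, $length, 0); // odometer of alphabet positions
        while (true) {
            $candidate = '';
            foreach ($indices as $index) {
                $candidate .= $alphabet[$index];
            }
            if (md5($salt . $candidate) === $targetHash) {
                return $candidate;
            }
            // Advance the odometer, carrying leftwards on overflow.
            $pos = $length - 1;
            while ($pos >= 0 && ++$indices[$pos] === $base) {
                $indices[$pos] = 0;
                $pos--;
            }
            if ($pos < 0) {
                break; // every string of this length has been tried
            }
        }
    }
    return null;
}

// Example values, made up for this sketch.
$salt = 'demo-salt';
$target = md5($salt . 'cab');
var_dump(bruteForceSaltedMd5($salt, $target, 'abc', 4));
```

The work grows as (alphabet size)^(length), which is why the estimate above explodes for 20-character passwords while short passwords fall quickly.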

deleted Nov 24 '11 at 18:33
+1 for effort, but given the context of a password file, a better argument would be made if the OP provided you with a list of 10k hashes and said, "give me a collision for one". But in either case, you're just doing brute force cracking here. And no digest, SHA2 or otherwise, is going to resist that for long unless you increase the complexity of the passwords + the salt to create enough possibilities that your attacker can't just try them all. Weakening a hash is the process of reducing the search space far below the total number of combinations. And that's what's happening. – Jason May 4 '10 at 23:35
Read the article linked above: chargen.matasano.com/chargen/2007/9/7/… and especially some of the links out of that article. I think you'll realize that you can shave many orders of magnitude off of your current implementation. Threading will probably only ever shave one or two orders off of the time, but might illustrate how you'd perform the same operation on a bot net. However I will reiterate that this is still brute-forcing it, and the attacks are about getting below O(2^N) runtime. – Jason May 7 '10 at 21:43
@Jason: thanks... I will read the article. – Bruno Brant May 7 '10 at 22:21
@Jason: UP! Very f$%@$ cool article. :) – Bruno Brant May 7 '10 at 22:31
If you can guess word or character patterns that have been used to create the password, John The Ripper can get through md5 encrypted passwords pretty quickly. For fun I ran it on an oscommerce password table, and got roughly 70% of the passwords out in an hour! People really do use passwords like 'hunter2'. Of course, what helped was that the salt was kept in plaintext on the table. – Pete855217 Nov 24 '11 at 12:01

The weakness isn't that you can discover the value being hashed. The weakness is that if you know X and MD5(X), it's possible to construct a Y such that MD5(Y) == MD5(X). This means that it's possible to forge a message to match a signature. On top of that, it has a higher rate of collisions than other, stronger algorithms.

In general, you should use SHA-1 or better.

Read this post for more information.

deleted Nov 24 '11 at 18:33
You don't even have to know X. If you know MD5(X) it's possible to construct a Y such that MD5(Y) == MD5(X). – Stephen P May 4 '10 at 19:20
I am sorry, I wasn't asking for posts. I was asking for an example. – Col. Shrapnel May 4 '10 at 19:22
@Alex: -1. (And @Stephen P as well) You're stating incorrect information. MD5 is known to have collision attacks (possible to construct two messages X and Y such that MD5(Y)==MD5(X)) but not yet known to suffer from preimage attacks, where a hash and/or a message X has been given, and the attack is to construct a message Y where MD5(X) == MD5(Y). – Jason S May 4 '10 at 20:06

Using a single iteration of a simple hash, even with salt, is entirely inadequate for password hashing, as they're entirely too easy to brute-force even with a hash that isn't broken. Use a proper hash-strengthening scheme or key derivation function such as bcrypt, scrypt, PBKDF2, or Glibc's iterated SHA-2 family hashes.
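
As one possible way to do this in PHP, the sketch below uses the built-in password_hash() and password_verify() functions with bcrypt. These were added in PHP 5.5, so this assumes a newer PHP than was current when this answer was written. The wrapper names and the cost value of 10 are illustrative choices, not recommendations from the answer.

```php
<?php
// Hash a password with bcrypt. A random salt is generated and embedded in
// the returned string, so no separate salt column is needed.
function hashPassword(string $password): string
{
    return password_hash($password, PASSWORD_BCRYPT, ['cost' => 10]);
}

// Check a login attempt against the stored bcrypt string.
function verifyPassword(string $password, string $storedHash): bool
{
    return password_verify($password, $storedHash);
}

$stored = hashPassword('hunter2');
var_dump(verifyPassword('hunter2', $stored)); // bool(true)
var_dump(verifyPassword('hunter3', $stored)); // bool(false)
```

Raising the cost by one roughly doubles the work per guess, for the attacker and the server alike.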

deleted Nov 24 '11 at 18:33
Please don't take this as trolling, but I really don't understand why everyone says it's entirely too easy, yet no one shows an example. Okay, 3 lowercase alpha characters is easy. But what about 8 alphanumeric mixed case? – Col. Shrapnel May 4 '10 at 22:56
That's only a few hundred billion hashes. A modern video card can compute about a hundred million hashes per second. So you're talking a few hours for an exhaustive search of alnum^8. – hobbs May 4 '10 at 23:00
Whoops, that's not a modern video card anymore. A 3-year-old video card can do a hundred million hashes per second. A newer one will do a half-billion hashes per second. – hobbs May 4 '10 at 23:11
I had a time to think it over. Does this key derivation help against bruteforce? If so, it would be great, though I don't understand how it can be – Col. Shrapnel May 5 '10 at 7:51
Of course, that's the point. A good KDF is hard to parallelize in hardware, and makes computing the hash arbitrarily difficult -- meaning you can slow down a brute-force attack by a factor of 1,000 or 1,000,000 or whatever factor you like compared to a naïve hash. – hobbs May 5 '10 at 11:51
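
The figures in this thread can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation. The function name is made up, and the hash rates are the ones quoted in the comments above, not measurements. For 8 mixed-case alphanumeric characters the keyspace is 62^8, about 2.2 × 10^14 candidates, which is more than "a few hundred billion" and comes to roughly 25 days at 10^8 hashes per second, or about 5 days at 5 × 10^8. The hours-scale estimate is closer to a 36-symbol (single-case) alphabet, where 36^8 is about 2.8 × 10^12 and takes roughly 8 hours at 10^8 hashes per second.

```php
<?php
// Seconds needed to try every string of exactly $length symbols.
function exhaustiveSearchSeconds(int $alphabetSize, int $length, float $hashesPerSecond): float
{
    $candidates = $alphabetSize ** $length; // int while it fits, float beyond that
    return $candidates / $hashesPerSecond;
}

foreach ([36, 62] as $alphabetSize) {
    foreach ([1e8, 5e8] as $rate) {
        $hours = exhaustiveSearchSeconds($alphabetSize, 8, $rate) / 3600;
        printf("%d symbols, %.0e hashes/s: %.1f hours\n", $alphabetSize, $rate, $hours);
    }
}
```

Either way, the conclusion in the comments stands: a single fast hash leaves an 8-character password within reach of commodity hardware.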

You are using salted hashes, which is good, because it makes it nearly impossible to do a simple reverse-lookup on a hash, even for common words. MD5 has a higher chance of collisions versus SHA1 though. This means that it's more likely that you'll get the same hash for 2 different strings with MD5, although it's still extremely unlikely. However, sha1() is a drop-in replacement for md5(), so why wouldn't you use it?

deleted Nov 24 '11 at 18:33
Because SHA1 is broken as well: schneier.com/blog/archives/2005/02/sha1_broken.html – Jacco May 4 '10 at 19:16
Ok ok. Higher chance of collision. I've heard that nearly a thousand times. So, what's the password? – Col. Shrapnel May 4 '10 at 19:17
SHA-1 is not a drop-in replacement. It has 160 bits output, not 128. Have fun dropping that in without causing problems. – Joey May 4 '10 at 19:17
Or use sha256, since sha1 is also "broken" – Stephen P May 4 '10 at 19:18
@Jacco, 2^69 operations to find a single collision? THAT'S AN AWFUL LOT OF OPERATIONS. Fine, if you insist, SHA-512. Break that >.> – iconiK May 4 '10 at 19:20

You probably don't need to worry about it. The problem with MD5 is that somebody may be able to find another message that hashes to the same value, which only matters in cases where the hash is visible. This is a big problem with message authentication where you publish a message and say that it is valid if it has a specific MD5 hash. If I can find another message that has the same hash, I can pass it off as valid.

In your case, however, it looks like you're hashing passwords. If you keep the hash value private, there's no way to create a collision, so you'll be safe.

deleted Nov 24 '11 at 18:33
Unless someone gains access to the database where all of your hashed passwords are stored, but you'll likely have bigger problems at that point. – Justin Johnson May 4 '10 at 19:40
@Justin Johnson you might have bigger problems, but your users' biggest problem will be the fact that you leaked their password. You could at least try to protect them adequately by using good hashing practices. – hobbs May 5 '10 at 3:04
@hobbs: How does leaking a salted MD5 hash of a password leak a user's password? – Gabe May 5 '10 at 11:36
hobbs: How do you know which of the infinite messages that hash to the same MD5 as my password is actually my password? – Gabe May 5 '10 at 16:06
@hobbs: Given that there's no feasible preimage attack on MD5 yet, you're better off with a dictionary attack on the web site -- you don't even need access to the password database! If your concern is that MD5 is just too fast, Col. Shrapnel can make it take 1000x longer just by hashing 1000 times in a loop, so why bother with a whole other algorithm? – Gabe May 6 '10 at 2:14
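
The "hash it 1000 times in a loop" idea from the comment above can be sketched as follows, next to PHP's built-in hash_pbkdf2() (PHP 5.5 or later), which does the same kind of stretching in a standardized way. The function names, the choice of SHA-256 and the way the loop feeds the salt back in are assumptions made for illustration. The comment does not specify any of them.

```php
<?php
// Naive stretching as described in the comment: re-hash the digest $rounds times.
function stretchedMd5(string $salt, string $password, int $rounds): string
{
    $digest = md5($salt . $password);
    for ($i = 1; $i < $rounds; $i++) {
        $digest = md5($salt . $digest);
    }
    return $digest;
}

// The same idea via PBKDF2; returns 64 hex characters (a 32-byte key).
function pbkdf2Sha256(string $salt, string $password, int $iterations): string
{
    return hash_pbkdf2('sha256', $password, $salt, $iterations, 64);
}

echo stretchedMd5('example-salt', 'example-password', 1000), "\n";
echo pbkdf2Sha256('example-salt', 'example-password', 1000), "\n";
```

Both make each guess about 1000 times more expensive. The second form has the advantage of being a published construction rather than a home-grown loop.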

Here's a decent practical explanation of why letting people know your salt is a horrible idea:

  1. Go to http://www.md5decrypter.com/
  2. Enter: 99e9446e78aac2056d3903e1adb8fbcd and the Recaptcha
  3. Hit Decrypt.
  4. BOOM goes the dynamite!

Results
Md5 Hash: 99e9446e78aac2056d3903e1adb8fbcd
Normal Text: #bh35^&Res%s4mep8ss

Salt: #bh35^&Res% Pass: s4mep8ss
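
A lookup like this presumably boils down to matching the hash against a large set of precomputed or guessed inputs. With the salt known, the same thing can be sketched locally as a dictionary attack. The function name and the four-word list below are stand-ins invented for this example. A real attack would use a word list with millions of entries.

```php
<?php
// Return the first candidate whose salted MD5 matches $targetHash, else null.
function dictionaryAttack(string $salt, string $targetHash, array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        if (md5($salt . $candidate) === $targetHash) {
            return $candidate;
        }
    }
    return null;
}

$salt = '#bh35^&Res%';
$wordList = ['password', 'letmein', 'hunter2', 's4mep8ss']; // tiny stand-in list
$target = md5($salt . 's4mep8ss'); // computed here so the sketch checks itself
var_dump(dictionaryAttack($salt, $target, $wordList));
```

The salt only prevents reuse of tables built without it. Once it is known, each guess costs one MD5 call.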

deleted Nov 24 '11 at 18:33
Just needs to be in a rainbow table. Instructions to create your own here: project-rainbowcrack.com/tutorial.htm – evan Nov 1 '11 at 8:10
Individually yes, but having each in a different system that requires different methods for hacking, no. SQL injection to get your hashes is one thing. Access to your system to get your salt is another. Besides, you only asked for an example - they state on their site that they only hold 15M hashes, not all of them. – evan Nov 1 '11 at 8:14
I'm looking to get past theoretical reasoning, and come up with a simple example that we can use to demonstrate the problem(s) with MD5. I gave you your example. Don't get angry when I've done exactly what you've asked. – evan Nov 1 '11 at 8:25
The password was posted by me a half-year ago. So, there is no use to come up with the same password. If you want to start over - here is another hash for you. – Col. Shrapnel Nov 18 '11 at 6:00

I'll come at this a different way. A lot of companies use MD5 hashes to "sign" their files, or assert that a file with a given hash is unique from another file. They base their entire systems on this, especially with respect to file deduplication, or single instance storage.

Now given the fact that you can have different files with the same hash (see these examples), what possible faith can you put into a system that asserts that no two files exist with the same MD5 hash?

Edit to answer comments:

Let's assume a few things, to take it out of the context of the mathematical realm, and place it into the context of the original question "Why is the use of MD5 bad in practice?"

Say your company is involved in litigation, and the opposing party demands any and all documents relating to "X". You go and buy some software that will crawl all your storage locations and catalog the billions of files and emails and attachments, generating an MD5 hash for each. You then exclude all "duplicate" files based on the MD5 hash, and produce the rest of the relevant documents to opposing counsel.

Now say the opposing counsel is a bit of an "enthusiastic" litigator, and wants to cast doubt that your company actually met its obligations, specifically in the trustworthiness of using MD5 as a deduplication mechanism. The opposing party is going for your company's throat, wanting the judge to impose some hefty sanctions, or even a summary judgement.

So if you were to go in front of a court in a litigation setting, where your company was under penalty of such sanctions, your defense would be, yes, using MD5 is fine, because:

You need to distinguish among the cases that

  • (a) hash collisions can happen (albeit with extremely small probability),
  • (b) two files can be intentionally constructed to cause a collision (this is a "collision attack", it's possible with MD5),
  • (c) an arbitrary file can be intentionally constructed to cause a collision with another file (preimage attack, not known for MD5)...
  • (d) that a hash collision w/o intentional construction of files is likely to happen (which is not true... you'd need approx 2^64 different files to have a likely collision in a 128-bit hash.)

To which the litigator would likely respond:

  • (a1) Is it possible that two different files can have the same MD5 hash?

    (your answer would have to be yes)

  • (b1) Do you know if there are any examples of two different files that have the same MD5 hash?

    (again, your answer would have to be yes)

At this point, you have lost support in the eyes of most judges. It is now up to your legal team to steer the course back onto the "MD5 is fine" track. I'd rather not be in that position in the first place. At least with SHA-256 or other longer hashes, you can answer "No" to (b1). And thus, the whole point to the question: "Why is the use of MD5 bad in practice?"

deleted Nov 24 '11 at 18:33
You can always have different files with the same hash: any practical hash is a function with domain space much larger than range space, so there will always be many inputs mapping into the same output. You need to distinguish among the cases that (a) hash collisions can happen (albeit with extremely small probability), (b) two files can be intentionally constructed to cause a collision (this is a "collision attack", it's possible with MD5), (c) an arbitrary file can be intentionally constructed to cause a collision with another file (preimage attack, not known for MD5)... – Jason S Jun 11 '10 at 13:27
...and (d) that a hash collision w/o intentional construction of files is likely to happen (which is not true... you'd need approx 2^64 different files to have a likely collision in a 128-bit hash.) For further reference see RFC4270 tools.ietf.org/html/rfc4270 – Jason S Jun 11 '10 at 13:29
Depending on the document format used I'd probably be able to answer the judge that no meaningful document could be produced that way (MD5 collision generator likes to generate files with bad bytes). – Joshua Jun 11 '10 at 19:07
And then the opposing counsel would retort with these two files: th.informatik.uni-mannheim.de/people/lucks/HashCollisions/… th.informatik.uni-mannheim.de/people/lucks/HashCollisions/… (use this to view: view.samurajdata.se) – GalacticJello Jun 11 '10 at 20:27
The point is, you don't need to put yourself in that position in the first place, as the original question is asking "Why is it a bad practice?" If you are designing a new system, don't add that risk. It's following a "best practice" that experts have been saying for years: don't use MD5. – GalacticJello Jun 11 '10 at 20:30

The first step is admitting the problem which you are doing by writing this question. That's good.

Now step slowly away from the broken algorithm and use one of the many fine alternatives. If you really want to be forward looking, use one of the new candidate hashes submitted to NIST such as Skein.

OTOH your simplest and most widely implemented bet is probably just SHA-2 at 256 bits. That's a wonderful middle ground of size and strength. I'd advise against using SHA-1 at this point as it will be the next to fall. New code should anticipate the next move.

WRT MD5, that has pretty much been answered by others. Safe to say major X.509 certificate implementations still depend on that old beast, but anyone writing new code should use something that isn't known to be broken. The harder question is how to get rid of MD5 from old code, it's sort of like the year 2000 problem except it has no end date so it ends up just being a multi-decade security flaw happening over and over again.

deleted Nov 24 '11 at 18:33

Three words: Fixed prefix attack.

deleted Nov 24 '11 at 18:33

Collision space aside, MD5 is a fast hashing algorithm - if I want to brute force it (even with salting) then it takes less effort to do than an equivalent in SHA1, SHA-256 or the like.

Your objective is to make it computationally expensive and slow to generate collisions; using a unique salt per value, and using a slower hashing algorithm, both work towards that goal.

deleted Nov 24 '11 at 18:33

MD5 is broken. Search Google for MD5 collisions.

deleted May 23 '10 at 20:59
-1, you're not distinguishing between preimage attacks and collision attacks. – Jason S May 4 '10 at 20:11
"Google it" is not an answer... – jheriko May 4 '10 at 23:01
@jheriko: "Google it" is an answer, if the question is of a typical "homework" kind. But in this case, I agree with you. – Boldewyn May 12 '10 at 13:13
