You Shall Not Password
Password security seems to be a weirdly controversial topic. Obviously, when compliance is at play and a company needs to prove to its various stakeholders that it's taking security seriously, it's better to overdo it than underdo it. That said, there are a lot of myths out there about password security which are creating a lot of unnecessary fuss (or, in some cases, actively decreasing people's security) so I'm hoping to clear the air a bit as to what practices I think work and why.
Since this matters to everyone, I'm going to try to make it accessible to the layman by giving a high-level explanation of how the system works and the common approaches used by hackers to crack passwords before I launch into my view on best practices.
So how do passwords actually work?
Duh. The website you're logging into has a copy of your password. When you log in, it looks up the password for the username/email address you've given - and if the password matches, great!
Is it really that simple? Of course not. Whilst yes - the service you're logging into obviously has to keep some kind of record of your password, it will never do so (or, should never do so) in 'plaintext'. In other words, no website should ever store your password in a form that can be read directly. A database table like the one below should never appear anywhere.
| | User 1 | User 2 |
|---|---|---|
| Username | p.griffin | homer.jay.s |
| Name | Peter Griffin | Homer Simpson |
| Password | Apple6 | password |
This is because...
If the database is breached, hackers will be able to see your password directly and use it to log into the website (and possibly others, if the password has been re-used).
The service themselves might leak it to the wider internet through either malice or incompetence.
The owner of the database could be blackmailed by hackers or otherwise compelled by authorities to give up the information.
...etcetera etcetera. It's much safer for the site not to directly save it at all.
So they don't store the password directly. What do they do instead?
It needs to be protected with some form of cryptography. So before I answer that question, I'd like to briefly outline the three types of cryptography...
Symmetric (secret-key) cryptography - uses a cryptographic key to turn the plaintext into ciphertext, meaning: turning human-readable text into gibberish. This gibberish has no useful meaning until it is converted back into the human-readable text using the same key that was used to encrypt it.
Asymmetric (public-key) cryptography - uses a cryptographic key ('public key') to turn plaintext into ciphertext, but a different key ('private key') to convert it back into plaintext. The advantage over symmetric cryptography is that no secret ever has to be shared: the public key can be handed out freely and the private key never needs to leave its owner, which makes it easier to prevent leaks. The trade-off is that it's much more computationally expensive.
Hashing algorithms - these are algorithms that convert plaintext into a hash: a non-human-readable string of gibberish that cannot be converted back into the plaintext. No key is needed.
The implementation details of the above algorithms are beyond the scope of this article but this should be enough context for our discussion about passwords (especially if you're wondering why that last one is ever useful).
Seriously? Turning something into gibberish that can't be read or turned back?
There are some important things to note about hashing algorithms.
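They are deterministic; that is, if you use the same algorithm on the same input, you will always get the same output. Similarly, different inputs should, for all practical purposes, give different outputs (two inputs sharing a hash is possible in theory, but a good algorithm makes such collisions infeasible to find). The output should also not be in any way reminiscent of the input data.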
They are very much irreversible; the information has been jumbled to the point where it can't be recovered, a bit like how we can't turn mince meat back into the cow.
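When used for passwords, they should take a fairly large amount of computing power (and thus, a reasonably long time, by computing-standards) to calculate, especially if trying to calculate them en-masse. General-purpose hashing algorithms are built to be fast, so this property has to be chosen or added deliberately.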
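To make the first two properties a bit more concrete, here's a minimal sketch using SHA-256 from Python's standard library. SHA-256 is my choice for illustration only (this post doesn't depend on any particular algorithm), and the helper name `sha256_hex` is made up. Note that SHA-256 is a fast general-purpose hash, so it says nothing about the 'slow to calculate' property.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of `text` as a hex string (illustrative helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("Apple6")
second = sha256_hex("Apple6")
nearby = sha256_hex("Apple7")  # differs from the first input by one character

print(first == second)  # deterministic: same input, same output
print(first == nearby)  # a tiny change in input gives a different hash
print(first)
print(nearby)           # compare by eye: the two hashes look unrelated
```

The two printed hashes share no obvious pattern even though the inputs are almost identical, which is what 'not reminiscent of the input data' means in practice.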
With this in mind, it turns out that the only way to reverse a hashing function is to guess the starting text, perform the hash, and repeat this until you get a match. This might involve going through every string from 'aaaaaaaa' to 'ZZZZZZZZ' (plus numbers, special characters, strings of different lengths...that's a heck of a lot of combinations). This is called brute force and is a horribly inefficient way of trying to undo hashing functions.
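As a rough sketch of what brute force looks like, the snippet below walks through every lowercase string up to a given length until one hashes to the target. The function names, the choice of SHA-256 and the tiny three-letter search space are all assumptions I've made to keep it quick to run; a real attack covers a vastly larger space.

```python
import hashlib
import itertools
import string
from typing import Optional


def sha256_hex(text: str) -> str:
    """Hash `text` with SHA-256 (an arbitrary choice for this illustration)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, alphabet: str, max_length: int) -> Optional[str]:
    """Try every string over `alphabet` up to `max_length` characters long."""
    for length in range(1, max_length + 1):
        for letters in itertools.product(alphabet, repeat=length):
            guess = "".join(letters)
            if sha256_hex(guess) == target_hash:
                return guess  # found an input that produces the target hash
    return None  # nothing in the search space matched


target = sha256_hex("cab")  # pretend this hash was stolen from a database
found = brute_force(target, string.ascii_lowercase, max_length=3)
print(found)
```

Every extra character multiplies the number of candidates by the size of the alphabet, which is why this approach stops being practical so quickly.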
So how does this apply to passwords?
Websites should thus store the hash of your password and not the password itself. That way, when you try to log into the website, it can compare the hash of the submitted password with the hash stored in the database and then decide whether or not to authenticate you. Additionally, anyone who has access to the database (legitimately or otherwise) is not able to directly see your password. A table like the one following, whilst not perfect, is substantially more secure than the one shown above.
| | User 1 | User 2 |
|---|---|---|
| Username | p.griffin | homer.jay.s |
| Name | Peter Griffin | Homer Simpson |
| Password* | Apple6 | password |
| Hash | f3235e3ba4ce... | 5f4dcc3b5aa7... |
*this row is only shown here for demonstrative purposes and wouldn't appear in a (well-implemented) production system.
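Here's a minimal sketch of that register-and-login flow. The `users` dictionary is a hypothetical stand-in for a database table, the function names are mine, and SHA-256 is again only a placeholder algorithm. Like the table above, this version is unsalted and hashes only once, so treat it as an illustration of the idea and not something to deploy.

```python
import hashlib
import hmac

# Hypothetical stand-in for a database table: username -> stored hash.
users = {}


def hash_password(password: str) -> str:
    """Hash the password; only this value is ever stored."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    """Store the hash of the password, never the password itself."""
    users[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False  # unknown user
    # compare_digest compares in constant time to avoid leaking timing hints.
    return hmac.compare_digest(stored, hash_password(password))


register("p.griffin", "Apple6")
print(login("p.griffin", "Apple6"))  # correct password
print(login("p.griffin", "apple6"))  # wrong password
print(users)                         # only the hash is kept
```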
Passwords can't be recovered from the hash. Does that mean we're safe?
The short answer: no, not without a further layer of protection. Let's assume the attacker knows which hashing algorithm you're using. There's a chance they have a pre-computed lookup table of some sort for that algorithm: a huge database of billions of possible passwords (especially common ones) and their hashes. At that point they simply have to check if the stolen password hash matches anything in their table. Luckily, there are still some tuning knobs we can play with to make things trickier for them.
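To show how little effort a lookup table takes, here's a toy version. I'm using MD5 purely because the '5f4dcc3b5aa7...' entry in the table above matches the start of the MD5 hash of 'password'; MD5 is not suitable for real password storage. The four-entry password list is obviously a stand-in for the billions of entries a real table would hold.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 hash as hex. Used only for illustration; do not store passwords this way."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Tiny illustrative list; a real attacker's list would be enormous.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# Pre-compute once: hash -> password. This is the 'lookup table'.
lookup_table = {md5_hex(p): p for p in common_passwords}

stolen_hash = md5_hex("password")        # what the attacker finds in the database
cracked = lookup_table.get(stolen_hash)  # a single dictionary lookup, no guessing

print(stolen_hash)
print(cracked)
```

The expensive part (hashing every candidate) is done once, in advance, and then reused against every stolen database that uses the same unsalted algorithm.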
Why only hash once? If we take the output of a hashing algorithm and simply run the same algorithm on it again, we double the amount of work to calculate the hash. If we do this 10 times (known as the number of 'iterations'), we've increased the amount of effort of a brute force attack by a factor of 10. The right number depends on the algorithm: for PBKDF2 with HMAC-SHA512, for example, OWASP's recommendation is at least 210,000 iterations.
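A naive sketch of the iteration idea is below: feed the output of the hash back into itself a set number of times and watch the cost grow with the count. The function name and the use of SHA-256 are assumptions for illustration. In practice you'd reach for a purpose-built function (Python's standard library offers `hashlib.pbkdf2_hmac`, for instance) instead of rolling your own loop.

```python
import hashlib
import time


def iterated_hash(password: str, iterations: int) -> str:
    """Hash the password, then keep re-hashing the result `iterations` times in total."""
    digest = password.encode("utf-8")
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


for count in (1, 10_000, 100_000):
    start = time.perf_counter()
    iterated_hash("Apple6", count)
    elapsed = time.perf_counter() - start
    print(f"{count:>7} iterations took {elapsed:.4f} seconds")
```

The legitimate site pays this cost once per login, which nobody notices. The attacker pays it once per guess, which adds up very quickly.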
Additionally, we can 'salt' the password to effectively make lookup tables useless.
Salted passwords - good enough to eat
Imagine if, when a user registers and sets their password, we generate a nonsense string (not even a hash; just random noise) and add it to the password when it is set. The stored hash is then the result of running the hashing function on the password + salt combination, where the salt can also simply be stored alongside the hash in plaintext. When the user later tries to log in, we simply add the random salt to the provided password, take the hash of the two and compare it to the saved value.
| | User 1 | User 2 |
|---|---|---|
| Name | Peter Griffin | Homer Simpson |
| Password** | Apple6 | password |
| Salt | ii3vd64c... | 7HSx87oY... |
| Pre-hash** | Apple6ii3vd64c... | password7HSx87oY... |
| Hash | 165fbb651e0b... | 6e35dd11b4a7... |
**as above.
Don't worry if this concept isn't entirely intuitive to you. Just imagine if 10 users have the same password. Without a salt, they'd all have the same password hash and cracking this password once would endanger all 10 of them. Having a different salt for every user stops this being a problem, and also means pre-calculated lookup tables now need to have every possible password along with every possible salt, which will likely take more storage than currently exists on the planet. In this case, even Homer Simpson's dumb choice of using 'password' as his password won't immediately get caught because the attacker's lookup table would need to have entries for 'password' concatenated with every possible salt string. The hacker needs to pick a user and start guessing passwords again (perhaps they'll start by guessing 'password').
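Here's a rough sketch of salting in code, under a few assumptions of my own: it uses PBKDF2 with HMAC-SHA512 from Python's standard library, a 16-byte random salt, and the 210,000 iteration figure mentioned earlier in this post. PBKDF2 mixes the salt in for us instead of literally gluing it onto the end of the password as in the table, but the effect is the same. The function names are made up.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 210_000  # assumed value, taken from the figure quoted earlier


def new_record(password: str) -> Tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, hash) for storage."""
    salt = secrets.token_bytes(16)  # random noise, stored in plaintext
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


# Two users who both chose 'password' end up with different stored hashes.
homer_salt, homer_hash = new_record("password")
other_salt, other_hash = new_record("password")

print(homer_hash.hex()[:12], other_hash.hex()[:12])
print(homer_hash == other_hash)                      # different salts, different hashes
print(verify("password", homer_salt, homer_hash))    # correct password still logs in
```

Because each user's salt is different, cracking one hash tells the attacker nothing about anyone else's, even when the passwords are identical.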
Conclusion - therein lies the catch
Ever wondered why security experts get so passionate about not using words like 'password' for your password? We understand that in order to break a password hash, the attacker has to repeatedly guess what the password is, run the hashing function, and see if it matches. The vast majority of the roughly three sextillion possible 12-character alphanumeric strings are extremely unlikely to be anyone's password (and that's without even accounting for longer strings or special characters), so it would be folly to try to literally check them all. Attackers are going to reduce their search space by trying to predict how we, the targets, might behave.
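For a sense of scale, the arithmetic behind that 'three sextillion' figure is sketched below. The guessing speed of one billion hashes per second is purely a number I've assumed to make the sums easy; real speeds vary enormously with the hardware and the hashing algorithm.

```python
import string

alphabet = string.ascii_letters + string.digits  # a-z, A-Z, 0-9: 62 characters
length = 12
total = len(alphabet) ** length  # every possible 12-character alphanumeric string

GUESSES_PER_SECOND = 1_000_000_000  # hypothetical attacker speed (an assumption)
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

years = total / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"{len(alphabet)} characters, {length} positions: {total:.2e} candidates")
print(f"About {years:,.0f} years to try them all at the assumed speed")
```

Compare that with a list of a few million common passwords, which the same hypothetical attacker would get through in a fraction of a second. That's the gap they're exploiting when they guess 'password' first.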
Guessing common passwords is quite typical but there are several other strategies often used by hackers to crack passwords. We'll explore that side of things, along with good practices to defend against them, in the next post!
-tommy
Thanks for reading! If you enjoyed reading this post and/or learned something, please get in touch and let me know. Better yet, if you've found any errata in my blog posts, please do make me aware. I'm always looking for opportunities to improve my writing!