Something that gets talked about a lot on the internet is password security and password hygiene.
Everyone has heard that it’s best practice to use long passwords that are unique and complex, but I want to write a short post about why those things are important, how passwords are stored by websites, and what that means for you.
The first thing to mention is USE PASSWORD MANAGERS. It’s the most important thing for keeping your accounts secure! If you don’t know what they are, they’re software that helps you generate and store passwords securely on your computer. You can google them; there are quite a lot of popular ones, such as LastPass and KeePass.
Now that that’s been said, on with the show.
Let’s focus on how passwords are stored by websites, because I think that’s something that doesn’t get explained enough. This is going to be a long read, because we’re going to have to talk about the history of storing passwords too.
To do that we’ve got to talk about hashing and how malicious people actually get your password when websites get hacked.
The first part of this is explaining hashing algorithms. There are a lot of different hashing algorithms, and whole bodies of literature and study are devoted to them. For the purposes of this post, though, we’re going to (very much) simplify them as a brilliant bit of math that turns something (in this case a password) into a bit of data that is, for all practical purposes, unique to what it was computed from. This data is called a hash and, most importantly, the math cannot be reversed. You can’t take the hash and do some math on it to turn it back into your password.
An example of this can be seen here. I put “password” through the SHA1 hashing algorithm:
echo -ne "password" | sha1sum
5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8  -
That long string of characters starting with 5ba is the hash. Theoretically there is no way to reverse the algorithm to get “password” back out of it.
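For anyone who prefers code, the same computation can be sketched in Python with the standard library’s hashlib module (sha1_hex is just a helper name made up for this example). Changing a single letter of the input gives a completely different hash.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of text as hex, the same digits sha1sum prints."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

print(sha1_hex("password"))  # 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
print(sha1_hex("Password"))  # one letter changed: a completely different hash
```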
Back in the day, passwords were stored in “clear-text”, i.e. essentially just stored somewhere on the computer in the hope that nobody malicious would be able to access them. Hackers, however, are wily creatures, and obviously, they did. So someone hit upon the idea of using hashing algorithms. Instead of storing your password, you could now store its hash. When somebody logs in, you turn the password they typed into a hash and check whether it matches the stored hash. Hurray, passwords were solved.
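Here is a minimal sketch of that scheme in Python, assuming an in-memory dictionary as a stand-in for a real user database (the users, register and login names are hypothetical). It uses plain, unsalted SHA-1 only to mirror the paragraph; as the rest of this post explains, this is not how passwords should be stored today.

```python
import hashlib
import hmac

# Hypothetical "user database": username -> stored SHA-1 hash of the password.
users: dict[str, str] = {}

def register(username: str, password: str) -> None:
    # Only the hash is stored, never the password itself.
    users[username] = hashlib.sha1(password.encode()).hexdigest()

def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Hash what the user typed and compare it with the stored hash.
    attempt = hashlib.sha1(password.encode()).hexdigest()
    # compare_digest compares in constant time, so timing doesn't leak how much matched.
    return hmac.compare_digest(attempt, stored)

register("alice", "password")
print(login("alice", "password"))  # True
print(login("alice", "hunter2"))   # False
```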
Again though, hackers are wily creatures, and we set about ruining everyone’s day here too. We found ways to recover the plaintext passwords again, by “cracking” hashes. There are a lot of ways to do this, but broadly, it comes down to putting lots of possible passwords through the same hashing algorithm and seeing if any of the resulting hashes match. Sometimes this is done “live”, and sometimes it’s done with a big precomputed list of hashes and their associated passwords; this big list is called a “rainbow table” (strictly speaking, a rainbow table is a cleverly compressed form of such a list, but the idea is the same), and it’ll become important in a little bit. Another little digression that will become important: hashing algorithms weren’t originally designed for storing passwords. They are deliberately easy and fast for computers to calculate, so we attackers can turn a large number of possible passwords into hashes very quickly.
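To make the precomputed-list idea concrete, here is a small Python sketch using a plain lookup table (not a true rainbow table, which uses hash chains to save space) over a made-up five-word wordlist. Once the table is built, cracking any unsalted hash in it is a single dictionary lookup, including the SHA-1 hash of “password” from earlier.

```python
import hashlib

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# A tiny, made-up wordlist; real ones contain millions of entries.
wordlist = ["123456", "qwerty", "letmein", "password", "digital"]

# Precompute the table once: hash -> the password that produced it.
lookup = {sha1_hex(word): word for word in wordlist}

# Any unsalted SHA-1 hash from a breach can now be "cracked" with one lookup.
stolen_hash = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
print(lookup.get(stolen_hash, "not in table"))  # password
```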
Obviously, once hashing became popular, a lot of people started doing it, and a thriving online economy for cracking hashes developed. Rainbow tables, those large lists of passwords, started appearing – you can google “online hash cracking service”, and for the most part those are websites where you put in your hash and they check for a match in their rainbow tables. To combat this, people introduced a new concept called a “salt”. In very, very simple terms, a salt is a random set of characters, unique to each password, that gets stored in clear-text. The hash that gets stored is actually the result of hashing your password combined with the salt. It’s basically a way to make your password a little longer and more complex, so that it won’t be as easily cracked. Because the salt is stored in clear-text, it’s a little imperfect.
Let’s go back to our previous example, where I showed a hash for “password”. Now, what’s happening looks more like this:
echo -ne "j4hm31password" | sha1sum
j4hm31 is the salt, and the website knows that. Every time you log in, it adds the salt to your password before hashing it, and then compares the result with the stored hash. If they match, you log in. But if somebody gets access to the salts, they can still try to crack the hash by prepending “j4hm31” to a bunch of possible passwords. It means that they’ve got to attack each hash separately, instead of simply running a bunch of words through the algorithm once and seeing what matches (and a precomputed rainbow table is useless, because it wasn’t built with that salt). It’s an improvement on before! But it’s still not perfect, because the algorithms we are using are still very quick to compute, so we can still try a lot of different combinations very quickly, and that leads us to the next chapter in our saga.
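A minimal Python sketch of salting, still with fast SHA-1 to match the example (the helper names are hypothetical, and a 16-character random hex salt is an assumption; real systems vary). Two users with the same password end up with different stored hashes, which is exactly what stops one precomputed table from cracking them all.

```python
import hashlib
import hmac
import secrets

def hash_with_salt(password: str, salt: str) -> str:
    # Prepend the salt to the password before hashing, as in "j4hm31password".
    return hashlib.sha1((salt + password).encode()).hexdigest()

def new_record(password: str) -> tuple[str, str]:
    # A fresh random salt per password, stored in clear-text next to the hash.
    salt = secrets.token_hex(8)
    return salt, hash_with_salt(password, salt)

def check(password: str, salt: str, stored_hash: str) -> bool:
    return hmac.compare_digest(hash_with_salt(password, salt), stored_hash)

salt_a, hash_a = new_record("password")
salt_b, hash_b = new_record("password")
print(hash_a == hash_b)                   # False: different salts, different hashes
print(check("password", salt_a, hash_a))  # True
```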
At this point in the story, computers are getting faster and faster. Hashing algorithms that were considered hard enough to attack a few years ago are now very, very easy to compute. One of the most famous and earliest is MD5. When it was introduced as a password hashing algorithm, it was considered “hard enough” for computers to attack. Let me show you how many attempts my computer (a mid-range laptop, without using any specialised processing, which would make it MUCH faster) can make per second.
1g 0:00:00:00 DONE 2/3 (2019-10-31 11:28) 413833pp/s
That’s 413,833 password guesses a second – and it’s actually capable of much more (at least two hundred million guesses a second), but the MD5 hash of “digital” got cracked within that first second, before it could really ramp up.
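If you want a feel for how cheap a fast hash is, here is a rough Python sketch that counts how many MD5 hashes a single-threaded loop manages in about a second. Treat the number as illustrative only: pure Python is far slower than a dedicated cracking tool, and the result depends entirely on your machine.

```python
import hashlib
import time

def md5_guesses_per_second(seconds: float = 1.0) -> float:
    """Roughly measure how many MD5 hashes this Python process computes per second."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        # Hash a made-up candidate password; its content doesn't matter for timing.
        hashlib.md5(f"guess{count}".encode()).digest()
        count += 1
    return count / (time.perf_counter() - start)

print(f"{md5_guesses_per_second():,.0f} MD5 hashes per second")
```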
Obviously, the hashing algorithms weren’t working out – they were very fast to crack, so we needed to create something new. Before we get to that, we need to talk about what all of this means for you, and why you need long, complex passwords that aren’t reused.
Let’s go back to attacking password hashes. There are a few ways of doing this. There are rainbow tables, which we talked about earlier. There are also brute force attacks, where you try every single possible combination of characters from, say, “a” all the way up to “zzzzzzzzzzzzzzzzzz”. These take a lot of time, but as computers get faster, it becomes feasible to brute force all passwords up to a certain length for some algorithms. I haven’t experimented with it, but I imagine I could brute force all six-character combinations for MD5 within a few minutes on my computer, and it’s not a specialised, fast one made for cracking passwords.
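The brute-force arithmetic is simple enough to sketch: the number of guesses is the size of the character set raised to the password length. This Python snippet assumes the roughly 200 million MD5 guesses per second quoted above; real rates vary enormously between hardware and algorithms.

```python
# Assumed guess rate: the ~200 million MD5 guesses/second figure quoted above.
GUESSES_PER_SECOND = 200_000_000

# Sizes of a few common character sets.
CHARSETS = {"lowercase": 26, "letters+digits": 62, "printable ASCII": 95}

def seconds_to_exhaust(alphabet_size: int, length: int) -> float:
    """Worst-case time to try every string of exactly `length` characters."""
    return alphabet_size ** length / GUESSES_PER_SECOND

def describe(seconds: float) -> str:
    """Turn a number of seconds into a rough human-readable duration."""
    if seconds < 3600:
        return f"{seconds:,.0f} seconds"
    if seconds < 365 * 24 * 3600:
        return f"{seconds / 3600:,.1f} hours"
    return f"{seconds / (365 * 24 * 3600):,.0f} years"

for name, size in CHARSETS.items():
    for length in (6, 8, 12):
        print(f"{name:>15}, {length:2} chars: {describe(seconds_to_exhaust(size, length))}")
```

For six characters of letters and digits, 62^6 is about 56.8 billion combinations, or roughly 284 seconds at that rate, so “a few minutes” is a plausible guess. Each extra character multiplies the work by the size of the character set, which is why length matters so much.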
Another way to crack passwords is to use wordlists or dictionaries, which are exactly what they sound like: big files that contain lots of likely candidate passwords – words, phrases, and names. Building dictionaries is a bit of a science, which we won’t get into here, but you can make very effective dictionaries that crack even unusual passwords, for example ones with numbers added to the end, like “interruption2019”.
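Here is a toy Python sketch of a dictionary attack with a couple of “mangling” rules (capitalise the word, append a year). The wordlist, the year range and the rules are made-up miniatures of what real cracking tools do with much larger rule sets, and MD5 is used because it is the example algorithm above.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()

def candidates(words, years=range(1990, 2031)):
    """Yield each word plus simple variants: capitalised, and with a year appended."""
    for word in words:
        for base in (word, word.capitalize()):
            yield base
            for year in years:
                yield f"{base}{year}"

def dictionary_attack(target_hash, words):
    # Hash every candidate and stop at the first one that matches.
    for guess in candidates(words):
        if md5_hex(guess) == target_hash:
            return guess
    return None

wordlist = ["digital", "interruption", "password"]  # tiny illustrative list
stolen = md5_hex("interruption2019")                # pretend this hash leaked
print(dictionary_attack(stolen, wordlist))          # interruption2019
```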
This is why we talk about password complexity and length – in theory, it’s harder to crack “int3rrupt!on2019Digital” than “interruption2019”. It’s not impossible, however: if a website is storing passwords in plaintext (and some still do!), or with an algorithm that’s easy to brute force like MD5, someone can get your password, and once they have it, they can try it on all your other accounts and possibly breach them if you’ve reused the same password.
The best possible password, however, is a very long password that is extremely random. Random is almost impossible for a person to remember, and very difficult for computers to brute force. This is where password managers help: they generate a secure, unique, random password of a secure length for each account you have, and remember it for you. This is why we always recommend password managers!
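As a sketch of what a password manager’s generator does, here is a Python version using the standard secrets module, which draws from the operating system’s cryptographically secure random source. The 24-character default length and the character set are assumptions; most password managers let you choose both.

```python
import secrets
import string

# Assumed character set: letters, digits and punctuation (94 characters).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 24) -> str:
    # Pick each character independently with a cryptographically secure RNG.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_password())
```

With 94 possible characters in each of 24 positions, there are 94^24 possible passwords, far beyond anything brute force can cover.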
Back to hashing. We talked about algorithms like MD5 that are easy to brute force. They are easy to brute force because storing passwords isn’t what they were designed to do. We have since designed new algorithms, which aren’t quite hashing algorithms – they’re called “key derivation functions”, and they are designed specifically to be slow to compute, so that running them through wordlists takes a very long time. For the sake of education, I’d like to list a few good ones: Argon2, scrypt, and bcrypt. They’re the best-practice ways of storing passwords. If a website gets hacked and they mention they use one of these algorithms, you don’t really need to worry – you should still change that password, but it’s unlikely that it’ll be cracked.
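For developers, here is a minimal sketch of storing passwords with scrypt in Python. scrypt is used only because it ships in the standard library as hashlib.scrypt (when Python is built against a recent OpenSSL); Argon2 and bcrypt need third-party packages. The cost parameters below are commonly quoted values, not a recommendation for your hardware.

```python
import hashlib
import hmac
import secrets
import time

# scrypt cost parameters. n=2**14, r=8, p=1 are commonly quoted values for
# interactive logins; treat them as an assumption and tune for your hardware.
N, R, P, KEY_LEN = 2**14, 8, 1, 32

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); both need to be stored."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=KEY_LEN)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    # Re-derive the key with the stored salt and compare in constant time.
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=KEY_LEN)
    return hmac.compare_digest(key, stored_key)

start = time.perf_counter()
salt, key = hash_password("interruption2019")
print(f"one scrypt hash took {time.perf_counter() - start:.3f} s")
print(verify_password("interruption2019", salt, key))  # True
print(verify_password("interruption2020", salt, key))  # False
```

Compared with the hundreds of thousands of MD5 guesses per second above, every single guess now costs noticeable time and memory. A real system would also store the cost parameters next to the salt and key so they can be raised later, and would usually lean on a well-maintained library rather than hand-rolled code.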
The converse of this is that if you hear about a website that is using ANYTHING other than those, it may be the case that they don’t care too much about their security. Even though salted hashes from other algorithms are better than plaintext, it’s the difference between securing something behind a big door with a big lock and putting it in a high-tech vault full of advanced security systems and lasers. Unfortunately, you can’t usually tell what a website is using, and that’s why, again, you should always use password managers. Hopefully, I’ve managed to demystify this a little, and explain the reasons behind password recommendations.
If you’re a developer reading this, and you’re looking for some advice, feel free to contact us or send us an email at firstname.lastname@example.org.