Encryption & Hashing

What is Encryption?

Encryption is the process of turning data into other, scrambled data that can only be made meaningful again if you know how to decrypt it.

When do I use it?

You should encrypt any data your organization has deemed "sensitive", both in transit and at rest.

States of data

Data is generally thought of as having three "states". Just like matter can be a liquid, solid or gas¹, data can be "in transit" (or "in motion"), "in use", or "at rest".

Data "in use" refers to data held in volatile memory, i.e. RAM. We could extend this metaphor to data displayed and edited in the web browser, but either way it's generally thought of as the end user's problem, and there's not much we can do about encrypting it, so we're not going to bother with it much today.

For our purposes, we can think of data "in transit" as being data transmitted from a client (like a browser) to our web server, or the reverse.

Data "at rest" is data stored in non-volatile memory, like in our database.

You should think of it as your responsibility to encrypt data that is "in transit" or "at rest".

Encrypting Data in Transit

Great news! We learned how to do this last week, and it's pretty easy.

Get an SSL certificate for your website
Set your Strict-Transport-Security HTTP header to make https mandatory.
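Step 2 in a Node server might look roughly like this. This is a minimal sketch that assumes Node's built-in http/https response objects (Express responses have the same `setHeader` method). The one-year `max-age` and `includeSubDomains` are common choices, not requirements, and `hstsValue`/`addHsts` are hypothetical helper names. Browsers only honour this header when it arrives over HTTPS, so step 1 still comes first.

```javascript
// Hypothetical helper: builds a Strict-Transport-Security header value.
// max-age is in seconds; 31536000 seconds = 365 days.
function hstsValue(maxAgeSeconds = 31536000, includeSubDomains = true) {
  let value = `max-age=${maxAgeSeconds}`;
  if (includeSubDomains) value += '; includeSubDomains';
  return value;
}

// Hypothetical helper: attaches the header to any Node-style response object.
function addHsts(res) {
  res.setHeader('Strict-Transport-Security', hstsValue());
}

// Usage, inside an https (or Express) request handler:
//   addHsts(res);
```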

Encrypting Data at Rest

Ok, step 1: use an encryption library, not some "roll-your-own" solution.

More on encryption libraries in a moment.

How does encryption work?

Let's take a look at a (very, very simple, and therefore very, very bad) example of encryption:

If you're just reading along in the notes without the accompanying lecture, make sure you check out the comments in the JavaScript.

While this example uses a single formula (letter + 1), a proper encryption algorithm would use a different formula every time it encrypts data. To know the formula used to encrypt a particular block of data, you need the "key".
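The lecture's JavaScript example isn't reproduced in these notes. As a stand-in, here is a minimal sketch of the same idea: a shift cipher where the "formula" is letter + key. With key = 1 it matches the letter + 1 formula above. The function names are hypothetical, and like the original, this is far too weak for real data.

```javascript
// Shift each letter forward by `key` places, wrapping z -> a (and Z -> A).
// Anything that isn't a letter (digits, hyphens, spaces) passes through unchanged.
// Assumes `key` is a non-negative integer.
function shiftEncrypt(text, key) {
  return text.replace(/[a-zA-Z]/g, (ch) => {
    const base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    const offset = ch.charCodeAt(0) - base;
    return String.fromCharCode(base + ((offset + key) % 26));
  });
}

// Decrypting: shifting forward by (26 - key) is the same as shifting back by key.
function shiftDecrypt(text, key) {
  return shiftEncrypt(text, 26 - (key % 26));
}

console.log(shiftEncrypt('my-diary', 1)); // nz-ejbsz
console.log(shiftDecrypt('nz-ejbsz', 1)); // my-diary
```

Anyone who knows (or guesses) the key can undo it, and with only 26 possible keys, guessing takes seconds. You may also notice that the "encrypted" filename used later in these notes, `nz-ejbsz.txt`, is just `my-diary` shifted by one.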

Symmetric vs. Asymmetric encryption

If you use the same key to encrypt and decrypt, this is known as "symmetric key encryption".

If different keys are used to encrypt and decrypt (usually a 'public' key and a 'private' key), this is known as "asymmetric encryption". Typically, anyone can encrypt data with the public key, but only the holder of the private key can decrypt it. Asymmetric encryption is pretty computationally expensive, so it's usually only used to secure small amounts of data, like keys themselves.
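Here's roughly what "using asymmetric encryption to protect a key" looks like with Node's built-in crypto module (which is built on OpenSSL). This is a minimal sketch: the 2048-bit key size is an assumption (a common choice), and the data being protected is a random 256-bit AES key. Protecting a symmetric key with an asymmetric one like this is usually called hybrid encryption.

```javascript
const crypto = require('node:crypto');

// Generate an RSA key pair. In real life this happens once, and the private
// key is kept secret; it is not regenerated on every run.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// A random 256-bit symmetric key: the "small amount of data" we protect with RSA.
const aesKey = crypto.randomBytes(32);

// Anyone with the public key can encrypt ("wrap") the AES key...
// (publicEncrypt uses OAEP padding by default.)
const wrappedKey = crypto.publicEncrypt(publicKey, aesKey);

// ...but only the private key can decrypt ("unwrap") it.
const unwrappedKey = crypto.privateDecrypt(privateKey, wrappedKey);

console.log(wrappedKey.length);           // 256 (bytes, for a 2048-bit key)
console.log(unwrappedKey.equals(aesKey)); // true

// RSA can only encrypt a few hundred bytes at a time with a key this size,
// which is one reason it protects keys rather than whole files.
let tooBig = false;
try {
  crypto.publicEncrypt(publicKey, Buffer.alloc(1000));
} catch {
  tooBig = true;
}
console.log(tooBig); // true
```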

RSA (asymmetric) encryption

"RSA" is an example of asymmetric encryption - brought to you by the same people who make those little authentication fobs.

RSA fob — This key fob provides authentication, but the same company provides encryption tools.

The way I remember that RSA is asymmetric encryption is by remembering that we use asymmetric encryption with Really Small Amounts of data ;)

AES (symmetric) encryption

The current gold standard of symmetric encryption (encryption with a single key) is AES. AES is the only publicly available cipher approved by the U.S. National Security Agency for protecting top-secret information (when used in an NSA-approved cryptographic module), so it's pretty solid.
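In Node, AES is available through the built-in crypto module. This is a minimal sketch, and one choice in it is an assumption: it uses AES-256 in GCM mode, an "authenticated" mode that also detects tampering, whereas the command-line example later in these notes uses CBC. Notice that the same key is used in both directions, which is what makes this symmetric. The `encrypt`/`decrypt` names are hypothetical.

```javascript
const crypto = require('node:crypto');

// Encrypt a UTF-8 string with AES-256-GCM. Returns everything needed to
// decrypt later except the key: the IV, the ciphertext, and the auth tag.
function encrypt(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit IV, the usual size for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt with the same key. final() throws if the ciphertext or tag has
// been tampered with, or if the wrong key is used.
function decrypt({ iv, ciphertext, tag }, key) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const key = crypto.randomBytes(32); // 32 bytes = 256 bits, the "256" in AES-256
const box = encrypt('my deepest, darkest secret', key);
console.log(decrypt(box, key)); // my deepest, darkest secret
```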

As previously mentioned, even with powerful algorithms like AES freely available, you should not attempt to create your own encryption library.

Encryption libraries

Currently, there are two libraries that are very well-regarded and available for use in most contexts (Node, PHP, .NET, etc.): OpenSSL and libsodium.

If you're wondering what the difference is - OpenSSL is considered more interoperable and "future-proof", while libsodium is considered more "idiot-proof". It's possible to choose older, less secure encryption through OpenSSL, whereas with libsodium, if you've got it working, you're pretty much ok. That being said, libsodium is harder to keep working as security threats evolve over the years and your application grows more complex.
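To make "it's possible to choose older, less secure encryption through OpenSSL" concrete, consider this example. OpenSSL, and therefore Node's crypto module, which is built on it, will let you pick AES in ECB mode, which encrypts each 16-byte block independently. This minimal sketch shows the resulting leak: identical plaintext blocks produce identical ciphertext blocks, while CBC mode hides the repetition.

```javascript
const crypto = require('node:crypto');

const key = crypto.randomBytes(32);
// Two identical 16-byte blocks of plaintext (AES works on 16-byte blocks).
const plaintext = Buffer.from('ATTACK AT DAWN!!ATTACK AT DAWN!!', 'utf8');

// Encrypt the same plaintext with a given AES-256 mode.
function encryptWith(mode, iv) {
  const cipher = crypto.createCipheriv(`aes-256-${mode}`, key, iv);
  return Buffer.concat([cipher.update(plaintext), cipher.final()]);
}

// ECB takes no IV (pass null); CBC takes a random 16-byte IV.
const ecb = encryptWith('ecb', null);
const cbc = encryptWith('cbc', crypto.randomBytes(16));

// Are the first two 16-byte ciphertext blocks identical?
const sameBlocks = (buf) => buf.subarray(0, 16).equals(buf.subarray(16, 32));
console.log(sameBlocks(ecb)); // true: the repetition leaks through
console.log(sameBlocks(cbc)); // false
```

Nothing stops you from choosing `aes-256-ecb`; the library assumes you know what you're asking for. That's the kind of foot-gun libsodium's narrower menu is designed to avoid.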

Using OpenSSL

Let's look at a very simple example, where we use OpenSSL through the command line (one of the many places, including most server-side code, that you can use OpenSSL).

openssl enc -aes-256-cbc -e -salt -in my-diary.txt -out nz-ejbsz.txt

Broken down,

openssl calls the command line application 'openssl',
enc calls the method for using encryption/decryption ciphers,
-aes-256-cbc specifies the algorithm with which we will encrypt our data - in this case, AES with a key 256 bits long, using the Cipher Block Chaining mode,
-e means we will be encrypting the data,
-salt means a random "salt" is mixed in when our password is turned into the encryption key (current OpenSSL versions do this by default) - an extra layer of security, which we'll talk about shortly,
-in means the next thing in the command will be our input source, the thing getting encrypted,
my-diary.txt is the input source, a file in the working directory (i.e. the same folder where this command is run),
-out means we're about to specify the output destination, the place where our output data goes, and finally
nz-ejbsz.txt is the file where we'll put our encrypted data.
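The "Cipher Block Chaining" part of `-aes-256-cbc` means that each 16-byte block of plaintext is XORed with the previous ciphertext block (or, for the first block, with an initialization vector) before it's encrypted. As a result, identical plaintext blocks don't produce identical output. This minimal sketch builds CBC by hand from single-block AES (Node's `aes-256-ecb` with padding turned off, used purely as a block primitive) and checks the result against Node's built-in `aes-256-cbc`. It's for learning only; real code should just use the built-in mode.

```javascript
const crypto = require('node:crypto');

// Encrypt exactly one 16-byte block with raw AES-256 (no chaining, no padding).
function aesBlock(key, block) {
  const cipher = crypto.createCipheriv('aes-256-ecb', key, null);
  cipher.setAutoPadding(false);
  return Buffer.concat([cipher.update(block), cipher.final()]);
}

// CBC by hand: C[i] = AES(P[i] XOR C[i-1]), where C[-1] is the IV.
// Assumes the plaintext length is already a multiple of 16 (no padding step).
function cbcByHand(key, iv, plaintext) {
  const out = [];
  let previous = iv;
  for (let i = 0; i < plaintext.length; i += 16) {
    const block = plaintext.subarray(i, i + 16);
    const mixed = Buffer.alloc(16);
    for (let j = 0; j < 16; j++) mixed[j] = block[j] ^ previous[j];
    previous = aesBlock(key, mixed);
    out.push(previous);
  }
  return Buffer.concat(out);
}

const key = crypto.randomBytes(32);
const iv = crypto.randomBytes(16);
const plaintext = Buffer.from('Dear diary, the pants do not fit', 'utf8'); // 32 bytes

// Node's built-in CBC, with padding off so the output lengths match.
const builtIn = crypto.createCipheriv('aes-256-cbc', key, iv);
builtIn.setAutoPadding(false);
const expected = Buffer.concat([builtIn.update(plaintext), builtIn.final()]);

console.log(cbcByHand(key, iv, plaintext).equals(expected)); // true
```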

When you run this command, you're prompted to enter and confirm a password. After that, `nz-ejbsz.txt` gets created (or overwritten, if it already exists).

This:

10/09/2001
Ugh, it's my turn with these stupid 'travelling pants'. I don't care what Lena, Tibby, Bridget, and Carmen say, they totally stretched them out. I hear some lady wrote a book about us, and it's coming out tomorrow. That will definitely be the worst thing that happens in the whole world. The only thing that could possibly be worse is if they wrote me out of the book.

…becomes this (shown here as a hex dump, since the encrypted file is raw binary data):

5361 6c74 6564 5f5f 8456 721a 878a 5dd8
6a46 a2cc d47d 9268 eb7d beac e1ea 4300
c4f9 49d5 138e 27f8 ddbf e4fd bfce 7abc
e75b 7b2c 0241 29f7 459b c47c 9e91 8ac3
e258 90c2 3693 14a1 4a1b 45bc 9883 b16f
8e37 a854 9699 18cb 7660 5033 1c7f 13ca
599f 3687 f2fc 7dda 5d0d 34c9 db33 16eb
d67f d6b6 bfff b142 31ae a451 1095 6213
68ee fa5a a1b1 5795 0870 8fde c081 2e52
5c10 fcd9 a098 580d e49d 8aa5 7eee f703
de39 8028 669f e62c 944c 3fdd 5eb3 5719
2f3a 420a 7ae1 87b5 1ec4 9d78 829d eb93
a3ec 1592 2761 49b0 e78c 8fe9 6b16 f9b6
e9e7 337a fec0 9a2e 504b 14eb e565 f83e
b5f5 b46e f1b9 5d49 6b41 d6d8 909a c478
86fe b1fe efad 5045 c67d 8496 286a ad0d
08e8 8dc3 eb65 0a44 9f6d e40a 2bc8 002f
b4b8 81c1 9b7e f9f7 37fb a037 58bd e5b8
d160 6239 e306 38e5 5e07 f2d8 b962 a968
3a20 bda3 1c09 6239 9c02 af4c 5909 27cb
9bfc b8ab 22fa 7790 20f4 4712 df29 841e
cdc0 d265 b5ec b7f0 dd56 bc73 ace2 eac9
54eb 5f4e 5514 1fc9 3ab0 b2fb ba24 b82b
50ea b7ed 85a7 80f1 339c 1f24 0dea 5e5a
a62f 3dfb 963b e6bc 6c3d e5f6 5a6b 6908
ad4f ca6e 0808 e25f 5adb 0428 f9d0 b41f
d8c4 87ce 5034 368b 4bc5 23b0 7ca1 b62d
fcb4 8e81 2224 60d2 0c24 3fa3 56d7 5154
cbcc e0c3 27af 6572 69e4 1a99 2d0e 9c6d
58c8 2b1a f040 06dc 5e79 64f8 81b4 bdf5
0735 8660 d286 6c8b e642 e225 8e5c e4d7
31c8 25bf dd49 9a5b 2f5b 716d 7669 9d79
071b 827f 728f 3a0b 4300 ae39 5aab c9f9
3296 e315 e895 ee63 d679 5326 16ac 542f
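If you look closely at the first 16 bytes of that dump, you can see what `-salt` did. `openssl enc` writes the ASCII text `Salted__` (hex `53 61 6c 74 65 64 5f 5f`), then the 8-byte random salt, then the actual ciphertext. Here's a small sketch that reads that header back out. `readOpensslHeader` is a hypothetical helper, and the bytes are copied from the first row of the dump above.

```javascript
// Split the header that `openssl enc -salt` puts at the start of its output:
// 8 bytes of the ASCII magic "Salted__", then an 8-byte random salt.
function readOpensslHeader(data) {
  const magic = data.subarray(0, 8).toString('latin1');
  if (magic !== 'Salted__') {
    throw new Error('No "Salted__" header found');
  }
  return {
    salt: data.subarray(8, 16).toString('hex'),
    ciphertext: data.subarray(16), // everything after the header
  };
}

// First 16 bytes of nz-ejbsz.txt, from the first row of the hex dump.
const firstRow = Buffer.from('53616c7465645f5f8456721a878a5dd8', 'hex');
const { salt, ciphertext } = readOpensslHeader(firstRow);
console.log(salt);              // 8456721a878a5dd8
console.log(ciphertext.length); // 0 (only the header row was included here)
```

Because the salt travels inside the file, the decrypt command below doesn't need to be told what it was.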

The command to decrypt is almost identical:

openssl enc -aes-256-cbc -d -in nz-ejbsz.txt -out got-your-diary.txt

The only differences are that -e becomes -d, -salt is dropped (openssl reads the salt back from the start of the encrypted file), the input is set to the encrypted file, and the output is set to a new file.
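You can sketch the same password-in, salted-ciphertext-out, same-password-to-get-it-back workflow in Node. This is an approximation under stated assumptions, not a reimplementation of the `openssl enc` file format. It derives the key with PBKDF2-SHA256 (the same family of key derivation that newer OpenSSL versions nudge you toward with a "deprecated key derivation used" warning and the `-pbkdf2` flag), and it stores a random 16-byte salt and IV in front of the ciphertext. The iteration count and the function names are hypothetical choices.

```javascript
const crypto = require('node:crypto');

const ITERATIONS = 600000; // assumption: tune for your hardware

// Turn a password + salt into a 256-bit AES key.
function deriveKey(password, salt) {
  return crypto.pbkdf2Sync(password, salt, ITERATIONS, 32, 'sha256');
}

// Encrypt: a fresh random salt and IV every time, stored alongside the ciphertext.
function encryptWithPassword(plaintext, password) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv('aes-256-cbc', deriveKey(password, salt), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([salt, iv, ciphertext]); // layout: salt | iv | ciphertext
}

// Decrypt: read the salt and IV back out, re-derive the same key, and decrypt.
// CBC has no built-in integrity check, so a wrong password usually (but not
// always) makes final() throw a padding error instead of returning garbage.
function decryptWithPassword(data, password) {
  const salt = data.subarray(0, 16);
  const iv = data.subarray(16, 32);
  const decipher = crypto.createDecipheriv('aes-256-cbc', deriveKey(password, salt), iv);
  return Buffer.concat([decipher.update(data.subarray(32)), decipher.final()]).toString('utf8');
}

const locked = encryptWithPassword("Ugh, it's my turn with these stupid 'travelling pants'.", 'hunter2');
console.log(decryptWithPassword(locked, 'hunter2'));
```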

We're prompted for the password again, and this:

5361 6c74 6564 5f5f 8456 721a 878a 5dd8
6a46 a2cc d47d 9268 eb7d beac e1ea 4300
c4f9 49d5 138e 27f8 ddbf e4fd bfce 7abc
e75b 7b2c 0241 29f7 459b c47c 9e91 8ac3
e258 90c2 3693 14a1 4a1b 45bc 9883 b16f
8e37 a854 9699 18cb 7660 5033 1c7f 13ca
599f 3687 f2fc 7dda 5d0d 34c9 db33 16eb
d67f d6b6 bfff b142 31ae a451 1095 6213
68ee fa5a a1b1 5795 0870 8fde c081 2e52
5c10 fcd9 a098 580d e49d 8aa5 7eee f703
de39 8028 669f e62c 944c 3fdd 5eb3 5719
2f3a 420a 7ae1 87b5 1ec4 9d78 829d eb93
a3ec 1592 2761 49b0 e78c 8fe9 6b16 f9b6
e9e7 337a fec0 9a2e 504b 14eb e565 f83e
b5f5 b46e f1b9 5d49 6b41 d6d8 909a c478
86fe b1fe efad 5045 c67d 8496 286a ad0d
08e8 8dc3 eb65 0a44 9f6d e40a 2bc8 002f
b4b8 81c1 9b7e f9f7 37fb a037 58bd e5b8
d160 6239 e306 38e5 5e07 f2d8 b962 a968
3a20 bda3 1c09 6239 9c02 af4c 5909 27cb
9bfc b8ab 22fa 7790 20f4 4712 df29 841e
cdc0 d265 b5ec b7f0 dd56 bc73 ace2 eac9
54eb 5f4e 5514 1fc9 3ab0 b2fb ba24 b82b
50ea b7ed 85a7 80f1 339c 1f24 0dea 5e5a
a62f 3dfb 963b e6bc 6c3d e5f6 5a6b 6908
ad4f ca6e 0808 e25f 5adb 0428 f9d0 b41f
d8c4 87ce 5034 368b 4bc5 23b0 7ca1 b62d
fcb4 8e81 2224 60d2 0c24 3fa3 56d7 5154
cbcc e0c3 27af 6572 69e4 1a99 2d0e 9c6d
58c8 2b1a f040 06dc 5e79 64f8 81b4 bdf5
0735 8660 d286 6c8b e642 e225 8e5c e4d7
31c8 25bf dd49 9a5b 2f5b 716d 7669 9d79
071b 827f 728f 3a0b 4300 ae39 5aab c9f9
3296 e315 e895 ee63 d679 5326 16ac 542f

…becomes this:

10/09/2001
Ugh, it's my turn with these stupid 'travelling pants'. I don't care what Lena, Tibby, Bridget, and Carmen say, they totally stretched them out. I hear some lady wrote a book about us, and it's coming out tomorrow. That will definitely be the worst thing that happens in the whole world. The only thing that could possibly be worse is if they wrote me out of the book.

Encryption learning curves

Now at this point, if we were able to devote the entire semester to encryption workflows, we might build an encrypted Node or PHP application that can store a user's password and credit card info and deepest, darkest secrets, but I have to draw a line at a certain point to make it really clear that what we're learning is not sufficient to build a secure application. To go much further, we'd be hitting a huge jump in the learning curve, where we talk about…

random vs. pseudo-random
AES modes
bit-lengths of keys
asynchronous coding
the UINT8 data type
initialization vectors (IVs, which are a kind of nonce, a "number used once"; all IVs are nonces, but not all nonces are IVs)

…and that's just to take the next step into a discussion of secure full-stack architecture. This is absolutely not something you'll be tasked with in your first job. Or third or fifth. Bigger places have security specialists, and smaller places use 3rd-party APIs to handle things like online payments and user authorization.
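For a small taste of two items from that list anyway (random vs. pseudo-random, and arrays of unsigned 8-bit integers), here's a minimal sketch. In JavaScript, `Math.random()` is a general-purpose pseudo-random generator that isn't meant for secrets. Node's crypto module provides cryptographically secure randomness, and raw bytes are typically handled as a `Uint8Array`. `makeIv` is a hypothetical helper name.

```javascript
const crypto = require('node:crypto');

// Not for secrets: Math.random() is fine for a dice roll in a game,
// but it isn't designed to be unpredictable to an attacker.
const diceRoll = Math.floor(Math.random() * 6) + 1;

// For keys, salts, and IVs, use the crypto module's secure source.
// Hypothetical helper: returns `length` random bytes as a Uint8Array.
function makeIv(length = 16) {
  // randomFillSync fills the array in place and returns it.
  return crypto.randomFillSync(new Uint8Array(length));
}

const iv = makeIv(); // generate a fresh IV for every encryption
console.log(iv instanceof Uint8Array, iv.length); // true 16

// Node's Buffer (what randomBytes returns) is itself a subclass of Uint8Array.
const key = crypto.randomBytes(32);
console.log(key instanceof Uint8Array); // true
```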

Today, towards the end of the lesson, we'll look at how service providers handle security, and I'll provide you with a reading list of great materials if you want to keep going ahead with learning about full-stack web application security, but, trust me, you've come a long way in a short while if you can read half of today's lesson without your eyes crossing.

What is Hashing?

Hashing, like encryption, is turning data into something indecipherable. Unlike encryption, however, with hashing, you can never turn the data back.

So is your data just lost forever? Well, kind of. But it's still useful. It's all about asking the right question.

Hashing with SHA

Let's say I use one of the most secure hash functions, SHA-256. I hash 'password1234' and get the value b9c950640e1b3740e98acb93e669c65766f6670dd1609ba91ff41052ba48c6f3

There's no way to mathematically run b9c950640e1b3740e98acb93e669c65766f6670dd1609ba91ff41052ba48c6f3 backwards to get the value 'password1234'.

'password1234' is effectively gone, and all we're left with is b9c950640e1b3740e98acb93e669c65766f6670dd1609ba91ff41052ba48c6f3 (which I'll start referring to as theHash for brevity's sake).

However, what you can do is check to see if values match theHash. Hashing always returns the same output when provided the same input.
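In Node, hashing is a one-liner with the built-in crypto module. This minimal sketch shows two properties: the output is always 64 hex characters (256 bits) no matter how long the input is, and the same input always gives the same output. `sha256Hex` is a hypothetical helper name.

```javascript
const crypto = require('node:crypto');

// Hash a string with SHA-256 and return the digest as 64 hex characters.
function sha256Hex(input) {
  return crypto.createHash('sha256').update(input, 'utf8').digest('hex');
}

const theHash = sha256Hex('password1234');
console.log(theHash.length);                        // 64
console.log(sha256Hex('password1234') === theHash); // true: same input, same output
console.log(sha256Hex('password1235') === theHash); // false: a tiny change gives an unrelated hash
```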

What hashing lets you do is avoid storing sensitive data. You never have to store a user's password at all, but you can still check whether the password they enter is correct.

In other words, the wrong question is "What is the original value of the hash?"

The right question is "does thisHash equal theHash?"
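Asking "does thisHash equal theHash?" in code might look like this. This sketch rests on two assumptions. First, the stored hash is a plain, unsalted SHA-256 digest, as in the example above (salting comes later in these notes). Second, the comparison uses `crypto.timingSafeEqual`, which avoids leaking how much of the hash matched through response timing. `storedHash` and `checkPassword` are hypothetical names.

```javascript
const crypto = require('node:crypto');

// Saved at sign-up time: the hash, never the password itself.
const storedHash = crypto.createHash('sha256').update('password1234').digest();

// At login: hash whatever the user typed and compare the two digests.
function checkPassword(attempt) {
  const thisHash = crypto.createHash('sha256').update(attempt).digest();
  // Both are 32-byte Buffers, so timingSafeEqual's equal-length requirement holds.
  return crypto.timingSafeEqual(thisHash, storedHash);
}

console.log(checkPassword('password1234')); // true
console.log(checkPassword('Password1234')); // false
```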

Okay, but doesn't that still leave us vulnerable to credential stuffing, or dictionary attacks, like trying the top 10 million passwords?

I mean, hopefully you're limiting the rate/number of attempted logins, but if someone were able to get around that, then yes! Hackers have some pretty ~~cool~~ advanced ways of cracking hashes.

Hashing is funny, because it's kinda like the decryption password for each bit of hashed data is the data itself.
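That's exactly what a dictionary attack exploits. If an attacker steals an unsalted hash, they don't need to reverse it. They just hash guesses until one matches. In this minimal sketch, a tiny made-up wordlist stands in for the "top 10 million passwords", and `crackHash` is a hypothetical helper.

```javascript
const crypto = require('node:crypto');

const sha256Hex = (s) => crypto.createHash('sha256').update(s).digest('hex');

// Try each guess in turn; return the one whose hash matches, or null.
function crackHash(targetHash, wordlist) {
  for (const guess of wordlist) {
    if (sha256Hex(guess) === targetHash) return guess;
  }
  return null;
}

// A made-up wordlist; real attackers use lists with millions of entries.
const wordlist = ['123456', 'qwerty', 'letmein', 'password1234', 'iloveyou'];
const stolenHash = sha256Hex('password1234'); // pretend this leaked from a database

console.log(crackHash(stolenHash, wordlist)); // password1234
```

Because SHA-256 is fast, each guess is cheap. Also, without a salt, every user who picked 'password1234' has that same stolen hash, so one pass through the list cracks all of them at once.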

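We can make dictionary attacks against stolen hashes much harder by adding an additional password to the user's password before hashing it. This is known as a 'salt'. (Salting doesn't help against credential stuffing, though, where the attacker already has a real password leaked from some other site.)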

What is Salting?

A 'salt' is basically an extra password that you generate for a user, added to the end of their password.

This means that two users with the same password will still end up with different hashes, so a hacker can't crack a whole list of hashed passwords in one go by hashing a dictionary once and comparing it against every user.

Before you hash their password, you add the extra password (the salt) to the end of the password they've supplied. You don't save their password, but you do store the salt.

Salts are "cryptographically secure" random strings generated by your encryption library.

This means that a hacker would need to get into your database, defeat your database encryption, get the users' salts, and then perform their dictionary attack against the passwords.

At this point, you've made it much easier for hackers to simply cold-call people and say, "uh, hey, it's… Bill, from the I.T. department - can you tell me your username and password?". In other words, your job is done.
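Here's salting in code. This minimal sketch makes one deliberate swap. The notes describe appending the salt to the password and hashing the result. This sketch instead passes the password and salt to Node's built-in `crypto.scryptSync`, a password-hashing function that takes the salt as its own argument and is deliberately slow, which makes every dictionary guess more expensive than a single SHA-256. The 16-byte salt, the 64-byte output, and the `hashPassword`/`verifyPassword` names are assumptions.

```javascript
const crypto = require('node:crypto');

// At sign-up: generate a fresh random salt for this user and hash with it.
// Store both the salt and the hash, never the password.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return { salt: salt.toString('hex'), hash: hash.toString('hex') };
}

// At login: re-hash the attempt with the user's stored salt and compare.
function verifyPassword(attempt, { salt, hash }) {
  const expected = Buffer.from(hash, 'hex');
  const actual = crypto.scryptSync(attempt, Buffer.from(salt, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

// Two users with the same password end up with different stored hashes.
const alice = hashPassword('password1234');
const bob = hashPassword('password1234');
console.log(alice.hash === bob.hash);               // false
console.log(verifyPassword('password1234', alice)); // true
console.log(verifyPassword('password123', alice));  // false
```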

Where to store your passwords/keys/salts

Okay, at this point you've probably noticed that administrative passwords are still important for authenticating who is allowed to decrypt things, verify passwords, etc.

Obviously, you don't want to have to type in a password every time a user creates an account, so they need to be stored somewhere. But if a hacker gets into an encrypted database, and the encryption key is stored in that same database, it's like leaving the key to your house just sitting in the lock.

If you're in charge of your own hardware, there are extra-secure devices called "hardware security modules" (HSMs) that you can use to store keys and perform encryption, instead of doing it on your regular server. If someone breaches your server, they still have to get into the HSM.

Similar devices can also come in the form of small, portable keys for the security-minded person on the go.

At the very least, you should store your keys and passwords on a separate server from the encrypted data. That way, a hacker would need to break into both your web server and your database server.

A key hidden in a rock — The web security equivalent of a "hide-a-key"

How the pros do it

As I mentioned before, smaller shops (and big ones, too!) will often choose not to handle sensitive data, like credit card transactions, themselves. The most popular option for this is PayPal. The second most popular is called Stripe. Today we're going to talk about how Stripe secures their data (because Stripe's documentation is much nicer to read).

HTTPS and HSTS for secure connections

Hey, we know how to do that! HTTPS just means having an SSL certificate, and HSTS just means setting your Strict-Transport-Security header.

Encryption of sensitive data and communication
All card numbers are encrypted at rest with AES-256. Decryption keys are stored on separate machines. None of Stripe’s internal servers and daemons can obtain plaintext card numbers but can request that cards are sent to a service provider on a static allowlist. Stripe’s infrastructure for storing, decrypting, and transmitting card numbers runs in a separate hosting environment, and doesn’t share any credentials with Stripe’s primary services (API, website, etc.).

Okay, we don't know how to do all this stuff, but with today's lesson we know about them.

✅ …encrypted at rest with AES-256
✅ …keys are stored on separate machines (HSM)
✅ …None of Stripe’s internal servers and daemons can obtain plaintext card numbers but can request that cards are sent to a service provider on a static allowlist (whitelisting/blacklisting)
✅ …infrastructure for storing, decrypting, and transmitting card numbers runs in a separate hosting environment (separate server)

Vulnerability disclosure and reward program
Stripe maintains a private, invite-only bug bounty program, with the assistance of HackerOne.

This is pretty cool, and a not uncommon technique. "Bug bounties" are rewards that companies offer for finding security problems. In other words, they pay hackers for telling them they're vulnerable.

Sometimes bug bounties are public, and sometimes they're run through an organization like HackerOne, which has paid out more than $100 million in bounties.

Continuing your security journey

Further Reading:

Introduction to Web Security, Stanford University (free course)
Web Application Security, O'Reilly
All learning materials, Web Security Academy (from the people who brought us The Web Application Hacker's Handbook)
Best practices for building secure API Keys, freeCodeCamp

That's it!

Footnotes

¹ Ugh, ok nerds, also plasma and liquid crystal and fermionic condensate or whatever.
