Password encryption: rationale and Java example

Where has your password been?

Most web sites today have some sort of registration module where a user is asked to choose a username/password combination. This data gets stored in the database. You might wonder whether the password you provide will be kept well protected (read: encrypted). If you are the person designing such a backend registration component, why not give your users peace of mind by encrypting their passwords?

One-way hash encryption

This scenario is a perfect candidate for "one-way hash encryption", also known as a message digest, one-way encryption, digital fingerprint, or cryptographic hash. It is referred to as "one-way" because although you can calculate a message digest from some data, you cannot feasibly work out what data produced a given message digest. A good digest algorithm is also collision-resistant: different values can in principle produce the same digest, because the digest has a fixed length, but finding two such values is meant to be computationally infeasible. Another property of the digest is that it is a condensed representation of a message or data file, and as such it always has the same length.

There are several message-digest algorithms used widely today.

Algorithm   Digest size
MD5         128 bits
SHA-1       160 bits

SHA-1 (Secure Hash Algorithm 1) is slower than MD5, but its message digest is larger, which makes it more resistant to brute-force attacks. It is therefore recommended that you prefer the Secure Hash Algorithm to MD5 for all of your digest needs. Note that SHA-1 now has even stronger siblings, SHA-256, SHA-384, and SHA-512, which produce 256-, 384-, and 512-bit digests respectively.
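To see the digest sizes for yourself, you can ask the Java security API directly. The following is a minimal sketch, assuming a JDK whose default provider supports all five algorithm names listed; the class name DigestLengths and the helper digestBits are made up for this illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the article's PasswordService.
class DigestLengths {
    // Returns the digest size in bits for a JCA algorithm name.
    static int digestBits(String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return md.getDigestLength() * 8; // getDigestLength() is in bytes
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    public static void main(String[] args) {
        String[] names = {"MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512"};
        for (String name : names) {
            System.out.println(name + ": " + digestBits(name) + " bits");
        }
    }
}
```

The digest size depends only on the algorithm, never on how much data you feed in.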

Typical registration scenario

Here is a typical flow of how our message digest algorithm can be used to provide one-way password hashing:

1) User registers with some site by submitting the following data:

username    password
jsmith      mypass

2) Before storing the data, a one-way hash of the password is created: "mypass" is transformed into "5yfRRkrhJDbomacm2lsvEdg4GyY="

The data stored in the database ends up looking like this:

username    password
jsmith      5yfRRkrhJDbomacm2lsvEdg4GyY=

3) When jsmith comes back to this site later and decides to log in using his credentials (jsmith/mypass), the password hash is created in memory (session) and is compared to the one stored in the database. Both values are equal to "5yfRRkrhJDbomacm2lsvEdg4GyY=" since the same password value "mypass" was submitted both times. Therefore, his login will be successful.
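The flow above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name LoginCheck and its methods hash and matches are hypothetical, the stored value is held in a local variable instead of a database, and SHA-1 with Base64 output is assumed to match the example values.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical illustration of the register/login comparison.
class LoginCheck {
    // Hashes a password the same way at registration and at login.
    static String hash(String plaintext) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] raw = md.digest(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // Compares the stored hash with the hash of the submitted password.
    static boolean matches(String storedHash, String submittedPassword) {
        byte[] expected = storedHash.getBytes(StandardCharsets.UTF_8);
        byte[] actual = hash(submittedPassword).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        String stored = hash("mypass"); // what registration would save
        System.out.println(matches(stored, "mypass")); // prints true
        System.out.println(matches(stored, "mypast")); // prints false
    }
}
```

Note that the plaintext password is never stored; only the two hash strings are compared.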

Note that any other plaintext password value will produce a different sequence of characters. Even a similar password value ("mypast") that differs by only one letter results in an entirely different hash: "hXdvNSKB5Ifd6fauhUAQZ4jA7o8="

plaintext password    encrypted password
mypass                5yfRRkrhJDbomacm2lsvEdg4GyY=
mypast                hXdvNSKB5Ifd6fauhUAQZ4jA7o8=

As mentioned above, provided that a strong algorithm such as SHA is used, it is computationally infeasible to reverse-engineer the encrypted value "5yfRRkrhJDbomacm2lsvEdg4GyY=" back to "mypass". Therefore, even if a malicious hacker gets hold of your password digest, he/she won't be able to read your password from it.

Java code that implements one-way hash algorithm

Let's assume that you are writing a web application to be run in a servlet container. Your registration servlet might contain the following portion (for clarity, I omitted the input validation steps and assume that a password value was passed in within the password form input field):

[...]
public void doPost(HttpServletRequest request, HttpServletResponse response)
{
  User user = new org.myorg.registration.User();
  user.setPassword(org.myorg.services.PasswordService.getInstance().encrypt(request.getParameter("password")));
[...]

Here is the definition of my PasswordService class that does the job of generating a one-way hash value:

package org.myorg.services;

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import org.myorg.SystemUnavailableException;

public final class PasswordService
{
  private static PasswordService instance;

  private PasswordService()
  {
  }

  public synchronized String encrypt(String plaintext) throws SystemUnavailableException
  {
    MessageDigest md = null;
    try
    {
      md = MessageDigest.getInstance("SHA"); //step 2
    }
    catch(NoSuchAlgorithmException e)
    {
      throw new SystemUnavailableException(e.getMessage());
    }
    try
    {
      md.update(plaintext.getBytes("UTF-8")); //step 3
    }
    catch(UnsupportedEncodingException e)
    {
      throw new SystemUnavailableException(e.getMessage());
    }

    byte[] raw = md.digest(); //step 4
    // java.util.Base64 (Java 8+) is used here in place of the internal sun.misc.BASE64Encoder
    String hash = Base64.getEncoder().encodeToString(raw); //step 5
    return hash; //step 6
  }

  public static synchronized PasswordService getInstance() //step 1
  {
    if(instance == null)
    {
      instance = new PasswordService();
    }
    return instance;
  }
}

The method of interest here is encrypt(). I chose to make this class a singleton so that there is only one instance of it at any given time. Note that encrypt() creates its own MessageDigest object on every call, so no digest state is shared between callers and the generated hash values cannot interfere with each other. For an explanation of this design pattern, try a Google search for "java singleton pattern".

Let's step through the code above to see what's going on:

step 1: The registration servlet will interface with our PasswordService class using this static getInstance() method. Whenever it is invoked, a check will be made to see if an instance of this service class already exists. If so, it will be returned back to the caller (registration servlet). Otherwise, a new instance will be created.

step 2: We ask the Java security API for a message digest object that uses the supplied algorithm. In this case the SHA-1 message digest algorithm is used; both "SHA" and "SHA-1" refer to the same thing, a revised SHA algorithm. The Sun JDK includes the JCA (Java Cryptography Architecture), which supports the SHA algorithm. If your environment does not support SHA, a NoSuchAlgorithmException will be thrown.

step 3: Feed the data:
a) convert the plaintext password (e.g., "mypass") into a byte representation using the UTF-8 encoding.
b) apply this array to the message digest object created earlier. This array will be used as the source for the message digest object to operate on.

step 4: Do the transformation: generate an array of bytes that represents the digested (encrypted) password value.

step 5: Create a String representation of the byte array holding the digested password value. This is needed to be able to store the password in the database. At this point, the hash value of the plaintext "mypass" is "5yfRRkrhJDbomacm2lsvEdg4GyY=".

step 6: Return the String representation of the newly generated hash to our registration servlet so that it can be stored in the database. The user.getPassword() method now returns "5yfRRkrhJDbomacm2lsvEdg4GyY="

That's all. Your database password data is now encrypted, and if an intruder gets hold of it, he/she won't have much use for it. Note that you have to consider how you will handle "forgot password" functionality in this case, as you can no longer simply send a password to the user's email address. (Well, you should not be doing things like that anyway.) Sounds to me like a perfect topic for my next article.

Comments

Lost passwords

Checking that a password is correct is just a small part of the registration mechanism a typical site has to go through. In most cases you'll want to provide your users with an "I forgot my password" link. With encrypted passwords you can't then simply send the user his old password by email.

You have other options of course, for instance generating a new password when one is forgotten, but you'll then have to be careful that no DoS is possible using this mechanism, which further complicates the setup. Or you can ask the user to re-validate his email when setting the new password. That's all added complexity that many sites won't want to go through.

re: Lost passwords

Yeah, as I mentioned in the last paragraph, "forgot password" functionality will be affected as you now cannot simply send a password to the owner's email address. What I like to do seems even more complex than what you list but I think this is the most secure way:
  1. opaque Id: i like to store two id's in user database: one is their username and the other is a random opaqueId that gets generated during registration and is stored in the user table.
  2. email verification: verify user's email address during registration (send him/her a confirmation email and ask to visit a link). the opaqueId gets sent on the query string within the verification email.
  3. forgot password?: when user forgets a password, he enters his username and gets help only if the email was previously verified in step 2. if so, the email sent to the user contains a link that includes an opaqueId on the query string. when user visits the url, he/she is auto-logged in and is presented with a "choose new password" form. note, you can visit this url only once. it expires immediately after you do that, or it can expire if it is not visited within a specific period (such as 1 hour).
So, I agree. These steps are way more than most sites do to secure user passwords. But would anyone like their password to be sent to their email address if they use a non-ssl mail client? What if you happen to use this password with other important (such as banking) sites?

correction to my prev. comment

I'd like to correct myself for the sake of completeness and so as not to mislead anyone. step 3, "forgot password?" should consist of the following:
  1. when user forgets password, he enters his username and gets help only if the email was previously verified. if so, a new random id is generated for this request. it is associated with a particular username and temporarily gets stored either in memory (using singleton hashtable) or in a special table in the database along with the timestamp when it was created.
  2. an email is sent to the user with a link that includes that temporary random value we created.
  3. when the user visits the url received, the temp key value gets looked up to see what username is associated with it. then, he/she is auto-logged in and is presented with a "choose new password" form. note, once the url is visited, the temporary random key expires and gets deleted (it will also expire if the url was not visited for a specific period of time)
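For anyone who wants to see the three steps above in code, here is a rough sketch of the in-memory variant. Everything in it is an assumption made for illustration: the class name ResetTokenStore, the methods issue and redeem, and the choice of passing the current time in as a parameter. Sending the email and the "choose new password" form are left out.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store for one-time "forgot password" keys.
class ResetTokenStore {
    private static final class Entry {
        final String username;
        final long expiresAtMillis;

        Entry(String username, long expiresAtMillis) {
            this.username = username;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> tokens = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final long ttlMillis;

    ResetTokenStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Step 1: create a random temporary key tied to a username whose email was verified.
    String issue(String username, long nowMillis) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);
        // URL-safe Base64 so the key can go on the query string (step 2).
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        tokens.put(token, new Entry(username, nowMillis + ttlMillis));
        return token;
    }

    // Step 3: look up the key and delete it, so the url works only once.
    // Returns the username, or null if the key is unknown, already used, or expired.
    String redeem(String token, long nowMillis) {
        Entry entry = tokens.remove(token);
        if (entry == null || nowMillis > entry.expiresAtMillis) {
            return null;
        }
        return entry.username;
    }
}
```

A caller would do something like store.issue("jsmith", System.currentTimeMillis()) when the reset is requested and store.redeem(key, System.currentTimeMillis()) when the url is visited. A database table with a timestamp column would follow the same logic.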

re: Lost Passwords

Madeonmoon, if someone is using a non-SSL mail client and you send a temporary link to their email address, wouldn't it be possible for someone else to get into their email and visit the URL before they do? E.g., if I know John Doe is using a financial web site and his email address is john@doe.com, I could plan to have the email with the new password sent and get into his account as soon as I hit that 'forgot password' button. I understand that obviously the system won't be flawless and your steps are a good way towards securing things as tightly as possible, but I was just wondering if the above is an issue, as my understanding of accessing others' emails isn't that good.

a link vs. your password in your inbox

Keej, I agree with you that neither is a perfect solution. But the solution I describe is probably the best I've seen. Sending a link to the user is better in several ways than sending a plaintext password in the email:
  1. If you reuse the same password at several sites, and somebody gets hold of your password for one of the sites, you are in trouble. A link that is only applicable to that particular site can only be compromised on that one site.
  2. Your password does not expire. Chances are when you get a plaintext password in your email, you keep using it and don't always delete the original message. This could be dangerous. The solution I described is designed so that the password help link expires as soon as it is visited or as soon as a specific time period that you set expires, whichever comes first.
These are just two points I could think of at the time. I am sure there are more..

re: insecure inboxes

>> possible for someone else to get into their
>> email and visit the URL before they do?

possible, I suppose. The system I use creates an md5 hash of the submitted password and compares it with the stored hash, as described in the article.

the way I deal with lost passwords is by sending in the email a hashed email address and hashed password, then checking these against what's in the db and then having the user input a new password, etc.

there's no way anyone scamming an inbox would know what the account NAME was, so they'd be unable to use it...

does that make sense?

re: insecure inboxes

>> the way I deal with lost passwords is by sending in the email a hashed email address and hashed password then check these against whats in the db and then have the user input a new password, etc.

This makes sense to me. What do you mean when you say what the account NAME was ?

I guess I am not a big fan of sending someone's password over the internet (even when it's hashed, especially given the fact that md5 is not the most secure hash algorithm). But I could be too paranoid about it.

re: insecure inboxes

when I set up a user management system I require a username, email, and password... not merely an email and password... to login you need both the username and the password... in the email for lost passwords I don't send the username... I am able to validate that by the email...

re: insecure inboxes

Ok, I see what you are saying. Seems like a good system to me. On another note, I guess your system is designed not to allow users to find other users' usernames based on their email addresses. Also, does your system allow multiple user accounts with the same email address? In that case the combination of email + password may produce duplicates...

re: insecure inboxes

right - no finding users by their email, and no multiple users on one email account... i actually find that more secure ... and easier to track users...

do you know if PHP supports SHA - I mostly use PHP for sites, and md5 is a native function, but SHA does sound more secure... nice job on the article BTW

re: insecure inboxes

thanks! i never used it but check out http://www.php.net/manual/en/function.sha1.php

sha1

knew it was there, was looking for more of a user comment - none there - but the best way to figure something out is to use it right? sha1 will always return the same hash I presume...

re: sha1

if string sha1 ( string str) works correctly it will always return the same value for a given str

re: sha1

heh, i thought your last sentence was a question not a statement :-)

yeah - thx

yeah - thx :)

Longest encrypted return value?

Excellent article. Is there a guideline for how long a digested password will be? I want to make sure my database field is large enough to handle all possible values. Assume a max plaintext length of 15 characters.

re: Longest encrypted return value?

thanks. if you use sha-1 algorithm, which is what i would recommend you use, the length of the digested value will always be fixed whether your plaintext password is 2 characters or 200 characters long. it will have the same length as my example above has (5yfRRkrhJDbomacm2lsvEdg4GyY=). so, the total length will be 28 characters including the "=" (which will always be there at the end of the digest).
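The 28 comes from Base64 turning every 3 bytes into 4 characters and padding the last group, so 20 digest bytes give 28 characters. A small sketch of that arithmetic, assuming standard padded Base64 without line breaks; the class name Base64Length and the method encodedLength are made up for this example.

```java
import java.util.Base64;

// Hypothetical helper for sizing a database column.
class Base64Length {
    // Padded Base64 emits 4 characters for every started group of 3 bytes.
    static int encodedLength(int rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        System.out.println("MD5     (16 bytes): " + encodedLength(16)); // 24
        System.out.println("SHA-1   (20 bytes): " + encodedLength(20)); // 28
        System.out.println("SHA-256 (32 bytes): " + encodedLength(32)); // 44
        System.out.println("SHA-512 (64 bytes): " + encodedLength(64)); // 88
        // Cross-check against the real encoder for a 20-byte digest.
        System.out.println(Base64.getEncoder().encodeToString(new byte[20]).length()); // 28
    }
}
```

So the column size depends on the algorithm you pick, not on the maximum plaintext length.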

really? no duplicate hashes?

even though the article states that you're guaranteed no two hashes will be the same, that's not really true right? technically there is an infinite number of possible strings that can be hashed. on the other hand there is a finite number of hashes that can be produced. i guess it's just a matter of probability right? it's most likely such a small possibility that it's impossible i'm sure. i was just thinking about it is all.

re: no duplicate hashes?

yeah, i believe you are right. from what i understand it is mathematically possible to hit the same hash value but the probability of that happening is *extremely* low. if you need to, you can always code to check for duplicates.
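To put a rough number on "extremely low": if the hash behaved like a perfectly random function (an idealizing assumption, not a statement about any real attack), the chance that any two of n stored values collide is approximately n^2 / 2^(bits+1) while that value is small. The class and method names below are invented for this back-of-the-envelope sketch.

```java
// Hypothetical estimate using the birthday-bound approximation.
class CollisionEstimate {
    // p is approximately n^2 / 2^(bits + 1), valid only while p is much smaller than 1.
    static double approxCollisionProbability(double n, int bits) {
        return (n * n) / Math.pow(2.0, bits + 1);
    }

    public static void main(String[] args) {
        double oneBillion = 1e9;
        System.out.println("MD5,   1e9 values: " + approxCollisionProbability(oneBillion, 128));
        System.out.println("SHA-1, 1e9 values: " + approxCollisionProbability(oneBillion, 160));
    }
}
```

Even with a billion passwords the SHA-1 estimate comes out on the order of 1e-31, so an accidental duplicate is not something you would expect to see in practice.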

Speed?

You mentioned that SHA-1 wasn't very fast? How fast is that? I'm looking to digest files in the 10s to 100s of gigabytes (need to ensure they haven't changed, md5 isn't quite secure enough because we have some pretty big computers around that people could easily use to break 128-bit). Any links to SHA-1 perf tests or something like that?
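I don't have benchmark numbers to point to, but for files that large the main thing is to feed the digest in chunks instead of loading the whole file into memory, and then time it on your own hardware. A minimal sketch, assuming Java 8 or later; the class name FileDigest, the method sha1, and the 64 KB buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper that digests a file of any size with constant memory.
class FileDigest {
    static byte[] sha1(Path file) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buffer = new byte[64 * 1024];
                int read;
                // update() can be called repeatedly; the digest covers all chunks in order.
                while ((read = in.read(buffer)) != -1) {
                    md.update(buffer, 0, read);
                }
            }
            return md.digest();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            long start = System.nanoTime();
            byte[] digest = sha1(Paths.get(args[0]));
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(Base64.getEncoder().encodeToString(digest) + " (" + elapsedMillis + " ms)");
        }
    }
}
```

Swapping "SHA-1" for "SHA-256" in getInstance() is a one-word change if you want a larger digest.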

The Single Quote Problem

Does the getBytes("UTF-8") function return chars that might pose problems for the database (like single quotes?) I assume it does, so I'll look for them and do a replace...

HTTP and security of password

Maybe I am missing something in this discussion, but how do you protect the password between the user's browser and your server, when it is transmitted?

Re: HTTP and security of password

With HTTPS. Your server needs an SSL certificate from an organization like www.verisign.com or www.freessl.com. The form should post to a page that is SSL enabled.


hth, Chris.

Re: HTTP and security of password

yeah, this article did not cover how to securely transmit data between user browser and the server (using SSL as glaven mentioned). Rather it concentrates on how to securely store password and other sensitive info once it's on the server. best, james

If Some Passwords in DB are encrypted and some are Non encrypted

Hi, if some passwords in the DB are encrypted and some are not, how do I handle this situation using this code? I need to check whether a DB value is encrypted or not. How can I check that? Please give code for checking whether a value is encrypted or not.