SSH host key fingerprints and the known hosts file

I have always wondered what the message below means when I connect to a new host, yet I have always answered “yes” to the question.
After all, what does the fingerprint mean, and how should I know whether to trust it rather than walk into a man-in-the-middle (MITM) attack?

The authenticity of host 'XXXXXXXXX (127.0.0.1)' can't be established.
RSA key fingerprint is ed:9f:f5:88:8d:51:05:93:e8:56:3c:5a:b7:83:d7:e8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'XXXXXXXXX' (RSA) to the list of known hosts.

It was not until last week, when someone asked about this, that I decided to find out more about it.

Our production environment uses Kerberos over ssh, so there is really no use for a known hosts file.
To reproduce the prompt I first have to run “kdestroy” to wipe my Kerberos cache, and then point ssh at a local known hosts file. The default known hosts file lives on AFS, and once the Kerberos ticket is gone the AFS tokens are destroyed as well, so the ssh client cannot update it.

I use OpenSSH, so my client command looks like this:

$ssh -o UserKnownHostsFile=/var/tmp/known_hosts -o HostKeyAlgorithms=ssh-rsa -o HashKnownHosts=no XXXXXXX

Here I have specified a customized user known hosts file instead of the default one at “~/.ssh/known_hosts”.
I have also forced ssh-rsa as the host key algorithm, since by default my version of OpenSSH seems to favor ECDSA over the rest.
RSA is relatively simple in concept and very widely used, so we use it as a starting point.

Last, I have also specified HashKnownHosts=no so that the known hosts file stores the host name unhashed, which is simpler. We shall see the difference soon.

With the above, I was asked for my password after the familiar fingerprint banner was displayed.
All went well, so let’s see what is stored in the known hosts file:

$cat /var/tmp/known_hosts
XXXXXXX ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxV2OrfVT6uhkIfjI873lxKnirR8eXehIdACkTdzkRrSLQ8A7b+NmqBOzrtbeqUc47qBKYPMYxptf9ILzXj4+uxpp5pynliUUiOkoXVCAPGgFqeetczCAyPNaqr70O4Mv1iVlriGKhPiTEFYgf4Nr+xiAAOCjoBQqbcJOmL/NZQKtmcFFDA8RxHUBUq6+IpJA+8kVfLm7e+NBrsT828zhnf3ff0mTbhouVimluH3MwAiEhULy6ccNMjLsgLKR8k+ktHURiU15mp5HpC5tgWdMPWY4djMlgrjEhXcpVC6tm2VG/EIvapC/Dh+cFDmNmwXwjnEA2KSP+avtivUIdNrXf

We have 3 fields here: first the host name, then “ssh-rsa”, which seems to be the host key algorithm, and last a long run of printable characters that seems to be the critical information we are after.

It actually looks quite familiar if you have ever tried to peek into the sshd configuration dir at /etc/ssh.

$cat /etc/ssh/ssh_host_rsa_key.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxV2OrfVT6uhkIfjI873lxKnirR8eXehIdACkTdzkRrSLQ8A7b+NmqBOzrtbeqUc47qBKYPMYxptf9ILzXj4+uxpp5pynliUUiOkoXVCAPGgFqeetczCAyPNaqr70O4Mv1iVlriGKhPiTEFYgf4Nr+xiAAOCjoBQqbcJOmL/NZQKtmcFFDA8RxHUBUq6+IpJA+8kVfLm7e+NBrsT828zhnf3ff0mTbhouVimluH3MwAiEhULy6ccNMjLsgLKR8k+ktHURiU15mp5HpC5tgWdMPWY4djMlgrjEhXcpVC6tm2VG/EIvapC/Dh+cFDmNmwXwjnEA2KSP+avtivUIdNrXf root@XXXXXXX

Here we go.
The known hosts entry in this case is just the host name followed by the content of the public key file (minus the trailing comment)!

The next question is how the fingerprint is connected to the key file.
For that, we actually have a simple command to use: ssh-keygen.

$ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key.pub
2048 ed:9f:f5:88:8d:51:05:93:e8:56:3c:5a:b7:83:d7:e8  root@XXXXXXX (RSA)

So the fingerprint generated from this file is exactly the same as the one printed when we first connected to the new host.
We now have an answer to how we verify a new host: if we have obtained the public key via some other channel, we can use ssh-keygen to get its fingerprint, which can then be used to verify the identity of the target host.
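
As a rough sketch of that check in shell: the helper names below (normalize_fp, fp_matches, keyfile_fp) are made up for illustration, and the “-E md5” option assumes OpenSSH 6.8 or later, which prints SHA256 fingerprints unless asked otherwise. The expected fingerprint has to come from a channel we already trust; fetching it over the same network proves nothing.

```shell
# normalize_fp FP: drop an optional "MD5:" prefix and lowercase the hex digits,
# so fingerprints printed by older and newer OpenSSH versions compare equal.
normalize_fp() {
  printf '%s\n' "$1" | sed 's/^MD5://' | tr 'A-F' 'a-f'
}

# fp_matches FP1 FP2: succeed only if both fingerprints are the same.
fp_matches() {
  [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}

# keyfile_fp FILE: MD5 fingerprint of a public key file (second field of the
# ssh-keygen output). Assumes OpenSSH 6.8+ for the -E option; not called here.
keyfile_fp() {
  ssh-keygen -l -E md5 -f "$1" | awk '{print $2}'
}

# The fingerprint from the banner, compared with one obtained out of band.
if fp_matches 'MD5:ED:9F:F5:88:8D:51:05:93:E8:56:3C:5A:B7:83:D7:E8' \
              'ed:9f:f5:88:8d:51:05:93:e8:56:3c:5a:b7:83:d7:e8'; then
  echo "fingerprint matches"
else
  echo "fingerprint MISMATCH, do not connect"
fi
```

On the server side, something like “keyfile_fp /etc/ssh/ssh_host_rsa_key.pub” would give the value to hand over through the trusted channel.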

Now that we know where the fingerprint comes from, we want to know what the long string means and how the fingerprint is derived from it.

First, the string is a long run of printable characters, which looks very familiar, and just as we guessed, it is base64 encoded.
The Base64 Wikipedia page can be found here: http://en.wikipedia.org/wiki/Base64 and luckily Linux has a command readily available for it.

So we copy the string to a temp file and run base64 to decode it:

$cat a
AAAAB3NzaC1yc2EAAAADAQABAAABAQCxV2OrfVT6uhkIfjI873lxKnirR8eXehIdACkTdzkRrSLQ8A7b+NmqBOzrtbeqUc47qBKYPMYxptf9ILzXj4+uxpp5pynliUUiOkoXVCAPGgFqeetczCAyPNaqr70O4Mv1iVlriGKhPiTEFYgf4Nr+xiAAOCjoBQqbcJOmL/NZQKtmcFFDA8RxHUBUq6+IpJA+8kVfLm7e+NBrsT828zhnf3ff0mTbhouVimluH3MwAiEhULy6ccNMjLsgLKR8k+ktHURiU15mp5HpC5tgWdMPWY4djMlgrjEhXcpVC6tm2VG/EIvapC/Dh+cFDmNmwXwjnEA2KSP+avtivUIdNrXf
$cat a|base64 -d > o
$wc -c o
279 o
$od -tx1 o
0000000 00 00 00 07 73 73 68 2d 72 73 61 00 00 00 03 01
0000020 00 01 00 00 01 01 00 b1 57 63 ab 7d 54 fa ba 19
0000040 08 7e 32 3c ef 79 71 2a 78 ab 47 c7 97 7a 12 1d
0000060 00 29 13 77 39 11 ad 22 d0 f0 0e db f8 d9 aa 04
0000100 ec eb b5 b7 aa 51 ce 3b a8 12 98 3c c6 31 a6 d7
0000120 fd 20 bc d7 8f 8f ae c6 9a 79 a7 29 e5 89 45 22
0000140 3a 4a 17 54 20 0f 1a 01 6a 79 eb 5c cc 20 32 3c
0000160 d6 aa af bd 0e e0 cb f5 89 59 6b 88 62 a1 3e 24
0000200 c4 15 88 1f e0 da fe c6 20 00 38 28 e8 05 0a 9b
0000220 70 93 a6 2f f3 59 40 ab 66 70 51 43 03 c4 71 1d
0000240 40 54 ab af 88 a4 90 3e f2 45 5f 2e 6e de f8 d0
0000260 6b b1 3f 36 f3 38 67 7f 77 df d2 64 db 86 8b 95
0000300 8a 69 6e 1f 73 30 02 21 21 50 bc ba 71 c3 4c 8c
0000320 bb 20 2c a4 7c 93 e9 2d 1d 44 62 53 5e 66 a7 91
0000340 e9 0b 9b 60 59 d3 0f 59 8e 1d 8c c9 60 ae 31 21
0000360 5d ca 55 0b ab 66 d9 51 bf 10 8b da a4 2f c3 87
0000400 e7 05 0e 63 66 c1 7c 23 9c 40 36 29 23 fe 6a fb
0000420 62 bd 42 1d 36 b5 df
0000427

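The generated output file is 279 bytes long and seems quite random, with lots of non-printable characters.
We can take a peek by trying to print what is printable. Note that od’s “a” format ignores the high-order bit, so bytes above 7f show up as unrelated printable characters: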

$od -tx1a o
0000000  00  00  00  07  73  73  68  2d  72  73  61  00  00  00  03  01
        nul nul nul bel   s   s   h   -   r   s   a nul nul nul etx soh
0000020  00  01  00  00  01  01  00  b1  57  63  ab  7d  54  fa  ba  19
        nul soh nul nul soh soh nul   1   W   c   +   }   T   z   :  em
0000040  08  7e  32  3c  ef  79  71  2a  78  ab  47  c7  97  7a  12  1d
         bs   ~   2   <   o   y   q   *   x   +   G   G etb   z dc2  gs
0000060  00  29  13  77  39  11  ad  22  d0  f0  0e  db  f8  d9  aa  04
        nul   ) dc3   w   9 dc1   -   "   P   p  so   [   x   Y   * eot
0000100  ec  eb  b5  b7  aa  51  ce  3b  a8  12  98  3c  c6  31  a6  d7
          l   k   5   7   *   Q   N   ;   ( dc2 can   <   F   1   &   W
0000120  fd  20  bc  d7  8f  8f  ae  c6  9a  79  a7  29  e5  89  45  22
          }  sp   <   W  si  si   .   F sub   y   '   )   e  ht   E   "
0000140  3a  4a  17  54  20  0f  1a  01  6a  79  eb  5c  cc  20  32  3c
          :   J etb   T  sp  si sub soh   j   y   k   \   L  sp   2   <
0000160  d6  aa  af  bd  0e  e0  cb  f5  89  59  6b  88  62  a1  3e  24
          V   *   /   =  so   `   K   u  ht   Y   k  bs   b   !   >   $
0000200  c4  15  88  1f  e0  da  fe  c6  20  00  38  28  e8  05  0a  9b
          D nak  bs  us   `   Z   ~   F  sp nul   8   (   h enq  nl esc
0000220  70  93  a6  2f  f3  59  40  ab  66  70  51  43  03  c4  71  1d
          p dc3   &   /   s   Y   @   +   f   p   Q   C etx   D   q  gs
0000240  40  54  ab  af  88  a4  90  3e  f2  45  5f  2e  6e  de  f8  d0
          @   T   +   /  bs   $ dle   >   r   E   _   .   n   ^   x   P
0000260  6b  b1  3f  36  f3  38  67  7f  77  df  d2  64  db  86  8b  95
          k   1   ?   6   s   8   g del   w   _   R   d   [ ack  vt nak
0000300  8a  69  6e  1f  73  30  02  21  21  50  bc  ba  71  c3  4c  8c
         nl   i   n  us   s   0 stx   !   !   P   <   :   q   C   L  ff
0000320  bb  20  2c  a4  7c  93  e9  2d  1d  44  62  53  5e  66  a7  91
          ;  sp   ,   $   | dc3   i   -  gs   D   b   S   ^   f   ' dc1
0000340  e9  0b  9b  60  59  d3  0f  59  8e  1d  8c  c9  60  ae  31  21
          i  vt esc   `   Y   S  si   Y  so  gs  ff   I   `   .   1   !
0000360  5d  ca  55  0b  ab  66  d9  51  bf  10  8b  da  a4  2f  c3  87
          ]   J   U  vt   +   f   Y   Q   ? dle  vt   Z   $   /   C bel
0000400  e7  05  0e  63  66  c1  7c  23  9c  40  36  29  23  fe  6a  fb
          g enq  so   c   f   A   |   #  fs   @   6   )   #   ~   j   {
0000420  62  bd  42  1d  36  b5  df
          b   =   B  gs   6   5   _
0000427

Now we see the familiar “ssh-rsa”, so apparently we are on the right track.

The decoded content follows a simple encoding rule: a 4-byte length followed by that many bytes of content.
The first 4 bytes are “00 00 00 07”, which announces 7 bytes of data, and those 7 bytes are the key type string “ssh-rsa”.

Next comes another 4-byte length, “00 00 00 03”, followed by 3 bytes of data, and then a third length, “00 00 01 01”, which is 257 bytes.

Added up, that is “4 * 3 + 7 + 3 + 257 = 279” bytes, which is exactly the size of this blob.
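
The same walk can be replayed from the shell. This is only a sketch: be32 is a helper name invented here, and to stay self-contained it decodes just the first 32 base64 characters of the key (24 bytes), which is enough to cover all three length prefixes.

```shell
# Decode the first 32 base64 characters of the key into a scratch file.
tmp=$(mktemp)
printf '%s' 'AAAAB3NzaC1yc2EAAAADAQABAAABAQCx' | base64 -d > "$tmp"

# be32 FILE OFFSET: print the big-endian 32-bit integer stored at OFFSET.
be32() {
  # od prints the four bytes as decimal numbers; they become $1..$4.
  set -- $(od -An -tu1 -j "$2" -N4 "$1")
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

type_len=$(be32 "$tmp" 0)                            # length of "ssh-rsa"
e_len=$(be32 "$tmp" $((4 + type_len)))               # length of e
n_len=$(be32 "$tmp" $((4 + type_len + 4 + e_len)))   # length of n

# Three 4-byte prefixes plus the three payloads.
total=$((4 * 3 + type_len + e_len + n_len))
echo "type=$type_len e=$e_len n=$n_len total=$total"
rm -f "$tmp"
```

This prints “type=7 e=3 n=257 total=279”, matching the numbers worked out by hand.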

And if we run md5sum over this blob, we get the familiar fingerprint:

$md5sum o
ed9ff5888d510593e8563c5ab783d7e8  o

Compare this with the fingerprint in the banner: the only difference is that the banner separates the bytes with “:” for easier reading.

RSA key fingerprint is ed:9f:f5:88:8d:51:05:93:e8:56:3c:5a:b7:83:d7:e8.
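
Put together, the decode, hash and colon steps fit in one small pipeline. A minimal sketch, where fp_md5 is a name chosen here for illustration and the demo input is a stand-in value rather than a real key:

```shell
# fp_md5: read base64 key data on stdin and print the colon-separated MD5
# of the decoded bytes, in the same layout as the ssh banner.
fp_md5() {
  base64 -d | md5sum | cut -d' ' -f1 | sed 's/../&:/g; s/:$//'
}

# Demo with a stand-in input: "YWJj" is the base64 encoding of "abc".
printf 'YWJj' | fp_md5
```

Feeding it the real key, for instance with “awk '{print $2}' /etc/ssh/ssh_host_rsa_key.pub | fp_md5”, should reproduce the fingerprint from the banner.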

To understand those fields we have to refer to some code.
In the OpenSSH code this data is actually called a blob, and there is a function key_to_blob() that creates it.

int
key_to_blob(const Key *key, u_char **blobp, u_int *lenp)
{
  Buffer b;
  int len;

  if (key == NULL) {
    error("key_to_blob: key == NULL");
    return 0;
  }
  buffer_init(&b);
  switch (key->type) {
...
 case KEY_RSA:
    buffer_put_cstring(&b, key_ssh_name(key));
    buffer_put_bignum2(&b, key->rsa->e);
    buffer_put_bignum2(&b, key->rsa->n);
    break;
  default:
    error("key_to_blob: unsupported key type %d", key->type);
    buffer_free(&b);
    return 0;
  }
  len = buffer_len(&b);
  if (lenp != NULL)
    *lenp = len;
  if (blobp != NULL) {
    *blobp = xmalloc(len);
    memcpy(*blobp, buffer_ptr(&b), len);
  }
  memset(buffer_ptr(&b), 0, len);
  buffer_free(&b);
  return len;
}

As can be seen in the code above, for the RSA key type 3 components are written to the blob: the key type name, RSA’s e and RSA’s n.
RSA is a fairly easy algorithm to understand, and its Wikipedia page is here: http://en.wikipedia.org/wiki/RSA_%28cryptosystem%29

Here e is the public exponent and n is the modulus, a very big number that is the product of two chosen prime numbers. A message m is encrypted into a ciphertext c by:

c = m^e mod n
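
To see c = m^e mod n at work, here is a toy run with the small textbook key p = 61, q = 53, so n = 3233, e = 17 and private exponent d = 2753. These numbers are an illustration only and have nothing to do with the host key above; modpow is a helper written for this sketch, and shell arithmetic limits it to small values.

```shell
# modpow BASE EXP MOD: square-and-multiply exponentiation. Only suitable for
# small textbook numbers, since shell arithmetic is limited to 64-bit integers.
modpow() {
  base=$(( $1 % $3 )); exp=$2; mod=$3; result=1
  while [ "$exp" -gt 0 ]; do
    if [ $(( exp % 2 )) -eq 1 ]; then
      result=$(( result * base % mod ))
    fi
    base=$(( base * base % mod ))
    exp=$(( exp / 2 ))
  done
  echo "$result"
}

c=$(modpow 65 17 3233)       # encrypt m = 65 with the public pair (e, n)
m=$(modpow "$c" 2753 3233)   # decrypt with the private exponent d
echo "c=$c m=$m"
```

It prints “c=2790 m=65”: anyone holding e and n can produce c, but getting m back needs d.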

The two RSA components are written using OpenSSL’s big number routines, so we can write a simple program to decode them.

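#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <openssl/bn.h>
#include <openssl/crypto.h>

using namespace std;

// Read the 4-byte big-endian length prefix at p, whatever the host byte order.
static size_t get_len(const unsigned char *p)
{
  return ((size_t)p[0] << 24) | ((size_t)p[1] << 16) |
         ((size_t)p[2] << 8) | (size_t)p[3];
}

// Like get_len(), but bail out if the field would run past the end of the data.
static size_t checked_len(const unsigned char *p, const unsigned char *end)
{
  if (end - p < 4 || get_len(p) > (size_t)(end - p) - 4) {
    cerr << "truncated or malformed blob" << endl;
    exit(1);
  }
  return get_len(p);
}

// Print a BIGNUM in decimal, then free the string OpenSSL allocated for it.
static void print_dec(const char *label, const BIGNUM *bn)
{
  char *dec = BN_bn2dec(bn);
  cout << label << dec << endl;
  OPENSSL_free(dec);
}

int main(int argc, char **argv)
{
  if (argc < 2) {
    cerr << "usage: " << argv[0] << " <decoded blob file>" << endl;
    return 1;
  }
  FILE *f = fopen(argv[1], "rb");
  if (f == NULL) {
    perror(argv[1]);
    return 1;
  }

  unsigned char out[1024];
  size_t size = fread(out, 1, sizeof(out), f);
  fclose(f);
  const unsigned char *end = out + size;

  // Field 1: the key type name ("ssh-rsa").
  const unsigned char *p = out;
  size_t len = checked_len(p, end);
  cout << "type len = " << len << endl;

  // Field 2: the public exponent e.
  p += 4 + len;
  len = checked_len(p, end);
  cout << "e len=" << len << endl;
  BIGNUM *e = BN_bin2bn(p + 4, (int)len, NULL);
  print_dec("e:", e);

  // Field 3: the modulus n.
  p += 4 + len;
  len = checked_len(p, end);
  cout << "n len=" << len << endl;
  BIGNUM *n = BN_bin2bn(p + 4, (int)len, NULL);
  print_dec("n:", n);
  cout << "n bits:" << BN_num_bits(n) << endl;

  BN_free(e);
  BN_free(n);
  return 0;
}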

In the above code we convert e and n to decimal strings and also print the number of bits in n.
When we run the code with the o file as input we get:

$g++ -o d d.C -I/usr/local/ssl/include -L/usr/local/ssl/lib -lcrypto
$./d o
type len = 7
e len=3
e:65537
n len=257
n:22387273266423075974082624384767470388441416409782079734920946086408479910108407914605858079603815175293616967951844051997662415827944406181566697263690893252489180705885464521472973608875612687650427809726490577965429630207097385076748558459229887620060910628000969064563226457292645897085918432043776066954489512323887029308656430918098327082340263996980696161490587364361106684945571491158623135736036239855137396505546767787748917412724884366660583295520279686914295260500180017539923750328532476202752209061391566016342148670265750351454810483921391735731272628022474064070029922823291197165047081645049946486239
n bits:2048

Note that we link against OpenSSL’s crypto library to be able to use those BN_ routines.

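The result shows that this RSA key is 2048 bits and, as can be imagined, n is a forbiddingly large number. The n field takes 257 bytes rather than 256 because the encoding is signed: since the top bit of n is set, a leading zero byte is added to keep the number positive.
After all, RSA’s security relies on the difficulty of factoring this large number into its 2 prime factors.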

Also, as the RSA encryption formula shows, knowing e and n is already enough to encrypt a message, so for RSA that is all a public key needs to include.

Last, as mentioned earlier, for better security we can choose to hash the host name when the known hosts file is updated, so that it becomes unintelligible.
Here is an example:

$ssh -o UserKnownHostsFile=/var/tmp/known_hosts_hash -o HostKeyAlgorithms=ssh-rsa -o HashKnownHosts=yes XXXXXXX
The authenticity of host 'XXXXXXX (127.0.0.1)' can't be established.
RSA key fingerprint is ed:9f:f5:88:8d:51:05:93:e8:56:3c:5a:b7:83:d7:e8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'XXXXXXX' (RSA) to the list of known hosts.

$cat /var/tmp/known_hosts_hash
|1|kt8zyNpvHeYN/9GNIlI0NmXUd9o=|zdF1BFu5DeCi/t6X/F7tABw68Bk= ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxV2OrfVT6uhkIfjI873lxKnirR8eXehIdACkTdzkRrSLQ8A7b+NmqBOzrtbeqUc47qBKYPMYxptf9ILzXj4+uxpp5pynliUUiOkoXVCAPGgFqeetczCAyPNaqr70O4Mv1iVlriGKhPiTEFYgf4Nr+xiAAOCjoBQqbcJOmL/NZQKtmcFFDA8RxHUBUq6+IpJA+8kVfLm7e+NBrsT828zhnf3ff0mTbhouVimluH3MwAiEhULy6ccNMjLsgLKR8k+ktHURiU15mp5HpC5tgWdMPWY4djMlgrjEhXcpVC6tm2VG/EIvapC/Dh+cFDmNmwXwjnEA2KSP+avtivUIdNrXf

And if we look at the part of the code that hashes the host name, it uses HMAC-SHA1 keyed with a random salt to fight rainbow table based brute force attacks.
The salt is generated with arc4random().

char *
host_hash(const char *host, const char *name_from_hostfile, u_int src_len)
{
  const EVP_MD *md = EVP_sha1();
  HMAC_CTX mac_ctx;
  char salt[256], result[256], uu_salt[512], uu_result[512];
  static char encoded[1024];
  u_int i, len;

  len = EVP_MD_size(md);

  if (name_from_hostfile == NULL) {
    /* Create new salt */
    for (i = 0; i < len; i++)
      salt[i] = arc4random();
  } else {
    /* Extract salt from known host entry */
    if (extract_salt(name_from_hostfile, src_len, salt,
        sizeof(salt)) == -1)
      return (NULL);
  }

  HMAC_Init(&mac_ctx, salt, len, md);
  HMAC_Update(&mac_ctx, host, strlen(host));
  HMAC_Final(&mac_ctx, result, NULL);
  HMAC_cleanup(&mac_ctx);

  if (__b64_ntop(salt, len, uu_salt, sizeof(uu_salt)) == -1 ||
      __b64_ntop(result, len, uu_result, sizeof(uu_result)) == -1)
    fatal("host_hash: __b64_ntop failed");

  snprintf(encoded, sizeof(encoded), "%s%s%c%s", HASH_MAGIC, uu_salt,
      HASH_DELIM, uu_result);

  return (encoded);
}
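
The hashed host field can be taken apart and rechecked from the shell as well. A sketch under a few assumptions: hash_host is a helper name invented here, it relies on the openssl command line tool being installed (it is only defined below, not called), and since the real host name is masked in this post the expected hash is not recomputed here. The stock way to look up a host in a hashed file is “ssh-keygen -F hostname -f file”.

```shell
# The hashed host field from the known hosts entry shown above.
field='|1|kt8zyNpvHeYN/9GNIlI0NmXUd9o=|zdF1BFu5DeCi/t6X/F7tABw68Bk='

# Split on "|": field 1 is empty, 2 is the magic "1", 3 the salt, 4 the HMAC.
salt_b64=$(printf '%s' "$field" | cut -d'|' -f3)
hash_b64=$(printf '%s' "$field" | cut -d'|' -f4)

# Both decode to 20 bytes, the size of a SHA1 digest.
salt_len=$(( $(printf '%s' "$salt_b64" | base64 -d | wc -c) ))
hash_len=$(( $(printf '%s' "$hash_b64" | base64 -d | wc -c) ))
echo "salt=$salt_len bytes, hash=$hash_len bytes"

# hash_host SALT_B64 HOST: base64 of HMAC-SHA1(key = salt, data = HOST).
# Assumes the openssl CLI is available.
hash_host() {
  key_hex=$(printf '%s' "$1" | base64 -d | od -An -tx1 | tr -d ' \n')
  printf '%s' "$2" |
    openssl dgst -sha1 -mac HMAC -macopt "hexkey:$key_hex" -binary | base64
}
```

Calling hash_host with the salt above and the real host name should give back the last part of the field if the entry belongs to that host.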

About codywu2010

a programmer