Extending kdb with a C-style printf() function

Recently a colleague raised the possibility of extending kdb with a C-style printf() function, so that in q he could write something like qprintf["%s is %05d";("foo";foo)].

This is an interesting idea, and q does allow C extensions through the rather simple dynamic-load syntax “2:”.
Here is the link that explains how one can do it: http://code.kx.com/wiki/Cookbook/ExtendingWithC
However, the biggest difficulty lies in how to call printf() at run time, when the number of arguments is not known in advance.

Normally when we call printf(), no matter how many arguments we pass, their number is fixed at compile time for each call, so gcc can do all kinds of tricks to lay the arguments out, and printf can use va_list and built-in helpers like va_start to process them one by one.
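
To make the compile-time point concrete, here is a minimal sketch of a variadic wrapper. The name format_into is hypothetical and not part of the extension; the callee walks its arguments through va_list, but each call still spells out a fixed argument list that the compiler sees.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not from the post): format into buf like printf would.
   va_start/va_end only walk arguments that the compiler already laid out
   for this particular call. */
static int format_into(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(buf, size, fmt, ap);  /* the v* variant takes the va_list */
  va_end(ap);

  return n;                           /* number of chars the full result needs */
}
```

A call such as format_into(buf, sizeof buf, "%s is %05d", "foo", 42) has exactly two extra arguments, fixed when the call is compiled. Nothing here helps when the arguments only exist as a q list at run time.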

At run time this is nearly impossible, because you basically have to redo the work of the compiler, which potentially means dealing with all kinds of calling conventions as well as register and stack usage yourself.

For example, x64 (under the System V ABI used on Linux) passes its first several integer parameters in registers, in the order rdi, rsi, rdx, rcx, r8, r9. Dealing with this means manipulating those registers directly, which would not help with a clean and portable solution.

After giving it some thought, it seems that at least for printf() there is a solution that works around the var-args limitation.

As we know, printf() supports various format specifiers, and all of them start with a “%” prefix.
The full table of specifiers is at http://www.cplusplus.com/reference/cstdio/printf/

So the simple idea is to split a format spec like “%s is %d” into several segments, each containing one “%” conversion. Then we can call printf() once per segment, with one argument each, and concatenate the results, sort of map-reduce style.
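
As a hand-worked illustration of that idea (a sketch only, with the split done manually and a hypothetical function name), “%s is %05d” becomes the two segments “%s is ” and “%05d”, each formatted by its own call:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration: format "%s is %05d" in two separate calls.
   Each snprintf sees exactly one conversion and one argument.
   Assumes size >= 1. */
static void format_in_pieces(char *out, size_t size, const char *name, int value)
{
  char piece[64];

  out[0] = 0;

  snprintf(piece, sizeof piece, "%s is ", name);  /* segment 1 */
  strncat(out, piece, size - strlen(out) - 1);

  snprintf(piece, sizeof piece, "%05d", value);   /* segment 2 */
  strncat(out, piece, size - strlen(out) - 1);
}
```

Because every call has a fixed shape (one format, one argument), the argument type can be chosen at run time with a plain switch, which is what the extension does.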

And although a format spec can be quite flexible, and is not always as simple as “%d”, we can always cut the format string right before each ‘%’.
Of course we also have to handle the case where ‘%’ is escaped as two consecutive characters, ‘%%’; those we simply skip over while looking for the next occurrence of ‘%’.
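
The cutting rule on its own can be sketched as below. This is a standalone illustration with hypothetical helper names (segment_len, count_segments), independent of k.h, and not the exact loop used in the extension.

```c
#include <stddef.h>

/* Hypothetical helper: length of the first segment of s.
   A segment holds at most one real conversion, at its start;
   "%%" pairs are skipped and never start a new segment. */
static size_t segment_len(const char *s)
{
  const char *p = s;

  /* a '%' at the very start belongs to this segment, so step over it */
  if(p[0] == '%' && p[1] != '\0') p += 2;

  while(*p)
  {
    if(*p == '%')
    {
      if(p[1] == '%') { p += 2; continue; }     /* escaped percent */
      if(p[1] != '\0') return (size_t)(p - s);  /* cut right before this '%' */
    }
    ++p;
  }

  return (size_t)(p - s);                       /* no further cut */
}

/* Count the segments a format string splits into. */
static int count_segments(const char *fmt)
{
  int n = 0;

  while(*fmt)
  {
    fmt += segment_len(fmt);
    ++n;
  }

  return n;
}
```

For “%s is %d” this yields two segments, “%s is ” and “%d”, while “test %% empty again” stays a single segment because its only ‘%’ characters form an escape.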

So here goes the code

#define KXVER 3
#include "k.h"

#include <stdio.h>
#include <string.h>

static K _arg(K p_args, int index)
{
  // if args is a mixed list, return nth arg
  if(p_args->t == 0)
  {
    if(index >= p_args->n) return NULL;

    return kK(p_args)[index];
  }

  // if args is not a list, we only allow return it for index 0
  if(index == 0) return p_args;

  return NULL;
}

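// must match the size of the result buffer r[] in printq
#define RES_MAX 2560

static int _printf_arg(char * p_res, char * p_fmt, K p_arg)
{
  char buf[256];

  // no arg: either a literal or %%-escaped segment, or the caller ran out of args
  if(!p_arg)
  {
    if(p_fmt[0] == '%' && p_fmt[1] != '%')
    {
      // missing arg: print the spec as it is, this should give user visual cue;
      // only the remainder is formatted so that any later %% still collapses
      buf[0] = '%';
      snprintf(buf + 1, sizeof(buf) - 1, p_fmt + 1, "");
    }
    else
    {
      snprintf(buf, sizeof(buf), p_fmt, ""); // do an additional sprintf to get rid of %%
    }
    strncat(p_res, buf, RES_MAX - 1 - strlen(p_res));
    return 0;
  }

  // check arg type; snprintf keeps every segment within buf
  switch(p_arg->t)
  {
    // byte list or char list, passed as char *
    case KG:
    case KC:
    {
      char str_buf[256];
      J n = p_arg->n;
      if(n > (J)sizeof(str_buf) - 1) n = sizeof(str_buf) - 1; // truncate long strings

      // q strings are not null-terminated, so copy and terminate
      memcpy(str_buf, kG(p_arg), n);
      str_buf[n] = 0;

      snprintf(buf, sizeof(buf), p_fmt, str_buf);
      break;
    }

    // short
    case -KH:
      snprintf(buf, sizeof(buf), p_fmt, p_arg->h);
      break;
    // int
    case -KI:
      snprintf(buf, sizeof(buf), p_fmt, p_arg->i);
      break;

    // long (64-bit): the spec should be %lld, a plain %d only appears to work on some ABIs
    case -KJ:
      snprintf(buf, sizeof(buf), p_fmt, p_arg->j);
      break;

    // double
    case -KF:
      snprintf(buf, sizeof(buf), p_fmt, p_arg->f);
      break;

    // symbol
    case -KS:
      snprintf(buf, sizeof(buf), p_fmt, p_arg->s);
      break;

    default: // take as error condition
      snprintf(p_res, RES_MAX, "unsupported type [%d]\n", p_arg->t);
      return 1;
  }

  strncat(p_res, buf, RES_MAX - 1 - strlen(p_res));

  return 0;
}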

K printq(K k_fmt, K k_args)
{
  int i_arg = 0;

  char r[2560];
  r[0] = 0;

  char * p1    = 0;
  char * p_end = 0;

  // support both symbol and string type format
  if(k_fmt->t == KC)
  {
    p1    = (char *)(k_fmt->G0);
    p_end = p1 + k_fmt->n;
  }
  else if(k_fmt->t == -KS)
  {
    p1    = k_fmt->s;
    p_end = p1 + strlen(p1);
  }
  else
  {
    return ks(ss("unsupported format type"));
  }


  char * p = p1;

  // each segment 256 max
  char buf_fmt[256];

  // check the bound first: a char list is not null-terminated
  while(p < p_end && *p)
  {
    if(*p == '%')
    {
      if(p + 1 < p_end && *(p+1))
      {
        // skip escape sequence %% and first char % case
        if(*(p+1) == '%' || p == p1)
        {
          p+=2;
          continue;
        }

        // we have a new segment of fmt specifier to process
        if((size_t)(p - p1) >= sizeof(buf_fmt)) return ks(ss("format segment too long"));
        strncpy(buf_fmt, p1, p - p1);
        buf_fmt[p-p1] = 0;

        // literal text and segments starting with an escaped %% take no arg,
        // but still go through sprintf so that %% collapses to %
        if(*p1 != '%' || (p1 + 1 < p && *(p1 + 1) == '%'))
        {
          if(_printf_arg(r, buf_fmt, NULL)) return ks(ss(r));
        }
        else
        {
          if(_printf_arg(r, buf_fmt, _arg(k_args, i_arg++))) return ks(ss(r));
        }

        // move on to next segment
        p1 = p;
        p += 2;

        continue;
      }
    }

    ++p;
  }

  // null char not allowed in the middle

  if(p < p_end && !*p)
  {
    return ks(ss("invalid format:contains null char"));
  }

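  // deal with tail segment
  if(p1 < p_end)
  {
    if((size_t)(p_end - p1) >= sizeof(buf_fmt)) return ks(ss("format segment too long"));
    strncpy(buf_fmt, p1, p_end - p1);
    buf_fmt[p_end - p1] = 0;

    // only a segment that starts with a real conversion takes an arg;
    // literal text and %% go through sprintf so that %% collapses to %
    if(*p1 != '%' || (p1 + 1 < p_end && *(p1 + 1) == '%'))
    {
      if(_printf_arg(r, buf_fmt, NULL)) return ks(ss(r));
    }
    else
    {
      if(_printf_arg(r, buf_fmt, _arg(k_args, i_arg++))) return ks(ss(r));
    }
  }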

  return ks(ss(r));
}

To compile against the free version of q, which is available from http://kx.com/software-download.php, you have to either build on a 32-bit Linux box or build a 32-bit shared library on a 64-bit box (with gcc, add -m32), because the free version only supports 32-bit.

You will also need to download k.h yourself; it is available at http://code.kx.com/wsvn/code/kx/kdb+/c/c/k.h

Here we go:

$gcc -fPIC -shared -o printf.so printf.c

Before loading the library we have to set up LD_LIBRARY_PATH; here I simply point it to the current directory:

$export LD_LIBRARY_PATH=.

And here is how we test it in q:

$q
KDB+ 3.2 2014.10.04 Copyright (C) 1993-2014 Kx Systems
l32/ 2()core 2003MB codywu nypbxs108.ms.com 192.168.0.27 NONEXPIRE

Welcome to kdb+ 32bit edition
For support please see http://groups.google.com/d/forum/personal-kdbplus
Tutorials can be found at http://code.kx.com/wiki/Tutorials
To exit, type \\
To remove this startup msg, edit q.q
q)p:`printf 2:(`printq;2)
q)f1:9.6789
q)foo:2
q)teststr:"can I get it?"
q)p[`$"%%test %3.2f, and str:%s and int %05d and last one weird %";(f1;teststr;foo)]
`%test 9.68, and str:can I get it? and int 00002 and last one weird
q)p[`$"test %% empty again";()]
`test % empty again
q)\\

The code works for most cases.
However, its error checking differs slightly from that of real printf(), e.g. when you provide fewer arguments than the format spec requires, or use other invalid “%X” character combinations.

Also, there seem to be a few exposed functions that let the user parse a format spec, though for C I could not find one available locally without importing another open-source library.
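
One candidate worth noting: on glibc systems, parse_printf_format from <printf.h> reports how many arguments a format expects and their types. It is a GNU extension, so the sketch below assumes glibc and is not portable; the wrapper name describe_format is hypothetical.

```c
#include <printf.h>   /* glibc-specific header (assumption: building against glibc) */
#include <stddef.h>

/* Hypothetical wrapper: return how many arguments fmt expects according to
   glibc, storing up to n type codes (PA_INT, PA_STRING, PA_DOUBLE, ...)
   in types. */
static size_t describe_format(const char *fmt, int *types, size_t n)
{
  return parse_printf_format(fmt, n, types);
}
```

Masking a type code with ~PA_FLAG_MASK strips size modifiers such as long, leaving the base type. A check like this could be run before formatting, to compare the number of supplied q arguments against what the format requires.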

About codywu2010

a programmer