isolcpus, numactl and taskset

isolcpus is a kernel boot parameter that isolates certain CPUs from kernel scheduling, which is especially useful if you want to dedicate some CPUs to special tasks with as little unwanted interruption as possible (it cannot be reduced to zero) on a multi-core system.

There is not much detail, but the parameter is documented in the kernel's Kernel Parameters list:

isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
Format:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>

This option can be used to specify one or more CPUs
to isolate from the general SMP balancing and scheduling
algorithms. You can move a process onto or off an
“isolated” CPU via the CPU affinity syscalls or cpuset.
<cpu number> begins at 0 and the maximum value is
"number of CPUs in system - 1".

This option is the preferred way to isolate CPUs. The
alternative — manually setting the CPU mask of all
tasks in the system — can cause problems and
suboptimal load balancer performance.

With this option set, by default all user processes are created with a CPU affinity mask that excludes the isolated CPUs.

To check whether the kernel was booted with isolcpus set, simply check /proc/cmdline. For example, my Ubuntu system has 4 CPUs:

$lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 30
Stepping:              5
CPU MHz:               2925.979
BogoMIPS:              5851.95
Virtualization:        VT-x
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-3
$uname -a
Linux ubuntu 3.16.0-44-generic #59~14.04.1-Ubuntu SMP Tue Jul 7 15:07:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.16.0-44-generic root=UUID=.. ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US isolcpus=3 quiet

Now all user processes should be scheduled away from CPU 3.

For example, we can check the current shell process. Our focus is on Cpus_allowed and Cpus_allowed_list, which show that CPU 3 is excluded (the list reads 0-2,4-63 because the mask covers all 64 possible bits, while the system only has 4 CPUs):

$cat /proc/$$/cmdline
/bin/ksh93
$cat /proc/$$/status|tail -6
Cpus_allowed:   ffffffff,fffffff7
Cpus_allowed_list:      0-2,4-63
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        399
nonvoluntary_ctxt_switches:     146

Setting isolcpus has one interesting side effect: numactl stops working for the isolated CPU.
We can bind to any of the other CPUs, but binding to CPU 3 fails immediately.

First we try to bind to cpu1 and cpu2 and we can see Cpus_allowed_list updated correctly:

$numactl --physcpubind=1 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
Cpus_allowed:   00000000,00000002
Cpus_allowed_list:      1
$numactl --physcpubind=2 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
Cpus_allowed:   00000000,00000004
Cpus_allowed_list:      2

But if we try to bind to cpu3 it will fail:

$numactl --physcpubind=3 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
libnuma: Warning: cpu argument 3 is out of range

<3> is invalid
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
...

But we can still use taskset to bind it to cpu3:

$taskset -c 3 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
Cpus_allowed:   00000000,00000008
Cpus_allowed_list:      3

Why?

It turns out numactl's logic is a bit peculiar.
It first reads back the CPU affinity mask and uses that to check whether the specified CPU list is valid; only when the list is valid does it apply the new affinity mask.

struct bitmask *
numa_parse_cpustring(char *s)
{
  int invert = 0, relative=0;
  int conf_cpus = numa_num_configured_cpus();
  char *end;
  struct bitmask *mask;

  mask = numa_allocate_cpumask();

  if (s[0] == 0)
    return mask;
  if (*s == '!') {
    invert = 1;
    s++;
  }
  if (*s == '+') {
    relative++;
    s++;
  }
  do {
    unsigned long arg;
    int i;

    if (!strcmp(s,"all")) {
      copy_bitmask_to_bitmask(numa_all_cpus_ptr, mask);
      s+=4;
      break;
    }
    arg = get_nr(s, &end, numa_all_cpus_ptr, relative);
    if (end == s) {
      numa_warn(W_cpuparse, "unparseable cpu description `%s'\n", s);
      goto err;
    }
    if (!numa_bitmask_isbitset(numa_all_cpus_ptr, arg)) {
      numa_warn(W_cpuparse, "cpu argument %s is out of range\n", s);
      goto err;
    }
...

And numa_all_cpus_ptr is obtained by reading the same file we have been looking at, /proc/self/status:

/*
 * Read a processes constraints in terms of nodes and cpus from
 * /proc/self/status.
 */
static void
set_task_constraints(void)
{
  int hicpu = sysconf(_SC_NPROCESSORS_CONF)-1;
  int i;
  char *buffer = NULL;
  size_t buflen = 0;
  FILE *f;

  numa_all_cpus_ptr = numa_allocate_cpumask();
  numa_all_nodes_ptr = numa_allocate_nodemask();
  numa_no_nodes_ptr = numa_allocate_nodemask();

  f = fopen(mask_size_file, "r");
  if (!f) {
    //numa_warn(W_cpumap, "Cannot parse %s", mask_size_file);
    return;
  }

  while (getline(&buffer, &buflen, f) > 0) {
    /* mask starts after [last] tab */
    char  *mask = strrchr(buffer,'\t') + 1;

    if (strncmp(buffer,"Cpus_allowed:",13) == 0)
      numproccpu = read_mask(mask, numa_all_cpus_ptr);
...

taskset's logic is simpler: it directly applies the affinity mask via the sched_setaffinity syscall:

static void do_taskset(struct taskset *ts, size_t setsize, cpu_set_t *set)
{
  /* read the current mask */
  if (ts->pid) {
    if (sched_getaffinity(ts->pid, ts->setsize, ts->set) < 0)
      err(EXIT_FAILURE, _("failed to get pid %d's affinity"),
          ts->pid);
    print_affinity(ts, FALSE);
  }

  if (ts->get_only)
    return;

  /* set new mask */
  if (sched_setaffinity(ts->pid, setsize, set) < 0)
    err(EXIT_FAILURE, _("failed to set pid %d's affinity"),
        ts->pid);

  /* re-read the current mask */
  if (ts->pid) {
    if (sched_getaffinity(ts->pid, ts->setsize, ts->set) < 0)
      err(EXIT_FAILURE, _("failed to get pid %d's affinity"),
          ts->pid);
    print_affinity(ts, TRUE);
  }
}

From the user's point of view, taskset's behavior is probably preferable.


6 Responses to isolcpus, numactl and taskset

  1. new23d says:

    Had to chain commands to bind memory region as well, eg: “taskset -c 4,5,6,7 numactl -m 1 -N 1 -C 4,5,6,7 doIt.bin”. Thanks!

  2. praveenmak says:

    I want to know if there is a way to find out is some one used “taskset” already and assigned a core?

  3. Ani A says:

    I cant thank you enough, today! 🙂 I was looking for a neat way to figure out whether
    isolcpu were enabled or not, and cat /proc/cmdline didnt strike, until Google landed me here.
    Thanks again.

  4. AAMI says:

    Thanks for this wonderful blog.
    I tried implementing as per your blog but couldn't quite get it.

    When trying to taskset on 11 and 12 it fails. Please see the following:
    $> cat /proc/cmdline
    ro root=… isolcpus=11-12

    $> taskset -c 10 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
    Cpus_allowed: 00000000,00000400
    Cpus_allowed_list: 10
    $> taskset -c 11 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
    sched_setaffinity: Invalid argument
    failed to set pid 0's affinity.
    $> taskset -c 12 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
    sched_setaffinity: Invalid argument
    failed to set pid 0's affinity.
    $> taskset -c 13 /bin/ksh -c "cat /proc/\$\$/status|grep Cpus_allowed"
    Cpus_allowed: 00000000,00002000
    Cpus_allowed_list: 13

  5. Evan says:

    With numactl version 2.0.9-rc3 or later you can use the "--all" option: "Unset default cpuset awareness, so user can use all possible CPUs/nodes for following policy settings." That option will let numactl launch on cores listed in isolcpus.
