cpuset is an important concept in Linux. It provides a mechanism for assigning a set of CPUs and memory nodes to a set of tasks.
Details can be found in the kernel documentation.
Here I just want to show some examples of how we can read this information manually and manipulate it to change a task's CPU affinity.
Again I am using my 64-bit Ubuntu VMware guest, which has 4 CPUs available:
    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                4
    On-line CPU(s) list:   0-3
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             4
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 30
    Stepping:              5
    CPU MHz:               2925.979
    BogoMIPS:              5851.95
    Virtualization:        VT-x
    Hypervisor vendor:     VMware
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              8192K
    NUMA node0 CPU(s):     0-3
First of all, cpusets form a hierarchy, much like a regular file system starting from a root directory.
Each process has a cpuset file in procfs that shows where in the hierarchy the process is attached.
For example, reading it for our shell process shows that we are attached to the root cpuset, which is the default.
    $ cat /proc/$$/cmdline
    /bin/ksh93
    $ cat /proc/$$/cpuset
    /
To explore the cpuset file system we need to mount it first.
    $ ls /sys/fs/cgroup/
    systemd
    $ sudo mkdir /sys/fs/cgroup/cpuset
    $ sudo mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
    $ ls /sys/fs/cgroup/cpuset/
    cgroup.clone_children  cpuset.cpu_exclusive  cpuset.mem_hardwall     cpuset.memory_pressure_enabled  cpuset.mems                      notify_on_release
    cgroup.procs           cpuset.cpus           cpuset.memory_migrate   cpuset.memory_spread_page       cpuset.sched_load_balance        release_agent
    cgroup.sane_behavior   cpuset.mem_exclusive  cpuset.memory_pressure  cpuset.memory_spread_slab       cpuset.sched_relax_domain_level  tasks
Now we see files appearing in /sys/fs/cgroup/cpuset/. Here we mainly focus on the set of CPUs allowed:
    $ cat /sys/fs/cgroup/cpuset/cpuset.cpus
    0-3
As can be seen, by default the set of allowed CPUs is all the CPUs online on the system.
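The value in cpuset.cpus is a list of ids and ranges such as 0-3 or 0,2-3. As a minimal sketch, a range list like that could be expanded into individual CPU ids as follows. The function name expand_cpulist is my own and is not part of any tool, and the sketch assumes well-formed input.

```shell
# Expand a CPU list such as "0-3" or "0,2-3" into space-separated ids.
# expand_cpulist is a hypothetical helper name used only for illustration.
expand_cpulist() {
    out=""
    # Turn the commas into spaces so the shell splits the list into parts.
    for part in $(printf '%s' "$1" | tr ',' ' '); do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # a range "lo-hi"
            *)   lo=$part; hi=$part ;;             # a single id
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    echo "$out"
}
```

For example, expand_cpulist "$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)" would list every CPU in the root cpuset.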
The cpuset also controls memory placement by naming the NUMA nodes that memory must be allocated from:
    $ cat /sys/fs/cgroup/cpuset/cpuset.mems
    0
So in this case memory must be allocated from NUMA node 0, which is also the only node in the system:
    $ ls -d /sys/devices/system/node/node*
    /sys/devices/system/node/node0
We can also find out which processes and tasks are attached to this set by reading the files cgroup.procs and tasks:
    $ head -5 /sys/fs/cgroup/cpuset/cgroup.procs
    1
    2
    3
    5
    7
    $ head -5 /sys/fs/cgroup/cpuset/tasks
    1
    2
    3
    5
    7
    $ diff -u /sys/fs/cgroup/cpuset/cgroup.procs /sys/fs/cgroup/cpuset/tasks|head -10
    --- /sys/fs/cgroup/cpuset/cgroup.procs 2015-09-27 15:13:28.858820058 -0400
    +++ /sys/fs/cgroup/cpuset/tasks 2015-09-27 15:13:28.858820058 -0400
    @@ -252,6 +252,9 @@
     684
     789
     821
    +827
    +828
    +829
     859
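The main difference is that tasks also lists the TID of every thread, while cgroup.procs lists only the PID of each process (the main thread). In the diff above, 827, 828 and 829 appear only in tasks, and ps confirms they are threads of process 821: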
    $ ps -L -p 821 -o tid,pid,cpuid,ppid,args
      TID   PID CPUID  PPID COMMAND
      821   821     0     1 rsyslogd
      827   821     0     1 rsyslogd
      828   821     2     1 rsyslogd
      829   821     0     1 rsyslogd
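The comparison between the two files can also be done without reading a diff. The sketch below prints the ids that show up in tasks but not in cgroup.procs, which on this system should be the threads that are not the main thread of their process. The function name non_leader_threads is made up for this example.

```shell
# Print ids present in the tasks file but absent from cgroup.procs.
# $1 = path to cgroup.procs, $2 = path to tasks
# non_leader_threads is a hypothetical helper name.
non_leader_threads() {
    procs_file=$1
    tasks_file=$2
    # -F fixed strings, -x whole-line match, -f read patterns from a file,
    # -v keep only the lines of tasks that match none of them.
    grep -vxFf "$procs_file" "$tasks_file"
}
```

With the files shown above, non_leader_threads /sys/fs/cgroup/cpuset/cgroup.procs /sys/fs/cgroup/cpuset/tasks would include 827, 828 and 829 in its output.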
Now that we have seen how a cpuset works and how it links to processes, we can start manipulating things by creating a new cpuset further down the hierarchy.
We will name the new cpuset set1:
    $ sudo mkdir /sys/fs/cgroup/cpuset/set1
    $ ls /sys/fs/cgroup/cpuset/set1/
    cgroup.clone_children  cpuset.cpus           cpuset.memory_migrate      cpuset.memory_spread_slab  cpuset.sched_relax_domain_level
    cgroup.procs           cpuset.mem_exclusive  cpuset.memory_pressure     cpuset.mems                notify_on_release
    cpuset.cpu_exclusive   cpuset.mem_hardwall   cpuset.memory_spread_page  cpuset.sched_load_balance  tasks
    $ cat /sys/fs/cgroup/cpuset/set1/cpuset.cpus
    $ cat /sys/fs/cgroup/cpuset/set1/cpuset.mems
    $ cat /sys/fs/cgroup/cpuset/set1/cgroup.procs
As we can see, mkdir makes the cgroup file system create all the needed files automatically. The cpuset.cpus and cpuset.mems files start out empty, and no processes are attached to the new cpuset yet.
If we simply go ahead and try to attach a task, we run into an error:
    $ sudo ksh -c "echo 3671 > /sys/fs/cgroup/cpuset/set1/tasks"
    ksh: echo: write to 1 failed [No space left on device]
At a minimum we need to set up cpus and mems first. Here we set them up so that tasks in set1 can run only on cpu0 and cpu1 and allocate memory only from node 0:
    $ sudo ksh -c "echo 0-1 > /sys/fs/cgroup/cpuset/set1/cpuset.cpus"
    $ sudo ksh -c "echo 0 > /sys/fs/cgroup/cpuset/set1/cpuset.mems"
    $ cat /sys/fs/cgroup/cpuset/set1/cpuset.cpus
    0-1
    $ cat /sys/fs/cgroup/cpuset/set1/cpuset.mems
    0
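The order of these steps matters: create the directory, fill in cpuset.cpus and cpuset.mems, and only then attach tasks. A minimal sketch that wraps the sequence into two helpers is shown here. The names make_cpuset and attach_task are hypothetical, and the sketch assumes a cgroup v1 cpuset hierarchy mounted as in this article and a root shell.

```shell
# Create a child cpuset and give it CPUs and memory nodes.
# $1 = mount point of the cpuset hierarchy, $2 = name of the new set,
# $3 = CPU list, $4 = memory node list
# make_cpuset and attach_task are hypothetical helper names.
make_cpuset() {
    dir=$1/$2
    mkdir -p "$dir" || return 1
    # Both files must be filled in before any task can be attached.
    printf '%s\n' "$3" > "$dir/cpuset.cpus" || return 1
    printf '%s\n' "$4" > "$dir/cpuset.mems" || return 1
}

# Attach one task to a cpuset.
# $1 = cpuset directory, $2 = pid (or tid) to attach
attach_task() {
    printf '%s\n' "$2" > "$1/tasks"
}
```

From a root shell, the setup in this article would correspond to make_cpuset /sys/fs/cgroup/cpuset set1 0-1 0 followed by attach_task /sys/fs/cgroup/cpuset/set1 3671.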
Now we can successfully attach another shell process to the new set:
    $ sudo ksh -c "echo 3671 > /sys/fs/cgroup/cpuset/set1/tasks"
And we can verify that the attachment was made correctly:
    $ cat /sys/fs/cgroup/cpuset/set1/cgroup.procs
    3671
    $ cat /sys/fs/cgroup/cpuset/set1/tasks
    3671
It is also reflected in process 3671's procfs entries:
    $ cat /proc/3671/cgroup
    2:cpuset:/set1
    1:name=systemd:/user/1000.user/1.session
    $ cat /proc/3671/cpuset
    /set1
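Each line of /proc/PID/cgroup has three colon-separated fields: a hierarchy id, the controller list, and the path inside that hierarchy. As a rough sketch, the cpuset path could be pulled out of that file like this. The function name cpuset_path is my own, and the sketch assumes the cgroup v1 line format shown above.

```shell
# Read /proc/<pid>/cgroup-style lines on stdin and print the path
# attached to the cpuset controller.
# cpuset_path is a hypothetical helper name.
cpuset_path() {
    awk -F: '{
        # The second field may list several controllers separated by commas.
        n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++)
            if (ctrl[i] == "cpuset") {
                # Strip the first two fields; the rest is the path.
                path = $0
                sub(/^[^:]*:[^:]*:/, "", path)
                print path
                exit
            }
    }'
}
```

For the process above, cpuset_path < /proc/3671/cgroup should print the same /set1 that /proc/3671/cpuset reports.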
Since we limited the set to cpu0 and cpu1, we expect the CPU affinity mask to be updated as well:
    $ cat /proc/3671/status|grep Cpus_allowed
    Cpus_allowed:   00000000,00000003
    Cpus_allowed_list:      0-1
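The Cpus_allowed value is a hexadecimal bitmask written in comma-separated 32-bit words, where bit N set means CPU N is allowed. A small sketch that decodes such a mask into CPU ids follows. The name mask_to_cpus is made up, and the sketch only handles masks that fit in the shell's integer arithmetic (it looks at CPUs 0 to 62), which is an assumption that holds for this 4-CPU machine.

```shell
# Decode a Cpus_allowed value such as "00000000,00000003" into CPU ids.
# mask_to_cpus is a hypothetical helper name; only CPUs 0..62 are examined.
mask_to_cpus() {
    # Remove the commas that separate the 32-bit words.
    hex=$(printf '%s' "$1" | tr -d ',')
    mask=$((0x$hex))
    cpu=0
    out=""
    while [ "$cpu" -lt 63 ] && [ "$mask" -ne 0 ]; do
        # The lowest bit tells whether the current CPU is allowed.
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}
```

Here mask_to_cpus 00000000,00000003 gives CPUs 0 and 1, matching the Cpus_allowed_list line.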
The cpuset is the stricter limit on the CPUs allowed. If we try to pin the process only to CPUs outside the cpu0-1 range, the system rejects the request:
    $ taskset -p 4 3671
    pid 3671's current affinity mask: 3
    taskset: failed to set pid 3671's affinity: Invalid argument
Any wider mask is ANDed with the cpuset's 0-1, so if we set a mask of 7 we get 3 back:
    $ taskset -p 7 3671
    pid 3671's current affinity mask: 3
    pid 3671's new affinity mask: 3
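The two taskset results above can be reproduced with plain arithmetic. The sketch below only models the behaviour seen in this transcript and is not taken from the kernel source: the requested mask is ANDed with the cpuset mask, and an empty result is treated as an error. The name effective_mask is hypothetical, and both arguments are hex masks without a 0x prefix.

```shell
# Model how a requested affinity mask is narrowed by the cpuset mask.
# $1 = requested mask (hex), $2 = cpuset mask (hex)
# Prints the resulting mask in hex, or fails when nothing overlaps.
# effective_mask is a hypothetical helper name.
effective_mask() {
    result=$(( 0x$1 & 0x$2 ))
    if [ "$result" -eq 0 ]; then
        echo "invalid: no requested CPU is inside the cpuset" >&2
        return 1
    fi
    printf '%x\n' "$result"
}
```

With the cpuset mask 3 from above, effective_mask 7 3 prints 3, while effective_mask 4 3 fails, mirroring the "Invalid argument" error.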
Now we update set1 to allow only cpu2 by writing 2 to its cpuset.cpus file. To verify the change, we can run ps repeatedly and see that process 3671 now always runs on cpu2 (the CPUID column):
    $ cat /sys/fs/cgroup/cpuset/set1/cpuset.cpus
    2
    $ ps -p 3671 -o pid,cpuid,args,user
      PID CPUID COMMAND                     USER
     3671     2 /bin/ksh93                  codywu
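The same check can be made without ps by reading /proc/PID/stat, where field 39 (processor) is the CPU the task last ran on according to proc(5). As far as I know this is the value ps shows in the CPUID column. The sketch below extracts that field from the contents of a stat file. The name last_cpu_of_stat_line is my own.

```shell
# Print the CPU a task last ran on, given the contents of /proc/<pid>/stat.
# last_cpu_of_stat_line is a hypothetical helper name.
last_cpu_of_stat_line() {
    # Drop "pid (comm) " first; comm may itself contain spaces or ')',
    # so cut at the last ") " in the line.
    rest=${1##*") "}
    # Split the remaining fields into positional parameters.
    set -- $rest
    # Field 39 of the full line is field 37 once the first two are gone.
    shift 36
    echo "$1"
}
```

Running last_cpu_of_stat_line "$(cat /proc/3671/stat)" in a loop should keep printing 2 once set1 is restricted to cpu2.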