Kernel, Linux

Managing your system resources with Control Groups.

How important are a machine's resources, and how important is using them correctly? Despite the growth of computational capacity over the years due to technology improvements, we still run into the same problem: limited resources. Even in large quantities? Yes. This happens because the growth of computational capacity is matched by the growing complexity of the processes running concurrently. For a single user, resources matter less than other factors, but for a server that receives a high load of requests, having the resources available to meet the demand is indispensable and critical.

Today, a huge number of servers run Linux distributions, paid or not. Regarding resource management, the Linux kernel has had an interesting feature since version 2.6.24 called Control Groups, popularly known as CGroups.

Basically, CGroups place processes into groups and control them. Each group has its own rules for resource allocation and limitation. These groups are called "control groups" and they provide three basic characteristics: limitation of the resources available to one or more processes; isolation of the policies of processes that are not part of the same group; and control of the groups and processes.

So, considering the first item, limitation of resources, the system can have any number of control groups. Each group can limit the following machine resources: CPU, memory, disk I/O and/or network I/O.

Regarding isolation, each group has its own rules and policies, and a process inside a group is subject only to the policies of that particular group. Moreover, CGroups extend the rules and policies using a hierarchical approach: whenever a process from a particular group creates a child process, the new process and its children automatically become part of the group.

Lastly, considering control, the system administrator can reorganize processes between groups, create and remove groups, add processes to groups, and change or define policies for existing groups.


According to the figure, suppose a Linux system that uses CGroups and has two groups, one called "CPU Bound" and another called "Disk I/O Bound". The first group contains critical processes that demand CPU usage. A "Process A" inside this group can only use up to 40% of the CPU and 20% of the memory. The second group holds only critical processes that demand disk I/O. A "Process B" was added to this group and it can use up to 40% of the disk and 10% of the memory. A "Process C" is not part of any group and does not follow any rule or policy; the process is free to allocate resources.

Moreover, if "Process A" uses only 5% of the CPU, "Process C" can use 95% or more, depending on the current usage of "Process A". This model prevents a critical process from waiting for an unavailable resource that is being held by an ordinary process. Considering the hierarchy of CGroups, if "Process A" forks a child process called "Process A'", the child will follow the same rules and policies as "Process A".
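A scenario like the one above can be sketched with the cgroup v1 filesystem interface. This is a minimal sketch, not the exact figure's setup: the group name, the /sys/fs/cgroup mount point, the 512M memory figure and PID 1234 are assumptions, and root privileges are required.

```shell
# Create a "cpu_bound" group capped at 40% of one CPU:
# quota/period = 40000/100000 = 0.4.
mkdir /sys/fs/cgroup/cpu/cpu_bound
echo 100000 > /sys/fs/cgroup/cpu/cpu_bound/cpu.cfs_period_us
echo 40000 > /sys/fs/cgroup/cpu/cpu_bound/cpu.cfs_quota_us

# Limit the memory of the same workload through the memory controller.
mkdir /sys/fs/cgroup/memory/cpu_bound
echo 512M > /sys/fs/cgroup/memory/cpu_bound/memory.limit_in_bytes

# Move a running process (PID 1234 here) into the group; any children
# it forks afterwards inherit the group automatically.
echo 1234 > /sys/fs/cgroup/cpu/cpu_bound/tasks
```

On distributions using cgroup v2, the same idea is expressed through a single unified hierarchy with different file names (cpu.max, memory.max).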

By now, the main purpose of the CGroups design is clear: to limit the resources available on a system/machine. But in the era of Cloud Computing and dynamic allocation, does it still make sense to limit resources? If a set of applications running in the cloud is using 100% of the CPU, it would be enough to allocate more cores to free up CPU. That approach, however, can be more expensive. A study to identify which applications are critical, followed by rules to limit their resources, can prevent extra costs.

For this reason, CGroups are closely related to Cloud Computing. OpenStack, for example, uses them to enforce the resource quotas of virtual machines. Likewise, cloud platforms that run containers, such as Docker and LXC, use CGroups precisely to limit the resources available to a container.

In conclusion, CGroups can ensure the proper usage of system resources, providing ways to manage and plan them to obtain better performance with minimal loss.


The original text is in Portuguese and can be accessed here.



Kernel, Linux

pktgen: A packet traffic generator in kernel space for testing network throughput.

If you are developing a network device driver for the Linux kernel, or if you just need to test the throughput of a driver, you really should take a look at pktgen. This module is able to create network packets in kernel space: a kind of high-speed packet generator, compared to a packet generator in user space.

This article will cover how to build some test cases for a specific driver or interface.

First of all, you can read about pktgen in the official Linux kernel documentation.

The first step to use pktgen is loading the module. There are many ways to do it; I usually load the module to run a test and unload it afterwards. Anyone used to loading and unloading modules will know the commands modprobe and rmmod.

To load pktgen:

# modprobe pktgen

And to unload pktgen:

# rmmod pktgen

After loading the module, pktgen will create a directory inside /proc/net/ called, obviously, pktgen. There are some important files inside this directory. The module creates a file called kpktgend_X for each CPU core of your machine, where X is the core number, and another important file called pgctrl to control the module. It also creates a file for each interface that is sending packets; for example, if you are using eth0 and eth1, you will see two files with those names inside /proc/net/pktgen. To generate packets or a stream, you need to write to those files. You can do it manually (in my opinion, not the best option) or by creating shell scripts. Below, a simple script is described to show how you can use the pktgen settings.
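For reference, the manual approach mentioned above looks like the following. This assumes the module is already loaded and the machine has an eth0 interface; the commands echo a pktgen command into one of the per-core files and read the file back to check the outcome.

```shell
# Reset core 0 and attach eth0 to it.
echo "rem_device_all" > /proc/net/pktgen/kpktgend_0
echo "add_device eth0@0" > /proc/net/pktgen/kpktgend_0
# pktgen reports "Result: OK" (or an error) inside the file itself.
cat /proc/net/pktgen/kpktgend_0
```

Doing this by hand for every setting and every core gets tedious fast, which is why a script is the better option.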

The first thing to do is to define the variables.

# Interface used to send the packets.
# It can be eth0, eth1, wlan0, wlan1, etc...
ETH="eth0"
# Number of CPU cores generating packets. Change it
# if you have more than 2.
CPUS=2
# Number of packets generated for each core.
PKTS=`echo "scale=0; 1000000/$CPUS" | bc`
# Number of copies of the same packet. The number 0
# will use the same packet forever.
CLONE_SKB="clone_skb 10"
# Size of a single packet.
PKT_SIZE="pkt_size 9014"
# Number of packets to send. The number 0 will send
# packets until the user stops the stream.
COUNT="count 0"
# The transmission delay between two packets.
DELAY="delay 0"
# Get the MAC address of the interface. Adjust the
# field number if your ifconfig output differs.
MAC=$(ifconfig -a | grep $ETH | cut -d' ' -f 11)
# The rate of the stream, in packets per second.
RATEP=`echo "scale=0; 1000000/$CPUS" | bc`

Before that, it would be nice to define a function called pgset that writes each setting into the proper file and checks the result.

function pgset() {
    local result

    echo $1 > $PGDEV

    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

After that, all pending devices must be removed from each core to avoid errors during the process.

for ((processor=0; processor<$CPUS; processor++)); do
    PGDEV=/proc/net/pktgen/kpktgend_$processor
    echo "Removing all devices"
    pgset "rem_device_all"
done

Now, let's add some settings. Each setting below is explained in the comment right before the command. All the settings must be defined for each CPU core as well.

for ((processor=0; processor<$CPUS; processor++)); do
    PGDEV=/proc/net/pktgen/kpktgend_$processor
    echo "Adding $ETH"
    pgset "add_device $ETH@$processor"

    PGDEV=/proc/net/pktgen/$ETH@$processor
    echo "Configuring $PGDEV"
    # Set the count variable defined above.
    pgset "$COUNT"
    # One queue per core.
    pgset "flag QUEUE_MAP_CPU"
    # Set the clone_skb variable defined above.
    pgset "$CLONE_SKB"
    # A packet is divided into 10 fragments.
    pgset "frags 10"
    # Set the packet size variable defined above.
    pgset "$PKT_SIZE"
    # Set the delay variable defined above.
    pgset "$DELAY"
    # Set the rate of the stream.
    pgset "ratep $RATEP"
    # Queue 2 copies of the same packet.
    pgset "burst 2"
    # Set the destination of the packets (example
    # address; use your own range of IPs).
    # IMPORTANT: be aware, you can cause a DoS attack
    # if you flood a machine with too many packets.
    pgset "dst 10.0.0.1"
    # Set the destination MAC address (here taken
    # from the local interface).
    pgset "dst_mac $MAC"
    # Random destination address within the min-max range.
    pgset "flag IPDST_RND"
    # Set the minimum IP address to send the packets to.
    # The same as the "dst" field (example address).
    pgset "dst_min 10.0.0.1"
    # Set the maximum IP address to send the packets to
    # (example address).
    pgset "dst_max 10.0.0.255"
    # Enable configuration packet.
    pgset "config 1"
    # 1k concurrent flows at 8 packets each.
    pgset "flows 1024"
    pgset "flowlen 8"
done

Now, it is time to start the process and generate the packets. To control the module you need to use the pgctrl file, so let's redefine PGDEV to point to it.

PGDEV=/proc/net/pktgen/pgctrl

And start the process.

echo "Running... ctrl^C to stop"
# Start the process using pgctrl file.
pgset "start"
echo "Done"
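When the run finishes (or you interrupt it), pktgen writes its statistics into the per-device proc files. Assuming eth0 was bound to core 0, as in the script above, you can read them back:

```shell
# The "Result:" line reports how many packets were sent, the time
# elapsed and the measured throughput in pps and Mb/sec.
cat /proc/net/pktgen/eth0@0
```

The "Params:" section of the same file also echoes back every setting applied, which is handy for confirming the script configured what you intended.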

There are many ways to generate a network stream, but in my opinion this is the best way to create test cases for network throughput, because you can automate it with shell scripts as described. Moreover, since the packets are created in kernel space, this option is faster than any option in user space. If you are interested in networking topics or in kernel development, this is an interesting subject to study.

Kernel, Linux

How can I test a .ndo_tx_timeout function from a network kernel module? The answer is: Netem.

After a long time searching keywords on Google and sending several e-mails to mailing lists, I finally found out how to test the tx_timeout function of a network kernel module.

First, let's do a quick overview of this function. It is not an essential function for a network kernel module, and some drivers do not implement this operation. Is it really necessary? No; if you have other ways to prevent this kind of exception, you don't need to spend your time on it.

What is the purpose of this function? It is called when the driver fails to transmit a packet, so it is responsible for reacting, typically by resetting or restarting the driver. Sometimes it is hard to simulate this scenario: if you rely on a stress test, you will waste a lot of time waiting for an exception to happen. To avoid that, we will use the Netem feature.

Netem is a network emulator. It is part of the Linux kernel and has many capabilities for testing the network and traffic control. To enable this feature you need to recompile your kernel with the following option enabled:

Networking -->
  Networking Options -->
    QoS and/or fair queuing -->
      Network emulator

Once you recompile your kernel and reinstall it, you will be able to use the Netem features and control them with the tc command.

Obviously, you need to implement a tx_timeout function and set a value for watchdog_timeo. After that, you can simulate an exception, keeping in mind that it is only triggered when a transmit queue remains stalled (for example, with a full TX buffer) for longer than the watchdog timeout.

So, we can set a high delay for our packets.

# tc qdisc add dev eth0 root netem delay 9s

Note that we need to identify the interface: eth0, eth1, wlan0, etc. For me, 9 seconds is enough to trigger the timeout without changing other advanced settings on my system.
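You can confirm the rule is active, and remove it once the test is done, with standard tc commands (eth0 is assumed here, as in the example above):

```shell
# List the queueing disciplines on the interface; the netem entry
# should show the 9s delay configured above.
tc qdisc show dev eth0
# Remove the netem rule to restore normal traffic.
tc qdisc del dev eth0 root netem
```

Forgetting the del step is a common mistake: the delay stays in place and every connection through that interface keeps suffering from it.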

After that, I use the ping command to cause an exception. The address below is an example; use a host reachable through the delayed interface.

# ping -i 0.1 -s 2048 192.168.0.1

I'm using a small interval (-i) between the pings and sending packets with a larger size (-s) than the default to trigger the exception quickly.

If your tx_timeout function prints a log message, you can check it with the dmesg command or by opening the kernel log under /var/log/.

I hope this article helps many people.