Linux

Using libguestfs to mount a Disk Image.

Recently, the open source community and Red Hat stopped supporting NTFS when mounting images with libguestfs. You can read about it in the official documentation, in the section “mount: unsupported filesystem type” with NTFS in RHEL ≥ 7.2. There is also a bug filed in Bugzilla that explains the decision: guestfish fails to mount the ntfs filesystem.

Many developers and users have trouble mounting Windows (NTFS) images to read and back up important files. Here are some suggestions to work around this problem and mount disk images with Windows (NTFS) or any other file system using FUSE.

First, here is a Python script that uses the python-libguestfs module to mount an image read-only.

#!/usr/bin/python
import guestfs

g = guestfs.GuestFS(python_return_dict=True)

# You can specify the right file name.
disk = "disk.img"

g.add_drive_opts(disk, format="qcow2")
g.launch()

# Apply your own rules here to choose the right partition to mount.
partitions = g.list_partitions()
assert len(partitions) == 1

# Here we are assuming that this is the right partition.
g.mount(partitions[0], "/")

# Export the mounted filesystem at /mnt/ through FUSE, read-only.
g.mount_local("/mnt/", options="allow_other", readonly=True)
# Serve FUSE requests; this call returns only after /mnt/ is unmounted.
g.mount_local_run()

This code will block when you run g.mount_local_run(). Once you unmount the directory “/mnt/”, the call returns and the script finishes.

You can follow the same process with guestfish, either as a script of commands or interactively. The script below lists the guestfish commands; each one can also be typed inside the guestfish shell.
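The script above simply asserts that the image has a single partition. One possible rule for images with several partitions is to select by filesystem type. The sketch below is only an illustration: pick_partition is a hypothetical helper, and it assumes it receives the device-to-type mapping that g.list_filesystems() returns on a handle created with python_return_dict=True.

```python
def pick_partition(filesystems, wanted="ntfs"):
    """Return the single device whose filesystem type equals `wanted`.

    `filesystems` maps device names to filesystem types, for example the
    dict returned by g.list_filesystems() (assumption: the handle was
    created with python_return_dict=True, as in the script above).
    """
    matches = sorted(dev for dev, fstype in filesystems.items()
                     if fstype == wanted)
    if len(matches) != 1:
        raise ValueError("expected exactly one %s filesystem, found %d"
                         % (wanted, len(matches)))
    return matches[0]


# Hypothetical listing for a Windows image with two partitions.
example = {"/dev/sda1": "vfat", "/dev/sda2": "ntfs"}
print(pick_partition(example))
```

In the script, the assert and the mount call could then become g.mount(pick_partition(g.list_filesystems()), "/").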

add disk.img
run
mount "/dev/sda1" "/"
mount-local /mnt/ readonly:true
mount-local-run

These two options are a workaround to mount images with NTFS file systems, but they can be used to mount any file system, such as ext3, ext4, FAT, etc.

Kernel, Linux

Managing your system resources with Control Groups.

How important are machine resources, and how important is using them correctly? Despite the growth of computational capacity over time thanks to improvements in technology, we still run into the same problem: limited resources. Even when they come in large quantities? Yes. It still happens because the growth of computational capacity is directly proportional to the complexity of the processes running concurrently. For a single user, resources matter less than other factors, but for a server that receives a high load of requests, having resources available to meet the demand is critical.

Today, a huge number of servers run Linux distributions, paid or not. For resource management, the Linux kernel has offered an interesting feature since version 2.6.24 called Control Groups, popularly known as CGroups.

Basically, CGroups place processes into groups and control them. Each group has its own rules for resource allocation and limitation. These groups are called “control groups” and they provide three basic characteristics: limitation of the resources available to one or more processes; isolation of policies between processes that are not part of the same group; and control over the groups and processes.

Starting with the first item, limitation of resources: the system can have any number of control groups. Each group can limit the following machine resources: CPU, memory, disk I/O and/or network I/O.

As for isolation, each group has its own rules and policies, and a process inside a group is subject only to the policies of that particular group. Moreover, CGroups extend the rules and policies using a hierarchical approach: whenever a process in a particular group creates a child process, the new process and its children automatically become part of the group.

Lastly, as for control, the system administrator can move processes between groups, create and remove groups, add processes to groups, and change or define policies for existing groups.

CGroup

Following the figure, let's suppose a Linux system that uses CGroups and has two groups, one called “CPU Bound” and another called “Disk I/O Bound”. The first group contains critical processes that demand CPU. A “Process A” inside this group can use at most 40% of the CPU and 20% of the memory. The second group contains only critical processes that demand disk I/O. A “Process B” was added to this group and it can use up to 40% of the disk and 10% of the memory. A “Process C” is not part of any group and does not follow any rule or policy, so it is free to allocate resources.

Besides, if “Process A” uses only 5% of the CPU, “Process C” can use 95% or more, depending on the current usage of “Process A”. This model prevents a critical process from waiting for a resource that is being held by an ordinary process. Considering the hierarchy of CGroups, if “Process A” forks a child process called “Process A'”, the child follows the same rules and policies as “Process A”.
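To make the percentages of the “CPU Bound” group concrete, here is a minimal sketch that turns them into the values one would write to a group's control files. It rests on assumptions that are not in the text above: it uses the cgroup v2 file names cpu.max and memory.max, it reads "40% of CPU" as 40% of a single CPU, and the 8 GiB total memory is a made-up figure. The helper names are hypothetical.

```python
def cpu_max(percent, period_us=100_000):
    """Format a cgroup v2 cpu.max value: '<quota> <period>' in microseconds."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    quota_us = period_us * percent // 100
    return f"{quota_us} {period_us}"


def memory_max(percent, total_bytes):
    """Return the memory.max value (bytes) for a share of total memory."""
    return str(total_bytes * percent // 100)


def limits_for(cpu_percent, mem_percent, total_bytes):
    """Collect the settings for one group, keyed by control file name."""
    return {
        "cpu.max": cpu_max(cpu_percent),
        "memory.max": memory_max(mem_percent, total_bytes),
    }


TOTAL_MEMORY = 8 * 1024 ** 3  # hypothetical machine with 8 GiB of RAM
print(limits_for(40, 20, TOTAL_MEMORY))  # limits of the "CPU Bound" group
```

Each value would be written to the file of the same name inside the group's directory; the disk I/O percentage is left out because it has no equally direct percentage-based setting.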

By now the main purpose of the CGroups design is clear: limiting the resources available on a system or machine. But in the new era of Cloud Computing and dynamic allocation, limiting resources may seem pointless: if a set of applications running in the cloud is using 100% of the CPU, it would be enough to allocate more cores to relieve it. That approach can be more expensive, though. A study to determine which applications are critical, followed by rules that limit resources, may prevent extra costs.

For this reason, CGroups are relevant to Cloud Computing. OpenStack, for example, uses them to enforce the resource quotas of virtual machines. Likewise, clouds that run containers such as Docker and LXC use CGroups precisely to limit the resources available to a container.

In conclusion, CGroups can ensure the right usage of system resources, providing ways to manage and plan them to obtain better performance with the least waste.

 

The original text is in Portuguese and can be accessed here.


Kernel, Linux

pktgen: A packet traffic generator in kernel space for testing network throughput.

If you are developing a network device driver for the Linux kernel, or if you only need to test the throughput of a driver, you really need to take a look at pktgen. This module creates network packets in kernel space, which makes it a high-speed packet generator compared with generators that run in user space.

This article covers how you can build some test cases for a specific driver or interface.

First of all, you can read about pktgen in the official Linux kernel documentation.

The first step to use pktgen is loading the module. There are many ways to do it. I usually load it and unload it whenever I want to run a test. Anyone used to loading and unloading modules will know the commands modprobe and rmmod.

To load pktgen:

# modprobe pktgen

And to unload pktgen:

# rmmod pktgen

After loading the module, pktgen creates a directory called pktgen inside /proc/net/. There are some important files in this directory. The module creates one file called kpktgend_X per CPU core, where X is the core number. Another important file, pgctrl, controls the module. The module also creates a file for each interface you send packets through. For example, if you are using eth0 and eth1, you will see two files named after those interfaces in /proc/net/pktgen. To generate packets or a stream, you write to those files. You can do it manually (in my opinion, not the best option) or with shell scripts. Below, a simple script is described to show how to use the pktgen settings.

The first thing to do is define the variables.

#!/bin/bash

# Number of CPUs to use. Adjust it if you have more than 2.
CPUS=2
# Number of packets generated for each core.
PKTS=$(echo "scale=0; 1000000/$CPUS" | bc)
# Number of copies of the same packet. The number 0
# will use the same packet forever.
CLONE_SKB="clone_skb 10"
# Size of a single packet.
PKT_SIZE="pkt_size 9014"
# Number of packets to send. The number 0 will send
# packets until the user stops the stream.
COUNT="count 0"
# The transmission delay between two packets.
DELAY="delay 0"
# Get the MAC address. In this case we will use eth0
# as the interface.
# It can be eth1, eth2, wlan0, wlan1, etc.
ETH="eth0"
# Read the MAC from sysfs instead of parsing ifconfig output.
MAC=$(cat /sys/class/net/$ETH/address)
# The rate of the stream, in packets per second for each core.
RATEP=$(echo "scale=0; 1000000/$CPUS" | bc)

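Before going further, it helps to define a function called pgset that writes the settings into the proper files.

function pgset() {
    local result

    # Write the command to the pktgen file selected by $PGDEV.
    echo "$1" > "$PGDEV"

    # Print the result line only when the command did not succeed.
    result=$(grep -F "Result: OK:" "$PGDEV")
    if [ "$result" = "" ]; then
        grep -F "Result:" "$PGDEV"
    fi
}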
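
For readers who prefer to drive pktgen from Python, the same write-then-check pattern could look like the sketch below. It is not part of the original script: find_result and the Python pgset are hypothetical helpers, and the sample status text is invented just to exercise the parsing.

```python
def find_result(status_text):
    """Return (ok, line) for the first line that contains 'Result:'."""
    for line in status_text.splitlines():
        if "Result:" in line:
            return ("Result: OK:" in line), line.strip()
    return False, ""


def pgset(pgdev, command):
    """Write one command to a pktgen file and report how it went."""
    with open(pgdev, "w") as dev:
        dev.write(command + "\n")
    with open(pgdev) as dev:
        return find_result(dev.read())


# Invented status text, only to show how the result line is picked out.
sample = "Params: count 0\nResult: OK: count=0\n"
print(find_result(sample))
```

Calling pgset() for real requires root and a loaded pktgen module, for example pgset("/proc/net/pktgen/kpktgend_0", "rem_device_all").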

Next, all pending devices must be removed to avoid errors during the process.

for ((processor=0;processor<$CPUS;processor++))
#for processor in {0..1}
do
    PGDEV=/proc/net/pktgen/kpktgend_$processor
    echo "Removing all devices"
    pgset "rem_device_all"
done

Now, let’s add some settings. Each setting below is explained in the comment that precedes its command. All the settings must be defined for each CPU core as well.

for ((processor=0;processor<$CPUS;processor++))
#for processor in {0..1}
do
    PGDEV=/proc/net/pktgen/kpktgend_$processor
    echo "Adding $ETH"
    pgset "add_device $ETH@$processor"
    PGDEV=/proc/net/pktgen/$ETH@$processor
    echo "Configuring $PGDEV"
    # Set the count variable defined above.
    pgset "$COUNT"
    # One queue per core.
    pgset "flag QUEUE_MAP_CPU"
    # Set the clone_skb variable defined above.
    pgset "$CLONE_SKB"
    # A packet is divided into 10 fragments.
    pgset "frags 10"
    # Set the packet size variable defined above.
    pgset "$PKT_SIZE"
    # Set the delay variable defined above.
    pgset "$DELAY"
    # Set the rate of the stream.
    pgset "ratep $RATEP"
    # Queue 2 copies of the same packet.
    pgset "burst 2"
    # Set the destination of the packets.
    # You can use your own range of IPs.
    # IMPORTANT: be careful, flooding a machine with this
    # many packets amounts to a DoS attack.
    pgset "dst 192.168.0.10"
    # Set the destination MAC address. This script reuses
    # the MAC read from $ETH above.
    pgset "dst_mac $MAC"
    # Random address within the min-max range.
    pgset "flag IPDST_RND"
    # Set the minimum IP address to send the packets.
    # The same as "dst" field.
    pgset "dst_min 192.168.0.10"
    # Set the maximum IP address to send the packets.
    pgset "dst_max 192.168.0.200"
    # Enable configuration packet.
    pgset "config 1"
    # 1k Concurrent flows at 8 pkts
    pgset "flows 1024"
    pgset "flowlen 8"
done
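
Before writing anything to /proc, it can help to see which file receives which command. The following dry-run sketch only builds that mapping, using the same commands as the loops above; pktgen_plan is a hypothetical helper and the MAC address in the example is a placeholder.

```python
def pktgen_plan(eth, cpus, total_ratep, dst_min, dst_max, dst_mac):
    """Map each pktgen /proc file to the commands the script writes to it."""
    base = "/proc/net/pktgen"
    ratep = total_ratep // cpus  # same split as RATEP in the script
    plan = {}
    for cpu in range(cpus):
        device = f"{eth}@{cpu}"
        # The per-core thread file gets the device management commands.
        plan[f"{base}/kpktgend_{cpu}"] = [
            "rem_device_all",
            f"add_device {device}",
        ]
        # The per-device file gets the traffic settings.
        plan[f"{base}/{device}"] = [
            "count 0", "flag QUEUE_MAP_CPU", "clone_skb 10", "frags 10",
            "pkt_size 9014", "delay 0", f"ratep {ratep}", "burst 2",
            f"dst {dst_min}", f"dst_mac {dst_mac}", "flag IPDST_RND",
            f"dst_min {dst_min}", f"dst_max {dst_max}",
            "config 1", "flows 1024", "flowlen 8",
        ]
    return plan


# Placeholder MAC address; use the one of your own setup.
plan = pktgen_plan("eth0", 2, 1000000,
                   "192.168.0.10", "192.168.0.200", "00:11:22:33:44:55")
for path, commands in plan.items():
    print(path, len(commands))
```

Printing the plan first makes it easier to spot a wrong interface name or address range before any packet is generated.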

Now it is time to start the process and generate the packets. To control the module you use the pgctrl file, so let's point PGDEV at it.

PGDEV=/proc/net/pktgen/pgctrl

And start the process.

echo "Running... Ctrl+C to stop"
# Start the process using pgctrl file.
pgset "start"
echo "Done"

There are many ways to generate a network stream. In my opinion, this is the best way to build test cases for network throughput, because you can automate it with a shell script as described. Also, since the packets are created in kernel space, this option is faster than any user space alternative. If you are interested in networking or kernel development, this is an interesting subject to study.

Virtualization

Lispvirt Announcements.

Hi everybody!

I’d like to announce the Lispvirt project: Common Lisp bindings for Libvirt.

We are very happy because Lispvirt has been accepted as an official common-lisp.net project, and the official webpage is hosted at:
https://common-lisp.net/project/lispvirt/

You can download the source code from GitHub at:
https://github.com/jcfaracco/lispvirt

There you can find all the instructions to install, test and use Lispvirt.

There are some examples and tests to help you learn the API. As this is the first release, there are no new features or bug fixes to announce.

This version is based on Libvirt (>= 1.2). So, if you are using an older version of libvirt, I cannot guarantee that Lispvirt will work properly.

Please, clone the code and enjoy!

If you want to contribute, please send me a message, an email, commits or anything you want.
We still need many tests to check whether the API is correctly implemented, as well as examples and some missing structures.

Any contribution is welcome and I would be glad to have people's help. =D

Regards.

Kernel, Linux

How can I test a .ndo_tx_timeout function from a network kernel module? The answer is: Netem.

After a long time searching keywords on Google and sending several e-mails to mailing lists, I finally found how to test a tx_timeout function of a network kernel module.

First, let's have a quick overview of this function. It is not essential to a network kernel module, and some drivers do not implement it. Is it really necessary? No: if you have other ways to prevent this kind of exception, you don’t need to spend time on it.

What is the purpose of this function? It is called when the driver fails to transmit a packet, so it is responsible for resetting or interrupting the driver. This scenario can be hard to simulate: with a plain stress test you will waste a lot of time waiting for the exception. To avoid that, we will use Netem.

Netem is a network emulator. It is part of the Linux kernel and has many capabilities for testing networks and traffic control. To enable it, you need to recompile your kernel with the following option set:

Networking -->
  Networking Options -->
    QoS and/or fair queuing -->
      Network emulator

Once you have recompiled and reinstalled your kernel, you can use the Netem features and control them with the tc command.

Obviously, you need to implement a tx_timeout function and set a value for watchdog_timeo. After that, you can simulate the exception, keeping in mind that it is raised only when the TX buffer overflows.

So, we can set a high delay for our packets.

# tc qdisc add dev eth0 root netem delay 9s

Note that we need to name the interface: eth0, eth1, wlan0, etc. For me, 9 seconds is enough to test without changing other advanced settings on my system.
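
If you script this test, remember that the delay stays on the interface until the qdisc is removed. The sketch below only builds the two command lines, the one shown above and the matching removal (tc qdisc del dev eth0 root); netem_delay_commands is a hypothetical helper, not something provided by tc.

```python
def netem_delay_commands(dev="eth0", delay="9s"):
    """Return the tc argument lists that add and later remove a netem delay."""
    add = ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", delay]
    remove = ["tc", "qdisc", "del", "dev", dev, "root"]
    return add, remove


add_cmd, del_cmd = netem_delay_commands()
print(" ".join(add_cmd))
print(" ".join(del_cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) when the script runs as root, with the removal placed in a finally block so the interface is restored even if the test fails.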

Next, I use a ping command to trigger the exception.

# ping www.google.com -i 0.1 -s 2048

I’m using a small interval (-i) between pings and a larger packet size (-s) than the default so that the exception happens quickly.

If your tx_timeout function logs a message, you can check it with the dmesg command or by opening the kernel log in /var/log/.

I hope this article helps many people.

Regards!

Linux, Virtualization

An Introduction to QCOW2 Image Format.

An Overview

There are many formats for disk images. The best known are VMDK, VDI, VHD, raw and QCOW2. QCOW2 is the native format of the QEMU emulator. Its name stands for “QEMU Copy-On-Write” version 2. In other words, this format uses Copy-On-Write (COW), which reduces the space used on the original disk: data is written to disk lazily, only when it is really needed. The format creates layers that hold copies of the original data. Only the copy is changed, and the original data is kept for a while. This feature is useful for creating snapshots and backups of the disk.

Suppose we have a disk called “Root”, for example. The disk can have multiple overlays based on a disk image. Those overlays can be reverted or discarded after a certain period of time.

COW1

Figure 1. Cascading overlays based on a root image disk.

We can have all kinds of topologies. For example, we can have multiple overlays based on the same root disk.

COW2

Figure 2. Parallel overlays with serial overlays based on a root disk image.

If the current state of an image is 1B, you can return to state 1A, 1 or Root, for example. You can discard overlays 2 and 3, or use them as a backup of some state of the image.
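
The relationship between overlays can be pictured as a map from each image to its backing image. The sketch below is only a toy model of Figure 2: the exact layout is an assumption (1, 2 and 3 based directly on Root, with 1A and 1B stacked on 1), and backing_chain and unrelated_overlays are hypothetical helpers.

```python
# Assumed layout of Figure 2: each overlay points to its backing image.
BACKING = {
    "Root": None,
    "1": "Root", "2": "Root", "3": "Root",
    "1A": "1",
    "1B": "1A",
}


def backing_chain(image, backing=BACKING):
    """Walk from an overlay down to the root image."""
    chain = []
    while image is not None:
        chain.append(image)
        image = backing[image]
    return chain


def unrelated_overlays(current, backing=BACKING):
    """Overlays that the current state does not depend on."""
    needed = set(backing_chain(current, backing))
    return sorted(name for name in backing if name not in needed)


print(backing_chain("1B"))
print(unrelated_overlays("1B"))
```

The first list holds the states you can return to from 1B; the second holds the overlays that can be discarded without affecting it.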

QCOW2 Header Structure

The QCOW2 format stores snapshots in the image file itself, so the file grows as snapshots are saved. The format has its own structures to store information about snapshots and other useful information about the disk.

Let's start with the header of the format. The first 72 bytes of the image file store the header. The structure is stored in big-endian format (that is why we use functions to convert it to little-endian) and can be defined as:

typedef struct QCowHeader {
    uint32_t magic;
    uint32_t version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size;
    uint32_t cluster_bits;
    uint64_t size; /* in bytes */
    uint32_t crypt_method;
    uint32_t l1_size;
    uint64_t l1_table_offset;
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint32_t nb_snapshots;
    uint64_t snapshots_offset;
} QCowHeader;

So, what do these fields mean? Each one is explained below.

  • magic (4 bytes): contains the characters ‘Q’, ‘F’ and ‘I’, followed by a last byte with the value 0xfb. This field identifies the image as a QCOW image.
  • version (4 bytes): the version of QCOW used by the disk image. The value can be 1, 2 or 3, the newest one.
  • backing_file_offset (8 bytes): the offset of the start of the backing file path string. The string is not null-terminated. If the disk does not have a backing file, this field is set to 0. If this is a copy-on-write image, the field points to the path of the original image.
  • backing_file_size (4 bytes): the length of the string above. If the image does not have a backing file, this field is not defined.
  • cluster_bits (4 bytes): the number of bits used for addressing an offset within a cluster, so the cluster size is 1 << cluster_bits bytes.
  • size (8 bytes): the virtual size of the disk image, in bytes.
  • crypt_method (4 bytes): indicates whether the image is encrypted: 0 for no encryption, 1 for AES.
  • l1_size (4 bytes): the number of entries available in the L1 table.
  • l1_table_offset (8 bytes): the offset of the location where the L1 table starts.
  • refcount_table_offset (8 bytes): the offset of the location where the refcount table starts.
  • refcount_table_clusters (4 bytes): the number of clusters occupied by the refcount table.
  • nb_snapshots (4 bytes): the number of snapshots available in the image.
  • snapshots_offset (8 bytes): the offset of the location where the snapshot table starts.
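
As a quick cross-check of the layout above, the 13 fields can be decoded in one call with Python's struct module, where ">" selects big-endian without padding. This is a sketch under assumptions: parse_header is a hypothetical helper, and the sample header is synthetic, built in memory only to exercise the parser.

```python
import struct
from collections import namedtuple

# 13 big-endian fields, 72 bytes in total (I = 4 bytes, Q = 8 bytes).
HEADER_FORMAT = ">IIQIIQIIQQIIQ"
QCOW_MAGIC = 0x514649FB  # 'Q', 'F', 'I', 0xfb

QCowHeader = namedtuple("QCowHeader", [
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
])


def parse_header(data):
    """Decode the first 72 bytes of a QCOW2 image into a QCowHeader."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("header is shorter than 72 bytes")
    header = QCowHeader._make(struct.unpack_from(HEADER_FORMAT, data))
    if header.magic != QCOW_MAGIC:
        raise ValueError("not a QCOW image")
    return header


# Synthetic header: version 2, 64 KiB clusters, 10 GiB disk, no snapshots.
sample = struct.pack(HEADER_FORMAT, QCOW_MAGIC, 2, 0, 0, 16,
                     10 * 1024 ** 3, 0, 20, 0x30000, 0x10000, 1, 0, 0)
header = parse_header(sample)
print(header.version, header.size, 1 << header.cluster_bits)
```

With a real image, the input would come from open(path, "rb").read(72).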

A QCOW image is structured following the image below:

QCOW+Struct

Figure 3: QCOW image structure.

The purpose of this article is to introduce the QCOW structure, with the image headers and the way the format stores its own snapshots. Other details such as clusters, refcount tables, L1 tables and L2 caches are only mentioned.

After this overview of the image header and the structure of the format, let's find a way to read this data from an image. The first piece of C code shows how you can read the header from the image. Note the conversion from big-endian to little-endian using well-known system library functions such as ntohs() and ntohl() (both from arpa/inet.h) and be64toh() (from endian.h).

The first part of the code reads the header block:

  FILE *fp;

  QCowHeader _header;

  /* Binary mode: the header is raw bytes, not text. */
  fp = fopen(argv[1], "rb");

  /* fread() copies exactly sizeof(_header) bytes; fgets() would stop at a newline. */
  fread(&_header, sizeof(_header), 1, fp);

The second part converts big-endian to little-endian. Take a look at how you can print the magic number:

_header.magic = ntohl(_header.magic);
_header.version = ntohl(_header.version);
_header.backing_file_offset = be64toh(_header.backing_file_offset);
_header.backing_file_size = ntohl(_header.backing_file_size);
_header.cluster_bits = ntohl(_header.cluster_bits);
_header.size = be64toh(_header.size);
_header.crypt_method = ntohl(_header.crypt_method);
_header.l1_size = ntohl(_header.l1_size);
_header.l1_table_offset = be64toh(_header.l1_table_offset);
_header.refcount_table_offset = be64toh(_header.refcount_table_offset);
_header.refcount_table_clusters = ntohl(_header.refcount_table_clusters);
_header.nb_snapshots = ntohl(_header.nb_snapshots);
_header.snapshots_offset = be64toh(_header.snapshots_offset);

/* The last byte is masked, not shifted, to print 0xfb. */
printf("magic number: %c%c%c%x\n",
    (_header.magic >> 24) & 0xff,
    (_header.magic >> 16) & 0xff,
    (_header.magic >> 8) & 0xff,
    _header.magic & 0xff);

With those few lines of code you can read the header information of the image. With the values stored in nb_snapshots and snapshots_offset we can read all the information about the snapshots of the image. The next section shows how to get that information.

QCOW2 Snapshot Structure

QCOW snapshots use at least two data structures to store their information: the header of the snapshot and the snapshot itself. The snapshot header is defined as:

typedef struct QCowSnapshotHeader {
    /* header is 8 byte aligned */
    uint64_t l1_table_offset;
    uint32_t l1_size;
    uint16_t id_str_size;
    uint16_t name_size;
    uint32_t date_sec;
    uint32_t date_nsec;
    uint64_t vm_clock_nsec;
    uint32_t vm_state_size;
    uint32_t extra_data_size; /* for extension */
    /* extra data follows */
    /* id_str follows */
    /* name follows  */
} QCowSnapshotHeader;

These fields mean:

  • l1_table_offset (8 bytes): the offset of the location where the L1 table of the snapshot starts.
  • l1_size (4 bytes): the number of entries in the L1 table.
  • id_str_size (2 bytes): the size of the snapshot ID string.
  • name_size (2 bytes): the size of the snapshot name string.
  • date_sec (4 bytes): the time at which the snapshot was taken, in seconds since the Epoch (UTC).
  • date_nsec (4 bytes): the sub-second part of that time, in nanoseconds.
  • vm_clock_nsec (8 bytes): the elapsed time of the virtual machine clock, in nanoseconds.
  • vm_state_size (4 bytes): the size of the information about the virtual machine state.
  • extra_data_size (4 bytes): the size of the extra data of the snapshot. The table entry can carry extra data, and its size is defined by this field.

Each snapshot created has a header with that structure. Besides that, the L1 table is copied, and the refcounts of the L2 tables and of everything they reference are incremented by 1. From then on, a write triggers a copy-on-write action, and the change is not visible to the other snapshots. A snapshot can be considered an image checkpoint.
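
The way the fixed fields, the extra data and the two strings follow each other in a table entry can be sketched with Python's struct module. This is an illustration under assumptions, not the article's implementation: parse_snapshot and align8 are hypothetical helpers, and the two entries are synthetic bytes built in memory.

```python
import struct

# Fixed part of a snapshot table entry: 40 bytes, big-endian.
SNAP_FORMAT = ">QIHHIIQII"
SNAP_SIZE = struct.calcsize(SNAP_FORMAT)


def align8(offset):
    """Round an offset up to the next multiple of 8."""
    return (offset + 7) & ~7


def parse_snapshot(data, offset):
    """Parse one entry; return (fields, offset just past the entry)."""
    offset = align8(offset)
    (l1_table_offset, l1_size, id_len, name_len, date_sec, date_nsec,
     vm_clock_nsec, vm_state_size, extra_len) = struct.unpack_from(
        SNAP_FORMAT, data, offset)
    pos = offset + SNAP_SIZE + extra_len  # skip the extra data area
    id_str = data[pos:pos + id_len].decode()
    pos += id_len
    name = data[pos:pos + name_len].decode()
    pos += name_len
    fields = {"id": id_str, "name": name, "l1_size": l1_size,
              "l1_table_offset": l1_table_offset, "date_sec": date_sec}
    return fields, pos


# Two synthetic entries; the second one starts on the next 8-byte boundary.
first = struct.pack(SNAP_FORMAT, 0x50000, 20, 1, 5,
                    1400000000, 0, 0, 0, 0) + b"1" + b"clean"
second = struct.pack(SNAP_FORMAT, 0x60000, 20, 1, 6,
                     1400000100, 0, 0, 0, 0) + b"2" + b"before"
table = first + b"\x00" * (align8(len(first)) - len(first)) + second

entry, pos = parse_snapshot(table, 0)
print(entry["id"], entry["name"], pos)
entry, pos = parse_snapshot(table, pos)
print(entry["id"], entry["name"], pos)
```

On a real image, the table bytes would be read starting at snapshots_offset and the loop would run nb_snapshots times.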

The code below implements a function to read the headers of the snapshots.

int qcow2_get_snapshots_headers(FILE *fp, QCowHeader *header)
{
    int i, id_str_size, name_size;
    uint64_t offset;
    uint32_t extra_data_size, to_read;
    QCowSnapshotHeader _sn_header;
    QCowSnapshotExtraData _sn_extra_data;
    QCowSnapshot *snapshots;

    snapshots = (QCowSnapshot *) malloc(sizeof(QCowSnapshot)*header->nb_snapshots);
    offset = header->snapshots_offset;

    for(i=0; i < header->nb_snapshots; i++)
    {
        /* Each snapshot table entry starts on an 8-byte boundary. */
        offset = (offset + 7) & ~(7);

        fseek(fp, offset, SEEK_SET);
        /* fread() is used because the entry is binary data, not a text line. */
        fread(&_sn_header, sizeof(_sn_header), 1, fp);

        printf("l1_table_offset: %" PRIu64 "\n", be64toh(_sn_header.l1_table_offset));
        printf("id_str_size: %d\n", ntohs(_sn_header.id_str_size));
        printf("name_size: %d\n", ntohs(_sn_header.name_size));
        printf("l1_size: %" PRIu32 "\n", ntohl(_sn_header.l1_size));
        printf("vm_state: %" PRIu32 "\n", ntohl(_sn_header.vm_state_size));
        printf("date_sec: %" PRIu32 "\n", ntohl(_sn_header.date_sec));
        printf("date_nsec: %" PRIu32 "\n", ntohl(_sn_header.date_nsec));
        printf("vm_clock_nsec: %" PRIu64 "\n", be64toh(_sn_header.vm_clock_nsec));
        printf("extra_data_size: %" PRIu32 "\n", ntohl(_sn_header.extra_data_size));

        snapshots[i].l1_table_offset = be64toh(_sn_header.l1_table_offset);
        snapshots[i].l1_size = ntohl(_sn_header.l1_size);
        snapshots[i].vm_state_size = ntohl(_sn_header.vm_state_size);
        snapshots[i].date_sec = ntohl(_sn_header.date_sec);
        snapshots[i].date_nsec = ntohl(_sn_header.date_nsec);
        snapshots[i].vm_clock_nsec = be64toh(_sn_header.vm_clock_nsec);

        id_str_size = ntohs(_sn_header.id_str_size);
        name_size = ntohs(_sn_header.name_size);
        extra_data_size = ntohl(_sn_header.extra_data_size);
        /* Never read more than the local structure can hold. */
        to_read = (extra_data_size < sizeof(_sn_extra_data)) ?
                   extra_data_size : sizeof(_sn_extra_data);

        offset += sizeof(_sn_header);
        fseek(fp, offset, SEEK_SET);
        if (to_read > 0)
            fread(&_sn_extra_data, to_read, 1, fp);
        /* Skip the whole extra data area, including any part not read. */
        offset += extra_data_size;

        /* Assumes 64-bit extra data fields, as the 8 and 16 byte checks suggest. */
        if (extra_data_size >= 8)
        {
            snapshots[i].vm_state_size =
                          be64toh(_sn_extra_data.vm_state_size_large);
        }

        if (extra_data_size >= 16)
        {
            snapshots[i].disk_size = be64toh(_sn_extra_data.disk_size);
        }

        /* Read snapshot id */
        snapshots[i].id_str = (char *) malloc(sizeof(char)*(id_str_size + 1));
        fseek(fp, offset, SEEK_SET);
        fread(snapshots[i].id_str, 1, id_str_size, fp);
        offset += id_str_size;

        snapshots[i].id_str[id_str_size] = '\0';
        printf("snapshot id: %s\n", snapshots[i].id_str);

        /* Read snapshot name */
        snapshots[i].name = (char *) malloc(sizeof(char)*(name_size + 1));
        fseek(fp, offset, SEEK_SET);
        fread(snapshots[i].name, 1, name_size, fp);
        offset += name_size;

        snapshots[i].name[name_size] = '\0';
        printf("snapshot name: %s\n\n", snapshots[i].name);
    }
    return 0;
}
Uncategorized

Opening Session…

Hi everyone!


#include <stdio.h>

int main() {
    printf("Welcome!\n");
    return 0;
}

This is my first post since I decided to move from blogspot.

I want to write a new post/article soon, talking about virtualization and other interesting things and thoughts.

So, if you like Linux, C and/or coding, this is the new version of my Blog.

Welcome!!!!
