
An Introduction to the QCOW2 Image Format

An Overview

There are many disk image formats. The best known are VMDK, VDI, VHD, raw and QCOW2. QCOW2 is the native format of the QEMU emulator; its name stands for “QEMU Copy-On-Write”, version 2. As the name says, the format is built around copy-on-write (COW). An image can be created on top of another image (its backing file): clusters that were never written are read from the backing file, and a cluster is copied into the new image only when it is written for the first time. The original data is left unchanged, and the new image only takes up space for the data that actually changed. Space in the image file is also allocated only when it is really needed, so the file grows as data is written. This makes the format useful for creating snapshots and backups of a disk.

Suppose we have a disk image called “Root”. Multiple overlays can be built on top of it, each one using the image below it as its backing file. Those overlays can later be reverted or discarded.

Figure 1. Cascading overlays based on a root disk image.

Many topologies are possible. For example, multiple overlays can be based on the same root disk.

Figure 2. Parallel overlays with serial overlays based on a root disk image.

If the current state of the image is 1B, you can, for example, return to state 1A, 1 or Root. You can also discard overlays 2 and 3, or keep them as backups of earlier states of the image.
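To make the overlay idea concrete, here is a toy model in C. It is not QEMU code and not the on-disk format: the hypothetical `Layer` type holds one byte per "cluster" plus a flag saying whether the layer has its own copy of that cluster. A read (`layer_read`, hypothetical) walks down the chain of backing layers until it finds a layer holding the cluster; a write (`layer_write`, hypothetical) always goes to the topmost layer, so the layers below are never modified.

```c
#include <assert.h>
#include <stddef.h>

#define NCLUSTERS 4

/* Toy image layer, one byte per "cluster" (hypothetical model, not QEMU) */
typedef struct Layer {
    struct Layer *backing;        /* NULL for the root image */
    int allocated[NCLUSTERS];     /* 1 if this layer holds its own copy */
    char data[NCLUSTERS];
} Layer;

/* Read: walk down the backing chain until a layer holds the cluster */
static char layer_read(const Layer *top, int cluster)
{
    assert(cluster >= 0 && cluster < NCLUSTERS);
    for (const Layer *l = top; l != NULL; l = l->backing)
        if (l->allocated[cluster])
            return l->data[cluster];
    return 0;                     /* never written anywhere: reads as zero */
}

/* Write: copy-on-write, only the topmost layer is modified */
static void layer_write(Layer *top, int cluster, char value)
{
    assert(cluster >= 0 && cluster < NCLUSTERS);
    top->allocated[cluster] = 1;
    top->data[cluster] = value;
}
```

For example, with a fully allocated Root holding "abcd", an overlay 1 on top of it and an overlay 1A on top of 1, writing 'X' to cluster 0 of 1 and 'Y' to cluster 2 of 1A makes 1A read as "XbYd", 1 read as "Xbcd", and Root still read as "abcd". Returning to state 1 simply means using 1 as the top layer again and discarding 1A.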

QCOW2 Header Structure

QCOW2 stores its snapshots inside the image file itself, so the file grows as snapshots are saved. The format has its own structures to store information about the snapshots and other useful information about the disk.

Let's start with the image header. The first 72 bytes of the file hold the header (version 3 images add further fields after these 72 bytes). Its fields are stored in big-endian byte order, which is why the code below converts them to the host's byte order (little-endian on x86). The header can be defined as:

typedef struct QCowHeader {
    uint32_t magic;
    uint32_t version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size;
    uint32_t cluster_bits;
    uint64_t size; /* in bytes */
    uint32_t crypt_method;
    uint32_t l1_size;
    uint64_t l1_table_offset;
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint32_t nb_snapshots;
    uint64_t snapshots_offset;
} QCowHeader;

So, what do these fields mean? Each one is explained below.

  • magic (4 bytes): the characters ‘Q’, ‘F’ and ‘I’ followed by the byte 0xfb. This field identifies the file as a QCOW image.
  • version (4 bytes): the QCOW version used by the image. QCOW2 images use version 2 or version 3 (the newer one); version 1 is the original QCOW format.
  • backing_file_offset (8 bytes): the offset within the image file where the backing file name is stored. The name is not null-terminated. If the image has no backing file, this field is 0. For a copy-on-write overlay, the name is the path of the original (backing) image.
  • backing_file_size (4 bytes): the length in bytes of the backing file name above. It is undefined if the image has no backing file.
  • cluster_bits (4 bytes): the number of bits used to address an offset within a cluster. The cluster size is 1 << cluster_bits bytes (for example, 16 means 64 KiB clusters, QEMU's default).
  • size (8 bytes): the virtual disk size in bytes, that is, the size the guest sees (not the size of the image file).
  • crypt_method (4 bytes): the encryption method: 0 means no encryption and 1 means AES (newer versions of the format also define 2 for LUKS).
  • l1_size (4 bytes): the number of entries in the active L1 table.
  • l1_table_offset (8 bytes): the offset where the active L1 table starts.
  • refcount_table_offset (8 bytes): the offset where the refcount table starts.
  • refcount_table_clusters (4 bytes): the number of clusters the refcount table occupies.
  • nb_snapshots (4 bytes): the number of snapshots stored in the image.
  • snapshots_offset (8 bytes): the offset where the snapshot table starts.
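As a complement to the field list, here is a minimal sketch that decodes a few header fields straight from the first 72 bytes of an image, using the byte offsets implied by the QCowHeader layout (magic at 0, version at 4, cluster_bits at 20, size at 24). The helper names `get_be32`, `get_be64` and `qcow2_check_header` are hypothetical, and the cluster_bits sanity check assumes the 512 B to 2 MiB cluster range that QEMU supports (cluster_bits 9 to 21).

```c
#include <stdint.h>
#include <stddef.h>

#define QCOW_MAGIC 0x514649fbU   /* 'Q' 'F' 'I' 0xfb */

/* Decode big-endian integers byte by byte (works on any host) */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Hypothetical helper: validate the first 72 bytes of an image and
   extract a few fields. Returns 0 on success, -1 if not a QCOW2 header. */
static int qcow2_check_header(const uint8_t *buf, size_t len,
                              uint32_t *version, uint64_t *cluster_size,
                              uint64_t *virtual_size)
{
    uint32_t bits;

    if (len < 72 || get_be32(buf) != QCOW_MAGIC)
        return -1;
    *version = get_be32(buf + 4);
    if (*version != 2 && *version != 3)
        return -1;

    /* cluster_bits is at byte offset 20; assumed range 9..21 */
    bits = get_be32(buf + 20);
    if (bits < 9 || bits > 21)
        return -1;
    *cluster_size = (uint64_t)1 << bits;

    /* size (virtual disk size) is at byte offset 24 */
    *virtual_size = get_be64(buf + 24);
    return 0;
}
```

Decoding byte by byte keeps the helper independent of the host's endianness, at the cost of hard-coding the field offsets; the byte-order conversion functions used later in this article achieve the same result on a struct read directly from the file.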

The layout of a QCOW2 image is shown below:

Figure 3: QCOW image structure.

The purpose of this article is to introduce the QCOW2 structure: the image header and the way the format stores its own snapshots. Other details, such as clusters, refcount tables, L1 tables and L2 caches, are only mentioned in passing.

Now that we have an overview of the image header and of the format's structure, let's read this data from an image. The first piece of C code shows how to read the header from the image file. Note the conversion from big-endian to host byte order (little-endian on x86), done with ntohs() and ntohl() (from arpa/inet.h) and be64toh() (from endian.h).

The first part of the code reads the header block:

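  FILE *fp;
  QCowHeader _header;

  /* Binary mode: the header is raw binary data */
  fp = fopen(argv[1], "rb");
  if (fp == NULL)
      return 1;

  /* fread(), not fgets(): fgets() stops at a newline byte and
     appends '\0', which corrupts binary data */
  if (fread(&_header, sizeof(_header), 1, fp) != 1)
      return 1;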

The second part converts every field from big-endian to host byte order and then prints the magic number:

/* 32-bit fields: ntohl(); 64-bit fields: be64toh() */
_header.magic = ntohl(_header.magic);
_header.version = ntohl(_header.version);
_header.backing_file_offset = be64toh(_header.backing_file_offset);
_header.backing_file_size = ntohl(_header.backing_file_size);
_header.cluster_bits = ntohl(_header.cluster_bits);
_header.size = be64toh(_header.size);
_header.crypt_method = ntohl(_header.crypt_method);
_header.l1_size = ntohl(_header.l1_size);
_header.l1_table_offset = be64toh(_header.l1_table_offset);
_header.refcount_table_offset = be64toh(_header.refcount_table_offset);
_header.refcount_table_clusters = ntohl(_header.refcount_table_clusters);
_header.nb_snapshots = ntohl(_header.nb_snapshots);
_header.snapshots_offset = be64toh(_header.snapshots_offset);

/* Prints "QFIfb" for a valid image: three characters, then the last byte in hex */
printf("magic number: %c%c%c%x\n",
    (int)((_header.magic >> 24) & 0xff),
    (int)((_header.magic >> 16) & 0xff),
    (int)((_header.magic >> 8) & 0xff),
    (unsigned int)(_header.magic & 0xff));

With these few lines of code you can read the header of the image. The nb_snapshots and snapshots_offset fields then tell us how many snapshots the image has and where the snapshot table starts, so we can read the information about every snapshot. The next section shows how.

QCOW2 Snapshot Structure

QCOW2 uses at least two data structures to describe snapshots: the snapshot header, which is an entry of the on-disk snapshot table, and the snapshot structure that a program fills in from it. The snapshot header is defined as:

typedef struct QCowSnapshotHeader {
    /* header is 8 byte aligned */
    uint64_t l1_table_offset;
    uint32_t l1_size;
    uint16_t id_str_size;
    uint16_t name_size;
    uint32_t date_sec;
    uint32_t date_nsec;
    uint64_t vm_clock_nsec;
    uint32_t vm_state_size;
    uint32_t extra_data_size; /* for extension */
    /* extra data follows */
    /* id_str follows */
    /* name follows  */
} QCowSnapshotHeader;

These fields mean:

  • l1_table_offset (8 bytes): the offset of the L1 table saved with the snapshot.
  • l1_size (4 bytes): the number of entries in that L1 table.
  • id_str_size (2 bytes): the length of the snapshot ID string.
  • name_size (2 bytes): the length of the snapshot name string.
  • date_sec (4 bytes): the time the snapshot was taken, in seconds since the Unix epoch.
  • date_nsec (4 bytes): the sub-second part of that time, in nanoseconds.
  • vm_clock_nsec (8 bytes): how long the guest had been running when the snapshot was taken, in nanoseconds.
  • vm_state_size (4 bytes): the size of the saved virtual machine state (0 if no VM state was saved).
  • extra_data_size (4 bytes): the size of the extra data that follows the header in the table entry, reserved for extensions of the format.
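Each snapshot table entry is the fixed header above followed by a variable-length tail (extra data, ID string, name), and entries are 8-byte aligned. The sketch below computes where the next entry starts, assuming the fixed part is exactly the 40 bytes of QCowSnapshotHeader (8+4+2+2+4+4+8+4+4) and that the sizes were already converted to host byte order; `align8` and `next_snapshot_offset` are hypothetical helper names.

```c
#include <stdint.h>

/* Fixed part of a snapshot table entry: the QCowSnapshotHeader fields,
   8+4+2+2+4+4+8+4+4 = 40 bytes */
#define SNAPSHOT_HEADER_SIZE 40U

/* Round an offset up to the next multiple of 8 */
static uint64_t align8(uint64_t x)
{
    return (x + 7) & ~(uint64_t)7;
}

/* Offset of the entry following the one that starts at entry_offset,
   given the host-order sizes read from that entry's header */
static uint64_t next_snapshot_offset(uint64_t entry_offset,
                                     uint32_t extra_data_size,
                                     uint16_t id_str_size,
                                     uint16_t name_size)
{
    uint64_t end = entry_offset + SNAPSHOT_HEADER_SIZE
                 + extra_data_size + id_str_size + name_size;
    return align8(end);
}
```

For example, an entry at offset 0x50000 with 16 bytes of extra data, the ID "1" and the name "snap1" occupies 40 + 16 + 1 + 5 = 62 bytes, so the next entry starts at 0x50040.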

Every snapshot has a header with this structure. In addition, when a snapshot is created the active L1 table is copied, and the reference counts of all L2 tables and data clusters it references are incremented by 1. Later, when the guest writes to a cluster whose reference count is greater than 1, the cluster is copied first (copy-on-write), so the change is not visible to the snapshots that still reference the old cluster. A snapshot can therefore be seen as a checkpoint of the image.
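The refcount mechanism above can be illustrated with a toy model. This is a simplified sketch, not QEMU's implementation: a flat mapping table (the hypothetical `Image.map`) stands in for the L1/L2 tables, each "cluster" holds a single byte, and `snapshot_create` and `image_write` are hypothetical names. Taking a snapshot copies the mapping and increments the refcount of every referenced cluster; a write to a shared cluster first copies it to a free cluster.

```c
#include <assert.h>
#include <string.h>

#define HOST_CLUSTERS 8    /* toy image file: 8 host clusters */
#define GUEST_CLUSTERS 4   /* toy guest disk: 4 clusters */

/* Hypothetical toy model: map[] replaces the L1/L2 tables */
typedef struct Image {
    int map[GUEST_CLUSTERS];        /* guest cluster -> host cluster */
    int refcount[HOST_CLUSTERS];
    char data[HOST_CLUSTERS];
    int next_free;                  /* next unused host cluster */
} Image;

/* Take a snapshot: save a copy of the mapping and bump the refcount
   of every host cluster it references */
static void snapshot_create(Image *img, int saved_map[GUEST_CLUSTERS])
{
    memcpy(saved_map, img->map, sizeof(img->map));
    for (int i = 0; i < GUEST_CLUSTERS; i++)
        img->refcount[img->map[i]]++;
}

/* Guest write: if the host cluster is shared (refcount > 1), copy it
   first so the snapshot keeps seeing the old contents */
static void image_write(Image *img, int vcluster, char value)
{
    int host = img->map[vcluster];

    if (img->refcount[host] > 1) {
        int copy = img->next_free++;
        assert(copy < HOST_CLUSTERS);
        img->data[copy] = img->data[host]; /* a real cluster may be only partly overwritten */
        img->refcount[host]--;             /* active image drops its reference */
        img->refcount[copy] = 1;
        img->map[vcluster] = copy;
        host = copy;
    }
    img->data[host] = value;
}
```

After a snapshot, the first write to a guest cluster allocates a new host cluster for the active image, while the saved mapping still points to the original one; further writes to the same guest cluster happen in place, because its new host cluster has a refcount of 1.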

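The code below implements a function that walks the snapshot table and reads the header, extra data, ID and name of every snapshot. It expects the image header to have already been converted to host byte order.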

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>   /* ntohs(), ntohl() */
#include <endian.h>      /* be64toh() (glibc) */

/* Known part of the snapshot extra data (big-endian on disk) */
typedef struct QCowSnapshotExtraData {
    uint64_t vm_state_size_large;
    uint64_t disk_size;
} QCowSnapshotExtraData;

/* Simplified in-memory description of a snapshot (host byte order) */
typedef struct QCowSnapshot {
    uint64_t l1_table_offset;
    uint32_t l1_size;
    char *id_str;
    char *name;
    uint64_t disk_size;
    uint64_t vm_state_size;
    uint32_t date_sec;
    uint32_t date_nsec;
    uint64_t vm_clock_nsec;
} QCowSnapshot;

/* 'header' must already be converted to host byte order */
int qcow2_get_snapshots_headers(FILE *fp, QCowHeader *header)
{
    uint32_t i, extra_data_size, known_size;
    uint16_t id_str_size, name_size;
    uint64_t offset;
    QCowSnapshotHeader _sn_header;
    QCowSnapshotExtraData _sn_extra_data;
    QCowSnapshot *snapshots;
    int ret = -1;

    if (header->nb_snapshots == 0)
        return 0;
    snapshots = calloc(header->nb_snapshots, sizeof(QCowSnapshot));
    if (snapshots == NULL)
        return -1;
    offset = header->snapshots_offset;

    for (i = 0; i < header->nb_snapshots; i++)
    {
        /* Each snapshot table entry starts on an 8-byte boundary */
        offset = (offset + 7) & ~(uint64_t)7;

        /* fread(), not fgets(): the entry is binary data */
        if (fseek(fp, (long)offset, SEEK_SET) != 0 ||
            fread(&_sn_header, sizeof(_sn_header), 1, fp) != 1)
            goto out;
        offset += sizeof(_sn_header);

        snapshots[i].l1_table_offset = be64toh(_sn_header.l1_table_offset);
        snapshots[i].l1_size = ntohl(_sn_header.l1_size);
        snapshots[i].vm_state_size = ntohl(_sn_header.vm_state_size);
        snapshots[i].date_sec = ntohl(_sn_header.date_sec);
        snapshots[i].date_nsec = ntohl(_sn_header.date_nsec);
        snapshots[i].vm_clock_nsec = be64toh(_sn_header.vm_clock_nsec);
        id_str_size = ntohs(_sn_header.id_str_size);
        name_size = ntohs(_sn_header.name_size);
        extra_data_size = ntohl(_sn_header.extra_data_size);

        printf("l1_table_offset: %" PRIu64 "\n", snapshots[i].l1_table_offset);
        printf("id_str_size: %u\n", (unsigned int)id_str_size);
        printf("name_size: %u\n", (unsigned int)name_size);
        printf("l1_size: %" PRIu32 "\n", snapshots[i].l1_size);
        printf("date_sec: %" PRIu32 "\n", snapshots[i].date_sec);
        printf("date_nsec: %" PRIu32 "\n", snapshots[i].date_nsec);
        printf("vm_clock_nsec: %" PRIu64 "\n", snapshots[i].vm_clock_nsec);
        printf("extra_data_size: %" PRIu32 "\n", extra_data_size);

        /* Read only the extra data we know about (the smaller of the
           two sizes), but skip over the whole extra data area */
        known_size = extra_data_size < sizeof(_sn_extra_data) ?
                     extra_data_size : (uint32_t)sizeof(_sn_extra_data);
        memset(&_sn_extra_data, 0, sizeof(_sn_extra_data));
        if (known_size > 0 &&
            fread(&_sn_extra_data, known_size, 1, fp) != 1)
            goto out;
        offset += extra_data_size;

        /* 64-bit fields in the extra data replace the header values */
        if (extra_data_size >= 8)
            snapshots[i].vm_state_size =
                be64toh(_sn_extra_data.vm_state_size_large);
        if (extra_data_size >= 16)
            snapshots[i].disk_size = be64toh(_sn_extra_data.disk_size);
        printf("vm_state_size: %" PRIu64 "\n", snapshots[i].vm_state_size);

        /* Read the snapshot ID (not null-terminated on disk) */
        snapshots[i].id_str = malloc(id_str_size + 1);
        if (snapshots[i].id_str == NULL ||
            fseek(fp, (long)offset, SEEK_SET) != 0)
            goto out;
        if (id_str_size > 0 &&
            fread(snapshots[i].id_str, id_str_size, 1, fp) != 1)
            goto out;
        snapshots[i].id_str[id_str_size] = '\0';
        offset += id_str_size;
        printf("snapshot id: %s\n", snapshots[i].id_str);

        /* Read the snapshot name, which directly follows the ID */
        snapshots[i].name = malloc(name_size + 1);
        if (snapshots[i].name == NULL)
            goto out;
        if (name_size > 0 &&
            fread(snapshots[i].name, name_size, 1, fp) != 1)
            goto out;
        snapshots[i].name[name_size] = '\0';
        offset += name_size;
        printf("snapshot name: %s\n\n", snapshots[i].name);
    }
    ret = 0;

out:
    /* This version only prints the entries: keep 'snapshots' instead
       of freeing it if you need the data afterwards */
    for (i = 0; i < header->nb_snapshots; i++) {
        free(snapshots[i].id_str);
        free(snapshots[i].name);
    }
    free(snapshots);
    return ret;
}