Ganeti Deployment Notes - Céondo Technology Notes

Created: Tue 28 June 2011 / Last updated: Tue 19 June 2012

What is Ganeti?

Ganeti is a not-so-thin layer on the side of a hypervisor that facilitates the management of your virtual machines. It helps you move virtual machine instances from one node to another, create an instance with DRBD replication on another node, live migrate it from one node to the other, and do basically everything you can expect from a robust platform.

Historically, Ganeti started within Google to manage the business infrastructure (print servers, LDAP, accounting, etc.). I looked at it because it was possible to install it from source and without a patched kernel on Debian Squeeze. The quality of the code, documentation, development process and discussions on the news group finally convinced me to try it. Here is the way I have set up the system with KVM.

This document must be read together with the Ganeti installation documentation. Some of the steps described in the official Ganeti installation documentation are not described here.

For reference, the software versions are:

Debian Squeeze for the nodes and the guests.
Ganeti 2.4.2 installed from source.

Terminology

node, server: The non virtualized server.
virtual machine, VM, instance: The virtualized operating system.

Everything is running 64-bit Debian Squeeze, so many terms follow the Debian way of naming things.

Network Topology

Your cluster will run on your own network, so this configuration will need to be adapted to fit your requirements. In this case, the servers are hosted with OVH and have:

A single physical network interface eth0 with a fixed public IP address.
A tagged VLAN interface eth0.2186. On this interface, two networks are available: a private network 192.168.0.0/16 and a public network 178.33.145.128/26. The public network is a RIPE block.

The goal is to have for each VM:

A private IP address on the private network.
A public IP address on the RIPE block.
The ability to connect to all the other VMs on each network.

The private IP address is used for the infrastructure and the public IP for outside communication. The VLAN works across the 3 datacenters of OVH.

Base System Setup

Each server has at least 12GB of RAM and two hard drives (750GB or 1.5TB). They are all basically the same. It is important to have a homogeneous fleet of servers to get more predictable performance. It is also very important to set them up the same way. These notes are very manual; scripting things with Fabric is recommended.

The base setup and the Ganeti installation must be performed on all the nodes. As each node can become master, you need the software on each node.

The partitions are pretty simple. The base OS is on a 25GB software RAID1 partition and each drive gets a 12GB swap partition, for a total of 36GB of virtual memory (12GB of RAM plus 24GB of swap).

root@node1:~# cat /etc/fstab 
# <file system> <mount point>   <type>  <options>   <dump>  <pass>
/dev/md1    /   ext4    errors=remount-ro   0   1
/dev/sda2   swap    swap    defaults    0   0
/dev/sdb2   swap    swap    defaults    0   0

The rest of each drive is used for a big LVM volume group named xenvg. Ganeti originally supported only Xen, which is why the default names often use xen. On this node there is 2.6 TiB of raw storage available for the VMs. No RAID is used, that is, if you create a VM without DRBD replication, you have a single point of failure. See the replication and backup strategies below.

root@node1:~# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               xenvg
  PV Size               1.33 TiB / not usable 3.00 MiB
  ...

  --- Physical volume ---
  PV Name               /dev/sdb5
  VG Name               xenvg
  PV Size               1.33 TiB / not usable 3.00 MiB
  ...

After the storage setup, the network needs to be set up too. Ganeti supports both routed and bridged networking; here bridged is used.

You need to be sure to have the right packages to support the bridge and VLAN.

apt-get install vlan netcat fping tcpdump netmask bridge-utils

The setup is pretty simple, each node gets the dedicated address assigned by the provider on eth0 and a private IP address on eth0.2186 (replace 2186 with your own VLAN or maybe your own NIC, for example eth1).

root@node1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

# This is given by our provider, it is used only for monitoring at the
# provider level (health, load, etc) and maintenance of the server
auto eth0
iface eth0 inet static
    address 188.165.237.1
    netmask 255.255.255.0
    network 188.165.237.0
    broadcast 188.165.237.255
    gateway 188.165.237.254

# This bridge is where all the VMs are connected. It bridges over the
# tagged interface.
auto xen-br0 
iface xen-br0 inet static
    # of course you need a different IP for each node
    address 192.168.0.1
    netmask 255.255.0.0
    network 192.168.0.0 
    broadcast 192.168.255.255
    bridge_ports eth0.2186
    bridge_stp off
    bridge_fd 0

No routes are defined on the bridge. The routes are directly defined in the VM.

Some kernel parameters need to be adjusted too, to ensure IP forwarding and a properly working bridge.

root@node1:~# cat /etc/sysctl.conf 
net.ipv4.tcp_syncookies=1
net.ipv4.ip_forward=1
net.ipv4.conf.all.accept_redirects=1
net.ipv4.conf.all.accept_source_route=1
net.ipv4.conf.all.send_redirects=1
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.log_martians=0
net.ipv4.conf.all.proxy_arp=0
net.ipv4.conf.default.proxy_arp=0

Then apply the changes:

sysctl -p

It is very important to disable proxy_arp on your interfaces. This is because you are not creating a pseudo bridge but a real one.
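As a quick sanity check, the proxy_arp settings can be verified with a small script. This is only a sketch: check_proxy_arp is a hypothetical helper name (it does not come from Ganeti or Debian), and it only inspects a sysctl configuration file, by default /etc/sysctl.conf, not the live kernel values.

```shell
#!/bin/sh
# check_proxy_arp [FILE]
# Hypothetical helper (the name is an assumption): fail if FILE enables proxy_arp.
check_proxy_arp() {
    file="${1:-/etc/sysctl.conf}"
    # Select the proxy_arp assignments, then drop those that are set to 0.
    bad=$(grep -E '^[[:space:]]*net\.ipv4\.conf\..*\.proxy_arp[[:space:]]*=' "$file" \
          | grep -Ev '=[[:space:]]*0[[:space:]]*$' || true)
    if [ -n "$bad" ]; then
        echo "proxy_arp is enabled in $file:"
        echo "$bad"
        return 1
    fi
    echo "no proxy_arp setting is enabled in $file"
}
```

Calling check_proxy_arp without argument checks /etc/sysctl.conf. The value currently used by the kernel can be read with sysctl net.ipv4.conf.all.proxy_arp.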

Ganeti prefers the en_US.UTF-8 locale and I prefer the UTC timezone, so:

dpkg-reconfigure locales 
dpkg-reconfigure tzdata

To be sure that everything is working correctly, I normally reboot.

Setting Up Ganeti

Keep the official documentation at hand, only the Debian specific things are given here. A rather long list of packages is required, just go ahead:

apt-get install lvm2 ssh bridge-utils iproute iputils-arping \
                ndisc6 python python-pyopenssl openssl \
                python-pyparsing python-simplejson \
                python-pyinotify python-pycurl socat \
                python-paramiko debootstrap dump kpartx \
                qemu-utils gawk make drbd-utils qemu-kvm

Then download and install Ganeti itself:

mkdir -p /home/vendors
cd /home/vendors
wget http://ganeti.googlecode.com/files/ganeti-2.4.2.tar.gz
tar -xzf ganeti-2.4.2.tar.gz
cd ganeti-2.4.2
./configure --localstatedir=/var --sysconfdir=/etc
make
make install
mkdir /srv/ganeti/ /srv/ganeti/os /srv/ganeti/export

Finally, install the startup script and the cron file for the watcher.

cp /home/vendors/ganeti-2.4.2/doc/examples/ganeti.initd /etc/init.d/ganeti
chmod +x /etc/init.d/ganeti
update-rc.d ganeti defaults 20 80
cp /home/vendors/ganeti-2.4.2/doc/examples/ganeti.cron /etc/cron.d/ganeti

The base software is now installed.

DRBD Configuration

Follow the recommendations from the official documentation. In particular, be sure that the usermode helper is just /bin/true, that is, your /etc/modules file has a line with:

drbd minor_count=128 usermode_helper=/bin/true

Installing the Operating System Support Packages

To be able to install instances you need an operating system installation script. You need the scripts on all the nodes (maybe using Puppet to manage them), as the creation of an instance is done directly on the target node.

The easiest way to go is to use the ganeti-instance-image and ganeti-instance-debootstrap packages. These packages are only loosely coupled to the Ganeti release number, so at the moment it is possible to install them directly from the provided Debian packages.

cd /home/vendors
wget http://code.osuosl.org/attachments/download/2169/ganeti-instance-image_0.5.1-1_all.deb
dpkg -i ganeti-instance-image_0.5.1-1_all.deb
cd /srv/ganeti/os
ln -s /usr/share/ganeti/os/image
apt-get install ganeti-instance-debootstrap
ln -s /usr/share/ganeti/os/debootstrap

The symbolic links are needed because Ganeti looks for the OS definitions in /srv/ganeti/os.
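Since a missing package leaves a dangling link behind, it can be worth checking the links on each node. The following is a minimal sketch; check_os_links is a hypothetical helper name and the default directory is the /srv/ganeti/os used in these notes.

```shell
#!/bin/sh
# check_os_links [DIR]
# Hypothetical helper: list the entries of DIR and flag dangling symlinks.
check_os_links() {
    dir="${1:-/srv/ganeti/os}"
    status=0
    for entry in "$dir"/*; do
        # A dangling symlink (or an unmatched glob) fails the -e test.
        if [ -e "$entry" ]; then
            echo "ok: $entry"
        elif [ -L "$entry" ]; then
            echo "broken link: $entry"
            status=1
        fi
    done
    return $status
}
```

This only checks that the links resolve. Once the cluster is initialized, gnt-os list and gnt-os diagnose give Ganeti's own view of the available OS definitions.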

Repeat all this setup on each node. Now you know why automation is needed. For example, use 192.168.0.2 as the IP of the secondary node.

Startup of the Cluster and the First Secondary Node

It is extremely simple. First define the IP address of your cluster. In my case I selected 192.168.1.1 with the name clust1.ceondo.net. So, in the /etc/hosts of each node, I add:

192.168.1.1     clust1.ceondo.net

Then on the first node run:

gnt-cluster init clust1.ceondo.net

Ok, so what do you have now?

A single node in a Ganeti cluster with IP 192.168.0.1.
The single node has the primary (master) role at the cluster level. As it is master, the cluster IP 192.168.1.1 is added to the xen-br0 bridge. This is done automatically by Ganeti, you must not do it yourself. This IP must be on the same subnet as your bridge, because only the IP is added and it reuses the network information of the bridge.
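The subnet constraint is easy to check before running gnt-cluster init. Here is a small sketch in plain shell arithmetic; ip_to_int and same_subnet are hypothetical helper names, and the addresses are the ones used in these notes (cluster IP 192.168.1.1, bridge address 192.168.0.1, bridge netmask 255.255.0.0).

```shell
#!/bin/sh
# ip_to_int A.B.C.D: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP_A IP_B NETMASK: succeed if both addresses share the network.
same_subnet() {
    a=$(ip_to_int "$1"); b=$(ip_to_int "$2"); m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

if same_subnet 192.168.1.1 192.168.0.1 255.255.0.0; then
    echo "cluster IP is on the bridge subnet"
else
    echo "cluster IP is NOT on the bridge subnet"
fi
```

With a 255.255.255.0 netmask on the bridge, the same two addresses would not be on the same subnet and the cluster IP would have to be picked in 192.168.0.0/24.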

The next step is of course to add another node to the cluster. If you do not have a DNS server for your private network, simply add the node IP to your hosts file. For example:

192.168.0.2     node2.ceondo.net

Now on the master, run:

gnt-node add node2.ceondo.net

Doing so will add the node to the cluster and update the SSH configuration of the node to ensure communication between the nodes. You can now get information about your nodes (here 3 instances are running on these nodes):

# gnt-node list
Node             DTotal DFree MTotal MNode MFree Pinst Sinst
node1.ceondo.net   2.7T  2.6T  11.8G  514M 11.2G     2     0
node2.ceondo.net   2.7T  2.6T  11.8G  226M 11.5G     1     0

If you have set up a third node, you can add it too... but now you are maybe more interested in creating your first instance.

Instance Creation from an Installation Image

The simplest way to create an instance is to boot an installation CD with KVM and VNC. All the operations are run on the master. If you want to create the instance on the secondary node, you need to download the ISO file on the secondary node.

cd /home/vendors
wget http://cdimage.debian.org/debian-cd/6.0.1a/amd64/iso-cd/debian-6.0.1a-amd64-netinst.iso

Create your first instance without doing any installation and without starting it. Read the gnt-instance man page for more information.

gnt-instance add -t plain -s 10g -o image+default -n node1.ceondo.net \
   --no-start --no-install -H kvm:vnc_bind_address=127.0.0.1 vm116.ceondo.net

Decomposing the command to help you understand what is going on:

gnt-instance add: add an instance to the cluster.
-t plain: the instance will run from a plain LVM volume.
-s 10g: it will have a single disk of 10GB (the partitions in the disk are up to you).
-o image+default: it will use the image OS with the default variant.
-n node1.ceondo.net: it will be created on node1.
--no-start --no-install: after the addition, we do not start it and we do not run the image+default os installation scripts.
-H kvm:vnc_bind_address=127.0.0.1: we tell the hypervisor that we want VNC bound to localhost. You can put 0.0.0.0 to bind on all the interfaces if you do not want to use an SSH tunnel, but this is not really secure, and the default Gnome VNC viewer (Remote Desktop Viewer) supports SSH tunneling very easily.
vm116.ceondo.net: this is the name of the instance. The name must resolve. If you do not have a DNS server, put it in the node hosts file.

Running this command basically just adds the instance to the cluster on node1. Now it is time to boot and install the instance. First, we need to be sure that KVM will boot with the kernel from the CD, and we do not want a serial console.

gnt-instance modify -H serial_console=false vm116.ceondo.net
gnt-instance modify -H kernel_path= vm116.ceondo.net

Ganeti offers very convenient tools to manage the configuration of your VMs. So, time to boot this instance:

gnt-instance start -d -H \
 boot_order=cdrom,cdrom_image_path=/home/vendors/debian-6.0.1a-amd64-netinst.iso \
 vm116.ceondo.net

Starting with the -H option means that, for this boot and this boot only, KVM will use these parameters. It also means that if you restart the instance, it will not have the CD-ROM, which is what we want.

After you run this command, run:

gnt-instance info

At the top, you will have the information on the VNC IP and port. So just connect your VNC client, for example to 127.0.0.1:11001, and use the host (in my case provided by OVH, root@ns12345.ovh.net) as SSH tunnel. You can now start the installation.
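If your VNC client has no built-in SSH tunneling, a plain ssh port forward does the same job. The sketch below only prints the command to run from your personal computer; vnc_tunnel_cmd is a hypothetical helper name, and the login and port are the example values from above.

```shell
#!/bin/sh
# vnc_tunnel_cmd NODE_LOGIN VNC_PORT [LOCAL_PORT]
# Hypothetical helper: print the ssh command forwarding a local port to the
# VNC port bound on 127.0.0.1 of the node.
vnc_tunnel_cmd() {
    login=$1; port=$2; local_port=${3:-$2}
    # -N: do not run a remote command, -L: local port forwarding
    echo "ssh -N -L ${local_port}:127.0.0.1:${port} ${login}"
}

vnc_tunnel_cmd root@ns12345.ovh.net 11001
```

While the printed ssh command is running, the VNC client connects to TCP port 11001 on 127.0.0.1 of your own computer.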

To be able to clone and reuse this instance as a template for new instances, the partitions can only be ext3/ext4 or swap, and the order of the partitions in the partition table must be either:

/dev/$disk1    /boot
/dev/$disk2    swap
/dev/$disk3    /

/dev/$disk1    /boot
/dev/$disk2    /

I prefer to run without swap and with a possible careful overcommit of the memory at the node level. Red Hat provides some good background information about it. The /boot partition is needed because the kernel used is not the kernel from the node.

Run the installation as usual. You will have to define the network connection manually; in my case, this means providing the RIPE block netmask and gateway information:

IP of the VM: 178.33.145.152
Netmask: 255.255.255.192
Gateway: 178.33.145.190
As it is easy, I first set the DNS server to the Google one: 8.8.8.8. A private DNS server is available on the private network, but the CD installation does not offer the ability to configure two IP addresses directly.
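The values entered in the installer can be cross-checked with a bit of arithmetic: the network and broadcast addresses follow from the IP and the netmask, and the gateway must lie strictly between them. A minimal sketch, where subnet_info is a hypothetical helper name:

```shell
#!/bin/sh
# subnet_info ADDRESS NETMASK: print the network and broadcast addresses.
subnet_info() {
    old_ifs=$IFS; IFS=.
    set -- $1 $2         # $1..$4: address octets, $5..$8: netmask octets
    IFS=$old_ifs
    echo "NETWORK=$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
    echo "BROADCAST=$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}

subnet_info 178.33.145.152 255.255.255.192
```

For the VM address above this prints NETWORK=178.33.145.128 and BROADCAST=178.33.145.191, which matches the 178.33.145.128/26 RIPE block, and the gateway 178.33.145.190 is inside that range.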

So, everything is fine, you can finish the installation (do not forget to install SSH!) and then restart the instance without VNC:

gnt-instance modify -H vnc_bind_address= vm116.ceondo.net
gnt-instance reboot vm116.ceondo.net

Then, from your personal computer, you should be able to ssh into your instance:

$ ssh yourlogin@vm116.ceondo.net

Customize and clean this instance to make it a base for mass deployment. It will be the template used by the image OS definition. The image template will take care of changing the IP, the hostname, etc. for you, and even the RAM and disk size.

Instance Creation with ganeti-instance-image

So, everything is nice under the sun: you have your instance running, but now you want to start a new instance. Better not to have to go through the CD install each time. The image OS definition does just that. You can create a template out of an existing instance and reuse it to deploy as many times as you need.

First, shut down the instance to have the disks in a consistent state for the dump:

gnt-instance shutdown vm116.ceondo.net

Now, we create the default image OS definition. It is the one used when we pass the -o image+default option at instance creation. You can create many variants, but pay attention: if you have too many of them, managing them will quickly become a nightmare. So, our default will be:

SWAP=no
FILESYSTEM="ext4"
IMAGE_NAME="debian-6.0"
IMAGE_TYPE="dump"
IMAGE_DIR="/srv/ganeti/instance-image"
ARCH="x86_64"
CUSTOMIZE_DIR="/etc/ganeti/instance-image/hooks"
IMAGE_DEBUG=0

You can either put them in /etc/default/ganeti-instance-image as I do (this makes sane defaults for all the variants) or directly in the default variant definition in /etc/ganeti/instance-image/variants/default.conf. After you update the file, do not forget to sync it on all the cluster nodes. Again, Ganeti has some tools to do it:

gnt-cluster copyfile /etc/default/ganeti-instance-image

gnt-cluster copyfile /etc/ganeti/instance-image/variants/default.conf

It is now time to make the dump of the first instance to reuse it as a template.

mkdir /srv/ganeti/instance-image
cd /srv/ganeti/os/image/tools/
./make-dump vm116.ceondo.net

Now, you have the files debian-6.0-x86_64-root.dump and debian-6.0-x86_64-boot.dump in your /srv/ganeti/instance-image folder. You need to sync this folder on all your nodes to create an instance from this template. You can also have a small NFS share mounted as /srv/ganeti/instance-image, and that way you do not have to sync. This is up to you. My provider OVH has some managed NAS which fits this requirement perfectly.
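If you go for the sync, one option is to reuse gnt-cluster copyfile for each dump file. The following is only a sketch: sync_dumps_cmds is a hypothetical helper name, it prints the commands instead of running them, and it assumes that /srv/ganeti/instance-image already exists on every node and that the file names contain no spaces.

```shell
#!/bin/sh
# sync_dumps_cmds [DIR]
# Hypothetical helper: print one "gnt-cluster copyfile" command per dump file.
sync_dumps_cmds() {
    dir="${1:-/srv/ganeti/instance-image}"
    for f in "$dir"/*.dump; do
        # Skip the literal pattern when no dump file is present.
        [ -f "$f" ] || continue
        echo "gnt-cluster copyfile $f"
    done
}
```

Review the output of sync_dumps_cmds on the master, then pipe it to sh to actually copy the files. For large dumps, rsync or the NFS share is probably a better fit.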

Time to create a new instance based on this template. As you can expect, it will be vm117.ceondo.net. As we do not want it to have the same IP address, we need to define the customization of the instance in the OS installation scripts. To do that, you need to define the network of your instance and its IP address. As you will reuse the network information many times, it receives its own definition:

# cat /etc/ganeti/instance-image/networks/subnets/ripe 
GATEWAY=178.33.145.190
NETMASK=255.255.255.192
NETWORK=178.33.145.128
BROADCAST=178.33.145.191

Basically, a simple ripe text file with the definition of the ripe network. Then for the instance, we create a file with the fully qualified name of the instance:

# cat /etc/ganeti/instance-image/networks/instances/vm117.ceondo.net 
ADDRESS=178.33.145.157
SUBNET=ripe
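As this per-instance file is the only thing that changes from one instance to the next, it is a good candidate for a small script. A minimal sketch, where write_instance_net is a hypothetical helper name and the directory layout is the one shown above:

```shell
#!/bin/sh
# write_instance_net FQDN ADDRESS SUBNET [BASE_DIR]
# Hypothetical helper: write the network definition file of one instance.
write_instance_net() {
    fqdn=$1; address=$2; subnet=$3
    base="${4:-/etc/ganeti/instance-image/networks}"
    # Refuse to reference a subnet that has no definition file.
    if [ ! -f "$base/subnets/$subnet" ]; then
        echo "unknown subnet: $subnet" >&2
        return 1
    fi
    mkdir -p "$base/instances"
    printf 'ADDRESS=%s\nSUBNET=%s\n' "$address" "$subnet" > "$base/instances/$fqdn"
}
```

For the instance above, the call would be write_instance_net vm117.ceondo.net 178.33.145.157 ripe. As the installation scripts run on the target node, I assume the file has to be present there; gnt-cluster copyfile can distribute it.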

As the instance has its own kernel, we not only need the interfaces hook but also the grub hook to be run at OS installation:

chmod +x /etc/ganeti/instance-image/hooks/grub
chmod +x /etc/ganeti/instance-image/hooks/interfaces

Do not worry, the boilerplate is only needed the first time; next time you will just need a single file for the instance IP and subnet selection.

Time to add the instance:

gnt-instance add -t plain -o image+default -s 25g -n node1.ceondo.net vm117.ceondo.net

Done, the new instance is available and you can start to play with it. Notice that the new instance has a 25GB disk while the template instance was created with a 10GB disk: you can pick a different size at creation time. You can also change the RAM size. Even better, you can use DRBD instead of a plain LVM volume, just pass -t drbd as disk template.

For example, changing the number of virtual CPUs and the memory:

gnt-instance add -t plain -o image+default -s 100g -B memory=4G,vcpus=4 \
   -n node3.ceondo.net vm152.ceondo.net

Now, if you haven't done it yet, do not forget to add the init and crontab files of Ganeti.

Private and Public Networks

If you are using EC2, you are used to getting two network interfaces for each instance, one with a private address and one with a public address. Ganeti is extremely flexible and allows you to start an instance with two network interfaces or to add a new network interface to an existing instance:

gnt-instance modify --net add vm123
Modified instance vm123
 - nic.1 -> add:mac=aa:00:00:2a:12:34,ip=None,mode=bridged,link=xen-br0

This adds a new NIC nic.1 with a new random MAC address. The default parameters come from the cluster wide parameters. So, if your hardware node has two bridges, one on the public network (xen-br0) and one on the private network (xen-br1), you would add a NIC on the private network by running:

gnt-instance modify --net add:link=xen-br1 vm123
Modified instance vm123
 - nic.1 -> add:mac=aa:00:00:2a:12:35,ip=None,mode=bridged,link=xen-br1

The new NIC is not using the cluster wide default but the specified bridge. This provides a lot of flexibility in managing your instance networking. As this is bridged networking, you have to do the traditional network configuration at the instance level.

To create an instance with two network cards right from the start, based on an image, you could run:

gnt-instance add -t plain -o image+default -s 100g -B memory=4G,vcpus=4 \
   -n node3.ceondo.net \
   --net 0:ip=192.168.1.152 \
   --net 1:ip=178.33.145.192 \
   vm152.ceondo.net

You put two --net arguments to define the two network cards.

What About Security?

After setting up a new system, running nmap is a good idea. You will find out that the remote API binds on all the interfaces of your master node. This is not so good, but it can be changed. As the cluster IP in this case is on the private network, it can be used. 127.0.0.1 is also an option:

root@node1:~# cat /etc/default/ganeti
RAPI_ARGS="-b 192.168.1.1"
NODED_ARGS="-b 192.168.0.1"
CONFD_ARGS="-b 192.168.0.1"

Do not forget to do it on all your nodes. Note that the remote API daemon binds on the cluster IP and the noded and confd daemons on the IP of the node.
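As the file differs on each node (the node IP changes, the cluster IP does not), it is easy to generate. A small sketch, where ganeti_defaults is a hypothetical helper name and the example prints the file for the second node of these notes:

```shell
#!/bin/sh
# ganeti_defaults CLUSTER_IP NODE_IP
# Hypothetical helper: print the content of /etc/default/ganeti for one node.
ganeti_defaults() {
    printf 'RAPI_ARGS="-b %s"\nNODED_ARGS="-b %s"\nCONFD_ARGS="-b %s"\n' "$1" "$2" "$2"
}

ganeti_defaults 192.168.1.1 192.168.0.2
```

Redirect the output to /etc/default/ganeti on the matching node and restart the Ganeti daemons to apply it.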

High Availability

Ganeti does not provide HA. It is like Amazon EC2: you can create an instance, perform backup and restore, and, better than EC2, you can move an instance to another node without downtime, but no automatic failover system is provided.

The only provided automation is the watcher running from cron. If an instance is down in error state, it will try to start it. Nothing more, but nothing prevents you from building HA on top of Ganeti or from having HA at your application level and not at the instance level (this is what I prefer).

Replication and Backup Strategies

Replication

For real time replication you can use DRBD: just create your instance with the -t drbd template and Ganeti will take care of all the DRBD details. Please remember that replication is not backup. If you replicate corrupted data, you have nothing left; if you drop your database in your replicated instance, you have nothing left.

Again, replication is not backup. This is why Google still uses tapes to perform backups! Céondo's approach, which is not necessarily the best but which fits the way our software is designed, is:

Use DRBD for low I/O instances requiring fast failover, for example a web load balancer or an SSH dispatcher.
Use application level replication for high I/O instances, for example MySQL master/slave replication or a MongoDB replica set.

Backup

Once you have replication, you can do backups. If your replication is well designed, you can stop the replication for the time needed to perform a backup.

Ganeti provides an easy way to back up a stopped instance and restore it:

gnt-backup export <instance>
gnt-backup import <instance>

This can be a convenient way to increase the disk size of an instance, as you can change the disk size at import time. The problem is of course that you need your instance to be down. To limit downtime, you can do an LVM snapshot and/or try to limit the size of your instances.

The backup destination can be a NAS in another data center, to allow point in time recovery. Once you push a backup file to the NAS, chmod it to 0444 to prevent accidents.
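The chmod step is easy to forget, so it is worth scripting. A minimal sketch, where protect_backups is a hypothetical helper name and the directory to pass is wherever your NAS is mounted:

```shell
#!/bin/sh
# protect_backups DIR
# Hypothetical helper: make every regular file under DIR read-only (0444).
protect_backups() {
    find "$1" -type f -exec chmod 0444 {} +
}
```

For example, protect_backups /mnt/nas/backups, where /mnt/nas/backups is an assumed mount point. This guards against accidental overwrites by normal users, not against root or against deleting the files from a writable directory.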

Oh, backups are of no use if you do not test them. This is hard: it means that you need a special environment to restore and test without affecting your production system.

Performance Tuning

If you do not require very specific CPU features, you can pass the -cpu host flag to KVM.

dpkg-divert --add --rename --divert /usr/bin/kvm.real /usr/bin/kvm
cat <<EOF > /usr/bin/kvm
#!/bin/sh
/usr/bin/kvm.real -cpu host "\$@" 
EOF
chmod +x /usr/bin/kvm

If you do not need it, you should disable VNC. In our case, it was eating 6% of a CPU all the time.

gnt-instance modify -H vnc_bind_address= <instance>

Solving Problems

Ganeti is very nice, not only because it works well, but also because when things are not going well, a lot of diagnostic tools are available to figure out what is going on. The first thing to do is to check your instance configuration:

gnt-instance info <instance>

The second one is to check the cluster info and verify it:

gnt-cluster info
gnt-cluster verify

The last one is to do some testing of your cluster.

/usr/local/lib/ganeti/tools/burnin -o image+default --disk-size=10g <newinstance>

Take a look at the information and read the manual pages. Ganeti is well designed, which means that it is usually easy to figure out what is going on.

Changelog

Tue 28 June 2011, initial version.
Wed 20 Jul 2011, fixed some typos.
Tue 8 Nov 2011, updated some security notes and the list of required packages.
Wed 9 Nov 2011, added the command to set the number of CPUs and RAM at creation time.
Tue 19 June 2012, added the way to create an instance with multiple network cards.