DRBD

How to install DRBD on CentOS 6

These commands will install the kernel module and utilities for DRBD on CentOS 6

wget http://dl.atrpms.net/el6-x86_64/atrpms/stable/drbd-8.4.3-33.el6.x86_64.rpm
wget http://dl.atrpms.net/all/drbd-kmdl-2.6.32-358.14.1.el6-8.4.3-33.el6.x86_64.rpm
rpm -iv drbd-kmdl-2.6.32-358.14.1.el6-8.4.3-33.el6.x86_64.rpm
rpm -iv drbd-8.4.3-33.el6.x86_64.rpm
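
Once both RPMs are in, a quick sanity check (not part of the install itself) is to load the module and make sure the kernel and userland versions line up:

# Load the DRBD kernel module and confirm the kernel picked it up
modprobe drbd
lsmod | grep drbd

# /proc/drbd only exists once the module is loaded, and it reports the running version
cat /proc/drbd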

Example DRBD Configs and Tweaks

Set Linux I/O Scheduler to Deadline

vim /boot/grub/grub.conf 

At the end of the "kernel" line, add in "elevator=deadline"

title CentOS (2.6.32-358.14.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-358.14.1.el6.x86_64 ro root=UUID=d5567ba9-fef9-41a9-af8c-2a1e0ce0ec80 rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD nodmraid SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM elevator=deadline
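
The grub change only applies after a reboot. To check or switch the scheduler on a running box, sysfs exposes it per block device. "sda" below is just an example, use whatever device backs your DRBD volume:

# The active scheduler is shown in [brackets]
cat /sys/block/sda/queue/scheduler

# Switch to deadline at runtime (does not survive a reboot, grub handles that)
echo deadline > /sys/block/sda/queue/scheduler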

Global Config

This is the default global config (typically /etc/drbd.d/global_common.conf) with a lot of the comments removed. Most of the "optimizations" are placed here.

global {
        usage-count yes;
}

common {
        protocol C;
        handlers {
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        options {
                # cpu-mask on-no-data-accessible
        }

        disk {
                al-extents 3389;
                disk-barrier no;
                disk-flushes no;
        }

        net {
                sndbuf-size 1024k;
                unplug-watermark 16;
                max-buffers 8000;
                max-epoch-size 8000;
        }
}
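
Before pushing an edited global config out to a cluster it is worth letting drbdadm parse it first. drbdadm dump just re-emits the parsed configuration, so a typo shows up here instead of halfway through an adjust:

# Parse /etc/drbd.conf plus everything it includes and print the result.
# A syntax error is reported and the command exits non-zero.
drbdadm dump all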

DRBD Replication Modes

  • Protocol A

Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has been placed in the local TCP send buffer. In the event of forced fail-over, data loss may occur. The data on the standby node is consistent after fail-over, however, the most recent updates performed prior to the crash could be lost.

  • Protocol B

Memory synchronous (semi-synchronous) replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node. Normally, no writes are lost in case of forced fail-over. However, in the event of simultaneous power failure on both nodes and concurrent, irreversible destruction of the primary's data store, the most recent writes completed on the primary may be lost.

  • Protocol C

Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed. As a result, loss of a single node is guaranteed not to lead to any data loss. Data loss is, of course, inevitable even with this replication protocol if both nodes (or their storage subsystems) are irreversibly destroyed at the same time.

  • Protocol C appears to be the safest, but it can also be the slowest. I will try to test out the various modes to see exactly how much of a performance difference there is. A minimal example of where the protocol gets set in a resource file follows below.
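
For reference, here is roughly where the protocol gets set in a per-resource file. This is only a sketch; the resource name, hostnames, devices, addresses, and port below are made-up placeholders, not anything this wiki defines:

# Hypothetical resource file, adjust every name and address to your own setup
cat > /etc/drbd.d/r0.res <<'EOF'
resource r0 {
        net {
                protocol C;   # swap in A or B to trade safety for latency
        }
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on node1 {            # must match `uname -n` on the first node
                address 10.0.0.1:7788;
        }
        on node2 {            # must match `uname -n` on the second node
                address 10.0.0.2:7788;
        }
}
EOF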

Tuning

  • DRBD 8.4.3 uses two files in /etc/drbd.d/: one is a global file, the other is per resource. Which file you edit depends on what you want to tune.

Command to apply new settings:

drbdadm adjust all
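
A rough rollout pattern, assuming the same edit has been made on both nodes. The -d flag is a dry run, and drbdsetup show prints the options the kernel is actually using, which is handy for confirming the adjust took:

# Dry run: print the drbdsetup calls that would be made, without executing them
drbdadm -d adjust all

# Apply for real, then confirm what the kernel ended up with
drbdadm adjust all
drbdsetup show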


Network Interfaces and DRBD

DRBD loves fast networks and fast NICs. There is a rather large performance difference between 1Gb NICs and 10Gb NICs, and this is true even if your storage cannot read or write faster than the network. The real gain comes from reduced latency, which tends to come along with the nicer NICs, and those generally happen to be 10Gb+.

Since all writes on a DRBD cluster generally have to wait for the secondary node to complete, it's critical to reduce network latency as much as possible. If you are not getting good performance out of a 1Gb NIC, then I would suggest using 10Gb NICs.
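
Latency on the replication link is easy to sanity check before blaming DRBD. The address below is a placeholder for the peer's replication interface, and iperf3 (install it separately) gives a rough ceiling for raw TCP throughput:

# Round-trip latency to the peer's replication IP (placeholder address)
ping -c 10 10.0.0.2

# Raw TCP throughput of the link; the peer must be running "iperf3 -s"
iperf3 -c 10.0.0.2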


al-extents

Activity Log Extents

  • Each extent marks 4M of backing storage. Configuring this tells DRBD how big the "hot" area can get. If a node goes down, it must resync this area when it comes back online, so the larger this is set, the longer a resync can take. A write that lands outside the current hot area forces an update to the activity log on the DRBD volume's metadata device. The higher al-extents goes, the less often that metadata needs to be updated, which improves performance; however, if this is set really high a resync could take a long time.

If you are using spinning disks and your application performs many small writes on a large dataset, increasing al-extents can help performance. If you are using SSDs or don't have much of a random write workload, then leaving this value alone is probably fine.

If you lower al-extents you will be performing a lot more metadata updates (writes), which is going to kill performance unless you have a dedicated drive or SSD specifically handling the Activity Log. If you are using SSDs this might not make a massive impact either way.

TL;DR Version

  • Lower al-extents = shorter resync times at the expense of lower write performance.
  • Higher al-extents = improved write performance, but longer resync times.

Either way you should set the value to a prime number, because math. Also, Linbit told me so personally. The recommended range is between 7 and 3833. The default value is 127.

disk {
        # DEFAULT: al-extents 127;
        al-extents 3833;
}
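
To put rough numbers on the resync trade-off, the worst-case "hot" area is simply al-extents multiplied by 4M:

# Worst-case area that must be resynced after a primary crash
# 127 extents  * 4M =   508M (default)
# 3833 extents * 4M = 15332M (~15G)
echo "$((127 * 4))M with the default, $((3833 * 4))M with al-extents 3833"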

unplug-watermark

The unplug-watermark setting controls how many pending I/O requests (writes) are allowed to queue up before DRBD forces the backing storage to start processing them. The lower the value, the more often pending writes get kicked down to the disk; the higher the value, the bigger the batches the storage sees. If you are relying on a BBU-backed RAID cache, forcing very small batches may keep you from using that cache efficiently. For SSDs and really fast RAID arrays you should probably start at the lowest possible value, 16, then try 32 and 64 and see if performance is getting better or worse.

You should set this on both nodes.

net {
        # DEFAULT is 128
        # MIN value is 16
        # MAX value is 131072

        unplug-watermark 16;
}
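
One way to pick between 16, 32, and 64 is to run the same small random write test after each change. This is only a sketch; /mnt/drbd is a placeholder for wherever the DRBD-backed filesystem is mounted, and fio has to be installed first:

# 60 second 4k random write test against the DRBD-backed filesystem
fio --name=unplug-test --directory=/mnt/drbd --rw=randwrite --bs=4k \
    --size=1G --numjobs=4 --iodepth=16 --direct=1 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting

# Change unplug-watermark on both nodes, re-apply, then re-run the test
drbdadm adjust all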

sndbuf-size

The send buffer is used to store packets that need to be sent to the secondary node but have not yet been acknowledged by it. The default value is 128K and should be fine for most use cases. If you are a baller and have a really sweet network with 10Gb+ NICs and a direct connection between the two nodes, then you might see some performance gains by raising this to 256K or even 512K.

Setting this too high is not a good idea; you risk losing data if the primary node goes down before the buffer has been drained. If that happens and you had set this to a few MB, then you just lost some data, which is entirely not the point of DRBD. This should probably be one of the last settings you tweak unless you have a baller network.

net {
        sndbuf-size 128K;
}
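
If you suspect the send buffer is (or is not) the bottleneck, the Send-Q column for the replication socket is worth a look while the primary is under write load. Port 7788 is only the usual example port, substitute whatever your resource's address line uses:

# A persistently full Send-Q suggests the network or peer can't keep up;
# an always-empty one suggests a bigger sndbuf-size won't buy anything
ss -tn | grep 7788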

max-epoch-size and max-buffers

These two options affect random write performance on the secondary node.

max-buffers determines the maximum number of buffers DRBD will allocate for writing data to disk on the receiving side. If you are doing a lot of random writes and are NOT using SSDs, then a higher value might help reduce latency by writing more data at once instead of writing smaller amounts more often.

max-epoch-size is the upper limit of write requests that are allowed between two write barriers.

Network latency really adds up with random writes because each request has to be acknowledged by the peer node, so every write costs a network round trip. If you have a 10Gb or faster direct connection between the two nodes and you are using SSDs, these buffers probably won't dramatically improve performance. However, if you are using spinning disks, raising these values can significantly improve performance. Mileage will vary, so it is worth testing larger and smaller values to see which settings work best. In general though, the fewer round trips the better.

You must set max-buffers to be equal to, or greater than max-epoch-size. Otherwise you are going to hurt performance.


The Default settings are:

net {
        max-buffers 2048;
        max-epoch-size 2048;
}

DRBD suggests using larger values if you are using RAID, but why would you use RAID if you have DRBD? ;)

net {
        max-buffers 8001;
        max-epoch-size 8001;
}
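
Whatever values you land on, keep them identical on both nodes, re-apply them, and keep an eye on the resource while re-running your benchmark. /proc/drbd is the quickest view of connection state and any resync activity:

# Refresh once a second while the benchmark runs on the primary
watch -n1 cat /proc/drbd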