Software RAID

mdadm overview

mdadm is Linux-based software that lets the operating system create and manage RAID arrays using SSDs or normal HDDs. In general, software RAID offers very good performance and is relatively easy to maintain. I've personally seen a software RAID 1 beat an LSI hardware RAID 1 that was using the same drives. A lot of software RAID performance depends on the CPU in use: if you are using a very old CPU, or are trying to run software RAID on a server that already has very high CPU usage, you may experience slower than normal performance, but in most cases there is nothing wrong with using mdadm to create software RAIDs.

Software RAID 1 Tweaks for Linux

Command to see which scheduler is being used for your disks (change {a..p} to match however your disks are labeled). With software RAID you might actually see better performance with the CFQ scheduler, depending on what type of disks you are using. I suggest testing all three schedulers to see which one offers the best performance for your workload.

for drive in {a..p}; do cat /sys/block/sd${drive}/queue/scheduler; done

Command to change the scheduler for all drives at once. This will not persist after a reboot, though.

## for Deadline
for drive in {a..p}; do echo deadline > /sys/block/sd${drive}/queue/scheduler; done

## For CFQ
for drive in {a..p}; do echo cfq > /sys/block/sd${drive}/queue/scheduler; done

## For Noop
for drive in {a..p}; do echo noop > /sys/block/sd${drive}/queue/scheduler; done
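None of the loops above survives a reboot. One way to make the scheduler setting persistent is a udev rule; the sketch below is just an example (the rules file name and the choice of deadline are assumptions, adjust to taste):

## Hypothetical udev rule to set the scheduler for all sd* devices at boot
## Save as /etc/udev/rules.d/60-io-scheduler.rules (example name)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"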

To change the queue depth of the drives to 1 (this basically disables command queueing):

for drive in {a..p}; do echo 1 > /sys/block/sd${drive}/device/queue_depth; done
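To confirm the change took effect, you can read the same sysfs files back (same {a..p} drive range assumption as above):

for drive in {a..p}; do cat /sys/block/sd${drive}/device/queue_depth; done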

How to use fstrim to boost SSD Software Raid 1 Performance

If you notice lower than expected performance with an SSD Software RAID 1 you should run fstrim to make sure both SSDs are "trimmed". If one of the SSDs was used prior to being in the RAID then performance may be reduced, especially for random writes.

fstrim / 

I've noticed significant performance gains after running fstrim, in some cases almost a 100% improvement in random write IOPS!
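Note that fstrim / only trims the filesystem mounted at /. To trim every mounted filesystem that supports discard, and to repeat the trim on a schedule, something like the following is a reasonable starting point (the weekly cron location and fstrim path are just examples):

## Trim all mounted filesystems that support it, verbosely
fstrim -av

## Example /etc/cron.weekly/fstrim script (assumes fstrim lives in /usr/sbin)
#!/bin/sh
/usr/sbin/fstrim -av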

How to create a software RAID with mdadm

You can use the mdadm command to create a software RAID 10 using 4 drives

mdadm --create /dev/md0 --run --level=10 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd

You can also use mdadm to specify the chunk size to use for the RAID

mdadm --create /dev/md0 --run --level=10 --raid-devices=4 --chunk=128 /dev/sda /dev/sdb /dev/sdc /dev/sdd
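Once the array is created, it is usually worth saving the array definition so it is assembled automatically at boot (the config path may be /etc/mdadm.conf or /etc/mdadm/mdadm.conf depending on the distro):

mdadm --detail --scan >> /etc/mdadm.conf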

How to view software RAID status with mdadm

To view the status of a software RAID, you can cat /proc/mdstat to see useful information about the state of your Linux software RAID. If the RAID is rebuilding or syncing, the output of the command below will tell you.

cat /proc/mdstat
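For more detail on a specific array (RAID level, chunk size, state, and per-device status), mdadm --detail is also useful:

mdadm --detail /dev/md0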

Chunk Size

First, an explanation of how this works.

Let's say you want to write 1 byte across two disks in parallel (RAID 0). To do this, you would write half of the data to one disk and the other half to the second disk, so 4 bits go to one drive and 4 bits go to the other. Ideally, this cuts the write time in half.

The hardware cannot do this on its own, so we need to specify this for the RAID.

  • Chunk Size = The smallest amount of data that can be written to a device.
  • Another example would be the following:
File Size = 16 KB
RAID Chunk Size = 4 KB
  • If we have two disks in a RAID 0 array and we write this file to the array, the following will happen:
Write 1: First 4KB chunk is written to disk 1
Write 2: Second 4KB chunk is written to disk 2
Write 3: Third 4KB chunk is written to disk 1
Write 4: Fourth 4KB chunk is written to disk 2

Now, since we can do two writes at the same time, this takes half the time it would take using one disk.

This means that depending on the type of writes happening, it could be better to use smaller, or larger chunks.

Large writes (10GB example): having larger chunks (256KB) could improve performance.
Small writes (4KB example): having smaller chunks (4KB) could improve performance.

RAID 0:

Chunk size depends on the number of disks in the array.

RAID 1:

For writes, the chunk size does not make a difference, since every write must go to all disks anyway. The chunk size specifies the amount of data that can be read serially from the disks.
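To see what chunk size an existing striped array was created with (assuming the array is /dev/md0):

mdadm --detail /dev/md0 | grep -i "chunk size"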

XFS and RAID Configuration

Mount options to help with performance:

#Disable access time updates to files and directories
noatime,nodiratime

Command to check how fragmented a given file is on XFS (xfs_bmap prints the file's extent map; the path is a placeholder):

xfs_bmap -v /path/to/file
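For a filesystem-wide fragmentation factor, xfs_db can also be used (a different tool from xfs_bmap; the device name below is an example):

xfs_db -c frag -r /dev/md0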

Mount option to help reduce fragmentation (add to /etc/fstab):

allocsize=$value (default is 64KB)

##Starting point
allocsize=64m
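Putting the options together, a hypothetical /etc/fstab entry for an XFS filesystem on a software RAID might look like this (device, mount point, and allocsize value are all examples):

/dev/md0    /data    xfs    noatime,nodiratime,allocsize=64m    0 0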


Command to defragment a mounted XFS filesystem:

xfs_fsr -v /dev/sd$value

MDADM Values and Meanings:

  • block size = filesystem blocks, the default is 4K
  • chunk size and stripe size = values set when the RAID is created. This would be set using mdadm.

For XFS the two main parameters are:

  • sunit: This is the stripe unit
  • swidth: This is the stripe width
  • The stripe unit (sunit) resides on a single disk in the array
  • The stripe width (swidth) spans the entire array, this is similar to the stripe size of the array.
  • To calculate what these should be, you need to know the following:
1) What type of RAID are you using? (RAID 0, RAID 1, RAID 10, etc)
2) The number of disks in the array
3) The stripe size (chunk size) of the array
  • For RAID 0, 1, and 10 this is equal to the number of spindles (total disks, mirrored or striped)
  • The sunit is measured in 512-byte block units, so this is what the value should be with the following chunks:
64KB stripe size: sunit=128
256KB stripe size: sunit=512
  • The swidth spans the entire array, but it is also measured in 512-byte blocks. To determine this value, multiply the number of disks in the array by the sunit value.
  • Example: 4 disk RAID 10 with a chunk size of 64KB
4 x 128 = 512

The command you would run to format the RAID for XFS:

mkfs.xfs -d sunit=128,swidth=512 /dev/md$value
  • Example: 8 disk RAID 10 with a chunk size of 64KB
8 x 128 = 1024

The command you would run to format the RAID for XFS:

mkfs.xfs -d sunit=128,swidth=1024 /dev/md$value
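As a sanity check, the same arithmetic can be scripted. This is a minimal sketch where CHUNK_KB and DISKS are placeholders for your chunk size in KB and total disk count, following the total-disk convention used above:

## Chunk size in KB and total number of disks (example values)
CHUNK_KB=64
DISKS=4
## sunit is the chunk size expressed in 512-byte blocks
SUNIT=$((CHUNK_KB * 1024 / 512))
## swidth spans the whole array
SWIDTH=$((SUNIT * DISKS))
echo "mkfs.xfs -d sunit=${SUNIT},swidth=${SWIDTH} /dev/md0"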

How to increase MDADM Software Raid Resync Limits

When you first create a software RAID using mdadm, you may notice that it takes quite some time to sync or resync the array; during this time performance may be degraded.

The default sync limits for MDADM on CentOS 7 are:

dev.raid.speed_limit_max = 300000
dev.raid.speed_limit_min = 30000

To check the current values of dev.raid.speed_limit_max and dev.raid.speed_limit_min, use the sysctl utility:

sysctl -a | grep -i raid

If you are using SSDs, you can raise these limits significantly, which will shorten the time it takes for the software RAID to sync. We do this by using sysctl to update the limits, and it works on CentOS 6, CentOS 7, or any recent Ubuntu distro. To do this, run the following:

sysctl -w dev.raid.speed_limit_max=3000000
sysctl -w dev.raid.speed_limit_min=3000000

You can then check the resync speed by running watch against /proc/mdstat

watch -n 1 cat /proc/mdstat

These sysctl -w changes take effect immediately, but they will not survive a reboot unless you also add the settings to /etc/sysctl.conf or a file under /etc/sysctl.d/. With higher limits in place, future RAID resyncs will complete as fast as possible (unless you're using crazy fast SSDs, in which case you may need to raise the limits even further).
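For example, to make the higher limits permanent (the file name under /etc/sysctl.d/ is arbitrary):

## Persist the resync limits across reboots
cat > /etc/sysctl.d/90-raid-resync.conf << 'EOF'
dev.raid.speed_limit_max = 3000000
dev.raid.speed_limit_min = 3000000
EOF
## Reload all sysctl configuration files
sysctl --system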