GNU Linux/Software RAID




The following content is a Work In Progress and may contain broken links, incomplete directions or other errors. Once the initial work is complete this notice will be removed. Please contact me via Twitter with any questions and I'll try to help you out.


These are my scratch notes for recovering Software RAID arrays on a GNU/Linux box. The examples here are from a CentOS 5.x box, but any recent GNU/Linux distro with Software RAID support via mdadm should work much the same way. In case it's not clear, I'm a newbie when it comes to Software RAID, so some of these steps may be redundant or nonsensical. If so, please feel free to point that out so I can make this easier to read.


The problem report

This started with me receiving emails from mdadm (which monitors the Software RAID devices on a 1U server with four physical disks) reporting a DegradedArray event on md device /dev/md0. The directions on this page make heavy use of the mdadm tool for administering GNU/Linux Software RAID devices.

This is an automatically generated mail message from mdadm running on server.example.org

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sdd1[3] sdc1[2] sdb2[1]
2917676544 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]

md1 : active raid1 sdd2[1] sdc2[0]
8385856 blocks [2/2] [UU]

md3 : active raid5 sdd3[3] sdc3[2] sdb3[1]
2917700352 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]

md0 : active raid1 sdb1[1]
8385792 blocks [2/1] [_U]

unused devices: <none>


In the output above, each underscore in the bracketed status (e.g. [_UUU]) represents a failed or missing member of that array. This means that the following RAID devices need repair (a quick way to spot them is sketched after this list):

  • /dev/md0
  • /dev/md2
  • /dev/md3
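
A quick way to spot the degraded arrays directly on the machine is to filter the /proc/mdstat status lines for underscores. This is a sketch of my own rather than part of the original report; the -B 1 option also prints the md device line above each match:

grep -B 1 '\[.*_.*\]' /proc/mdstat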


Gathering Information

We won't need it right away, but it's a good idea to go ahead and gather some basic system information to use for reference later while restoring the RAID devices.
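
A few commands I find handy for capturing a baseline first (a sketch; the exact list is a matter of taste, and the output is omitted here):

# Distro release and kernel in use (this box is CentOS 5.x)
cat /etc/redhat-release
uname -r

# mdadm version and the current state of all arrays
mdadm --version
cat /proc/mdstat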


mdadm.conf contents

cat /etc/mdadm.conf
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
ARRAY /dev/md3 level=raid5 num-devices=4 uuid=a5690093:5c58a8d9:ac966bcf:a00660c2
ARRAY /dev/md2 level=raid5 num-devices=4 uuid=a45e768c:246aca55:1c012e56:58dd3958
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=183e0f5d:2ac92a56:f064a724:9c4cc3a4
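
As an aside, if mdadm.conf were ever missing or stale, ARRAY lines like the ones above could be regenerated from the currently assembled arrays; a sketch (review the output before merging it into the real file):

# Print an ARRAY line for every assembled array; merge into /etc/mdadm.conf by hand
mdadm --detail --scan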


Partitions list

fdisk -l
Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        1044     8385898+  fd  Linux raid autodetect
/dev/sda2            1045      122122   972559035   fd  Linux raid autodetect
/dev/sda3          122123      243201   972567067+  fd  Linux raid autodetect

Disk /dev/sdb: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        1044     8385898+  fd  Linux raid autodetect
/dev/sdb2            1045      122122   972559035   fd  Linux raid autodetect
/dev/sdb3          122123      243201   972567067+  fd  Linux raid autodetect

Disk /dev/sdc: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1      121078   972559003+  fd  Linux raid autodetect
/dev/sdc2          121079      122122     8385930   fd  Linux raid autodetect
/dev/sdc3          122123      243201   972567067+  fd  Linux raid autodetect

Disk /dev/sdd: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1      121078   972559003+  fd  Linux raid autodetect
/dev/sdd2          121079      122122     8385930   fd  Linux raid autodetect
/dev/sdd3          122123      243201   972567067+  fd  Linux raid autodetect

Notice a pattern in the partition sizes?
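
To confirm the pattern, the Blocks column for every RAID-type partition can be tallied; a rough sketch of my own (the awk field arithmetic assumes the fdisk output layout shown above):

# Count how many partitions share each block count
fdisk -l 2>/dev/null | awk '/raid autodetect/ {print $(NF-4)}' | sort | uniq -c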


RAID device composition

Based on a previous conversation with another tech, I knew that a RAID device could be composed of entire disks or of partitions from multiple disks. The advantage of using partitions instead of entire disks is the ease with which you can satisfy the requirement that all RAID members be the same size. In this case, the RAID devices were assembled from partitions rather than entire disks.


Determining what disks or partitions a RAID device is composed of

We can determine RAID device members via several different methods, each one with varying degrees of reliability.

  • Based on block size alone
  • mdadm --examine /dev/DEVICE
  • mdadm --misc --detail /dev/DEVICE

We can also combine these as shown below.


Finding array members based on block size

In the output earlier we saw that /dev/sdb1 is part of /dev/md0, a RAID 1 array composed of two members. Since /dev/sdb1 is present, we know we're only missing one other member.

Since RAID devices require identically sized members, I realized that to find the other array members I could determine the block count of one member and use it to find matching partitions/disks. So if /dev/sdb1, the remaining member of the /dev/md0 array, has a block count of 8385898, the other member would also need the same block count. This wouldn't guarantee that every partition/disk in the resulting list is a direct match, but the likelihood would be high.

Because all of the arrays are still active, we already have a listing of the remaining members (the /proc/mdstat contents in the problem report we received via email). We can use that information to find candidates for the missing members.

fdisk -l | grep 8385898
/dev/sda1   *           1        1044     8385898+  fd  Linux raid autodetect
/dev/sdb1   *           1        1044     8385898+  fd  Linux raid autodetect

It would appear that /dev/sda1 is the missing member of the /dev/md0 RAID device, but let's confirm that.


Finding array members using mdadm --examine

First we look back at the contents of mdadm.conf and see that for /dev/md0 we have this line:

ARRAY /dev/md0 level=raid1 num-devices=2 uuid=cbae8de5:892d4ac9:c1cb8fb2:5f4ab019

That UUID is the identifier for that particular RAID device and is written to a special superblock on every disk or partition participating in the array. This is the default behavior for Software RAID arrays now; it used to be optional. Using that UUID we can verify that one of the identically sized disks/partitions we suspect of being a RAID device member really is one.

For example, let's start with getting the UUID for /dev/sdb1:

mdadm --examine /dev/sdb1

Snippet from the output:

/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
  Creation Time : Wed Jul 13 23:04:19 2011
     Raid Level : raid1
  Used Dev Size : 8385792 (8.00 GiB 8.59 GB)
     Array Size : 8385792 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Wed Aug  8 06:35:41 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : be673899 - correct
         Events : 4493518


That UUID matches the corresponding line in mdadm.conf, as we suspected. Now let's check our candidate for the missing member:

mdadm --examine /dev/sda1


/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
  Creation Time : Wed Jul 13 23:04:19 2011
     Raid Level : raid1
  Used Dev Size : 8385792 (8.00 GiB 8.59 GB)
     Array Size : 8385792 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Wed Aug  8 06:35:41 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : be673887 - correct
         Events : 4493518


The UUID for /dev/sda1 matches as well, so we've found our missing member for the /dev/md0 RAID device.


There is also a quicker way to make comparisons, assuming that all of our RAID device members are mapped to /dev/sdX devices:

for dev in /dev/sd*; do mdadm --examine "$dev" | grep UUID; done
mdadm: No md superblock detected on /dev/sda.
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2
mdadm: No md superblock detected on /dev/sdb.
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2
mdadm: No md superblock detected on /dev/sdc.
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : 183e0f5d:2ac92a56:f064a724:9c4cc3a4
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2
mdadm: No md superblock detected on /dev/sdd.
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : 183e0f5d:2ac92a56:f064a724:9c4cc3a4
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2

The No md superblock detected on lines mark which disk we're looking at (the whole-disk devices carry no RAID superblock of their own), and the lines that follow show the UUIDs for that disk's partitions, in order. We can see that the first partitions of /dev/sda and /dev/sdb share the same UUID, so /dev/sda1 and /dev/sdb1 are members of the same array; a glance at mdadm.conf tells us that array is /dev/md0.
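
This sort of comparison can also be automated. Here is a small loop of my own that pairs each partition with its array UUID and sorts on the UUID, so members of the same array end up adjacent (the /dev/sd[a-d][1-3] glob is specific to this box):

# Print "UUID device" for every RAID partition, grouped by UUID
for dev in /dev/sd[a-d][1-3]; do
    uuid=$(mdadm --examine "$dev" 2>/dev/null | awk '/UUID/ {print $3}')
    [ -n "$uuid" ] && echo "$uuid  $dev"
done | sort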


Finding array members using mdadm --misc --detail

mdadm allows us to get the list of array members for a specified array with a short command:

mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Wed Jul 13 23:04:19 2011
     Raid Level : raid1
     Array Size : 8385856 (8.00 GiB 8.59 GB)
  Used Dev Size : 8385856 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Aug  7 14:12:52 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 183e0f5d:2ac92a56:f064a724:9c4cc3a4
         Events : 0.168

    Number   Major   Minor   RaidDevice State
       0       8       34        0      active sync   /dev/sdc2
       1       8       50        1      active sync   /dev/sdd2

In this case we see that /dev/sdc2 and /dev/sdd2 make up the /dev/md1 RAID device, and both are listed as active sync, which matches the healthy [UU] status md1 showed in /proc/mdstat.

Note that this approach requires the RAID device to be active and all of its members assembled in order to get the complete listing. Otherwise, you have to examine each disk/partition for the presence of the UUID that identifies that RAID device. This is where having a current mdadm.conf file really comes in handy.
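
For example, here is a sketch of my own that checks every partition for the UUID mdadm.conf lists for /dev/md0:

# UUID for /dev/md0, taken from /etc/mdadm.conf above
UUID="cbae8de5:892d4ac9:c1cb8fb2:5f4ab019"
for dev in /dev/sd[a-d][1-3]; do
    mdadm --examine "$dev" 2>/dev/null | grep -q "$UUID" && echo "$dev carries the md0 UUID"
done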


Repairing the root RAID device

To find out which array member is missing, let's see which members are still present:


cat /proc/mdstat | grep md0
md0 : active raid1 sdb1[1]

So, it appears that /dev/sda1 needs to be added back.

mdadm --add /dev/md0 /dev/sda1
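
While the array resynchronizes, progress can be watched live; a minimal sketch (the 5-second interval is arbitrary):

# Refresh /proc/mdstat every 5 seconds; Ctrl-C to exit
watch -n 5 cat /proc/mdstat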

Snippet from /var/log/messages related to the mdadm --add command:

Aug  7 14:16:00 lockss1 kernel: md: bind<sda1>
Aug  7 14:16:00 lockss1 kernel: RAID1 conf printout:
Aug  7 14:16:00 lockss1 kernel:  --- wd:1 rd:2
Aug  7 14:16:00 lockss1 kernel:  disk 0, wo:1, o:1, dev:sda1
Aug  7 14:16:00 lockss1 kernel:  disk 1, wo:0, o:1, dev:sdb1
Aug  7 14:16:00 lockss1 kernel: md: syncing RAID array md0
Aug  7 14:16:00 lockss1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Aug  7 14:16:00 lockss1 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Aug  7 14:16:00 lockss1 kernel: md: using 128k window, over a total of 8385792 blocks.
Aug  7 14:22:31 lockss1 kernel: md: md0: sync done.
Aug  7 14:22:31 lockss1 kernel: RAID1 conf printout:
Aug  7 14:22:31 lockss1 kernel:  --- wd:2 rd:2
Aug  7 14:22:31 lockss1 kernel:  disk 0, wo:0, o:1, dev:sda1
Aug  7 14:22:31 lockss1 kernel:  disk 1, wo:0, o:1, dev:sdb1
Aug  7 14:26:10 lockss1 smartd[3387]: Device: /dev/sda, 1 Currently unreadable (pending) sectors

Checking /proc/mdstat afterwards confirms the rebuild completed:

tail /proc/mdstat
md1 : active raid1 sdd2[1] sdc2[0]
      8385856 blocks [2/2] [UU]

md3 : active raid5 sdd3[3] sdc3[2] sdb3[1]
      2917700352 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]

md0 : active raid1 sda1[0] sdb1[1]
      8385792 blocks [2/2] [UU]

unused devices: <none>

Even with the smartd error, it looks like /dev/md0 is holding. We'll have to go back at some point and run fsck on it from a rescue disc so the filesystem isn't mounted while we're trying to verify its consistency.
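
Since smartd is reporting a pending sector on /dev/sda, it's also worth reviewing that disk's SMART data; a sketch, assuming the smartmontools package is installed:

# Overall health assessment plus the attribute table (watch Current_Pending_Sector)
smartctl -H -A /dev/sda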


As we can see below, /dev/md0 has been restored to service.

mdadm --misc --detail /dev/md0

/dev/md0:
        Version : 0.90
  Creation Time : Wed Jul 13 23:04:19 2011
     Raid Level : raid1
     Array Size : 8385792 (8.00 GiB 8.59 GB)
  Used Dev Size : 8385792 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Aug  7 17:50:16 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
         Events : 0.4493518

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1


Repairing /dev/md2 and /dev/md3

For this particular server the /dev/md2 and /dev/md3 RAID devices store content that is accessed pretty infrequently, so we can take the system out of service while we recover the arrays. RAID would let us keep using the arrays in their degraded state (with reduced performance) while they rebuild, but it's important to understand that rebuilding puts a high I/O demand on every member used to restore content. In my case, taking the server out of service allows for faster restoration of the arrays.
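
Because the box is out of service anyway, the kernel's software RAID resync speed limits can be raised so the rebuild finishes sooner. This is a sketch of my own; the value is arbitrary, and the defaults should be noted first so they can be restored afterwards:

# Current limits, in KB/sec per device (defaults are typically 1000 and 200000)
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max

# Raise the floor so the resync isn't throttled while the box sits idle
echo 50000 > /proc/sys/dev/raid/speed_limit_min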

(Optional) Stopping services that are configured to use the RAID devices

service SERVICE_NAME stop
umount /dev/md2
umount /dev/md3

Stopping the services and unmounting the file systems should keep anything from hitting the RAID devices we're trying to repair.
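
If umount complains that a device is busy, something still has files open on it; a quick sketch for tracking down the culprit (fuser ships with the psmisc package):

# List the processes holding the filesystem on /dev/md2 open
fuser -vm /dev/md2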

mdadm --stop /dev/md2
mdadm: stopped /dev/md2
mdadm --stop /dev/md3
mdadm: stopped /dev/md3

We've now stopped the devices and are ready to begin adding the missing members.

Determining the array members for /dev/md3
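
From mdadm.conf we know /dev/md3 uses UUID a5690093:5c58a8d9:ac966bcf:a00660c2, and /proc/mdstat showed sdb3, sdc3 and sdd3 as its remaining members, so the missing member is presumably /dev/sda3. A sketch for confirming that before re-adding it:

# The UUID reported here should match the md3 line in /etc/mdadm.conf
mdadm --examine /dev/sda3 | grep UUID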