GNU Linux/Software RAID


These are my scratch notes for recovering Software RAID arrays on a GNU/Linux box. The examples here are for a CentOS 5.x box (mdadm version v2.6.9, RAID metadata version 0.90), but presumably any recent GNU/Linux distro with Software RAID support via mdadm could be used. In case it's not clear, I'm a newbie when it comes to Software RAID, so some of these steps may be redundant or nonsensical. If so, please feel free to point that out so I can make this easier to read.


References

Used in this guide

  • Linux Raid, by the Linux Foundation: http://www.linuxfoundation.org/collaborate/workgroups/linux-raid
  • Linux-raid kernel list community-managed reference for Linux software RAID (this was my primary source of information): https://raid.wiki.kernel.org/index.php/Linux_Raid
  • The Software-RAID HOWTO: http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html
  • Arch Linux Wiki - RAID: https://wiki.archlinux.org/index.php/RAID
  • LHN - Linux Software RAID: http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch26_:_Linux_Software_RAID
  • Gentoo Linux Wiki - RAID (great tips on various RAID metadata levels & their compatibility with grub): http://en.gentoo-wiki.com/wiki/RAID/Software
  • Dan Stromberg's Software RAID notes: http://stromberg.dnsalias.org/~strombrg/Linux-software-RAID.html
  • dedoimedo.com - How to configure RAID in Linux: http://www.dedoimedo.com/computers/linux-raid.html
  • Can I run fsck or e2fsck when Linux file system is mounted?: http://www.cyberciti.biz/faq/can-i-run-fsck-or-e2fsck-when-linux-file-system-is-mounted/

Used by me, but not referred to in this particular guide

  • HowToForge.com - Comment by pupu (recovering a RAID array where the boot disk has gone out): http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array#comment-22937
  • CentOS.org - GrubInstall HowTo (reminder of Grub naming conventions for disks): http://wiki.centos.org/HowTos/GrubInstallation
  • Anchor Knowledgebase - Linux Software RAID (worked for me!): http://www.anchor.com.au/hosting/support/Linux_Software_RAID_Repair

Misc Notes

RAID 5

RAID 5 can only tolerate the loss of a single disk. If more than one member goes out, the RAID 5 device will shut down.

mdadm.conf file

Keep a safe copy of your mdadm.conf file. If your Software RAID arrays are healthy and you don't have an mdadm.conf file, you can generate a new one with the mdadm command.

Depending on your distro, you'll either use this line:

mdadm --detail --scan > /etc/mdadm.conf

or this one:

mdadm --detail --scan > /etc/mdadm/mdadm.conf
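
One simple way to keep that safe copy is to date-stamp a backup before regenerating the file (just a sketch; adjust the path to match your distro):

cp /etc/mdadm.conf /etc/mdadm.conf.$(date +%Y%m%d)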


The problem report

This started off with me receiving emails from mdadm (that was monitoring three RAID devices on a 1U server with 4 physical disks) that there was a DegradedArray event on md device /dev/md0. Further directions on this page make heavy use of the mdadm tool for administrative tasks dealing with GNU/Linux Software RAID devices.

This is an automatically generated mail message from mdadm running on server.example.org

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sdd1[3] sdc1[2] sdb2[1]
2917676544 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]

md1 : active raid1 sdd2[1] sdc2[0]
8385856 blocks [2/2] [UU]

md3 : active raid5 sdd3[3] sdc3[2] sdb3[1]
2917700352 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]

md0 : active raid1 sdb1[1]
8385792 blocks [2/1] [_U]

unused devices: <none>


In the output above, each underscore represents a failed or missing member of a RAID array. This means that the following RAID devices need repair:

  • /dev/md0
  • /dev/md2
  • /dev/md3
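
If you'd rather not eyeball the whole file, a small awk one-liner can pick out the degraded arrays. This is just a sketch that assumes the /proc/mdstat layout shown above, where the per-array status such as [_UUU] is the last field on the line that follows each md device line:

awk '/^md/ {dev=$1} /\[[U_]+\]/ && /_/ {print dev, $NF}' /proc/mdstat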


Gathering Information

We won't need it right away, but it's a good idea to go ahead and gather some basic system information to use for reference later while restoring the RAID devices.


mdadm.conf contents

cat /etc/mdadm.conf
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
ARRAY /dev/md3 level=raid5 num-devices=4 uuid=a5690093:5c58a8d9:ac966bcf:a00660c2
ARRAY /dev/md2 level=raid5 num-devices=4 uuid=a45e768c:246aca55:1c012e56:58dd3958
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=183e0f5d:2ac92a56:f064a724:9c4cc3a4


Partitions list

fdisk -l
Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        1044     8385898+  fd  Linux raid autodetect
/dev/sda2            1045      122122   972559035   fd  Linux raid autodetect
/dev/sda3          122123      243201   972567067+  fd  Linux raid autodetect

Disk /dev/sdb: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        1044     8385898+  fd  Linux raid autodetect
/dev/sdb2            1045      122122   972559035   fd  Linux raid autodetect
/dev/sdb3          122123      243201   972567067+  fd  Linux raid autodetect

Disk /dev/sdc: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1      121078   972559003+  fd  Linux raid autodetect
/dev/sdc2          121079      122122     8385930   fd  Linux raid autodetect
/dev/sdc3          122123      243201   972567067+  fd  Linux raid autodetect

Disk /dev/sdd: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1      121078   972559003+  fd  Linux raid autodetect
/dev/sdd2          121079      122122     8385930   fd  Linux raid autodetect
/dev/sdd3          122123      243201   972567067+  fd  Linux raid autodetect

Notice a pattern in the partition sizes?
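
A quick way to surface that pattern is to group the "Linux raid autodetect" partitions by their Blocks value (a sketch that relies on the fdisk output layout above, where the Blocks column is followed by exactly four more fields):

fdisk -l 2>/dev/null | awk '/Linux raid autodetect/ {print $(NF-4)}' | sort | uniq -c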


RAID device composition

Based on a previous conversation with another tech, I knew that a RAID device could be composed of entire disks or partitions from multiple disks. The advantage of using partitions instead of entire disks is the ease with which you can satisfy the requirement that all RAID members be the same size. In this case, partitions were used to assemble the RAID devices instead of entire disks.


Determining what disks or partitions a RAID device is composed of

We can determine RAID device members via several different methods, each one with varying degrees of reliability.

  • Based on block size alone
  • mdadm --examine /dev/DEVICE
  • mdadm --misc --detail /dev/DEVICE

We can also combine these as shown below.


Finding array members based on block size

In our output earlier we saw that /dev/sdb1 is part of /dev/md0 which is a RAID 1 array that is composed of two disks. Since /dev/sdb1 is present, we know we're only missing one other disk.

Since RAID devices require identically sized array members, I realized that to find other array members I could determine the block size of one member and use that to find matching partitions/disks. So if /dev/sdb1 as the remaining member of the /dev/md0 array has a block size of 8385898, the other member would also need to have the same block size. This wouldn't guarantee that all partitions/disks in the list would be a direct match, but the likelihood would be high.

Because we have all arrays active, we have a listing of the remaining members (/proc/mdstat contents in the problem report we received via email). We can use that information to find candidates for the matching members.

fdisk -l | grep 8385898
/dev/sda1   *           1        1044     8385898+  fd  Linux raid autodetect
/dev/sdb1   *           1        1044     8385898+  fd  Linux raid autodetect

It would appear that /dev/sda1 is the missing member of the /dev/md0 RAID device, but let's confirm that.


Finding array members using mdadm --examine

First we look back at the contents of mdadm.conf and see that for /dev/md0 we have this line:

ARRAY /dev/md0 level=raid1 num-devices=2 uuid=cbae8de5:892d4ac9:c1cb8fb2:5f4ab019

That UUID is the identifier for that particular RAID device and is written to a special superblock on all disks participating in the array. This is the default behavior for Software RAID arrays now; it used to be optional. Using that UUID we can verify that one of the identically sized disks/partitions we suspect of being a RAID device member is truly a member.
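
A convenient shortcut here is mdadm's scan mode, which reads the superblocks on every device it can find and prints an ARRAY line (including the UUID) for each array, in roughly the same format as mdadm.conf, so the two can be compared side by side:

mdadm --examine --scan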

For example, let's start with getting the UUID for /dev/sdb1:

mdadm --examine /dev/sdb1

Snippet from the output:

/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
  Creation Time : Wed Jul 13 23:04:19 2011
     Raid Level : raid1
  Used Dev Size : 8385792 (8.00 GiB 8.59 GB)
     Array Size : 8385792 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Wed Aug  8 06:35:41 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : be673899 - correct
         Events : 4493518


That UUID matches the corresponding line in mdadm.conf as we suspected.

mdadm --examine /dev/sda1


/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
  Creation Time : Wed Jul 13 23:04:19 2011
     Raid Level : raid1
  Used Dev Size : 8385792 (8.00 GiB 8.59 GB)
     Array Size : 8385792 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Wed Aug  8 06:35:41 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : be673887 - correct
         Events : 4493518


The UUID for /dev/sda1 matches as well, so we've found our missing member for the /dev/md0 RAID device.


There is also a quicker way to make comparisons, assuming that all of our RAID device members are mapped to /dev/sdX devices:

for dev in /dev/sd*; do mdadm --examine $dev | grep UUID; done
mdadm: No md superblock detected on /dev/sda.
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2
mdadm: No md superblock detected on /dev/sdb.
           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2
mdadm: No md superblock detected on /dev/sdc.
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : 183e0f5d:2ac92a56:f064a724:9c4cc3a4
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2
mdadm: No md superblock detected on /dev/sdd.
           UUID : a45e768c:246aca55:1c012e56:58dd3958
           UUID : 183e0f5d:2ac92a56:f064a724:9c4cc3a4
           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2

The No md superblock detected on lines help to mark which disk we're looking at, and the lines that follow each one show the UUIDs for that disk's partitions. We can visually see in this case that /dev/sda1 and /dev/sdb1 are members of the same array. By looking at the contents of mdadm.conf, we can tell which array that is.
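
A slight variation on that loop pairs each partition with its UUID so there's no need to count lines. This is just a sketch; the /dev/sd[a-d][1-3] glob matches the partitions on this particular box:

for dev in /dev/sd[a-d][1-3]; do uuid=$(mdadm --examine $dev 2>/dev/null | awk '/UUID/ {print $3}'); echo "$dev ${uuid:-no-superblock}"; done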

We now have the information we need to attempt a repair of /dev/md0:

RAID Device: /dev/md0
RAID UUID:   cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
Members:     /dev/sda1, /dev/sdb1


However, since we've collected the information for /dev/md0, we might as well collect the rest of the information for the other RAID device members.

RAID Device members

/dev/md0:
    RAID UUID: cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
    Members:   /dev/sda1, /dev/sdb1

/dev/md1:
    RAID UUID: 183e0f5d:2ac92a56:f064a724:9c4cc3a4
    Members:   /dev/sdc2, /dev/sdd2

/dev/md2:
    RAID UUID: a45e768c:246aca55:1c012e56:58dd3958
    Members:   /dev/sda2, /dev/sdb2, /dev/sdc1, /dev/sdd1

/dev/md3:
    RAID UUID: a5690093:5c58a8d9:ac966bcf:a00660c2
    Members:   /dev/sda3, /dev/sdb3, /dev/sdc3, /dev/sdd3


Repairing /dev/md0 (root RAID device)

We've already determined that /dev/sdb1 is present based on the /proc/mdstat output included with the Problem Report. We've also determined that /dev/sda1 is the missing device in the RAID 1 pair based on the matching RAID UUIDs from the mdadm --examine output.

At any time we can confirm this by looking at the contents of /proc/mdstat as if it were a regular text file.

cat /proc/mdstat | grep md0
md0 : active raid1 sdb1[1]

This shows that /dev/md0 is still active and missing the other array member.


Let's add it back:

mdadm --add /dev/md0 /dev/sda1
mdadm: re-added /dev/sda1


After waiting a bit (got distracted with something else), let's follow up and check /var/log/messages and /proc/mdstat to see what the status of the /dev/md0 RAID device is.

Snippet from /var/log/messages:

Aug  7 14:16:00 lockss1 kernel: md: bind<sda1>
Aug  7 14:16:00 lockss1 kernel: RAID1 conf printout:
Aug  7 14:16:00 lockss1 kernel:  --- wd:1 rd:2
Aug  7 14:16:00 lockss1 kernel:  disk 0, wo:1, o:1, dev:sda1
Aug  7 14:16:00 lockss1 kernel:  disk 1, wo:0, o:1, dev:sdb1
Aug  7 14:16:00 lockss1 kernel: md: syncing RAID array md0
Aug  7 14:16:00 lockss1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Aug  7 14:16:00 lockss1 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Aug  7 14:16:00 lockss1 kernel: md: using 128k window, over a total of 8385792 blocks.
Aug  7 14:22:31 lockss1 kernel: md: md0: sync done.
Aug  7 14:22:31 lockss1 kernel: RAID1 conf printout:
Aug  7 14:22:31 lockss1 kernel:  --- wd:2 rd:2
Aug  7 14:22:31 lockss1 kernel:  disk 0, wo:0, o:1, dev:sda1
Aug  7 14:22:31 lockss1 kernel:  disk 1, wo:0, o:1, dev:sdb1
Aug  7 14:26:10 lockss1 smartd[3387]: Device: /dev/sda, 1 Currently unreadable (pending) sectors

cat /proc/mdstat
md2 : active raid5 sdd1[3] sdc1[2] sdb2[1]
2917676544 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]

md1 : active raid1 sdd2[1] sdc2[0]
      8385856 blocks [2/2] [UU]

md3 : active raid5 sdd3[3] sdc3[2] sdb3[1]
      2917700352 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]

md0 : active raid1 sda1[0] sdb1[1]
      8385792 blocks [2/2] [UU]

unused devices: <none>


Even with the smartd error, it looks like /dev/md0 is holding. We'll have to go back at some point and run fsck on it from a rescue disc so the filesystem isn't mounted (you can damage a filesystem if you run fsck on it while it is mounted).


mdadm --misc --detail /dev/md0

As we can see, /dev/md0 has been restored to service.

/dev/md0:
        Version : 0.90
  Creation Time : Wed Jul 13 23:04:19 2011
     Raid Level : raid1
     Array Size : 8385792 (8.00 GiB 8.59 GB)
  Used Dev Size : 8385792 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Aug  7 17:50:16 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : cbae8de5:892d4ac9:c1cb8fb2:5f4ab019
         Events : 0.4493518

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

Repairing /dev/md3

For this particular server the /dev/md2 and /dev/md3 RAID devices are used to store content that is accessed pretty infrequently, so we can actually take the system out of service while we're recovering the arrays. RAID allows us to rebuild the arrays while still using them in their degraded state (with reduced performance), but since this is not a high-priority server I can take it out of service. That keeps the demand on the disks lighter, reduces the likelihood that the drives will die during the reconstruction process, and speeds up the recovery.


(Optional) Stopping services that are configured to use the RAID devices

service SERVICE_NAME stop
umount /dev/md2
umount /dev/md3

Unmounting the file systems should stop services/daemons from hitting the RAID devices we're trying to repair.
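
If umount complains that a device is busy, a tool like fuser can show which processes still have files open on it (a sketch; fuser -m accepts the mounted block device or its mount point):

fuser -vm /dev/md2
fuser -vm /dev/md3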

mdadm --stop /dev/md2
mdadm: stopped /dev/md2
mdadm --stop /dev/md3
mdadm: stopped /dev/md3


We've now stopped the devices and are ready to begin adding the missing members. We'll have to force the RAID devices to assemble with missing devices so we can add them back manually.

mdadm --assemble /dev/md3 --uuid=a5690093:5c58a8d9:ac966bcf:a00660c2 --force
mdadm: /dev/md3 has been started with 3 drives.


Looking at the contents of /proc/mdstat ...

cat /proc/mdstat | grep md3
md3 : active raid5 sdd3[3] sdc3[2] sdb3[1]

... we can see the three active members of the /dev/md3 RAID device. After comparing it against our list of RAID members we jotted down earlier, we can see that /dev/sda3 needs to be added back.


mdadm --add /dev/md3 /dev/sda3
mdadm: re-added /dev/sda3


We can monitor the rebuild progress via the contents of /proc/mdstat or we can take a quick glance at the RAID device with another mdadm command.
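
If you'd rather have the /proc/mdstat view refresh on its own instead of re-running cat, something like watch is handy (purely a convenience, not part of the recovery itself):

watch -n 60 cat /proc/mdstat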

mdadm --misc --detail /dev/md3
/dev/md3:
        Version : 0.90
  Creation Time : Wed Jul 13 23:04:29 2011
     Raid Level : raid5
     Array Size : 2917700352 (2782.54 GiB 2987.73 GB)
  Used Dev Size : 972566784 (927.51 GiB 995.91 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Tue Aug  7 14:41:32 2012
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

 Rebuild Status : 21% complete

           UUID : a5690093:5c58a8d9:ac966bcf:a00660c2
         Events : 0.2140656

    Number   Major   Minor   RaidDevice State
       4       8        3        0      spare rebuilding   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3


The rebuild took about 10 hours to complete, so you can imagine how long it would take with a heavy load on the system.

Snippet from /proc/mdstat:

md3 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      2917700352 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

Repairing /dev/md2

As part of our work earlier with /dev/md3, we have already:

  • Stopped the service(s) that would access /dev/md2
  • Unmounted the /dev/md2 RAID device


We'll now proceed to reassembling /dev/md2.

mdadm --assemble /dev/md2 --uuid=a45e768c:246aca55:1c012e56:58dd3958 --force
mdadm: /dev/md2 has been started with 3 drives (out of 4).


As with /dev/md3, looking at the contents of /proc/mdstat ...

cat /proc/mdstat | grep md2
md2 : active raid5 sdb2[1] sdd1[3] sdc1[2]

... we can see the three active members of the /dev/md2 RAID device. After comparing it against our list of RAID members we jotted down earlier, we can see that /dev/sda2 needs to be added back.


mdadm --add /dev/md2 /dev/sda2
mdadm: re-added /dev/sda2


We can take a quick glance at the RAID device with the same mdadm command we used for /dev/md3:

mdadm --misc --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Wed Jul 13 23:51:18 2011
     Raid Level : raid5
     Array Size : 2917676544 (2782.51 GiB 2987.70 GB)
  Used Dev Size : 972558848 (927.50 GiB 995.90 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Wed Aug  8 10:54:10 2012
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

 Rebuild Status : 0% complete

           UUID : a45e768c:246aca55:1c012e56:58dd3958
         Events : 0.2085280

    Number   Major   Minor   RaidDevice State
       4       8        2        0      spare rebuilding   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1


While that displays the Rebuild Status percentage, it doesn't give us an estimate of when the rebuild will finish. Looking at /proc/mdstat we're able to see that the rebuild is expected to take about 12 hours.

cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sda2[4] sdb2[1] sdd1[3] sdc1[2]
      2917676544 blocks level 5, 256k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (5167452/972558848) finish=714.0min speed=22578K/sec

md1 : active raid1 sdd2[1] sdc2[0]
      8385856 blocks [2/2] [UU]

md3 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      2917700352 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

md0 : active raid1 sda1[0] sdb1[1]
      8385792 blocks [2/2] [UU]

unused devices: <none>

Once that completes, we'll need to run fsck on the recovered RAID devices to confirm all is well.

Running fsck to verify RAID devices

From the fsck man page:

NAME
       fsck - check and repair a Linux file system

SYNOPSIS
       fsck [-lsAVRTMNP] [-C [fd]] [-t fstype] [filesys...]  [--] [fs-specific-options]

DESCRIPTION
       fsck  is used to check and optionally repair one or more Linux file systems.  filesys 
       can be a device name (e.g.  /dev/hdc1, /dev/sdb2), a mount point (e.g.  /, /usr, /home), 
       or an ext2 label or UUID specifier (e.g.  UUID=8868abf6-88c5-4a83-98b8-bfc24057f7bd or 
       LABEL=root).  Normally, the fsck program will try to handle filesystems on different 
       physical disk drives in parallel to reduce the total amount of time needed to check all 
       of the filesystems.

       If no filesystems are specified on the command line, and the -A option is not specified, 
       fsck will default to checking filesystems in /etc/fstab serially.  This is equivalent to
       the -As options.

Assuming that all RAID devices were successfully reassembled, we're ready to verify the file systems, but it is always a good idea to make sure that any device to be scanned with fsck is unmounted first.

umount /dev/md2
umount /dev/md3
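
A quick sanity check that nothing from those arrays is still mounted (a sketch; no output means we're clear):

grep /dev/md /proc/mounts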


We'll need to verify the root filesystem from a rescue disc or the install disc in recovery mode, but we can verify /dev/md2 and /dev/md3 without resorting to that. Since that is where the important data resides, we'll focus on that first before testing the root filesystem.

Running fsck on a suspect file system has been described by more than one person as a nerve-wracking experience.
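
One way to take the edge off is a read-only dry run first. Assuming an ext2/ext3 filesystem, the -n option is passed through to e2fsck, which then opens the filesystem read-only and answers "no" to every repair prompt:

fsck -fn /dev/md2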

Because these filesystems could already be marked as clean, we use -f to force fsck to verify them anyway.

fsck -f /dev/md2
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/cache0: 61134317/729448448 files (0.4% non-contiguous), 343419252/729419136 blocks


That's one down, and one to go:

fsck -f /dev/md3
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/cache1: 68682580/729448448 files (0.2% non-contiguous), 386493899/729425088 blocks

No issues found, so at this point we could mount /dev/md2 and /dev/md3 and put this server back into production, but instead I'll boot the system from a rescue disc and run fsck on the root RAID device /dev/md0 to verify it is also OK.

When booting from the CentOS 5.8 installation DVD I chose the rescue mode (linux rescue I believe) and asked it to skip filesystem detection and just give me a shell.

I then ran the mdadm --assemble command with the proper arguments to assemble /dev/md0 for a filesystem check.

mdadm --assemble /dev/md0 --uuid=cbae8de5:892d4ac9:c1cb8fb2:5f4ab019


Now that the RAID is assembled, we can run fsck on the RAID device.

fsck -f /dev/md0

After only a few minutes it finished with no errors. Now it is time to reboot and verify the system is operating normally.


After rebooting the box, we can see that it is back in service and fully operational:

cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sdd1[3] sdc1[2] sdb2[1] sda2[0]
      2917676544 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sdd2[1] sdc2[0]
      8385856 blocks [2/2] [UU]

md3 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      2917700352 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

md0 : active raid1 sdb1[1] sda1[0]
      8385792 blocks [2/2] [UU]

unused devices: <none>