MythTV Community Forum - MythTV talk.com

  #1  
03-04-2010, 08:35 AM
MythTV Friend in Training
 
Join Date: Mar 2010
Posts: 5
xfs raid: trial barrier write failed - is it xfs problem or raid problem?

Hi everyone - I've been using Myth for about 7 years now, and I just set up a new "serious" backend with 4x1.5TB Samsung HDDs in my Silverstone LC17 box. They are to be set up as raid6, with expansion to 6 HDDs planned.

I normally solve all my problems with Google, so this is my first post here... but I'm a long-time reader.

My system: running Ubuntu 9.10 (Karmic) on an i5-750 quad core. This is not a fresh install, it is an Ubuntu 9.04 system, upgraded to 9.10 (Karmic), then copied from my old system to the new and running lilo (not grub2) to cope with a raid boot.

Over 4 drives, I wanted a 200MB ext3 raid1 /boot partition (/dev/md10), a 20GB ext3 raid6 partition (/dev/md6) for root, and a 2.9TB xfs raid6 partition (/dev/md8) for "storage", e.g. recordings, videos, music & photos. All partitions were initially installed as raid1 over 2 drives (as I only had 2 new drives to begin with). I then acquired a further 2 drives and did an mdadm --grow to raid6 over 4 drives, using the latest mdadm 3.1 as described in the article "Converting RAID5 to RAID6 and other shape changing in md/raid" - version 3.1 of mdadm allows growth of a raid1 to raid5/6 and growth of raid5/6 to more disks.

(In case you're wondering, why raid6? Raid5 doesn't offer enough protection in a system I never want to go down - I've lost one Myth backend before due to HDD failure and it took many weeks to get it back to how I had it! And raid10 is still not expandable - even under mdadm 3.1. Raid6 over 6 drives (my goal) is 66% space-efficient and can suffer two drive failures. I could even go to more drives in future.)
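
As a rough sketch of the arithmetic behind that efficiency figure (this is not something mdadm reports itself): raid6 spends two drives' worth of space on parity, so the usable share is (n - 2) out of n drives. The function name `raid6_efficiency` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: percentage of raw space usable in a raid6 of N equal drives.
# raid6 keeps two drives' worth of parity, so usable = (N - 2) / N.
raid6_efficiency() {
    n=$1
    if [ "$n" -lt 4 ]; then
        echo "raid6 needs at least 4 drives" >&2
        return 1
    fi
    # Integer arithmetic truncates, so 6 drives prints 66 rather than 66.7.
    echo $(( 100 * (n - 2) / n ))
}

raid6_efficiency 4   # the current 4-drive array
raid6_efficiency 6   # the planned 6-drive array
```

With 4 drives only half the raw space is usable, which is the cost of starting small and growing later.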

Anyway, it all seemed to work perfectly - after reshaping (which took a couple of days), I was successfully running raid6 on 4 drives on both root and storage partitions until ... the system failed to shut down cleanly... I had to hard-reset... on reboot the xfs partition (/dev/md8) failed to mount, and when I looked at the syslog it was reporting (prior to the shutdown event, hundreds of times):

Code:
"Filesystem "md8": xfs_log_force: error 5 returned."
During the boot phase, syslog was reporting (dozens of times):

Code:
 Filesystem "md8": Disabling barriers, trial barrier write failed
XFS mounting filesystem md8
Ending clean XFS mount for filesystem: md8
Starting XFS recovery on filesystem: md8 (logdev: internal)
Failed to recover EFIs on filesystem: md8
I tried xfs_repair and it did successfully repair the filesystem, but only after running xfs_repair -L (zero the log - "some filesystem corruption may result"). No filesystem corruption was evident and I hoped this would be the end of my troubles, but several hours later the machine froze - I had to hard-reset again, and md8 again failed to mount with the same messages in the syslog.

This time xfs_repair can't even get through phase 1 - it now reports:

Code:
$ sudo xfs_repair -n  /dev/md8
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval 0

fatal error -- Invalid argument
Code:
$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 

md8 : inactive sdc8[4](S) sdd8[0](S) sde8[1](S)
      4356152767 blocks super 1.2
       
md6 : active raid6 sdc6[4] sdd6[0] sdf6[3] sde6[1]
      19534592 blocks super 1.2 level 6, 128k chunk, algorithm 18 [4/4] [UUUU]
      
md10 : active raid1 sdd1[0] sdc1[3] sde1[1] sdf1[2]
      192640 blocks [4/4] [UUUU]
In the above, md8 is the xfs partition, md6 is the ext3 root partition and md10 is a raid1 /boot partition.

I don't know what's going on with the mdstat output for md8 - I thought this was only an xfs issue, but it appears that only 3 of the 4 drives have been recognised by mdadm, they are all listed as spares, and the array is listed as inactive.
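
An inactive array is easy to overlook in a longer mdstat listing. A minimal sketch of a check, assuming the usual `name : state members...` line layout shown above (the function name `inactive_arrays` is invented for this example):

```shell
#!/bin/sh
# Print each inactive md array and how many member devices it lists.
# Expects /proc/mdstat-style text on stdin.
inactive_arrays() {
    # Fields: $1 = array name, $2 = ":", $3 = state, $4.. = member devices.
    awk '$2 == ":" && $3 == "inactive" { print $1, NF - 3 }'
}

# Demo on two lines from the listing above; on a live system use:
#   inactive_arrays < /proc/mdstat
inactive_arrays <<'EOF'
md8 : inactive sdc8[4](S) sdd8[0](S) sde8[1](S)
md6 : active raid6 sdc6[4] sdd6[0] sdf6[3] sde6[1]
EOF
```

For the listing above this reports md8 with 3 members, one fewer than the 4 the array was built from.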

Now, it's fair to report that the computer was rebooted a few times during the raid1-raid6 reshaping and during this time about 900GB was copied from my old backup drive to the new array - maybe I overstressed what is a beta version of mdadm 3.1. I used exactly the same process to create md6 (root) and it hasn't missed a beat - but it also wasn't stressed with reboots and huge file activity during reshaping.

So I don't know if this is an mdadm 3.1 reshaping problem (the array seemed to be clean and working perfectly once reshaping had finished) or an xfs problem. I suspected an xfs problem but the mdstat reporting only 3 of the 4 drives has me suspicious now. I'm tempted to try re-creating the array as raid6 over 4 drives using only ext3 (forget xfs) and hopefully all problems will go away - but would love any clues as to the source of my problems - is it purely an xfs issue or is it an mdadm/raid issue?

Cheers, max
  #2  
03-04-2010, 09:59 AM
MythTV Friend in Training
 
Join Date: Mar 2010
Posts: 5

Further attempts to re-create the array:

Code:
$ sudo mdadm --examine /dev/md8
mdadm: No md superblock detected on /dev/md8.

$ sudo mdadm --detail /dev/md8
mdadm: md device /dev/md8 does not appear to be active.

$ sudo mdadm --assemble /dev/md8 /dev/sdd8 /dev/sde8 /dev/sdf8 /dev/sdc8
mdadm: cannot open device /dev/sdd8: Device or resource busy
mdadm: /dev/sdd8 has no superblock - assembly aborted

$ sudo mdadm --create /dev/md8 --level=6 --raid-devices=4 --chunk=8192 /dev/sdd8 /dev/sde8 /dev/sdf8 /dev/sdc8
mdadm: device /dev/sdd8 not suitable for any style of array

$ sudo mdadm --examine --scan /dev/sdd8
ARRAY /dev/md/8 metadata=1.2 UUID=8aab3137:53541c75:f77e666c:9731b8eb name=onigiri:8

$ sudo mdadm --examine --scan /dev/sdc8
ARRAY /dev/md/8 metadata=1.2 UUID=8aab3137:53541c75:f77e666c:9731b8eb name=onigiri:8

$ sudo mdadm --examine --scan /dev/sde8
ARRAY /dev/md/8 metadata=1.2 UUID=8aab3137:53541c75:f77e666c:9731b8eb name=onigiri:8

$ sudo mdadm --examine --scan /dev/sdf8

$ sudo mdadm /dev/md8 -f /dev/sdf8
mdadm: cannot get array info for /dev/md8
So it appears the raid6 array is corrupt and /dev/sdf8 is the culprit (no superblock? it won't return a UUID) - but I can't even fail that drive and have it rebuild!?

I still can't help feeling that xfs is involved here but it also seems to be a raid problem.
  #3  
03-10-2010, 06:58 AM
MythTV Helper
 
Join Date: Nov 2009
Location: Louisville, KY
Posts: 120

I've always used whole drives for my raid and never split them up like that. I'm sure that had something to do with the error. I'm surprised you can't fail out /dev/sdf8. Since it's not mounted, have you tried running fdisk /dev/sdf, removing the sdf8 partition and its superblock, re-adding the partition and then formatting it? I would think that after a reboot it would allow you to remove it and then re-add it to the array.
  #4  
03-10-2010, 12:44 PM
MythTV Friend in Training
 
Join Date: Mar 2010
Posts: 5

blackoper, thanks very much for the response. I have finally managed to re-create the array after executing an mdadm --stop command (which I hadn't done before the above post) - but I still don't understand why I couldn't fail sdf8.
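
For anyone hitting the same "Device or resource busy" error: the inactive md8 was still holding its member partitions, so nothing else could open them until it was stopped. A hedged sketch of the order of operations, using the device names from this thread. It tries a degraded assemble rather than the --create I actually used, and it only prints the commands unless RUN=1 is set. The `run` and `reassemble_md8` helpers are made up for this example.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1 and root privileges, execute) the
# stop-then-assemble sequence. Device names are the ones from this thread.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

reassemble_md8() {
    # Release the member partitions held by the inactive array.
    run mdadm --stop /dev/md8
    # Assemble from the three members that still report a superblock;
    # --run starts the array even though one member is missing.
    run mdadm --assemble --run /dev/md8 /dev/sdd8 /dev/sde8 /dev/sdc8
}

reassemble_md8
```

Printing by default is deliberate: it lets the sequence be read and checked before anything touches the disks.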

I got the array re-created and waited many hours for it to re-build, and was ever hopeful that problems were over. Unfortunately, after waiting for the re-build of the array the xfs filesystem was not recoverable. xfs_repair could do nothing with it (failed at phase 1). So I have re-formatted the array now as ext3 (going to try again without xfs) and restored data from my backup (only a couple of weeks of tv-recordings lost, no big deal).

The new array is running fine as ext3 under raid6. Therefore I have removed xfs from the equation of what is upsetting my current myth backend.

I'm now still having issues with myth backend crashing the entire system after a few hours. This has been happening since the upgrade from Ubuntu 9.04 to 9.10 (with a concurrent upgrade from Myth 0.21 to 0.22) - it's these crashes that have probably been causing my filesystem/raid problems. Anyway, that's the subject of another thread which I will post soon.

mdadm 3.1 has also introduced a new quirk: my array now refuses to name itself md8 (even though I ran mdadm --examine --scan /dev/sdc8 >> /etc/mdadm/mdadm.conf, which put the new UUID of the array in mdadm.conf) - it insists on naming itself md127. After hours of trying to fix that I gave up and just changed my fstab to mount md127 instead of md8. Mounting by the UUID of the array in fstab doesn't work at all. There's a lot going on that I don't understand. That's the price of using the development version of mdadm just so I have the opportunity to expand the array, I guess.
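
One likely reason the UUID mount fails: the UUID that mdadm prints identifies the array, while fstab's UUID= wants the UUID of the filesystem on top of it, which is what blkid reports. A minimal sketch, with an invented helper `fstab_line`, and a placeholder UUID and mount point:

```shell
#!/bin/sh
# Build an fstab entry from a *filesystem* UUID (as printed by blkid),
# not from the array UUID that mdadm --examine prints.
fstab_line() {
    # $1 = filesystem UUID, $2 = mount point, $3 = filesystem type
    printf 'UUID=%s\t%s\t%s\tdefaults\t0\t2\n' "$1" "$2" "$3"
}

# On a live system (as root) the UUID would come from blkid, e.g.:
#   fstab_line "$(blkid -s UUID -o value /dev/md127)" /storage ext3
# The UUID and the /storage mount point below are placeholders.
fstab_line 01234567-89ab-cdef-0123-456789abcdef /storage ext3

# Untested assumption for the md127 naming: on Ubuntu the initramfs keeps
# its own copy of mdadm.conf, so after editing the file it may need:
#   update-initramfs -u
```

Mounting by filesystem UUID also sidesteps the md8 versus md127 question, since the entry no longer depends on the device name.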

You said you use whole drives for your array - does that mean you use LVM on top of raid to create pseudo-partitions, or you don't create partitions at all? LVM wouldn't help me because I'm using raid6 and I need to boot from the array. You can't boot from raid6 so I needed either a non-raid /boot partition or a raid1 /boot partition. (I chose raid1 - it works with lilo and is supposed to work with grub2 (so I've read) but I couldn't make it work with grub2).

I guess I could install an additional disk and use that for /boot (or a usb stick - but how to boot from usb to a whole-disk raid6 system is beyond my knowledge at this time). If I did install an additional disk for /boot then I could make the remaining drives whole-drive raid. It's important to note that I want my system's boot ability to be resilient to hard-disk (or usb) failure - I'm trying to build this system to be an "appliance" - it will continue to work after hard disk failure. My wife needs to be able to reboot it after a disk failure when I'm on the other side of the world (primary objective!)

I've heard it said that raid is designed for disk failure, not power failure. Because my myth backend has been constantly crashing the system since my upgrade, that is possibly akin to power failure - hence the raid problems. I have recently learned to reboot the system after a total freeze via the ALT-PRTSCRN-REISUB sysrq keyboard commands rather than pressing the reset button (something new to me - and I'm training my wife to do the same!) but that doesn't always work - when myth crashes my system it really crashes it - probably a hardware issue related to the new kernel talking to my old analog LeadTek WinFast 2000XP capture card, but again, that is the subject of another thread - which I will post when I have enough time to provide enough information about what's happening.
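
For reference, here is what each key in the REISUB sequence asks the kernel to do, written as a small lookup (the function name `sysrq_meaning` is made up). The S and U steps are what make it gentler on the filesystems than the reset button.

```shell
#!/bin/sh
# Describe the magic-SysRq action behind each letter of REISUB.
sysrq_meaning() {
    case "$1" in
        R|r) echo "switch the keyboard from raw mode to XLATE mode" ;;
        E|e) echo "send SIGTERM to all processes except init" ;;
        I|i) echo "send SIGKILL to all processes except init" ;;
        S|s) echo "sync all mounted filesystems" ;;
        U|u) echo "remount all filesystems read-only" ;;
        B|b) echo "reboot immediately, without syncing or unmounting" ;;
        *)   echo "not part of REISUB" >&2; return 1 ;;
    esac
}

for key in R E I S U B; do
    printf '%s: %s\n' "$key" "$(sysrq_meaning "$key")"
done
```

Leaving a second or two between keys gives each step time to finish before the next one is sent.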

Thanks again for your response.

Cheers, max

Last edited by maxw; 03-10-2010 at 12:57 PM.