
Upgraded mobo/cpu and half of array is unmountable


gshipley


Hello all:

I built my system six years ago, and some of my drives have been in service since day one. I recently upgraded the mobo etc. to run Docker containers better, but when I power on the system, half of the array is unmountable. unRAID sees the drives and they are green.

I have two 8-port SATA add-on cards, so my guess is that one of them got fried somehow. I went ahead and ordered a new one, but thought I would get a sanity check on the log to see if you folks think this is indeed the case.

There are a lot of read errors on the unmountable drives. I assume all 7 drives didn't go bad at once, so it must be the add-on card.

Syslog attached.

hog-syslog-20150730-2342.zip
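To pull the affected devices out of a long syslog like the attached one, a small filter can help. This is a minimal sketch, not an unRAID tool: the function name `reiserfs_warned_devices` is hypothetical, and it assumes the kernel warning format `REISERFS warning (device mdN): ...` quoted in this thread.

```shell
# Hypothetical helper: read syslog text on stdin and print each md device that
# logged a ReiserFS warning, once, in order of first appearance.
# Assumes kernel lines of the form "REISERFS warning (device md6): ...".
reiserfs_warned_devices() {
  grep -oE 'REISERFS warning \(device md[0-9]+\)' |
    grep -oE 'md[0-9]+' |
    awk '!seen[$0]++'
}
```

For example, `reiserfs_warned_devices < /var/log/syslog` on the live system (the usual unRAID syslog location), or against the text file extracted from the attached zip.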


Here are some relevant messages from the log:

 

Jul 30 23:37:00 Hog emhttp: shcmd (170): mkdir -p /mnt/disk6
Jul 30 23:37:00 Hog emhttp: shcmd (171): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md6 /mnt/disk6 |& logger
Jul 30 23:37:00 Hog kernel: REISERFS (device md6): found reiserfs format "3.6" with standard journal
Jul 30 23:37:00 Hog kernel: REISERFS (device md6): using ordered data mode
Jul 30 23:37:00 Hog kernel: reiserfs: using flush barriers
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md6): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md6): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 30 23:37:00 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md6,
Jul 30 23:37:00 Hog logger:        missing codepage or helper program, or other error
Jul 30 23:37:00 Hog logger:        In some cases useful info is found in syslog - try
Jul 30 23:37:00 Hog logger:        dmesg | tail  or so
Jul 30 23:37:00 Hog logger:
Jul 30 23:37:00 Hog emhttp: shcmd: shcmd (171): exit status: 32
Jul 30 23:37:00 Hog emhttp: mount error: No file system (32)
Jul 30 23:37:00 Hog emhttp: shcmd (172): rmdir /mnt/disk6
Jul 30 23:37:00 Hog emhttp: shcmd (173): mkdir -p /mnt/disk7
Jul 30 23:37:00 Hog emhttp: shcmd (174): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md7 /mnt/disk7 |& logger
Jul 30 23:37:00 Hog kernel: REISERFS (device md7): found reiserfs format "3.6" with standard journal
Jul 30 23:37:00 Hog kernel: REISERFS (device md7): using ordered data mode
Jul 30 23:37:00 Hog kernel: reiserfs: using flush barriers
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md7): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md7): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 30 23:37:00 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md7,
Jul 30 23:37:00 Hog logger:        missing codepage or helper program, or other error
Jul 30 23:37:00 Hog logger:        In some cases useful info is found in syslog - try
Jul 30 23:37:00 Hog logger:        dmesg | tail  or so
Jul 30 23:37:00 Hog logger:
Jul 30 23:37:00 Hog emhttp: shcmd: shcmd (174): exit status: 32
Jul 30 23:37:00 Hog emhttp: mount error: No file system (32)
Jul 30 23:37:00 Hog emhttp: shcmd (175): rmdir /mnt/disk7
Jul 30 23:37:00 Hog emhttp: shcmd (176): mkdir -p /mnt/disk8
Jul 30 23:37:00 Hog emhttp: shcmd (177): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md8 /mnt/disk8 |& logger
Jul 30 23:37:00 Hog kernel: REISERFS (device md8): found reiserfs format "3.6" with standard journal
Jul 30 23:37:00 Hog kernel: REISERFS (device md8): using ordered data mode
Jul 30 23:37:00 Hog kernel: reiserfs: using flush barriers
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md8): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 30 23:37:00 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md8,
Jul 30 23:37:00 Hog logger:        missing codepage or helper program, or other error
Jul 30 23:37:00 Hog logger:        In some cases useful info is found in syslog - try
Jul 30 23:37:00 Hog logger:        dmesg | tail  or so
Jul 30 23:37:00 Hog logger:
Jul 30 23:37:00 Hog emhttp: shcmd: shcmd (177): exit status: 32
Jul 30 23:37:00 Hog emhttp: mount error: No file system (32)
Jul 30 23:37:00 Hog emhttp: shcmd (178): rmdir /mnt/disk8
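The `set -o pipefail ; mount ... |& logger` wrapper in these lines is why emhttp sees `exit status: 32` even though `logger` itself succeeds: with pipefail set, a pipeline returns the last non-zero status among its commands, and 32 is the code mount(8) documents for a mount failure. A minimal bash sketch, using a subshell that exits 32 as a hypothetical stand-in for the failing mount and `cat` as a stand-in for `logger`:

```shell
# Hypothetical demo of emhttp's "set -o pipefail ; mount ... |& logger" wrapper.
# A subshell that exits 32 stands in for the failing mount; cat stands in for
# logger, which itself succeeds.
# Usage: pipeline_status on|off   (prints the pipeline's exit status)
pipeline_status() {
  bash -c '
    if [ "$1" = on ]; then set -o pipefail; fi
    ( echo "mount: wrong fs type, bad option, bad superblock" >&2; exit 32 ) |& cat >/dev/null
    echo "$?"
  ' _ "$1"
}
```

`pipeline_status off` prints 0 and `pipeline_status on` prints 32: without pipefail the wrapper would report success and the mount failure would go unnoticed.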


Unfortunately, it's not as simple as one bad card: drives on BOTH cards are having the same trouble and are unusable at present. There is a series of roughly 30-second hangs, each followed by errors reported by numerous drives, all of them attached to the two cards. There are also sections of DMAR errors, reminiscent of the known issues with Marvell controllers and virtualization. Since both cards are Marvell-based, try turning off IOMMU (VT-d on Intel boards), or possibly all virtualization, in the BIOS settings, and start again. By the way, under these conditions I certainly would not assign Disk 13, or do much of anything else with the array, until all drives are operational again.
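The DMAR messages come from the kernel's Intel IOMMU (VT-d) driver, and its fault lines normally name the PCI address of the device whose DMA request was blocked. A minimal sketch for listing those addresses so they can be matched to a controller with `lspci -s <address>`; the helper name is hypothetical, and the assumed line shape `... Request device [03:00.0] fault addr ...` varies between kernel versions, so adjust the pattern if yours differ.

```shell
# Hypothetical helper: read syslog text on stdin and print each PCI address
# (bus:device.function) named in a DMAR fault line, once, in order of appearance.
# Assumes fault lines contain "Request device [bb:dd.f]"; adjust if yours differ.
dmar_fault_devices() {
  grep -i 'DMAR' |
    grep -oiE 'Request device \[[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\]' |
    grep -oiE '[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' |
    awk '!seen[$0]++'
}
```

Run it as `dmar_fault_devices < /var/log/syslog`, then pass each printed address to `lspci -s` to see which card it is.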

 

I have to say this is the first time I've seen a problem quite like yours, with 30-second hangs that start even before initialization is complete. Something is very wrong, but I don't know what. Drives on both cards identified themselves correctly and were fully set up without issue, then after each hang they returned bad values and became unusable. Perhaps disabling virtualization will help; if it does, check for firmware updates as well. You've obviously got a large investment in your system, and it would be a shame to lose virtualization capabilities if that's the problem.
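Hangs like these show up in a syslog as gaps between consecutive timestamps. A minimal sketch for spotting them, assuming the classic `Mon DD HH:MM:SS host ...` timestamps used in this thread and a log that does not cross midnight; `find_log_gaps` and its 20-second default threshold are hypothetical choices, not something unRAID provides.

```shell
# Hypothetical helper: read syslog text on stdin and print every line that comes
# at least MIN_GAP seconds (default 20) after the previous line.
# Assumes "Mon DD HH:MM:SS ..." timestamps; the date and midnight rollover are ignored.
find_log_gaps() {
  awk -v min_gap="${1:-20}" '
    {
      split($3, t, ":")                       # $3 is HH:MM:SS
      now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
      if (NR > 1 && now - prev >= min_gap)
        printf "%d s gap before: %s\n", now - prev, $0
      prev = now
    }
  '
}
```

For example, `find_log_gaps 25 < /var/log/syslog` lists each line that follows a pause of 25 seconds or more, which makes the hang-then-errors pattern easy to see.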


Thanks for the help. I did disable IOMMU, and the boot was much faster (no 30-second delays), but the drives are still showing as unmountable.

There are some BIOS updates, so I may try those as well. I have attached a new syslog with IOMMU turned off.

Excerpt showing no hang:

----------

Jul 31 18:16:17 Hog emhttp: shcmd (51): mkdir -p /mnt/disk10
Jul 31 18:16:17 Hog kernel: REISERFS warning (device md9): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 31 18:16:17 Hog emhttp: shcmd (52): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md10 /mnt/disk10 |& logger
Jul 31 18:16:17 Hog kernel: REISERFS (device md10): found reiserfs format "3.6" with standard journal
Jul 31 18:16:17 Hog kernel: REISERFS (device md10): using ordered data mode
Jul 31 18:16:17 Hog kernel: reiserfs: using flush barriers
Jul 31 18:16:19 Hog kernel: REISERFS warning (device md10): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 31 18:16:19 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md10,
Jul 31 18:16:19 Hog logger:        missing codepage or helper program, or other error
Jul 31 18:16:19 Hog logger:        In some cases useful info is found in syslog - try
Jul 31 18:16:19 Hog logger:        dmesg | tail  or so
Jul 31 18:16:19 Hog logger:
Jul 31 18:16:19 Hog emhttp: shcmd: shcmd (52): exit status: 32
Jul 31 18:16:19 Hog emhttp: mount error: No file system (32)
Jul 31 18:16:19 Hog emhttp: shcmd (53): rmdir /mnt/disk10
Jul 31 18:16:19 Hog kernel: REISERFS warning (device md10): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 31 18:16:19 Hog emhttp: shcmd (54): mkdir -p /mnt/disk11
Jul 31 18:16:19 Hog emhttp: shcmd (55): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md11 /mnt/disk11 |& logger
Jul 31 18:16:19 Hog kernel: XFS (md11): Mounting V5 Filesystem
Jul 31 18:16:19 Hog kernel: XFS (md11): Ending clean mount
Jul 31 18:16:19 Hog emhttp: shcmd (56): xfs_growfs /mnt/disk11 |& logger
Jul 31 18:16:19 Hog logger: meta-data=/dev/md11              isize=512    agcount=4, agsize=244188659 blks
Jul 31 18:16:19 Hog logger:          =                      sectsz=512  attr=2, projid32bit=1
Jul 31 18:16:19 Hog logger:          =                      crc=1        finobt=1
Jul 31 18:16:19 Hog logger: data    =                      bsize=4096  blocks=976754633, imaxpct=5
Jul 31 18:16:19 Hog logger:          =                      sunit=0      swidth=0 blks
Jul 31 18:16:19 Hog logger: naming  =version 2              bsize=4096  ascii-ci=0 ftype=1
Jul 31 18:16:19 Hog logger: log      =internal              bsize=4096  blocks=476930, version=2
Jul 31 18:16:19 Hog logger:          =                      sectsz=512  sunit=0 blks, lazy-count=1
Jul 31 18:16:19 Hog logger: realtime =none                  extsz=4096  blocks=0, rtextents=0
Jul 31 18:16:19 Hog emhttp: shcmd (57): mkdir -p /mnt/disk12
Jul 31 18:16:19 Hog emhttp: shcmd (58): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md12 /mnt/disk12 |& logger
Jul 31 18:16:19 Hog kernel: REISERFS (device md12): found reiserfs format "3.6" with standard journal
Jul 31 18:16:19 Hog kernel: REISERFS (device md12): using ordered data mode
Jul 31 18:16:19 Hog kernel: reiserfs: using flush barriers
Jul 31 18:16:19 Hog kernel: REISERFS warning (device md12): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 31 18:16:19 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md12,
Jul 31 18:16:19 Hog logger:        missing codepage or helper program, or other error
Jul 31 18:16:19 Hog logger:        In some cases useful info is found in syslog - try
Jul 31 18:16:19 Hog logger:        dmesg | tail  or so
Jul 31 18:16:19 Hog logger:
Jul 31 18:16:19 Hog emhttp: shcmd: shcmd (58): exit status: 32
Jul 31 18:16:19 Hog emhttp: mount error: No file system (32)
Jul 31 18:16:19 Hog emhttp: shcmd (59): rmdir /mnt/disk12

-----------------------

hog-syslog-20150731-1820.zip
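In this excerpt the XFS disk (disk11) mounts cleanly while the ReiserFS disks around it fail. To get that picture for the whole array from a full syslog, each emhttp `mount` command can be paired with its `exit status` line by command number. A minimal awk sketch, assuming the emhttp line formats shown above; `failed_mounts` is a hypothetical name.

```shell
# Hypothetical helper: read syslog text on stdin and print "mountpoint fstype"
# for every emhttp mount command whose matching shcmd reported a non-zero exit.
# Assumes emhttp lines like the ones in this thread:
#   emhttp: shcmd (171): set -o pipefail ; mount -t reiserfs ... /dev/md6 /mnt/disk6 |& logger
#   emhttp: shcmd: shcmd (171): exit status: 32
failed_mounts() {
  awk '
    /emhttp: shcmd \(/ && / mount -t / {
      id = $0; sub(/.*shcmd \(/, "", id); sub(/\).*/, "", id)       # command number
      fs = $0; sub(/.* mount -t /, "", fs); sub(/ .*/, "", fs)      # filesystem type
      mp = $0; sub(/ [|]& logger.*/, "", mp); sub(/.* /, "", mp)    # mount point
      fstype[id] = fs; point[id] = mp
    }
    /emhttp: shcmd: shcmd \(/ && /exit status: [1-9]/ {
      id = $0; sub(/.*shcmd: shcmd \(/, "", id); sub(/\).*/, "", id)
      if (id in point) print point[id], fstype[id]
    }
  '
}
```

Fed the excerpt above, it would list /mnt/disk10 and /mnt/disk12 (both reiserfs) and leave out disk11, whose XFS mount produced no exit-status error.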


Oh man... I really am in upgrade hell.

 

I had a good quad-core AMD board lying around that I decided to try out. On the first couple of boots two disks were missing, so I rechecked all the cables and rebooted. This time the boot went fine: the BIOS on the cards saw all the drives and unRAID started.

However, even with a different board/CPU, I am now seeing exactly the same unmountable drives as when I was using the new Z97 motherboard.

 

I have no clue.


I think I figured it out...

For some reason, the journal parameters on the unmountable ReiserFS drives got corrupted. To fix this, I ran the following:

 

# reiserfsck --check /dev/sdh1

(sdh1 is just an example. Use the sdX device for your drive, as shown in the unRAID web GUI.)

reiserfsck then reported:

reiserfs_open_journal: journal parameters from the superblock does not match
to the journal headers ones. It looks like that you created your fs with old
reiserfsprogs. Journal header is fixed.

I hope my two days of frustration help people in the future. I'm not sure how this happened to so many drives.


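The proper way to run reiserfsck is against the md devices (e.g. /dev/md6 for disk6), because writes through /dev/mdX keep parity updated, while writes straight to /dev/sdX bypass it. Since you ran it against sdh1, parity is no longer 100% in sync with the changes made to that drive.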
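Since unRAID disk N is exposed as /dev/mdN (disk6 is mounted from /dev/md6 in the logs above), the check commands can be generated from disk numbers. A minimal sketch that only prints the commands for review, assuming the array has been started in Maintenance mode so the md devices exist but are not mounted; `print_fsck_commands` is a hypothetical helper.

```shell
# Hypothetical helper: print (not run) a reiserfsck check command for each unRAID
# disk number given, targeting /dev/mdN so that anything reiserfsck writes goes
# through the parity-protected md device instead of the raw /dev/sdX partition.
print_fsck_commands() {
  local n
  for n in "$@"; do
    case $n in
      ''|*[!0-9]*) echo "skipping non-numeric disk number: $n" >&2 ;;
      *) printf 'reiserfsck --check /dev/md%s\n' "$n" ;;
    esac
  done
}
```

`print_fsck_commands 6 7 8` prints one command per disk. Run each by hand and read reiserfsck's prompt and output before going further, for example before using any `--fix-fixable` or `--rebuild-tree` pass it recommends.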

 

If you run a non-correcting parity check now, you'll notice a number of parity errors. Once you are satisfied that the drive is indeed mounting and being accessed correctly, you should run a correcting parity check to bring everything back in sync.
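Why a non-correcting check finds errors: unRAID's single parity disk holds the bitwise XOR of the data disks at each position, and a write straight to /dev/sdX changes the data without updating that XOR. A toy sketch with one byte per "disk" (hypothetical values and a hypothetical `xor_parity` helper) shows the mismatch a parity check would flag:

```shell
# Toy model: XOR together the byte values given as arguments (one per data disk).
xor_parity() {
  local p=0 b
  for b in "$@"; do
    p=$(( p ^ b ))
  done
  echo "$p"
}

stored=$(xor_parity 90 60 240)   # parity written while disk 2 held 60
current=$(xor_parity 90 61 240)  # disk 2 changed to 61 behind parity's back
if [ "$stored" -ne "$current" ]; then
  echo "parity mismatch: stored $stored, recomputed $current"
fi
# prints: parity mismatch: stored 150, recomputed 151
```

A correcting check does the second half of this: it recomputes the XOR from the data disks and writes it back as the new parity.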

 

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems


Just checked the new syslog and compared it with the previous one, and wow, the improvement is night and day! Turning off IOMMU has completely fixed the drive errors. All drives are now working fine.

The mount errors you have been including above are certainly a problem, but they're rather minor compared to all the exceptions that were occurring on so many drives. They're just damage from the previous crashes and, as you have found, are fixable. With IOMMU turned off, I think you'll be fine once each drive with a damaged file system has been repaired. I'm sorry I couldn't get back to you sooner and save you some anguish and work.

If you could run the lspci command from a command prompt and post the output, I would appreciate it. I will probably want to check, and possibly add, the card model numbers to the Marvell chipsets & virtualization 'black list'. Your situation was similar in some ways but different in others. As in the other cases, the problem did not occur before the upgrade to the 64-bit kernels with virtualization enabled, which is characteristic of this issue. What was different is that your drives all initialized without issue, and then most, but not all, of them failed later. Plus there are those 30-second hangs.
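Plain `lspci` lists every PCI device, so when looking at the output yourself it can help to narrow it to the disk controllers. A minimal sketch; the `storage_controllers` name and its keyword list are assumptions and may need extending for other hardware. The full, unfiltered `lspci` output is still what was asked for above.

```shell
# Hypothetical filter: read lspci output on stdin and keep only lines that look
# like storage controllers. The keyword list is an assumption; extend as needed.
storage_controllers() {
  grep -iE 'sata|sas|scsi|raid|ide interface|storage|marvell'
}
```

For example, `lspci | storage_controllers`. `lspci -k` additionally shows the kernel driver bound to each device, though its driver lines are indented separately and would be dropped by this filter.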


Rob, thanks for all of the help.

 

So, I figured a few things out that may be helpful to others.

With the new cards installed, I have since turned IOMMU and all the virtualization settings back on without issues. The problem did end up being both SATA add-on cards in combination with virtualization being on. On the new mobo (GA-Z97X-UD5H-BK) with the i7-4790K Devil's Canyon CPU, my old PCI SATA cards would cause these hangs every single time, and sometimes system crashes. I tried each card on its own with only one drive hooked up and still got the issues. Since these cards are so old and only support 3.0 Gb/s, it was probably time for an upgrade anyway.

Old sata add-on cards causing the issue: SUPERMICRO AOC-SAT2-MV8 64-bit PCI-X133MHz SATA II (3.0Gb/s) Controller Card

New cards that work great out of the box: SUPERMICRO AOC-SAS2LP-MV8 PCI-Express 2.0 x8 SATA / SAS 8-Port Controller Card

 

I think all my frustration came from the following scenario:

With virtualization disabled, the old cards worked, but I still saw those ReiserFS mount errors, which made me think the cards were still bad. I should have paid closer attention to the logs and put two and two together. The old cards do work fine if you disable virtualization (as per your post in the defect/bug forum).

Links to parts in question:

Motherboard: http://www.newegg.com/Product/Product.aspx?Item=N82E16813128722&cm_re=ga-z97x-ud5h-bk-_-13-128-722-_-Product

CPU: http://www.newegg.com/Product/Product.aspx?Item=N82E16819117369&cm_re=BX80646I74790K_i7-4790K-_-19-117-369-_-Product

Old SATA add-on cards not working with virt: http://www.newegg.com/Product/Product.aspx?Item=N82E16815121009

New SATA add-on cards working with virt: http://www.newegg.com/Product/Product.aspx?Item=N82E16816101792


I am currently rebuilding parity and only have 1.5 hours left.  In the meantime, I thought I would brag about my new system. :)

 

Before, I was running anywhere from 70 to 100% CPU load, depending on whether the server was sitting idle.
Memory (2 GB) was around 60% when idle and 100% when doing anything of note.

Check out this screenshot of the new system. :)

[Screenshot: unRaid.png]


Archived

This topic is now archived and is closed to further replies.
