gshipley Posted July 30, 2015

Hello all: I built my system 6 years ago and some of my drives have been in service since day one. I recently upgraded the mobo etc. in order to run docker containers better, but when powering on the system half of the array is unmountable. It sees the drives and they are green. I have two 8-port SATA add-on cards, so my guess is one of them got fried somehow. I went ahead and ordered a new one but thought I would get a sanity check from the log to see if you folks think this is indeed the case. There are a lot of read errors on the unmountable drives, and I doubt all 7 drives went bad at once, so I assume it must be the add-on card.

Syslog attached: hog-syslog-20150730-2342.zip

gshipley Posted July 30, 2015 Author

Here are some relevant messages from the log:

Jul 30 23:37:00 Hog emhttp: shcmd (170): mkdir -p /mnt/disk6
Jul 30 23:37:00 Hog emhttp: shcmd (171): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md6 /mnt/disk6 |& logger
Jul 30 23:37:00 Hog kernel: REISERFS (device md6): found reiserfs format "3.6" with standard journal
Jul 30 23:37:00 Hog kernel: REISERFS (device md6): using ordered data mode
Jul 30 23:37:00 Hog kernel: reiserfs: using flush barriers
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md6): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md6): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 30 23:37:00 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md6,
Jul 30 23:37:00 Hog logger: missing codepage or helper program, or other error
Jul 30 23:37:00 Hog logger: In some cases useful info is found in syslog - try
Jul 30 23:37:00 Hog logger: dmesg | tail or so
Jul 30 23:37:00 Hog logger:
Jul 30 23:37:00 Hog emhttp: shcmd: shcmd (171): exit status: 32
Jul 30 23:37:00 Hog emhttp: mount error: No file system (32)
Jul 30 23:37:00 Hog emhttp: shcmd (172): rmdir /mnt/disk6
Jul 30 23:37:00 Hog emhttp: shcmd (173): mkdir -p /mnt/disk7
Jul 30 23:37:00 Hog emhttp: shcmd (174): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md7 /mnt/disk7 |& logger
Jul 30 23:37:00 Hog kernel: REISERFS (device md7): found reiserfs format "3.6" with standard journal
Jul 30 23:37:00 Hog kernel: REISERFS (device md7): using ordered data mode
Jul 30 23:37:00 Hog kernel: reiserfs: using flush barriers
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md7): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md7): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 30 23:37:00 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md7,
Jul 30 23:37:00 Hog logger: missing codepage or helper program, or other error
Jul 30 23:37:00 Hog logger: In some cases useful info is found in syslog - try
Jul 30 23:37:00 Hog logger: dmesg | tail or so
Jul 30 23:37:00 Hog logger:
Jul 30 23:37:00 Hog emhttp: shcmd: shcmd (174): exit status: 32
Jul 30 23:37:00 Hog emhttp: mount error: No file system (32)
Jul 30 23:37:00 Hog emhttp: shcmd (175): rmdir /mnt/disk7
Jul 30 23:37:00 Hog emhttp: shcmd (176): mkdir -p /mnt/disk8
Jul 30 23:37:00 Hog emhttp: shcmd (177): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md8 /mnt/disk8 |& logger
Jul 30 23:37:00 Hog kernel: REISERFS (device md8): found reiserfs format "3.6" with standard journal
Jul 30 23:37:00 Hog kernel: REISERFS (device md8): using ordered data mode
Jul 30 23:37:00 Hog kernel: reiserfs: using flush barriers
Jul 30 23:37:00 Hog kernel: REISERFS warning (device md8): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 30 23:37:00 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md8,
Jul 30 23:37:00 Hog logger: missing codepage or helper program, or other error
Jul 30 23:37:00 Hog logger: In some cases useful info is found in syslog - try
Jul 30 23:37:00 Hog logger: dmesg | tail or so
Jul 30 23:37:00 Hog logger:
Jul 30 23:37:00 Hog emhttp: shcmd: shcmd (177): exit status: 32
Jul 30 23:37:00 Hog emhttp: mount error: No file system (32)
Jul 30 23:37:00 Hog emhttp: shcmd (178): rmdir /mnt/disk8

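For anyone reading a similar syslog, the failed mounts in an excerpt like the one above can be pulled out in one pass. This is only a rough sketch: the function name failed_mounts and the file name syslog.txt are made up for illustration, and it assumes the log contains the same "bad superblock on /dev/mdN," lines that mount printed here.

```shell
#!/bin/sh
# List the array devices whose mount attempt failed, according to a saved syslog.
# It keys on the "bad superblock on /dev/mdN," message shown in the excerpt above.
failed_mounts() {
    # $1: path to a saved syslog file (hypothetical name, e.g. syslog.txt)
    grep -o 'bad superblock on /dev/md[0-9]*' "$1" \
        | sed 's|.*/dev/md||' \
        | sort -n -u \
        | sed 's|^|/dev/md|'
}

# Example call (not run here): failed_mounts syslog.txt
```

Sorting on the bare disk number keeps md10 after md6 instead of before it, and the -u flag collapses repeated mount attempts on the same device.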
RobJ Posted July 31, 2015

Quote (gshipley):
I built my system 6 years ago and some of my drives have been in service since day one. I recently upgraded the mobo etc. in order to run docker containers better, but when powering on the system half of the array is unmountable. It sees the drives and they are green. I have two 8-port SATA add-on cards, so my guess is one of them got fried somehow. I went ahead and ordered a new one but thought I would get a sanity check from the log to see if you folks think this is indeed the case. There are a lot of read errors on the unmountable drives, and I doubt all 7 drives went bad at once, so I assume it must be the add-on card.

Unfortunately, it's not quite that simple, as various drives from BOTH cards are having the same trouble and are unusable at present. There is a series of roughly 30-second hangs, each followed by trouble reported by numerous drives, all attached to the 2 cards only. There are also sections of DMAR errors, reminiscent of the Marvell controller issues with virtualization. Since they are both Marvell chipset based cards, try turning off IOMMU (or possibly all virtualization) in the BIOS settings, and start again.

By the way, under these conditions, I certainly would not assign Disk 13, or do much of anything with the array, until all drives are operational again.

I have to say that this is the first time I've seen a problem quite like yours, with the 30-second hangs that start even before the initialization is complete. Something's very wrong, but I don't know what. Drives on both cards were able to identify themselves correctly and were fully set up without issue, then after each hang they return bad values and become unusable. Perhaps disabling the virtualization will help, but if so, check for firmware updates. You've obviously got a large investment in your system; it would be a shame to lose virtualization capabilities, if that's the problem.

gshipley Posted July 31, 2015 Author

Thanks for the help. I did disable IOMMU and the boot was much faster (no 30-second delay), but the drives are still showing as unmountable. There are some BIOS updates, so I may try that as well. I have attached a new syslog with IOMMU turned off.

Excerpt showing no hang:
----------
Jul 31 18:16:17 Hog emhttp: shcmd (51): mkdir -p /mnt/disk10
Jul 31 18:16:17 Hog kernel: REISERFS warning (device md9): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 31 18:16:17 Hog emhttp: shcmd (52): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md10 /mnt/disk10 |& logger
Jul 31 18:16:17 Hog kernel: REISERFS (device md10): found reiserfs format "3.6" with standard journal
Jul 31 18:16:17 Hog kernel: REISERFS (device md10): using ordered data mode
Jul 31 18:16:17 Hog kernel: reiserfs: using flush barriers
Jul 31 18:16:19 Hog kernel: REISERFS warning (device md10): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 31 18:16:19 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md10,
Jul 31 18:16:19 Hog logger: missing codepage or helper program, or other error
Jul 31 18:16:19 Hog logger: In some cases useful info is found in syslog - try
Jul 31 18:16:19 Hog logger: dmesg | tail or so
Jul 31 18:16:19 Hog logger:
Jul 31 18:16:19 Hog emhttp: shcmd: shcmd (52): exit status: 32
Jul 31 18:16:19 Hog emhttp: mount error: No file system (32)
Jul 31 18:16:19 Hog emhttp: shcmd (53): rmdir /mnt/disk10
Jul 31 18:16:19 Hog kernel: REISERFS warning (device md10): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 31 18:16:19 Hog emhttp: shcmd (54): mkdir -p /mnt/disk11
Jul 31 18:16:19 Hog emhttp: shcmd (55): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md11 /mnt/disk11 |& logger
Jul 31 18:16:19 Hog kernel: XFS (md11): Mounting V5 Filesystem
Jul 31 18:16:19 Hog kernel: XFS (md11): Ending clean mount
Jul 31 18:16:19 Hog emhttp: shcmd (56): xfs_growfs /mnt/disk11 |& logger
Jul 31 18:16:19 Hog logger: meta-data=/dev/md11 isize=512 agcount=4, agsize=244188659 blks
Jul 31 18:16:19 Hog logger: = sectsz=512 attr=2, projid32bit=1
Jul 31 18:16:19 Hog logger: = crc=1 finobt=1
Jul 31 18:16:19 Hog logger: data = bsize=4096 blocks=976754633, imaxpct=5
Jul 31 18:16:19 Hog logger: = sunit=0 swidth=0 blks
Jul 31 18:16:19 Hog logger: naming =version 2 bsize=4096 ascii-ci=0 ftype=1
Jul 31 18:16:19 Hog logger: log =internal bsize=4096 blocks=476930, version=2
Jul 31 18:16:19 Hog logger: = sectsz=512 sunit=0 blks, lazy-count=1
Jul 31 18:16:19 Hog logger: realtime =none extsz=4096 blocks=0, rtextents=0
Jul 31 18:16:19 Hog emhttp: shcmd (57): mkdir -p /mnt/disk12
Jul 31 18:16:19 Hog emhttp: shcmd (58): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md12 /mnt/disk12 |& logger
Jul 31 18:16:19 Hog kernel: REISERFS (device md12): found reiserfs format "3.6" with standard journal
Jul 31 18:16:19 Hog kernel: REISERFS (device md12): using ordered data mode
Jul 31 18:16:19 Hog kernel: reiserfs: using flush barriers
Jul 31 18:16:19 Hog kernel: REISERFS warning (device md12): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 31 18:16:19 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md12,
Jul 31 18:16:19 Hog logger: missing codepage or helper program, or other error
Jul 31 18:16:19 Hog logger: In some cases useful info is found in syslog - try
Jul 31 18:16:19 Hog logger: dmesg | tail or so
Jul 31 18:16:19 Hog logger:
Jul 31 18:16:19 Hog emhttp: shcmd: shcmd (58): exit status: 32
Jul 31 18:16:19 Hog emhttp: mount error: No file system (32)
Jul 31 18:16:19 Hog emhttp: shcmd (59): rmdir /mnt/disk12
-----------------------

hog-syslog-20150731-1820.zip

gshipley Posted July 31, 2015 Author

And just for completeness' sake, the mobo in question is a GA-Z97X-UD5H-BK running BIOS F6:
http://www.gigabyte.com/products/product-page.aspx?pid=4978#ov

gshipley Posted July 31, 2015 Author

Upgraded the BIOS to the latest version, with the same results. I am crying today.

gshipley Posted July 31, 2015 Author

I also booted into Ubuntu and the same drives cannot be mounted there either, which rules out unRAID itself as the problem.

gshipley Posted July 31, 2015 Author

Installed a new SATA card today with the same results. Guys, I am really at a loss here. Could it be the motherboard?

-- gs

gshipley Posted July 31, 2015 Author

Just upgraded to 6.1-rc2, with the same unmountable drives. 6.1-rc2 log attached: hog-syslog-20150731-1607.zip

gshipley Posted July 31, 2015 Author

Oh man... I really am in upgrade hell. I had a good quad-core AMD board lying around that I decided to try out. On the first couple of boots I was seeing missing disks (2 of them), so I rechecked all the cables and rebooted. The boot went fine, the BIOS on the cards saw all drives, and unRAID started. However, even with a different board/CPU I am now seeing the exact same unmountable drives as when I was using the new Z97 motherboard. I have no clue.

gshipley Posted July 31, 2015 Author

Yeah, so I officially give up. I re-installed the original mobo, CPU, and RAM and got the same errors. I have tried new SATA cards, three motherboards, and new cables, all with the same result.

Gog Posted August 1, 2015

Nuke it from orbit, it's the only way to be sure... Could it be the SATA interface of one of the drives playing tricks with the mobo? Can you reboot with only the working HDs connected, then add one HD per reboot?

gshipley Posted August 1, 2015 Author

I think I figured it out..... For some reason, the journal parameters on the reiserfs drives became toast on the unmountable drives. To fix this, I ran the following (sdh1 is an example; use the right sdXX device for your drive, as shown in the menu):

# reiserfsck --check /dev/sdh1
reiserfs_open_journal: journal parameters from the superblock does not match to the journal headers ones. It looks like that you created your fs with old reiserfsprogs. Journal header is fixed.

I hope my two days of frustration help people in the future.... Not sure how this happened to so many drives.

Squid Posted August 1, 2015

The proper way to run reiserfsck is against the md devices. Since you've run it against sdh1, parity is no longer 100% in sync with the changes to the drive. You'll notice that if you do a non-correcting parity check there will be a number of parity errors. If you are satisfied that the drive is indeed now mounting and accessing correctly, you should run a correcting parity check to bring everything back in tune.

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems

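As a minimal sketch of the point about md devices: the helper below only prints the check command for a given array disk number, so nothing touches a disk until you run the printed line yourself. The function name reiserfsck_cmd is made up for this example, and it assumes disk N of the array maps to /dev/mdN, as in the mount commands in the logs earlier in this thread. The linked wiki page is the authority on when and how to run the check.

```shell
#!/bin/sh
# Print the reiserfsck check command for array disk N, targeting the md device
# (so parity sees any repair) instead of the raw sdX1 partition.
reiserfsck_cmd() {
    # $1: array disk number, e.g. 6 for disk6 (assumed to map to /dev/md6)
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: reiserfsck_cmd DISK_NUMBER" >&2
            return 1
            ;;
    esac
    echo "reiserfsck --check /dev/md$1"
}

# Example call: reiserfsck_cmd 6
```

Rejecting anything that is not a plain number is deliberate: passing sdh1 by habit gets a usage message instead of a command aimed at the raw partition.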
gshipley Posted August 1, 2015 Author

Yeah, the plan is to rebuild parity after I am done with all of this.

RobJ Posted August 1, 2015

Quote (gshipley):
Thanks for the help. I did disable IOMMU and the boot was much faster (no 30-second delay), but the drives are still showing as unmountable. There are some BIOS updates, so I may try that as well. I have attached a new syslog with IOMMU turned off.
Excerpt showing no hang:
----------
Jul 31 18:16:17 Hog emhttp: shcmd (51): mkdir -p /mnt/disk10
Jul 31 18:16:17 Hog kernel: REISERFS warning (device md9): sh-2022 reiserfs_fill_super: unable to initialize journal space
Jul 31 18:16:17 Hog emhttp: shcmd (52): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md10 /mnt/disk10 |& logger
Jul 31 18:16:17 Hog kernel: REISERFS (device md10): found reiserfs format "3.6" with standard journal
Jul 31 18:16:17 Hog kernel: REISERFS (device md10): using ordered data mode
Jul 31 18:16:17 Hog kernel: reiserfs: using flush barriers
Jul 31 18:16:19 Hog kernel: REISERFS warning (device md10): sh-462 check_advise_trans_params: bad transaction max size (4294967295). FSCK?
Jul 31 18:16:19 Hog logger: mount: wrong fs type, bad option, bad superblock on /dev/md10,

Just checked the new syslog, compared it with the previous one, and wow, the difference is night and day! Turning off IOMMU has completely fixed the problem. All drives are now working fine. The part you have been including above is certainly a problem, but it's rather minor compared to all the exceptions that were occurring with so many drives. It's just damage from previous crashes, and as you have found, it is fixable. I think you are fine now with IOMMU turned off, once you get each of the drives with damaged file systems repaired. I'm sorry I couldn't get back to you sooner and save you some anguish and work.

If you could run the lspci command from a command prompt and post the results, I would appreciate it. I will probably want to check and possibly add the card model numbers to the Marvell chipsets & virtualization 'black list'.

Your situation was similar in some ways, but different in others. The problem did not occur before the upgrade to the 64-bit kernels with virtualization enabled, which is characteristic of this. But what was different is that your drives did all initialize without issue, then later most but not all failed. Plus there are those 30-second hangs.

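For the lspci request above, a small filter can narrow the output to the storage controllers before posting. This is a sketch only: the function name storage_controllers is invented here, and the keyword list is a guess at how such cards are usually labelled, so a card could be missed if lspci describes it differently. Posting the full lspci output is always the safe fallback.

```shell
#!/bin/sh
# Keep only the lspci lines that look like storage controllers.
# Reads lspci output on stdin; matching is case-insensitive.
storage_controllers() {
    grep -i -E 'sata|sas|raid|scsi|marvell'
}

# Example call: lspci -nn | storage_controllers
```

The -nn option makes lspci print the numeric vendor and device IDs next to the names, which is the part that identifies the chipset.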
gshipley Posted August 1, 2015 Author

Rob, thanks for all of the help. So, I figured a few things out that may be helpful to others. I actually turned IOMMU and all virt settings back on without issues. The problem did end up being both SATA add-on cards in combination with virt on. On the new mobo (GA-Z97X-UD5H-BK) with the i7-4790K Devil's Canyon CPU, my old PCI SATA cards would cause these hangs every single time, and sometimes system crashes. I tried both cards independently with only one drive hooked up and would still get the issues. Since these cards are so old and only support 3.0Gb/s, it was probably time for an upgrade anyway.

Old SATA add-on cards causing the issue:
SUPERMICRO AOC-SAT2-MV8 64-bit PCI-X133MHz SATA II (3.0Gb/s) Controller Card

New cards that work great out of the box:
SUPERMICRO AOC-SAS2LP-MV8 PCI-Express 2.0 x8 SATA / SAS 8-Port Controller Card

I think all my frustration came from the following scenario: with virt disabled, the old cards worked, but I still saw those reiserfs mount errors, which made me think the cards were still bad. I should have paid closer attention to the logs and put two and two together. The old cards do work fine if you disable virt (as per your post in the defect/bug forum).

Links to parts in question:
Motherboard: http://www.newegg.com/Product/Product.aspx?Item=N82E16813128722&cm_re=ga-z97x-ud5h-bk-_-13-128-722-_-Product
CPU: http://www.newegg.com/Product/Product.aspx?Item=N82E16819117369&cm_re=BX80646I74790K_i7-4790K-_-19-117-369-_-Product
Old SATA add-on cards not working with virt: http://www.newegg.com/Product/Product.aspx?Item=N82E16815121009
New SATA add-on cards working with virt: http://www.newegg.com/Product/Product.aspx?Item=N82E16816101792

gshipley Posted August 1, 2015 Author

I am currently rebuilding parity and only have 1.5 hours left. In the meantime, I thought I would brag about my new system. Before, I was running at anywhere from 70-100% CPU load, depending on whether the server was sitting idle. Memory (2 GB) was at around 60% when idle and 100% when doing anything of note. Check out this screenshot of the new system.

RobJ Posted August 1, 2015

Thanks, I've added the old card's model number (88SX6081) to the Defect report. I'm glad you have found a much better solution.
