November 15, 2014

UPDATED 12/17/2014 - Updated post title since the workaround of putting all disks on the AOC-SAS2LP was successful. Errors only occurred in mixed mode, using both motherboard and controller SATA ports.

----------------------

Hi - I'm not sure if this should be a defect or a support issue, but I'm starting with a defect - please move if appropriate.

I'm having some trouble upgrading from unRAID 5.05 to 6.0-beta10a. I'm seeing what appear to be controller/disk errors on the first parity check under 6.0-beta10a, ending up with a red ball data disk that produces a truncated SMART report. This does not occur under 5.05.

My system is an older Asus P5B running 4x2GB of Crucial RAM and a SuperMicro AOC-SAS2LP. The three 3TB data drives are on the SAS2LP, and the 6TB parity and 250GB cache drive are on the motherboard SATA connectors.

I've repeated this problem several times - it always ends with a red ball on one of the data drives (though which one varies). This appears to be 100% repeatable for me, and it doesn't take long into the parity check to occur. I've captured the SMART reports in the attached logs - note that the SMART errors on the red ball drive go away on reboot.

There are three attached files with syslogs and SMART reports:

5.05-Baseline.zip is a baseline boot with a successful parity check under 5.05.

6.0-beta10a-disk-errors.zip was pulled after the upgrade and a failed parity check. sdf has the bad SMART report. The -0 files were pulled before the parity check was initiated, and the -1 files were pulled after the red ball.

6.0-beta10a-reboot.zip was pulled after a reboot - you can see the SMART report on sdf has cleared.

Returning to 5.05 and recalculating parity there has brought me back successfully each time. Note that sdf is unformatted because it is my old parity drive, but I don't think that should matter.

Any ideas? Please let me know if I can supply more information or try anything different. Thanks.
5.05-Baseline.zip unRAID-6.0-beta10a-disk-errors.zip unRAID-6.0-beta10a-reboot.zip
November 15, 2014

I notice you have some 6TB Red drives. I would get parity errors on my system until I set the 6TB drives to never spin down. Once I had done that the errors stopped occurring. It might be worth seeing if that helps you as well.
November 15, 2014 Author

Thanks, I'll look at that. Do you think it has to do with the setting, or unRAID spinning down (or trying to spin down) the drives? I ask because I initiated the parity check about 15 minutes after booting so nothing had tried to spin down yet (I have the default 60 minute setting).
November 15, 2014

I have no idea why it seems to help.
November 16, 2014 Author

Unfortunately, that did not help in my case. I get the same problem.

From Syslog:

Nov 16 17:19:50 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 415395BC, slot [32].
Nov 16 17:19:50 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 415395B4, slot [23].
Nov 16 17:20:20 Tower kernel: sas: Enter sas_scsi_recover_host busy: 32 failed: 32
Nov 16 17:20:20 Tower kernel: sas: trying to find task 0xffff8802143afa40
Nov 16 17:20:20 Tower kernel: sas: sas_scsi_find_task: aborting task 0xffff8802143afa40

From SMART report:

=== START OF INFORMATION SECTION ===
Vendor: /0:0:2:0
Product:
Physical block size: 0 bytes
Lowest aligned LBA: 14896
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Any other diagnostics I can try?
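For anyone comparing syslogs between a good run and a failed one, a rough Python sketch like the one below could tally the mvsas/sas messages quoted above. The patterns are copied from that excerpt only; the labels and the function name `count_sas_errors` are made up for this illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns are based only on the syslog excerpt in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "mvsas_command_active": re.compile(r"drivers/scsi/mvsas/\S+ \d+:command active"),
    "sas_recover_host": re.compile(r"sas: Enter sas_scsi_recover_host"),
    "sas_abort_task": re.compile(r"sas_scsi_find_task: aborting task"),
}


def count_sas_errors(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Sample input: the five syslog lines quoted in the post above.
SAMPLE = [
    "Nov 16 17:19:50 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 415395BC, slot [32].",
    "Nov 16 17:19:50 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 415395B4, slot [23].",
    "Nov 16 17:20:20 Tower kernel: sas: Enter sas_scsi_recover_host busy: 32 failed: 32",
    "Nov 16 17:20:20 Tower kernel: sas: trying to find task 0xffff8802143afa40",
    "Nov 16 17:20:20 Tower kernel: sas: sas_scsi_find_task: aborting task 0xffff8802143afa40",
]

for label, n in sorted(count_sas_errors(SAMPLE).items()):
    print(f"{label}: {n}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `count_sas_errors(open("syslog.txt", errors="replace"))`, where `syslog.txt` is whatever name the saved syslog has. A count of zero under 5.05 and a non-zero count under 6.0-beta10a would at least confirm the errors are tied to the upgrade.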
November 18, 2014 Author

For grins and giggles I re-cabled and dropped the cache drive. This allowed me to pull the AOC-SAS2LP and run parity + 3 data drives off my 4 motherboard SATA connections. In this configuration I am able to successfully complete a parity check under unRAID 6.0-beta10a, where I could not while running the data drives off the SuperMicro card (parity and cache were on the MB for all tests).

Feels to me like there is an issue that involves the Marvell driver for the SuperMicro cards. Any thoughts? Or further diagnostics that I can provide? Thanks.
November 22, 2014 Author

Well, I tried running under Xen as jphipps suggested. Unfortunately that didn't work, same issue. I'm attaching the Xen syslog to see if that helps. Anything else I can try to provide additional diagnostic information? I'll need to take this machine back to 5.05 soon...

XEN-syslog.zip
November 23, 2014

It looks like it is having an issue identifying the filesystem on disk3:

Nov 22 12:29:37 Tower emhttp: shcmd (31): mkdir -p /mnt/disk3
Nov 22 12:29:37 Tower emhttp: shcmd (32): set -o pipefail ; mount -t auto -o noatime,nodiratime /dev/md3 /mnt/disk3 |& logger
Nov 22 12:29:37 Tower logger: mount: block device /dev/md3 is write-protected, mounting read-only
Nov 22 12:29:37 Tower kernel: REISERFS warning (device md3): sh-2021 reiserfs_fill_super: can not find reiserfs on md3
Nov 22 12:29:37 Tower kernel: EXT3-fs (md3): error: can't find ext3 filesystem on dev md3.
Nov 22 12:29:37 Tower kernel: EXT2-fs (md3): error: can't find an ext2 filesystem on dev md3.
Nov 22 12:29:37 Tower kernel: EXT4-fs (md3): VFS: Can't find ext4 filesystem
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): bogus number of reserved sectors
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): Can't find a valid FAT filesystem
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): bogus number of reserved sectors
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): Can't find a valid FAT filesystem
Nov 22 12:29:37 Tower kernel: REISERFS warning (device md3): sh-2021 reiserfs_fill_super: can not find reiserfs on md3
Nov 22 12:29:37 Tower kernel: EXT3-fs (md3): error: can't find ext3 filesystem on dev md3.
Nov 22 12:29:37 Tower kernel: EXT2-fs (md3): error: can't find an ext2 filesystem on dev md3.
Nov 22 12:29:37 Tower kernel: EXT4-fs (md3): VFS: Can't find ext4 filesystem
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): bogus number of reserved sectors
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): Can't find a valid FAT filesystem
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): bogus number of reserved sectors
Nov 22 12:29:37 Tower kernel: FAT-fs (md3): Can't find a valid FAT filesystem
Nov 22 12:29:37 Tower kernel: ISOFS: Unable to identify CD-ROM format.
Nov 22 12:29:37 Tower kernel: hfsplus: unable to find HFS+ superblock
Nov 22 12:29:37 Tower kernel: UDF-fs: warning (device md3): udf_load_vrs: No VRS found
Nov 22 12:29:37 Tower kernel: UDF-fs: warning (device md3): udf_fill_super: No partition found (2)
Nov 22 12:29:37 Tower kernel: XFS (md3): Invalid superblock magic number
Nov 22 12:29:37 Tower logger: mount: you must specify the filesystem type

If that should be a ReiserFS volume, you may want to run a check on it.
November 23, 2014 Author

It's unformatted - a former parity drive I haven't formatted to ReiserFS because I would like to move to XFS. I've red balled more often on this drive, but I have also red balled on valid ReiserFS drives.
November 23, 2014

It looks like, from the log, every disk-related error is for disk3. Since it is unformatted, have you tried just removing it to see how the array runs under Xen? Have you had any disk other than that one red ball while booted under Xen?
November 23, 2014 Author

I've had both disk2 and disk3 red ball under regular unRAID 6.0-beta10a, but so far only disk3 has red balled under Xen. To see if it might help, I formatted disk3 with ReiserFS and ran another parity check - failed again, same MO. Syslog attached.

Both disk2 and disk3 have run fine while attached to the motherboard. I can try pulling disk3 as well - but I need the space so that would just be temporary... I'd suspect the AOC-SAS2LP or the cable, but everything ran fine under unRAID 5.05.

XEN-syslog2.zip
November 23, 2014

That is pretty strange. There were a few oddities on mine: if I had any disks on the MB I would have disk errors, so I had to move them all to the SAS card. I also had most of my issues while running a mix of filesystems. Since I converted to all XFS I haven't had an issue; the mix of filesystems did seem to run OK under Xen, but not under non-Xen.

One other test I had run in the past was to shut down NFS and Samba so that no clients could connect while I ran the parity check, and it did seem to run all the way through even under non-Xen. A lot of times I would see a client (SageTV server) writing heavily to the array during the parity check and about the time of the issue.
November 24, 2014

I had some issues with a SAS2LP. It needed a BIOS and firmware update to play nice. I never got to that part as I RMA'd it before that, but I wasn't careful with the packaging. To make a long story short: I never got my SAS2LP to play with unRAID. However, had I updated the card, I am sure it would have worked, as my research showed that it would.
November 24, 2014

I am currently running the 1812 version of the firmware on the card. I have 2 servers, both with those cards; one has no issue, and the other did. I am almost wondering if it has something to do with the combination of MB and card. The one that works is a SuperMicro motherboard with an i3, and the one that had issues is an EVGA with an AMD.
November 25, 2014 Author

Interesting. I am currently running parity on the motherboard, so maybe I'll switch everything to the SAS2LP before I give up. I'm all ReiserFS right now, though I'd like to switch over to XFS - but I can only do that if I can get the SAS2LP running under unRAID 6...

I'm also running 1812. I haven't reflashed it, though I've seen posts where people re-flashed it to the same version - that was all under unRAID 4 or 5, though. Not sure if that would help under 6.
November 25, 2014

While reflashing it won't hurt, the posts that you're referring to are probably from when, for some reason, Supermicro changed the ID of the card, and unRAID would no longer recognize the card without reflashing.
December 12, 2014 Author

Hi, Jon - thanks for checking in. The problem still exists in beta12, same symptoms.

Based on suggestions from jphipps I've been playing around with motherboard vs. SAS2LP SATA ports. Using a subset of disks I can successfully run a parity check with 4 disks on my motherboard SATA ports under beta10a - regular, non-Xen. I have also successfully run a parity check with the same 4 disks on the SAS2LP under both beta10a and beta12, so it appears the problem may be related to (or made worse by) having drives on a combination of motherboard and SAS2LP SATA ports.

I have a new breakout cable in hand so I'll try this weekend with all disks on the SAS2LP and see what happens under beta12.
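When testing mixed vs. single-controller layouts like this, it can help to confirm which controller each disk actually sits behind. The Python sketch below is one possible way to do that on Linux, assuming the usual sysfs layout where `/sys/block/sdX` resolves to a path containing the PCI address of the controller. The function names and the example paths in the comments are hypothetical and not taken from the attached logs.

```python
import glob
import os
import re
from collections import defaultdict

# PCI addresses in sysfs paths look like 0000:03:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_pci_address(sysfs_path):
    """Return the last PCI address in a resolved sysfs path, or None.

    The last address is the device closest to the disk, i.e. the
    storage controller rather than an upstream PCIe bridge.
    """
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def disks_by_controller(sys_block="/sys/block"):
    """Group sd* block devices by the PCI address of their controller."""
    groups = defaultdict(list)
    for entry in sorted(glob.glob(os.path.join(sys_block, "sd*"))):
        real = os.path.realpath(entry)  # follow the symlink into /sys/devices/...
        groups[controller_pci_address(real)].append(os.path.basename(entry))
    return dict(groups)


if __name__ == "__main__":
    for addr, disks in disks_by_controller().items():
        print(addr, " ".join(disks))
```

If the output shows disks under two different PCI addresses, the array is in the mixed layout described above; a single address means everything is on one controller. Comparing each address against `lspci` output would show which one is the SAS2LP.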
December 12, 2014

Ok, thanks for the feedback. Sorry to hear you're not out of the woods yet, but as you've gone through testing, it appears we're narrowing down the potential reason for this issue, and it seems to be specific to the hardware configuration (and not a widespread general issue). That's the silver lining to this if you ask me.
December 14, 2014

I seem to also be having a similar issue since upgrading to 6.0-beta10a/12. I have 3 AOC-SASLP-MV8s in my setup. While on 6.0-beta10a, I recently swapped out my 4TB parity for a 6TB Red. Once the parity rebuild was complete, I swapped out a 2TB disk for another 6TB and let it rebuild. Parity rebuilt the disk and expanded my array just fine. I then upgraded to 6.0-beta12 and have been fighting it ever since. My 6TB Red data drive red balled, but all checks on the disk show fine. And now my parity is showing invalid, even though the parity disk is green balled.

If my situation is similar, I'd like to assist if I can; if it is not, I can start a new thread.
December 15, 2014

MisterLas, can you please verify that you have 3 of the AOC-SASLP-MV8 and not the AOC-SAS2LP-MV8 controller cards? I have 2 AOC-SAS2LP-MV8 controller cards powering 14 drives (mixed 2TB and 3TB assortment) running beta12 and haven't had any issues with the parity (I've done parity checks too). What model of motherboard do you have?
December 15, 2014

Thanks. Yes, I have 3 of the AOC-SASLP-MV8, powering 1 parity and 23 data disks (cache is in a 2.5" PCI hotswap bay that is connected to my motherboard directly). My motherboard is a Supermicro MBD-X9SCM-F-O with an i3-2120 CPU.
December 15, 2014 Author

Hi - In my first post, if you look in unRAID-6.0-beta10a-disk-errors.zip at smart-0269.sdf-1.txt you'll see a strange looking SMART report. It looks like the controller has completely lost the connection to the disk. If you reboot, the connection is re-established and the SMART report looks normal. Not sure if you are seeing anything similar, but I think that's the root of my issue.

In my case I can complete a limited parity check with 4 disks on the motherboard SATA controllers. That won't work long term, so I'm trying a workaround right now with all disks on the SAS2LP instead (I have a smaller setup, Plus license). My issues so far have occurred running in mixed mode - parity on the motherboard and data on the SAS2LP. We'll see what happens when I get home tonight.
December 16, 2014

All of my drives report a clean SMART report, but for some reason unRAID still shows my disk9 as faulted (even though SMART is clean) and parity as invalid. I am in the process of preclearing another 6TB disk. Not sure how to proceed from here as everything I can see in the logs is clean... I just can't get that 6TB data disk to show clean, and I assume that is why my parity is showing invalid as well.
December 18, 2014 Author

Well, the workaround was successful. I am able to complete a parity check with all drives on the AOC-SAS2LP. Thanks jphipps, jonp, and others for the helpful comments.