March 5, 2015
Alright, I need some help/advice/therapy/Xanax, or maybe a combination, but I've got an unRaid problem that's just bugging me. I've had my server for a while -- let me say, unRaid is one of the greatest things EVER for computer freaks.

Here's what I had before yesterday:
1 3TB Parity Drive
7 3TB Data Drives
1 2TB Cache Drive

I have space for 24 drives, an i7 processor, an 1100W PS, 3 SAS cards...my wife says I have a problem. I tell her she can have 126 pairs of shoes, I can have a data center. Anyway, I wanted to mention specs because I believe I have everything I need to do what I want to do, because....well, it's been working since I've had it.

Upgraded unRaid server is:
1 4TB Parity Drive (upgrade)
9 3TB Data Drives (moved old parity to data disk)
1 2TB Cache Drive

I did the research and everything was going great. Then a red ball. Disk failure. It's a disk that has NEVER failed; it wasn't even touched during this process. It even has the same "Disk" identification. What I don't get is that the drive works fine. I'm watching a movie off it via Windows Explorer. I moved a file to that disk using \\server\disk 2\. I don't get it. I have no idea what to do next.

Right now, I'm literally moving all the data from disk 2 to disk 8. I'm preparing to remove the drive, though I don't want to. By the way, moving the files from \\server\disk 2 to \\server\disk 8 works fine. No errors, no problems. I tried a parity check three times. It failed each time, less than a minute into the process.

Anyway, I'm sorry this is long. I hate when things just stop working and then you can't figure it out. I understand stuff breaks. Maybe it's me...I'm sure I did something wrong, but I have no idea what that was. Does anyone have any ideas? Thanks to anyone that actually read all of this. I'm on like my 27th cup of joe, so yea, I'm amped right now. I just want an answer or at least a plausible explanation.
March 5, 2015
First off, the system is doing exactly what it's supposed to be doing. unRaid failed at a write to the red-balled drive, and it automatically disabled it. You are still able to read the information off of it because the system has created a "virtual" drive by reading the contents of all of the other drives and parity.

Since you were just inside your server, the odds are fairly good that the cause of the red ball is just a slightly loose SATA or power cable to the drive. Right now, I would post your syslog so that we can see exactly what led up to the red ball (if you haven't already reset the computer).

If it was me, I would stop the copying process, reboot the server, and see if you can capture the SMART information from the drive. Then we can make a decision as to whether or not the drive is actually any good, or if it was just a loose cable somewhere along the line. When you have the computer shut off, reseat all of the cables going to the drive (and it wouldn't be a bad idea to check all of the cables going to the other drives at the same time).

BTW, what kind of case do you have?
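For anyone following along: once the SMART report has been captured (for example by saving the output of `smartctl -A` for the drive), a few attributes give a first hint about drive versus cabling. The sketch below is only an illustration. It assumes the usual ten-column smartctl attribute table; the sample text, the watch list, and the function name are made up here and are not taken from this server.

```python
# Attributes worth a first look. Nonzero raw values for the sector-related
# ones tend to point at the drive; UDMA_CRC_Error_Count tends to point at
# cabling or the backplane. This watch list is an editorial assumption.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


# Hypothetical sample in the usual smartctl layout (not from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    for name, raw in parse_smart_attributes(SAMPLE).items():
        print(name, raw)
```

In a sample like this one, clean sector counts alongside a nonzero CRC error count would lean toward a connection problem rather than a dying drive, though it is a hint and not a verdict.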
March 5, 2015 (Author)
First, thank you for responding. I could buy the loose cable thing if I had opened the case. I didn't. I just got out an unused drive tray, installed the new drive, and rebooted. It's weird that it's able to use that drive (the red-balled one), since the parity disk is new and a parity check with the new parity drive hasn't been completed yet.

I'm just going to keep moving files to a different disk; after all, it's been going for 8 hours now with a grand total of 0 errors. I know that probably seems stupid, but it seemed like the thing to do.

I understand and appreciate the logic of your response. I've had experience with loose cable/bad cable problems before. But I never even touched that drive and did not remove that tray; it's not even in the same row of drives. Is it possible that it's now loose? Yep. I'm discounting nothing at this point.

The case is a Norco Technologies Inc. RPC-4224.

Thanks for responding.
March 5, 2015
You say a parity check hasn't been completed, but has a parity build completed? Or is your parity showing invalid? Maybe a screenshot would help clarify.
March 5, 2015 (Author)
Screenshot and syslog attached. The syslog shows all sorts of write errors around 16:00; that's when parity failed. The screenshot shows 384 errors; that number hasn't changed since I stopped the parity check when all these errors started popping up. I did eventually let the parity thingy run. And well, that's about it. So here are the requested files:
unraid_syslog_030415-2011.zip
March 5, 2015
At 16:14 the drive did suffer write errors due to something that is usually cable / power related:

nel: ata17: failed to read log page 10h (errno=-5) (Minor Issues)
Mar 4 16:14:33 Bazinga kernel: ata17.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6 (Errors)
Mar 4 16:14:33 Bazinga kernel: ata17.00: failed command: WRITE FPDMA QUEUED (Minor Issues)
Mar 4 16:14:33 Bazinga kernel: ata17.00: cmd 61/00:00:b0:50:1e/04:00:07:00:00/40 tag 0 ncq 524288 out (Drive related)
Mar 4 16:14:33 Bazinga kernel: res 01/04:ff:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation) (Errors)
Mar 4 16:14:33 Bazinga kernel: ata17.00: status: { ERR } (Drive related)
Mar 4 16:14:33 Bazinga kernel: ata17.00: error: { ABRT } (Errors)
Mar 4 16:14:33 Bazinga kernel: ata17: hard resetting link (Minor Issues)

Maybe the tray wasn't seated 100% properly? You were in the process of rebuilding disk #2 from parity and all of the other drives combined. Those errors are what keeps causing the rebuild process to stop (it's a rebuild, not a check... the screen should say this).

After you grab all of the files off the virtual disk 2, you're going to have to solve why those ata errors are appearing. Your supply should be able to handle all the drives, so it really comes down to tray issues / 8087 cable issues / controller.
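For anyone who wants to see whether errors like these are confined to one port or spread across several, a small script can tally them from the syslog. This is a rough sketch, not an unRAID tool: the keyword list and the function name are editorial choices, and the sample lines are the ones quoted above with the viewer's category tags removed.

```python
import re
from collections import Counter

# Matches e.g. "kernel: ata17.00: failed command: ..." and captures the
# port ("ata17") and the rest of the message.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Message fragments to tally (an illustrative selection).
KEYWORDS = ("exception Emask", "failed command", "hard resetting link")


def count_ata_events(lines):
    """Count (port, keyword) occurrences in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        for keyword in KEYWORDS:
            if keyword in message:
                counts[(port, keyword)] += 1
    return counts


SAMPLE = [
    "Mar 4 16:14:33 Bazinga kernel: ata17.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6",
    "Mar 4 16:14:33 Bazinga kernel: ata17.00: failed command: WRITE FPDMA QUEUED",
    "Mar 4 16:14:33 Bazinga kernel: ata17.00: status: { ERR }",
    "Mar 4 16:14:33 Bazinga kernel: ata17: hard resetting link",
]

if __name__ == "__main__":
    for (port, keyword), n in sorted(count_ata_events(SAMPLE).items()):
        print(port, keyword, n)
```

To run it against a saved syslog, pass an open file object instead of `SAMPLE`. If every hit lands on a single port, that points at that one slot, cable, or drive; hits across many ports would make the controller or power more suspect.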
March 5, 2015 (Author)
I'll be trying to get to the bottom of this as soon as these files are moved. Thank you very much for your thoughts and your time.
March 5, 2015
You might also provide the specs of your power supply and determine whether it has a single 12V rail. Moving up to 11 HDs on your server could cause issues if your power supply is a dual-rail one.
March 5, 2015
As noted above, you can read/write to the drive because UnRAID is working exactly as it should -- it's using your parity drive and the other drives to emulate the failed drive.

You DO have good parity -- when you replaced the parity drive with a 4TB drive, it was clearly updated, so parity is good. If that wasn't the case, the system (a) wouldn't show good parity; and (b) couldn't emulate the failed drive.

At this point, I'd do the following:

(a) Let your copy finish, so you have a complete backup of the failed drive.
(b) Stop the array; un-assign disk #2; Start the array so disk #2 shows as missing; then shut down.
(c) Check the connection for disk #2 -- if it's in a hot-swap case, unplug it; then replug it.
(d) Boot the system; Stop the array if it auto-starts; assign disk #2 to the correct slot; then Start the array -- it should then do a rebuild onto that disk. If this fails, then repeat (b); then shut down and install a NEW drive (3-4TB); then boot again; assign the NEW drive as disk #2; and then Start the array and let it do a rebuild.
(e) When the rebuild is completed, do a non-correcting parity check (this will confirm the rebuild was good).
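The emulation described in this post can be illustrated with a toy example. Single parity of this kind is a bitwise XOR across the data drives, so XOR-ing the surviving drives with parity gives back the missing drive's contents. The three-byte "disks" below are invented purely for illustration; a real array does the same thing across every sector.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Tiny made-up "disks", three bytes each.
disk1 = bytes([0x10, 0x20, 0x30])
disk2 = bytes([0xAA, 0xBB, 0xCC])  # the disk that will "fail"
disk3 = bytes([0x01, 0x02, 0x03])

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 unavailable, its contents are the XOR of everything else.
emulated_disk2 = xor_blocks([disk1, disk3, parity])

if __name__ == "__main__":
    print(emulated_disk2 == disk2)
```

This is also why a read from the emulated disk touches every other drive in the array, and why a rebuild stops if any of those reads or the write to the replacement fails.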
March 5, 2015 (Author)
I appreciate all these suggestions, and I'll post when I eventually solve this little riddle. So many great folks on this forum. It's appreciated...well, at least by me. If anyone else has ideas or suggestions, I'm happy to hear them.

I've also got to do some research on power supplies. I'm not ashamed to say I'm 100% ignorant in that area. But that's ok, I can learn.
March 5, 2015
Don't skimp on your power supply -- buy a quality unit that has, as a minimum, active PFC and a single 12V rail. For 11 drives, any good 550W or above unit should be fine.

As Frank noted above, it'd be useful if you posted the make/model/specifications of your current unit ... we can at least offer an opinion as to whether it may be the cause of your headaches.
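To make the single-rail point concrete, here is a back-of-the-envelope check of the 12V load when all drives spin up at once. Every number in it is an assumption for illustration: roughly 2 A per 3.5" drive at spin-up is a commonly quoted ballpark, and the rail ratings are hypothetical, so check the drive datasheets and the label on the supply for real figures.

```python
def spinup_current_amps(drive_count, amps_per_drive=2.0):
    """Worst-case 12V draw if every drive spins up at the same moment.

    amps_per_drive is an assumed ballpark, not a measured value.
    """
    return drive_count * amps_per_drive


def rail_is_sufficient(rail_amps, drive_count, amps_per_drive=2.0,
                       other_load_amps=0.0):
    """True if one 12V rail can carry the drives plus any other 12V load."""
    total = spinup_current_amps(drive_count, amps_per_drive) + other_load_amps
    return total <= rail_amps


if __name__ == "__main__":
    drives = 11
    print("spin-up draw (A):", spinup_current_amps(drives))
    # Hypothetical single 40 A rail versus one 20 A rail of a dual-rail unit
    # with every drive hanging off that one rail.
    print("single 40 A rail ok:", rail_is_sufficient(40.0, drives))
    print("one 20 A rail ok:", rail_is_sufficient(20.0, drives))
```

Under these assumed numbers the drives alone fit comfortably on a single large rail but exceed one rail of a split supply, which is the scenario the dual-rail warning is about. CPU and board load on the 12V side would go in `other_load_amps`.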
March 5, 2015 (Author)
Apparently, I'm a liar...but not really. The last machine I built had a 1000W or 1100W supply; for this one I chose a different unit for reasons that I can't remember now. So I did some digging and found the receipt. It is the CORSAIR AX series AX860.

In the interest of making sure that I'm still awake and I don't forget the actual equipment I have, this is my system summary:

unRAID Version: unRAID Server Pro, Version 5.0.6
Motherboard: ASUSTeK Computer INC. - P6T
Processor: Intel® Core™ i7 CPU 950 @ 3.07GHz
Cache: L1-Cache = 256 kB (max. 256 kB), L2-Cache = 1024 kB (max. 1024 kB), L3-Cache = 8192 kB (max. 8192 kB)
Memory: 24 GB (24576 MB) as six 2048 MB modules, 1066 MHz, single-bank connection, in BANK0 through BANK5 (the summary header itself reads "65536 MB (max. 4096 MB)")
Network: eth0: 1000Mb/s - Full Duplex

Overkill.....I know, but again, if she can have the shoes....I get my own data center.

Thanks everyone!!
March 5, 2015
An AX860 is a VERY good power supply ... certainly makes it unlikely that power is part of this issue. I think the most likely problem is you simply have a failed drive (disk #2) -- but it won't hurt to attempt a rebuild onto itself after you reseat it (unplug it; then plug it back into its slot), just to confirm it didn't just get inadvertently bumped and isn't securely seated.
March 5, 2015 (Author)
I certainly have more confidence that this problem will be resolved. I'm hoping a simple re-seating of the "failed" drive will do the trick. This is why people should check their hard-headed nature at the door and just post the question in the forum. When one learns, we all learn. Anyway, my thanks again!
March 6, 2015 (Author)
Wanted to provide an update: I have followed the recommendation above (the steps labeled (a) through (e)) explicitly. So far, I have simply removed the drive tray and re-inserted the drive, per steps (b) and (c). Upon rebooting the system, re-assigning "Drive B", and re-starting the array, the disk rebuild commenced as expected. Thus far, I am 20.6% (618GB) into the rebuild.

I'm cautiously optimistic that this process will complete and all disks will return to a "green ball" state. I say cautiously because I know that 20.6% isn't that far into the process, but I'm hopeful because all previous rebuild attempts failed after about 1.5%. No S.M.A.R.T. errors have been reported thus far, and no disk errors have been recorded to this point.

No, I don't believe in jinxing myself either; it will either work or it will not. I'm simply trying to report my experiences so that if anyone else EVER has a similar issue, they may know where to start.

I also took the time to update all my documentation about my unRaid system, including the exact system specs, down to the exact drive slot position of each disk, plus a backup of the "flash" drive. I updated all relevant information and put a copy of this data in my Dropbox. In the past, this info was not stored on an external source or a cloud service. So I feel that I've improved my knowledge of my system and what it contains, and I'm confident knowing that this system information and configuration is at my disposal from virtually anywhere.

I will post another update when this issue is fully resolved. I thank all of you for the assistance that has been provided.
March 6, 2015
Just thought I should add that you need to be careful if you ever decide to restore the flash from a backup. Another user had a backup that was not current and restored that, and unRAID thought one of his data drives was the parity drive and started writing parity to it. That data drive was formerly used as the parity drive and his backup had not been updated since replacing the parity drive.
March 6, 2015
Related to this, you should take any backup with the array stopped, as otherwise unRAID may think that an unclean shutdown was done.
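One low-tech way to guard against restoring a stale flash backup is to put the date in the backup folder name, so it is obvious at a glance whether a copy predates a drive change. The sketch below is only an illustration of that idea: the function name and the paths in the usage note are hypothetical, and it assumes the array has already been stopped, as advised above.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_flash(flash_dir, backup_root, today=None):
    """Copy the flash contents into a dated folder under backup_root.

    Returns the path of the new backup. Raises FileExistsError if a backup
    with today's date already exists, so nothing is silently overwritten.
    """
    today = today or date.today()
    target = Path(backup_root) / f"flash-{today.isoformat()}"
    shutil.copytree(flash_dir, target)
    return target
```

A call might look like `backup_flash("/boot", "/mnt/user/backups")`, with both paths adjusted to wherever the flash is mounted and wherever backups should live. Taking a fresh copy after every drive assignment change keeps the newest folder in step with the array.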