bsim Posted August 25, 2017

Racking my brains on this... I have an Areca ARC-1280 (latest firmware) on Unraid 6.3.5 that has been working beautifully for the last 3 or 4 months, up until now. In my last parity check, two drives red-x'd on me ("device is disabled, contents emulated"); I have dual parity set up, so both are running as emulated. The Areca controller logs show only a series of "Reading Error" entries for those drives, and no write errors. I've seen that there are a few peculiar issues with Areca cards. Is this a controller timeout issue or something? There were 0 sync errors reported throughout the parity check.

What should I do? I first shut down the server, verified that the drives were seated correctly, and reseated the SATA cables. Right now I'm running a parity check with no corrections to see if anything else comes up funky. I have run a few parity checks in the past (one runs automatically every month) with the same setup and no problems. If the controller only reported read errors on the drives, can the drives still be trusted by Unraid to work correctly? And what happens if a third drive drops with only read errors from the controller?

The best solution I've found, which seems risky for just having read errors reported:
1. Stop the array
2. Unassign both disks
3. Start the array
4. Stop the array
5. Assign the disks again
6. Start the array and allow the parity rebuild
7. If the rebuild is successful, re-verify by running a non-correcting parity check

Is this truly a drive issue, or is it just a controller nuance/setting?
JorgeB Posted August 25, 2017

54 minutes ago, bsim said: "I first shut down the server"

Did you save the diagnostics before shutting down? If yes, post them; I bet there are write errors after the read errors.
bsim Posted August 25, 2017

I realized as I was shutting down to check that I should have saved the diagnostics... It doesn't look like anything was dumped to flash, which sucks (it would be a great Unraid feature to dump snapshots on errors by date/time). I do remember, though, that before the shutdown there were no sync errors. Having been on a stable setup for a few months, I took this to mean the drives just temporarily had an issue or the controller card farted.

In the Areca web interface, I did find an "Auto Activate Incomplete Raid" setting that could have been an issue. From what I have researched, when the read errors occurred, the controller may have taken the raid set (jbod = single disk) offline, causing Unraid to scram. After I brought the system up with the changed Areca setting, the two drives no longer show as failed in the Areca interface.

I'm running a non-correcting parity check on the array to verify no other drives red-ball, but for my next step I'm thinking I will try an extended SMART test on the "bad" drives, and then possibly rebuild the drives to verify that nothing in parity is screwed up. I did order replacement 5TB drives that will take a few days to arrive, so assuming no other drives red-ball, I may just slap in the two new drives and go with them. I just hate the time lag when on the edge of data loss: a day and a half for the parity check, a day and a half (at least) to preclear the drives, a day and a half to rebuild.

Areca log (I'm thinking the "Device Failed" was the controller setting kicking in after the multiple read errors):
2017-08-24 16:44:37  IDE Channel 13    Device Failed
2017-08-24 16:44:37  Raid Set # 12     RaidSet Degraded
2017-08-24 16:44:37  ST4000DM000-1F21  Volume Failed
2017-08-24 11:00:43  IDE Channel 13    Reading Error
2017-08-24 11:00:34  IDE Channel 13    Reading Error
2017-08-24 11:00:26  IDE Channel 13    Reading Error
2017-08-24 11:00:17  IDE Channel 13    Reading Error
2017-08-24 11:00:08  IDE Channel 13    Reading Error
2017-08-24 11:00:00  IDE Channel 13    Reading Error
JorgeB Posted August 25, 2017

10 minutes ago, bsim said: "or the controller card farted."

Since there are 2 disabled disks, it's most likely this; there aren't more because when a controller hiccups, unRAID only disables as many disks as there are parity disks. The controller hiccup may have been caused by a bad disk, though, so check SMART for all disks; if all are OK, rebuild the disabled disks.
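A quick way to do that "check SMART for all disks" step from the console is a loop like the sketch below. The `parse_health` helper and the sdb..sdy device range (for a 24-drive array) are my assumptions, not something from the thread:

```shell
#!/bin/sh
# Extract the overall-health verdict ("PASSED"/"FAILED") from smartctl -H output.
parse_health() {
  awk -F': *' '/overall-health/ {print $2}'
}

# Loop over the array's data disks (sdb..sdy is an assumed range for a
# 24-drive setup) and print each disk's SMART verdict.
for dev in /dev/sd[b-y]; do
  [ -e "$dev" ] || continue
  echo "$dev: $(smartctl -H "$dev" | parse_health)"
done
```

Note this only reads the drive's own pass/fail self-assessment; drives behind the Areca controller may need the passthrough syntax discussed further down.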
bsim Posted August 25, 2017

That hiccup is kinda what I was leaning towards, given that there haven't been any issues with drives in previous parity checks, and then all of a sudden it burped on 2 drives with only read errors acknowledged by the controller. Looking at the SMART status on the controller itself and on Unraid's dashboard, no glaring issues show (all green).

I noticed that offline drives need the array to be stopped to run the SMART tests from the dashboard (the test buttons are greyed out), but do I have to stop the array to execute them on disabled drives from the command line? And just to make sure, are these the best smartctl command lines on newer hardware?

DRIVE INFORMATION:  smartctl -a /dev/sdx
DRIVE SHORT TEST:   smartctl -t short /dev/sdx
DRIVE LONG TEST:    smartctl -t long /dev/sdx
SHOW TEST RESULTS:  smartctl -l selftest /dev/sdx
JorgeB Posted August 25, 2017

8 minutes ago, bsim said: "but do I have to stop the array to execute them on disabled drives from the command line?"

No, and the commands look fine.
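The run-a-test-then-check-the-log flow from those commands can be sketched as a small script. The `/dev/sdx` placeholder and the `latest_status` helper are illustrative assumptions, and the 30-second poll interval is arbitrary:

```shell
#!/bin/sh
# Sketch: start a short SMART self-test and poll the self-test log until it
# finishes. /dev/sdx is the placeholder device name from the commands above.
DEV=/dev/sdx

# The most recent self-test entry is the "# 1" line of the log.
latest_status() {
  smartctl -l selftest "$DEV" | awk '/^# 1/ {print $0}'
}

if [ -e "$DEV" ]; then            # only act if the device actually exists
  smartctl -t short "$DEV"        # short test typically takes a couple of minutes
  while latest_status | grep -q 'in progress'; do
    sleep 30                      # poll every 30 seconds
  done
  latest_status                   # final result line, e.g. "Completed without error"
fi
```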
bsim Posted August 25, 2017

Something I have also noticed is that 99% of the SMART data in the Unraid web GUI is garbage. I've researched a bit and found that Unraid hasn't implemented controller-manufacturer-specific SMART requests. To check from the command line, it requires an initial

lsscsi -g | grep "Areca"

to get the controller location (/dev/sg25 in my case), then

smartctl -a -d areca,1 /dev/sg25

to get the information for the "sdb" drive (the single-digit drive location corresponds to the drive letter for each disk: sdb=1, sdc=2, sdd=3, ...).

I submitted a feature request hoping that something better would be implemented soon, as it's been requested a few times since before 6.2 and doesn't seem to have made it into the final product. No response yet. So my command lines for the Areca:

DRIVE INFORMATION:  smartctl -a -d areca,1 /dev/sg25
DRIVE SHORT TEST:   smartctl -t short -d areca,1 /dev/sg25
DRIVE LONG TEST:    smartctl -t long -d areca,1 /dev/sg25
SHOW TEST RESULTS:  smartctl -l selftest -d areca,1 /dev/sg25
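To pull that passthrough data for all 24 slots without typing each command, a loop can generate them. This is a sketch under the post's assumptions (controller at /dev/sg25, slots 1..24); the `build_cmd` helper is mine:

```shell
#!/bin/sh
# Controller device as found via "lsscsi -g | grep Areca" in the post.
SG=/dev/sg25

# Build the areca-passthrough smartctl command line for a given slot number.
build_cmd() {
  echo "smartctl -a -d areca,$1 $SG"
}

# Print the command for each of the 24 slots; pipe the output to "sh"
# (or replace echo with an actual invocation) to run them.
for slot in $(seq 1 24); do
  build_cmd "$slot"
done
```

Printing instead of executing makes it safe to eyeball the generated commands first.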
bonienl Posted August 26, 2017

12 hours ago, bsim said: "I submitted a feature request hoping that something better would be implemented soon..."

There has been native support in the GUI since unRAID v6.0 to set controller-specific settings for each individual disk. This allows you to obtain the SMART information in the correct way.
bsim Posted August 26, 2017

Tried switching to the Areca setting... no changes under disks. It seems to be saying the drives need to be spun up, but mine never spin down anyway. Pressing "spin up" just refreshes the page and sticks with the "Unavailable - disk must be spun up" message. Even downloading the report gives me an error about the disk needing to be spun up, and all of my testing buttons are greyed out. Do I need to bring down my array or reboot my server to enable the Areca setting for SMART?
bonienl Posted August 26, 2017

How did you set the controller settings? Can you show a screenshot?
bsim Posted August 27, 2017

I used Disk 5 (normal operation) as the example disk. All disks are on the one 24-port Areca controller.
bonienl Posted August 27, 2017

I don't have an Areca controller myself and can't test, but it requires setting the specific location of each disk attached to the controller; see the 1..4, 1...128, 1..4 parameters. Perhaps somebody who has a working Areca controller setup can assist here.
bsim Posted August 27, 2017

I got it! AAAAAAHHHHHH... Now I just have to correlate 24 drives with what I figured out before (sdb=1, sdc=2, sdd=3, ...) and enter it for each drive. UG. Would there be a regular expression that would allow me to correlate the drive letters to their controller slots? I'm terrible at regular expressions, but having used them many times, it seems a jedi master of regular expressions could help users here.

Essentially the default under Settings > Disk Settings > global SMART settings being set to Areca is worthless as far as I can tell, as it doesn't give Unraid a way to correlate the numbers/drive letters for you to save some time. Perhaps that default global setting is what the main page temperatures use to pull SMART data, and why the main page temperatures still don't function correctly; at least now each disk pulls the correct SMART data. Why doesn't the main page pull the temperature using each disk's own SMART setting? Is there a fix, or a place to play around for a fix?

As a side note, not sure if it's because I am currently doing a parity check or not, but "Unavailable - disk must be spun up" seems to still be keeping all of the buttons in the "Self-test" section greyed out. I have several hours until I can check with the array offline, or at least not in a parity check cycle. I've executed the short SMART test from the command line using the areca option, and when pulling the report a while later the test is reported as "Interrupted (host reset)". Perhaps because the disks are mounted and in use? I'll see if I can get more when the parity check ends (just under 8 hours yet).
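For the drive-letter-to-slot correlation, a regex isn't actually needed; plain shell arithmetic on the trailing letter works. A sketch, assuming the sdb=1, sdc=2, ... mapping described earlier in the thread holds for every port, and /dev/sg25 as the controller device:

```shell
#!/bin/sh
# Convert an sdX name to its assumed Areca slot number
# (sdb -> 1, sdc -> 2, ..., sdy -> 24).
letter_to_slot() {
  letter=${1#sd}                   # "sdb" -> "b"
  code=$(printf '%d' "'$letter")   # ASCII code of the letter: 'b' -> 98
  echo $((code - 97))              # 98 - 97 = 1
}

# Print the per-disk passthrough smartctl command for each attached sdX device.
for path in /dev/sd[b-y]; do
  [ -e "$path" ] || continue
  dev=${path#/dev/}
  echo "$dev -> smartctl -a -d areca,$(letter_to_slot "$dev") /dev/sg25"
done
```

The `printf '%d' "'x"` trick for getting a character's ASCII code is standard POSIX printf behavior, so this should work in Unraid's shell without any extra tools.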