Areca ARC-1280 Read Errors Cause Disabled Devices


bsim


Racking my brains on this... I have an Areca ARC-1280 (latest firmware) on Unraid 6.3.5 that has been working beautifully for 3 or 4 months... up until now.

During my last parity check, two drives got red-x'ed on me ("device is disabled, contents emulated"). I have a dual-parity setup, so both are running emulated.

Looking at the Areca controller logs, they only show that the drives had a bunch of "Reading Error"s and no write errors. I've seen that there are a few peculiar issues with the Areca cards. Is this a controller timeout issue or something? There were 0 sync errors reported throughout the parity check. What should I do?

I first shut down the server, verified that the drives were seated correctly, and reseated the SATA cables. Right now I'm attempting a check with no parity corrections to see if anything else comes up funky. I have run a few parity checks in the past (one runs automatically every month) with the same setup, with no problems.

If the controller only reports read errors on the drives, can the drives still be trusted by Unraid to work correctly? And what happens if a third drive drops with only read errors from the controller?

 

The best solution I've found, which seems risky for just having read errors reported:

1. Stop the array

2. Unassign both disks

3. Start the array

4. Stop the array

5. Assign the disks again

6. Start the array and allow the parity rebuild

7. If the rebuild is successful, re-verify by running a non-correcting parity check

 

Is this truly a drive issue, or is it just a controller nuance/setting?


I realized as I was shutting down to check that I should have saved the diag information... It doesn't look like anything was dumped to flash, which sucks (it would be a great Unraid feature to dump snapshots on errors by date/time). I do remember, though, that before the shutdown there were no sync errors... Having been on a stable setup for a few months, I took this to mean the drives just temporarily had an issue or the controller card farted.
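In the meantime, something like this could serve as a poor man's snapshot from the console (a sketch only, assuming Unraid 6.x's "diagnostics" command; the /boot/logs destination is my assumption about where it drops the zip):

diagnostics                    # writes a timestamped diagnostics zip to the flash drive
ls -lt /boot/logs | head -n 3  # newest snapshot should be at the top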

In the Areca web interface, I did find an "Auto Activate Incomplete Raid" setting that could have been the issue. From what I've researched, when the read errors occurred, the controller may have taken the RAID set (JBOD = single disk) offline, causing Unraid to scram. After I brought the system up with the changed Areca setting, the two drives no longer show as failed in the Areca interface.

I'm running a non-correcting parity check on the array to verify no other drives red-ball, but my next step will probably be an extended SMART test on the "bad" drives, and then possibly rebuilding the drives to verify that nothing in parity is screwed up. I did order replacement 5TB drives that will take a few days to arrive, so assuming no other drives red-ball, I may just slap in the two new drives and go with them... I just hate the time lag when on the edge of data loss: a day and a half for the parity check, a day and a half (at least) to preclear the drives, a day and a half to rebuild.

 

Areca log... I'm thinking the "Failed" entries were the controller setting kicking in after the multiple read errors.

 

2017-08-24 16:44:37 IDE Channel 13 Device Failed    
2017-08-24 16:44:37 Raid Set # 12 RaidSet Degraded    
2017-08-24 16:44:37 ST4000DM000-1F21 Volume Failed    
2017-08-24 11:00:43 IDE Channel 13 Reading Error    
2017-08-24 11:00:34 IDE Channel 13 Reading Error    
2017-08-24 11:00:26 IDE Channel 13 Reading Error    
2017-08-24 11:00:17 IDE Channel 13 Reading Error    
2017-08-24 11:00:08 IDE Channel 13 Reading Error    
2017-08-24 11:00:00 IDE Channel 13 Reading Error    
10 minutes ago, bsim said:

or the controller card farted.

 

Since there are 2 disabled disks, it's most likely this, and there aren't more because when a controller hiccups, unRAID only disables as many disks as there are parity disks. The controller hiccup may have been caused by a bad disk, though, so check SMART for all disks; if all are OK, rebuild the disabled disks.


That hiccup is kinda what I was leaning towards, given that there haven't been any issues with drives in previous parity checks, and then all of a sudden it burps on 2 drives with only read errors acknowledged by the controller. Looking at the SMART status on the controller itself and on Unraid's dashboard offhand, no glaring issues show (all green).

 

I noticed that disabled drives need the array to be stopped to run the SMART tests from the dashboard (the tests are greyed out), but do I have to stop the array to run them on disabled drives from the command line?

 

Just to make sure... are these the best SMART command lines on newer hardware?

 

DRIVE INFORMATION - smartctl -a /dev/sdX
DRIVE SHORT TEST - smartctl -t short /dev/sdX
DRIVE LONG TEST - smartctl -t long /dev/sdX
SHOW TEST RESULTS - smartctl -l selftest /dev/sdX
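If it helps, this is roughly how I'd chain them from the console (a sketch only; /dev/sdX is a placeholder, and the 2-minute wait is an assumption, since smartctl prints the actual estimated duration when the test starts):

smartctl -t short /dev/sdX     # starts the test and prints an estimated completion time
sleep 120                      # short tests typically finish in about 2 minutes
smartctl -l selftest /dev/sdX  # look for "Completed without error" in the log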

 


Something I have also noticed is that 99% of the SMART data in the Unraid webgui is garbage... I've researched a bit and found that Unraid hasn't implemented controller-manufacturer-specific SMART requests. To check from the command line...

 

It requires an initial

lsscsi -g | grep "Areca"

to get the controller's generic SCSI device (/dev/sg25 in my case), then

smartctl -a -d areca,1 /dev/sg25

to get the information for the drive that shows up as "sdb"

(the single-digit slot number corresponds to the drive letter for each disk... sdb=1, sdc=2, sdd=3...)
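A quick way to sanity-check that a slot number really matches a drive letter (a sketch; it compares the serial number reported through the Areca path against the one udev reports for the sd device):

smartctl -i -d areca,1 /dev/sg25 | grep -i 'serial'       # serial via the controller
udevadm info --query=property --name=/dev/sdb | grep ID_SERIAL  # serial via the kernel

If both print the same serial, the areca,1 = sdb mapping holds.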

 

I submitted a feature request hoping that something better will be implemented soon; it's been requested a few times since before 6.2 and doesn't seem to have made it into the final product. No response yet.

 

So my command lines for the Areca...

DRIVE INFORMATION - smartctl -a -d areca,1 /dev/sg25
DRIVE SHORT TEST - smartctl -t short -d areca,1 /dev/sg25
DRIVE LONG TEST - smartctl -t long -d areca,1 /dev/sg25
SHOW TEST RESULTS - smartctl -l selftest -d areca,1 /dev/sg25

 

 

 

12 hours ago, bsim said:

I submitted a feature request hoping that something better would be implemented soon, as it's been requested a few times since before 6.2 and doesn't seem like it's made it to the final product. No response yet.

 

There has been native support in the GUI since unRAID v6.0 for setting controller-specific settings on each individual disk. This allows you to obtain the SMART information in the correct way.


Tried switching to the Areca setting... no changes under Disks... It seems to be saying the drives need to be spun up, but mine never spin down anyway. Pressing "spin up" just refreshes the page and sticks with the "Unavailable - disk must be spun up" message. Even downloading the report gives me an error about the disk needing to be spun up. All of my testing buttons are greyed out... do I need to bring down my array or reboot my server to enable the Areca setting for SMART?


I got it! AAAAAAHHHHHH... Now I just have to correlate 24 drives using what I figured out before (sdb=1, sdc=2, sdd=3...) and enter it for each drive! UGH. Would there be a regular expression that would allow me to correlate the drive letters to their controller slots? I'm terrible at regular expressions, but having used them many times, it seems a jedi master of regular expressions could help users here.
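Something along these lines might at least save the typing from the console (a sketch only: it assumes the b..y drive letters really do map one-to-one onto Areca slots 1..24, and that the grep matches only the controller's processor device in the lsscsi output):

#!/bin/bash
# Walk the assumed sdb=1 ... sdy=24 letter-to-slot mapping and pull the key
# SMART attributes for each drive through the Areca's generic SCSI device.
SG=$(lsscsi -g | grep -i areca | head -n 1 | awk '{print $NF}')  # e.g. /dev/sg25
slot=1
for letter in {b..y}; do
    echo "=== /dev/sd${letter} -> areca,${slot} ==="
    smartctl -a -d "areca,${slot}" "${SG}" | grep -iE 'temperature|realloc|pending|uncorrect'
    slot=$((slot + 1))
done

It doesn't fill in the GUI fields for you, but it dumps everything in one pass.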

 

Essentially, setting the global SMART default to Areca under Settings > Disk Settings is worthless as far as I can tell, since it doesn't give Unraid a way to correlate the slot numbers and drive letters for you to save some time. Perhaps the global default is what the main page temperatures use to pull SMART data, and why the main-page temperatures still don't work correctly, but at least now each disk pulls the correct SMART data.

 

Why doesn't the main page pull the temperature using each disk's per-disk SMART setting? Is there a fix, or a place to poke around for one?

 

As a side note, not sure if it's because I'm currently doing a parity check, but "Unavailable - disk must be spun up" still seems to be keeping all of the buttons in the "Self-test" section greyed out... I have several hours until I can check with the array offline, or at least not in a parity-check cycle. I executed the short SMART test from the command line using the areca option, and when I pulled the report a while later the test was reported as "Interrupted (host reset)"... Perhaps because the drives are mounted and in use? I'll see if I can get more when the parity check ends (just under 8 hours to go).

 

 

[Attached image: Areca Successful.jpg]

