
Unclear disk issue



Happy New Year everybody. 

 

I have been having some strange issues with a disk since upgrading to 6.9.0-rc2. According to the UNRAID GUI it had 1024 errors and is now being emulated. This is the second time this has happened in the span of a week. The first time, I checked all the connections and rebuilt the disk, and it worked well for a couple of days before it happened again. In response to the error I ran an extended SMART test and it passed. I also cannot see any indication in the SMART report that the disk is failing. I have attached the diagnostics. The disk in question is DISK 1.

 

FYI, the disk is connected via a Silverstone ECS04 RAID card.

tower-diagnostics-20210103-0736.zip

Link to comment

I went through the rebuild and everything went well, so now I am again at a loss. The disk has no UDMA CRC errors and no reallocated sectors. This seems odd to me: from reading other posts and from past experience, a bad disk has always shown some form of permanent, increasing errors, while a bad connection made the CRC error count go up and then stop increasing once the connection issue was fixed. In this case I got neither scenario. JorgeB, do you happen to know what may have caused these strange errors that UNRAID displayed? Do you recommend I send the disk back for a warranty replacement after only 1300h of use?

Link to comment
57 minutes ago, GMAsterAU said:

a bad connection made the CRC error count go up

Not always. In fact, not usually. A bad connection will often result in the disk not even knowing there was a problem, since it can't be accessed at all. CRC and other SMART attributes are stored within the disk, and if the disk didn't know there was a connection problem, it can't record anything about it.
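For anyone who wants to check the two counters mentioned above from a script rather than the GUI, here is a minimal sketch that pulls raw values out of text in the attribute-table layout printed by `smartctl -A`. The sample text and the helper name `parse_smart_attributes` are assumptions made for illustration; the values are not taken from the diagnostics in this thread.

```python
import re

# Illustrative excerpt in the attribute-table layout printed by `smartctl -A`.
# The values are made up for this example, NOT copied from the attached diagnostics.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1300
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# id, name, flag, value, worst, thresh, type, updated, when_failed, raw (leading digits only)
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(3))
    return attrs


if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    for name in ("Reallocated_Sector_Ct", "UDMA_CRC_Error_Count"):
        print(name, attrs.get(name))
```

Keep in mind the point made above: a zero in these counters only says the disk itself did not record a problem, so it does not rule out a cable or controller issue.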

 

If you want further advice based on your current situation, post new diagnostics.

Link to comment
  • 4 weeks later...

After almost a month without issues, the same disk issue has appeared again. Overnight at 2 am DISK 1 showed read errors again. Once again there are no SMART errors reported as far as I can see. Following our previous discussion, is the bottom line that the disk is failing despite showing no SMART errors? Is there any way to know what kind of errors these are?

tower-diagnostics-20210131-0809.zip

Link to comment
  • 4 weeks later...

Hi all, so I did a lot of poking and testing and the symptoms keep getting weirder. As a note, all of these issues started with, and have persisted on, version 6.9.0-rc2.

1. Currently I have 2 disks sent away for replacement, so they are missing from the array, as discussed above.

 

2. I have discovered that two out of the 4 disks connected to the RAID card (Silverstone ECS04) show errors after about 1-2 days of server uptime. However, when I restart the server the errors are cleared and everything is good again until the cycle restarts.

2.1 Suspecting the RAID card, I have increased its cooling; however, when I checked its temperatures they did not exceed the manufacturer's recommendations.

 

3. As part of the whole 'weird errors are happening' situation, I have also discovered that user shares do not show up in the 'SHARES' menu, and when I connect to the server I can only see select trees while everything else is missing, requiring a restart to fix.

 

Before restart:

545483562_ScreenShot2021-02-26at6_44_02am.thumb.png.0ada17783e50ec51c919d1cbc9041e12.png

 

982400113_ScreenShot2021-02-26at6_46_02am.thumb.png.bd5f36a333f20f89cb1d7ac3f2cc5125.png

 

After restart:

 

1932526443_ScreenShot2021-02-26at7_04_48am.thumb.png.e06c7e5c853503537c03bf19b4d9277d.png

1163946971_ScreenShot2021-02-26at7_04_16am.thumb.png.a19cf3a4e2ad3dfc88f4dad23aeaf0a4.png

 

 

I have a couple key questions to understand what is going on:

 

1. does anyone know what kind of errors Unraid is recording and counting in the Main menu when the disk error rate goes up?

2. why do these errors get reset?

3. what could lead to me having to restart the server to get it all sorted?

4. what governs the shares information and where is it stored? am I looking at a failed RAM module perhaps?

 

thanks for all your help with this

 

 

 

tower-diagnostics-20210226-0643_before restart.zip tower-diagnostics-20210226-0702_after_restart.zip

Link to comment
  1. Any failed attempt to read or write a disk is counted; you can see these in the syslog in your diagnostics
  2. The error counts in the Errors column on Main always start at zero when the server boots. You can also reset them at Main - Array Operation - Clear Stats
  3. Maybe a controller problem that the restart is resetting. It looks like disks 5 and 6 were having problems, so that might indicate that something they have in common is the culprit
  4. The user shares are simply the aggregate of all top level folders on the pools and array, just another view of the disk files. If you create a user share, Unraid creates a top level folder named for the share on the pools or array as needed according to the settings for the share. Conversely, any top level folder on the pools or array is automatically a user share named for the folder. Problems reading the disks can sometimes interfere with aggregating the folders and so the user shares are "broken"
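To make point 4 concrete, here is a minimal sketch of the "aggregate of all top level folders" idea. It only models the behaviour described above and is not how Unraid implements user shares; the function name `aggregate_user_shares` is hypothetical, and the mount points in the usage note are assumed.

```python
import os


def aggregate_user_shares(mount_points):
    """Map each top-level folder name to the list of mount points that contain it."""
    shares = {}
    for mount in mount_points:
        try:
            entries = sorted(os.listdir(mount))
        except OSError:
            # A disk that cannot be read contributes nothing to the merged view,
            # so folders that live only on that disk drop out of the result.
            continue
        for name in entries:
            # Only top-level directories count; loose files are ignored.
            if os.path.isdir(os.path.join(mount, name)):
                shares.setdefault(name, []).append(mount)
    return shares
```

Called with something like `aggregate_user_shares(["/mnt/disk1", "/mnt/disk2", "/mnt/cache"])` (paths assumed here), the keys of the result correspond to the share names. The `except OSError` branch shows why read problems on a disk can make shares seem to vanish: an unreadable disk simply adds nothing to the merged view.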

 

 

 

Link to comment
24 minutes ago, JorgeB said:

I believe there have been other reports of issues with those disks when used on an LSI with v6.9, possibly a driver issue. You could try connecting them to the onboard SATA, swapping them with disks of a different model, of course.

sure! I will try that tomorrow morning and report back

Link to comment
On 2/26/2021 at 9:01 PM, GMAsterAU said:

sure! I will try that tomorrow morning and report back

@JorgeB I am amazed! So far the swap has worked fine. No errors have been reported under the same use after more than 2 days, where previously errors appeared after about 1 day. What I did was swap the drives connected to the RAID card from the 8TB Seagate IronWolf drives to two 3TB WD drives, and there have been no issues so far.

 

Is there a way to raise this with UNRAID? I imagine this is not an isolated case.

Link to comment
10 hours ago, JorgeB said:

It's not. You can try upgrading to v6.9 final, which might include a newer LSI driver, but LT can't do anything about this; it would be an LSI + those Seagate disks issue.

How unfortunate. Well, either way, thank you very much for your help with this. It looks like, after a lot of trial and error, I have reached a stable server configuration again.

Link to comment
