
disk4 has much higher writes than the rest and is disabled now, but SMART reads no errors?



woke up this morning to a nasty red common problems error message. 

 

it appears that one of my disks is having problems but i cant figure out exactly what it is. i just switched my sata connections from one of these nvme breakout boards to one of these H200 in IT Mode and everything seemed to be going well yesterday. no issues at all that i could find. 

but today i noticed (im pretty sure this part is not new) that disk4 has way more writes than any non-parity disk in the array and im not sure why? (also attached the view of the array)

 

and then obviously today its been pulled disabled from the array and is not showing any errors that i know how to recognize in the Tools > Diagnostics files (attached). disk4 is the only one not labelled as such, ID ends in R5SK

 

can anyone shed some light on what might have happened? is this a bad cable/connection or does the disk need replacing? im still fairly new to the diagnostics part of server management so any help and education would be greatly appreciated

unraid-array-06132022.png

datass-diagnostics-20220613-0924.zip

Edited by Sk8rSeth
word choice for clarity
Link to comment
Just now, trurl said:

Unclear about your choice of words here. By "pulled" do you just mean the disk is disabled, or do you mean something else?

ah! sorry, its still physically in the server and such, just no longer being read from or written to in the array. the device was _disabled_ is what i should have said

Link to comment
10 minutes ago, Sk8rSeth said:

that disk4 has way more writes than any non-parity disk in the array and im not sure why?

Don't worry about it, the number of writes is basically meaningless. For example, you can run a parity check with identical disks and some end up with many more reads than others, like double or more.

 

Actual disk looks fine, likely a power/connection problem.

Link to comment

 

6 minutes ago, trurl said:

Is that screenshot current?

yes, taken right as i posted this.

 

i will shut down the whole thing and reseat the cables, and see if that fixes the issue. the random high write count made me think maybe it wasnt the 'thing i last changed'. 

 

@JorgeB how can you tell the actual disk looks fine? is it just the lack of SMART errors or something else that i can start checking in these situations?

Link to comment

awesome thanks!

 

is there any test or procedure i can use to test the disk/array after reseating the cables and restarting to see if that was actually the problem?

will unraid throw the same error immediately and disable the disk like before if the connections werent the issue?

Edited by Sk8rSeth
Link to comment

do i need to remove the 'config' from the Historical Devices section (which i believe is just the Unassigned Devices plugin?) and more importantly, do i need to unassign the disk before starting the array? upon startup of the server again, disk4 still shows the 'device is disabled' red X before starting the array.

 

im not sure the procedure here, and the last time i messed with things i didnt really know, i lost an entire disk's worth of data. so im trying to be especially cautious here

Link to comment

okay well i think i have a bigger problem than just the connections. after reseating the cables on both ends, and starting the rebuild for disk4, another disk is throwing a bunch of errors. specifically:

Jun 13 12:24:38 DATAss kernel: md: disk1 write error, sector=2021486584

a ton of times. 
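For anyone wanting to see which disks are logging these errors and how often, here is a minimal sketch that tallies md error lines per disk. The pattern is modeled only on the single syslog line quoted above; the `read` variant, the second sector number in the sample, and the helper name `count_md_errors` are assumptions for illustration, and other Unraid versions may word these lines differently.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this post (assumption:
# read errors use the same wording with "read" in place of "write").
MD_ERROR = re.compile(r"kernel: md: (disk\d+) (read|write) error, sector=(\d+)")


def count_md_errors(lines):
    """Return a Counter keyed by (disk, operation) for md error lines."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            disk, operation, _sector = match.groups()
            counts[(disk, operation)] += 1
    return counts


# The first line is from the post; the other two are hypothetical.
sample = [
    "Jun 13 12:24:38 DATAss kernel: md: disk1 write error, sector=2021486584",
    "Jun 13 12:24:38 DATAss kernel: md: disk1 write error, sector=2021486592",
    "Jun 13 12:24:39 DATAss kernel: some unrelated message",
]

for (disk, operation), n in sorted(count_md_errors(sample).items()):
    print(f"{disk}: {n} {operation} error(s)")
```

To run it against a real log, pass an open file object (for example the syslog file extracted from the diagnostics zip) in place of `sample`.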

 

does this mean maybe a bad cable set? im using these cables, so its possible both of those drives are part of the same cable chain.

 

also how do i deal with this now that im over an hour into the rebuild of disk4? do i let the rebuild continue? pause it? i have no idea what to do

error2.png

Edited by Sk8rSeth
Link to comment

i have custom power cables that i made myself, and have been working flawlessly for over a year now, but i also reseated every connection to all the HDDs earlier when i started this diagnosis, so i suppose i could have not reseated them properly, however unlikely that might be?

i also have all the drives in a fractal design node 304 case, which has two 'banks' of four drive cages. and the right most drive cage sits pretty close above the PSU, allowing little room for SATA cables to bend around and find their way. they didnt seem under any stress to me, but these mini-sas to SATA cables for the h200 are new to me, is it possible theyre just way more fragile than i thought?

 

do i need to wait the 19 more hours for this current rebuild to complete, hoping no other disks fail in that time before i can try to mess with the cables more? or can i pause the rebuild and try to fix the connections, then restart the rebuild?

 

i am deeply nervous about the fact that two drives are in a failed state, which means with two parity my whole array is at the limits of its protection. stressful.
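The arithmetic behind that worry can be written out as a small sketch. It assumes the usual parity rule that an array with N parity disks can reconstruct at most N unavailable data disks at once; the function name `remaining_redundancy` is made up for illustration.

```python
def remaining_redundancy(parity_disks: int, unavailable_disks: int) -> int:
    """How many more disks could drop out before data can no longer be
    reconstructed. Zero means no margin left; negative means data is at risk.
    """
    return parity_disks - unavailable_disks


# Dual parity with disk4 being rebuilt and disk1 erroring: no margin left.
print(remaining_redundancy(parity_disks=2, unavailable_disks=2))

# Dual parity if one four-drive breakout cable took all four drives offline.
print(remaining_redundancy(parity_disks=2, unavailable_disks=4))
```

A disk that is disabled or still being rebuilt counts as unavailable here, even if the drive itself is healthy.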

 

attached is new diagnostics i just pulled, but im still not familiar enough to understand all im looking at.

datass-diagnostics-20220613-1353.zip

Edited by Sk8rSeth
Link to comment

Similar to the other

3 hours ago, JorgeB said:

SMART looks 100% healthy and the errors in the syslog when it was disabled don't indicate a media error.
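As a rough illustration of what checking SMART can look like, the sketch below pulls a few commonly watched attributes out of an attribute table in the layout printed by `smartctl -A`. This is an assumption about which attributes are worth watching, not a record of what was actually inspected in this thread. The sample table and its values are hypothetical, and `watched_raw_values` is a made-up helper name. Nonzero reallocated or pending sectors usually point at the media, while a climbing UDMA CRC error count usually points at the cable or connection.

```python
# Attributes commonly watched (assumption, not taken from this thread).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def watched_raw_values(smartctl_text: str) -> dict:
    """Map each watched attribute name to its integer raw value."""
    values = {}
    for line in smartctl_text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in WATCHED:
            values[parts[1]] = int(parts[9])
    return values


# Hypothetical sample in the `smartctl -A` table layout.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""

for name, raw in watched_raw_values(sample).items():
    print(f"{name}: {raw}")
```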

 

If you stop the rebuild it will have to start it over from the beginning. On the other hand, if you stop it, you can rebuild both at once.

Link to comment
Just now, trurl said:

Do you have backups of anything important and irreplaceable?

irreplaceable stuff, yes. but not all my important stuff (plus its just simply a LOT of data) so i would really want to take the least risky path to maintain data. its good to know i can stop the rebuild and start it again, but i am still not sure of the problem itself.

 

is there any way to narrow it down to cables or maybe the h200 is bad, or something like that? if im rebuilding two disks at once, that means im out of redundancy and the whole array is at risk, right? and since those SATA cables from the h200 are in groups of four, a bad cable could spell the end of 4 drives in the worst case, right? is this the kind of thing where if i start a rebuild, and another drive or two goes down just like this disk1 and disk4 situation, i can stop the rebuild, assume its a bad cable, and replace the cable without losing all my data?

Link to comment

not currently, but thats in my plan. if i were to go buy a spare disk or two to trade these out, and would need the data thats on the old disks im replacing, how would i access it? i like the idea of not overwriting it by building on top just in case but i have no idea how i would do anything with those old drives?

Link to comment

really thats awesome! do you have any documentation for how to do that? would i need to add them back to the server internally, or could i drop them into an external usb interface? how do i pull data off them and onto the array? 

 

sorry for all the questions, im learning so much!

Link to comment

According to your diagnostics you already have Unassigned Devices installed. You can go directly to the correct Support Thread for any of your plugins by clicking its Support Thread link on the Plugins page.

 

16 hours ago, Sk8rSeth said:

external usb interface?

You can do it that way though some USB enclosure implementations might not work as well for some situations.

 

16 hours ago, Sk8rSeth said:

how do i pull data off them and onto the array?

Best is Dynamix File Manager plugin, but only available on Unraid 6.10 or later.

 

Lots of other ways including over the network.

Link to comment
