Sk8rSeth Posted June 13, 2022 (edited: word choice for clarity) woke up this morning to a nasty red common problems error message. it appears that one of my disks is having problems but i cant figure out exactly what it is. i just switched my sata connections from one of these nvme breakout boards to one of these H200 in IT Mode and everything seemed to be going well yesterday. no issues at all that i could find. but today i noticed (im pretty sure this part is not new) that disk4 has way more writes than any other non-parity disk in the array and im not sure why? (also attached the view of the array) and then obviously today its been pulled (disabled) from the array and is not showing any errors that i know how to recognize in the Tools > Diagnostics files (attached). disk4 is the only one not labelled as such, ID ends in R5SK. can anyone shed some light on what might have happened? is this a bad cable/connection or does the disk need replacing? im still fairly new to the diagnostics part of server management so any help and education would be greatly appreciated. datass-diagnostics-20220613-0924.zip
trurl Posted June 13, 2022 2 minutes ago, Sk8rSeth said: "today its been pulled from the array" Unclear about your choice of words here. By "pulled" do you just mean the disk is disabled, or do you mean something else?
Sk8rSeth (Author) Posted June 13, 2022 Just now, trurl said: "Unclear about your choice of words here. By 'pulled' do you just mean the disk is disabled or do you mean something else." ah! sorry, its still physically in the server and such, just no longer being read from or written to in the array. the device was _disabled_ is what i should have said
trurl Posted June 13, 2022 Emulated disk4 looks like it is mounted and has data. The smart folder in diagnostics seems to indicate the disk in question is unassigned, and syslog also has a line about Unassigned Devices looking at that disk. Is that screenshot current?
trurl Posted June 13, 2022 SMART for the disk looks OK, but you need to see if you can fix its connection.
JorgeB Posted June 13, 2022 10 minutes ago, Sk8rSeth said: "that disk4 has way more writes than any non-parity disk in the array and im not sure why?" Don't worry about it; the number of writes is basically meaningless. For example, you can run a parity check with identical disks and some end up with many more reads than others, like double or more. The actual disk looks fine; this is likely a power/connection problem.
Sk8rSeth (Author) Posted June 13, 2022 6 minutes ago, trurl said: "Is that screenshot current?" yes, taken right as i posted this. i will shut down the whole thing and reseat the cables, and see if that fixes the issue. the random high write count made me think maybe it wasnt the 'thing i last changed'. @JorgeB how can you tell the actual disk looks fine? is it just the lack of SMART errors or something else that i can start checking in these situations?
JorgeB Posted June 13, 2022 SMART looks 100% healthy and the errors in the syslog when it was disabled don't indicate a media error.
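For anyone wanting to repeat this kind of check on their own SMART report, here is a minimal Python sketch. It assumes the usual `smartctl -A` attribute table layout (ten columns, raw value last) and a hand-picked set of attributes; which attributes were actually looked at in this thread is not stated, so `MEDIA_ATTRS`, `LINK_ATTRS`, and the sample report below are illustrative assumptions, not data from the attached diagnostics.

```python
# Attributes often used to separate media trouble from link trouble.
# ASSUMPTION: this selection is illustrative, not taken from the thread.
MEDIA_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}
LINK_ATTRS = {"UDMA_CRC_Error_Count"}


def parse_smart_attributes(report):
    """Return {attribute_name: raw_value} for the watched attributes."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            name, raw = fields[1], fields[9]
            if name in MEDIA_ATTRS | LINK_ATTRS and raw.isdigit():
                values[name] = int(raw)
    return values


def summarize(values):
    """Very rough triage of the watched raw values."""
    if any(values.get(name, 0) > 0 for name in MEDIA_ATTRS):
        return "possible media problem"
    if any(values.get(name, 0) > 0 for name in LINK_ATTRS):
        return "possible cable/connection problem"
    return "no obvious problem in watched attributes"


# HYPOTHETICAL sample report, invented for illustration only.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    print(summarize(parse_smart_attributes(SAMPLE_REPORT)))
```

A non-zero CRC count with clean sector counts is the pattern usually associated with cabling rather than the platters, which fits the "power/connection problem" reading above, but treat the thresholds here as a starting point only.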
Sk8rSeth (Author) Posted June 13, 2022 (edited) awesome thanks! is there any test or procedure i can use to test the disk/array after reseating the cables and restarting to see if that was actually the problem? will unraid throw the same error immediately and disable the disk like before if the connections werent the issue?
JorgeB Posted June 13, 2022 A non-correcting parity check is a good test.
Sk8rSeth (Author) Posted June 13, 2022 do i need to remove the 'config' from the Historical Devices section (which i believe is just the Unassigned Devices plugin?) and more importantly, do i need to unassign the disk before starting the array? upon startup of the server again, disk4 still shows the 'device is disabled' red X before starting the array. im not sure of the procedure here, and the last time i messed with things i didnt really know, i lost an entire disk's worth of data. so im trying to be especially cautious here
JorgeB Posted June 13, 2022 https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
Sk8rSeth (Author) Posted June 13, 2022 oh sweet thank you!
Sk8rSeth (Author) Posted June 13, 2022 (edited) okay well i think i have a bigger problem than just the connections. after reseating the cables on both ends, and starting the rebuild for disk4, another disk is throwing a bunch of errors. specifically: "Jun 13 12:24:38 DATAss kernel: md: disk1 write error, sector=2021486584" a ton of times. does this mean maybe a bad cable set? im using these cables, so its possible both of those drives are part of the same cable chain. also how do i deal with this now that im over an hour into the rebuild of disk4? do i let the rebuild continue? pause it? i have no idea what to do
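To see at a glance which disks are affected and how often, the repeated syslog lines can be tallied. The following is a minimal Python sketch; the pattern is built from the single write-error line quoted in the post, and the assumption that read errors are logged in the same shape is mine. The helper name `count_md_errors` is hypothetical.

```python
import re
from collections import Counter

# Shaped after the line quoted in the post:
#   "Jun 13 12:24:38 DATAss kernel: md: disk1 write error, sector=2021486584"
# ASSUMPTION: read errors use the same layout with "read" in place of "write".
MD_ERROR = re.compile(r"kernel: md: (disk\d+) (read|write) error, sector=(\d+)")


def count_md_errors(lines):
    """Count md read/write errors per (disk, operation) pair."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            disk, operation = match.group(1), match.group(2)
            counts[(disk, operation)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        # First line is the one from the post; the second is an unrelated,
        # made-up line to show that non-matching lines are skipped.
        "Jun 13 12:24:38 DATAss kernel: md: disk1 write error, sector=2021486584",
        "Jun 13 12:24:39 DATAss kernel: some unrelated message",
    ]
    for (disk, operation), n in sorted(count_md_errors(sample).items()):
        print(f"{disk}: {n} {operation} error(s)")
```

In practice the input would be the lines of the syslog from the diagnostics zip, for example `count_md_errors(open(path))` with `path` pointing at the extracted file. If the errors cluster on disks that share one breakout cable, that would support the bad-cable theory.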
trurl Posted June 13, 2022 Possibly you disturbed the connections on that disk when you were trying to fix the other. Make sure there is no tension on the cables that might pull on the connections. Power connections could also be a problem, including any splitters. Attach new diagnostics to your NEXT post in this thread.
Sk8rSeth (Author) Posted June 13, 2022 (edited) i have custom power cables that i made myself, and they have been working flawlessly for over a year now, but i also reseated every connection to all the HDDs earlier when i started this diagnosis, so i suppose i could have not reseated them properly, however unlikely that might be? i also have all the drives in a fractal design node 304 case, which has two 'banks' of four drive cages. and the right most drive cage sits pretty close above the PSU, leaving little room for SATA cables to bend around and find their way. they didnt seem under any stress to me, but these mini-sas to SATA cables for the h200 are new to me, is it possible theyre just way more fragile than i thought? do i need to wait the 19 more hours for this current rebuild to complete, hoping no other disks fail in that time, before i can try to mess with the cables more? or can i pause the rebuild and try to fix the connections, then restart the rebuild? i am deeply nervous about the fact that two drives are in a failed state, which means with two parity my whole array is at the limits of its protection. stressful. attached is new diagnostics i just pulled, but im still not familiar enough to understand all im looking at. datass-diagnostics-20220613-1353.zip
trurl Posted June 13, 2022 Similar to the other. 3 hours ago, JorgeB said: "SMART looks 100% healthy and the errors in the syslog when it was disable don't indicate a media error." If you stop the rebuild, it will have to start over from the beginning. On the other hand, if you stop it, you can rebuild both at once.
trurl Posted June 13, 2022 And the emulated disks are mounted so that looks good. Do you have backups of anything important and irreplaceable?
Sk8rSeth (Author) Posted June 13, 2022 Just now, trurl said: "Do you have backups of anything important and irreplaceable?" irreplaceable stuff, yes. but not all my important stuff (plus its just simply a LOT of data) so i would really want to take the least risky path to maintain data. its good to know i can stop the rebuild and start it again, but i am still not sure of the problem itself. is there any way to narrow it down to cables or maybe the h200 is bad, or something like that? if im rebuilding two disks at once, that means im out of redundancy and the whole array is at risk, right? and since those SATA cables from the h200 are in groups of four, that would mean a bad cable could spell the end of 4 drives in the worst case scenario, right? is this the kind of thing where if i start a rebuild, and another drive or two goes down just like this disk1 and disk4 situation, i can stop the rebuild, assume its a bad cable, and replace the cable without losing all my data?
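The redundancy arithmetic in that question can be written out explicitly. This is only a sketch of the counting, assuming each parity disk lets the array emulate one missing data disk; the function name `failures_still_tolerated` is hypothetical and nothing here models how Unraid itself behaves during a rebuild.

```python
def failures_still_tolerated(parity_disks, unavailable_disks):
    """How many more disks can drop out before some data cannot be emulated.

    A negative result means more disks are unavailable than parity can cover.
    """
    return parity_disks - unavailable_disks


if __name__ == "__main__":
    PARITY_DISKS = 2  # the array in this thread has two parity disks

    # disk1 and disk4 both disabled: nothing left in reserve.
    print(failures_still_tolerated(PARITY_DISKS, 2))

    # Worst case raised in the post: all four drives on one breakout cable drop.
    print(failures_still_tolerated(PARITY_DISKS, 4))
```

The count only says how many disks the array can stand in for at once. A drive that drops off because of a cable may well still hold its data, which is the reason for the question that follows about rebuilding onto spare disks instead of overwriting the originals.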
trurl Posted June 13, 2022 Do you have spare disks you can use for rebuilding, so that you do not overwrite the originals by rebuilding on top?
Sk8rSeth (Author) Posted June 13, 2022 not currently, but thats in my plan. if i were to go buy a spare disk or two to trade these out, and needed the data thats on the old disks im replacing, how would i access it? i like the idea of not overwriting it by building on top just in case, but i have no idea how i would do anything with those old drives?
trurl Posted June 13, 2022 They can be accessed with the Unassigned Devices plugin.
Sk8rSeth (Author) Posted June 13, 2022 really? thats awesome! do you have any documentation for how to do that? would i need to add them back to the server internally, or could i drop them into an external usb interface? how do i pull data off them and onto the array? sorry for all the questions, im learning so much!
trurl Posted June 14, 2022 According to your diagnostics you already have Unassigned Devices installed. You can go directly to the correct Support Thread for any of your plugins by clicking its Support Thread link on the Plugins page. 16 hours ago, Sk8rSeth said: "external usb interface?" You can do it that way, though some USB enclosure implementations might not work as well in some situations. 16 hours ago, Sk8rSeth said: "how do i pull data off them and onto the array?" The best option is the Dynamix File Manager plugin, but it is only available on Unraid 6.10 or later. There are lots of other ways, including over the network.