jcarre Posted June 15, 2022 Hello everyone, A couple of days ago I posted here, in a previous topic, because I had two disks fail. Everything seemed to be working fine, but yesterday the disk that had had problems started showing errors again. So I bit the bullet and decided to order a new drive (it's currently on its way) and just take the faulty disk out of the array. I have 2 parity drives, so it should be good. Yesterday though, out of nowhere, some of my shares became inaccessible, and in the GUI they were marked as not present. I restarted and they came back up, but after a while the same thing happened. Today I did another restart and after a few hours I found the array in the following state: Something is clearly wrong and I don't know what it might be. All my drives except the Seagate drives are connected through the "LSI SAS2116 PCI-Express Fusion-MPT SAS-2" card; could it be that it's faulty? I'm getting scared 😟. What steps should I take? I attached both the system logs and the diagnostics below. Thank you very much for your help! ivpiter-diagnostics-20220615-1316.zip ivpiter-syslog-20220615-1116.zip
JorgeB Posted June 15, 2022 The first thing to do is to check/replace cables for disk2 and the 2TB pool disk, then post new diags after array start.
trurl Posted June 15, 2022 1 hour ago, jcarre said: both the system logs the diagnostics below The system log since last boot is already included in diagnostics.
jcarre Posted June 15, 2022 Author 3 hours ago, JorgeB said: First things to do is to check/replace cables for disk2 and the 2TB pool disk, then post new diags after array start. I replaced the whole MiniSAS cable with one that I had in my other computer and connected the 2TB drive with a SATA cable directly to the motherboard. I also took out the LSI card and reseated it just in case (it looked good though). Now I have disk2 disabled and disk8 showing as "Unmountable". Disk4 was taken out of the array. ivpiter-syslog-20220615-1449.zip
trurl Posted June 15, 2022 2 hours ago, trurl said: system log since last boot is already included in diagnostics. And we much prefer diagnostics instead of just syslog, maybe you misunderstood. We don't need syslog if you post diagnostics. If you post syslog we don't have diagnostics. Attach diagnostics to your NEXT post in this thread.
JorgeB Posted June 15, 2022 Just now, trurl said: Attach diagnostics to your NEXT post in this thread. That and check filesystem on disks 2, 4 and 8.
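For readers who prefer the command line to the GUI's Check Filesystem button, the check on disks 2, 4 and 8 could be sketched roughly as below. This is only an illustration under assumptions: it assumes the disks are XFS, that the array is started in maintenance mode, and that array disk N is exposed as `/dev/mdN` (device naming can differ between Unraid releases). The helper name `xfs_check_command` is hypothetical, and the script only builds and prints the commands; it does not run them.

```python
import shlex


def xfs_check_command(disk_number: int, read_only: bool = True) -> list[str]:
    """Build (but do not run) an xfs_repair command line for one array disk."""
    if disk_number < 1:
        raise ValueError("array disk numbers start at 1")
    # Assumption: array disk N appears as /dev/mdN, so writes go through parity.
    device = f"/dev/md{disk_number}"
    cmd = ["xfs_repair"]
    if read_only:
        cmd.append("-n")  # no-modify mode: report problems, change nothing
    cmd.append("-v")      # verbose output
    cmd.append(device)
    return cmd


if __name__ == "__main__":
    # Disks named in the post above; start with a read-only pass on each.
    for disk in (2, 4, 8):
        print(shlex.join(xfs_check_command(disk)))
```

Starting with the read-only pass (`-n`) shows what a repair would do before anything on the disk is changed.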
trurl Posted June 15, 2022 3 hours ago, jcarre said: I'm getting scared Do you have backups of anything important and irreplaceable? Even dual parity is not a substitute for backups.
jcarre Posted June 15, 2022 Author 11 minutes ago, JorgeB said: That and check filesystem on disks 2, 4 and 8. I have attached diagnostics below. I'll start checking the filesystems, but disk4 is not currently installed in the array. 3 minutes ago, trurl said: Do you have backups of anything important and irreplaceable? Even dual parity is not a substitute for backups. Yes, I do have another server with the important data, but I also don't want to lose all the "unimportant" media. ivpiter-diagnostics-20220615-1709.zip
JorgeB Posted June 15, 2022 2 minutes ago, jcarre said: but disk4 is not currently installed in the array. It's still being emulated.
jcarre Posted June 15, 2022 Author I repaired the 3 drives. On disk2 and disk4 I had a bunch of errors and files were moved to lost+found, but the repair finished successfully. Just as I was writing this reply I lost connection to the server; it doesn't respond to a ping command. I'll try pressing the power button and wait for it to shut down. ivpiter-diagnostics-20220615-1936.zip
JorgeB Posted June 15, 2022 Doesn't sound like the server is very stable, and trying to repair/rebuild disks with an unstable server can cause more issues. See if you can get new diags after array start.
jcarre Posted June 15, 2022 Author 31 minutes ago, JorgeB said: Doesn't sound like the server is very stable, trying to repair/rebuild disks with an unstable server can cause more issues, see if you can get new diags after array start. It has been working rock solid for 1 year +; it started behaving erratically two weeks ago. Nothing in the configuration changed. I tested the RAM last time and have no idea how to test it further. Could it by any chance be a problem with the USB drive? I remember having problems with them in the past. I brought the server back up and started the array, which looked fine for a second, but then I got this message: Automatic Unraid no array operation will start. And the array seems like it's not started, and it doesn't give me any option other than shutdown or reboot. ivpiter-diagnostics-20220615-2020.zip
jcarre Posted June 15, 2022 Author After yet another reboot, the array came back online but disk 2 is still disabled. ivpiter-diagnostics-20220615-2155.zip
JorgeB Posted June 15, 2022 1 hour ago, jcarre said: Automatic Unraid no array operation will start. That's from the parity check tuning plugin. 6 minutes ago, jcarre said: After yet another reboot, the array came back online but disk 2 is still disabled. That's expected: it will remain disabled until it's rebuilt. Same for disk4; you either need to replace the missing disk or remove it from the array.
JorgeB Posted June 15, 2022 Check for a lost+found folder in the emulated disk2 and the other repaired disks. If there is one, check its contents; any files there could be lost data.
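Files that a filesystem repair moves to lost+found usually lose their original names, so checking the contents often starts with finding out what kind of files they are. A minimal sketch of one way to do that, by reading the first bytes of each file and comparing them with a few well-known file signatures, follows. The signature table is deliberately tiny, the function names `guess_type` and `survey` are hypothetical, and it assumes Python is available on the server, which a stock Unraid install may not provide.

```python
from pathlib import Path

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\x1a\x45\xdf\xa3": "matroska/webm",
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",
}


def guess_type(path: Path) -> str:
    """Return a coarse type label based on the file's first bytes."""
    with path.open("rb") as fh:
        head = fh.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def survey(lost_found: Path) -> dict[str, int]:
    """Count files under a lost+found directory by guessed type."""
    counts: dict[str, int] = {}
    for entry in lost_found.rglob("*"):
        if entry.is_file():
            kind = guess_type(entry)
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling `survey(Path("/mnt/disk2/lost+found"))` would then give a rough count per type, which helps decide whether sorting through the folder is worth the effort. The path is an example based on the disk discussed here.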
trurl Posted June 15, 2022 Nothing assigned as disk3, which I assume is correct. Disk2 disabled, disk4 disabled and missing, those emulated disks mounted and all other disks green and mounted. Is that correct? It looks like your lost+found share is on a lot more disks than just these we have been working on. Go to User Shares, click Compute... for lost+found, and post a screenshot. Have you examined lost+found at all? It often is a lot of work to actually recover any files from it, and you may have a lot of data there. Until you get your array stable, you might consider not rebuilding on top of any existing disk and only rebuilding onto spares, so the original disks aren't changed and might be used to recover files that repairs and rebuilds don't.
jcarre Posted June 15, 2022 Author 6 minutes ago, trurl said: Nothing assigned as disk3 which I assume is correct. Disk2 disabled, disk4 disabled and missing, those emulated disks mounted and all other disks green and mounted. Is that correct? Yes, disk3 was taken out of the array a long time ago (to shrink the array). Disk2 is disabled and disk4 is out of the array, although I still have it inside the case. I am waiting on a replacement for it. All the other drives are green. Here is a screenshot of the lost+found share plus the output of the ncdu command; it looks like 18+TB of data is in there. Now I have a new drive on the way, which should arrive on Monday, plus the one that I took out before (the Seagate). What is the best course of action?
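The per-disk breakdown that Compute... and ncdu give can also be approximated with a short script, which makes it easier to see which disks hold a lost+found folder and how much is in each. The sketch below assumes the array disks are mounted at `/mnt/disk1`, `/mnt/disk2`, and so on, and that Python is available on the server; the function names are hypothetical. It sums apparent file sizes, so the totals can differ somewhat from the disk-usage figures ncdu shows by default.

```python
import os
from pathlib import Path


def tree_size(root: Path) -> int:
    """Sum the apparent sizes (in bytes) of all files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # unreadable entry: skip it rather than abort the scan
    return total


def lost_found_by_disk(mnt: Path = Path("/mnt")) -> dict[str, int]:
    """Map each diskN that has a lost+found folder to that folder's size."""
    sizes: dict[str, int] = {}
    for disk in sorted(mnt.glob("disk[0-9]*")):
        candidate = disk / "lost+found"
        if candidate.is_dir():
            sizes[disk.name] = tree_size(candidate)
    return sizes


if __name__ == "__main__":
    for disk, size in lost_found_by_disk().items():
        print(f"{disk}: {size / 1e12:.2f} TB")
```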
trurl Posted June 15, 2022 The reason lost+found is on all those disks is because you have done filesystem repair on all those disks, some of which are not part of the problems in this thread. 2 hours ago, jcarre said: It has been working rock solid for 1 year +, it started behaving erratically two weeks ago. Were all these repairs done in the last 2 weeks?
jcarre Posted June 15, 2022 Author 1 minute ago, trurl said: The reason lost+found is on all those disks is because you have done filesystem repair on all those disks, some of which are not part of the problems in this thread. Were all these repairs done in the last 2 weeks? I don't remember doing any filesystem repair prior to the last thread that I started, which was 2 weeks ago. In that one I did disk 8 and disk 4. Today I did 2, 4 and 8. So I have no idea why 6, 7 and 10 are appearing here. I had one disk fail once, which I replaced and rebuilt from parity. There were also a lot of changes due to replacing drives, but the parity operation always showed 0 errors. I also used to do monthly parity checks (I set them to once every 3 months now), and never had a problem.
trurl Posted June 15, 2022 3 minutes ago, jcarre said: I have no idea why 6,7 and 10 are appearing here There are only two possible reasons: you repaired them, or you somehow moved lost+found from other disks to those disks. Unraid will not have moved them there by itself, and mover only moves between cache and array, never between array disks.
trurl Posted June 15, 2022 Does your controller have adequate cooling? Are you sure you don't have power problems? Any power splitters?
jcarre Posted June 15, 2022 Author 3 minutes ago, trurl said: Does your controller have adequate cooling? Are you sure you don't have power problems? Any power splitters? That could very well be a problem, since the card doesn't have a fan and it's been really hot in here lately. I use a Norco rack case with three fans at the front, but nothing at the back. Perhaps I could zip-tie one of those 40mm fans to the heat sink of the card. In terms of power, I have the server connected directly to a UPS, which is plugged directly into the wall. I am having a lot of power outages though, since the AC sometimes trips the breakers. 10 minutes ago, trurl said: There are only two possible reasons, you repaired them, or you somehow moved lost+found from other disks to those disks. Unraid will not have moved them there by itself and mover only moves between cache and array never between array disks. It's possible that I did some time ago; I can't remember. The sizes on these drives are a lot smaller though.
trurl Posted June 15, 2022 3 minutes ago, jcarre said: It could be possible that I did sometime ago, I can't remember. Maybe disk10 was repaired here? https://forums.unraid.net/topic/104247-xfs-corruption-disabled-disks-check-filesystem/ 4 minutes ago, jcarre said: The sizes on these drives are a lot smaller though. What do you mean by that?
trurl Posted June 15, 2022 50 minutes ago, jcarre said: ncdu command Thanks, I didn't know about that one.
jcarre Posted June 15, 2022 Author 8 minutes ago, trurl said: Maybe disk10 was repaired here? https://forums.unraid.net/topic/104247-xfs-corruption-disabled-disks-check-filesystem/ Oh yes, this was the very first day that I installed the new motherboard + CPU + RAM combo. I think that because I had XMP enabled, everything started to get corrupted. I changed that and all was good. It's possible that I repaired the disks then. 10 minutes ago, trurl said: What do you mean by that? I meant that the sizes of the "files" are a lot smaller; there is a lot less data in there.