May 28, 2018 Three drives are showing SMART warnings, so I've got three more 4TB drives to replace them. All existing drives are 4TB. One of the 2 parity drives is failing, and 2 of the 5 data drives in the array are failing. What is the best procedure to replace them all? One at a time? Which one should I start with? Should I do the parity drive first, then replace the other two drives at the same time?
May 29, 2018 (Author) I can do this later, but is there a reason for it? I'm not asking for troubleshooting, just advice on what order to replace and rebuild. My guess is that the way to go is to replace the parity drive and let it rebuild, then replace one or both of the data drives and let that rebuild.
May 29, 2018 36 minutes ago, AJ Ouellet said: "Not asking for troubleshooting just advice on what order to replace and rebuild." Without knowing what condition the drives are in, there is no way to know what the safest order of replacement is going to be. You say they are failing, and have SMART warnings, but that is a very generic statement. We have no way of knowing what the actual condition is. With 3 suspect drives and 2 parity drives, you will lose data if one of the other drives is unable to be read correctly while reconstructing the one you replace. Proceeding blindly is not a good idea, and you asked for help, which is good. You deny us the ability to analyse the problem, which leaves us just as blind as you as to what to do.
May 29, 2018 @AJ Ouellet I would test out the new drives, and then add them as UDs (Unassigned Devices). Then you can copy the data from the drives you want to replace to the new drives in parallel. Optionally, you can compare checksums to ensure everything copied correctly. Then you can do a New Config, excluding the failing disks and including the new ones. Parity will then build. You can hang on to the three old drives as backups. This is a lot faster and handles failure scenarios better, IMO.
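The optional checksum comparison mentioned above could be sketched roughly as follows. This is a minimal illustration, not an Unraid tool: the function names `file_sha256` and `compare_trees` are made up for this example, SHA-256 is just one reasonable choice of checksum, and the mount paths in the usage note are assumptions about where the old disk and the new Unassigned Device are mounted.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, copy_root):
    """Return a sorted list of files that are missing or differ in the copy.

    Only files present under source_root are checked; extra files that
    exist only under copy_root are ignored.
    """
    source_root, copy_root = Path(source_root), Path(copy_root)
    problems = []
    for source_file in source_root.rglob("*"):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_root)
        copied_file = copy_root / relative
        if not copied_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_sha256(source_file) != file_sha256(copied_file):
            problems.append(f"differs: {relative.as_posix()}")
    return sorted(problems)
```

For example, `compare_trees("/mnt/disk3", "/mnt/disks/new_disk3")` (both paths hypothetical) would return an empty list if every file on the old disk has an identical copy on the new one.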
May 29, 2018 (Author) 6 hours ago, jonathanm said: "Without knowing what condition the drives are in, there is no way to know what the safest order of replacement is going to be. You say they are failing, and have smart warnings, but that is a very generic statement. We have no way of knowing what the actual condition is. With 3 suspect drives and 2 parity drives, you will lose data if one of the other drives is unable to be read correctly while reconstructing the one you replace. Proceeding blindly is not a good idea, and you asked for help, which is good. You deny us the ability to analyse the problem, which leaves us just as blind as you as to what to do." Understood. The parity drive has 600+ CRC errors, one of the data drives has 300+, and the last one has only 1. I'll still try to post the diagnostics later tonight when I'm home with the server.
May 29, 2018 1 hour ago, AJ Ouellet said: "Understand. Parity drive has 600+ crc errors. 1 of the data drives has 300+ and the last one has only 1. I'll still try to post the diagnostics later tonight when I'm home with the server." The CRC errors are normally not disk errors - they are caused by failed transfers between disk and controller and are quite often caused by the cable. It's some of the other numbers that are way more interesting when it comes to disk health.
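To illustrate the distinction between link errors and disk-health numbers, here is a rough sketch of a triage over SMART raw values. The post does not say which "other numbers" are meant, so the attributes listed as media indicators below are an assumption (they are commonly watched smartctl attribute names), and treating any nonzero raw value as a warning is a simplification. The function name `triage` and its return labels are made up for this example.

```python
# Attribute that counts failed transfers between disk and controller
# (usually a cabling or connection issue rather than a failing disk).
LINK_ATTRIBUTES = {"UDMA_CRC_Error_Count"}

# Assumed set of attributes that say more about the disk surface itself.
MEDIA_ATTRIBUTES = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Reported_Uncorrect",
}


def triage(raw_values):
    """Classify a dict of SMART attribute name -> raw value.

    Returns a (verdict, attribute_names) tuple. Media indicators take
    priority over link errors because they point at the disk itself.
    """
    media = sorted(
        name for name, value in raw_values.items()
        if name in MEDIA_ATTRIBUTES and value > 0
    )
    link = sorted(
        name for name, value in raw_values.items()
        if name in LINK_ATTRIBUTES and value > 0
    )
    if media:
        return "check disk", media
    if link:
        return "check cabling", link
    return "no warning", []
```

With illustrative values resembling the parity drive described above, `triage({"UDMA_CRC_Error_Count": 600, "Reallocated_Sector_Ct": 0})` points at cabling rather than the disk.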
May 29, 2018 (Author) 21 minutes ago, pwm said: "The CRC errors are normally not disk errors - they are caused by failed transfers between disk and controller and are quite often caused by the cable. It's some of the other numbers that are way more interesting when it comes to disk health." Two of the drives should be replaced anyways, as they have 4+ years of power-on time. Will post the diagnostics ASAP.
May 29, 2018 5 minutes ago, AJ Ouellet said: "Two of the drives should be replaced anyways as they have 4+ years power on times." Not an indicator of health. I have many drives still in perfect working order with twice that many working hours. When you attach your diagnostics zip file, also list your general configuration (MB, HBA, PSU, etc). Many times symptoms that first show up as drive errors can be attributed to other factors, and changing drives can actually make things worse instead of better. Do you have a complete set of backups for all files you don't want to lose?
May 29, 2018 (Author) 10 minutes ago, jonathanm said: "Not an indicator of health. I have many drives still in perfect working order with twice that many working hours. When you attach your diagnostics zip file, also list your general configuration (MB, HBA, PSU, etc). Many times symptoms that first show up as drive errors can be attributed to other factors, and changing drives can actually make things worse instead of better. Do you have a complete set of backups for all files you don't want to lose?" No backups, but it's just media that can be acquired again should the worst happen. Will post the diagnostics tonight after the kiddos get to bed.
May 29, 2018 (Author) I'd have to open the box to find out what PSU I'm running, but I think it is a Thermaltake 800W. Here is the diagnostics file and a screenshot of the system specs from the Unraid info page. mediaserver-smart-20180527-1617.zip
May 30, 2018 53 minutes ago, AJ Ouellet said: "I'd have to open the box to find out what PSU i'm running but I think it was a thermaltake 800w. Here is the diag and screenshot of system specs from unraid info. mediaserver-smart-20180527-1617.zip" No, that is not the diagnostics. It is just SMART for a single disk. 21 hours ago, trurl said: "Tools - Diagnostics, post complete zip"
May 30, 2018 (Author) Not sure what I did before; I was in a rush. This should have it all. mediaserver-diagnostics-20180529-2118.zip
May 30, 2018 15 hours ago, pwm said: "The CRC errors are normally not disk errors - they are caused by failed transfers between disk and controller and are quite often caused by the cable." And if you recently upgraded from some version before 6.4, it may be that you are just now noticing these because the new version is now monitoring them by default. They might be old connection issues that aren't even occurring now. You can acknowledge them and you won't get notified again unless they increase. I don't see any reason to replace any disks. None are disabled and there are no serious SMART issues. If it ain't broke, don't fix it.
May 30, 2018 (Author) 1 hour ago, trurl said: "And if you recently upgraded from some version before 6.4 it may be that you are just now noticing these because the new version is now monitoring them by default. They might be old connection issues that aren't even occurring now. You can acknowledge them and you won't get notified again unless they increase. I don't see any reason to replace any disks. None are disabled and no serious SMART issues. If it ain't broke don't fix it." OK, thanks. I'll acknowledge them and see if they come back. I have 1 more port, so I'll be able to add another 4TB drive... Now to find a card to add a few more ports so I can use the rest of the drives.
May 30, 2018 5 hours ago, AJ Ouellet said: "Ok thanks. I'll acknowledge them and see if they come back. I have 1 more port so i'll be able to add another 4tb... Now to find a card to add a few more ports so I can use the rest of the drives." I recommend only adding disks as needed to increase capacity. Adding disks just because you have them is only adding more points of failure. Even more so if you have to add a controller just to add the disks. I guess the counter-argument to that is your warranty clock is ticking, but you could just test them really well with preclear or something and then set them aside so they aren't accumulating power-on hours.
This topic is now archived and is closed to further replies.