August 15, 2023
Hello,

My server was running a parity check last night and I got a disk error. Disk1 has errored on me before, and I replaced it with the current disk, which has continued to have intermittent failures (the replacement has been disabled due to errors about once every couple of months, 2 to 3 times so far). I know I need to update my hardware (CPU/mobo/RAM) because what I'm running is ancient, but I have a couple of questions about my current situation.

1. Is it ok to rebuild onto disk1 based on the attached logs?
2. Since I've replaced both the SATA cable on disk1 and the drive itself, I wonder whether the real issue is that port failing on my mobo. If so, I realize the long-term fix would be to replace the CPU/mobo/RAM, but in preparation for that, would it be a good idea to consolidate disks so that I no longer need the SATA port disk1 is connected to? I have several 1TB drives I could roll up into 3TB drives to reduce the total number of connected drives. That way I wouldn't have to carry over as many drives when I replace the mobo, and I'd have a spare SATA port for switching disks to a newer FS. Or I could upgrade the parity to a 4TB disk and then buy another 4TB data disk.

Thanks for the help. It's pretty remarkable that I've been running this machine for over 13 years with few issues and little maintenance until now.

tower-diagnostics-20230815-0836.zip
Edited August 15, 2023 by cobolstinks
August 15, 2023 Community Expert
It's showing issues with multiple disks, so it's possibly a problem with the onboard SATA controller; it could also be a PSU issue. Since disk1 dropped offline, power cycle the server and post new diags after array start.
August 15, 2023 Author
2 hours ago, JorgeB said: It's showing issues with multiple disks, so it's possibly a problem with the onboard SATA controller; it could also be a PSU issue. Since disk1 dropped offline, power cycle the server and post new diags after array start.

Ok, I rebooted and started the array; here is the updated diagnostic report. Where are you seeing errors on the other disks? If I poke around in the syslog, all I see are I/O errors on disk1. Things like:
Aug 14 15:36:12 Tower kernel: md: disk1 write error, sector=3452280880
and
Aug 14 15:12:48 Tower kernel: md: disk1 read error, sector=3452277416

tower-diagnostics-20230815-1004.zip
Edited August 15, 2023 by cobolstinks
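For anyone wanting to tally these lines rather than eyeball them, here is a minimal Python sketch that counts md read/write errors per disk. It assumes the syslog lines follow exactly the format quoted in the post above; the function name `count_md_errors` and the `sample` list are made up for illustration. It only sees md-layer lines, so lower-level controller errors, which are logged in a different format, would not be counted.

```python
import re
from collections import Counter

# Matches md driver lines in the format quoted in the post, e.g.
# "Aug 14 15:36:12 Tower kernel: md: disk1 write error, sector=3452280880"
MD_ERROR = re.compile(r"kernel: md: (disk\d+) (read|write) error, sector=(\d+)")


def count_md_errors(lines):
    """Return a Counter keyed by (disk, operation) for md read/write errors."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            disk, operation, _sector = match.groups()
            counts[(disk, operation)] += 1
    return counts


# The two lines quoted in the post, used as sample input.
sample = [
    "Aug 14 15:36:12 Tower kernel: md: disk1 write error, sector=3452280880",
    "Aug 14 15:12:48 Tower kernel: md: disk1 read error, sector=3452277416",
]

for (disk, operation), n in sorted(count_md_errors(sample).items()):
    print(f"{disk} {operation} errors: {n}")
```

To run it against a real file, pass an open file object instead of `sample`, since the function accepts any iterable of lines.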
August 15, 2023 Community Expert
Disk1 didn't come back online; a reboot might not do it.
5 hours ago, JorgeB said: power cycle the server
4 hours ago, cobolstinks said: Where are you seeing errors on the other disks?
I should have been clearer: I saw ATA errors for two disks, but for one of them there were very few, so they didn't actually result in disk errors. It's still a sign that something is wrong.
August 15, 2023 Author
1 hour ago, JorgeB said: Disk1 didn't come back online; a reboot might not do it. I should have been clearer: I saw ATA errors for two disks, but for one of them there were very few, so they didn't actually result in disk errors. It's still a sign that something is wrong.

I'm sorry to be so dense. Are you asking me to add disk1 back into the array, start it, and then post logs?
August 16, 2023 Community Expert
Power off the server (don't just reboot), wait 10 secs, power back on, post new diags.
August 16, 2023 Author
2 hours ago, JorgeB said: Power off the server (don't just reboot), wait 10 secs, power back on, post new diags.

I had it powered off overnight and booted it up this AM. I haven't started the array, but here are the updated diags.

tower-diagnostics-20230816-0748.zip
August 16, 2023 Community Expert
Was this disk1? Device Model: MD3000GSA6472E 1 Serial Number: P8H1R6XR
August 16, 2023 Author
10 minutes ago, JorgeB said: Was this disk1? Device Model: MD3000GSA6472E 1 Serial Number: P8H1R6XR

Yes.
August 16, 2023 Community Expert
That disk has a "failing now" SMART attribute. Though it's not one of the worst ones, I would recommend replacing the disk with a new one. If you don't have a spare or prefer not to replace it now, and assuming the emulated disk1 is still mounting, you can rebuild on top of it; replace the cables first to rule that out.
August 16, 2023 Author
15 minutes ago, JorgeB said: That disk has a "failing now" SMART attribute. Though it's not one of the worst ones, I would recommend replacing the disk with a new one. If you don't have a spare or prefer not to replace it now, and assuming the emulated disk1 is still mounting, you can rebuild on top of it; replace the cables first to rule that out.

Ok, thanks for that information. I will look at replacing the disk. One last question. This server is aging and I know I need to do several updates:
1. I need to change the file system from RFS to XFS on all my data disks.
2. I need to update the hardware. I'm noticing slowness when mounting the shares, the dashboard shows my CPU and RAM maxing out at times, and several of my disks run hot. So I'm looking to update the mobo/CPU/RAM in the near future.

How would you recommend proceeding with these upgrades in mind? I'm considering buying three 4TB HDs (1 parity, 2 data) and consolidating my server from a 6-disk array down to 3. I currently have a 3TB parity disk. Should I try to rebuild on top of the existing disk and, if it works, upgrade the parity disk to 4TB and then consolidate on the old mobo/CPU/RAM? Or should I update the mobo/CPU/RAM first, rebuild on that, then update the disks, and then do the FS update?

Edited August 16, 2023 by cobolstinks
August 16, 2023 Author
I tried to rebuild on disk1 and it disabled the disk again. It says it's doing a read check; should I stop it and power the server off? I ordered new 4TB disks and one 3TB disk. I will rebuild disk1 on the 3TB disk and then shuffle the disks again once I have a 4TB parity disk in place.

tower-diagnostics-20230816-1040.zip
August 16, 2023 Author
1 hour ago, JorgeB said: Did you swap cables as mentioned?

I replaced the cable a month ago when this same issue with disk1 popped up.
August 16, 2023 Community Expert
In that case it really may be a bad disk, so you need to replace it.
August 16, 2023 Author
50 minutes ago, JorgeB said: In that case it really may be a bad disk, so you need to replace it.

Thanks, that's what I'm hoping. I have one 3TB and four 4TB disks en route. This is what I plan to do:
1. Rebuild disk1 on the new 3TB disk.
2. Run a parity check.
3. Upgrade the parity disk to 4TB.
4. Replace an existing 1TB data disk with a new 4TB disk.
5. Copy as much data as possible onto the new 4TB disk.
6. Replace another 1TB disk with a 4TB disk.
7. Copy the remaining data from the other disks onto it.
8. Remove the old, empty data disks.
9. Perform the XFS upgrade per the discussion below.
10. Update the CPU/mobo/RAM if needed.

Does this sound about right?
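The ordering in that plan matters because Unraid requires the parity disk to be at least as large as the largest data disk, which is why the parity upgrade (step 3) comes before any 4TB data disk goes in (step 4). A rough Python sketch of that check follows. The starting layout (3TB parity, one 3TB and four 1TB data disks) is a hypothetical stand-in, since the thread does not list every disk size, and `first_invalid_step` is an illustrative helper, not anything Unraid provides.

```python
def first_invalid_step(parity_tb, data_tb, steps):
    """Apply steps in order and return the index of the first step that
    leaves a data disk larger than parity, or None if every step is valid.

    Each step is ("parity", new_size_tb) or ("data", slot, new_size_tb).
    """
    data = list(data_tb)  # copy so the caller's list is not modified
    for index, step in enumerate(steps):
        if step[0] == "parity":
            parity_tb = step[1]
        else:
            _, slot, size = step
            data[slot] = size
        # Parity must be at least as large as the largest data disk.
        if max(data) > parity_tb:
            return index
    return None


# Hypothetical starting layout (sizes in TB), not taken from the diagnostics.
start_parity = 3
start_data = [3, 1, 1, 1, 1]

# Order used in the plan: parity first, then the data disks.
parity_first = [("parity", 4), ("data", 1, 4), ("data", 2, 4)]
# Reversed order: a 4TB data disk goes in while parity is still 3TB.
data_first = [("data", 1, 4), ("parity", 4), ("data", 2, 4)]

print(first_invalid_step(start_parity, start_data, parity_first))  # prints None
print(first_invalid_step(start_parity, start_data, data_first))    # prints 0
```

The sketch checks sizes only; it says nothing about whether the data on the retired disks actually fits on the new ones.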
August 20, 2023 Author
Thanks for all your help! I'm making some progress but have hit another setback. I was able to replace the 3TB disk1 that was reporting errors with a new 3TB disk, and I then upgraded my parity disk to a new 4TB disk. Next I started replacing my smaller data disks with the other 4TB disks, but I keep getting errors reported on disk1 while attempting to rebuild disk3 onto a newer, larger disk.

I'm pretty frustrated with this and am tempted to purchase a new mobo/CPU/RAM and just try to do the rebuild on that. Is there something else I should try before building a new server? Is there any way to confirm the issue is the onboard SATA controller? I've replaced the cable, and this is now the 2nd disk that is consistently reporting errors as disk1.

I'm thinking of purchasing this as my next server: https://newegg.io/00a2a3d

Thanks!

tower-diagnostics-20230820-1240.zip
August 20, 2023 Community Expert Solution
If you haven't yet, also replace (or swap) the power cable. Those Nvidia SATA controllers are pretty old by now, so maybe there's a problem with that port. If you can, it's possibly best to just update the hardware first.
August 20, 2023 Author
1 hour ago, JorgeB said: If you haven't yet, also replace (or swap) the power cable. Those Nvidia SATA controllers are pretty old by now, so maybe there's a problem with that port. If you can, it's possibly best to just update the hardware first.

Sounds good. Any feedback on the wish list? https://newegg.io/00a2a3d
I want something that is a little future-proof, as I don't update hardware frequently ;). I'd like to run a few containers (VPN/torrenting) and try streaming 4K with Plex. I'd also like the option of running a VM in the future if need be. Does anything stick out in that wish list?
August 21, 2023 Community Expert
I'm more familiar with server hardware, but that looks good to me.
August 21, 2023 Author
I tried to check the server again this AM and now it's not seeing another 3 drives. I checked in the BIOS, and the BIOS isn't seeing the missing drives either, so I think this may be a flaky SATA controller. I ordered the parts on my list and can't wait to get my new system built.
September 5, 2023 Author
Just wanted to close the loop on this. I've been up on new hardware for about a week now with no issues. I was able to swap out the drives and reformat to XFS without losing any data. I'm even using one of the drives that kept getting errors on the old hardware, so I think the problem must have been my controller or something else on the old system, not the disks themselves. The new hardware runs much quieter than my old system and seems to draw less than half the power. I'm pretty happy with Unraid on the new hardware. Thanks for the help.