ceyo14 Posted June 1, 2020
I have a new (to me, good) 8TB drive that I want to use for parity, and I want to reuse the current parity drive as a data drive. I did have an issue with the parity drive, but I believe it was a cable or something, as the server was bumped, and I haven't seen any more issues since.
Once I finally got the drive, I started having issues with a 4TB drive (MB4000GCWDC_S1Z1GTFQ): it would show as "Unmountable: No file system". I started playing around with the cables (I had a few spares), though I'm not too convinced it was the cables. I did try a disk check on another machine and it worked great. Eventually it did pick up the partition and stayed mounted after I ran one of the xfs_repair commands.
I removed the parity drive and put the new drive in, but the parity sync started giving me errors. I put the old drive back and it was considered new, so I did a New Config, checked "parity is valid", and got it back to how it was. I had a spare 3TB drive and decided to replace the 4TB drive with it, so I started freeing up space on the 4TB. But then I started having issues with the other 4TB drive and the other 8TB data drive already installed. I have not been able to fix this as of today. I am stuck and don't know what to do.
I have the following in my server (device: identification, temp, reads, writes, errors, FS, size):
Parity: ST8000DM004-2CX188_ZCT04EXV - 8 TB (sdg), *, 0.0 B/s, 0.0 B/s, 0
Disk 1: ST5000DM000-1FK178_W4J0AEZW - 5 TB (sdb), *, 0.0 B/s, 0.0 B/s, 0, xfs, 5 TB
Disk 2: MB4000GCWDC_S1Z1GTFQ - 4 TB (sdf), *, 61.4 KB/s, 0.0 B/s, 0, xfs, 4 TB
Disk 3: MB4000GDUPB_8433K012F - 4 TB (sde), *, 0.0 B/s, 0.0 B/s, 0, xfs, Unmountable: No file system
Disk 4: WDC_WD40EFRX-68WT0N0_WD-WCC4E5UZUJK5 - 4 TB (sdc), *, 0.0 B/s, 0.0 B/s, 0, xfs, 4 TB
Disk 5: HGST_HUH728080ALE600_2EGLHPRX - 8 TB (sdd), *, 0.0 B/s, 0.0 B/s, 25, xfs, 8 TB
I also have another 8TB HGST drive and a 3TB Toshiba drive (the Toshiba was the first one to give errors, but it has checked out as good with HD Sentinel and the tests I performed).
I do have CrashPlan backups, but I believe I would have to wipe everything and restore, as I don't have a way to know what is on which disk. The data on the array has not changed much since I started having issues; if anything has changed, it's not too important. But the fact that I don't know what is on the unmountable drive scares me. Is there any way I can force that drive to be emulated? I don't know. Thoughts and suggestions, please. Everything I have read and tried has not worked; the repair didn't find a secondary superblock on the drive.
nas-diagnostics-20200601-1311.zip
JorgeB Posted June 1, 2020
There are read errors on disk5, and that makes it impossible to correctly emulate disk3 (assuming parity is valid, and I'm not sure it is, since I didn't follow everything you did before). Replace the cables on disk5 and post new diags.
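
To illustrate why read errors on one disk block the emulation of another, here is a minimal sketch of single-parity reconstruction, assuming plain XOR parity over equal-sized blocks. The function names and the sample bytes are hypothetical and exist only for this illustration; this is not Unraid's actual code.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulate_missing(parity: bytes,
                    surviving: Sequence[Optional[bytes]]) -> Optional[bytes]:
    """Rebuild the missing disk's block from parity plus every other disk.

    A surviving block of None stands for a read error (assumption of this
    sketch). With single parity, one unreadable survivor is enough to make
    the rebuild impossible, so None is returned in that case.
    """
    if any(block is None for block in surviving):
        return None
    return xor_blocks([parity, *surviving])


# Hypothetical three-disk array: parity is the XOR of all data blocks.
disks = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(disks)

# Disk at index 1 is missing; the others read fine, so it can be emulated.
rebuilt = emulate_missing(parity, [disks[0], disks[2]])

# Same situation, but one surviving disk returns a read error.
blocked = emulate_missing(parity, [disks[0], None])
```

In this model `rebuilt` equals the missing disk's data, while `blocked` is `None`: every other disk must be readable for the missing one to be emulated, which is the situation described for disk5 and disk3.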
ceyo14 Posted June 2, 2020
On 6/1/2020 at 2:10 PM, johnnie.black said:
There are read errors on disk5, and that makes it impossible to correctly emulate disk3 (assuming parity is valid, and I'm not sure it is since I didn't follow all you did before), replace cables on disk5 and post new diags.
I have replaced the cable a few times. The last time it even hard locked the server; I rebooted and it came up, but I still have read errors. I tried changing to the other controller, and tried about 4 cables in different ports too. What do you recommend?
nas-diagnostics-20200602-1849.zip
Edited June 2, 2020 by ceyo14
JorgeB Posted June 3, 2020
8 hours ago, ceyo14 said:
I have replaced the cable a few times,
It's even worse in the latest diags: there are still read errors on disk5, which ended up dropping offline, and there are also read errors on disk2. You have some hardware issue. It might be cables/connections, controller/board, power supply, etc. You need to start swapping some hardware around to find out.
ceyo14 Posted June 11, 2020
On 6/3/2020 at 2:52 AM, johnnie.black said:
It's even worse on the last diags, still read errors on disk5 and it ended up dropping offline, and there also read errors on disk2. You have some hardware issue, it might be cables/connection, controller/board, power supply, etc, you need to start swapping some hardware around to find out.
I played around a bit more with the cables (I am going to order a full set of cables; there have been too many issues and I don't want to keep the same ones). I ran a full read and repair on the parity drive using HD Sentinel (a bunch were dark green, maybe weak, but at the end it said they were all good), and the other 2 drives I had the most issues with completed some tests successfully. I was finally able to start the array.
I am going to perform a read check now, after I reinstall all the memory and run a memtest. (I had a bootup issue and the server only booted after I removed all the memory; once I installed one module it booted, and I left it like that, so it only has 4GB right now.) I will update later.
Update: memtest already finished, all good there. I will start the Read Check now and update later. Diags as of now:
nas-diagnostics-20200611-1533.zip
Edited June 11, 2020 by ceyo14
ceyo14 Posted June 12, 2020
@johnnie.black I am 60.5% into the parity check, but I am at 668960 parity errors and 1182 read or drive errors on the parity drive. It went from 8 pending sectors (down from 176 before they were "fixed") to now reporting 104 pending, with 2100 reported uncorrect and 24 reallocated sectors. Should I just remove it and start the parity on the other good 8TB drive? All the other drives have not given me any more issues.
JorgeB Posted June 12, 2020
6 minutes ago, ceyo14 said:
Should I just remove it and start the parity on the other good 8TB drive?
If all the other drives are OK, that sounds like a good idea.
ceyo14 Posted June 12, 2020
30 minutes ago, johnnie.black said:
If all the other drives are OK sounds like a good idea.
OK, it looks like they are. I'll try that and update later. Thanks for all your help!
ceyo14 Posted June 13, 2020
OK, the parity rebuild finished successfully and everything is good now. Thanks for the help. I will be changing some cables and a SATA power splitter too, just in case. I'm afraid to even touch it now, but I do have some upgrades waiting for it.
ceyo14 Posted June 21, 2020
I wanted to follow up and get your opinion. I just finished a parity check and it found 8345 errors. Should I rerun the parity check and fix the errors, or what should I do? I haven't changed anything physically since my last post about the parity write finishing.
nas-diagnostics-20200620-2013.zip
ceyo14 Posted June 21, 2020
The server seems a bit weird. I got an error 500 on the web GUI, issued a reboot command over SSH, and it was ignored, so I had to force a reboot. Should I just make another post?
nas-diagnostics-20200620-2316.zip
Edited June 21, 2020 by ceyo14
JorgeB Posted June 21, 2020
9 hours ago, ceyo14 said:
I just finished a Parity Check and It found 8345 errors
A few errors are normal, even expected after this:
Jun 19 21:17:28 NAS emhttpd: unclean shutdown detected
Run a correcting check.
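
As a rough illustration of what a correcting check does differently from a plain check, here is a toy model, assuming single XOR parity stored as one integer per stripe. The names `expected_parity` and `parity_check` and the sample values are hypothetical; this is a sketch of the idea, not how Unraid implements it.

```python
from functools import reduce
from typing import List, Sequence


def expected_parity(stripe: Sequence[int]) -> int:
    """XOR of the data values in one stripe (one value per data disk)."""
    return reduce(lambda a, b: a ^ b, stripe, 0)


def parity_check(parity: List[int],
                 data: Sequence[Sequence[int]],
                 correct: bool) -> int:
    """Count stripes whose stored parity differs from the computed parity.

    With correct=True the stored parity is rewritten to match the data,
    which is what makes a follow-up check come back clean in this model.
    """
    sync_errors = 0
    for i, stripe in enumerate(zip(*data)):
        want = expected_parity(stripe)
        if parity[i] != want:
            sync_errors += 1
            if correct:
                parity[i] = want
    return sync_errors


# Two hypothetical data disks with three stripes each.
data = [[1, 2, 3], [4, 5, 6]]
parity = [5, 7, 5]

# Simulate an unclean shutdown: data was written, parity update was lost.
parity[1] = 0

found = parity_check(parity, data, correct=False)   # reports, changes nothing
fixed = parity_check(parity, data, correct=True)    # reports and rewrites parity
after = parity_check(parity, data, correct=False)   # clean once parity is fixed
```

In this model a non-correcting check keeps reporting the same stale stripe on every run, while a single correcting check brings parity back in line with the data so that later checks report zero.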
ceyo14 Posted June 28, 2020
I had a VM freeze Unraid again and had to force a reboot. It did another parity check and found 7123 errors. I would have expected things to settle after all this. The drives have not reported any more issues.
nas-diagnostics-20200628-0944.zip
JorgeB Posted June 29, 2020
17 hours ago, ceyo14 said:
had to force reboot, it did another Parity check and found 7123 errors...
A few sync errors are normal after an unclean shutdown.
ceyo14 Posted July 28, 2020
On 6/29/2020 at 3:05 AM, johnnie.black said:
A few sync errors are normal after an unclean shutdown.
I haven't stopped having issues with the server. I left it off for the past week while waiting for an HBA card. I have just installed it, and now everything comes up and seems to work normally, except that I have to restart the array after bootup for it to actually work. Xorg was also the culprit behind my hanging reboots.
So now everything shows up, but upon boot I see all disks and the array as "started" while nothing boots, Unassigned Devices still shows all disks as available to be mounted, and in the disks area all disks show hundreds of thousands of errors. Again, I restart the array and everything returns to normal, with all disks performing as expected. I also checked the USB drive: chkdsk is good and the read check is all good.
Diagnostic 2242 is the latest; it was taken after a good reboot (after killing Xorg) and after restarting the array to get it working. Diagnostic 2150 was from before that, when the log showed as full, so I pulled it too.
nas-diagnostics-20200727-2242.zip
nas-diagnostics-20200727-2150.zip
ceyo14 Posted July 28, 2020
OK, so I started a parity check, and while it was running I tried starting a VM and it started giving errors on all disks. I need to check the passthrough devices and make sure a VM is not trying to use this RAID card.
ceyo14 Posted July 28, 2020
OK, I had a tough time correcting the VM, as I also had issues with a memory DIMM. I ran Memtest and it froze; I checked IPMI and DIMM C2 was giving problems. I reseated it and Memtest ran without issues. I was then able to rename the VM image and change its config, as it was trying to pass through the RAID card when started. I removed the card from there.
Everything seems normal now, at last. I am running a parity check and have run SMART tests on all disks. After the parity check I will check the filesystem on all disks and rerun the parity check to see if it's all good now.
ceyo14 Posted July 29, 2020
OMG, I'm still having issues, but at least it is no longer the drives or anything to do with them. I was kind of hammering the server: I was running a parity check, did an extended SMART test on all drives, and was running PhotoPrism, which was hammering the CPUs and disks while reindexing everything, and another memory module gave errors. I reseated it and there were no more issues. I will let the parity check run first, then run a 24-hour memtest, if I don't lose power from the storm heading to PR.
I think this can be closed. I don't expect more issues with the drives; I'll just keep checking the hardware.
nas-diagnostics-20200728-2121.zip
ceyo14 Posted July 29, 2020
Just to clarify, I ran MemTest, and even though it didn't show errors there, there was a DIMM that was throwing errors in the event log, so I pulled it out. Sad day. It was a 16GB 1866 module.
ceyo14 Posted August 12, 2020
Just to leave this with closure: I am now happily running normally with all my drives and data intact. I have completed clean, 0-error parity checks with everything that was changed, and I am happy with the results. I now trust my server again. Thanks a lot @johnnie.black for your input and recommendations.