PZ303 Posted January 25, 2023
Hi everyone, I had a drive fail and I'm trying to replace it. However, the data rebuild essentially stops, and I've had 4 failed attempts at it. Each time it stops, always at a different point, it reports the rebuild speed as < 1 MB/s and the array shows no activity at all on any drive. In this state, the rebuild won't pause or cancel, nor will the system shut down or reboot. Pulling the cable is the only option. In my last attempt, I was booted in safe mode with the array in maintenance mode and still had the same result. I read a suggestion on another thread that 'new config' might fix the problem, but considering the failed drive is being emulated, I would think that would result in data loss. My diag is attached, and I'm open to any advice. Thanks!
Attached: diagnostics-20230125-1403.zip
trurl Posted January 25, 2023
9 minutes ago, PZ303 said: "I read a suggestion on another thread that 'new config' might fix the problem, but considering the failed drive and it being emulated, i would think that would result in data loss."
New Config would make it impossible to rebuild. You misunderstood that other thread, or it doesn't apply to your situation, or someone is giving bad advice.
PZ303 Posted January 25, 2023
@trurl that's what I thought. The other thread had some more nuance to it: it wasn't a replacement and rebuild.
trurl Posted January 25, 2023
Jan 25 11:55:21 SilentCricket kernel: md: recovery thread: multiple disk errors, sector=5291262368
Disable Autostart in Disk Settings. Shut down, check all disk connections, SATA and power, both ends, including splitters. Unassign disk1, start the array in normal mode (not maintenance mode), and post new diagnostics.
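For anyone wanting to check their own syslog for the same message, here is a minimal sketch. It assumes the syslog has been extracted from the diagnostics zip as plain text; the function name `find_md_errors` is made up for illustration, and only the wording of the one log line quoted above comes from this thread.

```python
import re

# Matches the md driver message quoted in the post above. The message wording
# is taken from that one syslog line; the rest of this sketch is illustrative.
MD_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"md: recovery thread: multiple disk errors, sector=(?P<sector>\d+)"
)


def find_md_errors(syslog_text):
    """Return a (timestamp, host, sector) tuple for every matching syslog line."""
    hits = []
    for line in syslog_text.splitlines():
        match = MD_ERROR.match(line.strip())
        if match:
            hits.append(
                (match.group("stamp"), match.group("host"), int(match.group("sector")))
            )
    return hits


sample = (
    "Jan 25 11:55:21 SilentCricket kernel: md: recovery thread: "
    "multiple disk errors, sector=5291262368"
)
print(find_md_errors(sample))
```

If several rebuild attempts are logged, comparing the reported sectors shows whether the rebuild always dies at the same place or at a different point each time.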
PZ303 Posted January 25, 2023
Thanks for the quick reply, @trurl. This is in a racked array with a backplane, and I did a shuffle of which bay the disks live in. Disk 1 is unassigned now, and that was the replacement disk. The previous occupant of disk 1 threw tons of errors. New diag attached.
Attached: diagnostics-20230125-1643.zip
trurl Posted January 25, 2023
2 minutes ago, PZ303 said: "The previous occupant of disk 1 threw tons of errors"
Do you still have that disk? Possibly nothing wrong with it. Hang on to it in case we need its contents.
trurl Posted January 25, 2023
Do any of your disks have SMART warnings on the Dashboard page? (Save me the trouble of examining every SMART report.)
PZ303 Posted January 25, 2023
I have the original disk. I just mounted it and its replacement outside the array to SMART test them, and it's erroring pretty badly. It's running an extended SMART test now. I ran an extended SMART test on all drives yesterday to troubleshoot and found no issues, including on the new disk.
trurl Posted January 25, 2023
Emulated disk1 mounted and shows plenty of data. Didn't see much in that previous syslog except that one line. Makes me wonder if the controller is dead or needs reseating. Usually you would see multiple lines showing the multiple disks involved.
06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
Subsystem: Super Micro Computer Inc AOC-S3008L-L8e [15d9:0808]
Kernel driver in use: mpt3sas
Kernel modules: mpt3sas
trurl Posted January 25, 2023
Looks like all of your array is on that same controller. Can you read the other disks OK?
trurl Posted January 25, 2023
Other disks must be readable or disk1 couldn't be emulated. Did you check the connections?
trurl Posted January 25, 2023
NVM, I see what happened. The md driver crashed right after that line in the syslog. Have you done a memtest recently?
PZ303 Posted January 25, 2023
I just shut down, checked and reseated the LSI (correct, there is only 1), and rebooted. But like you mentioned, all other disks have been fine, and Disk 1 has moved to 2 or 3 locations on the backplane to validate that wasn't the issue. I did run a memtest as a part of this a few days ago. I only let it do 1 full pass because there were zero errors. And it wasn't the included memtest; I downloaded memtest onto a new USB and ran it that way.
PZ303 Posted January 25, 2023
One more thing worth mentioning is that both of these drives, the failed one and its replacement, are Renewed drives from Amazon. https://www.amazon.com/gp/product/B0BLJVKQKJ/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1
The first one failed SMART the second I installed it, but I foolishly continued with it, and within a month the system got completely unstable with it just being present. It's the one with the SMART errors I posted. The replacement is the warranty replacement, still renewed, but it's passing SMART. It is throwing some errors though. I'm starting to wonder if I just got double unlucky with renewed drives.
trurl Posted January 25, 2023
1 hour ago, trurl said: "Do any of your disks have SMART warnings on the Dashboard page? (save me the trouble of examining every SMART report)."
PZ303 Posted January 26, 2023
Yeah, sorry. I hadn't seen these errors, and the dashboard shows it as having passed SMART. It doesn't show the drive as healthy because it shows it as emulated. Are you also thinking it's just a bad drive?
JorgeB Posted January 26, 2023
I suggest trying a different Unraid release, like v6.10.3; the current kernel might not like your hardware. If the same happens with v6.10.3, that would suggest a hardware problem.
trurl Posted January 26, 2023
Are you sure you're talking about the Dashboard page? You might have to spin up the drives to see the thumbs-up/thumbs-down indicator.
trurl Posted January 26, 2023
SMART report for all array disks looks fine.
PZ303 Posted January 26, 2023
@trurl - correct. All green checks. I was confusing the Emulated orange dot with the green check, but the drive doesn't report as failed, and when I ran the SMART test it was, and is still, saying passed. Despite that, the drive is logging a lot under Raw Read Error Rate and Seek Error Rate. The other drives all show zero while the numbers on this drive are climbing.
@JorgeB I was on 6.11.5 and it gives me a downgrade path to 6.11.4, so I'll start there. This is an upgraded build with a new 13700K and an ASRock Z690 Steel Legend with the latest BIOS to support the new 13th-gen chips. It has 128 GB of RAM, memtested with zero errors on just one pass. These drives are both Seagate Exos from 2018, so I doubt they are unsupported. I've ordered other drives to replace these, so in a few days I'll be able to tell if I just had 2 bad renewed drives. I guess one question is: could data corruption be causing this issue on rebuild?
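One way to confirm that counters are really climbing, rather than just large, is to compare two SMART attribute snapshots taken some time apart. The sketch below is only an illustration: it assumes the text comes from `smartctl -A` in its usual ten-column table layout, the helper names `parse_smart_attributes` and `climbing` are hypothetical, and the sample numbers are invented, not taken from the drives in this thread.

```python
def parse_smart_attributes(report_text):
    """Map attribute name -> integer raw value from `smartctl -A` style table rows."""
    attrs = {}
    for line in report_text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row ("ID# ATTRIBUTE_NAME ...") is skipped by the isdigit check.
        if len(parts) >= 10 and parts[0].isdigit():
            raw = parts[9]
            if raw.isdigit():
                attrs[parts[1]] = int(raw)
    return attrs


def climbing(before, after, names=("Raw_Read_Error_Rate", "Seek_Error_Rate")):
    """Return {attribute: increase} for watched attributes whose raw value grew."""
    return {
        name: after[name] - before[name]
        for name in names
        if name in before and name in after and after[name] > before[name]
    }


# Invented sample rows in the usual smartctl table layout (not real data).
HEADER = "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE\n"
earlier = HEADER + (
    "  1 Raw_Read_Error_Rate 0x000f 080 064 044 Pre-fail Always - 100\n"
    "  7 Seek_Error_Rate     0x000f 090 060 045 Pre-fail Always - 500\n"
)
later = HEADER + (
    "  1 Raw_Read_Error_Rate 0x000f 079 064 044 Pre-fail Always - 150\n"
    "  7 Seek_Error_Rate     0x000f 090 060 045 Pre-fail Always - 500\n"
)
print(climbing(parse_smart_attributes(earlier), parse_smart_attributes(later)))
```

Raw values for these two attributes are vendor-specific, so a large number on its own is hard to interpret; the comparison against the other drives and against an earlier snapshot is what carries the information.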
trurl Posted January 26, 2023
Just now, PZ303 said: "a downgrade path to 6.11.4"
You can download the zip of any current release and just replace all the bz* files on flash to get that version. Of course, you should back up the flash drive first.
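The two steps above (back up the flash drive, then replace the bz* files) can be sketched as follows. This is a rough illustration, not an official procedure: `swap_bz_files` is a hypothetical helper, and all three paths are assumptions you would substitute yourself (the mounted flash drive, the folder where the release zip was extracted, and a backup location that does not exist yet).

```python
import shutil
from pathlib import Path


def swap_bz_files(flash_dir, release_dir, backup_dir):
    """Back up the whole flash directory, then copy the release's bz* files over it.

    All three paths are supplied by the caller; none are taken from the thread.
    Returns the names of the files that were copied.
    """
    flash, release, backup = Path(flash_dir), Path(release_dir), Path(backup_dir)

    # Full backup first. copytree raises FileExistsError if backup_dir already
    # exists, which avoids silently overwriting an earlier backup.
    shutil.copytree(flash, backup)

    copied = []
    for src in sorted(release.glob("bz*")):
        if src.is_file():
            shutil.copy2(src, flash / src.name)  # overwrite the file on flash
            copied.append(src.name)
    return copied
```

Only files whose names start with `bz` are touched, so the configuration on the flash drive is left alone, and the backup makes it possible to go back if the other release behaves worse.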
PZ303 Posted January 26, 2023
Thanks. I'll try that next. I had seen that option, but the in-GUI downgrade was a bit quicker, and I'm going to be too busy with work today to mess with the flash. I presume the minor version downgrade won't really change things, but I figured why not while I'm busy today.
PZ303 Posted January 26, 2023
So I just downgraded to 6.10.3 and the issue persists. The rebuild got unstable and stopped within 5 minutes. I'll wait for the (now third) replacement drive, from a different manufacturer and vendor, and see if that solves the problem. 2 bad drives, while wildly bad luck, seems to be the most likely scenario here.
JorgeB Posted January 27, 2023
Post new diags from v6.10.3 to see if the Unraid driver is still crashing.
PZ303 Posted January 28, 2023
Here is the diag from when I try the rebuild on 6.10.3.
Attached: silentcricket-diagnostics-20230127-2037.zip