xoC Posted August 15, 2023

Hello, I have 2 disks which have failed. I'm kind of lost about what to do, as all the usual links to the FAQ are broken. I have attached my diagnostics. Even when I unselect both disks (empty) and start the array, it gets stuck and shows "Mounting" on disk 2. It does the same thing after trying to rebuild the array. Thanks in advance.

nastorm-diagnostics-20230815-1658.zip
xoC Posted August 15, 2023

To add another picture: here is what happens when both disks are physically disconnected. I can't even start the array to access the (emulated) contents, as it is stuck on "Mounting disks...".
JorgeB Posted August 15, 2023

Please post the diagnostics. If you cannot get them, grab at least the syslog:

cp /var/log/syslog /boot/syslog.txt
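For context, a sketch of the two ways to capture logs from a terminal on a stock Unraid install (the `diagnostics` command and its output path are assumptions based on current Unraid releases; verify on your version):

```shell
# Generate the full diagnostics zip from the command line;
# it is written to /boot/logs/ on the flash drive
diagnostics

# Fallback if diagnostics cannot run: copy the in-memory syslog to the
# flash drive (mounted at /boot) so it survives a reboot
cp /var/log/syslog /boot/syslog.txt
```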
xoC Posted August 15, 2023

Thanks for your quick answer. I thought the zip I posted in the first post was the diagnostics?

nastorm-diagnostics-20230815-1658.zip
JorgeB Posted August 16, 2023

12 hours ago, xoC said:
I thought the zip I posted in the first post was the diagnostics?

You did, sorry, I missed it. Check the filesystem on disk2 (run it without -n).
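A sketch of the suggested check, assuming the array is started in maintenance mode. The device name for disk 2 is an assumption here (`/dev/md2` on older Unraid releases, `/dev/md2p1` on newer ones); the "Check Filesystem Status" button on the disk's GUI page runs the equivalent:

```shell
# Read-only pass first: -n reports problems without changing anything
xfs_repair -n /dev/md2

# If problems are found, run again without -n to actually repair them
xfs_repair /dev/md2
```

Repairing through the /dev/mdX device (rather than the raw /dev/sdX device) keeps parity updated while the repair writes to the disk.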
xoC Posted August 16, 2023

So here it is for disk1. It had many errors (CRC), and yesterday I did a run with -n and then without -n. Today it says:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is
being ignored because the -n option was used.  Expect spurious
inconsistencies which may be resolved by first mounting the filesystem
to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 121721460, counted 125133027
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Now for disk2: I did the same yesterday. Lots of errors with -n; I tried without -n and it didn't complete, but I don't remember the error. Today's disk2 filesystem check is attached in txt format because it is way too long.

disk 2.txt

Edited August 16, 2023 by xoC
JorgeB Posted August 16, 2023

Try disk2 again without -n and post the output.
xoC Posted August 16, 2023

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which
needs to be replayed.  Mount the filesystem to replay the log, and
unmount it before re-running xfs_repair.  If you are unable to mount
the filesystem, then use the -L option to destroy the log and attempt
a repair.  Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.

Should I try with -L?
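The error means the filesystem journal still holds unreplayed transactions. The usual order of operations, sketched here with an assumed device and mount point, is to let a mount replay the log first and use -L only as a last resort, since -L discards the journal and can lose the most recent metadata changes:

```shell
# 1) Try to replay the log with a mount/unmount cycle
mkdir -p /mnt/test
mount /dev/md2 /mnt/test && umount /mnt/test

# 2) Only if the mount fails: destroy the log and repair
#    (-L throws away any unreplayed journal entries)
xfs_repair -L /dev/md2
```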
xoC Posted August 16, 2023

Here it is.

disk2 -L.txt

Edited August 16, 2023 by xoC
JorgeB Posted August 16, 2023 (Solution)

It should mount now. If it does, look for a lost+found folder.
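Checking for recovered files is just a directory listing; after a repair, xfs_repair places orphaned files there, named by inode number (the disk share path below is an assumption based on Unraid's default layout):

```shell
# Files whose directory entries were lost during the repair end up here
ls -la /mnt/disk2/lost+found
```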
xoC Posted August 16, 2023

Thanks. I started the array, and it did mount indeed! It immediately started a rebuild. Do you think one or both disks are failing and should be replaced? Since I'm 2 down, the array is currently unprotected; maybe I should not try to rebuild if a disk is in improper shape.

edit: no lost+found folder on either disk.

Edited August 16, 2023 by xoC
JorgeB Posted August 16, 2023

Two disks getting disabled at the same time is usually not a disk problem, and SMART looks good for both. If the emulated content for both looks correct, you can rebuild on top. It's a good idea to check/replace the cables first to rule that out, and if it happens again, save the diagnostics before rebooting.
xoC Posted August 16, 2023 Author Share Posted August 16, 2023 Ok, noted ! Thanks a lot for your quick help 1 Quote Link to comment
xoC Posted September 1, 2023

Hello again. It's been a nightmare since then: every time the parity sync runs, it finishes correctly, and then one or two disks get disabled immediately afterwards. I shut the server off a week ago because I had no time to investigate. Yesterday, parity 1 and disk 1 were disabled. I changed the SATA cables for parity 1, parity 2, and disk 1 and ran a rebuild. It completed overnight, and from the logs, it disabled disk 1 and disk 2 twenty minutes later. Is it ok to continue in this topic, or would it be better to open a new, unresolved topic?

Attached are the diagnostics; the server has not been powered down since.

nastorm-diagnostics-20230901-0912.zip
itimpi Posted September 1, 2023

I do not see the parity sync completing in the diagnostics. Instead I see you getting continual resets until what look like disk1 and disk2 drop offline, which means there is no SMART information for those drives in the diagnostics that we can check.

You mention the SATA cabling; have you also checked the power cabling? Are you sure your PSU can handle the load when all drives are active? Have you tried running an extended SMART test on disk1 and disk2?
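For reference, an extended SMART test can be started from a terminal like this (the device name is an example; the same test is available from the disk's page in the GUI, and the drive must stay spun up for the whole test to finish):

```shell
# Start a long (extended) self-test; it runs inside the drive firmware
# and does not interrupt normal use
smartctl -t long /dev/sdb

# Check progress and the result log afterwards
smartctl -l selftest /dev/sdb
```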
xoC Posted September 1, 2023

Thanks for your answer. In the notification archives I found an entry saying the parity sync finished, but with lots of errors.

For the power, that's a good question. The server has been running with the same hardware (except a few disk swaps) for at least two years, but the power supply is not young. IIRC it's a 600W unit. The server is connected to a 650W UPS, and it doesn't show much load. Do you think 650W is too low for 8 disks + 2 SSDs, with no GPU or PCIe expansion cards?

I'm away from the server today, and it said extended SMART tests require the spin-down delay to be disabled. Every time I set it to "Never" and come back, it's back to 15 min, so it seems I can't run the extended tests remotely. Do I need to stop and restart the array for that option to stick? And the short tests do absolutely nothing.
JorgeB Posted September 1, 2023

Looks more like a power/connection issue with those two disks. Do they share something in common, like a power splitter?
xoC Posted September 1, 2023

On the power supply connector, from memory, there are 4 or 6 disks daisy-chained. I'm going to check later when I go back to my server. Maybe I can try swapping disks 1/2 with 3/4 in their bays, then rebuild and check whether the reset errors move to disks 3/4?
xoC Posted September 1, 2023 Author Share Posted September 1, 2023 (edited) Is it risky to rebuild with read error problems ? Can it write "wrong" data to my disk 1 & 2 by trying to rebuild but having lot of read errors ? Edited September 1, 2023 by xoC Quote Link to comment
JorgeB Posted September 1, 2023

Any read errors beyond what parity can emulate will make Unraid skip those sectors during the rebuild. For a disk being rebuilt on top of itself, it will keep the old data, which should be correct; for a new spare disk, it will leave whatever was already there.
xoC Posted September 1, 2023

Thanks for the info.
xoC Posted September 5, 2023

So I unplugged my two cache disks and put their power connectors on disk 1 and disk 2. Parity 1, parity 2, disk 1, and disk 2 were all on the same power line, daisy-chained.

I tried to mount:
disk 1: unmountable, no valid superblock, "please use xfs_repair".
disk 2: mountable.

For some unknown reason, I ran xfs_repair again from the GUI and it found no issues on either disk. I then ran xfs_repair -n /dev/sde1 and there were errors; I tried without -n and it said to use -L. I used -L and now the disk is mountable. It seems the GUI repair didn't fix anything even though it said it did, while the command-line repair actually worked.

I'm right now in maintenance mode rebuilding the array; I'll see what the outcome is in a few hours.

Edited September 5, 2023 by xoC
JorgeB Posted September 5, 2023

1 minute ago, xoC said:
I ran xfs_repair -n /dev/sde1

Repairing the raw device like this will not update parity; you should run a correcting parity check.