Gico Posted March 8, 2019

Hi, I made a bit of a mess. I had a red X for disk 2. I ran xfs_repair, which seemed OK, but I still couldn't mount the disk. Tried:

xfs_admin -U generate /dev/sde1

but got:

/dev/sde1: No such file or directory
fatal error -- couldn't initialize XFS library

I did a new config (probably a mistake), preserved all assignments, and started the array with "parity is ok", but the disk is still unmountable. However, I can mount it through UD and browse its data through the GUI, so I probably didn't lose the data. How do I resolve this? Diagnostics attached.

juno-diagnostics-20190308-2209.zip
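For context, a rough sketch of how the XFS tools are usually pointed at an Unraid array disk; the device names below are placeholders rather than values taken from this system, so verify them before running anything.

# confirm which device nodes actually exist before pointing the XFS tools at them
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,UUID

# with the array started in maintenance mode, array data disks appear as /dev/mdN,
# so checking/repairing disk 2 would normally look like:
xfs_repair -n /dev/md2    # -n: report problems only, change nothing
xfs_repair /dev/md2       # actual repair

# xfs_admin expects an existing, unmounted partition node:
xfs_admin -U generate /dev/sdX1   # sdX1 is a placeholder -- confirm with lsblk first

"No such file or directory" from xfs_admin usually just means the node given on the command line isn't present, e.g. because the drive letter changed after a reboot.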
JorgeB Posted March 8, 2019

You didn't use the correct procedures and could lose data, but after the new config disk2 wasn't mounting because of this:

Mar 8 21:47:19 Juno kernel: XFS (md2): Filesystem has duplicate UUID 98e2e722-1e0f-4db0-9449-4a444af07a0e - can't mount

This means there's a clone of that disk already mounted, or the same disk is mounted in UD, for example. Unmount that one and try again; you'll need to do a new config again, since you've unassigned the disk in the meantime.

P.S.: all your cache devices have read/write errors:

Mar 8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdo1 errs: wr 251000, rd 264252, flush 27, corrupt 36174, gen 1
Mar 8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdm1 errs: wr 29454, rd 31202, flush 2, corrupt 0, gen 0
Mar 8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdr1 errs: wr 194380, rd 205820, flush 85, corrupt 63, gen 58
Mar 8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdp1 errs: wr 6, rd 1, flush 0, corrupt 7, gen 0
Mar 8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdq1 errs: wr 44286, rd 45178, flush 14, corrupt 1, gen 0

Probably a record; see here for more info on how to clear them and monitor for future errors.
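To track down which copy is carrying the duplicate UUID, something like this should work; the UD mount point below is an example, not the actual path on this server.

# list every device that reports the duplicate filesystem UUID
blkid | grep 98e2e722-1e0f-4db0-9449-4a444af07a0e

# unmount whichever copy UD has mounted, then try the array disk again
umount /mnt/disks/your_ud_mountpoint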
Gico Posted March 8, 2019

Unmounted, assigned, did a new config, and the array is fully online, but I ran about a minute of a parity check (without correction) and 47 errors were found. I stopped it for now. Is that probably because of the xfs_repair, and should I let the parity check run with corrections?

[Edit: Running the parity check with corrections]

Why did the xfs_admin command I ran fail?

As for the cache, I'll run a scrub, but I'm planning to replace all the cache drives soon, although I'm not convinced that they are to blame for the errors.
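For anyone following along, the scrub and the error-counter housekeeping are usually done like this; it assumes the pool is mounted at /mnt/cache, which may differ on other setups.

# scrub the cache pool and check its progress
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache

# show the per-device error counters, then zero them so any new errors stand out
btrfs device stats /mnt/cache
btrfs device stats -z /mnt/cache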
John_M Posted March 8, 2019

45 minutes ago, Gico said:

Why did the xfs_admin command I ran fail?

Because it isn't the correct tool for fixing the problem you had (the red cross next to disk 2).

https://wiki.unraid.net/index.php/Troubleshooting#What_do_I_do_if_I_get_a_red_X_next_to_a_hard_disk.3F
JorgeB Posted March 9, 2019

If the disk is mounting correctly and the data appears correct, just run a correcting check; a few sync errors are expected after what you did, i.e. mounting a disk outside the array.
Gico Posted March 9, 2019

That recurring error in my system has returned, during the parity check this time, and 2 disks are disabled. So: maintenance mode, xfs_repair, and if all is fine generate new UUIDs, restart and try to start the array?

juno-diagnostics-20190309-1138.zip
JorgeB Posted March 9, 2019

There appears to be an HBA problem; it could also be power/cable related.

41 minutes ago, Gico said:

generate new UUIDs

There's no need to generate new UUIDs. Either rebuild the disks, or, if you think they are fine and no new data was written to any emulated disk, do a new config and resync parity.
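A couple of quick checks that can help separate an HBA/cabling problem from a genuinely failing disk; the device name in the smartctl line is a placeholder.

# look for link resets and transport errors around the time the disks dropped
grep -iE 'reset|sas|frozen|i/o error' /var/log/syslog | tail -50

# a rising UDMA CRC error count usually points at cabling rather than the disk itself
smartctl -A /dev/sdX | grep -i crc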
Gico Posted March 9, 2019

Neither disk was written to with new data.

Disk 3 was repaired by xfs_repair and seems OK now. Still disabled.

Disk 9 dropped from the array (it was in "Not installed" status). After a reboot it's present. I assigned it to the array and started maintenance mode. It's "Content emulated" now. xfs_repair is running and it still hasn't finished an hour later. The last line of the xfs_repair output doesn't change:

resetting inode 7732495730 nlinks from 3 to 2

Should I let it run to the end?

juno-diagnostics-20190309-1730.zip
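One way to tell whether a seemingly stuck xfs_repair is still doing work is to watch its CPU usage and the disk's I/O counters; sdX below is a placeholder for the disk being repaired.

# is xfs_repair still using CPU?
top -bn1 | grep xfs_repair

# are the disk's read/write counters still increasing?
watch -n 5 'grep -w sdX /proc/diskstats'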
JorgeB Posted March 9, 2019

If you're going to do a new config, there's not much point in fixing the filesystem on the emulated disks. As mentioned, a new config should only be done if you're pretty sure the disks are OK; the safer way in these cases is to rebuild to spare disks, keeping the old ones intact. And you still need to fix the underlying problem.
Gico Posted March 9, 2019

I trust disk3's content, but I want to replace and rebuild disk9. How do I do that?
JorgeB Posted March 9, 2019

Assuming xfs_repair finished and the emulated disk9 is mounting, the easiest way is to replace disk9 first; when the rebuild is done, do a new config to re-enable the other one, then re-sync parity.
Gico Posted March 9, 2019

Emulated disk9 is unmountable. Any other option?

juno-diagnostics-20190309-1934.zip

Edit: The reconstruct ended, but the disk isn't mounting, and the reconstructed disk gives the same error in xfs_repair. I consulted with someone who told me that I also have a memory issue. Part of the dmesg output:

[ 4160.946042] XFS (md9): Metadata CRC error detected at xfs_agi_read_verify+0x89/0xd2 [xfs], xfs_agi block 0x1ffffffe2
[ 4160.946667] XFS (md9): Unmount and run xfs_repair
[ 4160.947018] XFS (md9): First 128 bytes of corrupted metadata buffer:
[ 4160.947373] 00000000d50ca9ae: 58 41 47 49 00 00 00 01 00 00 00 04 0f ff ff ff XAGI............
[ 4160.947990] 000000002fa5e4f9: 00 00 0e 80 00 00 00 03 00 00 00 01 00 00 00 e8 ................
[ 4160.948600] 000000000b6b644d: 09 3e 3e 07 ff ff ff ff ff ff ff ff ff ff ff ff .>>.............
[ 4160.949217] 00000000d8d5886f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.949827] 00000000d1ba6cc7: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.950447] 000000007c95e4e3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.951061] 000000009f7ac1b5: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.951672] 0000000068cf2da9: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.952355] XFS (md9): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x1ffffffe2 len 1 error 74
[ 4160.968428] XFS (md9): Metadata CRC error detected at xfs_agi_read_verify+0x89/0xd2 [xfs], xfs_agi block 0x27fffffda
[ 4160.969070] XFS (md9): Unmount and run xfs_repair
[ 4160.969411] XFS (md9): First 128 bytes of corrupted metadata buffer:
[ 4160.969758] 00000000d50ca9ae: 58 41 47 49 00 00 00 01 00 00 00 05 07 54 1e 8e XAGI.........T..
[ 4160.970384] 000000002fa5e4f9: 00 00 0e 80 00 3e 1b 5b 00 00 00 01 00 00 00 8d .....>.[........
[ 4160.971012] 000000000b6b644d: 17 c8 5e a2 ff ff ff ff ff ff ff ff ff ff ff ff ..^.............
[ 4160.971625] 00000000d8d5886f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.972253] 00000000d1ba6cc7: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.972872] 000000007c95e4e3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.973492] 000000009f7ac1b5: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.974163] 0000000068cf2da9: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
[ 4160.974833] XFS (md9): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x27fffffda len 1 error 74
[ 4160.975474] XFS (md9): Error -117 reserving per-AG metadata reserve pool.
[ 4160.975481] XFS (md9): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000445d3cbe
[ 4160.975490] XFS (md9): Corruption of in-memory data detected. Shutting down filesystem
[ 4160.975836] XFS (md9): Please umount the filesystem and rectify the problem(s)
JorgeB Posted March 10, 2019

15 hours ago, Gico said:

Edit: The reconstruct ended, but the disk isn't mounting, and the reconstructed disk gives the same error in xfs_repair.

That's expected, since it rebuilds what is currently on the emulated disk; I hope you did it on a new disk.

15 hours ago, Gico said:

Emulated disk9 is unmountable. Any other option?

Upgrade to v6.7-rc, which includes a newer xfsprogs. If that still fails, the best bet is using the old disk, in case it wasn't rebuilt on top.
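If it helps, the xfsprogs version that's actually in use can be checked straight from the command line:

xfs_repair -V   # prints the installed xfsprogs version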
Gico Posted March 10, 2019

16 hours ago, Gico said:

Edit: The reconstruct ended, but the disk isn't mounting, and the reconstructed disk gives the same error in xfs_repair. I consulted with someone who told me that I also have a memory issue. Part of the dmesg output:
....
[ 4160.975490] XFS (md9): Corruption of in-memory data detected. Shutting down filesystem
[ 4160.975836] XFS (md9): Please umount the filesystem and rectify the problem(s)

Do you also see a memory issue? Probably better to get a new memory stick first, right?

Yes, I reconstructed disk9 onto a new disk.
JorgeB Posted March 10, 2019

1 minute ago, Gico said:

Do you also see a memory issue?

No, and if you're using ECC RAM like your sig indicates, there won't be an issue: with ECC, either there's a correctable error and the server continues to work normally, or it halts if there's an uncorrectable one.
Gico Posted March 10, 2019

I've done nothing yet; I booted up the server and, although the reconstruct ended, something is not right with the disk assignments. Do I need to address this before going to v6.7-rc? New config?

Edit: Disk3 was repaired by xfs_repair, and I trust its content.
JorgeB Posted March 10, 2019

The problem is that, since emulated disk9 was unmountable, Unraid couldn't correctly resize the filesystem during the previous rebuild, so it still has the old size. But as mentioned, there's no point in rebuilding an unmountable emulated disk. Unassign disk9 and start the array to use the emulated disk again, then upgrade to v6.7 and run xfs_repair on the emulated disk. If xfs_repair still fails there's no point in rebuilding; the best bet is to use the old disk after a new config.
Gico Posted March 10, 2019

The newer xfsprogs didn't help; xfs_repair still runs endlessly. Maybe mount the disk with UD and try xfs_repair?
JorgeB Posted March 10, 2019

14 minutes ago, Gico said:

Maybe mount the disk with UD and try xfs_repair?

The original disk might not even need repairing, but check it out using UD.
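A non-destructive way to check the original disk before deciding anything; sdX1 is a placeholder for whatever partition the old disk shows up as.

# read-only filesystem check -- reports problems but changes nothing
xfs_repair -n /dev/sdX1

# or mount it read-only somewhere temporary and eyeball the data
mkdir -p /mnt/check
mount -o ro /dev/sdX1 /mnt/check
ls /mnt/check
umount /mnt/check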
Gico Posted March 10, 2019

The original disk doesn't mount either. I'm running xfs_repair on this drive and it hangs at this line:

juno-diagnostics-20190310-1623.zip
JorgeB Posted March 10, 2019

Try using -P, it helped a user with a stuck xfs_repair recently; if it doesn't, you'd need to ask on the XFS mailing list for further help.
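For reference, that option just turns off prefetching; the device node below is a placeholder.

# -P disables prefetching of inode and directory blocks, which has helped with hung repairs
xfs_repair -P /dev/sdX1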
Gico Posted March 11, 2019

Finally it's fixed! The logical disk9 filesystem is OK now. Thank you!

disk3: Had 5.31 TB in use before this event, now 4.65 TB, of which 199 GB are in lost+found. The disk is disabled & emulated.

disk9: Had 4.70 TB in use before this event, now 4.04 TB, of which 1.64 TB are in lost+found. The disk is disabled, emulated, & not installed.

How do I proceed? Are the files in lost+found good files that only "lost" their folder?

After this is done, I will try to see if the original disk9 has anything else to offer with xfs_repair -P.

juno-diagnostics-20190311-0309.zip
JorgeB Posted March 11, 2019

6 hours ago, Gico said:

How do I proceed?

You need to rebuild both disks; it can be done at the same time. If rebuilding disk3 on top of the old disk, you might want to check the old disk first, as it might be less corrupt. You can mount it with UD, but only with the array stopped, or it won't mount.

6 hours ago, Gico said:

Are the files in lost+found good files that only "lost" their folder?

Usually yes, but you'll need to go through them and get them sorted.
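A quick way to take stock of what landed in lost+found before sorting it by hand; the disk path is an example, adjust it to whichever disk you're looking at.

# how much ended up there, and how many files
du -sh /mnt/disk9/lost+found
find /mnt/disk9/lost+found -type f | wc -l

# 'file' can guess the type of entries that lost their original names
find /mnt/disk9/lost+found -type f -exec file {} + | head -50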
Gico Posted March 12, 2019

The old (physical) disk3 filesystem seems to be in perfect shape: it mounted successfully through UD, the used/free capacity is as before this event, there is no lost+found folder, and I can browse it through the Unraid GUI. I wrote this earlier in the thread, and the FS had a perfect result from xfs_repair, so I don't understand why I had problems later:

On 3/10/2019 at 12:36 PM, Gico said:

Edit: Disk3 was repaired by xfs_repair, and I trust its content.

So I will leave disk3 unassigned, assign the replacement disk9, rebuild it from the xfs_repaired logical disk9, then assign the original disk3, do a new config and rebuild parity. Correct?

Later I will check if the original disk9 is in a better state.

Edit: Why can't I mount the original disk9 through UD when the array is stopped? It says:

Mar 12 22:17:17 Juno unassigned.devices: Mount of '/dev/sdu1' failed. Error message: mount: /mnt/disks/WDC_WD6002FRYZ-01WD5B0_K1GGUUVD: /dev/sdu1 already mounted or mount point busy.
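When a mount fails with "already mounted or mount point busy", it's usually possible to see which side is the problem with something like this; the paths come straight from the error message, and fuser needs to be available on the system.

# is the partition already mounted somewhere?
grep sdu1 /proc/mounts

# is something holding the mount point busy?
fuser -vm /mnt/disks/WDC_WD6002FRYZ-01WD5B0_K1GGUUVD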
JorgeB Posted March 13, 2019

11 hours ago, Gico said:

So I will leave disk3 unassigned, assign the replacement disk9, rebuild it from the xfs_repaired logical disk9, then assign the original disk3, do a new config and rebuild parity. Correct?

Yes.

11 hours ago, Gico said:

Edit: Why can't I mount the original disk9 through UD when the array is stopped? It says: "Mar 12 22:17:17 Juno unassigned.devices: Mount of '/dev/sdu1' failed. Error message: mount: /mnt/disks/WDC_WD6002FRYZ-01WD5B0_K1GGUUVD: /dev/sdu1 already mounted or mount point busy."

Try rebooting.