thebedivere Posted August 8, 2022

Ok, checked disk 7.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 1
        - agno = 3
        - agno = 0
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Ran again without the -n flag and attached the diagnostics.

orthanc-diagnostics-20220808-1844.zip
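For reference, the output above is a read-only pass: with -n, xfs_repair reports problems but changes nothing, which is where the "No modify flag set" lines come from. A minimal sketch of the two invocations, assuming the array is started in maintenance mode and disk 7 maps to /dev/md7 (the device path is an assumption; check yours):

# Read-only check: report problems, modify nothing
xfs_repair -n /dev/md7

# Actual repair: run against the md device so parity stays in sync
xfs_repair /dev/md7

# If it refuses to run because of a dirty log, -L zeroes the log but
# can discard recent metadata changes; treat it as a last resort
xfs_repair -L /dev/md7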
thebedivere Posted August 8, 2022

Disk 7 still shows up as unmountable.
trurl Posted August 8, 2022

37 minutes ago, thebedivere said: "Ran again without the -n flag"

Did you capture the output so you could post it?
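A simple way to keep the output next time, as a sketch assuming the repair is run from the console or an SSH session (the log filename is hypothetical):

# tee prints the output live and writes a copy to the flash drive,
# which survives a reboot
xfs_repair /dev/md7 2>&1 | tee /boot/xfs_repair-disk7.txt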
trurl Posted August 8, 2022

20 hours ago, trurl said: "lots of syslog entries about Unassigned Device sdh"

Still filling syslog, and I almost forgot why, since I look at a lot of threads and a lot of diagnostics. Currently using 25% of log space; it will eventually fill your log space unless you reboot before then.

10 hours ago, trurl said: "You should remove it if for no other reason than it is cluttering your syslog and we will wonder about it next time."
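On Unraid, /var/log is a small RAM-backed tmpfs (typically 128MB), which is why a chatty device can eventually fill it. A quick sketch for checking how full it is and what is doing the filling:

# Use% here is the "log space" figure
df -h /var/log

# largest log files first
du -sh /var/log/* 2>/dev/null | sort -rh | head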
trurl Posted August 8, 2022

Since (emulated) disk6 is mountable now you could rebuild it.
trurl Posted August 8, 2022

Just now, trurl said: "Since (emulated) disk6 is mountable now you could rebuild it."

Or you could take a look at physical disk6 using Unassigned Devices to see if it has corruption, or whether a repair would turn out better than the 2.4GB lost+found you will get from the rebuild.
thebedivere Posted August 8, 2022

36 minutes ago, trurl said: "Did you capture the output so you could post it?"

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x439496, xfs_agf block 0x1ffffffe1/0x200
agf has bad CRC for ag 4
block (2,184178199-184178280) multiply claimed by bno space tree, state - 1
block (2,226384586-226384833) multiply claimed by cnt space tree, state - 2
block (2,123204880-123205051) multiply claimed by cnt space tree, state - 2
block (2,40225617-40225622) multiply claimed by cnt space tree, state - 2
block (2,40225773-40225859) multiply claimed by cnt space tree, state - 2
block (2,102377537-102377552) multiply claimed by cnt space tree, state - 2
block (2,36413631-36413735) multiply claimed by cnt space tree, state - 2
block (2,61136756-61136865) multiply claimed by cnt space tree, state - 2
block (2,174373505-174373514) multiply claimed by cnt space tree, state - 2
block (2,9748180-9748193) multiply claimed by cnt space tree, state - 2
block (2,9748274-9748287) multiply claimed by cnt space tree, state - 2
block (2,104799337-104799352) multiply claimed by cnt space tree, state - 2
block (2,123180332-123180475) multiply claimed by cnt space tree, state - 2
block (2,162984111-162984288) multiply claimed by cnt space tree, state - 2
block (2,60006119-60006216) multiply claimed by cnt space tree, state - 2
block (2,102524589-102524613) multiply claimed by cnt space tree, state - 2
block (2,40224317-40224326) multiply claimed by cnt space tree, state - 2
block (2,40224419-40224432) multiply claimed by cnt space tree, state - 2
block (2,40224497-40224575) multiply claimed by cnt space tree, state - 2
block (2,107092002-107092006) multiply claimed by cnt space tree, state - 2
block (2,236311683-236311689) multiply claimed by cnt space tree, state - 2
block (2,107087496-107087499) multiply claimed by cnt space tree, state - 2
block (2,48692963-48693032) multiply claimed by cnt space tree, state - 2
block (2,106727218-106727508) multiply claimed by cnt space tree, state - 2
block (2,104837401-104837680) multiply claimed by cnt space tree, state - 2
block (2,9514183-9514243) multiply claimed by cnt space tree, state - 2
block (2,48042435-48042473) multiply claimed by cnt space tree, state - 2
block (2,40238304-40238370) multiply claimed by cnt space tree, state - 2
block (2,162968554-162968627) multiply claimed by cnt space tree, state - 2
block (2,47965621-47965668) multiply claimed by cnt space tree, state - 2
block (2,178636310-178636383) multiply claimed by cnt space tree, state - 2
block (2,102523728-102523738) multiply claimed by cnt space tree, state - 2
block (2,19627766-19627977) multiply claimed by cnt space tree, state - 2
block (2,95752502-95752512) multiply claimed by cnt space tree, state - 2
block (2,127294648-127294666) multiply claimed by cnt space tree, state - 2
agf_freeblks 2038004, counted 2048802 in ag 2
agf_freeblks 2116416, counted 2106811 in ag 4
agi unlinked bucket 24 is 344682520 in ag 1 (inode=2492166168)
sb_fdblocks 793886, counted 11056513
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 1841992 claims free block 722810940
data fork in ino 1841992 claims free block 722818433
data fork in ino 1841992 claims free block 722766205
data fork in ino 1841992 claims free block 722745607
        - agno = 1
data fork in ino 2358538759 claims free block 721347041
data fork in ino 2358538759 claims free block 721149027
        - agno = 2
data fork in ino 4296520813 claims free block 556498438
data fork in ino 4296520813 claims free block 578099252
data fork in ino 4296520813 claims free block 773182227
data fork in ino 4296520813 claims free block 773182467
data fork in ino 4296520817 claims free block 573284307
data fork in ino 4296520817 claims free block 639394219
data fork in ino 4296520817 claims free block 639395162
data fork in ino 4296733237 claims free block 616345471
data fork in ino 4296733237 claims free block 639394507
data fork in ino 4296733237 claims free block 639395362
data fork in ino 4296739755 claims free block 641669917
data fork in ino 4296739755 claims free block 699854846
data fork in ino 4296739755 claims free block 699839130
data fork in ino 4296739755 claims free block 632622970
data fork in ino 4296739755 claims free block 632630486
data fork in ino 4296739755 claims free block 643598034
data fork in ino 4296739755 claims free block 632630142
data fork in ino 4296739755 claims free block 585572404
data fork in ino 4296739755 claims free block 585563566
data fork in ino 4296739755 claims free block 585563626
data fork in ino 4296744535 claims free block 711244085
data fork in ino 4296744537 claims free block 578099040
data fork in ino 4296744541 claims free block 596876767
data fork in ino 4296744541 claims free block 763252191
data fork in ino 4296744541 claims free block 763255439
data fork in ino 4296744541 claims free block 763255426
data fork in ino 4296744541 claims free block 763255442
data fork in ino 4296744541 claims free block 641708205
data fork in ino 4296744541 claims free block 584836169
data fork in ino 4296744541 claims free block 643965222
data fork in ino 4296744541 claims free block 643962849
data fork in ino 4296744541 claims free block 643962780
data fork in ino 4296744541 claims free block 643958036
data fork in ino 4296744541 claims free block 643962552
data fork in ino 4296744541 claims free block 639248129
data fork in ino 4296744541 claims free block 715506878
data fork in ino 4296744541 claims free block 660075640
data fork in ino 4296744541 claims free block 660051040
data fork in ino 4296744541 claims free block 584913266
data fork in ino 4296744541 claims free block 722709888
data fork in ino 4296744541 claims free block 722710684
data fork in ino 4296744541 claims free block 584912986
data fork in ino 4296744544 claims free block 546384767
data fork in ino 4296744546 claims free block 546619004
data fork in ino 4296744546 claims free block 546619106
data fork in ino 4296744546 claims free block 546619200
data fork in ino 4296744547 claims free block 577095121
data fork in ino 4296744547 claims free block 577095239
data fork in ino 4296744547 claims free block 577095345
data fork in ino 4296744547 claims free block 577096441
data fork in ino 4296744547 claims free block 577096535
data fork in ino 4296744547 claims free block 577108880
data fork in ino 4296744547 claims free block 585468837
data fork in ino 4296744553 claims free block 546619278
data fork in ino 4296744553 claims free block 598007436
        - agno = 3
data fork in ino 6446982893 claims free block 721391881
        - agno = 4
data fork in ino 8590740692 claims free block 664165116
        - agno = 5
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 5
        - agno = 4
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 2492166168, moving to lost+found
Phase 7 - verify and correct link counts...
done
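After a repair like that, anything the filesystem could not reconnect ends up in lost+found, named by inode number rather than the original filename. A sketch for taking stock once the disk mounts, assuming the repaired disk is disk 7 (adjust the mount point to match):

# orphaned files keep their contents but lose their names and paths
ls -lah /mnt/disk7/lost+found

# 'file' can often identify what each recovered item actually is
file /mnt/disk7/lost+found/*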
thebedivere Posted August 9, 2022

27 minutes ago, trurl said: "Since (emulated) disk6 is mountable now you could rebuild it."

Is this the process of stopping the array, unassigning the drive, starting the array, stopping the array, and reassigning the drive to the same slot?
trurl Posted August 9, 2022

16 hours ago, thebedivere said: "Is this the process of stopping the array, unassigning the drive, starting the array, stopping the array, and reassigning the drive to the same slot?"

Yes.
thebedivere Posted August 10, 2022

Rebuild is running, but it's going real slow. Docker and VMs are off.

orthanc-diagnostics-20220810-1002.zip
JorgeB Posted August 10, 2022

Disk7 appears to be failing:

ID# ATTRIBUTE_NAME          FLAGS  VALUE WORST THRESH FAIL  RAW_VALUE
 24 Helium_Condition_Upper  PO---K 075   043   075    NOW   0

Helium leak? Also check/replace cables on parity2 and disk5.
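To read that attribute table yourself, smartctl prints it directly; a sketch, with /dev/sdi standing in for whatever device disk7 actually is:

# -A dumps the vendor attribute table; FAIL shows "NOW" when the
# normalized VALUE has dropped to or below THRESH
smartctl -A /dev/sdi

# overall health verdict plus the drive's error log
smartctl -H /dev/sdi
smartctl -l error /dev/sdi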
trurl Posted August 10, 2022

Connection problems on parity2. Also, disk7 seems to think it is failing; maybe that's the reason you can't repair its filesystem?

6 minutes ago, JorgeB said: "Helium leak?"

That attribute isn't usually monitored. Should it be added to custom attributes for some models?
JorgeB Posted August 10, 2022

5 minutes ago, trurl said: "That attribute isn't usually monitored. Should it be added to custom attributes for some models?"

I don't have enough experience with helium disks yet to say whether it's worth monitoring, though it wouldn't hurt. I only have a couple of WD drives using helium, and their attribute is different. In this case, since there's a SMART attribute "failing now", the user gets a notification anyway, assuming notifications are enabled.
thebedivere Posted August 10, 2022

Shut it down, checked the cables on the drives, and removed the disk that wasn't in use and had errors. Restarted, and things are looking better. Should I be worried about it leaking helium? I'll probably start the warranty process on it, since I doubt it can be recovered if that's the problem.

orthanc-diagnostics-20220810-1636.zip
trurl Posted August 10, 2022

Still having connection problems on parity2.

Ideally, when rebuilding a disk, you should see a lot of Writes to the rebuilding disk, a lot of Reads from all other disks, and zeros in the Errors column. Are you seeing any Errors in the Errors column in Array Devices?
thebedivere Posted August 10, 2022

No errors showing up.
thebedivere Posted August 10, 2022

15 minutes ago, trurl said: "Still having connection problems on parity2."

I stopped the array and hot swapped the two parity drives into a different drive bay, so they should be running from the SATA controller on the mobo now instead of the expansion card. If they don't have problems and the other drives now do, I will need to replace the SATA card.

orthanc-diagnostics-20220810-1708.zip
thebedivere Posted August 10, 2022

19 minutes ago, trurl said: "Still having connection problems on parity2."

For my own understanding, where do you see this in the diagnostics?
trurl Posted August 11, 2022

20 hours ago, thebedivere said: "For my own understanding, where do you see this in the diagnostics?"

In logs/syslog, lots of this:

Aug 10 14:08:37 Orthanc kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 10 14:08:37 Orthanc kernel: ata10.00: cmd 60/40:38:b8:c8:d4/05:00:00:00:00/40 tag 7 ncq dma 688128 in
Aug 10 14:08:37 Orthanc kernel:          res 40/00:00:f8:ad:d4/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Aug 10 14:08:37 Orthanc kernel: ata10.00: status: { DRDY }
Aug 10 14:08:37 Orthanc kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 10 14:08:37 Orthanc kernel: ata10.00: cmd 60/d8:40:f8:cd:d4/04:00:00:00:00/40 tag 8 ncq dma 634880 in
Aug 10 14:08:37 Orthanc kernel:          res 40/00:00:f8:ad:d4/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Aug 10 14:08:37 Orthanc kernel: ata10.00: status: { DRDY }
Aug 10 14:08:37 Orthanc kernel: ata10: hard resetting link

In system/lsscsi.txt:

[10:0:0:0] disk ATA ST16000NM001G-2K SN04 /dev/sdi /dev/sg8
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/10:0:0:0 [/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.2/0000:03:04.0/0000:06:00.0/ata10/host10/target10:0:0/10:0:0:0]

The smart folder shows which disk is sdi; you can also search for ata10 and sdi in syslog and figure it out. That entry in lsscsi also shows controller 06:00.0, which can be seen in system/lspci.txt:

06:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
  Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
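The same chain can be followed from a shell on the server; a sketch using ata10 and sdi as the examples from this thread (substitute whatever your own diagnostics show):

# kernel errors for a given ATA port
grep 'ata10' /var/log/syslog | tail -n 20

# the sysfs path for a block device embeds both the controller's PCI
# address and the ataN port, tying sdi back to 06:00.0 and ata10
readlink -f /sys/block/sdi

# decode that PCI address to a controller model
lspci -s 06:00.0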
thebedivere Posted August 11, 2022

orthanc-diagnostics-20220811-1246.zip

From what I can tell, the drives that are now on the SATA card are having read issues. I'm shutting it down and replacing the SATA card.
trurl Posted August 11, 2022

ata10 and the ata12s are on the Marvell; ata2 is on another controller, probably the motherboard. Replacing the Marvell should be progress even if it doesn't fix everything.

Can't remember if it was already mentioned, what are you replacing it with? I skimmed the thread and didn't notice the link we usually give for recommended controllers.
trurl Posted August 11, 2022

Another thing I don't think we have mentioned. Does your system have adequate cooling? Controllers have heatsinks for a reason.
thebedivere Posted August 11, 2022

20 minutes ago, trurl said: "Can't remember if it was already mentioned, what are you replacing it with?"

I had grabbed another one off Amazon, but even though it's a different brand and image, it's the exact same controller... So right now I don't have anything to replace it with. Any suggestions?
thebedivere Posted August 11, 2022

19 minutes ago, trurl said: "Another thing I don't think we have mentioned. Does your system have adequate cooling? Controllers have heatsinks for a reason."

I have just the case fans that came with it and a big box fan blowing over the whole server rack. There isn't anything pointing directly at the SATA controller. I can try to get another fan in the case pointing at it.
trurl Posted August 11, 2022

43 minutes ago, trurl said: "the link we usually give for recommended controllers."