Everything posted by johnsanc

  1. Speed stats are dead even on both parity drives. I'm hoping this check gets things back in order. I'm just racking my brain as to how the parity could be so incorrect, considering I haven't written anything to the array aside from a few hashes from the File Integrity plugin and an XFS repair. It feels like my parity should have been mostly correct before, and now this correcting check is actually breaking the parity. The only other explanations I can think of are:
Previous parity syncs were bad - but that doesn't really make sense, since my initial build of Parity #1 was as clean as possible, and I can't recall any writes while building Parity #2 (aside from those outlined in my first post).
A software bug somewhere, since there are zero hardware-related errors in the logs as far as I can tell.
It's an eerie coincidence that the sync corrections both align to the minute with other events that occurred:
Parity #1 sync corrections: at the border of the 2TB drives, where there appeared to be extra reads to one drive that shouldn't have happened. This was also at 18:00 on the nose, but I don't have anything scheduled at that specific time as far as I know.
Parity #2 sync corrections: exactly at 5:30 AM, which coincides with when SSD TRIM was executed.
I suppose I will know much more after the 10TB mark, and especially after I try a non-correcting parity check once this completes.
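If I end up kicking off that non-correcting check from the CLI instead of the webGui, my understanding is the stock mdcmd wrapper can do it - treat this as a sketch, since the exact arguments are from memory:
# Start a read-only (non-correcting) parity check:
mdcmd check NOCORRECT
# Watch position and the sync error counter:
mdcmd status | grep -i sync
# Cancel a running check:
mdcmd nocheck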
  2. Attaching diagnostics in case anyone is curious and wants to check if I am overlooking something. tower-diagnostics-20200601-0945.zip tower-diagnostics-20200529-2035.zip
  3. I'll run another non-correcting parity check tomorrow night if this completes by then. As of about 5:30 AM on the nose I am now getting writes to Parity #2 for what looks like every sector. This is exactly when my SSD TRIM schedule is set to run... is there any way that can be related? The only thing I can think of that might have caused this is that I needed to do a reboot with no array configuration at all before the XFS repair, because for some reason I still couldn't select disk7 or disk15 after doing a New Config. Once everything was booted back up I manually reassigned my disks and confirmed I could still access everything. Either way, I still see zero errors in the syslog unless they are being suppressed due to the P+Q corrections. Have you ever seen hardware errors that resulted in zero errors in the syslog? Given the odd chain of events outlined in my first post, I guess maybe it can be chalked up to that. I'll just let this complete and then take it from there; as long as I'm not seeing errors in the syslog and I can still access my files, I should just let this run its course. It will be particularly interesting to see what happens after the 10TB mark, since my new parity drives are 12TB but none of my data drives are that large yet.
  4. Quick update - more weirdness occurring. During my parity check, Parity #1 has had basically non-stop sync errors after the 2TB mark, almost as if the parity alignment was thrown off after my 2TB disks. I was watching my disk usage as the parity check crossed the 2TB mark, and I noticed slow reads on only one of my 2TB drives for a few minutes after the other 2TB drives stopped reading. Even weirder is that Parity #2 has hardly any writes, implying there are far fewer sync issues with Parity #2. There is nothing in the syslog indicating any errors. Any ideas what could cause this? Shouldn't my parity sync of Parity #2 also have corrected Parity #1? (as outlined here:) I just can't think of a reason why Parity #2 would appear good but Parity #1 would not. If there were any parity issues I would have expected both parity drives to need updating.
Update: 107,625,991 sync errors and counting. I have no idea how Parity #1 could get THAT out of whack. After some quick math (see the sketch at the end of this post), it seems as if about every sector after the 2TB mark is triggering a sync correction. I understand Parity #2 could do that if I rearranged disks, but I am at a loss to understand what could possibly cause it aside from some sort of software "hiccup" that throws off the "P" calculation but not the "Q". @johnnie.black - I certainly welcome your opinions on this one. My google-fu may be weak, but I scoured the forums and could not find anything similar to this behavior. For now I guess I'll let this run its course and run another parity check after this completes.
Just for fun, you can see at around 18:00 when the 2TB mark was passed, then minutes after, a massive slowdown to about 65 MB/s as corrections to Parity #1 are written. It was during that spike that I saw reads to one of my 2TB drives, which I don't think should have been occurring since the parity check should have been past that point.
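The quick math, for reference - this assumes each sync error corresponds to one 4KiB parity block, which is my guess at the counter's granularity (it could also be 512-byte sectors):
# Corrections so far if one error = one 4KiB block: ~440 GB
echo $(( 107625991 * 4096 / 10**9 ))
# Or if one error = one 512-byte sector: ~55 GB
echo $(( 107625991 * 512 / 10**9 ))
Either way it is one contiguous swath starting right at the 2TB boundary, which is what makes it look like every block past that point is being "corrected".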
  5. As of this morning I was able to do an XFS repair on the two disks and I can access the files again. I did see that there were writes to parity when I made the XFS repair. I am doing a parity check now just to make sure everything is in good working order before I start doing any new file writes to the array. This was a bit of a scare, but Unraid did the right thing to minimize data loss at every step. Also, there hasn't been a single XFS issue I haven't been able to recover from over the years. Shows how mature that file system is. Thanks all for the help and guidance!
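For anyone who finds this thread later: the repair itself was the standard XFS procedure from maintenance mode. Roughly this, using my disk numbers - run the dry run first:
# Array started in maintenance mode, so /dev/mdX exists but nothing is mounted.
# Dry run (-n = no modify) to preview what xfs_repair would change:
xfs_repair -n /dev/md7
xfs_repair -n /dev/md15
# Actual repair - using the md device keeps parity updated as fixes are written:
xfs_repair /dev/md7
xfs_repair /dev/md15
# If xfs_repair complains about a dirty log, -L zeroes it (last resort; it can
# discard the most recent transactions):
# xfs_repair -L /dev/md15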
  6. Thanks, that's what I was leaning toward. Is another sync necessary after an XFS repair? Wouldn't a parity check basically do the same thing and write any corrections to both parity drives? I'm a bit hesitant to do a full parity sync on both parity drives at once. That reminds me, I should disable my monthly parity check this time around, just in case. You also reminded me of a detail I left out: I had a USB external disk preclearing the entire time. One of the reasons I wanted to swap with the server still on was so I could let the preclear continue. I didn't realize that you could resume a preclear after a reboot, nor did I realize that stopping the array also stops a preclear on an unassigned device (which you can resume).
  7. Yeah, most likely. At this point I'm just wondering if I should let the parity sync finish, or if I should stop it, do another New Config, and try an XFS repair. I suppose there's no further damage that can be done by allowing the parity sync to continue, but I'm not sure if it's a waste of time - for example, if I'll need to do a complete new parity sync again anyway. Do you think another parity sync would be required? Or would an XFS repair after parity is done syncing suffice?
  8. Parity #2 was a brand new pre-cleared disk. Disk 7 SAID it had an up-to-date build in the File Integrity screen, but I'm not sure if that was accurate, considering I later tried to build hashes for another disk whose build was also supposedly up-to-date (after parity was done calculating), and it was definitely writing to the disk... There is nothing specific in the syslog about writes via the File Integrity plugin, and it did not generate any error logs. I just assume there was an attempt to write a hash to extended attributes, which triggered the XFS errors. Or perhaps even the reads could trigger the XFS errors.
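If I want to test the extended attribute theory, I can inspect a file's xattrs directly. A sketch - the path is just an example, and I don't know offhand exactly which attribute key the plugin writes:
# Dump all extended attributes on a file (getfattr comes from the attr package):
getfattr -d -m - "/mnt/disk7/some/file.mkv"
# The File Integrity plugin stores its checksum in a user.* attribute, so
# building hashes means metadata writes to the disk even with no file changes.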
  9. I am in the process of upgrading my parity drives to a larger capacity and I ran into an issue. Here is the series of events:
I shut down the array and turned off the server to replace Parity #1. Started the server and let the parity sync run. This was successful.
I stopped the array and attempted to swap the old Parity #2 with a new disk while the server was still on. Unraid recognized the new drive and things seemed just fine.
When I attempted to start the array for the next parity sync, I got read and write errors on disk7 and disk15. I know I was not writing any files during this time, so I'm not sure where the reads/writes were coming from.
I tried to stop the array and parity sync as soon as I noticed the errors, but the array did not seem to stop, so I chose to shut down, which worked.
When the server came back up, the parity sync for Parity #2 started automatically. I stopped the parity sync because disk7 and disk15 were disabled.
I tried to start the array in maintenance mode to do an XFS check on the two disks, however it couldn't do anything - I assume because I didn't have valid dual parity at the time, since I was in the middle of a parity sync. I checked the SMART reports on both disks and things looked fine.
At this point I figured I could either try to put my old parity drive back in and restore both disks, or move forward. Since the parity sync was already started, I assumed my Parity #1 may have been damaged anyway, and I no longer had my old Parity #1 disk since I pre-cleared it after the successful Parity #1 sync.
I chose to do a New Config and trusted parity, since I knew the disks weren't dying and I wasn't writing any files to the array at the time. I confirmed I could access the contents of disk7 and disk15 without issue. Note that I did not attempt an XFS check at this time since I could access the files. In hindsight maybe I should have. (?)
I then stopped the array to un-assign Parity #2 (which I knew was not good parity since I stopped the sync earlier), and reassigned Parity #2 to start the sync.
Things seemed to go well for several hours, until I mistakenly tried to build hashes with the File Integrity plugin. Shortly thereafter I started to receive XFS errors in the syslog, and disk7 and disk15 were inaccessible via Midnight Commander but still showed green in the Unraid UI, and the parity sync kept going. The syslog is now showing errors about filesystem size every second - I assume because disk7 and disk15 are inaccessible due to FS errors.
Currently the parity sync is still running with about 50% left to go. All disks are still marked green and there is nothing in the Unraid UI that would indicate there are any issues. There are some writes to Parity #1 during the sync (about 14k so far), which I expected given the circumstances.
So my question is - where do I go from here? Do I let the parity sync complete and then try an XFS repair? If the drives are inaccessible due to FS errors, is it normal that they are still green? It just seems odd that everything in the UI seems fine, but there are definitely issues with the array in its current state.
XFS errors:
May 30 14:19:15 Tower kernel: XFS (md15): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x224b896e0 len 8 error 117
May 30 14:19:15 Tower kernel: XFS (md15): Metadata corruption detected at xfs_da3_node_read_verify+0x106/0x138 [xfs], xfs_da3_node block 0x224b896e0
May 30 14:19:15 Tower kernel: XFS (md15): Unmount and run xfs_repair
May 30 14:19:15 Tower kernel: XFS (md15): First 128 bytes of corrupted metadata buffer:
May 30 14:19:15 Tower kernel: 000000009be5df51: 8e 32 b3 8e 85 31 07 74 b0 e4 d5 75 30 df b5 66  .2...1.t...u0..f
May 30 14:19:15 Tower kernel: 000000001ca01a9b: 10 00 1f 09 0a b0 b0 0c 2d b0 9e 23 c6 27 21 fb  ........-..#.'!.
May 30 14:19:15 Tower kernel: 00000000c4409e1b: ac 65 36 7c 92 bd df 0b d6 f3 31 66 a3 28 9b 49  .e6|......1f.(.I
May 30 14:19:15 Tower kernel: 000000006ce0d9d1: db c3 35 b2 99 2c bc 00 d0 c4 87 c2 4f 13 29 1c  ..5..,......O.).
May 30 14:19:15 Tower kernel: 000000004b95f201: 10 f6 fe 58 69 df bf f5 0a 0c e8 86 36 0d 84 34  ...Xi.......6..4
May 30 14:19:15 Tower kernel: 0000000093039ca2: dc 14 0a f4 53 d3 06 a2 1c 48 40 8b 02 ec 13 20  ....S....H@....
May 30 14:19:15 Tower kernel: 00000000e1ecbb2f: 08 50 9f d6 f0 b8 66 a2 18 00 6f e3 f0 80 e1 e8  .P....f...o.....
May 30 14:19:15 Tower kernel: 0000000097002a7b: 60 3d 53 f8 d7 18 88 02 0a 10 d7 e1 b4 02 f3 15  `=S.............
FS size errors:
May 30 14:37:26 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/appdata
May 30 14:37:26 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/domains
May 30 14:37:26 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/iTunes
May 30 14:37:26 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/system
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/Applications
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/Backup
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/Concert Videos
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/Downloads
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/LaunchBox
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/Movies
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/Music
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/Standup Comedy
May 30 14:37:27 Tower emhttpd: error: get_fs_sizes, 6412: Input/output error (5): statfs: /mnt/user/TV Shows
In the future I certainly won't try swapping disks again while the server is still on. Powering off is apparently much safer. Also, the situation above felt like being between a rock and a hard place, and I'm sure I could have taken better steps - but it's in the past, and I'm just trying to save as much data as I can and eventually get things back into proper working order. I can provide diagnostics if needed, but I figured I would lay out the chain of events first, since you don't run into this every day.
  10. Cool, thanks for confirming. I was planning on doing a correcting parity check just to make sure everything was synced up properly across both parity disks, but if it's doing that automatically during the sync, I'll just skip it. This is kind of an edge case, but it would be nice if the Unraid UI were a bit clearer about what is occurring. Generally "syncs" and "checks" are viewed as completely separate processes.
  11. I feel like this is a newbie question for someone who has used Unraid for so long... but here goes: in a dual parity setup, I noticed that when you are syncing parity for Parity #2, there are also reads from Parity #1. What is happening? Is there actually a parity check occurring for Parity #1 while Parity #2 is syncing? The Unraid UI doesn't indicate that anything is happening to Parity #1 - it just says a Parity Sync is occurring in general.
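For anyone else wondering, my mental model (this is my understanding of the standard P+Q dual parity scheme, not official documentation) is:
P = D_1 xor D_2 xor ... xor D_n
Q = g^1*D_1 xor g^2*D_2 xor ... xor g^n*D_n    (multiplication in GF(2^8))
Computing Q for Parity #2 requires reading every data disk anyway, which is exactly the set of reads needed to verify P at the same time - which would explain the reads from Parity #1 during a Parity #2 sync. It also explains why Q depends on disk order (each disk is weighted by its slot number) while P does not.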
  12. I have the X570 Creator. Here are the IOMMU groups after the BIOS tweaks. Note that USB passthrough with these X570 boards is not great: one of the controllers does not support reset, and the other requires a bit of config editing to pass through properly. Basically everything in this thread applies to this board as well:
IOMMU group 0: [1022:1482] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 1: [1022:1483] 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 2: [1022:1483] 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 3: [1022:1482] 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 4: [1022:1482] 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 5: [1022:1483] 00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 6: [1022:1483] 00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 7: [1022:1482] 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 8: [1022:1482] 00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 9: [1022:1482] 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 10: [1022:1484] 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 11: [1022:1482] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 12: [1022:1484] 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 13: [1022:1484] 00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 14: [1022:1484] 00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 15:
 [1022:790b] 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
 [1022:790e] 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
IOMMU group 16:
 [1022:1440] 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
 [1022:1441] 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
 [1022:1442] 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
 [1022:1443] 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
 [1022:1444] 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
 [1022:1445] 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
 [1022:1446] 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
 [1022:1447] 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
IOMMU group 17: [1022:57ad] 01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
IOMMU group 18: [1022:57a3] 02:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
IOMMU group 19: [1022:57a3] 02:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
IOMMU group 20: [1022:57a3] 02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
IOMMU group 21:
 [1022:57a4] 02:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
 [1022:1485] 2e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
 [1022:149c] 2e:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
 [1022:149c] 2e:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
IOMMU group 22:
 [1022:57a4] 02:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
 [1022:7901] 2f:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
IOMMU group 23:
 [1022:57a4] 02:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
 [1022:7901] 30:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
IOMMU group 24: [8086:15ea] 03:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
IOMMU group 25: [8086:15ea] 04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
IOMMU group 26: [8086:15ea] 04:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
IOMMU group 27: [8086:15ea] 04:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
IOMMU group 28: [8086:15ea] 04:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
IOMMU group 29: [8086:15eb] 05:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
IOMMU group 30: [8086:15ec] 07:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
IOMMU group 31: [1b21:1187] 24:00.0 PCI bridge: ASMedia Technology Inc. Device 1187
IOMMU group 32:
 [1b21:1187] 25:01.0 PCI bridge: ASMedia Technology Inc. Device 1187
 [1b21:0612] 26:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
IOMMU group 33:
 [1b21:1187] 25:02.0 PCI bridge: ASMedia Technology Inc. Device 1187
 [8086:1539] 27:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
IOMMU group 34:
 [1b21:1187] 25:03.0 PCI bridge: ASMedia Technology Inc. Device 1187
 [8086:2723] 28:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
IOMMU group 35:
 [1b21:1187] 25:04.0 PCI bridge: ASMedia Technology Inc. Device 1187
 [1b21:0612] 29:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
IOMMU group 36: [1b21:1187] 25:05.0 PCI bridge: ASMedia Technology Inc. Device 1187
IOMMU group 37:
 [1b21:1187] 25:06.0 PCI bridge: ASMedia Technology Inc. Device 1187
 [10de:128b] 2b:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
 [10de:0e0f] 2b:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
IOMMU group 38: [1b21:1187] 25:07.0 PCI bridge: ASMedia Technology Inc. Device 1187
IOMMU group 39: [1d6a:07b1] 2d:00.0 Ethernet controller: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02)
IOMMU group 40: [1987:5016] 31:00.0 Non-Volatile memory controller: Phison Electronics Corporation E16 PCIe4 NVMe Controller (rev 01)
IOMMU group 41:
 [10de:1e84] 32:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] (rev a1)
 [10de:10f8] 32:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
 [10de:1ad8] 32:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
 [10de:1ad9] 32:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
IOMMU group 42: [1000:0087] 33:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
IOMMU group 43: [1022:148a] 34:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
IOMMU group 44: [1022:1485] 35:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
IOMMU group 45: [1022:1486] 35:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
IOMMU group 46: [1022:149c] 35:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
IOMMU group 47: [1022:1487] 35:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
IOMMU group 48: [1022:7901] 36:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
IOMMU group 49: [1022:7901] 37:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
  13. I've moved both SSDs to the AMD controller. Fingers crossed that resolves the issues.
  14. A few hours ago it looks like some issues arose, starting with an IO_PAGE_FAULT event in the logs. From there I started getting a ton of errors on my cache pool, and eventually disk9 was kicked from the array even though nothing should have been written to it, since the only activity at that time was going to the cache drives. Likely related to this topic... only now I have the full diagnostics leading up to the errors. Should I move both SSDs to the same controller and off of the ASMedia controller? For example, moving them to 30:00.0. Any help or suggestions are appreciated - this seems to keep happening every day or two ever since upgrading to 6.8.2. tower-diagnostics-20200202-1849.zip
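As a sanity check on which controller each drive actually hangs off of, the sysfs path contains the PCI address (sdb here is just an example device):
# The resolved path includes the controller's PCI address, e.g. .../0000:30:00.0/...
readlink -f /sys/block/sdb
# Or list every disk at once:
for d in /sys/block/sd*; do echo "$d -> $(readlink -f "$d")"; done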
  15. I was downloading something in my Windows 10 VM when the connection dropped out. A reboot did not fix the issue, and I'm not sure how to get diagnostics. Attached are photos of the syslog after the dropout and after a reboot from the local GUI. Any idea what happened and how to resolve it? EDIT: Disregard, I'm an idiot - I was on my work VPN on the laptop I was trying to access from. Doh!
  16. I recently had some issues with my cache pool. My SSDs were both beyond their LBAs-written warranty - however, I wasn't seeing anything in the SMART reports that indicated a bad disk aside from a handful of CRC errors, which I thought are usually attributed to bad cables. Sometimes one of the disks would just drop out with an unmountable filesystem, and sometimes it seems to have even impacted other disks connected to the motherboard (not sure if related). I ended up replacing these drives and now all of my issues seem to be resolved... I am closely monitoring my logs, so hopefully this was actually the root cause. If not, then at least I've ruled out one possibility. So I guess my question is: what are the things to look out for to know when an SSD is dying? I know they don't really die the same way as mechanical drives. Also, can a dying SSD impact other disks, particularly if mover is running?
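For my own future reference, these are the SMART attributes I plan to watch (replace /dev/sdX with the actual device; attribute names and IDs vary by vendor, so treat this list as a rough guide):
# Dump the SMART attribute table:
smartctl -A /dev/sdX
# Attributes commonly worth watching on SSDs:
#   5   Reallocated_Sector_Ct   - remapped flash blocks
#   177 Wear_Leveling_Count     - remaining endurance on some drives (e.g. Samsung)
#   199 UDMA_CRC_Error_Count    - link/cable errors rather than the flash itself
# Overall health self-assessment:
smartctl -H /dev/sdX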
  17. I get this error on every startup with my new X570 board - is it anything to be concerned about?
Jan 31 19:45:37 Tower ntpd[2286]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
I couldn't find much info on this error, but it sounds like ntpd is starting too early. I also didn't know if it's related to this at all:
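To check whether ntpd eventually syncs after boot (my working assumption is that it starts before the network is up and catches up later):
# Show ntpd's peers; a '*' in the first column marks the selected sync source:
ntpq -pn
# If nothing gets selected long after boot, restarting the daemon is a simple test
# (Unraid is Slackware-based, so this init script path is my assumption):
/etc/rc.d/rc.ntpd restart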
  18. Thanks again @johnnie.black - you always seem to come to the rescue for things like this. It's a testament to the community, and one reason why I stick with unRAID even when weird issues come up every now and then. Thanks to your helpful FAQ I was able to manually mount the drives, copy the data to the array, format the cache pool, and move the data back. ... Now if only I could figure out why it happened in the first place. If something like this happens again I'll be sure to grab complete diagnostics before rebooting.
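For posterity, the recovery boiled down to something like this (device names and destination path are examples - the FAQ has the full details):
# Try mounting the surviving pool member read-only and degraded:
mkdir -p /temp
mount -o ro,degraded /dev/sdX1 /temp
# If the mount fails, btrfs restore can often still pull the files off:
btrfs restore -v /dev/sdX1 /mnt/disk1/rescue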
  19. Thanks, I'll give that a try. I am really struggling to see the value of a btrfs mirrored cache pool. It seems like every issue that occurs leaves only a slim chance of recovering data from either drive. There really should be an easy way to at least convert a disk from the pool into a single cache drive. The pool really does give a false sense of security.
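For what it's worth, btrfs itself can do that conversion from the command line - a sketch of what I mean, assuming the pool is mounted at /mnt/cache and /dev/sdX1 is the device being dropped:
# Convert the data and metadata profiles from raid1 to single
# (-f may be required since this reduces metadata redundancy):
btrfs balance start -f -dconvert=single -mconvert=single /mnt/cache
# Then remove the second device from the pool:
btrfs device remove /dev/sdX1 /mnt/cache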
  20. Unfortunately I don't, but here is the old thread from the last time I had a cache pool issue, which was not long ago (completely different hardware, though). In the future I will download diagnostics before any reboots. I had syslog saving to the cache pool... but obviously that won't help in this case.
  21. I am having a bunch of weird issues with 6.8.2 that I have never seen before. I am not sure if it's just a coincidence and I have cables or drives failing, or if there is an issue with my hardware and 6.8.2. Basically, I woke up this morning with a disk in my cache pool marked as unmountable with no file system. I rebooted and the issue persisted. I noticed that the syslog says a UUID is missing, and when I run blkid I can see that the disk now has a different UUID than it did previously. Any idea why this happened? What are the steps to restore my cache pool, or at least save the data that is on the other disk? tower-diagnostics-20200131-1241.zip
  22. @unbalanced - How did you figure out what was causing the issue? I do not have any additional network cards, but I am still plagued by this issue, which I only recall happening with 6.8.2... Maybe I'll try to downgrade and see if rebooting works normally.
  23. Thanks, I am already using the latest BIOS. I suppose there's still a few kinks to work out with X570. In the meantime I tried this: https://forum.level1techs.com/t/devops-workstation-fixing-nvme-trim-on-linux/148354 I did not have a TRIM job going at the time, but I figured it's worth a shot anyway, before I replace cables. Also, I believe all of the errors around 5:30 AM were related to TRIM of my cache pool. It seems as if writes are blocked while TRIM is running. I turned off Docker and re-ran TRIM with no errors. The weird boot loop also seems to be resolved as long as I use legacy boot instead of UEFI.
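For reference, re-running TRIM manually is just this (the mount point depends on where the pool lives):
# Trim a specific mount and report how much was trimmed:
fstrim -v /mnt/cache
# Or trim every mounted filesystem that supports it:
fstrim -av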
  24. I also ran into this, but mine cut off on the "tsc-early" line - I had to do a hard power down and power back on to get past the boot loop. I'm beginning to think there's something about these X570 boards that doesn't reboot like the others I've used. It seems like I usually run into problems if I just let it reboot, but things are much more stable with a power down / power on.
  25. Interesting - I wonder why virtualization would cause an issue with that. I obviously am not passing that through to my VM. Are there any BIOS settings or anything I should look into?