Exilepc Posted July 28

I have been using Unraid for years, and I have upgraded through the years, but this last jump seems to be making me question not only my tech skill and manhood but my sanity as well.

System Specs:

PCPartPicker Part List: https://pcpartpicker.com/list/jr9ct7

CPU: AMD Ryzen 9 5950X 3.4 GHz 16-Core Processor ($358.00 @ Newegg)
Motherboard: Asus PRIME X570-PRO ATX AM4 Motherboard ($377.36 @ Amazon)
Memory: G.Skill Ripjaws V 128 GB (4 x 32 GB) DDR4-3200 CL16 Memory ($259.99 @ Amazon)
Storage: Samsung 870 Evo 2 TB 2.5" Solid State Drive ($179.99 @ Amazon)
Storage: Samsung 870 Evo 2 TB 2.5" Solid State Drive ($179.99 @ Amazon)
Storage: Samsung 870 Evo 2 TB 2.5" Solid State Drive ($179.99 @ Amazon)
Storage: Samsung 870 Evo 2 TB 2.5" Solid State Drive ($179.99 @ Amazon)
Storage: Sabrent Rocket 4 Plus 8 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive ($1165.00 @ Amazon)
Storage: Sabrent Rocket 4 Plus 8 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive ($1165.00 @ Amazon)
Storage: HP 7 TB 3.5" 7200 RPM Internal Hard Drive
Storage: HP 7 TB 3.5" 7200 RPM Internal Hard Drive
Storage: HP 7 TB 3.5" 7200 RPM Internal Hard Drive
Storage: HP 7 TB 3.5" 7200 RPM Internal Hard Drive
Storage: HP 7 TB 3.5" 7200 RPM Internal Hard Drive
Storage: HP 7 TB 3.5" 7200 RPM Internal Hard Drive
Storage: HP 7 TB 3.5" 7200 RPM Internal Hard Drive
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf NAS 8 TB 3.5" 7200 RPM Internal Hard Drive ($149.88 @ Amazon)
Storage: Seagate IronWolf Pro NAS 12 TB 3.5" 7200 RPM Internal Hard Drive ($229.99 @ Adorama)
Storage: Seagate IronWolf Pro NAS 12 TB 3.5" 7200 RPM Internal Hard Drive ($229.99 @ Adorama)
Storage: Seagate IronWolf Pro NAS 16 TB 3.5" 7200 RPM Internal Hard Drive ($299.95 @ Amazon)
Storage: Seagate IronWolf Pro NAS 16 TB 3.5" 7200 RPM Internal Hard Drive ($299.95 @ Amazon)
Storage: Seagate IronWolf Pro NAS 16 TB 3.5" 7200 RPM Internal Hard Drive ($299.95 @ Amazon)
Storage: Seagate IronWolf Pro NAS 16 TB 3.5" 7200 RPM Internal Hard Drive ($299.95 @ Amazon)
Storage: Seagate IronWolf Pro NAS 16 TB 3.5" 7200 RPM Internal Hard Drive ($299.95 @ Amazon)
Storage: Seagate IronWolf Pro NAS 16 TB 3.5" 7200 RPM Internal Hard Drive ($299.95 @ Amazon)
Storage: Seagate IronWolf Pro NAS 16 TB 3.5" 7200 RPM Internal Hard Drive ($299.95 @ Amazon)
Video Card: PNY VCQP4000-PB Quadro P4000 8 GB Video Card ($425.00 @ Amazon)
Power Supply: SilverStone Technology ST1300-TI, 80 Plus Titanium 1300W Fully Modular ATX/PS2 Power Supply, SST-ST1300-TI-X ($340.99 @ Newegg)
Case Fan: Noctua NF-A12x25 PWM chromax.black.swap 60.09 CFM 120 mm Fan ($34.95 @ Amazon)
Case Fan: Noctua NF-A12x25 PWM chromax.black.swap 60.09 CFM 120 mm Fan ($34.95 @ Amazon)
Case Fan: Noctua NF-A12x25 PWM chromax.black.swap 60.09 CFM 120 mm Fan ($34.95 @ Amazon)
Case Fan: Noctua NF-A12x25 PWM chromax.black.swap 60.09 CFM 120 mm Fan ($34.95 @ Amazon)
Fan Controller: Razer RZ34-02140700-R3M1 Fan Controller ($48.96 @ Amazon)

Parts notes: I have had the motherboard since 2021, the RAM is about a year old, and the CPU is about 6 months old. The 16 TB drives, SAS controller, power supply, case, SSDs, and NVMe drives are new. The issues started when I installed the RAM. After about a week, the system would lock up randomly and complain about BTRFS corruption. I decided to move the data to my Synology and start over with a better CPU and a new case.

Issues:
Rebuild is very slow.
Disks drop off and return with a new device name/address (e.g. SDB becomes SDAC), errors pop up on the disks, and SMART becomes unresponsive for that disk. (Note that it is not always the same drive, and it has included drives of other sizes.)
The issue only seems to happen during a parity check/rebuild.

Troubleshooting I have completed so far:

Initial Hardware Tests
PC Doctor Tests: I ran initial hardware diagnostics using PC Doctor to check for any immediate issues.
Burn-In Tests: Conducted a burn-in test at a PC shop, which included SMART tests on the drives. All tests passed without errors.

Power Supply Upgrade
Power Supply Upgrade: Upgraded the power supply to a 1300W unit because the existing one might have been reaching its maximum output, potentially causing instability.

CPU and Drive Upgrades
CPU Upgrade: Replaced the CPU with a 16-core, 32-thread processor to enhance performance and address potential CPU-related issues.
Drive Upgrades: Installed new 16TB drives during a case upgrade. Also moved existing 12TB and 8TB drives from an old Synology system to the new setup.
SSD Installation: Added 4 SSDs on a PCI card connected via SATA to the server to expand storage capabilities.
Cache NVMe Upgrade: Upgraded the NVMe cache to a 4TB model to increase cache size and performance.

Memory Upgrade and Issues
RAM Upgrade: Increased RAM from 32GB to 128GB. This upgrade caused issues such as cache corruption and the system becoming unresponsive after 96-130 hours on the old system.

Further Hardware Troubleshooting
PC Shop Inspection: Took the server to a PC shop for a thorough check to ensure no hardware issues were overlooked.
SAS Cable Replacement: Swapped out and replaced the SAS cables to the new backplanes to ensure proper connectivity and eliminate cable faults.
Firmware and BIOS Updates: Updated the firmware on the 9305-24i controller and ensured the BIOS was configured to IT mode for optimal performance.
Drive Relocation: Moved drives to new bays to see if physical placement was causing issues.
External Drive Tests: Tested the drives outside of the server to verify their functionality.
CPU and Memory Tests: Ran tests on the CPU and memory for over a week to rule out any potential faults.

Observations and Investigations
Drive Unmounting and Remounting: I noticed that initially only two drives would unmount and remount, but eventually all drives on one backplane were affected, and then drives on other backplanes.
Random Drive Placement: After moving the server to the PC shop, drives were randomly placed back into the array by size, which could have caused issues.
Suspected Cable Issues: It was considered that a single miniSAS-to-SATA cable might be causing problems, especially if the problematic drives shared it. Replaced SAS cable 1.
Controller Considerations: Ordered another 9305-24i controller as some SATA drives directly connected to the motherboard were missing.

Other steps:
Randomly placed drives into random bays
SAS backplanes separated onto different SATA rails
Replaced the USB stick
Fresh load of Unraid
Happens in safe mode and normal mode
Changed the Seagate settings with this guide:

I have attached diagnostics, screenshots, and photos of the system. I am out of ideas... and need advice.

tower-diagnostics-20240417-1722(1).zip
tower-diagnostics-20240420-1617.zip
tower-diagnostics-20240728-0930.zip

JorgeB Posted July 29

Disks are dropping offline; this looks like a power/connection problem.

Exilepc Posted August 9

It only seems to be the 16 TB drives and, every once in a while, a 12 TB one. 1300 W should be more than enough; what is the best way to check this to rule it in or out? One of the strange things is that it seems to happen to the same drives, normally one of the parity (protection) drives or one of the first 5. I shuffled the drives to rule out the backplanes and the possibility that one of them was overloaded.

Edited August 9 by Exilepc

JorgeB Posted August 9

31 minutes ago, Exilepc said: I shuffled the drives to eliminate the backplanes and maybe overload one of them.

And the same ones keep dropping?

Exilepc Posted August 9

It normally drops out one drive at a time (not always the same drive), but it has happened to up to three drives at the same time. For example:

Boot 1, start check: Drive 3 drops out, starts to have errors, stops showing its temperature, and shows up in unmapped drives
Boot 2, start check: Drives 5 and 1 drop out, start to have errors, stop showing their temperature, and show up in unmapped drives
Boot 3: Drive 2 is marked as failed
Boot 4, start check: No issues
Boot 5, start check: Drives 2, 3, and 5 start to have errors, stop showing their temperature, and show up in unmapped drives
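
To make the pattern in the boot-by-boot report above easier to see, here is a minimal Python sketch that tallies how often each drive was affected. The drive numbers are taken from the five boots listed in the post; the names `drops_by_boot` and `tally` are made up for illustration, and Boot 3 is counted as an event for Drive 2 because that drive was marked as failed.

```python
from collections import Counter

# Drives affected on each boot, as listed in the post above.
# Boot 3 is the boot where Drive 2 was marked as failed.
drops_by_boot = {
    1: [3],
    2: [5, 1],
    3: [2],
    4: [],
    5: [2, 3, 5],
}


def tally(drops):
    """Count how many boots each drive number was affected in."""
    return Counter(drive for drives in drops.values() for drive in drives)


per_drive = tally(drops_by_boot)
for drive, count in sorted(per_drive.items()):
    print(f"Drive {drive}: affected in {count} of {len(drops_by_boot)} boots")
```

With only five boots this is a small sample, but a tally like this makes it easier to see whether one drive dominates or whether the events are spread across several drives.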

JorgeB Posted August 10

14 hours ago, Exilepc said: not always the same drive

If it's not always the same drive, that still suggests, as mentioned, a power/connection issue. Are you using any power splitters? You can also try a different PSU if one is available; it won't be a lack of power, but the PSU may not be working correctly.
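
One way to check whether the drops cluster on particular devices is to count drop-related lines per device in the syslog from the diagnostics. The Python sketch below is only an illustration: the substrings in `DROP_HINTS` are assumptions about what such kernel messages might contain, not a definitive list, and the sample lines are made up rather than taken from the attached diagnostics. `count_drop_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed substrings that may appear in kernel messages when a disk drops.
# Adjust these to match what the actual syslog shows.
DROP_HINTS = ("offline device", "Synchronizing SCSI cache", "I/O error")

# Matches Linux SCSI disk names such as sdb or sdac.
DEVICE_RE = re.compile(r"\b(sd[a-z]+)\b")


def count_drop_events(lines, hints=DROP_HINTS):
    """Count lines that contain a hint and name a disk, keyed by disk name."""
    counts = Counter()
    for line in lines:
        if any(hint in line for hint in hints):
            match = DEVICE_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Made-up sample lines for illustration only.
sample = [
    "kernel: sd 1:0:3:0: [sdd] Synchronizing SCSI cache",
    "kernel: blk_update_request: I/O error, dev sdd, sector 123",
    "kernel: sd 1:0:7:0: rejecting I/O to offline device",
    "kernel: sd 1:0:9:0: [sdac] Synchronizing SCSI cache",
]
counts = count_drop_events(sample)
for device, n in counts.most_common():
    print(device, n)
```

Because device names change when a disk drops and returns (as described in the first post), the counts are only a rough guide; matching names to serial numbers would need the rest of the diagnostics.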