Exilepc

Members
  • Posts

    45
Converted

  • Gender
    Undisclosed

Exilepc's Achievements

Rookie

Rookie (2/14)

0

Reputation

  1. It normally drops out one drive at a time (not always the same drive), but it has happened to up to three drives at the same time. For example:
       • Boot 1, start check: Drive 3 drops out, starts to show errors, stops showing its temperature, and shows up in unmapped drives.
       • Boot 2, start check: Drives 5 and 1 drop out, start to show errors, stop showing their temperature, and show up in unmapped drives.
       • Boot 3: Drive 2 is marked as failed.
       • Boot 4, start check: No issues.
       • Boot 5, start check: Drives 2, 3, and 5 start to show errors, stop showing their temperature, and show up in unmapped drives.
  2. It only seems to be the 16 TB drives and, every once in a while, a 12 TB one. 1300 W should be more than enough; what is the best way to check the power supply to rule it in or out? One of the strange things is that it seems to happen to the same drives, normally one that is in the protection or first 5. I shuffled the drives to rule out the backplanes and the possibility that one of them was overloaded.
  3. I have been using Unraid for years, and I have upgraded through the years, but this last jump is making me question not only my tech skill and manhood but my sanity as well.

     System Specs:

     Parts notes: I have had the motherboard since 2021, the RAM is about a year old, and the CPU is about 6 months old. The 16 TB drives, SAS controller, power supply, case, SSDs, and NVMe are new.

     The issues started when I installed the RAM. After about a week, the system would lock up randomly and complain about BTRFS corruption. I decided to move the data to my Synology and start over with a better CPU and a new case.

     Issues:
       • Rebuild is very slow.
       • Disks drop off and return with a new device name/address (e.g. sdb becomes sdac), errors pop up on the disks, and SMART becomes unresponsive for that disk. (Note: it is not always the same drive, and it has included drives of other sizes.)
       • The issue only seems to happen during a parity check/rebuild.

     Troubleshooting I have completed so far:

     Initial Hardware Tests
       • PC Doctor Tests: I ran initial hardware diagnostics using PC Doctor to check for any immediate issues.
       • Burn-In Tests: Conducted a burn-in test at a PC shop, which included SMART tests on the drives. All tests passed without errors.

     Power Supply Upgrade
       • Upgraded the power supply to a 1300 W unit because the existing one might have been reaching its maximum output, potentially causing instability.

     CPU and Drive Upgrades
       • CPU Upgrade: Replaced the CPU with a 16-core, 32-thread processor to improve performance and address potential CPU-related issues.
       • Drive Upgrades: Installed new 16 TB drives during a case upgrade. Also moved existing 12 TB and 8 TB drives from an old Synology system to the new setup.
       • SSD Installation: Added 4 SSDs on a PCI card connected via SATA to the server to expand storage.
       • Cache NVMe Upgrade: Upgraded the NVMe cache to a 4 TB model to increase cache size and performance.

     Memory Upgrade and Issues
       • RAM Upgrade: Increased RAM from 32 GB to 128 GB. On the old system, this upgrade caused issues such as cache corruption and the system becoming unresponsive after 96-130 hours.

     Further Hardware Troubleshooting
       • PC Shop Inspection: Took the server to a PC shop for a thorough check to make sure no hardware issues were overlooked.
       • SAS Cable Replacement: Replaced the SAS cables to the new backplanes to eliminate cable faults.
       • Firmware and BIOS Updates: Updated the firmware on the 9305-24i controller and confirmed that it is configured for IT mode.
       • Drive Relocation: Moved drives to new bays to see if physical placement was causing issues.
       • External Drive Tests: Tested the drives outside of the server to verify that they work.
       • CPU and Memory Tests: Ran tests on the CPU and memory for over a week to rule out faults.

     Observations and Investigations
       • Drive Unmounting and Remounting: Initially, only two drives would unmount and remount, but eventually all drives on one backplane were affected, and then drives on other backplanes.
       • Random Drive Placement: After the server went to the PC shop, drives were placed back into the array randomly by size, which could have caused issues.
       • Suspected Cable Issues: A single miniSAS-to-SATA cable might have been causing problems, especially if the problematic drives shared it. Replaced SAS cable 1.
       • Controller Considerations: Ordered another 9305-24i controller, as some SATA drives directly connected to the motherboard were missing.
       • Placed drives into random bays.
       • Separated the SAS backplanes onto different SATA rails.
       • Replaced the USB stick.
       • Did a fresh install of Unraid.
       • It happens in both safe mode and normal mode.
       • Changed the Seagate settings with this guide:

     I have attached diagnostics, screenshots, and photos of the system. I am out of ideas... and need advice.

     tower-diagnostics-20240417-1722(1).zip tower-diagnostics-20240420-1617.zip tower-diagnostics-20240728-0930.zip
  4. I cleared the config and set the array up again. For some reason I see the same behavior: a lot of errors, even when there are no matching reads or writes.
  5. On a new build, I have a strange issue with 2 of my 16 TB Seagate drives, disks 3 and 4. The disks are sdg and sdf and can be assigned to the array, but after starting the array they show up in unassigned devices as sdy and sdz. I am not sure why... any input would be great. I attached the screenshot.
  6. Sadly I have not been able to track down the issue… I wish I had an answer
  7. @JorgeB @trurl Strange update - I was having trouble stopping the parity check to check the IP, so I booted into safe mode GUI and it has been up for over 2 days...
  8. I am working on setting up a syslog server via Graylog on another Unraid box
  9. @trurl Please see the link above... That is the syslog file
  10. @trurl https://www.dropbox.com/s/2fxqm3nj6o3g5bo/syslog?dl=0 For some reason it is 4+ GB... not sure why
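
A syslog that grows to several GB is usually dominated by a few messages repeated many times, so a quick tally can show what is flooding it before anyone has to download the whole file. The following is only a rough sketch, not an Unraid tool: it assumes the traditional `Mon DD HH:MM:SS host` line prefix, and the `normalize` and `summarize` names are made up for illustration. It reads the file line by line, so it should cope with a 4+ GB log without loading it into memory.

```python
import re
import sys
from collections import Counter

# Assumption: lines start with a traditional syslog prefix such as
# "Jul 28 09:30:01 Tower ". Lines without it are counted unchanged.
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+")
NUMBER = re.compile(r"\d+")


def normalize(line):
    """Drop the timestamp/host prefix and replace digit runs with 'N'
    so that repeats of the same message collapse into one key."""
    body = PREFIX.sub("", line.rstrip("\n"))
    return NUMBER.sub("N", body)


def summarize(lines, top=10):
    """Return the `top` most frequent normalized messages with counts."""
    counts = Counter(normalize(line) for line in lines)
    return counts.most_common(top)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file; errors="replace" tolerates stray non-UTF-8 bytes.
    with open(sys.argv[1], errors="replace") as fh:
        for message, count in summarize(fh):
            print(f"{count:>10}  {message}")
```

Saved under a name such as `syslog_top.py` (hypothetical) and run as `python3 syslog_top.py /path/to/syslog`, it prints the ten most common message shapes. Because digits are collapsed, details like port or tag numbers are lost, but device names such as `sdd` stay intact, which makes it easier to see which disks the noisy messages refer to.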