September 29, 2024

Hey! I am writing this today after trying to solve the problem myself for the past year to avoid bothering you, but I have a severe lack of skills, so...

Here is the thing: I had a problem with a storage disk whose data got corrupted and which eventually ended up in a "read only" state. While it was in that state, any attempt to write something led to an error 5. It happened during heavy read/write work (like downloading "50 ISOs of linux" simultaneously, or copying 3TB of data from one disk to another).

So I tried removing the disk, but then the problem moved to another disk. I bought about 4 other disks in total during this year and precleared all of them to be sure: preclear read, zeroing and post-clear read all finished without any error (50h per disk). The I/O problem seems to "move" from one disk to another, depending on which disks are in the array (if I remove the two 12TB disks it moves to the SSD cache, if I add some it might move to the NVMe, or to the 12TB; I can't find any pattern there).

I thought the motherboard's SATA ports were saturated, so I bought an LSI 9300-16i card, and even designed a 3D cooler adapter to keep it cool, but the problem persists.

In the meantime, Fix Common Problems told me today that an "invalid folder" with the name of an old share is still in /mnt. I also get a "flash device corrupted" warning while starting the array, but it eventually disappears.

I am a bit confused now about what to test next... Maybe the PSU, since it is a G650M, known to be a disaster? I bought another PSU to troubleshoot even this, but I assume that the problem is more software than hardware now.

If someone sees something that I missed, it would help me a lot! Thanks ❤️

mc5-diagnostics-20240929-1442.zip
September 29, 2024 (Community Expert)

It looks like both pool devices dropped offline in the past. Run a correcting scrub on the pool and post the results.
September 29, 2024 (Author)

Seems to have worked! All this time and money just for that, you saved me! There was a bit of panic during the scrub, since I only saw "137 errors fixed" and then the "Main" tab alternating between an empty array and Error 500, but after a forced reboot I don't have any red lines for the moment. Thanks Jorge ❤️

I also added a weekly scrub for the cache and pools, to avoid it happening again. I will try to copy 3TB and stress the disks again, but it seems that it was that simple...
September 29, 2024 (Author)

Oops, I spoke too soon...

mc5-diagnostics-20240929-1734.zip

Edited September 29, 2024 by resolute-clearance8449: added the diag
September 30, 2024 (Author)

Hi, here are the results:

Cache:
    UUID: dc947b32-2638-4059-927b-d0a51c5d878a
    Scrub started: Mon Sep 30 09:33:50 2024
    Status: finished
    Duration: 0:02:54
    Total to scrub: 178.41GiB
    Rate: 1.02GiB/s
    Error summary: no errors found

Secondary pool: I'll get back to you when it has finished!

What is odd is that errors keep appearing with use, since the pool had no errors before the copy started.
October 2, 2024 (Author)

The scrub has finished, here are the results:

    UUID: 250fadc9-bf35-4060-aa6d-c030b89bca9a
    Scrub started: Wed Oct 2 01:10:44 2024
    Status: finished
    Duration: 8:18:13
    Total to scrub: 5.00TiB
    Rate: 175.37MiB/s
    Error summary: read=294772105 csum=256
    Corrected: 272
    Uncorrectable: 294772089
    Unverified: 0

In the meantime, in case it helps, here is what I did:

Since the scrub duration went up to 3 to 4 days, I stopped the scrub and erased the disks (I can afford to lose all the data, it is saved on another disk). I then relaunched a scrub, which found no errors. But I had a lot of lines like "kernel: sd 7:0:6:0: Power-on or device reset occurred", so I went to Tools / System Devices and found that 7:0:6:0 was assigned to one of the disks of the pool. So I swapped that disk for another one that I had just bought: no more "Power-on or device reset occurred", except when actually powering the disks on, I assume.

- I tried to copy one season at first, then scrubbed the pool with the new disk: no error.
- Tried to copy an entire show, then scrubbed: no error.
- Tried to copy all of the "show" folder, then scrubbed: 24 uncorrectable errors, but 0 corrected and 0 unverified.
- Tried to copy some other folders, then scrubbed: 294,772,089 uncorrectable errors and 272 corrected (the result shown above), and logs that look like a Christmas tree!

I am starting to think that the disk that kept disconnecting basically corrupted the data, and that I "only" have to copy the data over again to get rid of all the errors.

Diag attached as usual, if needed. Thanks for your help!

mc5-diagnostics-20241002-0938.zip
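The reset messages quoted in the post above can be tallied per device address to see which disk keeps dropping. This is a minimal Python sketch, assuming syslog lines contain the message exactly as quoted ("kernel: sd 7:0:6:0: Power-on or device reset occurred"). The function name `count_resets` and the sample lines (timestamps, host name, second address) are invented for illustration and do not come from the attached diagnostics.

```python
import re
from collections import Counter

# Matches the kernel message quoted in the post and captures the
# SCSI address (host:channel:target:lun) of the affected device.
RESET_RE = re.compile(
    r"kernel: sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def count_resets(lines):
    """Return a Counter mapping SCSI address -> number of reset messages."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, only to show the expected input shape.
sample = [
    "Oct  2 01:12:03 tower kernel: sd 7:0:6:0: Power-on or device reset occurred",
    "Oct  2 01:12:09 tower kernel: sd 7:0:6:0: Power-on or device reset occurred",
    "Oct  2 01:13:40 tower kernel: sd 7:0:3:0: Power-on or device reset occurred",
    "Oct  2 01:14:00 tower kernel: some unrelated message",
]

for address, resets in count_resets(sample).most_common():
    print(address, resets)
```

The address with the highest count can then be looked up under Tools / System Devices, as described in the post.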
October 2, 2024 (Community Expert)

There are already a lot of device errors, and the syslog has already rotated, so I cannot see the start of the problem or whether the errors are new or old, but it looks like a device dropped offline. If the data can be deleted, delete all the existing data, reset the pool stats, start copying again, and post new diags after new errors appear.
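Before resetting the pool stats, it can help to record which device carries the errors. This is a sketch in Python, assuming the counters are read with `btrfs device stats <mountpoint>`, which prints one line per counter in the form `[/dev/sdb1].read_io_errs 0` (`btrfs device stats -z` prints and then resets them). The helper names and the sample text, including the device paths and values, are invented for illustration.

```python
import re
from collections import defaultdict

# One counter per line, e.g. "[/dev/sdb1].read_io_errs    0".
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Parse `btrfs device stats` style output into {device: {counter: value}}."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            stats[match["dev"]][match["name"]] = int(match["value"])
    return dict(stats)


def devices_with_errors(stats):
    """Return the sorted list of devices with at least one non-zero counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


# Invented sample output, only to show the expected input shape.
sample = """\
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    31
[/dev/sdc1].read_io_errs     12
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  4
[/dev/sdc1].generation_errs  0
"""

print(devices_with_errors(parse_device_stats(sample)))
```

Comparing the counters right after a reset with those taken after the next copy shows whether the errors are new.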
October 2, 2024 (Author)

Well, it was faster than I thought 😅 The copy crashed, the pool went read-only, and I had no access to its settings. After a reboot, read-only was gone and I had access to the scrub, so here is the result:

    UUID: 250fadc9-bf35-4060-aa6d-c030b89bca9a
    Scrub started: Wed Oct 2 16:47:04 2024
    Status: finished
    Duration: 0:23:22
    Total to scrub: 192.16GiB
    Rate: 140.35MiB/s
    Error summary: verify=13732 csum=247432
    Corrected: 261164
    Uncorrectable: 0
    Unverified: 0

mc5-diagnostics-20241002-1754.zip

Edited October 2, 2024 by resolute-clearance8449: added the diag (again)
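The scrub results posted in this thread can be compared more easily once the error fields are pulled into numbers. This is a minimal Python sketch, assuming the text looks like the scrub output pasted in the posts (an "Error summary:" field followed by "Corrected:", "Uncorrectable:" and "Unverified:"). The function name `parse_scrub_summary` is hypothetical, and the two sample strings are copied from the posts of October 2.

```python
import re


def parse_scrub_summary(text):
    """Extract error counters from pasted scrub status text.

    Works on both multi-line and single-line pastes. Missing fields
    (as with "no errors found") are reported as 0.
    """
    # Only look at what follows "Error summary:" so that earlier fields
    # such as the UUID or the date cannot be mistaken for counters.
    _, _, tail = text.partition("Error summary:")
    errors = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", tail)}

    def field(name):
        match = re.search(rf"{name}:\s*(\d+)", tail)
        return int(match.group(1)) if match else 0

    return {
        "errors": errors,
        "corrected": field("Corrected"),
        "uncorrectable": field("Uncorrectable"),
        "unverified": field("Unverified"),
    }


# Error fields as posted in the thread.
first = ("Error summary: read=294772105 csum=256 Corrected: 272 "
         "Uncorrectable: 294772089 Unverified: 0")
second = ("Error summary: verify=13732 csum=247432 Corrected: 261164 "
          "Uncorrectable: 0 Unverified: 0")

for result in (parse_scrub_summary(first), parse_scrub_summary(second)):
    print(result)
```

In both posted results the per-type counts add up to corrected plus uncorrectable, so the second scrub repaired everything it found while the first could repair almost nothing.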
October 2, 2024 (Author)

Already tried, and swapping PSU cables too, but nothing worked. Would it be possible that, since they all come from the same batch, they are all faulty? If you think that is a possibility, I will buy some Western Digital Gold drives, for example, and try with those!

Attached is the sound of one of them: Seagate Unraid.mp4
October 2, 2024 (Community Expert, Solution)

That doesn't sound good. If the cables were replaced, it could be a bad disk.
October 2, 2024 (Community Expert)

It might be worth checking that there is not a power-related issue.
October 2, 2024 (Author)

Sure, especially with a G650M, even if the sound of the disks is a bit scary! I will try to go back to my faithful WD and keep you updated, thanks a lot for your time!!
October 7, 2024 (Author)

Hey! Quick update, since I received the new WD Red Plus on Saturday: I copied 4TB of data onto it without a single issue. From what I see now, I am stunned that all this mess might only be caused by all the Seagate drives being faulty (4 in total).

Next steps to be sure:
- relaunch all the apps that were using it, and stress test for a week or so
- if there are still no errors, add the "backup disk" to create a RAID1 and confirm (or not) whether the issue came from the disks or from the RAID architecture itself!

I will keep you updated, obviously (and maybe 1 or 2 people to whom the problem could happen in the future, hello to you).

mc5-diagnostics-20241007-1331.zip

Edited October 7, 2024 by resolute-clearance8449
October 7, 2024 (Community Expert)

resolute-clearance8449 said: "From what I see now, I am stunned that all this mess might only be caused by all the Seagate drives being faulty (4 in total)."

It could also be due to something that happened while they were in transit, as one assumes they all travelled together.
October 7, 2024 (Author)

Yes, 100% possible too. I will know in about a week, but everything seems to be resolved thanks to your help. Fingers crossed!
October 24, 2024 (Author)

Hi! Last update (I hope), after 2 weeks of testing: everything works perfectly.
- in "btrfs single" configuration, not a single error no matter the disk (new WD Red Plus 10TB or "old" WD Black 8TB, each in its own pool), even while writing / reading intensively for a week
- so I copied the ~6TB of data from the 8TB to the 10TB
- I then deleted all the content of the 8TB and merged the 2 disks into one pool
- Unraid switched from "btrfs single" to "raid 1", and all went smoothly

Everything has worked like a charm since, thanks a lot for your help (and long live Western Digital!) ❤️

mc5-diagnostics-20241024-1041.zip