July 16, 2012

Hi, I'm seriously stuck: I can't rebuild a replacement disk. The array is functioning, as I can still access all the files that were on the original (possibly now failed) disk, but the array is unprotected. When I start a rebuild, unRAID crashes instantly.

The problems started when I attempted to increase md_sync_window to improve performance. I set it to 876; it had been 512. md_num_stripes was 3840 and md_write_limit was 2304. These two values have not been changed and, based on Tom's discussion elsewhere on the forum, they should work fine with 2 GB of system memory and 13 disks.

I'm running unRAID Pro 4.7 on an Asus P5Q Premium with an Intel E8400 3.0 GHz Core 2 Duo CPU, 2 GB RAM, 2 x AOC SASLP MV8 controllers, 12 data drives, 1 parity drive, and 1 cache drive. Up to this point the system has been in daily use and rock solid for almost 18 months.

After I saved the change to md_sync_window, the instant I restarted the array a red ball showed up on Disk 7 (Seagate ST31500341AS, 1.5 TB). I tried moving the disk to different slots and controllers to eliminate potential points of failure, all with the same result. I then checked the disk using SeaTools (Windows), and it failed the short test. I ran SeaTools in repair mode, which completed successfully. I then ran a long SMART test, which also passed.

I reinstalled the disk and started the server, which came up with Disk 7 missing. I stopped the array and assigned the disk back to Disk 7. When the array was started (to rebuild), unRAID instantly collapsed. The internal web server quit shortly afterwards, but I was still able to telnet in and capture the logs. A truncated syslog is attached. I've also included dmesg, with an error logged between lines 879 and 913.

I forced a power down, then removed the disk from the server and restarted it. It booted 'normally' and the array came up, again missing Disk 7.
Fortunately, all data on all drives, including the missing Disk 7, is readable. A SMART report for all disks, including the suspect disk, showed no errors of any kind. Results attached.

I then decided to replace the original disk. I externally pre-cleared a new Seagate 2 TB drive (ST2000DM001-9YN) using Joe L's preclear_disk.sh. This disk was installed and the server started. Again, the server came up OK, but with Disk 7 missing. I stopped the array and assigned the new disk to Disk 7. I started a rebuild (no other option was offered) and had exactly the same problem as before. I had to force the shutdown.

The server boots and the array starts unprotected when no disk is assigned to Disk 7. Any attempt to assign either disk to Disk 7 results in instant failure and many, many MB of errors in the log when the array is restarted. This behaviour occurs irrespective of physical location or controller.

I'm guessing I need to avoid, if at all possible, forcing a parity rebuild, as that would result in the loss of the remaining 'protected' Disk 7 data.

Finally, I ran reiserfsck against both suspect disks. The original disk shows no errors and apparently lots of data. The replacement disk, however, comes back with all sorts of errors involving the sub-tree and bad nodes. Results attached.

Thanks in advance for any assistance!

Attachments: SMART_results.zip, dmesg_after_failed_restart_160712.zip, syslog_after_changing_stripe_size_down_shortened_version.zip, reiserfsck_results.txt
July 16, 2012

Check my post: http://lime-technology.com/forum/index.php?topic=21377.msg190277#msg190277

I recently had a similar experience when adjusting those values, though mine are much lower than yours and I have 8 GB of RAM. I had to reset them to the defaults and rebuild the drive, and I don't plan on editing them again (other than setting md_sync_window to 512). I feel both your drives are fine. Try these settings:

Tunable (md_num_stripes): 1280
Tunable (md_write_limit): 768
Tunable (md_sync_window): 512

Your values seem VERY high. I use these settings and I have 22 disks. The latest RC does have subpar parity sync speeds (~40 MB/s versus 100 MB/s on B14), but I don't feel that's related to these settings; many people have reported the same.
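For a rough sense of scale, the sketch below compares the stripe-buffer memory implied by the two md_num_stripes values mentioned in this thread. It assumes the commonly cited rule of thumb that each stripe holds one 4 KiB buffer per array disk; that figure and the helper name `stripe_memory_bytes` are assumptions for illustration, not something stated in this thread or taken from the unRAID source.

```python
# Rough estimate of md driver stripe-buffer memory.
# ASSUMPTION (not confirmed in this thread): each stripe holds one
# 4 KiB buffer for every disk in the array (data + parity).
PAGE_BYTES = 4096


def stripe_memory_bytes(md_num_stripes: int, num_disks: int) -> int:
    """Return the estimated stripe-buffer size in bytes (hypothetical helper)."""
    return md_num_stripes * PAGE_BYTES * num_disks


def to_mib(num_bytes: int) -> float:
    """Convert bytes to MiB."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    # 13 array disks = 12 data + 1 parity, as described in the first post.
    cases = {
        "original poster (md_num_stripes=3840)": (3840, 13),
        "suggested value (md_num_stripes=1280)": (1280, 13),
    }
    for label, (stripes, disks) in cases.items():
        mib = to_mib(stripe_memory_bytes(stripes, disks))
        print(f"{label}: about {mib:.1f} MiB")
```

Under that assumption, the higher setting reserves three times as much memory as the suggested one. Whether that was actually the cause of the crash here is not established by the thread.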
July 18, 2012 (Author)

Many thanks for the feedback. As suggested, I reset the values back to default, which seems to have corrected the problem. The array is rebuilding now. Fingers crossed that it completes successfully in 522 minutes!