Kir Posted July 22, 2015
Hi guys, I've got a very peculiar situation. I will describe it step by step:
1. I built a new unRAID server, v6.0.1. Precleared all disks - OK.
2. Transferred the data from the old server to this one. All OK.
3. Parity check - lots of errors, disk1 redballs.
4. Since I trust this disk, I did a New Config and reassigned the disks to the array. It started a parity sync, which finished OK.
5. I did an rsync dry run to check the integrity of the files on disk1 - all OK, no errors!
6. Parity check - again, lots of errors, disk1 redballs.
Notice that errors *only* happen during parity check. Parity sync, data transfer, and rsync all finish OK. Relevant portion of the syslog attached.
Attachment: syslog_partial.zip
dgaschk Posted July 22, 2015
Tools->Diagnostics. Attach file.
Kir Posted July 22, 2015
Here it is. Glancing over it quickly, it looks like the parity check causes the whole controller to lock up?
(attached diags removed for privacy)
dgaschk Posted July 22, 2015
What model PSU? What are the system specs? What MB, SATA cards, etc.?
Kir Posted July 22, 2015
PSU: 2 x Supermicro 1200W
MB: Dell 0YXT71
SATA: Supermicro AOC-SAS2LP-MV8 with the latest FW
Backplane: Supermicro SAS2-846EL1
RAM: 8 GB (tested OK)
PS: I read about the SAS2LP lockup bug, but that was resolved in Kernel 3.6.2-1.fc16 (?)
dgaschk Posted July 23, 2015
None of the disks are responding. There is a hardware fault in the SATA system. Check all cables and connections.
Kir Posted July 23, 2015
Everything works perfectly until I start the parity check. Even parity sync, heavy data transfer, etc. are all OK. Parity check kills it.
Kir Posted July 24, 2015
OK.
Checked all cables - OK
Re-added the "failed" disk - OK
Array initiated recovery, finished OK
Did rsync --dry-run to check the files against the originals - OK
Parity check - same, errors galore ending in red-balling.
What gives?
sureguy Posted July 24, 2015
Have you run a memtest?
Kir Posted July 25, 2015
Quoting sureguy: "Have you run a memtest?"
Yep. Reply #4 above - "RAM: 8Gb (tested OK)"
sureguy Posted July 25, 2015
Quoting Kir: "Yep. Reply #4 above - "RAM: 8Gb (tested OK)""
Sorry, I missed that. How long did you let memtest run? I would recommend overnight at a minimum, with multiple passes.
Kir Posted July 26, 2015
Just ran the memtest again for 15 hrs - no errors.
Kir Posted August 6, 2015
OK guys, what is so special about parity check? What does it do that no other unRAID operation ever does? Just to explain what works and what doesn't:
1. rsync to transfer data - OK
2. array recovery - OK
3. parity check - lots of errors, disk redballs, controller locks up sometimes.
I even thought the old bug mentioned here had raised its ugly head again ("everything works well, untill i do parity check. then i get a lot of errors." - sound familiar?). But I reflashed the firmware like other guys did, and it didn't fix it.
TL;DR: I got tired of waiting for a solution, so I just bought an M1015, reflashed it to IT mode, and all is peachy. My SAS2LP is collecting dust now...
trurl Posted August 21, 2015
Quoting Kir: "Anyone?"
Your previous post said "...just bought M1015, reflashed to IT mode, and all is peachy...", so what do you want from anyone?
Kir Posted August 21, 2015
"What is so special about parity check? What does it do that no other unRAID operation ever does?"
So, pretty much: why does the SAS2LP work for every operation except parity check?
itimpi Posted August 21, 2015
Quoting Kir: "What is so special about parity check? What does it do that no other unRAID operation ever does?"
Parity check is the only time that you are simultaneously reading from all drives at the maximum obtainable speed.
trurl Posted August 21, 2015
Quoting Kir: "What is so special about parity check? What does it do that no other unRAID operation ever does? So, pretty much: why does the SAS2LP work for every operation except parity check?"
Reading a file only involves the drive being read. Writing a file to cache only involves the cache drive. Writing a file on the parity-protected array only involves the drive being written and the parity drive. Checking parity, rebuilding parity, or rebuilding a data disk all require every drive to be accessed simultaneously.
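To make the difference in drive access concrete, here is a minimal toy sketch of single-parity XOR in Python. It is not unRAID's actual driver code; the names `xor_blocks`, `parity_check`, and `write_block` and the tiny in-memory "disks" are made up for illustration, and it only assumes that parity is the byte-wise XOR of the corresponding blocks on all data disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_check(data_disks, parity_disk):
    """Return the indices of stripes whose stored parity does not match.

    Each stripe needs one block from every data disk plus the parity disk,
    so a full check reads all drives in the same pass.
    """
    mismatches = []
    for stripe, parity_block in enumerate(parity_disk):
        expected = xor_blocks([disk[stripe] for disk in data_disks])
        if expected != parity_block:
            mismatches.append(stripe)
    return mismatches

def write_block(data_disks, parity_disk, disk_index, stripe, new_block):
    """Update one block; only the target disk and the parity disk are touched."""
    old_block = data_disks[disk_index][stripe]
    # New parity = old parity XOR old data XOR new data.
    parity_disk[stripe] = xor_blocks([parity_disk[stripe], old_block, new_block])
    data_disks[disk_index][stripe] = new_block

# Hypothetical toy array: three data disks, two stripes of two bytes each.
data_disks = [
    [b"\x01\x02", b"\x10\x20"],
    [b"\x04\x08", b"\x40\x80"],
    [b"\xff\x00", b"\x0f\xf0"],
]
parity_disk = [xor_blocks([disk[s] for disk in data_disks]) for s in range(2)]
```

In this model, `write_block` never looks at the other data disks, while `parity_check` cannot verify a single stripe without a block from every disk. That is the sense in which a check loads all drives, and the controller they share, at once.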
Kir Posted August 21, 2015
Quoting trurl: "Reading a file only involves the drive being read. Writing a file to cache only involves the cache drive. Writing a file on the parity-protected array only involves the drive being written and the parity drive. Checking parity, or rebuilding parity, or rebuilding a data disk, all require all drives to be accessed simultaneously."
Did you read my post? Rebuilding a data disk worked OK several times. Checking parity always killed it.
trurl Posted August 21, 2015
Quoting Kir: "Did you read my post? Rebuilding a data disk worked OK several times. Checking parity always killed it."
I didn't re-read the entire (old) thread, if that's what you mean. It seemed like you said your problems were solved.
RobJ Posted August 21, 2015
Check out this thread. I don't know if his fix will apply to others, but it seems worth trying, if you're in a position to purchase a new set of memory. I confess I also didn't read enough to know if his situation is the same as yours.
Kir Posted August 21, 2015
Quoting RobJ: "Check out this thread. I don't know if his fix will apply to others, but it seems worth trying, if you're in a position to purchase a new set of memory. I confess I also didn't read enough to know if his situation is the same as yours."
All OK after I changed the controller. So the problem is something specific to the SAS2LP and parity check only. That's what's really strange...
HellDiverUK Posted August 21, 2015
Considering how many folks use that controller with zero issues, it's pretty safe to assume that your controller is faulty or overheating, or that something else is broken. You cannot make the sweeping statement that there's an issue with all SAS2LP cards.