October 9, 2012: Guys, I could really use some help. I have been using an unRAID server for a few years and fortunately have never had an issue; it worked flawlessly. Recently one of my drives ran out of space, and I was about to replace it with a bigger drive when I noticed that the scheduled parity check (I have it set to run once a month) had reported 57 sync errors. I ran the check again with the option to correct errors selected, and when it finished I thought all was good. But the next day my unRAID server froze and I had to do an unclean power-off. Now when I rerun the parity check I get thousands of errors, and even after I choose to correct them, they come back on the next run. I am not sure how to diagnose this from here, or how to figure out which drive or what system problem is causing it. I am using unMENU and ran a SMART history. Apart from a few drives that are past their power-on-hours threshold, the only other item highlighted was on disk 4: Offline_Uncorrectable is now 1 (warning threshold is 1). I am not sure what that means. I am attaching my syslog. I have lots of sensitive data on here and am nervous about doing anything, so I could really use some guidance. Many thanks in advance. Running 5.0 beta6a, with one Supermicro AOC-SASLP-MV8. (Attached: syslog-2012-10-08.txt.zip)
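For readers wondering what a "sync error" is: with a single parity disk, each parity byte is the XOR of the corresponding bytes on all data disks, and a parity check recomputes that XOR and counts the places where it disagrees with what is stored. The sketch below illustrates the idea, assuming each disk is modeled as an equal-length `bytes` object; the function names are hypothetical, and it counts mismatching bytes, whereas how unRAID groups mismatches into its reported count is not modeled here.

```python
from __future__ import annotations

import operator
from functools import reduce


def compute_parity(disks: list[bytes]) -> bytes:
    """XOR all data disks together, byte by byte (single XOR parity)."""
    if not disks:
        raise ValueError("need at least one data disk")
    length = len(disks[0])
    if any(len(d) != length for d in disks):
        raise ValueError("all disks must be the same length in this sketch")
    # zip(*disks) yields one tuple of ints per byte offset across all disks.
    return bytes(reduce(operator.xor, column) for column in zip(*disks))


def count_sync_errors(disks: list[bytes], parity: bytes) -> int:
    """Count byte offsets where stored parity disagrees with recomputed parity."""
    expected = compute_parity(disks)
    return sum(1 for got, want in zip(parity, expected) if got != want)


if __name__ == "__main__":
    data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
    parity = compute_parity(data)
    print(parity.hex())  # ee223c
    # Flip one bit on the second "disk": the check now sees one mismatch.
    bad = bytearray(data[1])
    bad[1] ^= 0x04
    print(count_sync_errors([data[0], bytes(bad), data[2]], parity))  # 1
```

The check only knows that the XOR does not add up; it cannot tell whether a data disk, the parity disk, a cable, or RAM produced the wrong bits, and a correcting check simply rewrites parity to match whatever was read. That is why the diagnosis has to look at memory and individual disks separately.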
October 10, 2012 (Author): If anyone has a chance to look at this and let me know whether anything obvious stands out, I would appreciate it.
October 10, 2012: Quote: "Have lots of sensitive data on here and am nervous about doing anything. Really could use some guidance." Step 1. Make a backup of your sensitive data elsewhere. unRAID is not a backup; it is a way to protect yourself from a single disk failure. Then perform a memory test (preferably overnight) to make sure memory is not adding to your issues. Memory can go bad, or the voltage, clock speed, and timings may not have been set properly by your BIOS. Lastly, run tests on specific disks. Repeated reads from the same disk should produce identical md5 checksums; if only one disk is inconsistent in its output, the problem may be that single disk. There are "dd" scripts in other threads that feed the output to md5sum. Run them. Joe L.
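Joe L.'s per-disk test can also be sketched in Python: read the same device several times, hash each pass with MD5, and check that every digest matches. The device path `/dev/sdX`, the pass count, the 10 GiB limit, and the function names are assumptions for illustration, not the dd scripts from the other threads. Reading a raw device needs root, and the script only ever opens it read-only.

```python
from __future__ import annotations

import hashlib
import sys


def md5_of(path: str, limit: int | None = None, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of the first `limit` bytes of `path` (all if None)."""
    h = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:  # read-only: never writes to the disk
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(size)
            if not chunk:
                break  # end of device/file
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def repeated_read_check(path: str, passes: int = 3, limit: int | None = None) -> list[str]:
    """Hash the same device/file `passes` times; differing digests mean unstable reads."""
    return [md5_of(path, limit) for _ in range(passes)]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: readcheck.py /dev/sdX   (hypothetical device path)")
    device = sys.argv[1]
    # First 10 GiB; choose a size larger than installed RAM so the Linux page
    # cache cannot serve later passes from memory instead of the disk.
    digests = repeated_read_check(device, passes=3, limit=10 * 1024**3)
    for i, d in enumerate(digests, 1):
        print(f"pass {i}: {d}")
    print("CONSISTENT" if len(set(digests)) == 1 else "MISMATCH: reads differ")
```

Run it once per disk, including parity, while the array is otherwise idle. If the read size fits in RAM, dropping the page cache between passes (`echo 3 > /proc/sys/vm/drop_caches` as root) is another way to force real disk reads.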
October 11, 2012: Since you've got about 10 drives, perhaps you are running into power supply issues? Stephen
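Stephen's power-supply question can be checked with rough arithmetic. At power-on all drives spin up at roughly the same time, so the 12 V rail has to cover their combined spin-up current plus the rest of the system's 12 V load. In the sketch below, the 2.0 A per-drive spin-up figure, the 10 A allowance for the rest of the system, and the 36 A PSU rating are placeholder assumptions, not values from this server; real numbers come from the drive datasheets and the PSU label, and on a multi-rail PSU the limit of the rail feeding the drives is what matters.

```python
def spinup_headroom_amps(
    drive_count: int,
    spinup_amps_per_drive: float,
    other_12v_load_amps: float,
    psu_12v_rating_amps: float,
) -> float:
    """Remaining 12 V capacity (amps) at simultaneous spin-up; negative means overloaded."""
    peak = drive_count * spinup_amps_per_drive + other_12v_load_amps
    return psu_12v_rating_amps - peak


if __name__ == "__main__":
    # All figures below are placeholder assumptions, not measurements.
    headroom = spinup_headroom_amps(
        drive_count=10,              # "about 10 drives" from the thread
        spinup_amps_per_drive=2.0,   # assumed; check each drive's datasheet
        other_12v_load_amps=10.0,    # assumed CPU/fans/controller load
        psu_12v_rating_amps=36.0,    # hypothetical PSU label value
    )
    print(f"12 V headroom at spin-up: {headroom:.1f} A")  # 6.0 A
```

A small or negative headroom does not prove the PSU is at fault, but it makes power a more plausible suspect.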
October 11, 2012 (Author): Here are my SMART reports. But I will do as Joe says and run a memtest, then the dd scripts for each disk. (Attached: Sh.zip)
October 11, 2012: Those are not SMART reports, and they are incomplete. See here: http://lime-technology.com/wiki/index.php/Troubleshooting#Obtaining_a_SMART_report
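Once a proper report has been obtained with `smartctl`, a few attributes from its `-A` table matter most for this thread. These are 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable (the one unMENU flagged on disk 4), and 199 UDMA_CRC_Error_Count. A nonzero 198 raw value generally means the drive's offline scan found sectors it could not read. Attribute 199 counts interface CRC errors and usually points at cabling or connections rather than the platters. The sketch below assumes the standard smartctl ATA attribute table, where RAW_VALUE is the tenth column. The function name is hypothetical, and vendors do not always report these attributes the same way.

```python
from __future__ import annotations

import re
import sys

# SMART attribute IDs worth watching here (198 is the one flagged in the thread).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",  # interface CRC errors: often a cable/connection issue
}

# ID, NAME, FLAG (0x....), then VALUE WORST THRESH TYPE UPDATED WHEN_FAILED,
# then the leading digits of RAW_VALUE.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(?:\S+\s+){6}(\d+)")


def watched_raw_values(smartctl_a_output: str) -> dict[int, int]:
    """Map watched attribute IDs to their RAW_VALUE from `smartctl -A` text."""
    found: dict[int, int] = {}
    for line in smartctl_a_output.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED:
            found[int(m.group(1))] = int(m.group(3))
    return found


if __name__ == "__main__":
    # Hypothetical usage: smartctl -A /dev/sdX | python3 smartcheck.py
    values = watched_raw_values(sys.stdin.read())
    for attr_id, raw in sorted(values.items()):
        flag = "  <-- nonzero" if raw else ""
        print(f"{attr_id:3d} {WATCHED[attr_id]:<24} {raw}{flag}")
```

The rows used in the test below are invented sample values, not data from this thread.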
October 12, 2012 (Author): Oops, sorry, you are right. I will repost. I ran the dd and md5sum tests and all came back clean, including the parity drive. Memtest has been running for 5 hours and is clean so far; I will let it run overnight and then see.
October 12, 2012 (Author): Well, my SMART reports do not show any errors and Memtest reported nothing. I am running file system checks now. But assuming those are not the issue, what are the next steps? Again, I ran dd and all disks checked out fine. I would assume a bad cable would have shown up during the dd/md5sum checks? I am not sure what else it could be; it has been running without any parity errors for 2 years. Any other thoughts?
October 14, 2012 (Author): Well, the problem is fixed, though I am not sure exactly how. In short, I ran three correcting parity checks while also dusting out the case and reconnecting all the cables, and the last one reported 0 errors. All memtests and disk checks passed. Anyway, thanks for the help.