June 8, 2016 I have 2 parity errors on my last monthly check. I reran the parity check and still have 2 errors (I did have it set to write corrections). I assume that means my parity errors are on the parity drive? Should I worry about this, or just monitor?
June 8, 2016 Community Expert Attach the Diagnostics file ('Tools' >> 'Diagnostics') to your next post. More information will be needed to assess your issue.
June 12, 2016 Author I am having a problem: the diagnostics zip is too large and the forum won't take it. I will try again tomorrow when I get home. For now, here is everything but the logs: tower-diagnostics-20160611-1255.zip
June 12, 2016 Author Not sure of the best way to post my log given the file size; it's far over the limit. Here are the lines from my log that mention the word parity. I have started a 3rd parity check with write corrections enabled; it will be done in 6 hours.
Jun 3 12:19:08 Tower kernel: mdcmd (230): check CORRECT
Jun 3 12:19:08 Tower kernel: md: recovery thread woken up ...
Jun 1 08:32:12 Tower kernel: md: correcting parity, sector=4157508704
Jun 1 10:08:58 Tower kernel: md: correcting parity, sector=5366050248
Jun 1 15:15:24 Tower kernel: mdcmd (3411): spindown 2
Jun 1 15:49:02 Tower kernel: mdcmd (3412): spindown 1
Jun 1 15:49:02 Tower kernel: mdcmd (3413): spindown 3
Jun 1 15:49:03 Tower kernel: mdcmd (3414): spindown 4
Jun 1 15:49:03 Tower kernel: mdcmd (3415): spindown 5
Jun 1 17:51:37 Tower kernel: mdcmd (3416): spindown 4
Jun 1 17:51:46 Tower kernel: mdcmd (3417): spindown 1
Jun 1 17:51:48 Tower kernel: mdcmd (3418): spindown 2
Jun 1 17:51:54 Tower kernel: mdcmd (3419): spindown 3
Jun 1 17:51:54 Tower kernel: mdcmd (3420): spindown 5
Jun 1 17:55:33 Tower kernel: mdcmd (3421): spindown 11
Jun 1 17:55:33 Tower kernel: mdcmd (3422): spindown 12
Jun 1 17:55:33 Tower kernel: mdcmd (3423): spindown 13
Jun 1 17:55:33 Tower kernel: mdcmd (3424): spindown 14
Jun 1 17:55:34 Tower kernel: mdcmd (3425): spindown 15
Jun 1 17:55:34 Tower kernel: mdcmd (3426): spindown 16
Jun 1 17:55:34 Tower kernel: mdcmd (3427): spindown 17
Jun 1 17:59:02 Tower kernel: mdcmd (3428): spindown 18
Jun 1 19:08:38 Tower kernel: mdcmd (3429): spindown 12
Jun 1 19:09:49 Tower kernel: mdcmd (3430): spindown 15
Jun 1 19:15:04 Tower kernel: mdcmd (3431): spindown 17
Jun 1 19:15:36 Tower kernel: mdcmd (3432): spindown 5
Jun 1 19:25:58 Tower kernel: md: sync done. time=69957sec
Jun 1 19:25:58 Tower kernel: md: recovery thread sync completion status: 0
Jun 1 20:08:12 Tower kernel: mdcmd (3433): spindown 11
Jun 1 20:08:17 Tower kernel: mdcmd (3434): spindown 14
Jun 3 12:19:08 Tower kernel: md: recovery thread checking parity...
Jun 3 12:19:08 Tower kernel: md: using 1536k window, over a total of 5860522532 blocks.
Jun 3 17:49:31 Tower kernel: md: correcting parity, sector=4157508704
Jun 3 19:26:34 Tower kernel: md: correcting parity, sector=5366050248
Jun 4 00:32:44 Tower kernel: mdcmd (231): spindown 1
Jun 12 03:40:22 Tower logger: skipping "appdata"
Jun 12 03:40:22 Tower logger: mover finished
Jun 12 07:15:57 Tower kernel: md: correcting parity, sector=5579984760
Jun 12 09:15:31 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
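One thing worth pulling out of an excerpt like this is whether the same sectors are being corrected on more than one check, since a correcting check that succeeded should not need to fix the same sector again. The following is only a minimal sketch: it assumes the syslog line format shown above, and it treats the calendar day as a stand-in for "one check run", which is an assumption and not something the log states. The function names are hypothetical.

```python
import re

# Matches syslog lines such as:
#   Jun 3 17:49:31 Tower kernel: md: correcting parity, sector=4157508704
CORRECTION = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+[\d:]+\s+\S+\s+kernel: md: correcting parity, sector=(\d+)"
)

def corrected_sectors_by_day(lines):
    """Map each corrected sector to the days it appears on, in log order."""
    days = {}
    for line in lines:
        m = CORRECTION.match(line.strip())
        if not m:
            continue  # ignore spindown, mover and other unrelated lines
        day = f"{m.group(1)} {int(m.group(2))}"
        seen = days.setdefault(int(m.group(3)), [])
        if day not in seen:
            seen.append(day)
    return days

def repeated_sectors(lines):
    """Keep only sectors corrected on more than one day (assumed: one check per day)."""
    return {s: d for s, d in corrected_sectors_by_day(lines).items() if len(d) > 1}

# Lines copied from the log excerpt in this post.
sample = [
    "Jun 1 08:32:12 Tower kernel: md: correcting parity, sector=4157508704",
    "Jun 1 10:08:58 Tower kernel: md: correcting parity, sector=5366050248",
    "Jun 1 15:15:24 Tower kernel: mdcmd (3411): spindown 2",
    "Jun 3 17:49:31 Tower kernel: md: correcting parity, sector=4157508704",
    "Jun 3 19:26:34 Tower kernel: md: correcting parity, sector=5366050248",
    "Jun 12 07:15:57 Tower kernel: md: correcting parity, sector=5579984760",
]

print(repeated_sectors(sample))
```

On the sample lines, the two sectors corrected on Jun 1 come back again on Jun 3, while the Jun 12 sector appears only once. To run it against a full log, pass the lines of the syslog file instead of `sample`.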
June 13, 2016 Author Things have taken a turn for the worse. I am attaching the diagnostics, but only the logs from June 12, when I ran the 3rd parity check. Now Disk 11 says it is disabled and its contents emulated, but the logs indicate problems that don't seem to be related just to disk 11; there are errors all over the place. I am only in town for a day or 2 to try and fix this, so any help is appreciated. Note: I zipped the files to under 192 (a 2-part zip file); I was unable to do much remotely on the last posts. I did a full download of diagnostics and am hoping a power down and unplug will solve the problem. Edit: I will need to repost and cut some more out of the log file, since the multi-part zip didn't work. This is just the last few days of log; the rest, minus the logs, will follow. logs.zip
June 13, 2016 Author Here is everything but the logs as of today. tower-diagnostics-20160612-2221.zip
June 13, 2016 Author I have to sleep, I've been up forever. Status: disk 11 is disabled with contents emulated and marked faulty on the dashboard, but has a thumbs up. Parity status = data is invalid. I am leaving the array down overnight and have scheduled mover for July 1st. I have a hot spare 6TB, but I'm not sure I should do anything at all until I get some help. Any help is appreciated; I'm hoping that I'm not out of luck. I will check first thing in the AM for any ideas/help. Goodnight and thanks in advance.
June 13, 2016 Community Expert I'd start by checking what these disks have in common, expander?
Jun 12 10:58:07 Tower kernel: md: disk9 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk10 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk11 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk12 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk13 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk14 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk15 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk16 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk17 read error, sector=8088338896
Jun 12 10:58:07 Tower kernel: md: disk18 read error, sector=8088338896
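The pattern in these lines (many disks, same second, same sector) is what points at a shared component and not at a single drive. A rough way to surface it from a long syslog is to group read errors by timestamp and sector and list the disks in each group. This is a sketch only, assuming the line format shown in this post; the function name and the `min_disks` threshold are hypothetical.

```python
import re
from collections import defaultdict

# Matches syslog lines such as:
#   Jun 12 10:58:07 Tower kernel: md: disk11 read error, sector=8088338896
READ_ERROR = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+[\d:]+)\s+\S+\s+kernel: md: disk(\d+) read error, sector=(\d+)"
)

def simultaneous_read_errors(lines, min_disks=2):
    """Group read errors by (timestamp, sector); keep groups hitting several disks."""
    groups = defaultdict(list)
    for line in lines:
        m = READ_ERROR.match(line.strip())
        if m:
            key = (m.group(1), int(m.group(3)))
            groups[key].append(int(m.group(2)))
    return {k: sorted(v) for k, v in groups.items() if len(v) >= min_disks}

# Rebuild the ten lines quoted in this post (disk9 through disk18).
sample = [
    f"Jun 12 10:58:07 Tower kernel: md: disk{n} read error, sector=8088338896"
    for n in range(9, 19)
]

for (when, sector), disks in simultaneous_read_errors(sample).items():
    print(when, sector, disks)
```

If one group lists every disk behind the same expander or cable, that component is the first thing to check; a group containing a single disk would be dropped by the `min_disks` filter.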
June 13, 2016 Author I have 2 of these daisy-chained (link at the end of this post). My wife was doing some cleaning in that room yesterday as well. I didn't see any loose cables, but I powered everything down and reseated all the cables from the CPU box to the 2 SGI racks. http://www.ebay.com/itm/3U-SGI-Rackable-SE3016-16-Caddies-3-5-SAS-SATA-Storage-Expander-JBOD-NAS-DAS-/142017101497?hash=item2110e0feb9:g:zGIAAOSwyQtViJ~I
June 13, 2016 Author https://www.dropbox.com/s/13qntnub0cubvcg/tower-diagnostics-20160612-2221.zip?dl=0 Here are the complete logs, thanks Squid for the dropbox suggestion!
June 14, 2016 Author I didn't get too many replies and will be away from home again for a while, so I took a chance and am using my hot spare to rebuild. So far no problems. If it works, I will do a few preclears on the 5TB that failed and then either buy a new drive or copy the contents over to the 5TB and put the 6TB back as hot spare. My assumption is that there was some type of glitch with the 16-bay expanders or cabling and that there is/was nothing actually wrong with the drive. Crossing fingers.
June 14, 2016 Community Expert The failed disk looks fine; the failure was almost certainly expander related, maybe a loose/faulty cable.
June 14, 2016 Author Thanks Johnnie.B - I woke up this morning and checked the machine. It is 86% done with the data rebuild. It should have been done, but it was showing 2 days plus remaining and a speed of 2.3MB-4.5MB. Luckily I can't see anything in the log that looks weird. A few minutes later it was at 34MB, 24MB, and now 120MB. Perhaps I never watched a rebuild before and this is normal, but I doubt it. As a next step I will look through the SMART logs. New diagnostics attached, just in case anyone wants to look. A complicating factor is that I have to leave town today for work, so I am limited in what I can do physically, with some off-and-on remote admin options. tower-diagnostics-20160614-0658.zip
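For a sense of scale, the figures already posted in this thread can be turned into a rough speed and time estimate. This is a back-of-the-envelope sketch, not what the GUI computes: it assumes the md driver counts 1 KiB blocks, that the speeds quoted are in MB/s with 1 MB = 10^6 bytes, and that the rebuild spans the same block count as the earlier parity check (which may overstate it if the rebuilt disk is smaller). The function names are hypothetical.

```python
BLOCK_BYTES = 1024      # assumption: md reports 1 KiB blocks
SECONDS_PER_DAY = 86400

def average_speed_mb_s(total_blocks, seconds):
    """Average throughput of a completed check, in MB/s (10^6 bytes per MB)."""
    return total_blocks * BLOCK_BYTES / seconds / 1e6

def remaining_days(total_blocks, fraction_done, speed_mb_s):
    """Days left if the current speed were sustained for the rest of the run."""
    remaining_bytes = total_blocks * BLOCK_BYTES * (1 - fraction_done)
    return remaining_bytes / (speed_mb_s * 1e6) / SECONDS_PER_DAY

TOTAL_BLOCKS = 5860522532   # from "over a total of 5860522532 blocks"
CHECK_SECONDS = 69957       # from "md: sync done. time=69957sec"

print("average check speed (MB/s):",
      round(average_speed_mb_s(TOTAL_BLOCKS, CHECK_SECONDS), 1))

# Speeds reported during the rebuild at 86% done.
for speed in (2.3, 4.5, 120):
    print(speed, "MB/s ->",
          round(remaining_days(TOTAL_BLOCKS, 0.86, speed), 2), "days left")
```

Under these assumptions the earlier check averaged roughly 86 MB/s, and the last 14% would take a little over 2 days at 4.5 MB/s but only a couple of hours at 120 MB/s, which fits the "2 days plus" estimate seen while the speed was low.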
June 14, 2016 Community Expert Like you noted, the log is clean. It could be a disk with slow sectors; these are difficult to detect using SMART until the slow sectors turn into bad sectors and the disk fails.
June 14, 2016 Author Thanks Johnnie, scary stuff. In your opinion, is the new beta reliable enough to trust yet? I would love dual parity.
June 14, 2016 Community Expert
Quote: "Thanks Johnnie.B - I woke up this morning and checked the machine - It is 86% done with data rebuild. It should have been done, but was showing 2 days plus and a speed of 2.3MB-4.5MB, luckily i can't see anything one the log that looks weird. A few minutes later, 34MB,24MB and now 120MB. Perhaps i never watched a rebuild before and this is normal, but i doubt it. Next step i will look through the smart logs. New Diagnostics attached, just in case anyone wants to look."
Depending somewhat on your hardware, using the GUI will impact your parity check/rebuild speeds. (A watched pot never boils...) This was a major topic of discussion some months back, and options were added to the GUI to improve this situation: 'Settings' >> 'Display Settings' >> 'Page update frequency:'
June 14, 2016 Community Expert
Quote: "Thanks Johnnie, Scary stuff, In your opinion is the new beta reliable enough to trust yet? I would love dual parity."
For me yes, all my servers are on 6.2beta. I accept there are some risks, but IMO there is less risk than running an array with single parity, especially a big array.
June 14, 2016 Author
Quote: "Thanks Johnnie, Scary stuff, In your opinion is the new beta reliable enough to trust yet? I would love dual parity."
Quote: "For me yes, all my servers are on 6.2beta, I accept there are some risks but IMO less risk than running a array with single parity, especially a big array."
Thanks, I tried it a while back (6.2 beta 21) just as a test server, and there was some weirdness with my network refreshing on Samba shares. I think I had 4 8TB drives. I went back to 6.1.9 and all the problems went away. Just curious, how many servers are you running? I am running 2 of them: this one, and the one with 4 8TB drives harvested from USB enclosures. Waiting for Black Friday to get some more 8TBs.
June 14, 2016 Community Expert
Quote: "Thanks Frank1940! it was set at realtime."
Hope it's that, but I doubt it. With page update set to realtime, parity check stats are updated once a minute, which is not enough to slow things down so much; maybe if you have a slow CPU and keep pressing F5 to refresh the Main page every second.
Quote: "Just curious, how many servers are you running?"
6 at the moment, not including my test server. All have dual parity except one that is a backup server, and all are on v6.2-beta.
June 14, 2016 Author I got this notification, but I can't find any errors in the log. Is it normal to not have any info on the errors, or am I looking in the wrong place? Latest diagnostics attached.
unRAID Data rebuild:: 14-06-2016 11:15
Notice [TOWER] - Data rebuild: finished (311 errors)
Duration: unavailable (no parity-check entries logged)
tower-diagnostics-20160614-1122.zip
June 15, 2016 Community Expert The rebuild completed without any errors; the errors in the notification are related to the last incomplete parity check. A new parity check should return 0 errors.