September 2, 2011 I have had UnRaid 4.7 running for a few days. I added the parity disk a couple of days ago, and today (after moving one folder from one disk to another) the parity disk showed as a new disk. After a reboot of the server it is rebuilding parity again. Any clues as to why?
September 2, 2011 Hello, more info is needed before this board can assist. Post your syslog and a hardware breakdown. See the stickies for how to capture the syslog. Take care, Jim S.
September 2, 2011 Author I'm running identical hardware to the 20 drive Tower from Raj. I haven't added the controller cards yet, so I have all 6 drives connected to the motherboard. The drives are Seagate Barracuda Green 2TB, and all were successfully precleared before being added to the array. I rebooted the server after I got the error, so the syslog reset and only has entries from after the reboot; I don't think it will be much help. It is rebuilding parity on the same original parity disk right now. I just wonder if moving and deleting folders from one share to another, as well as reassigning the disks to the shares, might have screwed things up.
September 3, 2011 Author I let the parity re-sync finish (it completed with no errors) and shut the server down for the night. When I started it this morning all was well.
September 3, 2011 Author Actually it looks like I may have a problem with the parity disk.
Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346904/0, count: 1 (Errors)
Sep 3 09:07:53 Tower kernel: md: disk0 read error (Errors)
Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346912/0, count: 1 (Errors)
Sep 3 09:07:53 Tower kernel: md: disk0 read error (Errors)
Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346920/0, count: 1 (Errors)
Sep 3 09:07:53 Tower kernel: md: disk0 read error (Errors)
Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346928/0, count: 1 (Errors)
Sep 3 09:07:53 Tower kernel: md: disk0 read error (Errors)
Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346936/0, count: 1 (Errors)
Sep 3 09:07:53 Tower kernel: md: disk0 read error (Errors)
Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346944/0, count: 1 (Errors)
Sep 3 09:07:54 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 3 09:07:54 Tower last message repeated 3 times
Sep 3 09:07:54 Tower kernel: NTFS driver 2.1.29 [Flags: R/O MODULE]. (System)
Sep 3 09:08:34 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
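For a longer syslog than the excerpt above, the `md: diskN read error` and `md: diskN write error` lines can be tallied per disk rather than read by eye. The following is only a minimal sketch: the function name `tally_md_errors` and the regular expression are illustrative assumptions built from the line format shown in the post, not part of unRAID or unmenu.

```python
import re
from collections import Counter

# Matches md driver lines of the form seen in the post, e.g. "md: disk0 read error".
MD_ERROR = re.compile(r"md: disk(\d+) (read|write) error")

def tally_md_errors(lines):
    """Count md read/write errors per (disk number, kind). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts

# A few lines copied from the syslog excerpt in the post.
sample = [
    "Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346904/0, count: 1 (Errors)",
    "Sep 3 09:07:53 Tower kernel: md: disk0 read error (Errors)",
    "Sep 3 09:07:53 Tower kernel: handle_stripe read error: 1346912/0, count: 1 (Errors)",
    "Sep 3 09:07:53 Tower kernel: md: disk0 read error (Errors)",
    "Sep 3 09:07:54 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO",
]

if __name__ == "__main__":
    for (disk, kind), n in sorted(tally_md_errors(sample).items()):
        print(f"disk{disk}: {n} {kind} error(s)")
```

On a real system the same function could be fed the lines of the saved syslog file instead of `sample`.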
September 3, 2011 Author Now I get this error when trying to run the short SMART test from unmenu:
smartctl -t short -d ata /dev/sda 2>&1
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
HELP!
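The failure above can also be detected from a script. This is a rough sketch, assuming `smartctl` is installed and the script runs with sufficient privileges; the helper names (`short_test_command`, `identify_failed`, `run_short_test`) are hypothetical, and the command line is the one from the post.

```python
import subprocess

# Text smartctl printed in the post when the drive did not identify itself.
IDENTIFY_FAILED = "Device Read Identity Failed"

def short_test_command(device):
    """Build the command line shown in the post for a given device path."""
    return ["smartctl", "-t", "short", "-d", "ata", device]

def identify_failed(output):
    """True if the output contains the identify-failure message from the post."""
    return IDENTIFY_FAILED in output

def run_short_test(device):
    """Start a short self-test; returns (started_ok, combined_output).

    Assumes smartctl is on PATH; not called here because it needs real hardware.
    """
    proc = subprocess.run(short_test_command(device), capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    return proc.returncode == 0 and not identify_failed(output), output
```

A drive that does not answer the identify request at all, as here, is a different symptom from a drive that answers and then reports SMART errors.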
September 3, 2011 Author I stopped the array, shut down the server and re-seated the drive. It restarted without any errors and I was able to run the short SMART test, which showed no errors. Hopefully that was the problem.
September 3, 2011 Author I am back where I started: the parity disk is showing as new and rebuilding again. It appears to be having issues with disk 4 as well.
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388096/4, count: 1 (Errors)
Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388104/4, count: 1 (Errors)
Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388112/4, count: 1 (Errors)
Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388120/4, count: 1 (Errors)
Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388128/4, count: 1 (Errors)
Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388136/4, count: 1 (Errors)
Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388144/4, count: 1 (Errors)
Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)
Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388152/4, count: 1 (Errors)
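The `handle_stripe` lines carry a number before the slash that advances steadily and a number after it that matches the disk named on the following `md:` line. The sketch below extracts both and reports the step between consecutive errors. Reading the first number as a sector offset and the second as the disk index is an assumption based only on the excerpts in this thread, and the helper names are hypothetical.

```python
import re

# "handle_stripe write error: 2192388096/4" -> kind, offset, disk (interpretation assumed).
STRIPE_ERROR = re.compile(r"handle_stripe (read|write) error: (\d+)/(\d+)")

def error_offsets(lines):
    """Return (offset, disk) pairs from handle_stripe error lines, in log order."""
    pairs = []
    for line in lines:
        match = STRIPE_ERROR.search(line)
        if match:
            pairs.append((int(match.group(2)), int(match.group(3))))
    return pairs

def offset_steps(offsets):
    """Return the sorted set of differences between consecutive offsets."""
    return sorted({b - a for a, b in zip(offsets, offsets[1:])})

# Lines copied from the syslog excerpt in the post.
sample = [
    "Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388096/4, count: 1 (Errors)",
    "Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)",
    "Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388104/4, count: 1 (Errors)",
    "Sep 3 09:49:35 Tower kernel: md: disk4 write error (Errors)",
    "Sep 3 09:49:35 Tower kernel: handle_stripe write error: 2192388112/4, count: 1 (Errors)",
]

if __name__ == "__main__":
    pairs = error_offsets(sample)
    print("disks involved:", sorted({disk for _, disk in pairs}))
    print("steps between errors:", offset_steps([offset for offset, _ in pairs]))
```

A single constant step means every consecutive request in that stretch failed, which may suggest the drive stopped responding altogether rather than hitting isolated bad spots. That reading is a guess; the log alone does not prove it.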
September 3, 2011 Author Could this be part of the problem?
parity device: pci-0000:00:1f.2-scsi-0:0:0:0 host1 (sda) ST2000DL003-9VT166_5YD2Z590
disk1 device: pci-0000:00:1f.5-scsi-1:0:0:0 host4 (sdf) ST2000DL003-9VT166_5YD2TKNM
disk2 device: pci-0000:00:1f.2-scsi-1:0:1:0 host2 (sdd) ST2000DL003-9VT166_5YD2Y1K5
disk3 device: pci-0000:00:1f.5-scsi-0:0:0:0 host3 (sde) ST2000DL003-9VT166_5YD2VWT7
disk4 device: pci-0000:00:1f.2-scsi-0:0:1:0 host1 (sdb) ST2000DL003-9VT166_5YD28YGC
disk5 device: pci-0000:00:1f.2-scsi-1:0:0:0 host2 (sdc) ST2000DL003-9VT166_5YD31EBL
It's showing 2 disks on Host 1 and 2 disks on Host 2.
September 3, 2011
"Could this be part of the problem?
parity device: pci-0000:00:1f.2-scsi-0:0:0:0 host1 (sda) ST2000DL003-9VT166_5YD2Z590
disk1 device: pci-0000:00:1f.5-scsi-1:0:0:0 host4 (sdf) ST2000DL003-9VT166_5YD2TKNM
disk2 device: pci-0000:00:1f.2-scsi-1:0:1:0 host2 (sdd) ST2000DL003-9VT166_5YD2Y1K5
disk3 device: pci-0000:00:1f.5-scsi-0:0:0:0 host3 (sde) ST2000DL003-9VT166_5YD2VWT7
disk4 device: pci-0000:00:1f.2-scsi-0:0:1:0 host1 (sdb) ST2000DL003-9VT166_5YD28YGC
disk5 device: pci-0000:00:1f.2-scsi-1:0:0:0 host2 (sdc) ST2000DL003-9VT166_5YD31EBL
It's showing 2 disks on Host 1 and 2 disks on Host 2"
Nope, that is normal. Check all the power and data cable connections. Your drives are either dying or you have a power connection problem.
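The host grouping the poster noticed can be read out of the device listing mechanically. The sketch below assumes each entry sits on its own line in the form `<slot> device: <path> <host> (<dev>) <model_serial>`; the regular expression and the name `group_by_host` are illustrative, not an unRAID tool.

```python
import re
from collections import defaultdict

# slot, device path, host, kernel device name, model_serial (format assumed from the post).
DEVICE_LINE = re.compile(r"^(\S+) device: (\S+) (host\d+) \((\w+)\) (\S+)$")

def group_by_host(lines):
    """Map each host to the list of (array slot, kernel device) pairs attached to it."""
    hosts = defaultdict(list)
    for line in lines:
        match = DEVICE_LINE.match(line.strip())
        if match:
            slot, _path, host, dev, _model = match.groups()
            hosts[host].append((slot, dev))
    return dict(hosts)

# The listing from the post, one entry per line.
listing = [
    "parity device: pci-0000:00:1f.2-scsi-0:0:0:0 host1 (sda) ST2000DL003-9VT166_5YD2Z590",
    "disk1 device: pci-0000:00:1f.5-scsi-1:0:0:0 host4 (sdf) ST2000DL003-9VT166_5YD2TKNM",
    "disk2 device: pci-0000:00:1f.2-scsi-1:0:1:0 host2 (sdd) ST2000DL003-9VT166_5YD2Y1K5",
    "disk3 device: pci-0000:00:1f.5-scsi-0:0:0:0 host3 (sde) ST2000DL003-9VT166_5YD2VWT7",
    "disk4 device: pci-0000:00:1f.2-scsi-0:0:1:0 host1 (sdb) ST2000DL003-9VT166_5YD28YGC",
    "disk5 device: pci-0000:00:1f.2-scsi-1:0:0:0 host2 (sdc) ST2000DL003-9VT166_5YD31EBL",
]

if __name__ == "__main__":
    for host, members in sorted(group_by_host(listing).items()):
        print(host, members)
```

In this listing the two disks that logged errors, parity (sda) and disk4 (sdb), both sit on host1. That may be worth keeping in mind when checking cables, though the thread does not establish it as the cause.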
September 4, 2011 Author I opened the case and reseated all the cables. It rebuilt the parity and looks good so far. I'm thinking that if I'm gonna have issues, they will arise when I write to the array, not when I am just reading from it. I'll write something to it later today.