January 16, 2012
Greetings unRAID community,

I have recently had an issue with one of my servers that has me stumped. I generated the array and opted to wait on building a parity disk, as I had a significant amount of data to write to the array first. After writing the data, my last order of HDDs arrived and I added two more empty drives to expand the array further, plus another as my parity drive.

After powering up the server again I started the initial parity calculation, which appeared to go fine. It took some time, so I left the house, and when I returned there had been an error of some kind. One of the original drives (luckily it had no data on it) was red balled, and the parity drive was orange balled and had (obviously) stopped calculating parity. I attempted a short S.M.A.R.T. test on the red ball drive, which would not work (I was asked to add -T permissive options), so I figured that the drive had had a power or data cable come loose.

I powered down the server, rechecked all the cabling and powered up again. The server started fine with the parity drive and Disk 2 orange, but Disk 2 was also showing as Unformatted and I could only proceed with the parity calculation if I first formatted the disk. As there had been no data stored on the disk I went ahead with the format, but the server did nothing and returned to its previous state. Confused, I fired up my console (I wish unRAID shipped with SSH by default; installing unMenu just for SSH seems a little much) and checked to see if syslog could shed light.
I saw a mention of what I thought could be the culprit:

Jan 16 15:51:13 passionfruit emhttp: shcmd (337): set -o pipefail ; mkreiserfs -q /dev/md2 |& logger
Jan 16 15:51:13 passionfruit logger: mkreiserfs 3.6.21 (2009 www.namesys.com)
Jan 16 15:51:13 passionfruit logger:
Jan 16 15:51:55 passionfruit emhttp: shcmd (338): mkdir /mnt/disk2
Jan 16 15:51:55 passionfruit emhttp: shcmd (339): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md2 /mnt/disk2 |& logger
Jan 16 15:51:55 passionfruit kernel: REISERFS warning (device md2): sh-2021 reiserfs_fill_super: can not find reiserfs on md2
Jan 16 15:51:55 passionfruit logger: mount: wrong fs type, bad option, bad superblock on /dev/md2,
Jan 16 15:51:55 passionfruit logger: missing codepage or helper program, or other error
Jan 16 15:51:55 passionfruit logger: In some cases useful info is found in syslog - try
Jan 16 15:51:55 passionfruit logger: dmesg | tail or so
Jan 16 15:51:55 passionfruit logger:
Jan 16 15:51:55 passionfruit emhttp: _shcmd: shcmd (339): exit status: 32
Jan 16 15:51:55 passionfruit emhttp: disk2 mount error: 32
Jan 16 15:51:55 passionfruit emhttp: shcmd (340): rmdir /mnt/disk2
Jan 16 15:51:56 passionfruit emhttp: shcmd (341): :>/etc/samba/smb-shares.conf
Jan 16 15:51:56 passionfruit emhttp: shcmd (342): cp /etc/netatalk/AppleVolumes.default- /etc/netatalk/AppleVolumes.default
Jan 16 15:51:56 passionfruit emhttp: Restart SMB...
Jan 16 15:51:56 passionfruit emhttp: shcmd (343): killall -HUP smbd
Jan 16 15:51:56 passionfruit emhttp: shcmd (344): ps axc | grep -q rpc.mountd
Jan 16 15:51:56 passionfruit emhttp: _shcmd: shcmd (344): exit status: 1
Jan 16 15:51:56 passionfruit emhttp: Restart AFP...
Jan 16 15:51:56 passionfruit emhttp: shcmd (345): killall -HUP afpd
Jan 16 15:51:56 passionfruit emhttp: shcmd (346): cp /etc/avahi/services/afp.service- /etc/avahi/services/afp.service
Jan 16 15:51:56 passionfruit avahi-daemon[7050]: Files changed, reloading.
Jan 16 15:51:56 passionfruit avahi-daemon[7050]: Service group file /services/afp.service changed, reloading.
Jan 16 15:51:56 passionfruit emhttp: shcmd (347): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service
Jan 16 15:51:56 passionfruit avahi-daemon[7050]: Files changed, reloading.
Jan 16 15:51:56 passionfruit avahi-daemon[7050]: Service group file /services/smb.service changed, reloading.
Jan 16 15:51:56 passionfruit emhttp: shcmd (348): /usr/local/sbin/emhttp_event svcs_restarted
Jan 16 15:51:56 passionfruit emhttp_event: svcs_restarted

This occurs after I click 'Format' in the management interface. What am I looking at here, and what should I try next to troubleshoot?

Another question, possibly related. I am a little nervous that I currently lack parity protection (the data is still backed up elsewhere, I am just paranoid), so I tried to remove the drive (following the Wiki instructions) until I could learn more, but was unsuccessful. My steps were: stop the array, remove the Disk 2 assignment, and run initconfig from the console before starting the array again. This simply left the array with a red ball where Disk 2 was and still would not let me calculate parity. Have I done something wrong in trying to remove the disk?
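One way to read the syslog excerpt in the first post is to pair each numbered emhttp shell command with the exit status reported for it. The following is a minimal Python sketch of that idea. It assumes the line format seen in the excerpt ("shcmd (N): command" followed later by "_shcmd: shcmd (N): exit status: X"); the function name failed_shcmds is hypothetical and not part of unRAID.

```python
import re

# Sample lines copied from the syslog excerpt in the first post.
SAMPLE = """\
Jan 16 15:51:55 passionfruit emhttp: shcmd (338): mkdir /mnt/disk2
Jan 16 15:51:55 passionfruit emhttp: shcmd (339): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md2 /mnt/disk2 |& logger
Jan 16 15:51:55 passionfruit emhttp: _shcmd: shcmd (339): exit status: 32
Jan 16 15:51:56 passionfruit emhttp: shcmd (344): ps axc | grep -q rpc.mountd
Jan 16 15:51:56 passionfruit emhttp: _shcmd: shcmd (344): exit status: 1
"""

# Assumed formats, inferred only from the excerpt above.
CMD_RE = re.compile(r"emhttp: shcmd \((\d+)\): (.*)$")
STATUS_RE = re.compile(r"emhttp: _shcmd: shcmd \((\d+)\): exit status: (\d+)$")


def failed_shcmds(lines):
    """Return (id, exit_status, command) for every shcmd with a non-zero status."""
    commands = {}
    failures = []
    for line in lines:
        line = line.rstrip("\n")
        status = STATUS_RE.search(line)
        if status:
            cmd_id, code = int(status.group(1)), int(status.group(2))
            if code != 0:
                failures.append((cmd_id, code, commands.get(cmd_id, "<command not seen>")))
            continue
        cmd = CMD_RE.search(line)
        if cmd:
            commands[int(cmd.group(1))] = cmd.group(2)
    return failures


for cmd_id, code, command in failed_shcmds(SAMPLE.splitlines()):
    print(f"shcmd {cmd_id} exited with {code}: {command}")
```

On the sample this reports two commands. The mount of /dev/md2 exits with 32, which mount(8) documents as a mount failure and which matches the "can not find reiserfs on md2" warning. The grep for rpc.mountd exits with 1, which for grep -q only means no match was found, so that one is probably unrelated to the disk problem.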
January 16, 2012 Author
Any ideas on this? At the very least I'd love to be able to simply remove the drive and ensure I have solid parity before doing anything else. I should also mention I am using one of the later betas; 5.0b13 to be precise. I can attach the first 10,000 lines of syslog from before I started the initial parity check. I only captured that much because the log was enormous and at least three quarters of the file is the same error repeated.
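When most of a large syslog is one error repeated, a summary of distinct messages can be easier to share than a truncated file. The sketch below is one possible way to do that in Python. It assumes every line starts with the "Mon DD HH:MM:SS host " prefix seen in the excerpts in this thread; the function name summarize and the sample lines chosen are for illustration only.

```python
import re
from collections import Counter

# Assumed prefix: "Jan 16 15:51:56 passionfruit " (month, day, time, hostname).
PREFIX_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ ")

# Sample lines copied from the excerpt in the first post.
SAMPLE = [
    "Jan 16 15:51:56 passionfruit avahi-daemon[7050]: Files changed, reloading.",
    "Jan 16 15:51:56 passionfruit avahi-daemon[7050]: Service group file /services/afp.service changed, reloading.",
    "Jan 16 15:51:56 passionfruit avahi-daemon[7050]: Files changed, reloading.",
]


def summarize(lines, top=10):
    """Count messages with the timestamp and hostname stripped, most frequent first."""
    counts = Counter()
    for line in lines:
        message = PREFIX_RE.sub("", line.strip())
        if message:
            counts[message] += 1
    return counts.most_common(top)


for message, count in summarize(SAMPLE):
    print(f"{count:6d}  {message}")
```

To run it against a real log, pass an open file object instead of SAMPLE. The path to the log depends on where the syslog was saved.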
January 18, 2012 Author
OK, I have attached the initial syslog. I have two unRAID addons (Simple Features & Plex MediaServer). Should I disable them and attempt removing Disk 2 again? Forum does not allow .7z? Weird.
syslog.zip
January 18, 2012
Disk 6 is giving read errors. Since disk 2 is red balled you have a serious problem. Is there enough free space to copy everything from disk 6? This is why waiting to install parity is a bad idea.
January 23, 2012 Author
"Disk 6 is giving read errors. Since disk 2 is red balled you have a serious problem. Is there enough free space to copy everything from disk 6? This is why waiting to install parity is a bad idea."

Disk 6 is fine (as mentioned in the first post: "One of the original drives (luckily it had no data on it) was red balled... the drive had had a power or data cable come loose."). It has since had a S.M.A.R.T. test come back fine.

I have successfully removed the orange balled Disk 2 by stopping the array, unassigning Disk 2, powering down the array, removing Disk 2, powering on and typing initconfig. It worked this time.

By the way, the Wiki does not mention this, but when you refresh the management portal after typing initconfig, all of the disks disappear from their assignments. I assume this is the desired behaviour, but I would imagine you would want to warn users that they will need to take note of their old disk assignments. I think things could get hairy if you accidentally assign a data drive to the parity slot.

I have begun another parity sync and it has, again, failed with the following error:

Jan 24 10:22:37 passionfruit emhttp: mdcmd: write: Input/output error
Jan 24 10:22:37 passionfruit kernel: mdcmd (299): spindown 0
Jan 24 10:22:37 passionfruit kernel: md: disk0: ATA_OP e0 ioctl error: -5

I have searched the forum here and found nothing close to an error like this. What is going on?
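The "-5" in the kernel line can be decoded: Linux kernel code conventionally reports failures as negated errno values, and errno 5 is EIO, whose standard description is "Input/output error". That suggests the kernel line and the emhttp "mdcmd: write: Input/output error" line are reporting the same failure. A small Python sketch of the lookup follows; the helper name decode_ioctl_error is hypothetical, and the regular expression assumes the message format shown in this post.

```python
import errno
import os
import re

# Line copied from the post above.
LINE = "Jan 24 10:22:37 passionfruit kernel: md: disk0: ATA_OP e0 ioctl error: -5"


def decode_ioctl_error(line):
    """Return (errno name, description) for an 'ioctl error: -N' line, else None."""
    match = re.search(r"ioctl error: -(\d+)", line)
    if not match:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to symbolic names; os.strerror gives the text.
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


print(decode_ioctl_error(LINE))
```

If ATA_OP e0 refers to the ATA STANDBY IMMEDIATE opcode (0xE0), that would fit the spindown request logged on the line before it, which would mean the spin-down command sent to disk0 is what failed. That reading is an inference from the three lines, not something the log states outright.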
January 23, 2012 Author
Attached is a copy & paste from the log of my management interface after starting the array & parity sync. I am running a short and long S.M.A.R.T. test as well.
log.txt
January 24, 2012 Author
I am getting the feeling that one of my drive bays might be a little sketchy. I have removed the drive that I had in there and have begun a new attempt to gain parity. All was running well (it got to ~14%) and then I got this error:

Jan 24 14:54:38 passionfruit kernel: sas: command 0xf0507900, task 0xf74663c0, timed out: BLK_EH_NOT_HANDLED

I can see this is also asked in this thread, but I would like to know more about this error and what I can do to work around it, as I really want to make sure this server is protected as soon as possible.
January 24, 2012
That looks like it may be an issue with your SATA card in the beta. You could try another version and see if it helps.
Peter
January 24, 2012
I would also check the SATA cables to make sure that they aren't what causes this problem. If you have some spares that you know are good, swap those in and see if the errors persist.
January 24, 2012 Author
"That appears it may be an issue with your SATA card in the beta. You could try another version and see if it helps. Peter"

Which version of 5.0b8+ should I try? I am using 3TB drives, so I have to use b8+, right?
January 25, 2012
"By the way, the Wiki does not mention this but when you refresh the management portal after typing initconfig all of the disks disappear from their assignments. I assume this is the desired behaviour but I would imagine you would want to warn users that they will need to take note of their old disk assignments. I think things could get hairy if you accidentally assign a data drive to the parity slot."

That behavior may be unique to the beta you are using. That is not standard behavior in version 4.7.
January 26, 2012 Author
Just to confirm, I downgraded unRAID to 5.0b8 and was able to avoid the SATA card issues. I have two AOC-SASLP-MV8 cards installed, which I assume were the cause of the bad behaviour. I was able to finish a parity sync and check, so I am all protected. I upgraded my beta again to 5.0b10 (so Plex Media Server will work) and am re-running a parity sync, and will run a check again to confirm this beta is fine as well. Are the issues with the SATA cards documented anywhere else?