May 22, 2009 I recently replaced a 300GB drive with a 750GB drive (drive 9). The rebuild went without error and the parity check came back clean. The next day, I attempted to create a folder on drive 6 (this drive has been in service for several weeks). This resulted in what I can best describe as a 'hang' of the Unraid server. The entire array was unresponsive for ~3-5 minutes; I could not access any of the other 10 drives in the array. Eventually the connection timed out and the array became responsive again. During this time I was able to telnet into the array. I do not have trouble accessing the other drives in the array. I have attached the syslog. I accessed drive 6 and, after the hang/timeout, I think I accessed drive 8 in case it results in a helpful log entry. Not sure how to proceed. I see 2 options:
1 - Run diagnostics on drive 6 - maybe it will indicate/resolve an issue
2 - Rebuild drive 6?
This forum has been extremely helpful in the past. Thanks, Kevin
May 22, 2009 Can you look at the Web screen and tell me if any of the drives have red or orange (as opposed to green) icons? I don't see any signs of trouble in the syslog. I do notice some replayed transactions, indicative of a hard shutdown prior to this boot. Is this the syslog covering the unresponsive time, or did you hard reboot and then capture this log? Hangs of the type you describe are most commonly caused (in my experience) by disturbing a cable while adding or removing a drive, so that it no longer has a secure connection on one side or the other. You might just try shutting down and making sure the power connections from the PSU to disk6 are secure, and that the signal cable is similarly secure.
May 22, 2009 Author Thanks for the help. Web screen shows that all lights are green. I will try your suggestion regarding the cables and post the results. Kevin
May 22, 2009 Might it have been something with your browser? If telnet worked, and the syslog shows nothing, that leaves your Windows Explorer as a possibility. Joe L.
May 22, 2009 Author A few more pieces of information:
1) I have been able to duplicate the symptoms from several different Windows PCs (Windows 7 and XP). These PCs worked as expected until yesterday. In addition, I have been able to connect to other drives in the array without issue. The issue seems to be specific to disk6. Happy to pursue Windows Explorer as a cause if there are any other tests to perform.
2) When I attempted to stop the drives, the web interface never came back. I then attempted a shutdown via a telnet session. This did not result in the server shutting down. Ultimately I had to power off the server. Upon reboot, Unraid initiated a parity check, which is performing normally (it typically takes about 10 hours to complete).
Should I attempt to do anything while parity is being checked? Typically I would run this in the evening and there wouldn't be any activity against the Unraid server. Thanks, Kevin
May 22, 2009 Once your parity check is completed, I'd first get a SMART report on disk6, and then I'd perform a check of the file-system on disk6. See:
http://lime-technology.com/wiki/index.php/Troubleshooting#Obtaining_a_SMART_report
and
http://lime-technology.com/wiki/index.php/Check_Disk_Filesystems
Obviously, use the device for your disk6 instead of the ones in the wiki examples. Joe L.
May 22, 2009 Author Thanks again for the help. I will post the results once the parity check is complete. Kevin
May 22, 2009 You can do the SMART report at any time; there is no need to wait until the parity check is completed. You should wait to perform the file-system check until after the parity check is completed, just in case it locks up once more. Joe L.
May 22, 2009 If the parity check is performing normally, at its usual speed, then Disk 6 must be performing normally, at its usual speed. Hopefully the test results will show more. Since you had just run a parity check, there's probably no reason to finish this one. You can always run another when the problem seems resolved. If the 'hang' happens again, try accessing Disk 6, with your typical file operations, from the physical console.
May 22, 2009 Quote: "If the parity check is performing normally, at its usual speed, then Disk 6 must be performing normally, at its usual speed."
True, but the parity check is only reading blocks of data, not trying to interpret them as a file-system. The file-system could be corrupt. Question: you don't by chance have hundreds of files in the root of disk6, do you? You should only have a few folders, one for each share you wish. The files can then go in the folders as desired. This is more of a limit on the fuser file-system than anything else. On early versions of unRAID, it would crash trying to allocate enough memory for the user-file-system for all the files in the root of the disk. If that is the case, you can move those files more easily by turning off user-shares temporarily. Joe L.
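The question about files in the root of the disk can be answered from a telnet session by counting the regular files that sit directly at the top level. This is only a minimal sketch: the function name count_root_files is made up for illustration, and /mnt/disk6 is assumed to be where disk6 is mounted.

```shell
# Count regular files directly in the top level of a directory
# (sub-folders and their contents are not counted).
# count_root_files is a hypothetical helper name, not an unRAID command.
count_root_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example (assumes disk6 is mounted at /mnt/disk6):
#   count_root_files /mnt/disk6
```

A result of zero or a handful is what the advice above expects; hundreds would point at the user-share memory problem described.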
May 22, 2009 Author Just initiated smartctl from the command line. Should have the results later this evening or in the morning. As the post indicated, I needed to install the appropriate library (thanks for the links!!). The drive only has two folders at the top level. One of the folders is 'Movies', which has about 40 subfolders. Pretty consistent with other drives. Thanks, Kd
May 23, 2009 Author SMART report for disk 6 is attached. Didn't see anything unusual (at least according to the post). I was able to list the contents of the drive by navigating to /mnt/disk6 in a telnet session. The parity check is at 75%, so it should be complete in the morning. I will then navigate to the drive via Windows and will also run the file system check and post the results. Again - thanks for the quick response and useful instructions. Kevin
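For anyone reading a SMART report by eye, a small filter can pull out the attributes that most often signal a failing disk. This is a sketch under assumptions: it expects the usual attribute table printed by smartctl -A (attribute name in the second column, raw value in the tenth), the helper name smart_key_attrs is invented here, and /dev/sdX is a placeholder for the real device of disk6.

```shell
# Print the name and raw value of three commonly watched SMART attributes.
# Reads a smartctl attribute table on standard input.
# smart_key_attrs is a hypothetical helper name.
smart_key_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2, $10 }'
}

# Example (placeholder device; some controllers may also need "-d ata"):
#   smartctl -A /dev/sdX | smart_key_attrs
```

Non-zero raw values for these attributes are usually worth a closer look, although a clean report does not rule out file-system corruption.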
May 23, 2009 Author Parity check has completed - no errors (see attached web page). I ran the file system check as described in the post. It immediately returned an error message (see attached) and indicated that I should run reiserfsck with the rebuild-sb option. Interestingly, once I ran reiserfsck, I could not list the contents of disk6 from a telnet session. It just shows the . and .. entries (I figured this was a result of the unmount). Syslog attached. As the post contains warnings about running reiserfsck with the rebuild-sb option, I am awaiting input from expert users. Thanks, Kd
May 23, 2009 The next step is to run the reiserfsck command it described. Assuming you did not re-mount the disk, you just need to type
reiserfsck --rebuild-sb /dev/md6
It will again ask for confirmation. Answer "Yes". Also, the reason you could no longer see the disk contents was that you had un-mounted the disk from the mount point. After it completes, assuming it does not ask you to run a different option in reiserfsck, you can then re-mount the disk and re-start samba as described in the wiki. Joe L.
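The order of operations can be laid out as a dry run that only prints the commands, so nothing touches the disk until each line is typed by hand. This is a sketch, not the wiki's procedure: reiserfs_repair_plan is an invented helper name, the disk is assumed to be already un-mounted as in the post above, the follow-up --check pass and the plain mount options are assumptions, and the samba restart is left to the wiki instructions.

```shell
# Print (do not run) a repair sequence for array disk N.
# reiserfs_repair_plan is a hypothetical helper name.
reiserfs_repair_plan() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: reiserfs_repair_plan <disk number>" >&2
            return 1 ;;
    esac
    dev="/dev/md$1"
    # Work on the md device so that parity is kept in step with the repair.
    echo "reiserfsck --rebuild-sb $dev"
    # Re-check afterwards; reiserfsck may then ask for a different option.
    echo "reiserfsck --check $dev"
    # Re-mount only once the check comes back clean.
    echo "mount -t reiserfs $dev /mnt/disk$1"
}

reiserfs_repair_plan 6
```

Printing the plan first makes it harder to mistype the device name, which matters because reiserfsck with a rebuild option rewrites file-system structures.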
May 23, 2009 It looks like you have probably found the problem that was hanging the system. It is somewhat surprising that the regular mount of Disk 6 did not detect it. Once this completes, I would run the regular reiserfsck again; it should take a lot longer. A couple of minor points, performance related, can be done whenever you choose, *if* you choose:
* Disk 10 (sdb - ST3500630AS 9QG0G4GF) appears to still have its SATA150 jumper installed. Since it is attached to a Promise card (SiI3114), you won't likely detect any performance improvement by removing that jumper, but there is no point in keeping it installed. It is designed for backward compatibility reasons to *limit* performance to SATA150, and the drive and card support SATA300 (although they won't be able to fully utilize it). See the Improving unRAID Performance, Remove SATA150 Jumper section.
* Swap the cable connections of your Parity drive (hdh - ST3750640A 5QD50RBE) with Disk 7 (hdf - Maxtor 6Y250P0 Y63EDVZE). hdf has an IDE channel to itself, while hdh has to share its channel with hdg. Since both hdh and hdf are in the secondary positions, you should not have to change the jumpers. This should provide a small boost in some situations, particularly when doing simultaneous operations with Disk 8 (hdg), such as parity checks and any writes to Disk 8.
May 23, 2009 Author Thanks again for all the help and the additional tuning suggestions. Turns out that I needed to run reiserfsck a few times with different settings before it finally came back clean (see attachment). Mounted disk6 and restarted services. All looks good at this point. I will continue to monitor over the weekend. Should I run a parity check given the various repair activities on drive6? Thanks, Kevin
May 23, 2009 Quote: "Thanks again for all the help and the additional tuning suggestions. Turns out that I needed to run reiserfsck a few times with different settings before it finally came back clean (see attachment). Mounted disk6 and restarted services. All looks good at this point. I will continue to monitor over the weekend."
That is good news.
Quote: "Should I run a parity check given the various repair activities on drive6?"
No, you do not need to. Since you ran it on the "md" device, any fixes made by reiserfsck also updated the parity data. But you can do one if you like; it won't hurt. (You should do one on a monthly basis, just to identify any bad blocks on the data disks. A read failure would result in unRAID re-writing the block with data calculated from parity and the other disks, and that would result in the block being re-located if it was flagged bad by the physical disk.)
Quote: "Thanks, Kevin"
You are welcome. Glad you are back up and running. Joe L.
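Why a repair made through the "md" device leaves parity valid can be shown with a toy example. This sketch assumes a single parity disk whose bits are the XOR of the matching bits on the data disks; the byte values and the helper name xor_parity are made up for illustration.

```shell
# XOR all arguments together; with single parity this is the parity value.
# xor_parity is a hypothetical helper name.
xor_parity() {
    p=0
    for b in "$@"; do
        p=$(( p ^ b ))
    done
    echo "$p"
}

# One byte from each of three data disks (made-up values).
d1=90; d2=60; d3=240
parity=$(xor_parity "$d1" "$d2" "$d3")

# A repair changes the byte on disk 2. A write through the md device
# updates parity in the same step: XOR out the old value, XOR in the new.
new_d2=61
parity=$(( parity ^ d2 ^ new_d2 ))
d2=$new_d2

# Parity still matches the data, so disk 2 could be rebuilt from the rest.
rebuilt_d2=$(xor_parity "$parity" "$d1" "$d3")
echo "parity=$parity rebuilt_d2=$rebuilt_d2"
```

Had the same change been written directly to the underlying disk device instead, the parity value would have stayed at its old value and no longer matched, which is when a parity check would report errors.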
May 23, 2009 One more thing. Just for information. I noticed your prompt showed your current directory was /mnt/disk5. You MUST "cd" off of the disk before you attempt to stop the array and reboot. Otherwise, disk5 will be unable to be un-mounted (since it is "busy") and the array will not stop. It will un-mount all the other drives, and those will then (incorrectly) show as un-formatted. Just log off, and press Stop once more if this occurs. Joe L.
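A small guard in the telnet session can catch this before the array is stopped. It is a minimal sketch: on_array_disk is an invented helper name, and it assumes array disks are mounted as /mnt/disk1, /mnt/disk2 and so on, as the /mnt/disk5 prompt above suggests.

```shell
# Succeed if the given directory (default: the current directory)
# is on an array disk mount such as /mnt/disk5 or below it.
# on_array_disk is a hypothetical helper name.
on_array_disk() {
    case "${1:-$PWD}" in
        /mnt/disk[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Step off the array before pressing Stop or rebooting.
if on_array_disk; then
    cd /
fi
```

This only covers the shell it runs in; any other session or program with a file open on the disk would still keep it busy.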