January 26, 2012

As of today, I am suddenly having problems accessing my unRAID array. I am running 4.7. A syslog is attached below, along with an image of my main page that I fortunately grabbed while the system was still intermittently responsive. As near as I can tell, the syslog shows a flurry of errors. To my untrained eye they appear to have something to do with disk7 (which is also sdi, if that is relevant).

I first noticed the problem when running YAMJ to update it with some new media files I had just added to the server. I run YAMJ on my PC, with its output set to go to the unRAID server. This setup normally works, so I'm sure that's not the problem directly. The last step of a YAMJ run is copying the changed files to the server, and it was at that point that it hung. It had to copy about 2000 small files (a completely routine operation), but it would only copy a few and then hang. I tried spinning up all drives on the server, which apparently allowed it to copy a few more files, but then it stalled again. I then noticed that even after spinning up the drives, the cache drive was still showing as flashing. Eventually I quit YAMJ since it refused to complete.

Next I rebooted both the unRAID server (which appeared to come up fine) and the PC. It was at this point that the PC started connecting to the server erratically, so I dumped the syslog. The web interface was still responding, but erratically: sometimes the browser couldn't find it at all, and sometimes it loaded after longer than normal delays. I then attempted to stop the array, but it wouldn't enter a stopped state, at which point I took the attached snapshot, which shows errors on disk7. After that the web interface wouldn't connect at all. I tried to connect via PuTTY, but it couldn't find the server either.

The current state of the server is unknown to me. The console attached to the server is still running as of right now, but I have no idea how to proceed.
Can someone help interpret the log and/or give me some direction on where to go next? Thank you.

PS: I can still ping the server successfully, and PuTTY can connect using the IP address as opposed to the server name "tower".

PPS: I ran a SMART report from the console. It is attached. It shows 10 reallocated sectors. I'm almost certain that this number was 0 about a week ago when I last checked.

Attachments: syslog-2012-01-26.zip, smarttest.txt
January 27, 2012
Author

Ok, more info. I had status emails set up, and I received an email saying that the array had stopped. I'm attaching this email. I started poking around and found that the mnt directory had nothing in it, which I took to mean that the array had either stopped or crashed. So I went ahead and executed "powerdown -r". The system rebooted without apparent problem. The web interface is showing all disks mounted and green.

From mymain I ran a SMART report on disk 7, and it gave me additional info showing several errors of the form:

    Error 6 occurred at disk power-on lifetime: 4084 hours (170 days + 4 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 08 20 4a 83 ee  Error: UNC 8 sectors at LBA = 0x0e834a20 = 243485216

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      c8 00 08 20 4a 83 ee 00   7d+03:20:19.615  READ DMA
      ec 00 00 00 00 00 a0 00   7d+03:20:19.611  IDENTIFY DEVICE
      ef 03 45 00 00 00 a0 00   7d+03:20:19.611  SET FEATURES [set transfer mode]

There were six of these reported errors, all the same. I'm attaching this second SMART report as well.

From this I would assume that the safest thing to do is to remove the disk from the array and reconstruct it onto a hot spare, which I fortunately have. The disk that is showing errors is still under warranty, and since it has already shown problems I'm not sure what the value would be of running a parity test first, but I'm willing to be persuaded if others more knowledgeable than me think it will add anything. Or, if I'm misreading something and a different course of action is called for, I'm ready to be instructed.

Thanks for any help.

Attachments: status_email.txt, smart_test_after_reboot.txt
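
For anyone trying to read the SMART error entry quoted above, here is a minimal Python sketch of how the reported LBA and the power-on time can be cross-checked against the raw register values. It assumes the failing READ DMA (c8) command uses 28-bit LBA addressing, so that SN, CL and CH hold LBA bits 0-23 and the low four bits of DH hold bits 24-27. The function name is made up for illustration and is not part of smartctl or unRAID.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA (assumed layout)."""
    # Only the low nibble of DH carries LBA bits; the high nibble holds mode flags.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the error entry: SN=20 CL=4a CH=83 DH=ee
lba = lba28_from_registers(0x20, 0x4A, 0x83, 0xEE)

# Power-on lifetime from the same entry, split into whole days and leftover hours
days, hours = divmod(4084, 24)

print(f"LBA = 0x{lba:08x} = {lba}")
print(f"{days} days + {hours} hours")
```

Both values agree with what the report prints (LBA 243485216, 170 days + 4 hours), so the six identical errors all point at the same 8-sector region of the disk.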