jj_uk Posted July 24, 2018

Guys,

I've just opened up my server to install a new disk, and I've noticed that one of my disks is vibrating quite badly and making a noise like a bit of paper gently placed against a spinning fan blade: a really fast clicking/grinding noise. So that drive is definitely failing.

I've opened the unRAID web GUI, and it's (apparently) doing a parity check. The check has been running for 18 hours and is due to finish in 406 DAYS, yes, over a year. It normally takes ~19 hours to complete, so something is wrong here.

How do I find out in the GUI which drive is failing? Why has unRAID not sent me an email telling me a drive is failing or has failed? Coming from FreeNAS, I always received an email when a drive was failing. I don't know what to do apart from unplugging drives and seeing which one disappears, but that's not really the correct way to diagnose this, is it?

I have 5 drives:
3x 8TB WD Red (3 months old)
2x 1TB 2.5" laptop drives (over 3 years old, I believe)

It is one of the 8TB WD Red drives that has failed. Again! That's the 5th drive in a year: 3 were DOA, and one failed during a 3-cycle pre-clear.
JonathanM Posted July 24, 2018

Attach the diagnostics zip file to your next post.
jj_uk (Author) Posted July 24, 2018

OK, attached. I checked "Anonymize diagnostics", so hopefully it's done its job and not posted any personal/private info that shouldn't be on the internet!

Attachment: tower1-diagnostics-20180724-2002.zip
trurl Posted July 24, 2018

I didn't notice anything in the diagnostics. Are you sure the noise doesn't have some other source? Fans are a common culprit.

Can you actually see the disk in the case, but you don't know how to identify it? Some brands print part of the serial number on a label on the end of the drive, and the serial number of every disk is displayed in the webUI.

1 hour ago, jj_uk said:
> How do I find out which drive is failing in the GUI? Why has unRAID not sent me an email telling me a drive is / has failed? Coming from FreeNAS, I always received an email when a drive was failing.

If any of the monitored SMART attributes change, unRAID will show a warning indicator for that disk on the Dashboard. If you have configured Notifications, it will also send you a notification by email or whichever notification agent you have set up.
jj_uk (Author) Posted July 24, 2018

OK, I've had a look at the disks, and the last part of the S/N is indeed written on the end of the drive. It's Disk 3: R6GXS1XY

The parity check is terribly slow, and unRAID hasn't told me that there is an issue or anything:

Activity started on Tue 24 Jul 2018 12:30:01 AM BST (today), finding 0 errors.
Last result: 18 hours, 52 minutes. Average speed: 117.8 MB/s
Parity-Check in progress... Completed: 4 %.
Elapsed time: 20 hours, 42 minutes.
Estimated finish: 1756 days, 19 hours, 37 minutes

Something is very wrong!
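As a complement to reading the label: on Linux, udev normally creates symlinks in /dev/disk/by-id/ whose names embed the drive model and serial number (for example ata-WDC_WD80EFZX-68UW8N0_R6GXS1XY), each pointing at the kernel device. The Python sketch below, assuming that standard udev naming, maps a serial fragment read off a drive label to its device name (sdc, sdd, ...). The function name devices_for_serial is hypothetical and not part of unRAID.

```python
import os
import sys


def devices_for_serial(suffix, by_id_dir="/dev/disk/by-id"):
    """Map each /dev/disk/by-id name ending in `suffix` to its kernel device.

    Assumes the usual udev naming, e.g. "ata-<model>_<serial>", where each
    entry is a symlink such as "../../sdc".
    """
    matches = {}
    for name in sorted(os.listdir(by_id_dir)):
        # Skip partition links ("...-part1") so only whole disks are reported.
        if "-part" in name:
            continue
        if name.endswith(suffix):
            target = os.readlink(os.path.join(by_id_dir, name))
            # Keep only the final path component, e.g. "sdc".
            matches[name] = os.path.basename(target)
    return matches


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_disk.py R6GXS1XY
    for link, dev in devices_for_serial(sys.argv[1]).items():
        print(f"{link} -> /dev/{dev}")
```

With the fragment from this thread, devices_for_serial("R6GXS1XY") should report the disk the webUI lists as sdc, provided the by-id links on the server follow the usual pattern.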
jj_uk (Author) Posted July 24, 2018

Device | Identification | Temp | Read speed | Write speed | Errors | FS | Size | Used | Free
Parity | WDC_WD80EFZX-68UW8N0_R6GZWG9Y - 8 TB (sde) | 36 C | 146.8 KB/s | 264.2 KB/s | 0 | - | - | - | -
Disk 1 | ST91000640NS_9XG0VNLC - 1 TB (sdf) | 31 C | 146.8 KB/s | 0.0 B/s | 0 | xfs | 1 TB | 30.4 GB | 969 GB
Disk 2 | ST91000640NS_9XG0YDQT - 1 TB (sdg) | 32 C | 146.8 KB/s | 0.0 B/s | 0 | xfs | 1 TB | 1.03 GB | 1 TB
Disk 3 | WDC_WD80EFZX-68UW8N0_R6GXS1XY - 8 TB (sdc) | 37 C | 11.6 MB/s | 0.0 B/s | 0 | xfs | 8 TB | 7 TB | 1 TB
Disk 4 | WDC_WD80EFZX-68UW8N0_R6GXW7VY - 8 TB (sdd) | 37 C | 146.8 KB/s | 264.2 KB/s | 0 | xfs | 8 TB | 6.31 TB | 1.69 TB
Total | Array of five devices | 34.6 C | 12.2 MB/s | 528.4 KB/s | 0 | - | 18 TB | 13.3 TB | 4.65 TB

The faulty drive (Disk 3) appears to be reading roughly 80x faster than the good drives (11.6 MB/s vs 146.8 KB/s)?!
jj_uk (Author) Posted July 24, 2018

I stopped the parity check, then clicked the button to stop the array. The status bar shows "Array Stopping • Stopping services...". It's been 20 minutes since I clicked 'Stop array', and it's still "stopping", apparently. I'm really concerned now. What should I do next? I need to replace this drive!

EDIT: I opened a terminal and ran the 'top' command:

Tasks: 661 total, 7 running, 365 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 25.9 sy, 0.0 ni, 57.1 id, 15.4 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32909168 total, 323232 free, 1289688 used, 31296248 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 29544388 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19744 root 20 0 0 0 0 R 38.9 0.0 2584:17 unraidd
6070 root 20 0 4524 1608 1496 D 17.5 0.0 47:18.42 sha256sum
12856 root 20 0 4524 1528 1412 D 17.5 0.0 52:03.08 sha256sum
13159 root 20 0 4524 1608 1496 R 17.5 0.0 35:22.47 sha256sum
16764 root 20 0 4524 1456 1340 D 17.5 0.0 35:19.99 sha256sum
663 root 20 0 4524 1444 1332 D 17.2 0.0 83:12.57 sha256sum
5293 root 20 0 4524 1372 1256 D 17.2 0.0 58:48.84 sha256sum
6763 root 20 0 4524 1456 1340 D 17.2 0.0 98:28.97 sha256sum
10629 root 20 0 4524 1528 1412 D 17.2 0.0 37:19.99 sha256sum
22814 root 20 0 4524 1452 1340 D 17.2 0.0 71:11.58 sha256sum
3949 root 20 0 4524 1552 1436 D 16.8 0.0 47:52.04 sha256sum
5042 root 20 0 4524 1512 1400 D 16.8 0.0 39:27.35 sha256sum
8726 root 20 0 4524 1456 1344 D 16.8 0.0 67:38.03 sha256sum
10584 root 20 0 4524 1516 1400 D 16.8 0.0 85:40.64 sha256sum
11694 root 20 0 4524 1600 1484 D 16.8 0.0 47:37.08 sha256sum
14513 root 20 0 4524 1460 1344 R 16.8 0.0 54:57.44 sha256sum
20008 root 20 0 0 0 0 R 16.8 0.0 66:17.91 dmcrypt_write
25067 root 20 0 4524 1440 1328 D 16.8 0.0 179:42.85 sha256sum
26249 root 20 0 4524 1372 1260 D 16.8 0.0 54:36.01 sha256sum
4714 root 20 0 4524 1608 1496 D 16.5 0.0 150:27.01 sha256sum
4824 root 20 0 4524 1516 1400 D 16.5 0.0 76:38.02 sha256sum
10556 root 20 0 4524 1464 1348 D 16.5 0.0 185:20.26 sha256sum
29552 root 20 0 4524 1488 1376 D 16.5 0.0 66:27.76 sha256sum
2111 root 20 0 4524 1488 1376 D 16.2 0.0 33:36.21 sha256sum
5775 root 20 0 4524 1600 1484 D 16.2 0.0 67:34.82 sha256sum
17769 root 20 0 4524 1528 1412 D 16.2 0.0 85:17.59 sha256sum
3721 root 20 0 4524 1460 1344 D 15.8 0.0 185:42.93 sha256sum
7394 root 20 0 4524 1528 1412 D 15.8 0.0 183:14.16 sha256sum
8037 root 20 0 4524 1440 1328 D 15.8 0.0 184:03.32 sha256sum
24607 root 20 0 4524 1488 1376 D 15.8 0.0 92:39.21 sha256sum
24939 root 20 0 4524 1516 1404 D
pwm Posted July 24, 2018

Lots of sha256sum processes, stuck busy-looping and fighting wildly for access within the kernel. You could check /proc/<pid>/fd/ to see the names of the files each sha256sum process has open.
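pwm's /proc check can be scripted. The Python sketch below (meant to be run as root on the server; the function names find_pids_by_name and open_files are hypothetical) walks /proc, picks processes whose comm is sha256sum, and resolves each entry in /proc/<pid>/fd/, which on Linux is a symlink to the open file. Expect standard input/output entries such as pipes or /dev/null in the list alongside the file being hashed.

```python
import os


def find_pids_by_name(name, proc_root="/proc"):
    """Return the PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited, or we lack permission
    return sorted(pids)


def open_files(pid, proc_root="/proc"):
    """Resolve every symlink in /proc/<pid>/fd/ to the path it points at."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = []
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return targets  # the process is gone or unreadable
    for fd in fds:
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # the descriptor was closed while we were looking
    return sorted(targets)


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid in find_pids_by_name("sha256sum"):
        for path in open_files(pid):
            print(pid, path)
```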
jj_uk (Author) Posted July 24, 2018

It appears to be the "Dynamix File Integrity" plugin reading all the movies on the server. At this point, I think I have no choice but to pull out the mains power plug: I can't stop the array, and I can't request a reboot. I've lost control of the server. Not really an ideal situation to be in!
pwm Posted July 24, 2018

Just now, jj_uk said:
> It appears to be the "Dynamix File Integrity" plugin reading all the movies on the server. At this point, I think I have no choice but to pull the mains power plug out; I can't stop the array; I can't request a reboot. I've lost control of the server. Not really an ideal situation to be in!

You can't stop the array as long as you see any "D" in the state column. A process in the "D" state (uninterruptible sleep) ignores every attempt to kill it. On a media server the files are probably large, so it can take a long time to finish computing the checksums. What happens if you kill the processes with "R" status? Do any of the other sha256sum processes then switch from "D" to "R"? If so, you can keep killing them.
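pwm's kill-the-R-processes loop can also be sketched in Python. This is a hedged sketch, not an unRAID tool: the names process_state and pids_in_state and the --kill argument are hypothetical, it assumes Linux's /proc/<pid>/stat layout ("pid (comm) state ..."), and it sends SIGTERM. By default it only lists the sha256sum PIDs currently in R state; because states change from moment to moment, re-run it after each pass to catch processes that have moved from D to R.

```python
import os
import signal
import sys


def process_state(pid, proc_root="/proc"):
    """Return the one-letter state (R, S, D, Z, ...) from /proc/<pid>/stat."""
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        data = f.read()
    # The line reads "pid (comm) state ...". comm may itself contain spaces
    # or ")", so index from the LAST ")" rather than splitting on spaces.
    return data[data.rfind(")") + 2]


def pids_in_state(name, state, proc_root="/proc"):
    """Return the PIDs named `name` whose current state is `state`."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() != name:
                    continue
            if process_state(entry, proc_root) == state:
                result.append(int(entry))
        except (OSError, IndexError):
            continue  # process exited mid-scan or stat was unreadable
    return sorted(result)


if __name__ == "__main__" and os.path.isdir("/proc"):
    running = pids_in_state("sha256sum", "R")
    print("sha256sum in R state:", running)
    # Only signal when explicitly asked; D-state processes are skipped
    # because a signal stays pending until they leave that state.
    if "--kill" in sys.argv[1:]:
        for pid in running:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # it finished on its own
```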
jj_uk (Author) Posted July 24, 2018

The server just shut down. I clicked the "Power Down" button in the web GUI about 10 minutes ago and was walking over to pull the plug when it shut down on its own. I've pulled out the broken drive and inserted a new drive (different brand, same size). It's now booting...