
WD Red 8TB Hard drive is failing, but which one?


jj_uk

Recommended Posts

Posted

Guys - I've just opened up my server to install a new disk, and I've noticed that one of my disks is vibrating quite badly and making a noise like what you'd hear if you gently placed a piece of paper against a spinning fan blade: a really fast clicking/grinding noise.

 

So that drive is definitely failing. I've opened up the web GUI for unRAID, and it's (apparently) doing a parity check. The check has been running for 18 hrs and is due to finish in 406 DAYS - yes, over a year. This normally takes ~19 hrs to complete.

 

So something is wrong here. 

 

How do I find out which drive is failing in the GUI?

Why has unRAID not sent me an email telling me a drive is / has failed?

 

Coming from FreeNAS, I always received an email when a drive was failing.

 

I don't know what to do apart from unplug the drive and see which one disappears, but that's not really the correct method to diagnose this, is it?

 

I have 5 drives.

3x 8TB WD RED (3 months old)

2x 1TB 2.5" laptop drives (over 3 years old, I believe).

 

It is one of the 8TB WD RED drives that has failed - again. That's the 5th drive in a year: 3 were DOA, and one failed during a 3-cycle pre-clear.

 

 

Posted

I didn't notice anything in the diagnostics. Are you sure the noise doesn't have some other source? Fans are a common culprit. Can you actually see the disk in the case but you don't know how to identify it? Some brands have part of the serial number on a label on the end, and the serial number of all disks is displayed in the webUI.

 

1 hour ago, jj_uk said:

How do I find out which drive is failing in the GUI?

Why has unRAID not sent me an email telling me a drive is / has failed?

 

Coming from FreeNAS, I always received an email when a drive was failing.

 

If any of the monitored SMART attributes change, unRAID will give a warning indicator on the Dashboard for the disk. If you have configured Notifications, then it will also send you a notification by email or whatever notification agent you have set up.
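If matching the label on the drive end to the webUI is awkward, you can also pull the serial-to-device mapping from a terminal. A minimal sketch (`lsblk` ships with unRAID as part of util-linux; device names will differ on your system):

```shell
# Print each physical disk once (-d = no partitions) together with its
# serial number, so the serial shown in the unRAID webUI can be matched
# to a /dev/sdX device.
lsblk -d -o NAME,SIZE,SERIAL
```

`ls -l /dev/disk/by-id/` gives the same mapping, since those symlink names embed the drive model and serial.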

Posted

OK, I've had a look at the disks, and the last part of the S/N is indeed written on the end of the drive. It's Disk 3: R6GXS1XY

 

The parity check is terribly slow. unRAID hasn't told me that there is an issue or anything.

 

Activity started on Tue 24 Jul 2018 12:30:01 AM BST (today), finding 0 errors.
Last result: 18 hours, 52 minutes. Average speed: 117.8 MB/s
Parity-Check in progress... Completed: 4%.
Elapsed time: 20 hours, 42 minutes. Estimated finish: 1756 days, 19 hours, 37 minutes

 

Something is very wrong!

Posted
Parity  WDC_WD80EFZX-68UW8N0_R6GZWG9Y - 8 TB (sde)  36 C    146.8 KB/s  264.2 KB/s  0 errors
Disk 1  ST91000640NS_9XG0VNLC - 1 TB (sdf)          31 C    146.8 KB/s  0.0 B/s     0 errors  xfs  1 TB (30.4 GB used, 969 GB free)
Disk 2  ST91000640NS_9XG0YDQT - 1 TB (sdg)          32 C    146.8 KB/s  0.0 B/s     0 errors  xfs  1 TB (1.03 GB used, 1 TB free)
Disk 3  WDC_WD80EFZX-68UW8N0_R6GXS1XY - 8 TB (sdc)  37 C    11.6 MB/s   0.0 B/s     0 errors  xfs  8 TB (7 TB used, 1 TB free)
Disk 4  WDC_WD80EFZX-68UW8N0_R6GXW7VY - 8 TB (sdd)  37 C    146.8 KB/s  264.2 KB/s  0 errors  xfs  8 TB (6.31 TB used, 1.69 TB free)
Total   Array of five devices                       34.6 C  12.2 MB/s   528.4 KB/s  0 errors       18 TB (13.3 TB used, 4.65 TB free)

 

 

The faulty drive (Disk 3) is apparently being read about 100x faster than the good drives?!

Posted

I stopped the parity check.

 

Then I clicked the button to stop the array. The status bar shows "Array Stopping•Stopping services..."

 

It's been 20 mins since I clicked the 'stop array' button, and it's still "stopping", apparently...

 

I'm really concerned now... What should I do next ? I need to replace this drive!

 

EDIT: I opened a terminal and ran the 'top' command:

Tasks: 661 total,   7 running, 365 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.4 us, 25.9 sy,  0.0 ni, 57.1 id, 15.4 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 32909168 total,   323232 free,  1289688 used, 31296248 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 29544388 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
19744 root      20   0       0      0      0 R  38.9  0.0   2584:17 unraidd
 6070 root      20   0    4524   1608   1496 D  17.5  0.0  47:18.42 sha256sum
12856 root      20   0    4524   1528   1412 D  17.5  0.0  52:03.08 sha256sum
13159 root      20   0    4524   1608   1496 R  17.5  0.0  35:22.47 sha256sum
16764 root      20   0    4524   1456   1340 D  17.5  0.0  35:19.99 sha256sum
  663 root      20   0    4524   1444   1332 D  17.2  0.0  83:12.57 sha256sum
 5293 root      20   0    4524   1372   1256 D  17.2  0.0  58:48.84 sha256sum
 6763 root      20   0    4524   1456   1340 D  17.2  0.0  98:28.97 sha256sum
10629 root      20   0    4524   1528   1412 D  17.2  0.0  37:19.99 sha256sum
22814 root      20   0    4524   1452   1340 D  17.2  0.0  71:11.58 sha256sum
 3949 root      20   0    4524   1552   1436 D  16.8  0.0  47:52.04 sha256sum
 5042 root      20   0    4524   1512   1400 D  16.8  0.0  39:27.35 sha256sum
 8726 root      20   0    4524   1456   1344 D  16.8  0.0  67:38.03 sha256sum
10584 root      20   0    4524   1516   1400 D  16.8  0.0  85:40.64 sha256sum
11694 root      20   0    4524   1600   1484 D  16.8  0.0  47:37.08 sha256sum
14513 root      20   0    4524   1460   1344 R  16.8  0.0  54:57.44 sha256sum
20008 root      20   0       0      0      0 R  16.8  0.0  66:17.91 dmcrypt_write
25067 root      20   0    4524   1440   1328 D  16.8  0.0 179:42.85 sha256sum
26249 root      20   0    4524   1372   1260 D  16.8  0.0  54:36.01 sha256sum
 4714 root      20   0    4524   1608   1496 D  16.5  0.0 150:27.01 sha256sum
 4824 root      20   0    4524   1516   1400 D  16.5  0.0  76:38.02 sha256sum
10556 root      20   0    4524   1464   1348 D  16.5  0.0 185:20.26 sha256sum
29552 root      20   0    4524   1488   1376 D  16.5  0.0  66:27.76 sha256sum
 2111 root      20   0    4524   1488   1376 D  16.2  0.0  33:36.21 sha256sum
 5775 root      20   0    4524   1600   1484 D  16.2  0.0  67:34.82 sha256sum
17769 root      20   0    4524   1528   1412 D  16.2  0.0  85:17.59 sha256sum
 3721 root      20   0    4524   1460   1344 D  15.8  0.0 185:42.93 sha256sum
 7394 root      20   0    4524   1528   1412 D  15.8  0.0 183:14.16 sha256sum
 8037 root      20   0    4524   1440   1328 D  15.8  0.0 184:03.32 sha256sum
24607 root      20   0    4524   1488   1376 D  15.8  0.0  92:39.21 sha256sum
24939 root      20   0    4524   1516   1404 D  

 

Posted

Lots of sha256sum processes are stuck busy-looping, wildly fighting for access within the kernel.

 

You could check /proc/&lt;pid&gt;/fd/ to see the names of the files each of the sha256sum processes has open.
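For example (a sketch using standard Linux procfs; the PIDs on your server will obviously differ from the ones in the top output):

```shell
# For every running sha256sum process, list its open files via procfs.
# /proc/<pid>/fd holds one symlink per open file descriptor, and
# ls -l resolves each symlink to the file's path.
for pid in $(pgrep -x sha256sum); do
    echo "== PID $pid =="
    ls -l /proc/"$pid"/fd
done
```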

Posted

It appears to be the "Dynamix File Integrity" plugin reading all the movies on the server.

 

At this point, I think I have no choice but to pull the mains power plug out; I can't stop the array; I can't request a reboot. I've lost control of the server. Not really an ideal situation to be in!

 

 

 

 

Posted
Just now, jj_uk said:

It appears to be the "Dynamix File Integrity" plugin reading all the movies on the server.

 

At this point, I think I have no choice but to pull the mains power plug out; I can't stop the array; I can't request a reboot. I've lost control of the server. Not really an ideal situation to be in!

 

 

 

 

 

You can't stop the array as long as you see any "D" in the state column. A process in the "D" (uninterruptible sleep) state ignores every attempt to kill it until its pending I/O completes.

 

But if it's a media server the files are probably large, and computing their checksums can take a long time. What happens if you kill the processes with "R" status? Will any of the other sha256sum processes then switch from "D" to "R"? In that case you can keep killing them.
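A sketch of that kill-the-runnable-ones approach (the process name matches the top output above; the awk filter keeps only PIDs whose STAT field begins with R):

```shell
# Show the state of every sha256sum process:
#   D = uninterruptible sleep (waiting on I/O, ignores signals)
#   R = running (can be killed normally)
ps -o pid,stat,time,comm -C sha256sum

# Kill only the ones currently in the R state; repeat as D-state
# processes finish their I/O and become R.
for pid in $(ps -o pid=,stat= -C sha256sum | awk '$2 ~ /^R/ {print $1}'); do
    kill "$pid"
done
```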

Posted

The server just shut down.

I clicked the "power down" button in the web GUI about 10 mins ago and went over to pull the plug out, and it shut down just as I was walking over to it.

 

I've pulled out the broken drive and inserted a new (different brand, same size) drive.

 

It's now booting..........

Archived

This topic is now archived and is closed to further replies.
