Disks dying...normal or am I doing something wrong?

June 26, 2016

I recently upgraded my 20-disk, 52TB array to v6.1.9 and decided (for whatever reason) to convert all of my drives to the XFS file system.

I had enough free space (one completely empty 6TB drive), so I used variations of the following command at the terminal to copy to new disks:

cp -r -v -p /mnt/disk4/* /mnt/disk8 | todos > /boot/disk4copy.txt

Once the copy had completed, I'd make sure that everything had copied, then I would reformat the drive that I had copied from.
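
The "make sure that everything had copied" step can be sketched as a recursive comparison before the source disk is reformatted. This is only an illustration, not what was actually run: `verify_copy` is a hypothetical helper name, and it assumes the destination disk holds nothing but the copy (as with the empty 6TB drive), because `diff` also reports files that exist only on the destination. A comparison like this would also flag top-level hidden files, which a `/mnt/disk4/*` glob does not match by default.

```shell
# Hypothetical helper: compare a source disk against its copy.
# Returns 0 only when both trees contain the same files with identical content.
verify_copy() {
    # -r walks both directory trees, -q only reports which files differ or are missing.
    diff -rq "$1" "$2"
}

# Example with the paths from the post (uncomment on a real array):
# verify_copy /mnt/disk4 /mnt/disk8 && echo "disk4 and disk8 match"
```

Note that this reads every file on both disks, so it takes a long time on multi-terabyte drives and adds to the load on the array.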

I did this one at a time for a few of the larger drives, but eventually had enough free space on a few drives to do three of the smaller drives at a time.

Everything seemed to be working perfectly. I completed copying the last three drives, made sure that the data had copied, and then reformatted the last three drives.

Once formatting was complete, one of my drives that I had just finished copying data to popped up as red balled. OK, well it's kind of old. I'll just replace it.

It's an old 2TB WD Green drive, so I replace it with a shiny new WD Red 6TB drive. Proceed with data rebuild. Data rebuild completes annnnd now there's another red balled drive.

The server is about 6 years old. I don't know how old each of the failing drives is, but they were disks 5 and 8, so fairly old. Both are WD Green drives.
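
A drive's age can usually be estimated from its SMART power-on hours. The sketch below assumes smartmontools is installed and that the drive reports a plain number in the `Power_On_Hours` raw value (some vendors use other formats). The function names and the `/dev/sdX` device are placeholders, not taken from the post.

```shell
# Reads `smartctl -A` output on stdin and prints the raw Power_On_Hours value
# (the 10th column of the attribute table).
power_on_hours() {
    awk '$2 == "Power_On_Hours" { print $10 }'
}

# Converts hours to whole years of spin time (8760 hours per year, rounded down).
hours_to_years() {
    echo $(( $1 / 8760 ))
}

# Example (placeholder device name; needs root):
# hours_to_years "$(smartctl -A /dev/sdX | power_on_hours)"
```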

I have a new drive on order to replace the most recently failed one.

I suspect that all of the copying, formatting, and data rebuilding has just been enough to stress a few of my older drives and convince them to give up the ghost.

However, I'm writing to inquire if there is something obviously wrong with what I've done above that is going to lead to a new disk "automatically" falling out when I replace the newest failed drive, or if you guys just think that it's some old drives dying. The server has seen A LOT more activity over the past 1.5 weeks than it usually does.

Many Thanks

John

June 26, 2016

Go to Tools - Diagnostics and collect the diagnostics before you reboot.

June 26, 2016

Author

Thanks,

Which of the files would you be interested in seeing, or should I just attach the whole zip file?

June 26, 2016

The whole zip

June 26, 2016

Author

It is linked here in my Dropbox: https://dl.dropboxusercontent.com/u/5155002/tower-diagnostics-20160626-1702.zip

The forum says that it is too large to attach.

I'm sure that I have rebooted since the first failed drive, but I don't think that I have done so since the second one.

Much appreciated.

June 29, 2016

The system ran fine for the first 4 days, until you decided to replace Disk 8. You replaced sdl (WD20EARS 2TB, 416) with sdv (WD Red 6TB, VZA), then started the array and it began the rebuild. Only 23 minutes into it, two drives became unresponsive: Disk 5 (sdo, WD20EARS, 867) and the just-replaced drive (the old Disk 8, sdl, WD20EARS, 416). They were quickly dropped by the kernel, and you can completely ignore the next 128MB of garbage, which is all drive errors, but on drives that had already been disabled. There is no clue evident as to why they stopped responding.

One guess (and that's all it is): because it happened so close to the start of the rebuild, perhaps the last drive added tipped the scale on power draw. You might check whether you are now too close to the maximum number of drives your power supply can support.

Other comments -

* Make sure that sdv (the new Disk 8, 'VZA') is tightly connected, as it appeared to come loose for a moment.

* There were numerous messages of "shfs/user: cache disk full". The Mover is running each morning, but there's no evidence it's moving anything.

* You have IDE emulation turned on for your SATA drives (you are using ata_piix). When you next boot, go into the BIOS settings and look for the SATA mode, and change it to a native SATA mode, preferably AHCI if available, anything but IDE emulation mode. It should be slightly faster, and a little safer.

* Your network settings need a little attention. You had it set originally to use the 192.168.0.* subnet, and your DNS settings are set to use 192.168.0.1. But I'm guessing you have a new router, that is using the 192.168.1.* subnet, and you have changed to use DHCP. So your DNS server is still on 192.168.0.1, but you now have your IP as 192.168.1.154 and gateway of 192.168.1.1. I suspect you could have DNS issues, although you do have the Google DNS server as a backup.

* Personally, I think you are better off using a static IP, but make sure you use the correct .0 or .1 subnet. There are a lot of DHCP renewals in the syslog, and some small issues that go away with a static IP.

* Just a suggestion, you may want to try a networking tip or 2 from the Tips and Tweaks wiki page (use the Tips and Tweaks plugin), to see if it decreases the number of packet drops you are getting. They are harmless, but can have a small effect on network speed.
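
The subnet mismatch described in the network bullet can be checked with a few lines of shell. This is a rough sketch that assumes a /24 netmask (which matches the 192.168.0.* and 192.168.1.* notation above); `prefix24` and `same_subnet24` are made-up helper names. A DNS server outside the local subnet is not wrong in itself (a public resolver is), but a private address on the old subnet is probably unreachable.

```shell
# Prints the first three octets of an IPv4 address, i.e. its /24 prefix.
prefix24() {
    echo "$1" | cut -d. -f1-3
}

# Succeeds when both addresses share the same /24 prefix.
same_subnet24() {
    [ "$(prefix24 "$1")" = "$(prefix24 "$2")" ]
}

# Addresses quoted in the post: server IP, gateway, configured DNS server.
ip=192.168.1.154
gateway=192.168.1.1
dns=192.168.0.1

same_subnet24 "$ip" "$gateway" && echo "gateway OK"
same_subnet24 "$ip" "$dns" || echo "DNS server $dns is outside the subnet of $ip"
```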

June 30, 2016

Wow, RobJ, that is quite a consult. You know, some people get paid for this type of advice.

June 30, 2016

Author

Wow, Rob, many thanks for the assessment.

Yes, I became aware of the cache disk issue during the reformatting process. I was planning on addressing that once I was done reformatting, but then these drive issues popped up.

I thought that I had the server on a static IP, but I replaced my router around the time that I upgraded to v6. That setting must have gotten lost during one of these changes.

I have a 750W PSU, so I don't think that I'm having power issues, but I haven't done the calculation recently. Thanks for reminding me to do so.
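
For what it's worth, the calculation can be roughed out as below. The 2 A per drive at spin-up is an assumed round figure, not a measured or datasheet value for these drives, and `spinup_watts` is a hypothetical helper. What matters is the PSU's 12 V rail rating rather than its total wattage, and the rest of the system draws power as well.

```shell
# Rough worst-case 12 V load if all drives spin up at once:
# number of drives x amps per drive at spin-up x 12 volts.
spinup_watts() {
    echo $(( $1 * $2 * 12 ))
}

# 20 array drives (from the first post) at an ASSUMED 2 A each on the 12 V rail.
spinup_watts 20 2   # prints 480
```

Replace the 2 A with the spin-up current from each drive's datasheet to get a more realistic number.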

During the first data rebuild, I tried to run preclear on the first failed drive (the old Disk 8, sdl, WD20EARS, 416) and it failed, quickly. On my end the data rebuild on disk 8 took many hours and appears to have completed successfully (all the data is there).

I had already started the data rebuild on the second failed disk before your response. It has completed and all the data appears to be intact on both the new disk 5 and new disk 8. Despite my bumbling, it looks like all of my data is still intact somehow.

However, I do have a new problem. I've lost my user shares. I can see each of their .cfg files on the flash drive, but I cannot access the shares. The web GUI says that there are no User shares. When I attempt to add one back, I get a message that Share "x" was been deleted. I get the same message if I try to add it a second time.

So, besides the mover & ip issues mentioned above, all seems to be well except my ability to use/access/recreate my user shares.

Any more generous tips?

I greatly appreciate your response.

John

June 30, 2016

... However, I do have a new problem. I've lost my user shares. I can see each of their .cfg files on the flash drive, but I cannot access the shares. The web GUI says that there are no User shares. When I attempt to add one back, I get a message that Share "x" was been deleted. I get the same message if I try to add it a second time.

This happens when an ad-blocker is active. You need to either white-list the GUI or disable the ad-blocker.

June 30, 2016

Wow, RobJ, that is quite a consult. You know, some people get paid for this type of advice.

There have been a number of times, especially with my current financial state (health issues have kept me from working much this year), that I have wished for some way to make money doing this, but this is a community, a nice one, and doing it for money seems like the wrong 'paradigm'(?). As a lonely guy with Asperger traits, your comments, and that of others, have meant much more than you could know! And it's always seemed very wrong to even think of money, when I've received some appreciation. It's comments like these that keep me going.

June 30, 2016

Community Expert

I always enjoy reading your posts, so well thought out and with so much information. I learn a lot from them and look forward to continuing to read them, paid or unpaid.

June 30, 2016

Totally agree, and don't forget all the wikis and guides...
