Repeatedly getting red ball drives

May 12, 2012

I'm running 5.14 beta and have been up and running for about 3 months now. The problem is that I keep getting red ball disks, but when I check the SMART reports they look fine. This has happened 4 times now, and each time I rebuild the disk from parity and everything works fine again. The red ball has happened on 3 different disks, every time while the array was sitting idle, and I have changed all the cables going to the drives in question. I don't have any Linux or command line experience at all, and I had a tough time getting this server set up and configured with all the plugins I wanted. I'm wondering if I've done so much incorrectly in setting up my server that I've somehow caused these problems. I was thinking I would just start over with unRAID and do a fresh install, especially since it looks like 5.0 may be out any day now. To do this and keep my user shares I would just need to save my config folder, correct?

Attached are the SMART reports of the disks as well as my syslog from this most recent occurrence. Any help would be greatly appreciated. I'm really afraid that one of these times I'm going to end up losing some data.

May 12, 2012

What MB?

May 12, 2012

Also, what HBA/Raid card?

May 12, 2012

Author

Mobo is a Biostar A880G+, and the HBA/RAID card is a Supermicro AOC-SASLP-MV8 8-port SAS/SATA add-on card. I built my server by following Raj's 12 drive budget build. All of the drives that have had problems were connected to the mobo SATA ports. Come to think of it, over the two months that I've been having this problem I've switched drives and cables around trying to resolve it. Looking back, I believe I have now had this problem with all 4 of the SATA ports on the motherboard.

May 12, 2012

If what you are saying is correct, then maybe the problem is with the MB SATA ports.

You might want to keep track of whether it occurs on the same SATA port connection in the future.

Or, if you have 8 or fewer drives, put them all on the MV8 and see if your problem goes away.

Or get a 2nd MV8 to get the drives off the MB ports.

Just some thoughts.

May 12, 2012

Need to see more of the syslog. Are there any additional files in /var/log?

May 12, 2012

Author

Putting all the drives on the MV8 is an idea I haven't thought of. That may be worth a try.

As for seeing more of the syslog and whether there are additional files in /var/log: there are other files in /var/log, but I don't think any of them are other log files. Remember, I really don't know much about this stuff, so I may miss something obvious without knowing it. All I see are some directories and 4 files: cron, dmesg, monthly_parity_check, and syslog. The syslog I uploaded came from the System Log icon under the Utils tab of the unRAID web interface. Last night I did a data-rebuild on the red balled disk, so everything is back to normal today. I can get you another syslog a different way if it would help, but since I restarted the server and did a data-rebuild I don't know if that will be useful. Let me know what I can do.

Thank you for the help so far everyone. I really do appreciate it.

May 12, 2012

Do you see "syslog.1"?

May 12, 2012

The basic issue is that most of the 5.0beta versions did not leave enough time for the disks to spin up before treating a disk as unresponsive and marking it as disabled.

I think that this has been addressed in the latest 5.0rc3 release.

Joe L.

May 12, 2012

Author

No, there is no "syslog.1". I'm using Midnight Commander on the unRAID console to navigate to /var/log. This is OK, right?

I definitely will be upgrading to 5.0, and I really think I want to start fresh and do a more careful job of installing plugins and configuring things this time. I don't think simply updating will accomplish this, but I may be wrong. I'm open to whatever suggestions you guys think are best for my situation.

May 12, 2012

Use the stock distribution go file and no add-ons will load. Then you can re-enable the desired add-ons once it's working.

May 21, 2012

Author

I upgraded to 5.0rc3 by reformatting my flash drive and doing a clean install. I removed all plugins and add-ons. The server ran for 4 days before it gave me another red disk, so either it hasn't been fixed or something else is going on. Any ideas?

I think I'm going to switch everything over to the SATA controller and see how that works for me. I don't think I've had a disk connected to the controller fail on me yet.

May 25, 2012

Author

I changed all my drives over to the Supermicro MV8 add-on card and I am still getting disk errors. The errors are now much more frequent, every three to four days, and they now affect multiple disks. Attached is a screen shot of what I had today with all drives on the MV8. Initially, with the array running, disk 1 was red balled and all other disks were green; all disks were spun down. I stopped the array, and the attached screen shot shows what I saw.

I don't know much about computers and was only able to build my server with the help of users on this forum. I really need help troubleshooting this problem. I'm guessing this is probably a hardware problem? Is it time to start replacing parts? Is there anything else I can do to get to the bottom of this?

I'll have the log file up when I get home from work in the morning.

May 25, 2012

Something else is going on (I think), since the two drives are marked as "wrong" (as if their serial number or their size had changed).

Please attach a syslog to your next post if you can.

Looking at your screen shot... the disks did change their size... dramatically.

They went from 3TB to over 461TB each. Did you change the server time (to a date about 20 years from now)?

(I'm going to see if I can move this thread to the new rc3 support forum so it gets noticed by Tom @ lime-tech.)

Joe L.

May 25, 2012

Author

Here are the logs from this time around. I had 3 syslog files this time (syslog, syslog.1 and syslog.2), but they all look similar to me. Regarding the size of the drives, I didn't even notice that they were at 461TB each. Any time I navigate away from the devices page and then back to it, the size of the drives increases by several hundred GB. It also happens if I simply click the "Main" tab over and over: each time I click it, the size increases. In addition, the number to the right of "sdx" for those drives also increases with each click. I took another screenshot. I'm starting to get pretty worried here.

File name: syslog.txt.zip File size: 2.2 MB

File name: syslog_1.txt.zip File size: 3.26 MB

File name: syslog_2.txt.zip File size: 641.31 KB

May 25, 2012

The disks are probably just fine. It is unRAID which is calculating the sizes incorrectly.

(unless you really do have a few 1.13PB drives. 8) 8) 8))

Joe L.

May 25, 2012

The site you used to upload your files says only the owner can download them.

"FileServe can only be used to download and retrieve files that you have uploaded personally. "

They are not usable there.

May 25, 2012

Author

Ok, I tried a new site. Hopefully the files are downloadable now.

May 29, 2012

Author

I've been without access to any data on my array for about a week now hoping that I'd be able to get this thing figured out. Since I don't know enough about this stuff to try and figure it out on my own, instead of troubleshooting further I think what I may try is completely starting over. Starting over meaning buying a new flash drive and a couple new hard drives and throwing them in the box and setting up a new 3 drive server with nothing on it. Then add a few hundred gigs of data from one of my current disks and wait and see how it works for a few weeks. If it looks good then I'll start copying my drives containing data to the new disks and after each drive is copied I'll preclear it and add it to the array. If I still have problems then I'll assume it's hardware related and start replacing parts. Anyone care to weigh in with thoughts on this? I'm still open to other options, just don't want to wait around forever doing nothing if no one has any ideas on what's causing my problem.

May 29, 2012

Finally got to look at the files you uploaded.

Disk1 has some file-system corruption you need to fix using reiserfsck. See the wiki for details on how.

http://lime-technology.com/wiki/index.php/Check_Disk_Filesystems

May 11 05:34:29 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 54527 does not match to the expected one 2
May 11 05:34:29 Tower kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 123154489. Fsck?
May 11 05:34:29 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 54527 does not match to the expected one 2
May 11 05:34:29 Tower kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 123154489. Fsck?
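
For anyone wanting to find lines like these in their own log, here is a minimal sketch. It assumes the log lives at /var/log/syslog (as on this server) and that the kernel tags the messages "REISERFS warning" or "REISERFS error" as in the excerpt; the function names are only illustrative and are not part of unRAID.

```shell
#!/bin/bash
# Print every REISERFS warning or error line from a syslog file.
# Usage: reiserfs_lines [file]   (defaults to /var/log/syslog)
reiserfs_lines() {
    local logfile="${1:-/var/log/syslog}"
    grep -E 'REISERFS (warning|error)' "$logfile"
}

# Count how many of those lines name each md device (e.g. md1 = disk1).
reiserfs_devices() {
    local logfile="${1:-/var/log/syslog}"
    grep -E 'REISERFS (warning|error)' "$logfile" \
        | grep -oE '\(device md[0-9]+\)' \
        | sort | uniq -c
}
```

Running `reiserfs_lines /var/log/syslog.1` would check a rotated log instead, if one exists.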

May 29, 2012

Author

Ok, I'll do that as soon as I get a chance to be home. That is assuming I turn my server back on and I am able to even start the array. If when I turn it on the parity drive, disk1 and disk2 are all still bad then I'll have another obstacle to overcome before I can try and fix the filesystem.

May 31, 2012

Author

I turned my server back on this morning and it started up with no problems, parity and disk2 were read as normal (thank goodness) and disk1 was red balled. I started the array and ran reiserfsck and the output was:

No corruptions found
There are on the filesystem:
              Leaves 716879
              Internal nodes 4288
              Directories 139
              Other files 2318
              Data block pointers 725213808 (0 of them are zero)
              Safe links 0

I was not given the option of fixing anything, I suppose because no corruptions were found. Should I do a data rebuild on disk1 and then run reiserfsck again? I'm confused why it didn't find any problems if the log file indicated there was one.

May 31, 2012

If you ran reiserfsck on the array with the disk disabled, you ran it on the SIMULATED drive as re-constructed through parity.

It did not use the physical disk, since it was disabled.
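
To illustrate what "simulated" means: with single parity, each bit on the parity disk is the XOR of the matching bits on all the data disks, so the contents of a disabled disk can be computed from parity plus the remaining disks. The following is only a toy sketch using made-up one-byte values for three data disks, not unRAID's actual code.

```shell
#!/bin/bash
# Toy example: one byte from each of three data disks (values are made up).
disk1=0xA5
disk2=0x3C
disk3=0x0F

# The parity byte is the XOR of the bytes on all data disks.
parity=$(( disk1 ^ disk2 ^ disk3 ))

# With disk1 disabled, its byte is recomputed from parity and the other disks,
# without reading the physical disk1 at all.
simulated_disk1=$(( parity ^ disk2 ^ disk3 ))

printf 'parity=0x%02X simulated_disk1=0x%02X\n' "$parity" "$simulated_disk1"
```

The recomputed byte equals the original disk1 value, which is why a check run against the simulated disk says nothing about the state of the physical drive.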

The disk was disabled because a write to it failed. If you post a syslog for analysis, it might help.

After capturing the current syslog, which will hopefully give us clues to the failure, you should stop the array, power down, re-seat the cables to the failed disk, and then see if it responds to a SMART report request.

(I am assuming you are using the rc3 release... if not, DO NOT DO ANYTHING until you get specific instructions and attach a syslog to your next post regardless.)

May 31, 2012

Author

First, yes, I am running rc3.

Yes, I did run it with the disk still red balled which I guess is disabled. Your explanation of the check being performed on the simulated drive makes sense. Attached is the syslog.

As for smart reports, I ran the command...

 smartctl -a -d ata /dev/sda

and the output is also attached.

Thank you for the help so far.

syslog.zip

smart.txt

June 19, 2012

Author

I bought a new hard drive and replaced disk1 in my array in the hope that it would fix the file-system problems and get my server running well again. It ran for 4 days and then I had another red balled drive. This time it's disk2. Attached is the syslog. I looked through it but don't know what I'm looking at or how to interpret it. As before, I'd really appreciate some help.

syslog-2012-06-19.txt.zip
