Recent troubles shutting down/rebooting.


1 hour ago, hansolo77 said:

 

I don't think the controller has a fan on it, but I do have a fan blowing air across the entirety of my expansion cards.  I also have all my fans set to be on full power rather than ramp up based on temps.

I haven't installed fans on all of my HBAs or 10G NICs either, but you do need to confirm that there is enough airflow or that the working temperature is acceptable.

 

I had one LSI 2308 die from high temperature or insufficient airflow. It died suddenly during operation rather than causing intermittent problems; the cooling issue was my own fault.

Edited by Vr2Io

Wow, it took about 20 minutes to get Memtest v9 to be fully up and ready to test.  About 3 minutes of black screen, then like 2-3 minutes for each line of its pre-testing detection stuff.  But I've got it up and running now.  I left everything at default, which is 4 passes with the CPU using all cores in parallel.  I'm going to go out to lunch.  When I get back, if it's already completed I'll go back into the settings and set the pass count to 99 or something so it will continue to run overnight.  In a matter of like 2 minutes, it was already reporting like 48% of the first pass was completed lol.  So far no errors.


Good

 

34 minutes ago, hansolo77 said:

set the pass to 99

Each pass should take 25+ minutes if I remember correctly (that was with 16GB of RAM, and you have 64GB). Two runs of 4 passes should be enough, I think; I like to speed things up.

 

I usually troubleshoot this way: first I look for the quickest possible way to reproduce the problem, and then, once the problem is solved, I run a long test to ensure stability.

Edited by Vr2Io

Good again. Next, you need to decide whether to rerun the correcting parity check (stop and restart until it is error free) with or without ruling out some hardware first.

 

Or at least swap the PCIe slot with another add-in card; this costs nothing. If errors still occur randomly, then I suggest changing the HBA.

Edited by Vr2Io
7 minutes ago, hansolo77 said:

Ahhh... so a non-safe shutdown should be ON, but just scheduled checks should be OFF to prevent corruption.  Got it!  Thanks!  :)

Technically the safest action is NOT to write to any disks until you are sure what you are writing is correct. But in the case of a reboot without stopping the array first, parity has a very good chance of being out of sync if there were writes occurring when the rug got yanked, because Unraid always tends to the filesystem writes first and then sends the parity writes.

 

Parity must be correct to successfully emulate failed drives, so statistically it's better to correct after an unclean shutdown, but in the absence of a known event that would knock parity out of sync, it's better to try to find the reason before blindly writing data.
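To make the point about emulation concrete, here is a minimal Python sketch. It assumes single parity computed as a plain XOR across the data disks; the disk contents, the `emulate` helper, and the simulated crash are all made up for illustration and are not taken from Unraid's code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def emulate(failed_index, disks, parity):
    """Rebuild the block of a missing disk from the survivors plus parity."""
    survivors = [d for i, d in enumerate(disks) if i != failed_index]
    return xor_blocks(survivors + [parity])

# Three hypothetical data disks, one 4-byte block each.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(disks)

# With parity in sync, the missing disk is reproduced exactly.
rebuilt_ok = emulate(1, disks, parity)

# Simulated unclean shutdown: disk 0 got new data, parity was never updated.
disks_after_crash = [b"\xff\x02\x03\x04", disks[1], disks[2]]
rebuilt_stale = emulate(1, disks_after_crash, parity)

print(rebuilt_ok == disks[1])     # in-sync parity rebuilds disk 1 correctly
print(rebuilt_stale == disks[1])  # stale parity rebuilds the wrong bytes
```

The stale case is why out-of-sync parity matters even when every data disk is healthy: the damage only shows up later, in whatever disk has to be emulated.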

 

It's a matter of which makes the most sense in most cases; there can be edge cases, as always.


I installed an application called "CA Auto Turbo Write Mode".  Is this something I should uninstall?  It was a recommended addon from SpaceInvaderOne.  Not knowing much about how the file transfer stuff works I just blindly installed it.  Could this have created parity corruption as well?  I'm grasping at straws here trying to figure out what happened, and have literally disabled EVERYTHING at this point.

Edited by hansolo77

It's important to understand what the options are and why.

 

Turbo Write updates parity by reading the corresponding block from every other data disk in your array, doing math with the new data, then writing the data and the parity. This requires all of your disks to be spun up. Some people run spun up 24/7, some drive models will refuse to spin down, etc. Turbo Write is faster, but only if it doesn't have to wait for a disk to spin up.

 

The non-Turbo write updates parity by reading the old block from the target disk and from parity, doing math to change parity to what it should be, and writing both back. This works even if most of your drives are spun down, but it can slow writes down, because every write first needs a read from the same two disks.
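The two write methods can be sketched side by side in Python. This assumes single XOR parity; the function names `turbo_write` and `read_modify_write` and the sample bytes are hypothetical, chosen only to show that both routes arrive at the same parity block.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def turbo_write(disks, target, new_data):
    """Reconstruct write: read every other data disk and XOR with the new data."""
    parity = new_data
    for i, block in enumerate(disks):
        if i != target:
            parity = xor(parity, block)
    return parity

def read_modify_write(old_data, old_parity, new_data):
    """Read old data and old parity, then flip only the bits that changed."""
    return xor(old_parity, xor(old_data, new_data))

# Three hypothetical data disks, one 2-byte block each, with parity in sync.
disks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
old_parity = xor(xor(disks[0], disks[1]), disks[2])

# Write new data to disk 1 using each method.
new_data = b"\x7f\x00"
p_turbo = turbo_write(disks, 1, new_data)
p_rmw = read_modify_write(disks[1], old_parity, new_data)

print(p_turbo == p_rmw)  # both methods produce the same parity
```

The result is identical either way; the difference is which disks have to be read. Turbo touches every data disk, while read-modify-write touches only the target disk and parity, which is why it works with the rest of the array spun down.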

 

 

The Auto Turbo Write plugin switches back and forth depending on whether or not your disks are all spinning. It technically should not cause any negative effects, and technically should use the best case at all times, but nothing is perfect. It's unlikely you'll need/want to remove it specifically unless you know better.
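The switching rule described above boils down to something like the following. This is only a sketch of the idea, not the plugin's actual code; `choose_write_mode` and the disk names are hypothetical.

```python
def choose_write_mode(disk_spinning):
    """Return 'turbo' only when every array disk is already spun up.

    disk_spinning maps a (hypothetical) disk name to True if it is spinning.
    """
    return "turbo" if all(disk_spinning.values()) else "read-modify-write"

all_up = {"disk1": True, "disk2": True, "disk3": True}
one_down = {"disk1": True, "disk2": False, "disk3": True}

print(choose_write_mode(all_up))    # turbo
print(choose_write_mode(one_down))  # read-modify-write
```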

 

 

Parity corruption only happens when a disk is written to but parity is not updated, when data is corrupted in transit (bad RAM, etc.), or when a disk is failing somewhere. Typical causes are a hard shutdown, direct disk access outside the array, improper procedure when repairing filesystem damage, and so on.
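For illustration, here is roughly what a parity check does with such a mismatch, sketched in Python. It assumes single XOR parity and treats each byte as one "block"; `parity_check` and its `correct` flag are hypothetical names, not Unraid's implementation.

```python
from functools import reduce

def parity_check(disks, parity, correct=False):
    """Compare stored parity with recomputed parity, position by position.

    Returns (error_count, parity_after). Parity is rewritten only when
    correct=True; otherwise mismatches are just counted.
    """
    errors = 0
    result = list(parity)
    for n, stored in enumerate(parity):
        expected = reduce(lambda a, b: a ^ b, (d[n] for d in disks))
        if stored != expected:
            errors += 1
            if correct:
                result[n] = expected
    return errors, bytes(result)

# Parity was computed for the original data, then disk 0 was written to
# directly (second byte 0x02 -> 0xff) without updating parity.
stale_parity = b"\x11\x22\x33"
disks = [b"\x01\xff\x03", b"\x10\x20\x30"]

report_only = parity_check(disks, stale_parity)             # counts, no writes
corrected = parity_check(disks, stale_parity, correct=True) # rewrites parity
recheck = parity_check(disks, corrected[1])

print(report_only[0], corrected[0], recheck[0])
```

Note that the correcting pass simply trusts the data disks and overwrites parity. If the mismatch actually came from bad RAM or a flaky controller, that trust may be misplaced, which is the argument for finding the cause before writing corrections.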

  • 1 year later...

Just reporting back in.  The problems I had were all solved.  I have been sitting happy with no errors for months.  That is, until the start of this month.  Started getting errors again.  Rather than updating this thread, which is no longer related to shut down issues, I've created a new thread for my new troubles.

 

 

