Stale Config on Power Up, Fine after reboot - CMOS battery?

Aaron Oz · December 16, 2018

Recently, upon power up (from a complete power down), UNRAID starts with the array offline. It says "stale configuration", but everything looks correct.

If I just reboot the server from there, upon reboot, the array is started automatically.

Looking at the log file (attached), I see an error that says:

kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

Would it make sense that if a CMOS battery is dead, and the motherboard can't keep time when the power is off, UNRAID would see the configuration as stale due to a time mismatch?

That theory makes sense to me because UnRaid fixes the clock and with a reboot, the motherboard keeps the updated time. So that's why it starts the array.

But I thought I'd check with those more knowledgeable than myself to make sure this isn't a sign of a potentially bigger problem.

tower-syslog-20181216-1236.zip

Frank1940 · December 16, 2018

The dead battery would certainly be the first thing I would look at. They are cheap and available about anywhere where batteries are sold. You should be able to find the Battery Number in the MB manual. They are also quite easy to replace after you open the case. You should also upload the Diagnostics file as it provides much more information than the syslog. Tools >>> Diagnostics (IF you had done that I could have hazarded a guess about the age of the battery by knowing when the MB was introduced.)

IF it is a dead battery and you made any BIOS changes, recheck and see that things are set correctly!

Edited December 16, 2018 by Frank1940

Aaron Oz · December 19, 2018

Whelp... doesn't seem to have been the battery. I just happened to put an entirely new MB into my server. All works great! First boot up, no problem.

I shut the server down last night and just restarted it this morning, and I had "Stale Configuration" and the array was offline. Everything was right, so I started it, no problems. I've attached the full diagnostics zip file. If anyone can see anything, that would be appreciated!

tower-diagnostics-20181219-0804.zip

JorgeB · December 19, 2018

I believe the stale config message is because there were two missing disks at first:

...
Dec 19 08:03:33 Tower kernel: md: import disk3: (sdj) Hitachi_HUA722020ALA331_YBJXDE6F size: 1953514552
Dec 19 08:03:33 Tower kernel: mdcmd (5): import 4
Dec 19 08:03:33 Tower kernel: md: import_slot: 4 missing
Dec 19 08:03:33 Tower kernel: mdcmd (6): import 5 sdi 64 1953514552 0 Hitachi_HUA722020ALA331_B9G4M4NF
...
Dec 19 08:03:33 Tower kernel: md: import disk9: (sdb) WDC_WD5000KS-00MNB0_WD-WCANU2446927 size: 488386552
Dec 19 08:03:33 Tower kernel: mdcmd (11): import 10
Dec 19 08:03:33 Tower kernel: md: import_slot: 10 missing
Dec 19 08:03:33 Tower kernel: mdcmd (12): import 11 sdc 64 976762552 0 ST31000528AS_6VP35NMG
Dec 19 08:03:33 Tower kernel: md: import disk11: (sdc) ST31000528AS_6VP35NMG size: 976762552

They came online right after that:

Dec 19 08:03:38 Tower kernel: sd 13:0:3:0: [sdm] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec 19 08:03:39 Tower kernel: sd 13:0:3:0: [sdm] Write Protect is off
Dec 19 08:03:39 Tower kernel: sd 13:0:3:0: [sdm] Mode Sense: f7 00 10 08
Dec 19 08:03:39 Tower kernel: sd 13:0:3:0: [sdm] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 19 08:03:39 Tower kernel: .ready
Dec 19 08:03:39 Tower kernel: sd 13:0:2:0: [sdl] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec 19 08:03:39 Tower kernel: sd 13:0:2:0: [sdl] Write Protect is off
Dec 19 08:03:39 Tower kernel: sd 13:0:2:0: [sdl] Mode Sense: cf 00 10 08
Dec 19 08:03:39 Tower kernel: sd 13:0:2:0: [sdl] Write cache: disabled, read cache: enabled, supports DPO and FUA
Dec 19 08:03:39 Tower kernel: sdm: sdm1
Dec 19 08:03:39 Tower kernel: sdl: sdl1
Dec 19 08:03:39 Tower kernel: sd 13:0:2:0: [sdl] Attached SCSI disk
Dec 19 08:03:39 Tower kernel: sd 13:0:3:0: [sdm] Attached SCSI disk

But it's not normal, maybe check connections on those disks.
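For anyone who wants to repeat this check on their own syslog, here is a minimal sketch. The function name find_late_disks is hypothetical, and the default path /var/log/syslog is an assumption; the two grep patterns are taken directly from the log lines quoted above.

```shell
#!/bin/bash
# Hypothetical helper: print the array slots reported missing at import time
# and the disks that attached, so their timestamps can be compared by eye.
# The default log path is an assumption; pass another file as the first argument.
find_late_disks() {
    local log="${1:-/var/log/syslog}"

    echo "Slots missing at import:"
    grep -E 'md: import_slot: [0-9]+ missing' "$log" || echo "  (none)"

    echo "Disks attached:"
    grep -E 'Attached SCSI disk' "$log" || echo "  (none)"
}
```

If an "Attached SCSI disk" line carries a later timestamp than the "import_slot: N missing" lines, the disk showed up after the array import had already run, which matches what the excerpts above show.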

P.S. unrelated but the onboard SATA controller is set to IDE, change to AHCI.

Frank1940 · December 19, 2018

IS this server located in a cold environment? One thing that can cause hard drives to come on line 'late' is slow spin-up of the drive motor. (Twenty-some years ago, I had a hard drive with slow spin-up on startup and I had to set the BIOS to do a long memory test rather than a quick one.)

Aaron Oz · December 19, 2018

Interesting... great observations. And Frank, your thinking is correct, except the slow spin-up isn't due to a cold environment. Those two disks are SAS. They are connected to a Dell PERC H310. Watching the boot sequence, I noticed that those drives get spun up slower (........). So it seems the array tries to start before the drives are fully spun up. By the time I go in there, the drives are up and connected and the array is complete.

That leads me to think a couple of things:

SAS SCSI drives aren't used in UnRaid that often and they all spin up more slowly (I don't think that's true)
For some reason my H310 / SAS drives spin up more slowly than anyone else's (can't explain that, but it seems likely)
I've missed an obvious setting, just like I missed having the onboard controller in IDE instead of AHCI.

FYI, that H310 is controlling two other SATA drives, too. So I don't think it's the card causing the problem.

JorgeB · December 19, 2018

It's likely related to those being SAS drives, they could be on a spin up delay or something, but if they always spin up nothing to worry about.

Aaron Oz · December 20, 2018

15 hours ago, johnnie.black said:

It's likely related to those being SAS drives, they could be on a spin up delay or something, but if they always spin up nothing to worry about.

Cool. Thanks for the input.
Yeah, they've never not spun up.
Is there a variable to tell UnRaid to delay before starting the array? If my kids or wife decide to turn on the server while I'm not around, they won't know to log in and start the array manually. It would be nice if starting the array could be delayed a few more seconds.

JorgeB · December 20, 2018

14 minutes ago, Aaron Oz said:

Is there a variable to tell UnRaid to delay before starting the array?

Not that I know of, you can try disabling spin up delay on the LSI bios, or deleting the bios completely since it's not needed, and that should also get rid of it.

JorgeB · December 20, 2018

Another option that might work is adding a delay to the go file before emhttp starts, e.g.:

#!/bin/bash
# Pause so late-appearing disks can come online first (30 seconds is an example value)
sleep 30
# Start the Management Utility
/usr/local/sbin/emhttp &
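
A fixed sleep always costs the full delay. As an alternative, here is a rough sketch that polls until enough disks are visible and gives up after a timeout. This is untested on Unraid and everything in it is an assumption: the function name wait_for_disks, the default /dev/sd[a-z] pattern (which also counts the flash drive and misses names like sdaa), and whatever expected count you pass in.

```shell
#!/bin/bash
# Hypothetical helper: wait until at least <expected> device nodes match a glob,
# or give up after <timeout> seconds.
#   $1 = expected number of devices (you must supply this for your own system)
#   $2 = timeout in seconds (default 60)
#   $3 = glob pattern to count (default /dev/sd[a-z], an assumption)
# Returns 0 once enough devices are present, 1 on timeout.
wait_for_disks() {
    local expected="$1" timeout="${2:-60}" pattern="${3:-/dev/sd[a-z]}"
    local waited=0 count dev

    while :; do
        count=0
        # $pattern is left unquoted on purpose so the shell expands the glob
        for dev in $pattern; do
            if [ -e "$dev" ]; then
                count=$((count + 1))
            fi
        done

        if [ "$count" -ge "$expected" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi

        sleep 1
        waited=$((waited + 1))
    done
}
```

In the go file the function would be defined and called before the emhttp line, for example wait_for_disks N 60 with N replaced by the number of sd devices the server normally shows. On timeout it just returns 1, so emhttp would still start and behave as it does today.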
