
Cannot start array following Cache Balance Failure


Husker_N7242C


Hi everyone,

 

I had a power failure whilst running a balance on the cache pool, and UNRAID is now essentially unbootable (syslog attached).

I can boot to the GUI; however, neither the webGUI nor the array starts.

If I boot in safe mode I can access the webGUI, but if I try to start the array the webGUI stops responding and the array doesn't start.

After attempting to start the array I cannot power down from the terminal, so I'm forced to shut down uncleanly.

 

I originally had a single Samsung 850 EVO 250GB cache drive. I added a second (identical) drive and started a balance to convert the data to RAID0 (metadata to RAID1). At around chunk 7 of 300+ I had a power failure, and the UPS's USB cable was unplugged!
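For readers who prefer the command line, the conversion described above corresponds roughly to the btrfs-progs commands below. This is a sketch, not necessarily what the UNRAID GUI runs. The pool path /mnt/cache is taken from the log in this post. The run() dry-run wrapper and the function names are hypothetical helpers that only print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: CLI equivalent of converting a two-device pool to RAID0 data /
# RAID1 metadata. ASSUMPTIONS: pool mounted at /mnt/cache; run(),
# convert_pool() and balance_progress() are hypothetical helper names.

# Print a command (dry run, the default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rewrite data chunks as RAID0 and metadata chunks as RAID1.
convert_pool() {
    pool="${1:-/mnt/cache}"
    run btrfs balance start -dconvert=raid0 -mconvert=raid1 "$pool"
}

# Show how far a running balance has progressed.
balance_progress() {
    pool="${1:-/mnt/cache}"
    run btrfs balance status "$pool"
}
```

The dry run is the default on purpose. A balance rewrites every chunk in the pool and can take a long time, so it is worth seeing the exact command before running it for real (as root, with DRY_RUN=0).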

 

The full syslog is attached, but my novice eyes spot the following:

Feb 11 18:01:07 NAS emhttp: shcmd (88): mkdir -p /mnt/cache
Feb 11 18:01:07 NAS emhttp: shcmd (89): set -o pipefail ; mount -t btrfs -o noatime,nodiratime -U f8a25ee9-2d47-4954-8a71-58ac2c86f71f /mnt/cache |& logger
Feb 11 18:01:07 NAS kernel: BTRFS info (device sde1): disk space caching is enabled
Feb 11 18:01:07 NAS kernel: BTRFS info (device sde1): has skinny extents
Feb 11 18:01:07 NAS kernel: BTRFS info (device sde1): detected SSD devices, enabling SSD mode
Feb 11 18:01:07 NAS emhttp: err: shcmd: shcmd (89): exit status: -119
Feb 11 18:01:07 NAS emhttp: mount error: No file system (-119)
Feb 11 18:01:07 NAS emhttp: shcmd (90): umount /mnt/cache |& logger
Feb 11 18:01:07 NAS kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000035c
Feb 11 18:01:07 NAS kernel: IP: [<ffffffff812de812>] flush_space+0x44/0x472
Feb 11 18:01:07 NAS kernel: PGD 5ff87e067 
Feb 11 18:01:07 NAS kernel: PUD 5ff87f067 
Feb 11 18:01:07 NAS kernel: PMD 0 
Feb 11 18:01:07 NAS kernel: 
Feb 11 18:01:07 NAS kernel: Oops: 0000 [#1] PREEMPT SMP
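Picking lines like these out of a large syslog by eye is tedious. The following is a minimal sketch that assumes the syslog has been saved to a local file. The helper name cache_mount_errors and its pattern list are my own, chosen to match the emhttp and kernel messages quoted above.

```shell
#!/bin/sh
# Sketch: filter a saved syslog down to lines that usually matter when a
# btrfs cache pool fails to mount. ASSUMPTION: the pattern list below is
# illustrative, based on the messages quoted above; adjust it to taste.

# Usage: cache_mount_errors syslog.txt
cache_mount_errors() {
    # "mount error" / "exit status": emhttp reporting the failed mount
    # "BTRFS": kernel messages from the btrfs driver
    # "BUG:" / "Oops:": the start of a kernel crash report
    grep -E 'mount error|exit status|BTRFS|BUG:|Oops:' "$1"
}
```

To see the stack trace that follows an oops, grep's -A option (lines of trailing context) can be added to the command.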

I'd really appreciate a hand. Thanks very much in advance.

 

HARDWARE:

Asrock x79 Extreme 11

Intel Xeon E5-2670 (OC to 3.4GHz)

24GB DDR3 non-ECC in 8 DIMMs (4x4GB + 4x2GB)

GTX Titan (passthrough to VM)

Sound Blaster PCIe card (passthrough to VM)

13x SATA HDD (including single parity)

2x SATA SSD

 

syslog.txt

2 hours ago, Husker_N7242C said:

Intel Xeon E5-2670 (OC to 3.4GHz)

 

I'm not saying this was responsible, but it is never a good idea to overclock a device such as a server, where reliability and data integrity are two of the primary goals. You can never be sure when the worst-case tolerances of the various parts will align to cause a problem.

3 hours ago, johnnie.black said:

If you have important data on the cache pool, try to recover it using the steps below; once that's done you'll need to reformat the pool:

 

https://lime-technology.com/forums/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

 

P.S. Consider getting a UPS.

Thanks Johnnie. I'll give this a go tomorrow evening and let you know how it goes. Thanks for the quick response! The UNRAID community is fantastic.

2 hours ago, S80_UK said:

 

I'm not saying this was responsible, but it is never a good idea to overclock a device such as a server, where reliability and data integrity are two of the primary goals. You can never be sure when the worst-case tolerances of the various parts will align to cause a problem.

Thanks S80. Your comments are certainly true in general, but I don't think they apply here. The 2670 has a locked clock, so overclocking is pretty limited; the chip would be capable of a lot more if not for artificial limits. The motherboard I chose has better power delivery than anything else I could find (not to mention 14x SATA, 7x PCIe, etc.), and the power supply has a 72A 12V rail. The CPU and bridge stay under 60 degrees air-cooled at full load, and the whole system draws about 300 watts. I actually have custom blocks to fully water-cool the motherboard, but they didn't end up being necessary because I chose 8c/16t over higher clock speeds. I ran the setup for weeks as a standalone PC before getting around to migrating my UNRAID server.

 

I haven't had any issues with crashes or data corruption.

 

I'd say overclocking a server actually isn't a bad thing. Every CPU has slightly different capabilities, and stock clocks are set with the temperatures of a crappy stock cooler in mind. With newer setups you can often overclock only certain cores, which is perfect for UNRAID: you can pin and overclock just your Plex and VM cores.

12 hours ago, Husker_N7242C said:

The 2670 has a locked clock, so OC is pretty limited

 

Since the clock multiplier is locked, what are you changing to achieve your modest overclock? If you've increased the base 100 MHz clock, I'd say that's a bad thing to do (compared with changing the multiplier), because then you're not just overclocking the CPU but pushing everything else out of spec too.
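The following worked example shows the arithmetic behind that point. Core frequency is the base clock (BCLK) multiplied by the CPU multiplier. The figures are illustrative assumptions, not the poster's actual settings. The E5-2670's rated 3.3 GHz maximum turbo corresponds to a multiplier of 33 at 100 MHz, so reaching 3.4 GHz with that multiplier capped would require raising the base clock by about 3%:

```latex
f_{\text{core}} = f_{\text{BCLK}} \times m, \qquad
f_{\text{BCLK}} = \frac{3400~\text{MHz}}{33} \approx 103.0~\text{MHz}
\quad\Rightarrow\quad \frac{103.0 - 100}{100} \approx 3\%
```

Under that assumption, any other clock derived from the same base clock would also run roughly 3% fast, which is the "everything else out of spec" concern.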

On 12/02/2018 at 1:14 PM, John_M said:

 

Since the clock multiplier is locked, what are you changing to achieve your modest overclock? If you've increased the base 100 MHz clock, I'd say that's a bad thing to do (compared with changing the multiplier), because then you're not just overclocking the CPU but pushing everything else out of spec too.

This is getting quite off-topic. I do understand this, and as I've said, it's stable, runs cool, has a more than adequate PSU and motherboard, and was extensively tested, benchmarked and used for gaming before becoming an UNRAID machine. If I ever have stability problems I'll start a thread on a gaming/overclocking forum.

On 2/11/2018 at 8:57 PM, johnnie.black said:

If you have important data on the cache pool, try to recover it using the steps below; once that's done you'll need to reformat the pool:

 

https://lime-technology.com/forums/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

 

P.S. Consider getting a UPS.

@johnnie.black I had a go at mounting the file system read-only (non-destructive) as suggested. I don't have the Linux skills to tell whether it mounted successfully, or to move files using the terminal. I didn't get any error or message at all. The article suggests that if it mounts I should move the files using Midnight Commander; however, I can only get as far as the webGUI if I boot in safe mode and don't start the array/Docker. Can you offer any further advice? Thanks again!

45 minutes ago, johnnie.black said:

Midnight Commander runs without the array started. If you followed the FAQ example, just browse to /x and see whether the cache contents are there. You should also be able to start the array if you unassign both cache devices.

@johnnie.black Mate, solved, thanks again. The mount point was present and data seems intact.

For any other novices like me: just type "mc" at the command line and you get a DOS-looking file manager. This seems to be assumed knowledge in the article Johnnie linked. I'd never come across it before, and I find Dolphin the most dependable file manager in Docker (I prefer Krusader, but it just seems to fudge itself and need reinstalling quite often).
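A common question in this thread is how to tell whether the read-only mount actually worked. The FAQ's exact commands aren't reproduced here, so the following is a sketch under stated assumptions. /dev/sdX1 is a hypothetical placeholder for one of the pool's partitions. The mount options usebackuproot,ro are an assumed choice for a non-destructive attempt. The function names are hypothetical. /x is the mount point from the FAQ example mentioned above.

```shell
#!/bin/sh
# Sketch of a non-destructive recovery attempt for a btrfs cache pool.
# ASSUMPTIONS: /dev/sdX1 is a hypothetical placeholder for a pool partition,
# and usebackuproot,ro is an assumed choice of mount options (older kernels
# called usebackuproot "recovery"). Defining these functions mounts nothing;
# call mount_cache_ro yourself, as root.

# Mount the pool read-only, letting btrfs fall back to backup tree roots.
mount_cache_ro() {
    dev="$1"          # e.g. /dev/sdX1 (hypothetical)
    mnt="${2:-/x}"    # temporary mount point, as in the FAQ example
    mkdir -p "$mnt" && mount -o usebackuproot,ro "$dev" "$mnt"
}

# Exit 0 if the directory is currently a mount point (Linux only: reads
# /proc/mounts; paths containing spaces are not handled).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}
```

For example, run `mount_cache_ro /dev/sdX1 /x && is_mounted /x && ls /x`. If the listing shows the cache's top-level folders, start mc and copy them somewhere safe before reformatting the pool.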

2 minutes ago, Husker_N7242C said:

@johnnie.black Mate, solved, thanks again. The mount point was present and data seems intact.

For any other novices like me: just type "mc" at the command line and you get a DOS-looking file manager. This seems to be assumed knowledge in the article Johnnie linked. I'd never come across it before, and I find Dolphin the most dependable file manager in Docker (I prefer Krusader, but it just seems to fudge itself and need reinstalling quite often).

Yep, I've just added that to the FAQ to make it easier in future.

 

P.S. @bonienl Here's an example of why I asked for the Stop array button to be disabled while a btrfs balance/replace is running. This one was caused by a power failure, but pressing reset could have the same result, so thanks for the extra work: it's needed and very much appreciated.


Archived

This topic is now archived and is closed to further replies.
