
My simple file server seems to go down regularly after swapping drives


chrisb42


Hi all,

 

My server regularly becomes totally unresponsive (I cannot even SSH into it), and I'm wondering whether this has to do with the new drives I have added. Previously my largest drive was 4TB; now I have replaced the parity drive and one data drive with 8TB drives. However, adding these drives (https://eshop.macsales.com/item/HGST/0S04012/) was problematic despite running a pre-clear on them: the server became unresponsive and I had to hard reboot it, even though it wasn't doing anything else.

 

A little bit of background: the hardware has been in use since the 4.x days and ran 5.x for the longest time. Earlier this year I upgraded to 6.x, and it went without a problem (no Docker images, very few plugins). Following the upgrade, I also started converting some drives to XFS by following the instructions at https://wiki.unraid.net/File_System_Conversion. I hit a snag there as well: the destination drive takes up 3GB more than the original drive (157GB instead of 154GB).

 

Here is my hardware:

Model: N/A
M/B: Supermicro - X7SPA-HF
CPU: Intel® Atom™ CPU D510 @ 1.66GHz
HVM: Not Available
IOMMU: Not Available
Cache: 48 kB, 1024 kB
Memory: 4 GB (max. installable capacity 4 GB)
Network: eth0: 1000 Mb/s, full duplex, mtu 1500
         eth1: not connected
Kernel: Linux 4.14.49-unRAID x86_64
OpenSSL: 1.0.2o

 

And here is today's output from the Log button on the main screen, after having to hard reboot again. It doesn't show much in terms of errors:

Aug 31 13:22:02 Zaphod kernel: BTRFS: device fsid 4268c7ad-100b-41ae-a956-7501b9d53230 devid 1 transid 61 /dev/loop3
Aug 31 13:22:02 Zaphod kernel: BTRFS info (device loop3): disk space caching is enabled
Aug 31 13:22:02 Zaphod kernel: BTRFS info (device loop3): has skinny extents
Aug 31 13:22:02 Zaphod root: Resize '/etc/libvirt' of 'max'
Aug 31 13:22:02 Zaphod kernel: BTRFS info (device loop3): new size for /dev/loop3 is 1073741824
Aug 31 13:22:02 Zaphod emhttpd: shcmd (127): /etc/rc.d/rc.libvirt start
Aug 31 13:22:02 Zaphod root: Starting virtlockd...
Aug 31 13:22:02 Zaphod root: Starting virtlogd...
Aug 31 13:22:02 Zaphod root: Starting libvirtd...
Aug 31 13:22:02 Zaphod kernel: tun: Universal TUN/TAP device driver, 1.6
Aug 31 13:22:02 Zaphod kernel: mdcmd (46): check nocorrect
Aug 31 13:22:02 Zaphod kernel: md: recovery thread: check P ...
Aug 31 13:22:02 Zaphod kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Aug 31 13:22:02 Zaphod kernel: Ebtables v2.0 registered
Aug 31 13:22:02 Zaphod kernel: md: using 1536k window, over a total of 7814026532 blocks.
Aug 31 13:22:04 Zaphod kernel: virbr0: port 1(virbr0-nic) entered blocking state
Aug 31 13:22:04 Zaphod kernel: virbr0: port 1(virbr0-nic) entered disabled state
Aug 31 13:22:04 Zaphod kernel: device virbr0-nic entered promiscuous mode
Aug 31 13:22:04 Zaphod dhcpcd[1464]: virbr0: new hardware address: 52:54:00:10:2f:6b
Aug 31 13:22:04 Zaphod avahi-daemon[6806]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Aug 31 13:22:04 Zaphod avahi-daemon[6806]: New relevant interface virbr0.IPv4 for mDNS.
Aug 31 13:22:04 Zaphod avahi-daemon[6806]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Aug 31 13:22:04 Zaphod kernel: virbr0: port 1(virbr0-nic) entered blocking state
Aug 31 13:22:04 Zaphod kernel: virbr0: port 1(virbr0-nic) entered listening state
Aug 31 13:22:05 Zaphod dnsmasq[8132]: started, version 2.79 cachesize 150
Aug 31 13:22:05 Zaphod dnsmasq[8132]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Aug 31 13:22:05 Zaphod dnsmasq-dhcp[8132]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Aug 31 13:22:05 Zaphod dnsmasq-dhcp[8132]: DHCP, sockets bound exclusively to interface virbr0
Aug 31 13:22:05 Zaphod dnsmasq[8132]: reading /etc/resolv.conf
Aug 31 13:22:05 Zaphod dnsmasq[8132]: using nameserver 192.168.1.1#53
Aug 31 13:22:05 Zaphod dnsmasq[8132]: read /etc/hosts - 2 addresses
Aug 31 13:22:05 Zaphod dnsmasq[8132]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Aug 31 13:22:05 Zaphod dnsmasq-dhcp[8132]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Aug 31 13:22:05 Zaphod kernel: virbr0: port 1(virbr0-nic) entered disabled state
Aug 31 13:26:43 Zaphod ntpd[1527]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Aug 31 13:39:54 Zaphod kernel: md: recovery thread: P incorrect, sector=132077792

 

Any ideas what the problem could be? The logs directory on the flash drive has no files newer than March of this year.

Thanks for any pointers,

 

-Christian


Upload the diagnostics file in a new post (Tools >>> Diagnostics). There may be some clues in those files.

 

Install the Fix Common Problems plugin and turn on Troubleshooting mode. That will write periodic copies of the syslog to the logs folder on your flash drive. The next time you experience the problem, upload the latest three or four files from the logs folder in another NEW post.
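If you're not sure which files are the latest, something like the following can list them. It's only a sketch: it assumes the flash drive is mounted at /boot and that the troubleshooting logs land in /boot/logs, and it assumes Python 3 is available on the server, which may not be the case on a stock install. `latest_files` is just a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Assumption: the flash drive is mounted at /boot and Fix Common Problems'
# troubleshooting mode writes its syslog copies into /boot/logs.
LOG_DIR = Path("/boot/logs")


def latest_files(directory: Path, n: int = 4) -> List[Path]:
    """Return the n most recently modified regular files in directory, newest first."""
    files = [p for p in directory.iterdir() if p.is_file()]
    # Sort by modification time, newest first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:n]


if __name__ == "__main__":
    if LOG_DIR.is_dir():
        for path in latest_files(LOG_DIR):
            print(path)
    else:
        print(f"{LOG_DIR} does not exist")
```

Without Python, `ls -t /boot/logs | head -n 4` from the console gives roughly the same list.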


Well, I installed Community Applications, then Fix Common Problems, but it stays on "Scanning" even after several minutes, which doesn't sound normal. However, going back to the previous page and then returning let me enable Troubleshooting mode. Now it's wait and see!

 

-Christian


Do you have a disk 9 on your system? 

 

Are any of the disks above 90% full? (ReiserFS has a history of becoming unresponsive as it fills up. Plus, it has not been updated since about 2010, as its chief developer is serving a prison term for killing his wife. Thus, it has not been optimized for the latest large-capacity disks.)
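To answer the 90% question quickly, here is a rough sketch that flags nearly full array disks. It assumes the data disks are mounted at /mnt/disk1, /mnt/disk2, and so on, and that Python 3 is available on the server (it may not be on a stock install). The helper names and the threshold constant are just illustrative.

```python
import glob
import shutil

# Assumption: each array data disk is mounted at /mnt/disk1, /mnt/disk2, ...
# Adjust the pattern if your layout differs.
DISK_PATTERN = "/mnt/disk[0-9]*"
THRESHOLD = 0.90  # the 90% level mentioned above


def usage_fraction(mount):
    """Return the fraction (0.0-1.0) of the filesystem at mount that is in use."""
    usage = shutil.disk_usage(mount)
    return usage.used / usage.total if usage.total else 0.0


def nearly_full(mounts, threshold=THRESHOLD):
    """Return (mount, fraction) pairs for filesystems used above threshold."""
    flagged = []
    for mount in mounts:
        fraction = usage_fraction(mount)
        if fraction > threshold:
            flagged.append((mount, fraction))
    return flagged


if __name__ == "__main__":
    mounts = sorted(glob.glob(DISK_PATTERN))
    for mount, fraction in nearly_full(mounts):
        print(f"{mount}: {fraction:.0%} used")
```

For a quick look without any scripting, `df -h /mnt/disk*` shows the same usage percentages.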


Yes, disk #9 is the one I added to convert ReiserFS to XFS. Is there an issue with having a 9th data disk? I actually added it as disk #10, so I know that's the one I added for converting the other drives. And yes, some drives are rather full, which is why I wanted to replace some of the smaller drives with bigger ones.


As far as I know, there is no problem with not having a disk 9. I was more worried about the missing SMART report, which usually means that a disk is offline.

 

As I recall, there have been slowdown/hang issues with ReiserFS-formatted disks when they get too full. I don't recall ever seeing an explanation of why it happens, but some have said it is the result of some 'housekeeping' that seems to take forever when the disk is nearly full. Plus, the processor you have is no powerhouse, and that would aggravate the problem even more.


Oh, it's definitely not a speed demon 🙂. I chose the board for its efficiency, since it runs 24/7 and really only serves files.

Fix Common Problems also pointed out that there may be some issues with my Marvell-based Supermicro AOC-SASLP-MV8 card. I checked, and there is newer firmware for both my board and the controller card; the outdated firmware could also be contributing to the issues I have seen since replacing the drives. Guess I'll have to pull the server and hook up a monitor and keyboard to it…

Thanks for all the pointers!

-Christian

3 hours ago, chrisb42 said:

Fix Common Problems also pointed out that there may be some issues with my Marvell-based Supermicro AOC-SASLP-MV8 card.

I would suggest that you google this card (include 'unraid' in the search terms). I seem to recall that this card has fallen out of favor because it seems to have random issues with recent unRAID releases. I can't remember many details beyond that at this point, but here is one thread that goes into some detail:

 

        https://forums.unraid.net/topic/39003-marvell-disk-controller-chipsets-and-virtualization/

 

 

Today, most folks are using LSI-based cards. They can be found used at very modest prices on eBay. You do need to get one that is natively in 'IT mode' or, if it has RAID firmware, it has to be 'modded' to have the 'IT mode' firmware installed. If necessary, you can do this install yourself or look for a vendor who supplies cards with it already done.


Yes, thanks, I had seen that thread. I have now updated the board's firmware, as well as the controller card firmware (and while I was at it, the IPMI firmware of the mainboard). I will now see whether it's more stable for me or not. I will resume the XFS conversion I started and see what happens. Thanks again for all the help! 👍

 

-Christian


Archived

This topic is now archived and is closed to further replies.
