huladaddy Posted February 10, 2016
I had unRAID completely lock up while I was transferring a ton of data from an offline source to one of my array disks. I could not telnet in or access the GUI, so I had to do a hard reset. Once unRAID came back up and I started the array, it began to rebuild the parity, which made sense to me.

But some things seem to be broken. I have two docker containers that don't seem to work anymore (delugevpn and sabnzbdvpn, both by binhex). All of my docker containers seem to run, but for those two I cannot access the web UI. The page just sits there, waiting to load. I have methodically tried multiple things to get those two containers working. Even after creating a new docker.img, deleting the docker containers and images, and starting with new, clean config/appdata directories, I can't get them working anymore.

Unfortunately, that's not my only problem. I just discovered that screen no longer runs. I installed it using the Nerd Pack, and the Nerd Pack reports that it is still installed, but from the terminal I just get "command not found".

Help! Is my system broken? What else might not be working correctly? What should I do?
trurl Posted February 10, 2016
See link to v6 help sticky in my sig.
huladaddy Posted February 10, 2016
System Overview
unRAID system: unRAID server Plus, version 6.1.6
Model: Custom
Motherboard: ASUSTeK Computer INC. - M3A78-EM
Processor: AMD Athlon(tm) 7750 Dual-Core @ 2.7 GHz
HVM: Enabled
IOMMU: Disabled
Cache: L1-Cache = 256 kB (max. capacity 256 kB), L2-Cache = 1024 kB (max. capacity 1024 kB), L3-Cache = 2048 kB (max. capacity 2048 kB)
Memory: 4096 MB (max. installable capacity 8 GB); BANK0 = 2048 MB, 667 MHz; BANK1 = 2048 MB, 667 MHz
Network: eth0: 1000Mb/s - Full Duplex
Plugins: unbalance, unassigned devices, preclear disks, open files, dynamix (system statistics, system information, local master, cache directories, active streams), community applications, nerd tools (screen, see OP).

The errors being reported seem to be related to my cache drives:

Feb 8 17:24:23 undrobo kernel: BTRFS: lost page write due to I/O error on /dev/sdm1
Feb 8 17:24:23 undrobo kernel: BTRFS: lost page write due to I/O error on /dev/sdm1
Feb 8 17:24:25 undrobo kernel: btrfs_dev_stat_print_on_error: 5386 callbacks suppressed
Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4832, rd 572, flush 1, corrupt 0, gen 0
Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4833, rd 572, flush 1, corrupt 0, gen 0
Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4833, rd 572, flush 2, corrupt 0, gen 0
Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4834, rd 572, flush 2, corrupt 0, gen 0
Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4835, rd 572, flush 2, corrupt 0, gen 0
Feb 8 17:24:25 undrobo kernel: BTRFS: lost page write due to I/O error on /dev/sdm1

I have attached only part of the syslog.txt. It goes on for another 26 MB repeating the same errors. Let me know if you need to see anything else.

syslog-part.txt.zip
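With 26 MB of repeated messages, it can help to boil the log down to the most recent error counters per device. The following is only a minimal sketch, assuming every counter line has the same shape as the "BTRFS: bdev ... errs:" lines quoted above. The function name `latest_btrfs_counters` and the `sample` list are made up for illustration and are not part of unRAID or btrfs. Since the counters in the excerpt only go up, the sketch keeps the last value seen for each device.

```python
import re

# Matches kernel lines shaped like the ones in the post, e.g.:
#   BTRFS: bdev /dev/sdm1 errs: wr 4832, rd 572, flush 1, corrupt 0, gen 0
BDEV_ERRS = re.compile(
    r"BTRFS: bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_btrfs_counters(lines):
    """Return {device: {counter: value}} from the last matching line per device."""
    latest = {}
    for line in lines:
        match = BDEV_ERRS.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


# Sample lines copied from the syslog excerpt in the post (hypothetical variable name).
sample = [
    "Feb 8 17:24:23 undrobo kernel: BTRFS: lost page write due to I/O error on /dev/sdm1",
    "Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4832, rd 572, flush 1, corrupt 0, gen 0",
    "Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4834, rd 572, flush 2, corrupt 0, gen 0",
    "Feb 8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4835, rd 572, flush 2, corrupt 0, gen 0",
]

print(latest_btrfs_counters(sample))
```

To run it against the full log, pass an open file object instead of the sample list, since the function accepts any iterable of lines. Non-zero write and read counters on a single device, as in the excerpt, suggest the I/O failures are concentrated on that device rather than spread across the pool.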
trurl Posted February 10, 2016
No personal experience with this: Check Disk Filesystems
huladaddy Posted February 10, 2016
Thanks Trurl. I had another lockup, so I had to reset once again. After that, one of my cache drives (/dev/sdm) didn't even get recognized by unRAID. I tried rebooting one more time, and that time /dev/sdm was recognized, so I began to suspect hardware.

I copied everything off the cache to one of my array disks and changed the data cables on both cache drives. I ran scrub on them, but I'm not really too familiar with btrfs. It reported tens of thousands of errors in one section of the report, but in another it seemed to report no errors. Since I had copied all the data off the drives, I decided to preclear them before adding them back to cache.

Do you think the data I copied off the cache will be OK, since the filesystem didn't report any errors? Should I rely on that data, or toss it? The only data I really want to keep is /mnt/cache/appdata/. If the preclear (with pre-reads and post-reads) reports no issues, should I trust these drives and my hardware?
huladaddy Posted February 10, 2016
Similar problems with fresh, precleared and reformatted cache drives. I suspect that maybe there are problems with one or both cache drives that aren't showing up during preclear.

I decided to pull the cache drives and just put docker.img and appdata on a disk in my data array. STILL having problems. I installed a brand new docker.img, re-downloaded delugevpn from binhex and started with a new, empty appdata config directory. The web UI for delugevpn is still not loading.

Some of my other docker containers do seem to be working, but I am using delugevpn as my litmus test. I am concerned that because it is not working, I may have other, as yet undetected problems.
ashman70 Posted February 10, 2016
What is the hardware in your system? Have you run unRAID on this hardware before, or is this the first time?
huladaddy Posted February 10, 2016
First time, new install of unRAID on this hardware.
ashman70 Posted February 10, 2016
What are the specs of your hardware?
huladaddy Posted February 10, 2016
3rd post above.
ashman70 Posted February 10, 2016
Sorry, I missed that. What BIOS are you running? Do you have PATA or IDE disabled in your BIOS? How many drives do you have connected to your motherboard? Is it possible to switch the port your cache drive is connected to? Have you considered redoing your unRAID install and running it bare, meaning no plugins or dockers, for a period of time to see if it's stable? Have you run memtest?
huladaddy Posted February 10, 2016
BIOS Information
Vendor: American Megatrends Inc.
Version: 2003
Release Date: 10/12/2009
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 1024 kB
Characteristics:
ISA is supported
PCI is supported
PNP is supported
APM is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported
BIOS boot specification is supported
Targeted content distribution is supported
BIOS Revision: 8.14

I have 5 SATA ports on my motherboard and am using 4 of them. I also have an 8 port PCIe SATA card, of which I am using all 8 ports, and where my two cache drives were attached.

Sure, I can move drives to any ports. I can also reinstall unRAID (yuck), could run it without plugins and dockers, and could try memtest. What do you suggest I do, and in what order?
ashman70 Posted February 10, 2016
Let's get some more details first. Is there a video card in your system, and if so, what model? What is the model of the 8 port PCIe SATA card you are using, and what slot is it in? The latest BIOS I can see appears to be 2701 from 2010. It might not be a bad idea to upgrade to that first and see if any of the stability issues go away. Also, maybe try starting out with one cache drive first and see how that goes.
huladaddy Posted February 10, 2016
Onboard video. IO Crest SATA III 8 Port Controller Card: Marvell 88SE9705 chipset. In the PCIe slot. I just rebooted with one cache drive disconnected, and the other one isn't even detected! BIOS upgrade first, I guess.
ashman70 Posted February 10, 2016
Do you have an array comprising all your drives right now? Make sure the IO Crest card is seated properly in its slot. I assume from what you've said that all the ports on the IO Crest card are populated. Make sure all the SATA cables are firmly seated at both ends.
huladaddy Posted February 11, 2016
Updated the BIOS. And yes, cable connections are one of those things that I automatically check. My array consists of 10 disks including parity. All the ports on the IO Crest card are populated. Just booted with my cache disks attached. Cache disk 1 is showing 34,177,702,359,547,252 writes and 0 errors...
huladaddy Posted February 11, 2016
OK. So I ran a short memtest: two full passes, no errors. Removed every plugin except preclear. Moved the cache drives from the PCIe SATA card to the connectors on the motherboard. After rebooting, I am still getting weird errors related to the cache disks:

BTRFS: error (device sdc1) in write_all_supers: 3498: errno=-5 IO failure (errors while submitting device barriers.)

And then the computer hangs again. So, is my disk bad? Could a bad disk really bring the machine to a screeching halt?

I've disconnected the cache disks for now, and all seems good. BUT, no matter what I do, I cannot get binhex's dockers to work again. I've done everything I can think of, and I just can't get the web UI to load. Any thoughts?
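For reference, the errno=-5 in that message is the kernel's way of printing error code 5 with a negative sign. A quick way to look up what a code means, sketched here with Python's standard `errno` and `os` modules (the helper name `describe_errno` is made up for illustration):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, description) for a kernel-style error code.

    Kernel messages print the code negated (errno=-5), so use the absolute value.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, text = describe_errno(-5)
print(name, "-", text)
```

Code 5 is EIO, a generic input/output error. In other words, btrfs is reporting that a write to the device failed underneath it, which usually points toward the drive, cabling, controller, or power rather than the filesystem itself, although the message alone cannot say which.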
ashman70 Posted February 11, 2016
Try removing this disk from your system altogether and see how it goes. AM
huladaddy Posted February 11, 2016
Oh, that's what I meant. Data cables disconnected.
This topic is now archived and is closed to further replies.