June 26, 2017 Hey all, I'm having a problem with my Unraid system: my shares keep vanishing. I am running 6.3.5 on a system with two Xeon CPUs and 32GB of RAM. Recently, I have noticed that my file shares vanish: they no longer show up on the network and are no longer listed on the Shares page. If I go to the command line and navigate to /mnt/user, I get the following error:
/bin/ls: reading directory '.': Input/output error
A reboot usually clears the problem, but it is happening more frequently. It has happened twice today, and it is proving to be a significant problem for this system. I see this in the system log:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 131768352 bytes) in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(383) : eval()'d code on line 73
Any ideas?
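For scale, the byte counts in that PHP error can be converted to MiB with plain shell arithmetic. This is only a sketch: `bytes_to_mib` is a hypothetical helper name, not an Unraid command, and it rounds down.

```shell
# Convert a byte count to whole MiB (integer division, rounds down).
# bytes_to_mib is a hypothetical helper, not part of Unraid.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# The limit and the attempted allocation from the error message:
bytes_to_mib 134217728   # PHP memory limit
bytes_to_mib 131768352   # size of the single allocation that failed
```

The limit works out to exactly 128 MiB, and the failed request alone is about 125 MiB, so the web page was trying to hold something nearly as large as the whole limit in one allocation.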
June 26, 2017 Author Okay, figured this out. One of my drives had a loose power cable, which was causing intermittent write failures. This then filled the log file with warnings, and the log partition was full. Rather disappointing that something as simple as that could bring the whole system down, though: that seems like something that should be caught and dealt with without these kinds of consequences.
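A full log partition like the one described can be checked from the command line. The following is a minimal sketch, assuming `/var/log` is its own mount point (as the post implies) and that the standard `df`, `du`, `sort`, and `awk` tools are available; the function names are hypothetical.

```shell
# Read `df -P` output on stdin and print the "Capacity" figure of the
# first filesystem as bare digits, e.g. "100".
usage_percent_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report how full the filesystem holding a path is.
# Assumption: /var/log is a separate mount, as the thread implies.
log_usage_percent() {
    df -P "${1:-/var/log}" | usage_percent_from_df
}

# List the largest entries under a directory, biggest first (sizes in KiB).
largest_logs() {
    du -ak "${1:-/var/log}" 2>/dev/null | sort -rn | head -n "${2:-5}"
}
```

Running `log_usage_percent` and then `largest_logs /var/log 5` should show whether the partition is full and which file is responsible.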
June 26, 2017 richardb said: Okay, figured this out. One of my drives had a loose power cable, which was causing intermittent write failures. This then filled the log file with warnings, and the log partition was full. Rather disappointing that something as simple as that could bring the whole system down, though: that seems like something that should be caught and dealt with without these kinds of consequences. The drive in question was probably the cache drive dropping offline. It is disappointing that losing that drive takes the system down, but that's because it's outside the array yet still part of the user share system.
June 30, 2017 Author Hmm, so this just happened again. This time I wasn't doing anything: the system was just running normally, then the shares vanished. The system kept running, but there were no shares available or listed on the Shares page. The last error in the syslog was:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 131874848 bytes) in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(383) : eval()'d code on line 73
Any ideas?
June 30, 2017 When this happens, go to the command prompt (local keyboard/monitor or telnet/SSH), run
diagnostics
and upload the resulting file. If that hangs, run this instead
cp /var/log/syslog /boot/syslog.txt
and upload the file.
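The two steps above can be combined into one function that falls back to copying the syslog when the collector fails or hangs. This is a sketch under assumptions: `collect_logs` is a hypothetical name, the 120-second limit is arbitrary, the live syslog is assumed to be at `/var/log/syslog`, and `timeout` from GNU coreutils is assumed to be installed.

```shell
# Run a collector command with a time limit; if it fails or hangs,
# copy the raw syslog to the destination instead.
# Defaults are assumptions: "diagnostics" is the command named above,
# /var/log/syslog is the assumed syslog path, 120 seconds is arbitrary.
collect_logs() {
    collector="${1:-diagnostics}"
    syslog="${2:-/var/log/syslog}"
    dest="${3:-/boot/syslog.txt}"
    limit="${4:-120}"
    if timeout "$limit" "$collector"; then
        echo "collector finished"
    else
        cp "$syslog" "$dest" && echo "copied syslog to $dest"
    fi
}
```

Calling `collect_logs` with no arguments uses the defaults; either way, the resulting file still has to be uploaded to the thread by hand.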
July 5, 2017 Author Okay, so here is the diagnostic log. This time the system stayed up, but the log file is showing 100%, and I suspect it will dump the shares shortly. The last message in the syslog was the same: "Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 131883040 bytes) in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(383) : eval()'d code on line 73" I think it has something to do with a parity check. The system was running one because it completely rebooted overnight for some reason. bernard-diagnostics-20170705-1406.zip
July 8, 2017 Author No response on this? Pity, as I have just realized that this problem has caused corruption and data loss. I guess I'll be dumping UnRaid and moving to something that doesn't have strange bugs that randomly dump a chunk of my data.
July 8, 2017 Not sure if you've looked in your logs or at the dashboard, but your logs are full of read errors on disk3: disk3: (sdc) WDC_WD40EZRX, ending in DT90. Whether unRAID could or should be doing a better job notwithstanding, your hardware isn't stable. Disk3 isn't giving a SMART report; check the cables. Edited July 8, 2017 by tdallen
July 8, 2017 Community Expert Besides the already mentioned disk3 problems, disk12 has filesystem corruption; you need to run xfs_repair on it. Strike that: disk12 is disabled, which you didn't mention. Since disk3 dropped offline, unRAID can't emulate disk12 anymore, hence the filesystem corruption and your missing data. You need to check whether disk3 is still readable so you can try to rebuild disk12.
Jul 5 11:44:21 Bernard kernel: XFS (md12): metadata I/O error: block 0xf9338bd0 ("xfs_trans_read_buf_map") error 5 numblks 32
Jul 5 11:44:21 Bernard kernel: XFS (md12): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Edited July 8, 2017 by johnnie.black
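Log lines like the two quoted above can be tallied per device to see which emulated or physical disks are reporting XFS metadata I/O errors. A minimal sketch, assuming the syslog is at `/var/log/syslog` and the messages have the same shape as the quoted ones; `xfs_error_counts` is a hypothetical name.

```shell
# Count "metadata I/O error" lines per md device in a syslog file.
# Output format: "<device> <count>", one device per line.
# Assumption: messages look like the "XFS (md12): metadata I/O error" line above.
xfs_error_counts() {
    grep -oE 'XFS \(md[0-9]+\): metadata I/O error' "${1:-/var/log/syslog}" \
        | sed -E 's/^XFS \((md[0-9]+)\).*/\1/' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

A device with a large count that is also disabled or being emulated, as disk12 is here, points at a problem on the disks needed to reconstruct it rather than at the filesystem alone.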
July 8, 2017 Author Okay, this is the thing that is frustrating me. I just rebooted the system, and disk 3 is fine. No errors, reading okay. I ran a SMART short self-test, and it reports no errors: "SMART overall-health self-assessment test result: PASSED". It does report that the drive is pre-fail on read error rates and is "old age" on others, but there is certainly no indication that it is failing completely as the log files suggest. Is it about ready to be swapped out? Perhaps, but it ain't dead yet. This is after a reboot and no changes in cables, etc. On disk 12: I have no idea. There are no other disks in the system except for a 1TB drive that is passed directly to a media center VM to write captured media files to. I haven't taken any disks out recently. It's a phantom. I guess I can try putting in another drive and rebuilding with that, but this is exactly the kind of stuff that is making me very nervous about this. I mean, disks failing I can understand. Disks failing, then seeming to be fine after a reboot, but another disk appearing? That scares me. Fortunately, everything is backed up in the cloud, and I am in the middle of replacing this system with a simpler one with fewer (bigger) drives running something other than UnRaid.
July 8, 2017 Community Expert You should run an extended SMART test on disk3, but it's on a SASLP. These are known to drop disks without reason for some users, and Marvell-based controllers are not recommended. I don't understand what you mean by disk12 being a phantom disk: it was a 4TB disk that is now being emulated and has about 2TB of data.
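An extended SMART test can also be started from the command line with smartmontools. The sketch below assumes `smartctl` is installed and that the disk is still `/dev/sdc` (the letter reported earlier in the thread, which may change after a reboot); the function names are hypothetical.

```shell
# Start an extended (long) self-test; it runs in the background on the drive.
start_extended_test() {
    smartctl -t long "$1"
}

# Show the self-test log once the test has had time to finish.
show_selftest_log() {
    smartctl -l selftest "$1"
}

# Extract the verdict from `smartctl -H` output supplied on stdin,
# e.g. "PASSED".
smart_verdict() {
    sed -n 's/^SMART overall-health self-assessment test result: //p'
}
```

For example, `start_extended_test /dev/sdc`, then later `show_selftest_log /dev/sdc` and `smartctl -H /dev/sdc | smart_verdict`. An extended test on a 4TB drive typically takes several hours, and a PASSED verdict does not rule out cable or controller problems.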
July 9, 2017 Author Okay, did more digging. There is (or was) a disk 12 that had failed completely: it wasn't even powering up. I have replaced it, but when I now try to rebuild, it gets about 10% of the way through, then disk 3 starts giving errors constantly and stops responding. So I can't rebuild the array as is, and I seem to have one disk that fails intermittently when I try. Is there any hope here, or is it time to nuke the whole thing and start again?
July 9, 2017 Can you move Disk3 off the SASLP? It's possible (though unlikely) that you could get better results on another SATA port.
July 9, 2017 Community Expert The onboard ports are all free. Connect disk3 to one of them (the Intel ports; don't use the white Marvell ports), replace disk3's cables (or swap the backplane) just to rule them out, and try again to rebuild disk12. If disk3 fails again, you have two problem disks and are beyond unRAID's redundancy with single parity. In that case, do a new config; you can keep the data on the good disks.
This topic is now archived and is closed to further replies.