August 29, 2014
Hi,
Since the Beta 7 release, I updated my flash drive, changed my default boot entry to the default one (I was previously booting the Xen kernel), and booted into Beta 7. I then started migrating my array to the XFS filesystem. My issue is that for the last 3-4 days, my server has been crashing randomly at least twice a day. I don't know if it's because I'm copying massive amounts of data at the same time. I always run rsync in 3 different screen sessions at once, and I'm moving 3 x 1.5TB to new drives freshly formatted as XFS. Attached is a screenshot from my IPMI console redirection showing the error. It's always the same crash, and it has only happened since Beta 7. I NEVER had a server crash before, since I built it in April. I always ran on Beta 6, never on 5.x unRAID. Any help, ideas, etc.?
August 30, 2014
I've had two hard crashes on 6b7 while rsyncing 1.2 TB of data from a reiserfs array volume to an XFS array volume. Both happened after about 120 GB of data transfer; I have 4 GB of RAM. I got a screenshot the first time, but not the second (see attached). Funny thing is that both times a power-button reboot failed to mount the drives. I had to log in and use powerdown -r to restart, apparently normally. I scrolled through my logs but don't see a copy of the log from either crash in /boot/logs. I've restarted the rsync and will see if it happens again.
Dennis
August 30, 2014 (Author)
I have 32GB of ECC RAM. I suspect that there is something wrong if we both hit the same thing while doing the same operation! I'm currently copying again. I have about 750GB x 3 done, with about 800GB x 3 remaining. I'll check the status tomorrow morning. I forgot to mention that before the last crash, the WebUI reported that one of the disks had 13 errors, but it wasn't one of the drives that was being rsynced at the time. After the reboot, the errors were gone. When the transfer is complete, I'll start a full parity check to see.
August 30, 2014
Quote: "I forgot to mention that before the last crash, the WebUI reported that one of the disks had 13 errors, but it wasn't one of the drives that was being rsynced at the time. After the reboot, the errors were gone. When the transfer is complete, I'll start a full parity check to see."
The error column resets on reboot. It's a tally of how many times the drive failed a read operation but a subsequent write of the requested data, reconstructed from parity, back to the same drive succeeded. I'm not entirely sure whether the data is read again after the write operation, but probably not, since it would be in the drive's cache anyway. A full non-correcting check would be a wise thing to do. Also get SMART reports on your drives and look them over for pending sectors and reallocated sectors.
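For the SMART check suggested above, here is a minimal sketch of one way to pull out just the pending and reallocated sector counts. It assumes smartmontools' `smartctl -A` attribute table, where the second column is the attribute name and the tenth is the raw value. The function name `smart_sector_counts` and the device path are made up for illustration, and attribute names can differ between drive vendors.

```shell
#!/bin/bash
# Print the raw values of the two SMART attributes mentioned in the post.
# Reads `smartctl -A` output on stdin. Assumption: the standard attribute
# table layout, with the name in column 2 and the raw value in column 10.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
}

# Example (the device path is a placeholder, substitute your own drive):
#   smartctl -A /dev/sdb | smart_sector_counts
```

A non-zero raw value for either attribute is usually worth a closer look at the full report before trusting the drive with more data.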
August 30, 2014
Quote: "I suspect that there is something wrong if we both had the same thing while doing the same operation!"
My first impression was that the two faults are different, since the backtraces are significantly different, but I guess one possible reason for NMI code to run for more than three minutes is that it gets into a recursive loop. Having said that, it would be worthwhile for both of you to run an extended memtest.
August 30, 2014
This is the screenshot of my third crash, again after about 120 GB of data transfer on 6b7 from a reiserfs to an XFS volume (/mnt/disk# > /mnt/disk#).
Problem: Crash on rsync copy between share drives.
Expected: Copy 1.2 TB successfully.
Result: Abrupt system shutdown after about 120 GB (about 3-4 hours); console freeze and no array or ssh.
System: ASRock MB running 8 drives on 6b7; plain unRAID + docker (media apps); APCUPSD and cache_dirs running (both latest updates) + screens for the console. Parity check not running. Array mounted with minimal use.
Edit: I ran a memtest and no errors were detected in two cycles. I have 4 GB.
Note: I previously said that a power-button reboot wouldn't mount the array; it seems that it will, but it is slower than usual (maybe fsck?).
I'm a couple of hours from the next crash and will post that screen too. There seem to be similarities between my two crashes but differences from Pducharme's.
Dennis
August 31, 2014 (Author)
Your last crash is similar to mine. I don't think it's the RAM: when I started this new unRAID project in April, I copied 10TB to the new server (from a QNAP NAS) without any crash.
September 2, 2014
Over the weekend I had three more hard crashes while copying from a reiserfs-formatted array drive to a new XFS-formatted drive via an rsync -avh /mnt/disk1/* /mnt/disk2/ command. The screenshots of the console are below; there are a lot of similarities.
Things that don't work:
- Booting unRAID into safe mode (I tried it for the last two crashes, but it still crashed).
- Turning parity check off (or on) at the same time.
Things that might work:
- Escaping out of rsync and then restarting it every hour or so.
Things that make it crash later or sooner:
- Copying mainly 1GB media files: crashes after 2-3 hours, at about 120GB.
- Copying tiny Plex metadata files: crashed after 20 minutes.
I did check my memory, and two passes show no errors. Twice after a crash the array failed to mount and showed 'drive not found' (the reiserfs source volume), but it reappeared after a graceful reboot. I'm going to try rsync between two reiserfs drives next (my migration to the new XFS drive is finally nearly done).
Edit: Actually I'm going to try the same rsync copy on 6b8.
Thanks
Dennis
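The "restart rsync every hour or so" idea above could be automated along these lines. This is only a sketch of that untested workaround, not a fix for the crash. It assumes GNU coreutils `timeout` is available (it exits with status 124 when the time limit is hit); the function name `run_in_slices` is made up for illustration. rsync with -a skips files that already match at the destination, so each restart should pick up roughly where the last one stopped.

```shell
#!/bin/bash
# Run a command in fixed-length time slices, restarting it whenever a
# slice expires, until the command exits on its own.
# Assumption: GNU coreutils `timeout`, which exits with 124 on expiry.
run_in_slices() {
    local slice="$1"
    shift
    local status
    while true; do
        status=0
        timeout "$slice" "$@" || status=$?
        # Any status other than 124 means the command finished (or failed)
        # by itself, so stop looping and hand that status to the caller.
        if [ "$status" -ne 124 ]; then
            return "$status"
        fi
        echo "slice of $slice expired, restarting: $*" >&2
    done
}

# Example, using the rsync command from the post with a one-hour slice:
#   run_in_slices 1h rsync -avh /mnt/disk1/* /mnt/disk2/
```

A file that was mid-transfer when the slice expired is copied again from the start on the next pass, which costs little with many small files but can waste time with very large ones.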
September 2, 2014
Fourth crash (actually, there have been about 8 while transferring 1.2TB, but these are the ones I have console pics of). Done for now. This is the one that crashed rapidly while copying myriad Plex thumbs.
D
September 8, 2014
I had a crash today on version 6b7, with a similar screen to Dennis's. I did not get a screen capture, sorry. I was doing an rsync from a reiserfs drive (disk1) to an empty XFS drive (disk9) as part of my conversion from reiserfs to XFS. At the same time I was running the mover to move files from the cache to a different XFS drive (disk6) when it crashed. I did a quick Google search and got this link in the 5.x forums, http://lime-technology.com/forum/index.php?topic=32563.0, which talks about this error on version 6b5. Maybe a moderator can move that thread into this forum. So I came into this forum and found this thread, which talks a little further about the error. Right now I am running a Memtest (SMP) to see if I have any RAM issues. I will update this post when it is complete.
Update: Memory test complete, no errors. Restarting unRAID.
September 9, 2014
I chased down my issue and related the story in another thread: http://lime-technology.com/forum/index.php?topic=35043.0
In essence there is a bug in 6b7 and 6b8, in the unraid.c driver, that affects copies with high disk use (maybe only to XFS volumes):
Sep 5 00:57:04 Tower kernel: kernel BUG at drivers/md/unraid.c:461!
Sep 5 00:57:04 Tower kernel: invalid opcode: 0000 [#1] SMP
Mine still reproducibly crashes when I rsync almost anything over 10-20 GB. Tom says it's on his radar. I only recently learned to ssh in from another computer and leave a tail -f /var/log/syslog process running to capture terminal events (ha ha, I didn't mean that as a double entendre). In about half the crashes, the syslog capture highlighted the issue with the word "BUG".
Dennis
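A rough sketch of the capture approach described above: stream the server's syslog to a file on a second machine, then search that file after a crash. The function names, the host `root@tower`, and the log file name are placeholders rather than anything taken from the posts; the `kernel BUG at` pattern comes from the log lines quoted above.

```shell
#!/bin/bash
# Stream the remote syslog into a local file so the last lines written
# before a hard crash survive on another machine.
# The host and output file are supplied by the caller (placeholders below).
capture_syslog() {
    local host="$1" outfile="$2"
    ssh "$host" 'tail -f /var/log/syslog' | tee -a "$outfile"
}

# Print any kernel BUG lines found in a captured log file.
find_kernel_bugs() {
    grep 'kernel BUG at' "$1"
}

# Example:
#   capture_syslog root@tower tower-syslog.log   # leave running in one terminal
#   find_kernel_bugs tower-syslog.log            # run after the crash
```

Because the copy lives on the other machine, it is still readable when the server's console is frozen and /boot/logs has nothing from the crash.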
October 2, 2014
Going to mark this solved now since it was part of the other issue that was also solved. Thanks!