dmacias Posted June 17, 2016 Share Posted June 17, 2016

Quoted: "* Several MCEs (machine check events) were noted, no apparent cause..... The fact there were MCEs makes it hard to say whether the issues were local hardware related, a normal support issue, or actually due to a defect in the beta. Since I know myself and others have had MCE issues in the past (with memtest usually not finding an issue), I was curious if LT might consider adding mcelog from http://mcelog.org/index.html to the unRAID betas? I may be mistaken, but from what I've read it seems to be the only way to ascertain what exactly caused an MCE log event (even if ultimately benign)."

Quoted: "That's a great idea, and I agree. If it's not too large, I hope LimeTech will consider adding mcelog and running it in the recommended daemon mode. I'm not sure it's the best way, but you might also use the --logfile option for persistence and force the logging to /boot (I don't know how chatty this is, though). Without this, we really don't have any tools for solving users' MCE issues. Plus, in some cases it can actually sideline faulty memory and processes, and possibly apply other live fixes, allowing continued operation and better troubleshooting."

I could add mcelog to the NerdPack 6.2 repo.
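For reference, the daemon-mode invocation being discussed might look like the sketch below. The --daemon and --logfile options come from mcelog's documentation; the /boot/logs path is hypothetical, chosen only because files under /boot survive a reboot on unRAID:

```python
import shlex

def mcelog_daemon_cmd(logfile="/boot/logs/mcelog.log"):
    """Build the mcelog daemon invocation discussed above.

    --daemon and --logfile are documented mcelog options; the
    /boot/logs path is hypothetical, not a tested default.
    """
    return ["mcelog", "--daemon", f"--logfile={logfile}"]

# On a real system this would be handed to subprocess.Popen or an rc script.
print(shlex.join(mcelog_daemon_cmd()))
```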
Squid Posted June 17, 2016 Share Posted June 17, 2016 (quoting the mcelog request and replies above) +1
fcaico Posted June 17, 2016 Share Posted June 17, 2016 (quoting dmacias's offer to add mcelog to the NerdPack 6.2 repo) Not familiar with NerdPack... Is that a plugin?
RobJ Posted June 17, 2016 Share Posted June 17, 2016 (quoting dmacias: "I could add mcelog to the NerdPack 6.2 repo.") That's a good half-step! But I don't know if you've examined it; it appears to need some special configuration and installation. Plus, run this way, it won't be as effective as installing it as a daemon. At least that's my understanding from my reading, but I've never used it!
dmacias Posted June 17, 2016 Share Posted June 17, 2016 Yes. It's a collection of extra Slackware packages, like Python and Perl. It's stickied in the plugins section.
dmacias Posted June 17, 2016 Share Posted June 17, 2016 (quoting RobJ's reply above) I installed the package and ran 'mcelog --daemon', but that's about it. I don't believe my CPU is supported, but it ran without error.

Edit: Actually, this is what the system log said:

Jun 17 09:17:47 server mcelog: failed to prefill DIMM database from DMI data
Jun 17 09:17:47 server mcelog: Kernel does not support page offline interface
RobJ Posted June 17, 2016 Share Posted June 17, 2016 (quoting dmacias's log output above) I'm hoping the 'page offline interface' support is just a config switch that Tom can enable. But even without it, it can help explain current MCEs. And we just happen to have a handy tester here, fcaico, once it's NerdTool'd!
dmacias Posted June 17, 2016 Share Posted June 17, 2016 I added mcelog to the 6.2 repo.
ljm42 Posted June 17, 2016 Share Posted June 17, 2016 (quoting the mcelog log lines above) Looks like this is a harmless error message: http://mcelog.org/faq.html#11 (quoting dmacias: "I added mcelog to the 6.2 repo.") Any reason not to add it to the 6.1 repo?
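Given that the FAQ linked above calls those two startup messages harmless, a small triage script can filter them out so that any real machine check decode output stands out in the syslog. This is an illustrative sketch, not part of mcelog or unRAID:

```python
# Sketch: separate known-harmless mcelog startup messages (per the FAQ
# linked above) from lines that may describe real machine check events.
HARMLESS = (
    "failed to prefill DIMM database from DMI data",
    "Kernel does not support page offline interface",
)

def triage_mcelog_lines(syslog_text):
    """Split mcelog syslog lines into (known-noise, worth-a-look)."""
    noise, interesting = [], []
    for line in syslog_text.splitlines():
        if "mcelog:" not in line:
            continue
        (noise if any(h in line for h in HARMLESS) else interesting).append(line)
    return noise, interesting

sample = (
    "Jun 17 09:17:47 server mcelog: failed to prefill DIMM database from DMI data\n"
    "Jun 17 09:17:47 server mcelog: Kernel does not support page offline interface\n"
    "Jun 17 09:20:01 server mcelog: Hardware event. This is not a software error.\n"
)
noise, interesting = triage_mcelog_lines(sample)
print(len(noise), "harmless,", len(interesting), "worth a look")
```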
dmacias Posted June 17, 2016 Share Posted June 17, 2016 Let's move to the NerdPack thread for further discussion.
bonienl Posted June 18, 2016 Share Posted June 18, 2016

Quoted: "Small issue, when bonding NICs the bond IP should be the one reported on the main page."

Can you post the content of /boot/config/network.cfg and /var/local/emhttp/network.ini? It looks like the system initially didn't get an IP address assigned, hence the 169.254.x.x address, but received a valid IP address at a later stage.
JorgeB Posted June 18, 2016 Share Posted June 18, 2016

/boot/config/network.cfg
# Generated settings:
IFNAME[0]="bond0"
BONDNAME[0]="bond0"
BONDING_MIIMON[0]="100"
BONDING_MODE[0]="4"
BONDNICS[0]="eth0 eth1"
DESCRIPTION[0]=""
USE_DHCP[0]="yes"
DHCP_KEEPRESOLV="no"
MTU[0]=""
SYSNICS="1"

/var/local/emhttp/network.ini
[eth0]
BONDING="yes"
BONDNAME="bond0"
BONDNICS="eth0,eth1"
BONDING_MODE="4"
BONDING_MIIMON="100"
DHCP_KEEPRESOLV="no"
BRIDGING="no"
BRNAME="br0"
BRNICS=""
BRSTP="no"
BRFD="0"
DESCRIPTION:0=""
USE_DHCP:0="yes"
IPADDR:0="169.254.126.42"
NETMASK:0="255.255.0.0"
GATEWAY=""
MTU=""
TYPE="access"
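The 169.254.x.x address in network.ini is the telltale sign here: that range is reserved for link-local (APIPA) addresses, assigned when DHCP never answered. A small script can flag it when triaging these files; this is an illustrative sketch, not part of emhttp:

```python
import re

# APIPA/link-local addresses (169.254.0.0/16) mean DHCP never answered.
LINK_LOCAL = re.compile(r"^169\.254\.")

def parse_ini_values(text):
    """Pull KEY="value" pairs out of emhttp-style .cfg/.ini text."""
    return dict(re.findall(r'([A-Z_]+(?::\d+|\[\d+\])?)="([^"]*)"', text))

def check_network_ini(text):
    """Report whether the recorded bond address looks link-local."""
    values = parse_ini_values(text)
    addr = values.get("IPADDR:0", "")
    if LINK_LOCAL.match(addr):
        return f"{addr}: link-local (no DHCP lease at boot?)"
    return f"{addr}: looks routable"

sample = 'USE_DHCP:0="yes"\nIPADDR:0="169.254.126.42"\nNETMASK:0="255.255.0.0"'
print(check_network_ini(sample))  # flags the link-local address
```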
bonienl Posted June 18, 2016 Share Posted June 18, 2016 @johnnie.black - thx. Can you also show the output of ifconfig? I'm trying to find the mismatch.
JorgeB Posted June 18, 2016 Share Posted June 18, 2016

root@Test:~# ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 192.168.1.99  netmask 255.255.255.0  broadcast 192.168.1.255
        ether 00:25:90:7b:ee:a3  txqueuelen 1000  (Ethernet)
        RX packets 1010  bytes 1340553 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 696  bytes 55179 (53.8 KiB)
        TX errors 0  dropped 1  overruns 0  carrier 0  collisions 0

eth0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 00:25:90:7b:ee:a3  txqueuelen 1000  (Ethernet)
        RX packets 953  bytes 1333406 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 57  bytes 10626 (10.3 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 20  memory 0xdfc00000-dfc20000

eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 00:25:90:7b:ee:a3  txqueuelen 1000  (Ethernet)
        RX packets 57  bytes 7147 (6.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 639  bytes 44553 (43.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xdfb00000-dfb20000

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.255.255.255
        loop  txqueuelen 1  (Local Loopback)
        RX packets 98  bytes 6924 (6.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 98  bytes 6924 (6.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
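The mismatch bonienl is hunting for is visible here: the live bond0 interface carries 192.168.1.99 while network.ini recorded 169.254.126.42. Comparing the two mechanically is straightforward; this sketch (illustrative only) pulls the per-interface IPv4 addresses out of ifconfig-style output:

```python
import re

def inet_addresses(ifconfig_text):
    """Map interface name -> IPv4 address from `ifconfig` output."""
    addrs = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"^(\w+):", line)
        if header:
            current = header.group(1)
        inet = re.search(r"inet (\d+\.\d+\.\d+\.\d+)", line)
        if inet and current:
            addrs[current] = inet.group(1)
    return addrs

sample = (
    "bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500\n"
    "        inet 192.168.1.99  netmask 255.255.255.0\n"
    "lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536\n"
    "        inet 127.0.0.1  netmask 255.255.255.255\n"
)
live = inet_addresses(sample)
recorded = "169.254.126.42"  # IPADDR:0 from network.ini above
status = "mismatch" if live["bond0"] != recorded else "match"
print(live["bond0"], "vs recorded", recorded, "->", status)
```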
bonienl Posted June 18, 2016 Share Posted June 18, 2016 (quoting the ifconfig output above) Thx, I could replicate the "issue" and have made a correction.
jbartlett Posted June 19, 2016 Share Posted June 19, 2016 Found my b23 backup server unresponsive. Could not telnet in (connection refused); screen dark/no signal when I connected a monitor (might not have been initialized by the motherboard). No way to get a diagnostic. The only thing it was doing was sitting idle and running pre-clears on two 6TB drives. One pre-clear had finished (1 cycle); the other was in progress (3 cycles). I know, a frustrating kind of report, but I wanted to mention it for posterity's sake. Booted with monitor attached, logged in, and started a tail on the syslog via console.
jbartlett Posted June 20, 2016 Share Posted June 20, 2016 (following up on the unresponsive-server report above)

Happened again towards the end of a preclear via the preclear plugin. Couldn't telnet in & website dead. I had the tail log via web admin running and captured this; the 3rd line is suspicious. I was able to maintain a login via console, but I was unable to obtain a diagnostic file via the command line, as it never seemed to finish and looked like it started a tail of the syslog to the console.

Jun 19 19:10:13 NASBackup kernel: general protection fault: 0000 [#1] PREEMPT SMP
Jun 19 19:10:13 NASBackup kernel: Modules linked in: md_mod hid_logitech_hidpp hid_logitech_dj kvm_amd kvm k10temp r8169 mii i2c_piix4 ahci libahci sata_mv wmi acpi_cpufreq [last unloaded: md_mod]
Jun 19 19:10:13 NASBackup kernel: CPU: 3 PID: 27246 Comm: preclear_disk.s Not tainted 4.4.13-unRAID #1
Jun 19 19:10:13 NASBackup kernel: Hardware name: System manufacturer System Product Name/F2A85-V PRO, BIOS 6104 05/08/2013
Jun 19 19:10:13 NASBackup kernel: task: ffff88039ccdbc00 ti: ffff8803ad79c000 task.ti: ffff8803ad79c000
Jun 19 19:10:13 NASBackup kernel: RIP: 0010:[<ffffffff810d55d1>] [<ffffffff810d55d1>] anon_vma_interval_tree_remove+0x139/0x200
Jun 19 19:10:13 NASBackup kernel: RSP: 0018:ffff8803ad79fc20 EFLAGS: 00010246
Jun 19 19:10:13 NASBackup kernel: RAX: ffff8804294a5720 RBX: ffff8802568fabe8 RCX: ffff8804294a5720
Jun 19 19:10:13 NASBackup kernel: RDX: ffff8804294a5720 RSI: ffff8803f21538b0 RDI: ffff88020bc9d340
Jun 19 19:10:13 NASBackup kernel: RBP: ffff8803ad79fc60 R08: 0020000000000000 R09: 0000000000000000
Jun 19 19:10:13 NASBackup kernel: R10: ffff88020bc9d360 R11: 0000000000018c50 R12: ffff8802568fab80
Jun 19 19:10:13 NASBackup kernel: R13: ffff8803f2153870 R14: ffff8803f2153870 R15: ffff88020bc9d340
Jun 19 19:10:13 NASBackup kernel: FS: 00002b12edf2c600(0000) GS:ffff88043ed80000(0000) knlGS:0000000000000000
Jun 19 19:10:13 NASBackup kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 19 19:10:13 NASBackup kernel: CR2: 00000000006f3f58 CR3: 00000002e00e7000 CR4: 00000000000406e0
Jun 19 19:10:13 NASBackup kernel: Stack:
Jun 19 19:10:13 NASBackup kernel: ffffffff810e42e0 0000000001490000 ffff8802568fabf8 ffff8802568fa2e0
Jun 19 19:10:13 NASBackup kernel: ffff8802568fab80 0000000000000000 00002b12edf05000 ffff8802568fab80
Jun 19 19:10:13 NASBackup kernel: ffff8803ad79fca8 ffffffff810d8c03 0000000000000000 ffff8803ad79fcb8
Jun 19 19:10:13 NASBackup kernel: Call Trace:
Jun 19 19:10:13 NASBackup kernel: [<ffffffff810e42e0>] ? unlink_anon_vmas+0x94/0x189
Jun 19 19:10:13 NASBackup kernel: [<ffffffff810d8c03>] free_pgtables+0x35/0x97
Jun 19 19:10:13 NASBackup kernel: [<ffffffff810e0294>] exit_mmap+0x81/0x106
Jun 19 19:10:13 NASBackup kernel: [<ffffffff810480f4>] mmput+0x48/0xd9
Jun 19 19:10:13 NASBackup kernel: [<ffffffff8110ebd4>] flush_old_exec+0x70f/0x77f
Jun 19 19:10:13 NASBackup kernel: [<ffffffff811442f5>] load_elf_binary+0x25c/0x13a1
Jun 19 19:10:13 NASBackup kernel: [<ffffffff810d7665>] ? get_user_pages+0x3d/0x3f
Jun 19 19:10:13 NASBackup kernel: [<ffffffff8110dfa2>] search_binary_handler+0x74/0x17d
Jun 19 19:10:13 NASBackup kernel: [<ffffffff8110f727>] do_execveat_common+0x441/0x5a8
Jun 19 19:10:13 NASBackup kernel: [<ffffffff8110f8a8>] do_execve+0x1a/0x1c
Jun 19 19:10:13 NASBackup kernel: [<ffffffff8110faac>] SyS_execve+0x25/0x29
Jun 19 19:10:13 NASBackup kernel: [<ffffffff81622f05>] stub_execve+0x5/0x5
Jun 19 19:10:13 NASBackup kernel: [<ffffffff81622c6e>] ? entry_SYSCALL_64_fastpath+0x12/0x6d
Jun 19 19:10:13 NASBackup kernel: Code: 4d 85 db 74 0b 4d 8b 5b 18 4d 39 d8 4d 0f 42 c3 4c 39 41 18 74 0d 4c 89 41 18 48 8b 09 48 83 e1 fc eb a5 4c 8b 47 30 4c 89 40 10 <49> 8b 08 83 e1 01 48 09 c1 49 89 08 48 8b 4f 20 48 89 cf 48 83
Jun 19 19:10:13 NASBackup kernel: RIP [<ffffffff810d55d1>] anon_vma_interval_tree_remove+0x139/0x200
Jun 19 19:10:13 NASBackup kernel: RSP <ffff8803ad79fc20>
Jun 19 19:10:13 NASBackup kernel: ---[ end trace 27214fb15b5499ba ]---

I am unable to duplicate with plugins disabled since the preclear plugin isn't loaded in that state. Being my backup server, I do not have Parity or Cache drives configured currently, which may have something to do with the following line being sent to the syslog with every page load:

Jun 19 20:26:50 NASBackup shfs/user: shfs_mkdir: assign_disk: system (123) No medium found

ETA: Diagnostic file from a normal boot without safe mode nasbackup-diagnostics-20160619-2027.zip
bonzi Posted June 20, 2016 Share Posted June 20, 2016 Somehow this update managed to wipe out my disks.cfg file... pretty uncool. Luckily I had a screenshot, but...
trurl Posted June 20, 2016 Share Posted June 20, 2016 (quoting bonzi's report above) Your drive assignments are actually stored in super.dat; disks.cfg just has some additional GUI settings for each disk. Most likely a problem with your flash drive rather than with the update, but since you haven't posted diagnostics as requested in the OP, I guess we'll never know.
bonzi Posted June 20, 2016 Share Posted June 20, 2016 Right, that's what I assumed initially too, but I checked the flash drive and it is fine.
RobJ Posted June 20, 2016 Share Posted June 20, 2016 (quoting trurl: "Your drive assignments are actually stored in super.dat ...") A comment for Tom or Eric: there's been a change in how super.dat is modified, sometime recently, so probably part of 6.2 development. My guess is that pre-6.2, super.dat was modified by seeking then reading or writing in place, whereas now it is modified by clearing the file and then writing out the whole file. The problem is that there is now a window of opportunity that wasn't there before: we have already seen 3 to 5 cases where the super.dat file exists but is zero bytes (a loss of all array assignments). I don't remember that ever happening before this. It's my impression that each of the cases is different, which implies it's the general super.dat update routine that has changed, and that it clears the file to zero bytes well before it is rewritten, a window of time that's vulnerable to a power outage or system crash. Hopefully that can be improved, either by never clearing the file or by clearing it immediately before the write.
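The standard fix for this class of bug is to write the new contents to a temporary file in the same directory and then atomically rename it over the old one, so a crash at any point leaves either the complete old file or the complete new file, never a zero-byte one. A minimal sketch of the pattern (illustrative only; unRAID's emhttp is not written in Python, and the filename is just a stand-in):

```python
import os
import tempfile

def atomic_write(path, data: bytes):
    """Replace `path` with `data` without ever exposing a truncated file."""
    dirname = os.path.dirname(path) or "."
    # Write to a temp file in the same directory (rename must not cross filesystems).
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the data durable before the rename
        os.replace(tmp, path)     # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

atomic_write("super.dat.demo", b"array assignments")  # stand-in filename
print(open("super.dat.demo", "rb").read())
```

Note that a VFAT flash drive weakens the rename-atomicity guarantee somewhat, but even there this pattern is far safer than truncate-then-rewrite.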
dmacias Posted June 20, 2016 Share Posted June 20, 2016 I don't think I made it, but I have a super.old in /boot/config.
RobJ Posted June 20, 2016 Share Posted June 20, 2016 (quoting jbartlett's general protection fault report above)

It's not clear to me whether the GPF occurred during Preclear execution or during VM memory management (notice the vma functions mentioned). No other ideas. You might check for a newer BIOS (yours is mid-2013), though that's probably not the solution.

(quoting: "Being my backup server, I do not have Parity or Cache drives configured currently, which may have something to do with the following line being sent to the syslog with every page load: Jun 19 20:26:50 NASBackup shfs/user: shfs_mkdir: assign_disk: system (123) No medium found")

I noticed that 4 of your share config files indicate "Cache: only", so it could be the lack of a Cache drive. If it were the lack of a Parity drive, I think I would have seen that message before. Either way, it's probably a harmless message.
Squid Posted June 20, 2016 Share Posted June 20, 2016 (quoting RobJ's reply above) The release notes (for beta 18?) state that if you don't have a cache drive you have to manually make the system share. Shouldn't be a problem if not running VMs.
John_M Posted June 20, 2016 Share Posted June 20, 2016 A pretty trivial observation regarding the Dashboard page of the webGUI, but there doesn't seem to be much else to complain about in this release. The two rightmost columns in the Load Statistics pane are unbalanced in width, giving odd-numbered CPUs a much longer Per CPU Load scale than even-numbered ones. Also affected is the flash : log : docker percentage-used display.