trurl Posted January 21, 2022
Sometimes the flash drive can disconnect or become read-only due to corruption, but often there will be other symptoms of these problems. Booting from a USB2 port can be more reliable. There is a timeout after which the server will go ahead and shut down or reboot even if the array hasn't stopped. This timeout can be adjusted in Disk Settings. Instead of shutting down or rebooting, stop the array, see how long that takes, and adjust the timeout accordingly. All of this has already been discussed in this thread.
KRiSX Posted February 9, 2022
Hey all, I had an unclean shutdown yesterday due to a power cut. The UPS options either didn't work or weren't configured correctly; I'm not sure yet and will be testing and fixing that soon. Upon powering the server back on I of course got an unclean shutdown warning and a parity check, which resulted in 116 errors. I just want to check whether I need to do anything else in this situation. Looking at the syslog I see "Parity Check Tuning: automatic Non-Correcting Parity Check finished (116 errors)". Should I run the parity check again with "Write corrections to parity" enabled, or will this be fine? I didn't select or do anything to influence the parity check and everything seems to be working. I'm just in the middle of my transfer from DrivePool, so I want to be sure I'm good to keep going with something like this occurring.
itimpi Posted February 9, 2022
It is normal to have a small number of errors after an unclean shutdown. They nearly always occur very near the beginning of the check. You need to run a correcting check to get those errors fixed, or you risk data corruption if a disk fails and needs recovering. The correcting check should report the same number of errors, but this time it will be fixing them. Subsequent checks should then report 0 errors (assuming no further unclean shutdowns).
KRiSX Posted February 9, 2022 (edited)
11 minutes ago, itimpi said: You need to run a correcting check to get those errors fixed or you risk data corruption if a disk fails and needs recovering. The correcting check should report the same number of errors, but this time it will be fixing them. Subsequent checks should then report 0 errors (assuming no further unclean shutdowns).

OK thanks, I'll trigger that off now before I move more data over. Am I correct in saying all I need to do is hit the Check button on Main with the write corrections box ticked, or is there more to it? Is there a way to make it do this by default in the event this happens again, so I don't have to tie up my disks for 16+ hours twice?
itimpi Posted February 9, 2022
3 minutes ago, KRiSX said: am I correct in saying all I need to do is hit the Check button on Main with the write corrections box ticked

This is all you need to do.
JonathanM Posted February 10, 2022
1 hour ago, KRiSX said: is there a way to make it do this by default in the event this happens again? so i don't have to tie up my disks for 16+ hours twice

There are good reasons to not automatically change data without knowing why. If you have a data drive acting up, the last thing you want to do is blindly write to the parity drive based on what is possibly bad data. Also, bad RAM can cause random parity check errors, so if you run 2 non-correcting checks in a row and get different results, you REALLY need to get to the bottom of it before writing ANY data to the server. If a parity check finds errors, you need to fix what caused the errors before committing the changes. In your case, having an unclean shutdown is a good reason for a small number of parity errors, so you should be safe to just correct them, instead of doing a second non-correcting check to verify the results aren't changing.
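The "two non-correcting checks in a row" comparison above can be sketched in a few lines of Python. This is only an illustration: the pattern is modeled on the single syslog line quoted earlier in the thread, the function names are made up here, and other Unraid versions or plugins may word the message differently.

```python
import re

# Assumption: the pattern is modeled on one syslog line quoted in this thread
# ("... Non-Correcting Parity Check finished (116 errors)").
FINISHED = re.compile(r"Parity Check finished \((\d+) errors?\)")

def error_counts(syslog_text):
    """Return the error count of every finished parity check, in log order."""
    return [int(m.group(1)) for m in FINISHED.finditer(syslog_text)]

def checks_agree(counts):
    """True when the last two checks reported the same number of errors."""
    return len(counts) >= 2 and counts[-1] == counts[-2]
```

Matching counts are only a rough signal, since the same number of errors could still come from different sectors. Differing counts are the case described above that needs investigating before anything is written.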
KRiSX Posted February 10, 2022
6 hours ago, JonathanM said: There are good reasons to not automatically change data without knowing why. If you have a data drive acting up, the last thing you want to do is blindly write to the parity drive based on what is possibly bad data. Also, bad RAM can cause random parity check errors, so if you run 2 non-correcting checks in a row and get different results, you REALLY need to get to the bottom of it before writing ANY data to the server. If a parity check finds errors, you need to fix what caused the errors before committing the changes. In your case, having an unclean shutdown is a good reason for a small number of parity errors, so you should be safe to just correct them, instead of doing a second non-correcting check to verify the results aren't changing.

Fair points, too many variables to assume it's just safe to go ahead... Oh well, I'm 8 hours in with another 8 hours to go on the correcting check. Will the logs show which files, if any, are affected, or is it a case of "if the files are there then you're all good"? I'm assuming the latter, and right now anything on here could be re-obtained without any hassle. I just want my system healthy overall.
trurl Posted February 10, 2022
6 hours ago, KRiSX said: will logs show what/if any files are affected

A correcting parity check won't affect any files. Only the parity disk is written, and parity contains none of your data. The reason you don't want to corrupt parity is so you can rebuild a data disk accurately. If it corrects the small number of sync errors you mentioned earlier, then that is the expected result. After that, a non-correcting parity check should return zero errors, which is the only acceptable result.
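To illustrate why parity "contains none of your data" yet still matters for a rebuild, here is a minimal Python sketch. It assumes single parity behaves like a plain byte-wise XOR across the data disks; the three tiny "disks" and all names are invented for the example.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, three bytes each.
disk1 = b"\x01\x02\x03"
disk2 = b"\x10\x20\x30"
disk3 = b"\xaa\xbb\xcc"

# Parity is derived from the data disks; no file is stored on it.
parity = xor_parity([disk1, disk2, disk3])

# If disk2 fails, XOR of the surviving disks and parity reproduces it.
rebuilt = xor_parity([disk1, disk3, parity])

# With one stale parity byte (as after an unclean shutdown), the rebuild is wrong.
stale_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]
corrupted = xor_parity([disk1, disk3, stale_parity])
```

In this model a correcting check simply recomputes `parity` from the data disks, which is why it touches no files but does decide whether a later rebuild comes out right.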
Bob@unraid Posted February 11, 2022 (edited)
Hello everyone, a few days ago I upgraded the CPU and mainboard, but tonight the system shut itself down uncleanly. I really have no clue what caused the problem; normally the system should go to sleep with the Dynamix S3 Sleep plugin. I am running the latest version: 6.10.0-rc2. It would be great if someone could check the attached syslog file. I have also added the diagnostics. Thanks.
Attachments: syslog (2), unraid-diagnostics-20220212-0019.zip
trurl Posted February 11, 2022
5 hours ago, Bob@unraid said: attached syslog file

Also attach current diagnostics
kuhnamatata Posted February 22, 2022
I have been trying to figure out what is causing the unclean shutdowns on this server for a couple of years and finally decided to ask for help. It typically goes offline after a few days of uptime. I have attached the failure log file along with a diagnostics zip taken after startup, and would really appreciate it if anyone with more experience could take a look at them. The latest time it went offline (not accessible through the webpage or SSH), the shares were still accessible.
Attachments: proto-diagnostics-20220222-1732.zip, proto syslog.log
dlandon Posted February 23, 2022 Author
You have a lot of call traces, and they go on and on. I also see a lot of the My Servers messages like the ones at the top of this snippet. Someone who knows more about how to interpret this would be better placed to help you.

Feb 22 01:41:23 Proto flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Feb 22 01:42:23 Proto flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Feb 22 01:43:23 Proto flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Feb 22 01:44:10 Proto kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Feb 22 01:44:10 Proto kernel: rcu: #01118-....: (16260291 ticks this GP) idle=e52/1/0x4000000000000000 softirq=26144979/26144982 fqs=4053314
Feb 22 01:44:10 Proto kernel: #011(t=16260292 jiffies g=129526613 q=14094513)
Feb 22 01:44:10 Proto kernel: NMI backtrace for cpu 18
Feb 22 01:44:10 Proto kernel: CPU: 18 PID: 1458 Comm: sensors Tainted: P O 5.14.15-Unraid #1
Feb 22 01:44:10 Proto kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
Feb 22 01:44:10 Proto kernel: Call Trace:
Feb 22 01:44:10 Proto kernel: <IRQ>
Feb 22 01:44:10 Proto kernel: dump_stack_lvl+0x46/0x5a
Feb 22 01:44:10 Proto kernel: ? lapic_can_unplug_cpu+0x93/0x93
Feb 22 01:44:10 Proto kernel: nmi_cpu_backtrace+0x7d/0x8f
Feb 22 01:44:10 Proto kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Feb 22 01:44:10 Proto kernel: rcu_dump_cpu_stacks+0xc3/0xea
Feb 22 01:44:10 Proto kernel: rcu_sched_clock_irq+0x22e/0x608
Feb 22 01:44:10 Proto kernel: ? trigger_load_balance+0x204/0x28a
Feb 22 01:44:10 Proto kernel: ? tick_sched_do_timer+0x3e/0x3e
Feb 22 01:44:10 Proto kernel: update_process_times+0x8c/0xab
Feb 22 01:44:10 Proto kernel: tick_sched_timer+0x38/0x65
Feb 22 01:44:10 Proto kernel: __hrtimer_run_queues+0xfa/0x18a
Feb 22 01:44:10 Proto kernel: hrtimer_interrupt+0x92/0x160
Feb 22 01:44:10 Proto kernel: __sysvec_apic_timer_interrupt+0x99/0xdb
Feb 22 01:44:10 Proto kernel: sysvec_apic_timer_interrupt+0x61/0x7d
Feb 22 01:44:10 Proto kernel: </IRQ>
Feb 22 01:44:10 Proto kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Feb 22 01:44:10 Proto kernel: RIP: 0010:smp_call_function_single+0xca/0xf7
Feb 22 01:44:10 Proto kernel: Code: 50 08 80 e2 01 74 04 f3 90 eb f4 83 48 08 01 4d 89 77 10 4c 89 fe 44 89 e7 4d 89 6f 18 e8 a2 fe ff ff 85 db 74 0d 41 8b 57 08 <80> e2 01 74 04 f3 90 eb f3 48 8b 54 24 38 65 48 2b 14 25 28 00 00
Feb 22 01:44:10 Proto kernel: RSP: 0018:ffffc90020fa7cc0 EFLAGS: 00000202
Feb 22 01:44:10 Proto kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Feb 22 01:44:10 Proto kernel: RDX: 0000000000000011 RSI: ffffc90020fa7cc0 RDI: 0000000000000009
Feb 22 01:44:10 Proto kernel: RBP: ffffc90020fa7d28 R08: 0000000000000009 R09: ffff88810bd56180
Feb 22 01:44:10 Proto kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
Feb 22 01:44:10 Proto kernel: R13: ffffc90020fa7d38 R14: ffffffff813e5fa1 R15: ffffc90020fa7cc0
Feb 22 01:44:10 Proto kernel: ? pldmfw_flash_image+0x7fe/0x7fe
Feb 22 01:44:10 Proto kernel: ? pldmfw_flash_image+0x7fe/0x7fe
Feb 22 01:44:10 Proto kernel: rdmsr_on_cpu+0x48/0x71
Feb 22 01:44:10 Proto kernel: show_temp+0x68/0xc3 [coretemp]
Feb 22 01:44:10 Proto kernel: dev_attr_show+0x20/0x42
Feb 22 01:44:10 Proto kernel: sysfs_kf_seq_show+0x75/0xc0
Feb 22 01:44:10 Proto kernel: seq_read_iter+0x156/0x347
Feb 22 01:44:10 Proto kernel: new_sync_read+0x7c/0xaf
Feb 22 01:44:10 Proto kernel: vfs_read+0xc6/0x108
Feb 22 01:44:10 Proto kernel: ksys_read+0x76/0xbe
Feb 22 01:44:10 Proto kernel: do_syscall_64+0x83/0xa5
Feb 22 01:44:10 Proto kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Feb 22 01:44:10 Proto kernel: RIP: 0033:0x146f4ac243ce
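When traces like this "go on and on", a quick tally can show how many stalls were logged and which process was on the CPU each time. The Python sketch below is a rough aid rather than a diagnosis; both patterns are taken from the wording of the snippet above, and the function name is made up for this example.

```python
import re
from collections import Counter

# Both patterns are modeled on the lines in the snippet above (assumption:
# other kernel versions may format these messages differently).
STALL = re.compile(r"rcu_sched self-detected stall on CPU")
COMM = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")

def summarize_stalls(syslog_text):
    """Return (number of RCU stall reports, Counter of process names seen in trace headers)."""
    stalls = len(STALL.findall(syslog_text))
    processes = Counter(m.group(3) for m in COMM.finditer(syslog_text))
    return stalls, processes
```

Run against the snippet above, the process name it picks up is `sensors`, which fits the `show_temp ... [coretemp]` frames in the trace. That is a hint about where to look, not proof of the cause.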
kuhnamatata Posted February 23, 2022
2 hours ago, dlandon said: You have a lot of call traces. This goes on and on I also see a lot of the myservers messages like at the top of this snipet. Someone that knows more about how to interpret this would be better to help you.

Thanks for taking the time to look at the files.
kuhnamatata Posted February 23, 2022
Found some info on the flash backup "adding task" messages. I doubt that's what is causing my server to go offline, but I'll try it.
ljm42 Posted February 23, 2022
Yeah let's clean up the flash backup messages. Run the commands a bit further up in the conversation, starting here: https://forums.unraid.net/topic/112745-stop-useless-backups/?tab=comments#comment-1026714
If the `git show` command throws an error message then we'll go a different direction from that thread. But I agree, these are a symptom, not the cause.
kuhnamatata Posted February 24, 2022
Ran "git show" and it displayed config changes from this morning's Nextcloud and MariaDB update.
ljm42 Posted February 24, 2022
2 hours ago, kuhnamatata said: Ran "git show" and and it displayed config changes from this mornings nextcloud and mariaDB update

OK so no repeated backups of the same file that would be cause for concern? In that case it sounds like files are legitimately being changed and backed up, so no more to worry about related to flash backup.
kuhnamatata Posted February 24, 2022
This is what I got when I ran the git show command:

commit f6bb1dd183b5c22ddf2f930e44ea0576c348eba5 (HEAD -> master, origin/master)
commit f6bb1dd183b5c22ddf2f930e44ea0576c348eba5 (HEAD -> master, origin/master)
commit f6bb1dd183b5c22ddf2f930e44ea0576c348eba5 (HEAD -> master, origin/master)
commit f6bb1dd183b5c22ddf2f930e44ea0576c348eba5 (HEAD -> master, origin/master)
Author: gitbot <[email protected]>
Date: Mon Dec 27 14:08:03 2021 -0500

Config change

diff --git a/config/plugins/fix.common.problems.plg b/config/plugins/fix.common.problems.plg
index 10d8c63..0536655 100644
--- a/config/plugins/fix.common.problems.plg
+++ b/config/plugins/fix.common.problems.plg
@@ -2,8 +2,8 @@
 <!DOCTYPE PLUGIN [
 <!ENTITY name "fix.common.problems">
 <!ENTITY author "Andrew Zawadzki">
-<!ENTITY version "2021.08.05">
-<!ENTITY md5 "28271a759e6b795e4595ed77d22a18eb">
+<!ENTITY version "2021.12.26">
+<!ENTITY md5 "d4f455460e7d5cf64dfe6cf2598c07ce">
 <!ENTITY launch "Settings/FixProblems">
 <!ENTITY plugdir "/usr/local/emhttp/plugins/&name;">
 <!ENTITY github "Squidly271/fix.common.problems">
@@ -11,6 +11,9 @@
 ]>
 <PLUGIN name="&name;" author="&author;" version="&version;" launch="&launch;" pluginURL="&pluginURL;" icon="warning" min="6.7.0" support="http://lime-technology.com/forum/index.php?topic=48972.0">
 <CHANGES>
+###2021.12.26
+- Check for blank or invalid characters within TLD
+
 ###2021.08.05
 - Remove Scaling governor tests - not relevant anymore
LTech Posted March 9, 2022
Hi, today after starting my Unraid server I got an unclean shutdown error, but I don't know why. Before that, I used the normal shutdown button in the WebGUI, and everything else (VMs, Docker, etc.) should have been shut down. I even have a UPS to prevent this. I hope you can help me identify the problem.
Attachment: unraid-vault-diagnostics-20220308-1610.zip
Squid Posted March 9, 2022
Hi there. What is this pool named cache_zfs_test? Is it actually a ZFS pool? If so, it should not be mounted within /mnt. Use /mnt/disks instead. It appears at first glance that the ZFS plugin wasn't properly unmounting it, possibly because everything within /mnt should be reserved for OS-managed devices only (hence use /mnt/disks/... instead).
LTech Posted March 9, 2022
Hi, no, this pool is not a ZFS pool; it is just a remnant of some testing I did with ZFS. I still have the ZFS plugin installed, but it is just a normal cache pool with 4 drives in RAID 10. This should not pose a problem as far as I know. At least it hasn't so far.
MattB425 Posted October 8, 2022
I've been having on-and-off problems with unclean shutdowns and I have no idea why. The last one caused over 1,000 parity errors, so I'm assuming a disk is going bad. How do I check this? Any help would be appreciated.
Attachments: mainframe-diagnostics-20221008-0751.zip, syslog
dlandon Posted October 8, 2022 Author
7 minutes ago, MattB425 said: I've been having on and off problems with unclean shutdowns and I have no idea why. Last one caused over 1,000 parity errors so I'm assuming a disk is going bad. How do I check this? Any help would be appreciated. mainframe-diagnostics-20221008-0751.zip syslog

Remove the following lines from your go file. Unraid is now handling this for you.
modprobe i915
chmod -R 777 /dev/dri
Reboot.
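For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch that drops exactly those two lines from the text of a go file and leaves everything else alone. It works on a string only; where the file lives on the flash drive and how you write it back are left out on purpose, the function name is made up, and the sample contents in the usage note are hypothetical.

```python
# The two lines named in the post above; nothing else is touched.
OBSOLETE_LINES = ("modprobe i915", "chmod -R 777 /dev/dri")

def strip_obsolete(go_text):
    """Return go_text without the obsolete i915 lines, keeping the rest in order."""
    kept = [line for line in go_text.splitlines()
            if line.strip() not in OBSOLETE_LINES]
    return "\n".join(kept) + "\n"
```

For example, `strip_obsolete("#!/bin/bash\nmodprobe i915\nchmod -R 777 /dev/dri\n/usr/local/sbin/emhttp &\n")` keeps only the first and last lines of that made-up file. Keep a backup copy of the original before replacing it.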
MattB425 Posted October 8, 2022 (edited)
7 hours ago, dlandon said: Remove the following lines from your go file. Unraid is now handling this for you. modprobe i915 chmod -R 777 /dev/dri Reboot.

Done. Hoping that helps. Thank you very much for your help.

Edit: Unfortunately I'm still getting crashes/unclean shutdowns and don't know why.
harley dmello Posted October 18, 2022
The unclean shutdown is required when you want to delete the history of the forum?