Nyghthawk Posted January 18, 2018

Hopefully the topic title covers the gist of everything. About 3-5 days ago, maybe less, these issues started. No hardware changes. No power failures; my system had been up for 40+ days at that point. Prior to this, there was ONE folder that would sometimes take 10-30 seconds to load, and that was because it has 30-40k files/folders in its main directory. Other than that, I never really had trouble unless I was trying to do something like 30-40 things on the server at once.

Issue now: Lately, when I try to open ANY folder on the server, it takes 40-50+ seconds (up to 2-5 minutes) just to open a simple folder that previously took half a second (like normal).

So I tried a simple restart. Same thing. Then I said, let me update to 6.4. Restarted. Same thing.

Then I tried the "Fix Common Problems" scan. This normally takes 1-2 minutes to run. I am at 10 minutes and it still says "Scanning". I have tried to look at the log, and it just tries to load the log over and over.

I have a VM on the server (Ubuntu). When I log into that and use the command line to access these same directories, it lists everything instantly.

I am trying to figure out if it's a particular drive, the file system, or maybe a Windows thing. I cannot troubleshoot this on my own. I can access the terminal, so please help me with some commands to get this diagnosed.

Anyone?
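One way to approach the "is it a particular drive?" question from the terminal is to time a plain directory listing on each mount point and compare the results. The following is only a minimal sketch, not an official Unraid tool: the function names time_listing and compare are made up for illustration, and the example paths in the comment (/mnt/disk1, /mnt/disk2, /mnt/user) are assumptions about how the array is mounted on this server.

```python
import os
import sys
import time


def time_listing(path):
    """Return (entry_count, seconds) for one non-recursive listing of path."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    return count, time.perf_counter() - start


def compare(paths):
    """Time each path that is an existing directory; skip everything else."""
    results = {}
    for path in paths:
        if os.path.isdir(path):
            results[path] = time_listing(path)
    return results


if __name__ == "__main__":
    # Paths come from the command line, for example (assumed mount points):
    #   python3 time_listing.py /mnt/disk1 /mnt/disk2 /mnt/user
    for path, (count, seconds) in compare(sys.argv[1:]).items():
        print(f"{path}: {count} entries in {seconds:.3f} s")
```

If one disk is stalling, the script may simply block on that path, which is a useful signal in its own right. A second run is usually faster because of caching, so the first run is the one worth comparing.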
JorgeB Posted January 18, 2018

Type diagnostics on the terminal and upload the resulting zip.
Nyghthawk Posted January 18, 2018 Author (edited)

7 hours ago, johnnie.black said: Type diagnostics on the terminal and upload the resulting zip.

EDIT: added tower-diagnostics-20180118-1124.zip

Trying now. Approximately how long should I wait for this to collect before I put it in the "unreasonably slow" category? ~8 minutes in and still waiting.

I also tried to stop the array, download the diagnostics from the webGUI, and shut down the system; none of these would run. The next option is a hard power-off at the actual hardware.

These are at the end of the "log" file in Tools--->SysLog:

Jan 18 08:45:10 Tower login[100967]: ROOT LOGIN on '/dev/pts/1'
Jan 18 08:52:20 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 08:52:23 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 08:54:20 Tower nginx: 2018/01/18 08:54:20 [error] 9775#9775: *124825 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 08:54:23 Tower nginx: 2018/01/18 08:54:23 [error] 9775#9775: *124659 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 09:18:20 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:19:49 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:20:01 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:20:06 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:22:01 Tower nginx: 2018/01/18 09:22:01 [error] 9775#9775: *126675 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 09:22:06 Tower nginx: 2018/01/18 09:22:06 [error] 9775#9775: *126995 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 09:29:04 Tower login[87910]: ROOT LOGIN on '/dev/pts/1'
Jan 18 09:35:06 Tower nginx: 2018/01/18 09:35:06 [error] 9775#9775: *129157 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /update.htm HTTP/1.1", upstream: "http://unix:/var/run/emhttpd.socket/update.htm", host: "tower.local", referrer: "http://tower.local/Main"
Jan 18 09:41:38 Tower shutdown[124016]: shutting down for system halt
Jan 18 09:42:09 Tower shutdown[125495]: shutting down for system halt

So I did the unthinkable and just hard powered it off at the system. It has rebooted and started a parity check. I will see how this turns out.

I thought I'd try opening a folder while this check is running to see if it's any better. Nope, still unreasonably slow just trying to open a single folder.

My normal parity check speed is about 80-100+ MB/sec. It has been going at 2.2 MB/sec since starting.

Total size: 8 TB
Elapsed time: 18 minutes
Current position: 17.4 GB (0.2 %)
Estimated speed: 1.7 MB/sec
Estimated finish: 55 days, 2 hours, 43 minutes

Updated speed!!!

Total size: 8 TB
Elapsed time: 30 minutes
Current position: 17.4 GB (0.2 %)
Estimated speed: 881.3 KB/sec
Estimated finish:

More issues: I can restart it. It will parity check fast, then all of a sudden slow back down at RANDOM times. All Dockers/VMs are disabled. I am able to access data while the parity check is running fast; then it hits a snag and the slowdown happens (it can be after 5 seconds, 5 minutes, or 50 minutes).

I reseated all my drives and will try reconnecting every SATA port next.

Here it is after a reboot; I will not try anything else on it right now.

Total size: 8 TB
Elapsed time: 3 minutes
Current position: 25.0 GB (0.3 %)
Estimated speed: 110.3 MB/sec
Estimated finish: 20 hours, 6 minutes
Sync errors detected: 0

Waiting to see how long this lasts.

One thing to note: when the parity check is working, the drives show activity. But when the parity check slows down (back to the original state of being unable to access anything), the drives show 0 read/0 write.

And it's done. It'll just keep dropping in speed and not do anything more.

Watching the log as this happened, this is what I gathered was new:

Jan 18 12:34:02 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Jan 18 12:34:02 Tower kernel: IP: isci_task_abort_task+0x18/0x334 [isci]
Jan 18 12:34:02 Tower kernel: PGD 0 P4D 0
Jan 18 12:34:02 Tower kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 18 12:34:02 Tower kernel: Modules linked in: xfs nfsd lockd grace sunrpc md_mod bonding igb ptp pps_core i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ipmi_ssif ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd isci libsas intel_cstate intel_uncore intel_rapl_perf ahci libahci i2c_i801 i2c_core scsi_transport_sas wmi ipmi_si button [last unloaded: pps_core]
Jan 18 12:34:02 Tower kernel: CPU: 0 PID: 29155 Comm: kworker/u264:3 Not tainted 4.14.13-unRAID #1
Jan 18 12:34:02 Tower kernel: Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
Jan 18 12:34:02 Tower kernel: Workqueue: scsi_tmf_8 scmd_eh_abort_handler
Jan 18 12:34:02 Tower kernel: task: ffff880c05881c00 task.stack: ffffc90023394000
Jan 18 12:34:02 Tower kernel: RIP: 0010:isci_task_abort_task+0x18/0x334 [isci]
Jan 18 12:34:02 Tower kernel: RSP: 0018:ffffc90023397ce8 EFLAGS: 00010296
Jan 18 12:34:02 Tower kernel: RAX: ffffffffa00b4cbb RBX: ffff880bd77bf9a8 RCX: 0000000000000000
Jan 18 12:34:02 Tower kernel: RDX: ffff880c0d020420 RSI: 0000000000002100 RDI: 0000000000000000
Jan 18 12:34:02 Tower kernel: RBP: 0000000000000000 R08: 000000942cd7ac00 R09: 0000000000000000
Jan 18 12:34:02 Tower kernel: R10: ffffffff81c03ea8 R11: ffff880c05881cc0 R12: ffff880c030f1800
Jan 18 12:34:02 Tower kernel: R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000000
Jan 18 12:34:02 Tower kernel: FS: 0000000000000000(0000) GS:ffff880c0d400000(0000) knlGS:0000000000000000
Jan 18 12:34:02 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 18 12:34:02 Tower kernel: CR2: 0000000000000000 CR3: 0000000001c0a001 CR4: 00000000001606f0
Jan 18 12:34:02 Tower kernel: Call Trace:
Jan 18 12:34:02 Tower kernel: ? __freed_request+0x34/0x86
Jan 18 12:34:02 Tower kernel: ? freed_request+0x31/0x4b
Jan 18 12:34:02 Tower kernel: ? __blk_put_request+0xed/0x131
Jan 18 12:34:02 Tower kernel: ? blk_put_request+0x38/0x4b
Jan 18 12:34:02 Tower kernel: ? scsi_execute+0x168/0x177
Jan 18 12:34:02 Tower kernel: ? scsi_test_unit_ready+0x45/0x99
Jan 18 12:34:02 Tower kernel: ? cpuacct_charge+0x2c/0x6b
Jan 18 12:34:02 Tower kernel: ? __accumulate_pelt_segments+0x1d/0x2a
Jan 18 12:34:02 Tower kernel: ? dequeue_entity+0x49a/0x4bf
Jan 18 12:34:02 Tower kernel: ? pick_next_task_fair+0x227/0x3d9
Jan 18 12:34:02 Tower kernel: ? put_prev_entity+0x21/0x2d9
Jan 18 12:34:02 Tower kernel: sas_eh_abort_handler+0x2a/0x3c [libsas]
Jan 18 12:34:02 Tower kernel: scmd_eh_abort_handler+0x35/0x8f
Jan 18 12:34:02 Tower kernel: process_one_work+0x146/0x239
Jan 18 12:34:02 Tower kernel: ? rescuer_thread+0x258/0x258
Jan 18 12:34:02 Tower kernel: worker_thread+0x1c3/0x292
Jan 18 12:34:02 Tower kernel: kthread+0x10f/0x117
Jan 18 12:34:02 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Jan 18 12:34:02 Tower kernel: ? do_group_exit+0x95/0x95
Jan 18 12:34:02 Tower kernel: ret_from_fork+0x1f/0x30
Jan 18 12:34:02 Tower kernel: Code: e8 5d 41 5c 41 5d c3 b8 05 00 00 00 c3 b8 05 00 00 00 c3 41 57 41 56 41 55 41 54 55 48 89 fd 53 4c 8d 75 08 48 81 ec 68 01 00 00 <48> 8b 07 c7 44 24 10 00 00 00 00 c7 44 24 18 00 00 00 00 48 8b
Jan 18 12:34:02 Tower kernel: RIP: isci_task_abort_task+0x18/0x334 [isci] RSP: ffffc90023397ce8
Jan 18 12:34:02 Tower kernel: CR2: 0000000000000000
Jan 18 12:34:02 Tower kernel: ---[ end trace 8b38e50124b0ffca ]---

Trying two things. It seemed that Fix Common Problems would stall everything (it randomly started just before the error above), so I removed it. I also started the array in maintenance mode to do a full parity check.

Crossing fingers. I have currently passed my longest "check" so far.

Total size: 8 TB
Elapsed time: 14 minutes
Current position: 91.3 GB (1.1 %)
Estimated speed: 112.1 MB/sec
Estimated finish: 19 hours, 36 minutes

Update: I got 4+ hours into the parity check and was still averaging 80-100 MB/sec, which is NORMAL. Stopped the parity check. Stopped maintenance mode. Started the array and started the parity check again. It seemed to be running fine. Turned the VM back on. Turned on ONE Docker container (the last Docker image I installed): Plex.

So far Plex is running, the VM is running, and the parity check is going strong. I will enable the remaining Docker containers one by one and let each run to see if any of them is causing this problem; otherwise it seems like it's the Fix Common Problems plugin.

Edited January 19, 2018 by Nyghthawk
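For anyone hitting the same symptoms, the two kinds of entries quoted above (the nginx "upstream timed out" errors and the kernel BUG/Oops block) can be picked out of a saved syslog with a few regular expressions. The following is a rough sketch only: the function names triage and faulting_modules and the SAMPLE list are made up for illustration, the sample lines are shortened copies of entries from this post, and where the syslog file lives on a given system is left as an assumption for the reader to fill in.

```python
import re
from collections import Counter

# Patterns modelled on the log entries quoted in this thread.
PATTERNS = {
    "kernel_bug": re.compile(r"kernel: BUG:"),
    "kernel_oops": re.compile(r"kernel: Oops:"),
    "nginx_upstream_timeout": re.compile(r"nginx: .*upstream timed out"),
}

# A kernel RIP line names the faulting function and, in brackets, its module.
RIP_LINE = re.compile(r"RIP: (?:\w+:)?(\S+)\s+\[(\w+)\]")


def triage(lines):
    """Count matches per pattern and remember the first matching line."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                first_seen.setdefault(name, line.strip())
    return counts, first_seen


def faulting_modules(lines):
    """Return the set of module names that appear on RIP lines."""
    modules = set()
    for line in lines:
        match = RIP_LINE.search(line)
        if match:
            modules.add(match.group(2))
    return modules


# Shortened copies of entries from the post above (illustrative sample only).
SAMPLE = [
    "Jan 18 08:54:20 Tower nginx: 2018/01/18 08:54:20 [error] 9775#9775: "
    "*124825 upstream timed out (110: Connection timed out)",
    "Jan 18 12:34:02 Tower kernel: BUG: unable to handle kernel NULL "
    "pointer dereference at (null)",
    "Jan 18 12:34:02 Tower kernel: Oops: 0000 [#1] PREEMPT SMP PTI",
    "Jan 18 12:34:02 Tower kernel: RIP: 0010:isci_task_abort_task+0x18/0x334 [isci]",
]

if __name__ == "__main__":
    counts, first_seen = triage(SAMPLE)
    for name in PATTERNS:
        print(name, counts[name], first_seen.get(name, ""))
    print("modules on RIP lines:", sorted(faulting_modules(SAMPLE)))
```

To run it against a real log, pass the lines of the saved syslog file to triage instead of SAMPLE. The module reported on the RIP line only says where the kernel faulted; it does not by itself prove which component is at fault.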
compund_soil Posted July 1, 2022

This is EXACTLY what I'm experiencing right now. I will try your mitigation steps when I get home. Thank you for the thorough write-up! Did the parity check in maintenance mode clear up everything? I assumed I had some permission issues, but I'll try your approach first.