Unreasonably Slow. Unable to Access Folders. Stops Reading Files, etc

Nyghthawk · January 18, 2018

Hopefully the topic title covers the gist of everything.

About 3-5 days ago, maybe less, these issues started. No hardware changes and no power failures; my system had been up for 40+ days at that point.

Prior to this, there was ONE folder that would sometimes take 10-30 seconds to load, and that was because it has 30-40k FILES/FOLDERS in its main directory.

Other than this, I never really had trouble unless I was trying to do 30-40 things on the server at once.

Issue now:

Lately, when I try to open ANY folder on the server, it takes 40-50+ seconds (sometimes 2-5 minutes) to open a simple folder that previously took half a second (i.e. normal).

So I tried a simple restart.

Same thing.

So then I decided to update to 6.4 and restarted.

Same thing.

So I tried the "Fix Common Problems" scan. This would normally take 1-2 minutes to run. I am at 10 minutes and it still says "Scanning".

I have tried to look at the log, and it just tries to load the log over and over and over.

I have a VM on the server (Ubuntu). When I log into that and use the command line to access these same directories, it lists everything instantly.

I am trying to figure out whether it's a particular drive, the file system, or maybe a Windows thing.

I cannot troubleshoot this.

I can access the terminal, so please help me with some command lines to get this diagnosed.
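One rough way to narrow this down from the terminal is to time a plain directory listing on each mount point separately, so a single slow disk stands out. The sketch below is only an illustration under assumptions: Python 3 has to be available on the server (it is not part of a stock install), the `/mnt/disk*` pattern is an assumed layout for the array disks, and `time_listing` and `time_all` are hypothetical helper names.

```python
import glob
import os
import time


def time_listing(path):
    """Return (entry_count, seconds) for one non-recursive listing of path."""
    start = time.monotonic()
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    return count, time.monotonic() - start


def time_all(pattern="/mnt/disk*"):
    """Time a listing of every directory matching pattern (assumed layout)."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isdir(path):
            continue
        try:
            results[path] = time_listing(path)
        except OSError:
            # Unreadable mount points are skipped rather than aborting the run.
            continue
    return results


if __name__ == "__main__":
    for path, (count, seconds) in time_all().items():
        print(f"{path}: {count} entries in {seconds:.2f} s")
```

If one path takes far longer than the others, that disk (or the controller port it hangs off) is a reasonable place to look first. A listing that stalls completely will simply hang here, which is itself a useful data point.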

Anyone?

JorgeB · January 18, 2018

Type diagnostics on the terminal and upload the resulting zip.

Nyghthawk · January 18, 2018

7 hours ago, johnnie.black said:

Type diagnostics on the terminal and upload the resulting zip.

EDIT: added tower-diagnostics-20180118-1124.zip

Trying now

Approximately how long should I wait for this to collect before I put it into the "unreasonably slow" category?

~8 minutes in and still waiting.

I also tried to stop the array, download the diagnostics from the web GUI, and shut down the system; none of them would run. The next option is a hard power-off at the actual hardware.

These are at the end of the "log" file in Tools--->SysLog

Jan 18 08:45:10 Tower login[100967]: ROOT LOGIN  on '/dev/pts/1'
Jan 18 08:52:20 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 08:52:23 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 08:54:20 Tower nginx: 2018/01/18 08:54:20 [error] 9775#9775: *124825 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 08:54:23 Tower nginx: 2018/01/18 08:54:23 [error] 9775#9775: *124659 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 09:18:20 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:19:49 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:20:01 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:20:06 Tower root: Fix Common Problems Version 2018.01.18
Jan 18 09:22:01 Tower nginx: 2018/01/18 09:22:01 [error] 9775#9775: *126675 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 09:22:06 Tower nginx: 2018/01/18 09:22:06 [error] 9775#9775: *126995 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /plugins/fix.common.problems/include/fixExec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Settings/FixProblems"
Jan 18 09:29:04 Tower login[87910]: ROOT LOGIN  on '/dev/pts/1'
Jan 18 09:35:06 Tower nginx: 2018/01/18 09:35:06 [error] 9775#9775: *129157 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.102, server: , request: "POST /update.htm HTTP/1.1", upstream: "http://unix:/var/run/emhttpd.socket/update.htm", host: "tower.local", referrer: "http://tower.local/Main"
Jan 18 09:41:38 Tower shutdown[124016]: shutting down for system halt
Jan 18 09:42:09 Tower shutdown[125495]: shutting down for system halt
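To see which web GUI requests are timing out, and how often, the nginx lines can be tallied by request path. This is a minimal sketch that assumes the log lines look like the excerpt above; `count_timeouts` is a hypothetical helper name, and the log location mentioned afterwards is an assumption as well.

```python
import re
from collections import Counter

# Captures the request path from nginx "upstream timed out" lines
# in the format shown in the syslog excerpt.
TIMEOUT_RE = re.compile(r'upstream timed out .*?request: "[A-Z]+ (\S+) HTTP')


def count_timeouts(lines):
    """Count upstream timeouts per request path."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open log file, for example `count_timeouts(open("/var/log/syslog"))` if that is where the syslog lives, gives a per-path tally. In the excerpt above, every timeout but one is a POST to the Fix Common Problems `fixExec.php` endpoint; the remaining one is `/update.htm`.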

So I did the unthinkable and just hard shut it off at the system.

It has rebooted and started a parity check. I will see how this turns out.

I thought I'd try opening a folder while the check is running to see if it's any better. Nope, still unreasonably slow just trying to open a single folder.

My normal parity check speed is about 80-100+ MB/sec.

It has been running at around 2.2 MB/sec since starting.

Total size:	8 TB
Elapsed time:	18 minutes
Current position:	17.4 GB (0.2 %)
Estimated speed:	1.7 MB/sec
Estimated finish:	55 days, 2 hours, 43 minutes
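As a sanity check on that estimate, the finish time is just the remaining data divided by the current speed. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which may not match how the web GUI rounds its figures; `eta_seconds` and `format_eta` are hypothetical helper names.

```python
def eta_seconds(total_bytes, position_bytes, speed_bytes_per_sec):
    """Seconds left at the current speed (remaining bytes / speed)."""
    if speed_bytes_per_sec <= 0:
        raise ValueError("speed must be positive")
    return (total_bytes - position_bytes) / speed_bytes_per_sec


def format_eta(seconds):
    """Render a duration as days, hours and minutes."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    return f"{days} days, {hours} hours, {rem // 60} minutes"


# Figures from the status above: 8 TB total, 17.4 GB done, 1.7 MB/sec.
print(format_eta(eta_seconds(8e12, 17.4e9, 1.7e6)))
```

With these inputs the result is a little over 54 days, the same ballpark as the 55 days reported; the small gap is presumably down to the displayed speed being rounded.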

Updated Speed!!!

Total size:	8 TB
Elapsed time:	30 minutes
Current position:	17.4 GB (0.2 %)
Estimated speed:	881.3 KB/sec
Estimated finish:

More issues: I can restart it and the parity check will run fast, then all of a sudden slow back down at RANDOM times.

All Dockers/VMs are disabled.

I am able to access data while the parity check is running fast; then it hits a snag and the slowdown happens (it can be after 5 seconds, 5 minutes or 50 minutes).

I reseated all my drives and will try reconnecting every SATA port next.

Here it is after a reboot. I will not try anything else on it right now.

Total size:	8 TB
Elapsed time:	3 minutes
Current position:	25.0 GB (0.3 %)
Estimated speed:	110.3 MB/sec
Estimated finish:	20 hours, 6 minutes
Sync errors detected:	0

Waiting to see how long this lasts

One thing to note: when the parity check is working, the drives show activity.

But when the parity check slows down (back to the original state of being unable to access anything), the drives show 0 read/0 write.
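The "0 read/0 write" observation can also be checked from the terminal by sampling the kernel's per-device counters twice and seeing which ones did not move. This is a sketch under assumptions: it reads `/proc/diskstats`, where the sixth and tenth columns are sectors read and sectors written, and `parse_diskstats`, `idle_devices` and `sample` are hypothetical helper names.

```python
import time


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def idle_devices(before, after):
    """Devices whose read and write counters did not change."""
    return sorted(
        name for name in before
        if name in after and before[name] == after[name]
    )


def sample(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and list idle devices."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return idle_devices(before, after)
```

Calling `sample()` during a stall and again while the check is running fast would show whether every array disk goes quiet at once or only the ones on a particular controller.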

And it's done.

It'll just keep dropping in speed and not do anything more.

Watching the log as this happened, this is what I gathered was new:

Jan 18 12:34:02 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Jan 18 12:34:02 Tower kernel: IP: isci_task_abort_task+0x18/0x334 [isci]
Jan 18 12:34:02 Tower kernel: PGD 0 P4D 0
Jan 18 12:34:02 Tower kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 18 12:34:02 Tower kernel: Modules linked in: xfs nfsd lockd grace sunrpc md_mod bonding igb ptp pps_core i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ipmi_ssif ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd isci libsas intel_cstate intel_uncore intel_rapl_perf ahci libahci i2c_i801 i2c_core scsi_transport_sas wmi ipmi_si button [last unloaded: pps_core]
Jan 18 12:34:02 Tower kernel: CPU: 0 PID: 29155 Comm: kworker/u264:3 Not tainted 4.14.13-unRAID #1
Jan 18 12:34:02 Tower kernel: Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
Jan 18 12:34:02 Tower kernel: Workqueue: scsi_tmf_8 scmd_eh_abort_handler
Jan 18 12:34:02 Tower kernel: task: ffff880c05881c00 task.stack: ffffc90023394000
Jan 18 12:34:02 Tower kernel: RIP: 0010:isci_task_abort_task+0x18/0x334 [isci]
Jan 18 12:34:02 Tower kernel: RSP: 0018:ffffc90023397ce8 EFLAGS: 00010296
Jan 18 12:34:02 Tower kernel: RAX: ffffffffa00b4cbb RBX: ffff880bd77bf9a8 RCX: 0000000000000000
Jan 18 12:34:02 Tower kernel: RDX: ffff880c0d020420 RSI: 0000000000002100 RDI: 0000000000000000
Jan 18 12:34:02 Tower kernel: RBP: 0000000000000000 R08: 000000942cd7ac00 R09: 0000000000000000
Jan 18 12:34:02 Tower kernel: R10: ffffffff81c03ea8 R11: ffff880c05881cc0 R12: ffff880c030f1800
Jan 18 12:34:02 Tower kernel: R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000000
Jan 18 12:34:02 Tower kernel: FS: 0000000000000000(0000) GS:ffff880c0d400000(0000) knlGS:0000000000000000
Jan 18 12:34:02 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 18 12:34:02 Tower kernel: CR2: 0000000000000000 CR3: 0000000001c0a001 CR4: 00000000001606f0
Jan 18 12:34:02 Tower kernel: Call Trace:
Jan 18 12:34:02 Tower kernel: ? __freed_request+0x34/0x86
Jan 18 12:34:02 Tower kernel: ? freed_request+0x31/0x4b
Jan 18 12:34:02 Tower kernel: ? __blk_put_request+0xed/0x131
Jan 18 12:34:02 Tower kernel: ? blk_put_request+0x38/0x4b
Jan 18 12:34:02 Tower kernel: ? scsi_execute+0x168/0x177
Jan 18 12:34:02 Tower kernel: ? scsi_test_unit_ready+0x45/0x99
Jan 18 12:34:02 Tower kernel: ? cpuacct_charge+0x2c/0x6b
Jan 18 12:34:02 Tower kernel: ? __accumulate_pelt_segments+0x1d/0x2a
Jan 18 12:34:02 Tower kernel: ? dequeue_entity+0x49a/0x4bf
Jan 18 12:34:02 Tower kernel: ? pick_next_task_fair+0x227/0x3d9
Jan 18 12:34:02 Tower kernel: ? put_prev_entity+0x21/0x2d9
Jan 18 12:34:02 Tower kernel: sas_eh_abort_handler+0x2a/0x3c [libsas]
Jan 18 12:34:02 Tower kernel: scmd_eh_abort_handler+0x35/0x8f
Jan 18 12:34:02 Tower kernel: process_one_work+0x146/0x239
Jan 18 12:34:02 Tower kernel: ? rescuer_thread+0x258/0x258
Jan 18 12:34:02 Tower kernel: worker_thread+0x1c3/0x292
Jan 18 12:34:02 Tower kernel: kthread+0x10f/0x117
Jan 18 12:34:02 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Jan 18 12:34:02 Tower kernel: ? do_group_exit+0x95/0x95
Jan 18 12:34:02 Tower kernel: ret_from_fork+0x1f/0x30
Jan 18 12:34:02 Tower kernel: Code: e8 5d 41 5c 41 5d c3 b8 05 00 00 00 c3 b8 05 00 00 00 c3 41 57 41 56 41 55 41 54 55 48 89 fd 53 4c 8d 75 08 48 81 ec 68 01 00 00 <48> 8b 07 c7 44 24 10 00 00 00 00 c7 44 24 18 00 00 00 00 48 8b
Jan 18 12:34:02 Tower kernel: RIP: isci_task_abort_task+0x18/0x334 [isci] RSP: ffffc90023397ce8
Jan 18 12:34:02 Tower kernel: CR2: 0000000000000000
Jan 18 12:34:02 Tower kernel: ---[ end trace 8b38e50124b0ffca ]---
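The key facts in a trace like this are the `BUG:` message and the function and module named on the `IP:` line (here `isci_task_abort_task` in `isci`). For a long syslog, a small script can pull those out instead of scrolling. This is a sketch that assumes the line format shown above; `find_oopses` is a hypothetical helper name.

```python
import re

BUG_RE = re.compile(r"kernel: BUG: (.+)$")
# Matches "IP: func+0x18/0x334 [module]" and "RIP: 0010:func+0x18/0x334 [module]".
IP_RE = re.compile(
    r"kernel: R?IP: (?:[0-9a-f]{4}:)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(\w+)\])?"
)


def find_oopses(lines):
    """Return one record per BUG line with its faulting function and module."""
    oopses = []
    current = None
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            current = {"message": bug.group(1).strip(),
                       "function": None, "module": None}
            oopses.append(current)
            continue
        ip = IP_RE.search(line)
        # Only the first IP/RIP line after a BUG line is recorded.
        if ip and current is not None and current["function"] is None:
            current["function"] = ip.group(1)
            current["module"] = ip.group(2)
    return oopses
```

The module name is the useful part when searching for known issues or posting a report, since it identifies which driver faulted.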

Trying two things....

It seemed that Fix Common Problems would stall everything (it randomly started just before the previous error).

So I removed it.

I also started the array in maintenance mode to do a full parity check.

Crossing my fingers; I have now passed my longest "check" so far.

Total size:	8 TB
Elapsed time:	14 minutes
Current position:	91.3 GB (1.1 %)
Estimated speed:	112.1 MB/sec
Estimated finish:	19 hours, 36 minutes

Update: I got 4+ hours into the parity check and it was still averaging 80-100 MB/sec, which is NORMAL.

Stopped the parity check. Stopped maintenance mode. Started the array and started the parity check again. It seemed to be running fine.

Turned the VM back on. Turned ONE Docker back on (the last Docker image I installed): Plex.

So far Plex is running, the VM is running, and the parity check is going strong. I will enable the Dockers one by one and let each run to see if any of them is causing this problem; otherwise it seems the Fix Common Problems plugin is the cause.

Edited January 19, 2018 by Nyghthawk

compund_soil · July 1, 2022

This is EXACTLY what I'm experiencing right now. I will try your mitigation steps when I get home. Thank you for the thorough write-up! Did the parity check in 'maintenance mode' clear up everything? I assumed I had some permission issues, but I'll try your approach first.
