DjSamLb
Members-
Posts
55 -
Joined
Everything posted by DjSamLb
-
Hi, I'm having the same problems without any extra settings. My extra configuration:
#unassigned_devices_start
#Unassigned devices share includes
include = /tmp/unassigned.devices/smb-settings.conf
#unassigned_devices_end
-
Can you guys please add support for Micron P320h? Thanks
-
@Joshj23 Thanks @ChatNoir
-
Hey, I wanted to know if you got it to work?
-
Noted, thank you for the great feedback!
-
Attached, thanks! tower-smart-20201205-1444.zip
-
I precleared this 6TB WD Red and it has been running smoothly for four years already. I ran Parity Check many times throughout the years, but yesterday's Parity Check gave me disk read errors. Should I be concerned and copy all my data to another disk? Should I RMA this drive ASAP? I have attached my SMART log text file.

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68L0BN1
Serial Number:    WD-WX31DB58YLD1
LU WWN Device Id: 5 0014ee 2b7e89483
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Dec 5 09:39:35 2020 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 0)   The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 5744) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        ( 2) minutes.
Extended self-test routine
recommended polling time:        ( 711) minutes.
Conveyance self-test routine
recommended polling time:        ( 5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   198   196   021    -    9100
  4 Start_Stop_Count        -O--CK   097   097   000    -    3621
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   049   049   000    -    37300
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    49
192 Power-Off_Retract_Count -O--CK   200   200   000    -    22
193 Load_Cycle_Count        -O--CK   195   195   000    -    15797
194 Temperature_Celsius     -O---K   130   103   000    -    22
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      54  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 37274 hours (1553 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 03 78 8f 70 40 00  Error: UNC at LBA = 0x03788f70 = 58232688

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 00 00 00 01 4f 6d b8 40 00 20d+10:45:32.933  READ FPDMA QUEUED
  60 04 00 00 00 00 00 01 4f 69 b8 40 00 20d+10:45:32.930  READ FPDMA QUEUED
  60 04 00 00 00 00 00 01 4f 65 b8 40 00 20d+10:45:32.923  READ FPDMA QUEUED
  60 04 00 00 00 00 00 01 4f 61 b8 40 00 20d+10:45:32.919  READ FPDMA QUEUED
  60 04 00 00 00 00 00 01 4f 5d b8 40 00 20d+10:45:32.915  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     37287         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Stand-by (1)
Current Temperature:                    22 Celsius
Power Cycle Min/Max Temperature:     20/28 Celsius
Lifetime    Min/Max Temperature:     19/49 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (225)

Index    Estimated Time   Temperature Celsius
 226    2020-12-05 01:42    24  *****
 ...    ..(254 skipped).    ..  *****
   3    2020-12-05 05:57    24  *****
   4    2020-12-05 05:58    23  ****
 ...    ..( 5 skipped).     ..  ****
  10    2020-12-05 06:04    23  ****
  11    2020-12-05 06:05    22  ***
  12    2020-12-05 06:06    22  ***
  13    2020-12-05 06:07    22  ***
  14    2020-12-05 06:08    25  ******
 ...    ..( 5 skipped).     ..  ******
  20    2020-12-05 06:14    25  ******
  21    2020-12-05 06:15    24  *****
 ...    ..(203 skipped).    ..  *****
 225    2020-12-05 09:39    24  *****

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            0  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4      1858907  Vendor specific

tower-smart-20201205-0939.zip
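For anyone who wants to check the same counters on their own drive, here is a minimal sketch in Python. It assumes the attribute rows follow the column order of the report above (ID, name, flags, value, worst, threshold, fail, raw value). The names `parse_raw_values` and `WATCHED` are hypothetical and only for illustration, and the choice of watched attributes is an illustrative selection, not an official list. The last lines recheck the LBA and power-on-hours conversions printed for Error 1.

```python
import re

# Sample rows copied from the report above. In practice this text would come
# from smartctl's attribute table for the drive being checked.
SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
"""

# Attribute IDs to watch (illustrative selection, an assumption of this sketch).
WATCHED = {5, 197, 198, 199}

# ID, name, flags, value, worst, thresh, fail, raw value.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)")

def parse_raw_values(text):
    """Return {attribute_name: raw_value} for the watched attribute IDs."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match and int(match.group(1)) in WATCHED:
            result[match.group(2)] = int(match.group(3))
    return result

raw = parse_raw_values(SAMPLE)
nonzero = {name: value for name, value in raw.items() if value > 0}

# Recheck the conversions shown in the error log entry.
error_lba = int("0x03788f70", 16)   # LBA of the UNC error
days, hours = divmod(37274, 24)     # power-on lifetime in days and hours

print(raw)
print("non-zero counters:", nonzero)
print(error_lba, days, hours)
```

In this report all four raw values are zero, so the single UNC entry in the error log is the only read problem the drive itself has recorded.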
-
Some more: the GUI is unresponsive at this stage. This happened earlier in the morning, a couple of hours ago, and I did a hard reset.

Oct 5 09:47:45 Tower kernel: dump_stack+0x67/0x83
Oct 5 09:47:45 Tower kernel: nmi_cpu_backtrace+0x71/0x83
Oct 5 09:47:45 Tower kernel: ? lapic_can_unplug_cpu+0x97/0x97
Oct 5 09:47:45 Tower kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
Oct 5 09:47:45 Tower kernel: rcu_dump_cpu_stacks+0x8b/0xb4
Oct 5 09:47:45 Tower kernel: rcu_check_callbacks+0x296/0x5a0
Oct 5 09:47:45 Tower kernel: update_process_times+0x24/0x47
Oct 5 09:47:45 Tower kernel: tick_sched_timer+0x36/0x64
Oct 5 09:47:45 Tower kernel: __hrtimer_run_queues+0xb7/0x10b
Oct 5 09:47:45 Tower kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
Oct 5 09:47:45 Tower kernel: hrtimer_interrupt+0xf4/0x20e
Oct 5 09:47:45 Tower kernel: smp_apic_timer_interrupt+0x7b/0x93
Oct 5 09:47:45 Tower kernel: apic_timer_interrupt+0xf/0x20
Oct 5 09:47:45 Tower kernel: </IRQ>
Oct 5 09:47:45 Tower kernel: RIP: 0010:queued_write_lock_slowpath+0x5e/0x6b
Oct 5 09:47:45 Tower kernel: Code: 00 00 00 eb 25 ba ff 00 00 00 f0 0f b1 13 85 c0 75 e5 48 89 ef c6 07 00 0f 1f 40 00 5b 5d c3 f0 0f b1 13 3d 00 01 00 00 74 e8 <8b> 03 3d 00 01 00 00 74 ec f3 90 eb f3 55 53 48 89 fb 65 8b 35 19
Oct 5 09:47:45 Tower kernel: RSP: 0018:ffffc9000ee9fac0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
Oct 5 09:47:45 Tower kernel: RAX: 00000000000001ff RBX: ffff888820d96060 RCX: 0000000000000000
Oct 5 09:47:45 Tower kernel: RDX: 00000000000000ff RSI: ffff8890330a6000 RDI: ffff888820d96060
Oct 5 09:47:45 Tower kernel: RBP: ffff888820d96064 R08: 000000000000000b R09: ffff88867d7a8000
Oct 5 09:47:45 Tower kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
Oct 5 09:47:45 Tower kernel: R13: 0000000000000000 R14: ffffc9000ee9fb60 R15: ffff888820d96230
Oct 5 09:47:45 Tower kernel: btrfs_try_tree_write_lock+0x1d/0x55
Oct 5 09:47:45 Tower kernel: lock_extent_buffer_for_io+0x1c/0x1ab
Oct 5 09:47:45 Tower kernel: btree_write_cache_pages+0x28c/0x2ce
Oct 5 09:47:45 Tower kernel: ? btrfs_wq_submit_bio+0x9d/0xad
Oct 5 09:47:45 Tower kernel: do_writepages+0x28/0x51
Oct 5 09:47:45 Tower kernel: __writeback_single_inode+0x36/0x15a
Oct 5 09:47:45 Tower kernel: writeback_sb_inodes+0x1e7/0x373
Oct 5 09:47:45 Tower kernel: __writeback_inodes_wb+0x63/0x9a
Oct 5 09:47:45 Tower kernel: wb_writeback+0x11f/0x1c3
Oct 5 09:47:45 Tower kernel: wb_workfn+0x1d1/0x253
Oct 5 09:47:45 Tower kernel: process_one_work+0x16e/0x24f
Oct 5 09:47:45 Tower kernel: worker_thread+0x1e2/0x2b8
Oct 5 09:47:45 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Oct 5 09:47:45 Tower kernel: kthread+0x10c/0x114
Oct 5 09:47:45 Tower kernel: ? kthread_park+0x89/0x89
Oct 5 09:47:45 Tower kernel: ret_from_fork+0x1f/0x40
Oct 5 09:48:28 Tower nginx: 2020/10/05 09:48:28 [error] 8051#8051: *17680 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.101.141, server: , request: "GET /plugins/ipmi/include/ipmi_temp.php?unit=C&dot=. HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower:801", referrer: "http://tower:801/Dashboard"
Oct 5 09:48:28 Tower nginx: 2020/10/05 09:48:28 [error] 8051#8051: *17697 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.101.141, server: , request: "POST /webGui/include/DashUpdate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower:801", referrer: "http://tower:801/Dashboard"
Oct 5 09:48:28 Tower php-fpm[8025]: [WARNING] [pool www] server reached max_children setting (50), consider raising it
Oct 5 09:48:28 Tower nginx: 2020/10/05 09:48:28 [error] 8051#8051: *17695 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.101.141, server: , request: "POST /webGui/include/DashboardApps.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower:801", referrer: "http://tower:801/Dashboard"
Oct 5 09:48:32 Tower kernel: rcu: INFO: rcu_bh self-detected stall on CPU
Oct 5 09:48:32 Tower kernel: rcu: 12-....: (1563708 ticks this GP) idle=b6e/1/0x4000000000000002 softirq=939787/947039 fqs=354696
Oct 5 09:48:32 Tower kernel: rcu: (t=1500026 jiffies g=-947 q=36)
Oct 5 09:48:32 Tower kernel: NMI backtrace for cpu 12
Oct 5 09:48:32 Tower kernel: CPU: 12 PID: 11089 Comm: btrfs-transacti Tainted: G D O 4.19.107-Unraid #1
Oct 5 09:48:32 Tower kernel: Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
Oct 5 09:48:32 Tower kernel: Call Trace:
Oct 5 09:48:32 Tower kernel: <IRQ>
Oct 5 09:48:32 Tower kernel: dump_stack+0x67/0x83
Oct 5 09:48:32 Tower kernel: nmi_cpu_backtrace+0x71/0x83
Oct 5 09:48:32 Tower kernel: ? lapic_can_unplug_cpu+0x97/0x97
Oct 5 09:48:32 Tower kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
Oct 5 09:48:32 Tower kernel: rcu_dump_cpu_stacks+0x8b/0xb4
Oct 5 09:48:32 Tower kernel: rcu_check_callbacks+0x296/0x5a0
Oct 5 09:48:32 Tower kernel: update_process_times+0x24/0x47
Oct 5 09:48:32 Tower kernel: tick_sched_timer+0x36/0x64
Oct 5 09:48:32 Tower kernel: __hrtimer_run_queues+0xb7/0x10b
Oct 5 09:48:32 Tower kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
Oct 5 09:48:32 Tower kernel: hrtimer_interrupt+0xf4/0x20e
Oct 5 09:48:32 Tower kernel: smp_apic_timer_interrupt+0x7b/0x93
Oct 5 09:48:32 Tower kernel: apic_timer_interrupt+0xf/0x20
Oct 5 09:48:32 Tower kernel: </IRQ>
Oct 5 09:48:32 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x63/0x171
Oct 5 09:48:32 Tower kernel: Code: 2f 08 b8 00 01 00 00 0f 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a <8b> 07 84 c0 74 04 f3 90 eb f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02
Oct 5 09:48:32 Tower kernel: RSP: 0018:ffffc9000998fae0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Oct 5 09:48:32 Tower kernel: RAX: 0000000000000101 RBX: ffff888820d96060 RCX: 0000000000000000
Oct 5 09:48:32 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff888820d96064
Oct 5 09:48:32 Tower kernel: RBP: ffff888820d96064 R08: ffff8890330a6000 R09: 0000000000000001
Oct 5 09:48:32 Tower kernel: R10: 0000000000000000 R11: ffff8890330a7130 R12: 0000000000000001
Oct 5 09:48:32 Tower kernel: R13: ffff888ff596f000 R14: 0000000000000001 R15: ffff88867eba0938
Oct 5 09:48:32 Tower kernel: queued_write_lock_slowpath+0x23/0x6b
Oct 5 09:48:32 Tower kernel: btrfs_try_tree_write_lock+0x1d/0x55
Oct 5 09:48:32 Tower kernel: btrfs_search_slot+0x7a9/0x84a
Oct 5 09:48:32 Tower kernel: lookup_inline_extent_backref+0x118/0x58d
Oct 5 09:48:32 Tower kernel: __btrfs_free_extent+0xf2/0x90f
Oct 5 09:48:32 Tower kernel: __btrfs_run_delayed_refs+0xa72/0xbf4
Oct 5 09:48:32 Tower kernel: ? pick_next_task_fair+0x3eb/0x480
Oct 5 09:48:32 Tower kernel: ? put_prev_entity+0x21/0x415
Oct 5 09:48:32 Tower kernel: btrfs_run_delayed_refs+0x5d/0x16d
Oct 5 09:48:32 Tower kernel: btrfs_commit_transaction+0x54/0x79c
Oct 5 09:48:32 Tower kernel: ? start_transaction+0x293/0x333
Oct 5 09:48:32 Tower kernel: transaction_kthread+0xd7/0x144
Oct 5 09:48:32 Tower kernel: ? btrfs_cleanup_transaction+0x492/0x492
Oct 5 09:48:32 Tower kernel: kthread+0x10c/0x114
Oct 5 09:48:32 Tower kernel: ? kthread_park+0x89/0x89
Oct 5 09:48:32 Tower kernel: ret_from_fork+0x1f/0x40
Oct 5 09:48:32 Tower kernel: Sending NMI from CPU 12 to CPUs 17:
Oct 5 09:48:32 Tower kernel: NMI backtrace for cpu 17
Oct 5 09:48:32 Tower kernel: CPU: 17 PID: 26425 Comm: kworker/u48:18 Tainted: G D O 4.19.107-Unraid #1
Oct 5 09:48:32 Tower kernel: Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
Oct 5 09:48:32 Tower kernel: Workqueue: writeback wb_workfn (flush-btrfs-2)
Oct 5 09:48:32 Tower kernel: RIP: 0010:queued_write_lock_slowpath+0x60/0x6b
Oct 5 09:48:32 Tower kernel: Code: 00 eb 25 ba ff 00 00 00 f0 0f b1 13 85 c0 75 e5 48 89 ef c6 07 00 0f 1f 40 00 5b 5d c3 f0 0f b1 13 3d 00 01 00 00 74 e8 8b 03 <3d> 00 01 00 00 74 ec f3 90 eb f3 55 53 48 89 fb 65 8b 35 19 bf f8
Oct 5 09:48:32 Tower kernel: RSP: 0018:ffffc9000ee9fac0 EFLAGS: 00000206
Oct 5 09:48:32 Tower kernel: RAX: 00000000000001ff RBX: ffff888820d96060 RCX: 0000000000000000
Oct 5 09:48:32 Tower kernel: RDX: 00000000000000ff RSI: ffff8890330a6000 RDI: ffff888820d96060
Oct 5 09:48:32 Tower kernel: RBP: ffff888820d96064 R08: 000000000000000b R09: ffff88867d7a8000
Oct 5 09:48:32 Tower kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
Oct 5 09:48:32 Tower kernel: R13: 0000000000000000 R14: ffffc9000ee9fb60 R15: ffff888820d96230
Oct 5 09:48:32 Tower kernel: FS: 0000000000000000(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
Oct 5 09:48:32 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 5 09:48:32 Tower kernel: CR2: 000000649dcfdfd4 CR3: 0000000001e0a003 CR4: 00000000001626e0
Oct 5 09:48:32 Tower kernel: DR0: 0000000002a259fb DR1: 0000000000000000 DR2: 0000000000000000
Oct 5 09:48:32 Tower kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 5 09:48:32 Tower kernel: Call Trace:
Oct 5 09:48:32 Tower kernel: btrfs_try_tree_write_lock+0x1d/0x55
Oct 5 09:48:32 Tower kernel: lock_extent_buffer_for_io+0x1c/0x1ab
Oct 5 09:48:32 Tower kernel: btree_write_cache_pages+0x28c/0x2ce
Oct 5 09:48:32 Tower kernel: ? btrfs_wq_submit_bio+0x9d/0xad
Oct 5 09:48:32 Tower kernel: do_writepages+0x28/0x51
Oct 5 09:48:32 Tower kernel: __writeback_single_inode+0x36/0x15a
Oct 5 09:48:32 Tower kernel: writeback_sb_inodes+0x1e7/0x373
Oct 5 09:48:32 Tower kernel: __writeback_inodes_wb+0x63/0x9a
Oct 5 09:48:32 Tower kernel: wb_writeback+0x11f/0x1c3
Oct 5 09:48:32 Tower kernel: wb_workfn+0x1d1/0x253
Oct 5 09:48:32 Tower kernel: process_one_work+0x16e/0x24f
Oct 5 09:48:32 Tower kernel: worker_thread+0x1e2/0x2b8
Oct 5 09:48:32 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Oct 5 09:48:32 Tower kernel: kthread+0x10c/0x114
Oct 5 09:48:32 Tower kernel: ? kthread_park+0x89/0x89
Oct 5 09:48:32 Tower kernel: ret_from_fork+0x1f/0x40
Oct 5 09:48:43 Tower nginx: 2020/10/05 09:48:43 [error] 8051#8051: *17855 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.101.141, server: , request: "POST /webGui/include/DashUpdate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower:801", referrer: "http://tower:801/Dashboard"
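A rough sketch, in Python, of how traces like the ones above could be summarized instead of read by eye. It assumes the syslog has one entry per line and uses the message wording seen in this log. `summarize_stalls` is a hypothetical helper name, and the three patterns only cover the entries quoted here, not every kernel message format.

```python
import re
from collections import Counter

# A few entries copied from the log above, one per line.
SAMPLE = """\
Oct 5 09:47:45 Tower kernel: RIP: 0010:queued_write_lock_slowpath+0x5e/0x6b
Oct 5 09:48:32 Tower kernel: rcu: INFO: rcu_bh self-detected stall on CPU
Oct 5 09:48:32 Tower kernel: NMI backtrace for cpu 12
Oct 5 09:48:32 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x63/0x171
Oct 5 09:48:32 Tower kernel: NMI backtrace for cpu 17
Oct 5 09:48:32 Tower kernel: RIP: 0010:queued_write_lock_slowpath+0x60/0x6b
"""

# Patterns matching only the message shapes quoted in this thread.
STALL = re.compile(r"kernel: rcu: INFO: \S+ self-detected stall on CPU")
CPU = re.compile(r"kernel: NMI backtrace for cpu (\d+)")
RIP = re.compile(r"kernel: RIP: [0-9a-f]{4}:([A-Za-z0-9_.]+)\+0x")

def summarize_stalls(text):
    """Count stall reports, list the CPUs dumped, and tally RIP functions."""
    stalls = 0
    cpus = set()
    functions = Counter()
    for line in text.splitlines():
        if STALL.search(line):
            stalls += 1
        cpu = CPU.search(line)
        if cpu:
            cpus.add(int(cpu.group(1)))
        rip = RIP.search(line)
        if rip:
            functions[rip.group(1)] += 1
    return stalls, sorted(cpus), functions

stalls, cpus, functions = summarize_stalls(SAMPLE)
print(stalls, cpus)
print(functions.most_common())
```

On the sample lines this reports both dumped CPUs sitting in lock slowpath functions, which matches the btrfs lock calls visible in the call traces.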
-
Some more log entries:

Oct 5 09:39:32 Tower kernel: kthread+0x10c/0x114
Oct 5 09:39:32 Tower kernel: ? kthread_park+0x89/0x89
Oct 5 09:39:32 Tower kernel: ret_from_fork+0x1f/0x40
Oct 5 09:39:32 Tower kernel: Sending NMI from CPU 12 to CPUs 17:
Oct 5 09:39:32 Tower kernel: NMI backtrace for cpu 17
Oct 5 09:39:32 Tower kernel: CPU: 17 PID: 26425 Comm: kworker/u48:18 Tainted: G D O 4.19.107-Unraid #1
Oct 5 09:39:32 Tower kernel: Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
Oct 5 09:39:32 Tower kernel: Workqueue: writeback wb_workfn (flush-btrfs-2)
Oct 5 09:39:32 Tower kernel: RIP: 0010:queued_write_lock_slowpath+0x67/0x6b
Oct 5 09:39:32 Tower kernel: Code: 00 f0 0f b1 13 85 c0 75 e5 48 89 ef c6 07 00 0f 1f 40 00 5b 5d c3 f0 0f b1 13 3d 00 01 00 00 74 e8 8b 03 3d 00 01 00 00 74 ec <f3> 90 eb f3 55 53 48 89 fb 65 8b 35 19 bf f8 7e 81 e6 00 ff 1f 00
Oct 5 09:39:32 Tower kernel: RSP: 0018:ffffc9000ee9fac0 EFLAGS: 00000206
Oct 5 09:39:32 Tower kernel: RAX: 00000000000001ff RBX: ffff888820d96060 RCX: 0000000000000000
Oct 5 09:39:32 Tower kernel: RDX: 00000000000000ff RSI: ffff8890330a6000 RDI: ffff888820d96060
Oct 5 09:39:32 Tower kernel: RBP: ffff888820d96064 R08: 000000000000000b R09: ffff88867d7a8000
Oct 5 09:39:32 Tower kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
Oct 5 09:39:32 Tower kernel: R13: 0000000000000000 R14: ffffc9000ee9fb60 R15: ffff888820d96230
Oct 5 09:39:32 Tower kernel: FS: 0000000000000000(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
Oct 5 09:39:32 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 5 09:39:32 Tower kernel: CR2: 000000649dcfdfd4 CR3: 0000000001e0a003 CR4: 00000000001626e0
Oct 5 09:39:32 Tower kernel: DR0: 0000000002a259fb DR1: 0000000000000000 DR2: 0000000000000000
Oct 5 09:39:32 Tower kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 5 09:39:32 Tower kernel: Call Trace:
Oct 5 09:39:32 Tower kernel: btrfs_try_tree_write_lock+0x1d/0x55
Oct 5 09:39:32 Tower kernel: lock_extent_buffer_for_io+0x1c/0x1ab
Oct 5 09:39:32 Tower kernel: btree_write_cache_pages+0x28c/0x2ce
Oct 5 09:39:32 Tower kernel: ? btrfs_wq_submit_bio+0x9d/0xad
Oct 5 09:39:32 Tower kernel: do_writepages+0x28/0x51
Oct 5 09:39:32 Tower kernel: __writeback_single_inode+0x36/0x15a
Oct 5 09:39:32 Tower kernel: writeback_sb_inodes+0x1e7/0x373
Oct 5 09:39:32 Tower kernel: __writeback_inodes_wb+0x63/0x9a
Oct 5 09:39:32 Tower kernel: wb_writeback+0x11f/0x1c3
Oct 5 09:39:32 Tower kernel: wb_workfn+0x1d1/0x253
Oct 5 09:39:32 Tower kernel: process_one_work+0x16e/0x24f
Oct 5 09:39:32 Tower kernel: worker_thread+0x1e2/0x2b8
Oct 5 09:39:32 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Oct 5 09:39:32 Tower kernel: kthread+0x10c/0x114
Oct 5 09:39:32 Tower kernel: ? kthread_park+0x89/0x89
Oct 5 09:39:32 Tower kernel: ret_from_fork+0x1f/0x40
Oct 5 09:39:40 Tower kernel: md: recovery thread: PQ corrected, sector=628430296
-
Hey guys, I'm having a new error with very high CPU load:

Oct 5 09:27:32 Tower kernel: <IRQ>
Oct 5 09:27:32 Tower kernel: dump_stack+0x67/0x83
Oct 5 09:27:32 Tower kernel: nmi_cpu_backtrace+0x71/0x83
Oct 5 09:27:32 Tower kernel: ? lapic_can_unplug_cpu+0x97/0x97
Oct 5 09:27:32 Tower kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
Oct 5 09:27:32 Tower kernel: rcu_dump_cpu_stacks+0x8b/0xb4
Oct 5 09:27:32 Tower kernel: rcu_check_callbacks+0x296/0x5a0
Oct 5 09:27:32 Tower kernel: update_process_times+0x24/0x47
Oct 5 09:27:32 Tower kernel: tick_sched_timer+0x36/0x64
Oct 5 09:27:32 Tower kernel: __hrtimer_run_queues+0xb7/0x10b
Oct 5 09:27:32 Tower kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
Oct 5 09:27:32 Tower kernel: hrtimer_interrupt+0xf4/0x20e
Oct 5 09:27:32 Tower kernel: smp_apic_timer_interrupt+0x7b/0x93
Oct 5 09:27:32 Tower kernel: apic_timer_interrupt+0xf/0x20
Oct 5 09:27:32 Tower kernel: </IRQ>
Oct 5 09:27:32 Tower kernel: RIP: 0010:queued_write_lock_slowpath+0x69/0x6b
Oct 5 09:27:32 Tower kernel: Code: 0f b1 13 85 c0 75 e5 48 89 ef c6 07 00 0f 1f 40 00 5b 5d c3 f0 0f b1 13 3d 00 01 00 00 74 e8 8b 03 3d 00 01 00 00 74 ec f3 90 <eb> f3 55 53 48 89 fb 65 8b 35 19 bf f8 7e 81 e6 00 ff 1f 00 74 0c
Oct 5 09:27:32 Tower kernel: RSP: 0018:ffffc9000ee9fac0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
Oct 5 09:27:32 Tower kernel: RAX: 00000000000001ff RBX: ffff888820d96060 RCX: 0000000000000000
Oct 5 09:27:32 Tower kernel: RDX: 00000000000000ff RSI: ffff8890330a6000 RDI: ffff888820d96060
Oct 5 09:27:32 Tower kernel: RBP: ffff888820d96064 R08: 000000000000000b R09: ffff88867d7a8000
Oct 5 09:27:32 Tower kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
Oct 5 09:27:32 Tower kernel: R13: 0000000000000000 R14: ffffc9000ee9fb60 R15: ffff888820d96230
Oct 5 09:27:32 Tower kernel: btrfs_try_tree_write_lock+0x1d/0x55
Oct 5 09:27:32 Tower kernel: lock_extent_buffer_for_io+0x1c/0x1ab
Oct 5 09:27:32 Tower kernel: btree_write_cache_pages+0x28c/0x2ce
Oct 5 09:27:32 Tower kernel: ? btrfs_wq_submit_bio+0x9d/0xad
Oct 5 09:27:32 Tower kernel: do_writepages+0x28/0x51
Oct 5 09:27:32 Tower kernel: __writeback_single_inode+0x36/0x15a
Oct 5 09:27:32 Tower kernel: writeback_sb_inodes+0x1e7/0x373
Oct 5 09:27:32 Tower kernel: __writeback_inodes_wb+0x63/0x9a
Oct 5 09:27:32 Tower kernel: wb_writeback+0x11f/0x1c3
Oct 5 09:27:32 Tower kernel: wb_workfn+0x1d1/0x253
Oct 5 09:27:32 Tower kernel: process_one_work+0x16e/0x24f
Oct 5 09:27:32 Tower kernel: worker_thread+0x1e2/0x2b8
Oct 5 09:27:32 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Oct 5 09:27:32 Tower kernel: kthread+0x10c/0x114
Oct 5 09:27:32 Tower kernel: ? kthread_park+0x89/0x89
Oct 5 09:27:32 Tower kernel: ret_from_fork+0x1f/0x40
-
It turned out to be the VLAN issue. I used a dedicated network card for VLANs and there are no more hangs!
-
Another hang today. I managed to copy over the syslog before it crashed. Any ideas? I don't know where to start. syslog
-
It seems the problem was that the A/C unit in the server room switched to Heat, lol. Although CPU temps weren't affected, I'm guessing it couldn't handle all the heat.
-
I will try safe mode tonight. From your experience, if it's a hardware problem, what might be recurring every 24 hours or so? For example, a PSU either works or it's fried and doesn't provide power. But that's not the case here: the server crashed after 24 hours of stability. Thanks a lot for all the help.
-
Any way I can troubleshoot further? Any tool or logging level?
-
Any possibility it might be a PSU or other hardware failure, like the motherboard or CPU?
-
I've always had custom IP addresses and never had any problems, so I don't think that's the issue here. Any other ideas?
-
This is the syslog. It froze around 4:12 AM. syslog-192.168.100.133.log
-
I will remove the Mellanox NIC and restart. Thank you!
-
And here are the syslog and diagnostics: syslog192.168.100.133.log tower-diagnostics-20200624-1212.zip