
Alex.vision

Members
  • Content Count

    132

Community Reputation

0 Neutral

About Alex.vision

  • Rank
    Advanced Member
  • Birthday 12/21/1983

  • Gender
    Male
  • Location
    Friday Harbor, WA
  • Personal Text
    Unraid1 176TB. Unraid2 136TB. Unraid3 54TB.

Recent Profile Visitors

751 profile views
  1. OK, I'll put this on hold for now. I'm going to be out of action for the next two weeks. When I get back, I'll swap all the parts with my duplicate system and see if I get the same errors. Then hopefully I can narrow down the problem one component at a time. Thanks @johnnie.black for the assistance!! -Alex
  2. Before I can change out all of the hardware to test the above hypothesis, I'm trying to duplicate the data to another server. I start the system, initiate a transfer, and see how much data can be pulled before it crashes. I just looked at my transfer and noticed it had stopped, so I logged into the web page, which displayed. I got three notifications about plugin updates, and when I clicked the Plugins tab, the page refreshed to a blank screen with the header still showing and the Chrome busy icon in the tab. I quickly opened a terminal and tried to pull info from the syslog:

     root@Media:~# tail -f /var/log/syslog
     Mar 4 18:56:09 Media kernel: start_secondary+0x197/0x1b2
     Mar 4 18:56:09 Media kernel: secondary_startup_64+0xa4/0xb0
     Mar 4 18:56:32 Media kernel: sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00
     Mar 4 18:56:32 Media kernel: sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 00 00 08 01 00 00 01 00
     Mar 4 18:56:32 Media kernel: print_req_error: I/O error, dev sda, sector 2049
     Mar 4 18:56:32 Media kernel: Buffer I/O error on dev sda1, logical block 1, lost async page write
     Mar 4 18:56:32 Media kernel: sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00
     Mar 4 18:56:32 Media kernel: sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 00 00 17 91 00 00 01 00
     Mar 4 18:56:32 Media kernel: print_req_error: I/O error, dev sda, sector 6033
     Mar 4 18:56:32 Media kernel: Buffer I/O error on dev sda1, logical block 3985, lost async page write
     Mar 4 19:11:09 Media kernel: RDX: 0000000000000000 RSI: 0000000021bf5b50 RDI: 0000000000000000
     Mar 4 19:11:09 Media kernel: RBP: 00000589298a6770 R08: 00000589298a6770 R09: 000000000000573e
     Mar 4 19:11:09 Media kernel: R10: 000000008f3bab38 R11: 071c71c71c71c71c R12: 0000000000000002
     Mar 4 19:11:09 Media kernel: R13: ffffffff81e5e2a0 R14: 0000000000000000 R15: ffffffff81e5e378
     Mar 4 19:11:09 Media kernel: ? cpuidle_enter_state+0xbf/0x141
     Mar 4 19:11:09 Media kernel: do_idle+0x17e/0x1fc
     Mar 4 19:11:09 Media kernel: cpu_startup_entry+0x6a/0x6c
     Mar 4 19:11:09 Media kernel: start_secondary+0x197/0x1b2
     Mar 4 19:11:09 Media kernel: secondary_startup_64+0xa4/0xb0
     Mar 4 19:13:09 Media login[15244]: ROOT LOGIN on '/dev/pts/1'
     Mar 4 19:14:09 Media kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
     Mar 4 19:14:09 Media kernel: rcu: 4-...0: (0 ticks this GP) idle=cd2/1/0x4000000000000000 softirq=314663/314663 fqs=814935
     Mar 4 19:14:09 Media kernel: rcu: 11-...0: (2 GPs behind) idle=98e/0/0x1 softirq=365175/365176 fqs=814935
     Mar 4 19:14:09 Media kernel: rcu: (detected by 8, t=3300092 jiffies, g=1434025, q=713807)
     Mar 4 19:14:09 Media kernel: Sending NMI from CPU 8 to CPUs 4:
     Mar 4 19:14:09 Media kernel: NMI backtrace for cpu 4
     Mar 4 19:14:09 Media kernel: CPU: 4 PID: 15614 Comm: unraidd7 Tainted: G O 4.19.98-Unraid #1
     Mar 4 19:14:09 Media kernel: Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5406 11/13/2019
     Mar 4 19:14:09 Media kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x11e/0x171
     Mar 4 19:14:09 Media kernel: Code: 48 03 04 cd 20 37 db 81 48 89 10 8b 42 08 85 c0 75 04 f3 90 eb f5 48 8b 0a 48 85 c9 74 c9 0f 0d 09 8b 07 66 85 c0 74 04 f3 90 <eb> f5 41 89 c0 66 45 31 c0 44 39 c6 74 0a 48 85 c9 c6 07 01 75 1b
     Mar 4 19:14:09 Media kernel: RSP: 0018:ffffc9000bd4fe80 EFLAGS: 00000002
     Mar 4 19:14:09 Media kernel: RAX: 0000000000140101 RBX: ffffc9000bd4fec0 RCX: 0000000000000000
     Mar 4 19:14:09 Media kernel: RDX: ffff88840e720740 RSI: 0000000000140000 RDI: ffff88840bfe8498
     Mar 4 19:14:09 Media kernel: RBP: ffff88840bfe8498 R08: 000000000000029c R09: 0000000000000000
     Mar 4 19:14:09 Media kernel: R10: 0000000000000020 R11: ffff88840e71fb40 R12: 0000000000000246
     Mar 4 19:14:09 Media kernel: R13: ffff88840bfe8498 R14: ffff8883d4c81b00 R15: ffffc9000bc0faf0
     Mar 4 19:14:09 Media kernel: FS: 0000000000000000(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000
     Mar 4 19:14:09 Media kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Mar 4 19:14:09 Media kernel: CR2: 00001460ca2b2000 CR3: 0000000001e0a000 CR4: 0000000000340ee0
     Mar 4 19:14:09 Media kernel: Call Trace:
     Mar 4 19:14:09 Media kernel: _raw_spin_lock_irqsave+0x29/0x31
     Mar 4 19:14:09 Media kernel: prepare_to_wait_event+0x13/0xd2
     Mar 4 19:14:09 Media kernel: md_thread+0x8f/0x115 [md_mod]
     Mar 4 19:14:09 Media kernel: ? wait_woken+0x6a/0x6a
     Mar 4 19:14:09 Media kernel: ? md_open+0x2c/0x2c [md_mod]
     Mar 4 19:14:09 Media kernel: kthread+0x10c/0x114
     Mar 4 19:14:09 Media kernel: ? kthread_park+0x89/0x89
     Mar 4 19:14:09 Media kernel: ret_from_fork+0x22/0x40
     Mar 4 19:14:09 Media kernel: Sending NMI from CPU 8 to CPUs 11:
     Mar 4 19:14:09 Media kernel: NMI backtrace for cpu 11
     Mar 4 19:14:09 Media kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: G O 4.19.98-Unraid #1
     Mar 4 19:14:09 Media kernel: Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5406 11/13/2019
     Mar 4 19:14:09 Media kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6b/0x171
     Mar 4 19:14:09 Media kernel: Code: 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02 00 65 48 03 15 48 6d f8
     Mar 4 19:14:09 Media kernel: RSP: 0018:ffff88840e8c3c80 EFLAGS: 00000002
     Mar 4 19:14:09 Media kernel: RAX: 0000000000140101 RBX: ffff88840bfe8498 RCX: 0000000000000000
     Mar 4 19:14:09 Media kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88840bfe8498
     Mar 4 19:14:09 Media kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: ffff88840e8c3ca0
     Mar 4 19:14:09 Media kernel: R10: 0000000000000000 R11: ffff88840e71fb40 R12: 0000000000000046
     Mar 4 19:14:09 Media kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
     Mar 4 19:14:09 Media kernel: FS: 0000000000000000(0000) GS:ffff88840e8c0000(0000) knlGS:0000000000000000
     Mar 4 19:14:09 Media kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Mar 4 19:14:09 Media kernel: CR2: 00001460ca290000 CR3: 0000000001e0a000 CR4: 0000000000340ee0
     Mar 4 19:14:09 Media kernel: Call Trace:
     Mar 4 19:14:09 Media kernel: <IRQ>
     Mar 4 19:14:09 Media kernel: _raw_spin_lock_irqsave+0x29/0x31
     Mar 4 19:14:09 Media kernel: __wake_up_common_lock+0x5b/0xcb
     Mar 4 19:14:09 Media kernel: end_request+0x178/0x18e [md_mod]
     Mar 4 19:14:09 Media kernel: blk_update_request+0x114/0x21e
     Mar 4 19:14:09 Media kernel: scsi_end_request+0x29/0x203
     Mar 4 19:14:09 Media kernel: scsi_io_completion+0x27c/0x4fa
     Mar 4 19:14:09 Media kernel: blk_mq_complete_request+0xea/0xef
     Mar 4 19:14:09 Media kernel: _scsih_io_done+0x6c5/0x6d7 [mpt3sas]
     Mar 4 19:14:09 Media kernel: ? load_balance+0x124/0x713
     Mar 4 19:14:09 Media kernel: ? __accumulate_pelt_segments+0x1d/0x2c
     Mar 4 19:14:09 Media kernel: ? __update_load_avg_se+0xeb/0x19c
     Mar 4 19:14:09 Media kernel: _base_interrupt+0x1aa/0xe0a [mpt3sas]
     Mar 4 19:14:09 Media kernel: __handle_irq_event_percpu+0x36/0xcb
     Mar 4 19:14:09 Media kernel: handle_irq_event_percpu+0x2c/0x6f
     Mar 4 19:14:09 Media kernel: handle_irq_event+0x34/0x51
     Mar 4 19:14:09 Media kernel: handle_edge_irq+0xfc/0x11f
     Mar 4 19:14:09 Media kernel: handle_irq+0x1c/0x1f
     Mar 4 19:14:09 Media kernel: do_IRQ+0x46/0xd0
     Mar 4 19:14:09 Media kernel: common_interrupt+0xf/0xf
     Mar 4 19:14:09 Media kernel: </IRQ>
     Mar 4 19:14:09 Media kernel: RIP: 0010:cpuidle_enter_state+0xe8/0x141
     Mar 4 19:14:09 Media kernel: Code: ff 45 84 f6 74 1d 9c 58 0f 1f 44 00 00 0f ba e0 09 73 09 0f 0b fa 66 0f 1f 44 00 00 31 ff e8 a8 8f bb ff fb 66 0f 1f 44 00 00 <48> 2b 2c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00 48 39 cd
     Mar 4 19:14:09 Media kernel: RSP: 0018:ffffc90001a0be98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
     Mar 4 19:14:09 Media kernel: RAX: ffff88840e8dfac0 RBX: ffff888408eb0c00 RCX: 000000000000001f
     Mar 4 19:14:09 Media kernel: RDX: 0000000000000000 RSI: 0000000021bf5b50 RDI: 0000000000000000
     Mar 4 19:14:09 Media kernel: RBP: 00000589298a6770 R08: 00000589298a6770 R09: 000000000000573e
     Mar 4 19:14:09 Media kernel: R10: 000000008f3bab38 R11: 071c71c71c71c71c R12: 0000000000000002
     Mar 4 19:14:09 Media kernel: R13: ffffffff81e5e2a0 R14: 0000000000000000 R15: ffffffff81e5e378
     Mar 4 19:14:09 Media kernel: ? cpuidle_enter_state+0xbf/0x141
     Mar 4 19:14:09 Media kernel: do_idle+0x17e/0x1fc
     Mar 4 19:14:09 Media kernel: cpu_startup_entry+0x6a/0x6c
     Mar 4 19:14:09 Media kernel: start_secondary+0x197/0x1b2
     Mar 4 19:14:09 Media kernel: secondary_startup_64+0xa4/0xb0

     I don't know if this helps narrow down the cause of the lockup. I checked: I can still see the Overview page, but I have no SMB access. I took a picture of the Overview page and attached it. All of the live CPU stats are frozen. I also looked up the device listed in the errors above, sda1; sda is my flash drive. I'm at a loss for now. Thanks for the help!!
  3. Hmm, OK. I have an identical system. If that seems like a viable option, I can try swapping all my drives over to it to narrow the problem down to either faulty hardware or a hardware incompatibility.
  4. Oh, right, I forgot about that. Sometimes it's hard to remember that your internet can go down. I have a dedicated gigabit synchronous fiber line that hasn't gone down in over a year, I forget that it is uncommon.
  5. It may be a bit overkill, but I use Duo on a few servers. It can push a login authentication request to your device and let you approve it. It also supports rotating 2FA codes, but I like the seamlessness of tapping "Approve" on my smartwatch instead of getting out my phone, opening LastPass Authenticator or Google Auth, and then typing in the code. Duo, or 2FA in general, isn't for everyone, but my vote would be to at least have it as an option.
  6. OK, round two. I made the suggested changes from above, and I also swapped my Marvell-based HBA for an LSI model. I also upgraded to version 6.8.2, with the same results. I had thought the problem might have been solved, but for the past week and a half my server has really struggled to maintain any uptime. It locks up constantly, but I think I finally managed to get a log file that might help. In my last attempt to fix the problem myself, I erased my flash drive and started over, importing just a few settings from the old one, thinking something on it might be causing problems. Apparently not. I'm uploading two log files: one from mid-lockup and one from after I hard reset the system. I can see a bunch of information starting on line 2810, but I don't know enough about Linux to say whether it really is a bug or something else. I really hope this is something I can fix.
     alex.vision (crash) media-syslog-20200227-0511.zip
     alex.vision (next boot) media-syslog-20200227-1636.zip
  7. Well, it has been a few days, and I have been walking on eggshells when it comes to running my server. Either turning off the RAM overclock or changing the Power Supply Idle Control setting seems to have remedied my random reboot issue. Thank you @johnnie.black and @Squid for the troubleshooting help. I have 2 days and 16 hours of uptime, which is way better than it has been in the past. I still have Docker disabled, but I'll work things back in one step at a time. A functioning system minus a few features is way better than no system at all. Thanks for the help!!
  8. @johnnie.black Thanks for the help. I'll change the RAM settings and look for the power setting when I get done with work. I would have done it this morning, but I wanted to let Memtest run for a good 24 hours. Could you tell me where that RAM chart came from? I guess I thought that because it was 3600-speed RAM it wasn't really overclocked, even though I had to use XMP or DOCP. I think I remember the old days when the box said 2667, 3200 (OC), 3600 (OC). I'll do some googling on the "typical current idle" setting too. Thanks
  9. Ah, OK. I'll reboot and start a Memtest cycle.
  10. Hello fellow UnRaiders,

      Issue
      My machine locks up randomly and never runs for more than 2 or 3 days. I originally thought it was related to my Pi-hole Docker container, since the problem seemed to occur more frequently when I was running Docker. However, with Docker disabled my machine still crashes. I can't ping the system or reach any share. When I had the GUI enabled, it would be completely frozen at the login screen. A few times I was logged into the system running htop full screen, and that was frozen too, requiring a hard reset. System stability is so bad that I can start the computer in the morning and within a few hours it has locked up.

      Machine Specs
        • MB: ASUS Prime X470-Pro (BIOS version 5406)
        • CPU: AMD Ryzen 5 3600X (stock speeds, stock cooler)
        • RAM: G.SKILL Ripjaws V Series 16GB (2 x 8GB) 288-Pin DDR4 SDRAM DDR4 3600
        • PSU: CORSAIR RM Series, RM850, 850 Watt
        • HBA: AOC-SAS2LP-MV8 (I know it's a Marvell-based card; I'm swapping it for another LSI card I have)
        • External DAS card: LSI SAS 9212-4I4E, single-connected to a 16-bay DAS
        • Drives: 22 8TB data drives, 1 8TB parity drive, and 1 8TB cache drive; mostly Seagate, some Western Digital
        • GPU: Gigabyte GV-R523D3-1GL REV2.0 graphics card

      Attempted Fixes
      I ran in safe mode with no plugins, Docker disabled, and VMs disabled. I disabled HT in the BIOS and disabled any extra motherboard ports, like serial and floppy support. I ran without the array started. I have run in GUI mode, GUI Safe Mode, no GUI, and headless. I have run Fix Common Problems, and other than the complaint about the Marvell card, all seems well; there are no obvious issues. I haven't been able to run Memtest86 on this new board: for some reason it won't launch, and the machine instantly reboots after I select it in the Unraid boot menu.

      Unraid Versions
      I don't keep the best records of which versions work and which have problems, and I can't remember what version I was on when this problem started after I built the machine. I do know I'm on 6.8.2 as of today; I was on 6.7.2 and worked my way up through each RC version to 6.8.2. I had crashes on every version, but I only moved to 6.8.2 today, so it may or may not last. If (when) I crash on 6.8.2, I will post that syslog too.

      Final Thoughts
      I'm attaching two photos of what displays on my monitor when the system locks up. I'm also attaching two diagnostic files: one from just after rebooting from a lockup (6.8.1), and one from after my update to 6.8.2. I'm not expecting either log to have any big revelations; I think the photos of the screen might point me in a better direction if I knew what they meant. I'm at a complete loss here and would love it if someone could help me diagnose my problem.
      Alex.Vision Log 6.8.1 post Crash 1.zip
      Alex.Vision Log 6.8.2.zip
  11. Oh, OK. Ouch, these 8TB drives take a while, but I guess that's the cost of doing business with large drives. Thanks for the help, it is greatly appreciated. Now I have to figure out why the thing crashed in the first place. On to the next mission....
  12. I guess I missed the part about it being a different log. My wife set down my dinner right as I was typing this up, so steak and potatoes made me read a bit faster than usual. I knew I was going to miss something. I so rarely have issues that when one comes around I lose basic sense and become "that guy who can't RTFM". I've attached the log you requested.
      MEDIA-preclear.disk-20181009-0749.zip
  13. I'm experiencing a slight problem with gfjardim's preclear plugin. I was in the middle of a 3-cycle preclear run on 2x 8TB drives when my server locked up and both the webUI and the local GUI were completely frozen. I was forced to hard reset the server. After coming back online and restarting my array, I was able to resume both preclear runs. One of them resumed right where it left off. The second drive seems to be stuck at "starting", but nothing is happening. That was over 12 hours ago; I left it overnight and while I was at work, just in case. The preclear log for that run seems to be stuck at " Type Yes to proceed:". I'm running unRaid 6.5.2 and preclear version 2018.09.20. I know there are updates to both, but since I was in the middle of clearing these drives, I didn't want to change anything and have to start the process over. Any thoughts on what I can do to get this preclear to resume from where it left off, or do I need to start over? Can I update the preclear plugin while a preclear is running on a drive? My first reaction was that I'd better not. Thanks for any help.
      syslog.txt
  14. I'm not sure if anyone noticed, but the picture you uploaded of your Asus router shows you using 4-6MB/s. If your camera traffic really is only that much and is traveling over the router, then it's hardly any traffic at all. I may be missing something, but it sounds like a different issue. I have a few cameras on my network, and I have their settings turned down quite a bit so they use almost no bandwidth. If your bit rate and fps are low enough, you can mitigate the camera bandwidth. I would take a top-down view of your network segments, specifically the wiring from the file server to the backup server, the server to the media player, and the cameras to the DVR computer. If any of these run on the same cable, or on the same switch fed from the same cable, you could run into issues. I just finished transferring about 80TB of data from my Unraid server through my network to my backup computer at the other end, all while streaming videos on my computer and my wife's. I have a single cable going to my server room switch, and all the traffic from all of my clients travels down that one wire, so I would think I should have experienced something similar. I'm wondering if it was not a network traffic issue but a disk I/O problem. Perhaps the media you were streaming was on one of the drives being hammered by the file transfer. I'm curious to find a solution. I'm late to this party if you're already ordering new network cards, but I felt obligated to discuss some of the basic things I saw. It is past my bedtime, though, so maybe I'm seeing things that aren't there.
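To put rough numbers on the bandwidth point in the last post, here is a small sketch. It assumes the router figure really is megabytes per second (as written, 4-6MB/s), decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), a nominal 1000 Mbit/s gigabit link, and no protocol overhead. The 110 MB/s sustained rate used for the 80TB estimate is an illustrative assumption, not a measured figure from the thread.

```python
# Rough bandwidth arithmetic for a gigabit home network.
# Assumptions: decimal units, nominal 1000 Mbit/s link, no protocol overhead.

GIGABIT_MBPS = 1000  # nominal gigabit Ethernet line rate, in megabits/s


def mbytes_to_mbits(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


def link_utilization(mb_per_s: float, link_mbps: float = GIGABIT_MBPS) -> float:
    """Fraction of a link's nominal capacity consumed by a stream of mb_per_s."""
    return mbytes_to_mbits(mb_per_s) / link_mbps


def transfer_hours(terabytes: float, mb_per_s: float) -> float:
    """Hours needed to move `terabytes` of data at a sustained `mb_per_s`."""
    megabytes = terabytes * 1_000_000
    return megabytes / mb_per_s / 3600


if __name__ == "__main__":
    # The 4-6MB/s camera/router figure from the post.
    for rate in (4, 6):
        print(f"{rate} MB/s = {mbytes_to_mbits(rate):.0f} Mbit/s "
              f"= {link_utilization(rate):.1%} of a gigabit link")
    # 80TB at an ASSUMED sustained 110 MB/s (near gigabit's practical ceiling).
    print(f"80 TB at 110 MB/s ~ {transfer_hours(80, 110) / 24:.1f} days")
```

By this estimate, 6 MB/s is 48 Mbit/s, under 5% of a gigabit link, which supports the point that the camera traffic alone is unlikely to saturate the network. That leaves shared cabling or disk I/O contention as the more plausible suspects.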