• Posts

    17
Everything posted by [email protected]

  1. unraid-diagnostics-20210103-2337.zip
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: [sds] tag#2942 CDB: opcode=0x88 88 00 00 00 00 00 02 65 f4 d0 00 00 04 00 00 00
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: handle(0x001d), sas_address(0x5001e67464d08fee), phy(14)
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: enclosure logical id(0x5001e67464d08fff), slot(14)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: task abort: SUCCESS scmd(00000000f342ddfc)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: attempting task abort! scmd(0000000036962349)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: [sds] tag#2941 CDB: opcode=0x88 88 00 00 00 00 00 02 65 f0 d0 00 00 04 00 00 00
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: handle(0x001d), sas_address(0x5001e67464d08fee), phy(14)
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: enclosure logical id(0x5001e67464d08fff), slot(14)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: task abort: SUCCESS scmd(0000000036962349)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: attempting task abort! scmd(000000006879ad81)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: [sds] tag#2940 CDB: opcode=0x88 88 00 00 00 00 00 02 65 ec d0 00 00 04 00 00 00
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: handle(0x001d), sas_address(0x5001e67464d08fee), phy(14)
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: enclosure logical id(0x5001e67464d08fff), slot(14)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: task abort: SUCCESS scmd(000000006879ad81)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: attempting task abort! scmd(000000004982ce18)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: [sds] tag#2939 CDB: opcode=0x88 88 00 00 00 00 00 02 65 e8 d0 00 00 04 00 00 00
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: handle(0x001d), sas_address(0x5001e67464d08fee), phy(14)
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: enclosure logical id(0x5001e67464d08fff), slot(14)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: task abort: SUCCESS scmd(000000004982ce18)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: attempting task abort! scmd(000000005e533f88)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: [sds] tag#2938 CDB: opcode=0x88 88 00 00 00 00 00 02 65 e4 d0 00 00 04 00 00 00
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: handle(0x001d), sas_address(0x5001e67464d08fee), phy(14)
     Jan 3 23:38:12 UNRAID kernel: scsi target7:0:18: enclosure logical id(0x5001e67464d08fff), slot(14)
     Jan 3 23:38:12 UNRAID kernel: sd 7:0:18:0: task abort: SUCCESS scmd(000000005e533f88)
     Jan 3 23:38:13 UNRAID kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jan 3 23:38:13 UNRAID rc.diskinfo[8872]: SIGHUP received, forcing refresh of disks info.
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:28 UNRAID kernel: mpt2sas_cm0: log_info(0x31120302): originator(PL), code(0x12), sub_code(0x0302)
     Jan 3 23:38:41 UNRAID kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jan 3 23:38:41 UNRAID rc.diskinfo[8872]: SIGHUP received, forcing refresh of disks info.
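For anyone trying to read the aborted commands in the log above, here is a minimal Python sketch that decodes the CDB bytes. It assumes the standard SCSI READ(16) layout (opcode 0x88, bytes 2-9 are the big-endian LBA, bytes 10-13 the transfer length in blocks). The function name `decode_read16` is made up for illustration and is not part of any Unraid tooling.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a READ(16) CDB given as space-separated hex bytes (hypothetical helper)."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "opcode": cdb[0],
        # Bytes 2..9: 64-bit logical block address, big-endian
        "lba": int.from_bytes(cdb[2:10], "big"),
        # Bytes 10..13: 32-bit transfer length in blocks, big-endian
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }


# CDBs copied from the tag#2942 and tag#2941 lines in the log
first = decode_read16("88 00 00 00 00 00 02 65 f4 d0 00 00 04 00 00 00")
second = decode_read16("88 00 00 00 00 00 02 65 f0 d0 00 00 04 00 00 00")
print(first)
print(second)
```

Decoded this way, the aborted commands are back-to-back 1024-block reads (each LBA is 0x400 below the previous one), which looks like ordinary sequential I/O being aborted rather than one specific bad sector. That is only a reading of the log, not a diagnosis.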
  2. The memory is ECC-corrected, so as IPMI isn't reporting anything in the log it should be OK. I can switch back to my old motherboard for the storage array and run a 30-day trial with Dockers on the Supermicro board.
  3. Rebuild in progress. It looks like the drives were not actually bad. I swapped some spares in for Disk 6 and Disk 11 and mapped these with Unassigned Devices, so there is no data loss if the rebuild fails; I would just have to move the data back to the array. I have isolated the problem to the SAS backplane, the SAS card, or the SAS cables. Drives with arms out are those that dropped out. More specifically, it is down to 2 SAS cables, the SAS card with only 2 ports, or the SC846 backplane. I removed the card and added another RES2SV240, and the entire setup is now running off the same 2-port LSI SAS card. I have had a pretty crazy setup for a while. Once I get everything stable, I will probably hook another computer up to these 2 SAS backplane ports and do some stress testing. I think I might just give up on the SC846 and mount 24-30 drives on a piece of plywood on the wall. Any ideas? unraid-diagnostics-20201221-2353.zip
  4. I am going to rip out these disks and figure out if they share a cable or something. It is an SC846; I doubt they all share the same SAS cable. I will temporarily install them outside the enclosure on SAS-to-SATA breakout cables.
     Disk 6: ST8000AS0002-1NA17Z_Z840NY2V - 8 TB (sdw)
     Disk 11: ST8000DM004-2CX188_WCT0DSW9 - 8 TB (sdr)
     sdu: ST14000NM001G-2KJ103_ZL22SBSY - 14 TB (sdu)
     sdt: WDC_WD140EDFZ-11A0VA0_9KGV6TSL - 14 TB (sdt)
     Disk 12: WDC_WD80EZAZ-11TDBA0_7HJTPT5F - 8 TB (sdv)
     Any suggestions on how to recover from this? There are 3 disks with errors.
     unraid-diagnostics-20201221-2209.zip new 1.txt
  5. Not quite sure if this is the correct implementation, but I used a second port on my server and moved the Dockers and VMs over. br0 is only for the server; it is 10 GbE. br5 is only for Dockers and VMs; it is redundant 1 Gb/s, and I will set it up as a LAG later.
  6. So it just happened again. I can't telnet in. IPMI is reporting the following events, which are weird. The screen has the following on it. The IPMI log Event 509 roughly corresponds with the system becoming unresponsive.
  7. Does this kernel stuff mean anything bad? System has been up for 2 hours.
     Dec 1 20:44:20 UNRAID kernel: R10: 0000000000000098 R11: ffff889818870000 R12: 000000000000cd45
     Dec 1 20:44:20 UNRAID kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 000000000000b64c
     Dec 1 20:44:20 UNRAID kernel: FS: 0000000000000000(0000) GS:ffff888c4f600000(0000) knlGS:0000000000000000
     Dec 1 20:44:20 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Dec 1 20:44:20 UNRAID kernel: CR2: 000056490c7dff78 CR3: 0000000001e0a001 CR4: 00000000001606f0
     Dec 1 20:44:20 UNRAID kernel: Call Trace:
     Dec 1 20:44:20 UNRAID kernel: <IRQ>
     Dec 1 20:44:20 UNRAID kernel: ipv4_confirm+0xaf/0xb9
     Dec 1 20:44:20 UNRAID kernel: nf_hook_slow+0x3a/0x90
     Dec 1 20:44:20 UNRAID kernel: ip_local_deliver+0xad/0xdc
     Dec 1 20:44:20 UNRAID kernel: ? ip_sublist_rcv_finish+0x54/0x54
     Dec 1 20:44:20 UNRAID kernel: ip_rcv+0xa0/0xbe
     Dec 1 20:44:20 UNRAID kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
     Dec 1 20:44:20 UNRAID kernel: __netif_receive_skb_one_core+0x53/0x6f
     Dec 1 20:44:20 UNRAID kernel: process_backlog+0x77/0x10e
     Dec 1 20:44:20 UNRAID kernel: net_rx_action+0x107/0x26c
     Dec 1 20:44:20 UNRAID kernel: __do_softirq+0xc9/0x1d7
     Dec 1 20:44:20 UNRAID kernel: do_softirq_own_stack+0x2a/0x40
     Dec 1 20:44:20 UNRAID kernel: </IRQ>
     Dec 1 20:44:20 UNRAID kernel: do_softirq+0x4d/0x5a
     Dec 1 20:44:20 UNRAID kernel: netif_rx_ni+0x1c/0x22
     Dec 1 20:44:20 UNRAID kernel: macvlan_broadcast+0x111/0x156 [macvlan]
     Dec 1 20:44:20 UNRAID kernel: ? __switch_to_asm+0x41/0x70
     Dec 1 20:44:20 UNRAID kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
     Dec 1 20:44:20 UNRAID kernel: process_one_work+0x16e/0x24f
     Dec 1 20:44:20 UNRAID kernel: worker_thread+0x1e2/0x2b8
     Dec 1 20:44:20 UNRAID kernel: ? rescuer_thread+0x2a7/0x2a7
     Dec 1 20:44:20 UNRAID kernel: kthread+0x10c/0x114
     Dec 1 20:44:20 UNRAID kernel: ? kthread_park+0x89/0x89
     Dec 1 20:44:20 UNRAID kernel: ret_from_fork+0x35/0x40
     Dec 1 20:44:20 UNRAID kernel: ---[ end trace 716184adcfbc56ef ]---
     Dec 1 22:36:54 UNRAID rpcbind[33908]: connect from 192.168.1.82 to getport/addr(mountd)
     Dec 1 22:36:54 UNRAID rpcbind[33909]: connect from 192.168.1.82 to getport/addr(mountd)
     Dec 1 22:36:54 UNRAID rpcbind[33910]: connect from 192.168.1.82 to getport/addr(mountd)
     Dec 1 22:36:54 UNRAID rpcbind[33911]: connect from 192.168.1.82 to getport/addr(mountd)
     Dec 1 22:36:59 UNRAID rpcbind[34030]: connect from 192.168.1.82 to getport/addr(mountd)
     Dec 1 22:36:59 UNRAID rpcbind[34031]: connect from 192.168.1.82 to getport/addr(mountd)
     Dec 1 22:36:59 UNRAID rpcbind[34032]: connect from 192.168.1.82 to getport/addr(mountd)
     Dec 1 22:36:59 UNRAID rpcbind[34033]: connect from 192.168.1.82 to getport/addr(mountd)
     unraid-diagnostics-20201201-2237.zip
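One quick way to see which loadable kernel modules a call trace like the one above passes through is to look for the `[module]` tag the kernel appends to a frame. The following is a rough Python sketch, assuming one syslog entry per line; `modules_in_trace` is a hypothetical helper name, and a module showing up in a trace does not by itself prove it is the cause.

```python
import re

# A frame from a loadable module ends with the module name in square brackets,
# e.g. "macvlan_broadcast+0x111/0x156 [macvlan]".
MODULE_RE = re.compile(r"\[(\w+)\]\s*$")


def modules_in_trace(lines):
    """Return the distinct module names tagged on call-trace frames, in order seen."""
    seen = []
    for line in lines:
        match = MODULE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Lines copied from the trace in the post above
sample = [
    "Dec 1 20:44:20 UNRAID kernel: netif_rx_ni+0x1c/0x22",
    "Dec 1 20:44:20 UNRAID kernel: macvlan_broadcast+0x111/0x156 [macvlan]",
    "Dec 1 20:44:20 UNRAID kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]",
    "Dec 1 20:44:20 UNRAID kernel: ---[ end trace 716184adcfbc56ef ]---",
    "Dec 1 22:36:54 UNRAID rpcbind[33908]: connect from 192.168.1.82 to getport/addr(mountd)",
]
print(modules_in_trace(sample))
```

On the lines shown here the only tagged module is macvlan, which is the driver in the network receive path of this trace.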
  8. I have been having some severe stability issues. The Unraid UI partially locks up and shows the 502 gateway error, among various others. I can't get it to shut down. This has been repeated over months. Sometimes Unraid won't last 48 hours; other times it will last a couple of weeks. I have had my cache drive go multiple times, and I assume the corruption was due to the lockups. Why is Unraid so unstable right now?
     /etc/rc.d/rc.nginx restart
     /etc/rc.d/rc.php-fpm restart
     This doesn't work. I can't get an SSH shutdown to work either. unraid-diagnostics-20201201-2023.zip
  9. unraid-diagnostics-20201115-1453.zip I am at a loss at this point, and a little stressed out. Disks 1-2 and 4-14 are recoverable. In theory I have no parity left, and the only way to recover Disk 3 is for a rebuild to complete successfully. sdd is the former Disk 4. I think I have some sort of bad I/O issue, so I am going to go check the cables, as I have had several drives drop out. The 10 TB drive I was rebuilding to has dropped out; it was to be the replacement Disk 3. I bought a 14 TB WD to be the replacement for Disk 4, but I can't add it to the array as it is too large. Any idea why my drives are dropping out? How do I replace Disk 3 and Disk 4 with 14 TB drives? I know that I should swap them to parity, but I don't want to do that now. sde and sdq both have SMART errors and are old; I never removed them. I don't really want to rebuild over sdd, as then I am putting that data at risk as well.
  10. Well, the backup and rebuild of the cache pool worked. Pain in the ass. This is the second time I have had to do it. Last time was due to an SSD biting the dust with a ton of errors. This time, I'm not so sure.
  11. I don't see any SMART failures in the pool. unraid-diagnostics-20201030-1035.zip
  12. I can't update my Dockers, and Plex stopped working. I think it can't write to the transcode folder. Based on my googling it is a BTRFS cache pool issue, but I am a little out of my element on how to fix it. I tried to do a shutdown, but it couldn't get past syncing, so I did an unclean shutdown and the cache drive was writable for a while. I deleted ~400 GB of stuff off of it, then started a parity check, a btrfs balance, and a btrfs scrub with "repair corrupted blocks" checked. A couple of hours later the issue was back.
     Balance status:
     Data, RAID1: total=742.00GiB, used=347.72GiB
     System, RAID1: total=32.00MiB, used=160.00KiB
     Metadata, RAID1: total=4.00GiB, used=1.23GiB
     GlobalReserve, single: total=369.34MiB, used=0.00B
     No balance found on '/mnt/cache'
     btrfs scrub status:
     UUID: f2db4662-24b4-4b09-8f87-932fdcd369dc
     Scrub started: Thu Oct 29 22:39:46 2020
     Status: aborted
     Duration: 0:19:05
     Total to scrub: 697.90GiB
     Rate: 228.39MiB/s
     Error summary: csum=2
     Corrected: 0
     Uncorrectable: 2
     Unverified: 0
     Oct 30 00:04:12 UNRAID kernel: loop: Write error at byte offset 587075584, length 4096.
     Oct 30 00:04:12 UNRAID kernel: print_req_error: I/O error, dev loop2, sector 1146632
     Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
     Oct 30 00:04:12 UNRAID kernel: loop: Write error at byte offset 581763072, length 4096.
     Oct 30 00:04:12 UNRAID kernel: print_req_error: I/O error, dev loop2, sector 1136256
     Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
     Oct 30 00:04:12 UNRAID kernel: loop: Write error at byte offset 583282688, length 4096.
     Oct 30 00:04:12 UNRAID kernel: print_req_error: I/O error, dev loop2, sector 1139224
     Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
     Oct 30 00:04:12 UNRAID kernel: loop: Write error at byte offset 587075584, length 4096.
     Oct 30 00:04:12 UNRAID kernel: print_req_error: I/O error, dev loop2, sector 1146632
     Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
     Oct 30 00:04:12 UNRAID kernel: loop: Write error at byte offset 581763072, length 4096.
     Oct 30 00:04:12 UNRAID kernel: print_req_error: I/O error, dev loop2, sector 1136256
     Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
     Oct 30 00:04:12 UNRAID kernel: loop: Write error at byte offset 587075584, length 4096.
     Oct 30 00:04:12 UNRAID kernel: print_req_error: I/O error, dev loop2, sector 1146632
     Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
     Oct 30 00:04:12 UNRAID kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2267: errno=-5 IO failure (Error while writing out transaction)
     unraid-syslog-20201030-1551.zip
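To keep track of which device the BTRFS error counters belong to, a small Python sketch like the one below can pull the latest `errs:` counters per device out of a syslog. It assumes the message format matches the lines quoted above; `latest_error_counters` is a hypothetical helper, not a btrfs or Unraid command.

```python
import re

# Matches e.g. "bdev /dev/loop2 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_error_counters(lines):
    """Return {device: {counter: value}} using the last matching line per device."""
    latest = {}
    for line in lines:
        match = ERRS_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


# Lines copied from the syslog excerpt in the post above
sample = [
    "Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0",
    "Oct 30 00:04:12 UNRAID kernel: print_req_error: I/O error, dev loop2, sector 1146632",
    "Oct 30 00:04:12 UNRAID kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0",
]
counters = latest_error_counters(sample)
print(counters)
```

In this excerpt every counter line is for /dev/loop2, which is a loop device (a file mounted as a block device) rather than one of the pool's physical drives, and only the write counter is climbing.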