
Tydell

Members
  • Posts

    32
  • Joined

  • Last visited

Posts posted by Tydell

  1. @cambriancatalyst Did you ever solve this?

     

    I think I solved mine. The syslog errors have gone away and I think bluetooth remains functional in my home assistant instance.

     

    Looks like the syslog flooding was a bug in bluez that they remedied with 5.64. The repos ich777 uses for un-get don't yet have anything above 5.63, but I found a Slackware repository with 5.64 (there are higher versions than that, but I didn't want to push my luck) and added it to the sources.list file for un-get. I then ran 'un-get update' and 'un-get upgrade' to upgrade bluez from 5.63 to 5.64, and restarted bluez.

     

    In detail:

    Add this line to Flash/config/plugins/un-get/sources.list: https://slackware.uk/slackware/slackware64-15.0/patches/ patches

    Terminal into Unraid & Run: 'un-get update' then 'un-get upgrade'

    Force bluez/bluetoothd to stop: sudo pkill bluetoothd

    Restart bluez: /etc/rc.d/rc.bluetooth start
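
    For anyone who'd rather script it, here is a rough sketch of the same steps. The function names are made up for illustration, and the /boot path in the comment is an assumption (the flash drive being mounted at /boot). Nothing runs until you call the functions yourself.

```shell
#!/bin/sh
# Sketch only: the steps from the post wrapped in functions. Nothing runs until called.

# Append the repository line to un-get's sources.list unless it is already there.
# $1 = path to sources.list (assumption: the flash drive is mounted at /boot, so
#      /boot/config/plugins/un-get/sources.list)
add_patches_repo() {
    sources_file=$1
    repo_line='https://slackware.uk/slackware/slackware64-15.0/patches/ patches'
    if grep -qxF "$repo_line" "$sources_file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    # Make sure the existing file ends with a newline before appending.
    if [ -s "$sources_file" ] && [ -n "$(tail -c 1 "$sources_file")" ]; then
        printf '\n' >> "$sources_file"
    fi
    printf '%s\n' "$repo_line" >> "$sources_file"
}

# Upgrade bluez and restart the daemon with the same commands as in the post.
upgrade_and_restart_bluez() {
    un-get update
    un-get upgrade
    sudo pkill bluetoothd
    /etc/rc.d/rc.bluetooth start
}
```

    Usage would be something like: add_patches_repo /boot/config/plugins/un-get/sources.list && upgrade_and_restart_bluez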

  2. For what it's worth, I'm running into this as well.

     

    I installed bluez via un-get to expose Bluetooth to Home Assistant, then started Bluetooth via "/etc/rc.d/rc.bluetooth start", and now my syslog is being flooded with roughly 10 entries per second like this:

     

    Quote

    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
    Nov  1 08:28:04 Unraid bluetoothd[21613]: src/adv_monitor.c:btd_adv_monitor_offload_supported() Manager is NULL, get offload support failed
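
    To put a number on the flooding, a small sketch like this could tally the message per second. The function name is hypothetical, and the syslog path you pass in is up to you (I'm assuming /var/log/syslog on Unraid).

```shell
# Print "<count> <timestamp>" for each second in which the message was logged,
# in the order the timestamps first appear.
# $1 = path to a syslog file (assumption: /var/log/syslog on Unraid)
count_adv_monitor_flood() {
    awk -v msg='btd_adv_monitor_offload_supported() Manager is NULL' '
        index($0, msg) {
            key = $1 " " $2 " " $3              # month, day, HH:MM:SS
            if (!(key in seen)) order[++n] = key
            seen[key]++
        }
        END { for (i = 1; i <= n; i++) print seen[order[i]], order[i] }
    ' "$1"
}
```

    Running it before and after the bluez upgrade should show whether the rate actually dropped.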

     

  3. Lately my Unraid server has been locking up completely. It drops off the network and is entirely unreachable. A connected keyboard, monitor, and mouse are unresponsive when this happens, so there's no local console either. The only way for me to recover is to force a reboot. An investigation of my logs hasn't yielded much. I've set my logs to copy to my flash drive to get historical information, and there aren't any events that I can discern around the time of the freeze. The logs just kick back in at startup. This server's been pretty solid the last couple of years, so I'm not sure what's going on! Any help shedding light on the potential cause would be hugely appreciated!

    diagnostics-20230302-2348.zip

  4. 1 hour ago, dopeytree said:

    So after you delete your SSL certs. 

    Unraid seems to turn SSL off. 

    So you need to head back to 'Management Access' under settings. 

    Then turn SSL back on. I use 'YES' rather than 'STRICT' to avoid this issue happening again.

    Then down a bit more click 'provision' and it will create the new SSL Certs.

    I don't know about anyone else, but after deleting my certs yesterday, turning SSL back on, and re-provisioning certs, I still can only get in via http, not https.

  5. Same thing happened to me today. Ultimately I got back in by SSHing in and deleting the certs on the flash drive (ssh into server, navigate to /boot/config/ssl and rename/delete the certs folder), then reboot (sbin/reboot). After doing that, I was able to get to the http ui.  Upon redoing my certs (settings/management access) and setting USE SSL/TLS to yes, I'm still unable to use ssl to get to my server. It redirects to port 80 and uses a self-signed cert, not the letsencrypt cert. The WAN Port check also fails even though I definitely have port 443 forwarded to my server. Version 6.10.2 if it matters.

    Also, I didn't proactively install/update anything recently, though an autoupdate of something certainly could have happened.  I've also not made any changes to my network setup either.  To my knowledge, I didn't make any changes to my server or network leading up to this.

    FWIW I have NerdPack installed. Don't remember why I have it tbh.
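
    In case it helps anyone, the "rename the certs folder" step above could be sketched like this. It renames rather than deletes, so the old certs can be put back. The function name and the backup naming scheme are my own invention, not anything Unraid provides.

```shell
# Rename <ssl_dir>/certs to certs.bak-<timestamp> and print the new path.
# $1 = ssl directory (the post uses /boot/config/ssl)
set_aside_certs() {
    ssl_dir=$1
    if [ -d "$ssl_dir/certs" ]; then
        backup="$ssl_dir/certs.bak-$(date +%Y%m%d-%H%M%S)"
        mv "$ssl_dir/certs" "$backup" && printf '%s\n' "$backup"
    else
        printf 'no certs folder under %s\n' "$ssl_dir" >&2
        return 1
    fi
}
```

    After something like set_aside_certs /boot/config/ssl, a reboot is still needed, as described above.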

  6. Same thing happened to me today. Ultimately I got back in by SSHing in and deleting the certs on the flash drive (ssh into server, navigate to /boot/config/ssl and rename/delete the certs folder), then reboot (sbin/reboot). After doing that, I was able to get to the http ui.  Upon redoing my certs (settings/management access) and setting USE SSL/TLS to yes, I'm still unable to use ssl to get to my server. It redirects to port 80 and uses a self-signed cert, not the letsencrypt cert. The WAN Port check also fails even though I definitely have port 443 forwarded to my server. Version 6.10.2 if it matters.

    Also, I didn't proactively install/update anything recently, though an autoupdate of something certainly could have happened.  I've also not made any changes to my network setup either.  To my knowledge, I didn't make any changes to my server or network leading up to this.

    FWIW I have NerdPack installed. Don't remember why I have it tbh.

     

  7. Not sure if related or not, but today I suddenly stopped being able to get to the webgui as well.  Shares and apps all worked still without issue. Navigating to http://myserverip resulted in an nginx error.  https was giving me a dns error as it was trying to go to the custom myservers url with my hash in it.  Ultimately I got back in by SSHing in and deleting the certs on the flash drive (ssh into server, navigate to /boot/config/ssl and rename/delete the certs folder), then reboot (sbin/reboot). After doing that, I was able to get to the http ui.  Upon redoing my certs (settings/management access) and setting USE SSL/TLS to yes, I'm still unable to use ssl to get to my server. It redirects to port 80 and uses a self-signed cert, not the letsencrypt cert. The WAN Port check also fails even though I definitely have port 443 forwarded to my server. Version 6.10.2 if it matters.

    EDIT: Also, I didn't proactively install/update anything recently, though an autoupdate of something certainly could have happened.  I've also not made any changes to my network setup either.  To my knowledge, I didn't make any changes to my server or network leading up to this.

    EDIT2: FWIW I have NerdPack installed. Don't remember why I have it tbh.

  8. Just wanted to revisit this and report that I disabled spindown for a few days, and after doing so there were no more errors. I tested further by re-enabling spindown, letting the disk spin down, then initiating a transfer to the array. Since this disk was nearly empty, the transfer went to this drive, and it immediately threw all sorts of errors. I then disabled spindown on the disk again, and since then I've had no issues. For that reason, I'm going to conclude that this disk just doesn't play nicely with spinning down. Thanks @Vr2Io!

  9. I've recently made some changes to my Unraid server. I used to have an HBA with a 16-port SAS expander, and also used onboard SATA ports to fill out my 24 drive bays. I recently purchased and installed a 24-port SAS expander to eliminate the need for the onboard SATA ports, as they seemed flaky/inconsistent. I also installed a new 8TB drive. Unraid is now reporting errors on that drive, and I'm also receiving parity errors on it. I've checked all the physical connections and have reseated the SATA cable on the backplane. Is there anything else I might check? Here is what the recent errors look like:

    Quote

    Feb 4 01:00:21 TyRaid kernel: blk_update_request: I/O error, dev sdaa, sector 2937188552 op 0x0:(READ) flags 0x0 phys_seg 79 prio class 0
    Feb 4 01:00:21 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1522 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00 cmd_age=0s
    Feb 4 01:00:21 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1522 CDB: opcode=0x88 88 00 00 00 00 00 57 54 3c e0 00 00 00 38 00 00
    Feb 4 01:00:21 TyRaid kernel: blk_update_request: I/O error, dev sdaa, sector 1465138400 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
    Feb 4 01:00:21 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1521 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00 cmd_age=14s
    Feb 4 01:00:21 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1521 CDB: opcode=0x88 88 00 00 00 00 00 57 54 29 40 00 00 00 10 00 00
    Feb 4 01:00:21 TyRaid kernel: blk_update_request: I/O error, dev sdaa, sector 1465133376 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
    Feb 4 01:00:21 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1524 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00 cmd_age=0s
    Feb 4 01:00:21 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1524 CDB: opcode=0x88 88 00 00 00 00 00 ae a8 52 70 00 00 00 10 00 00
    Feb 4 01:00:21 TyRaid kernel: blk_update_request: I/O error, dev sdaa, sector 2930266736 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
    Feb 4 01:33:15 TyRaid emhttpd: spinning down /dev/sdaa
    Feb 4 01:58:42 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1525 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
    Feb 4 01:58:46 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1504 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00 cmd_age=0s
    Feb 4 01:58:46 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1504 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
    Feb 4 01:58:46 TyRaid emhttpd: read SMART /dev/sdaa
    Feb 4 01:58:46 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1518 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00 cmd_age=19s
    Feb 4 01:58:46 TyRaid kernel: sd 8:0:25:0: [sdaa] tag#1518 CDB: opcode=0x88 88 00 00 00 00 00 00 7b 83 d0 00 00 00 08 00 00
    Feb 4 01:58:46 TyRaid kernel: blk_update_request: I/O error, dev sdaa, sector 8094672 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    Feb 4 02:35:48 TyRaid emhttpd: spinning down /dev/sdaa
    Feb 4 14:02:23 TyRaid emhttpd: read SMART /dev/sdaa
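
    A quick way to see whether the errors are confined to one device might be a tally like the one below. This is only a sketch: the function name is hypothetical, and it only looks at "blk_update_request: I/O error" lines in the format quoted above.

```shell
# Print "<device> <count>" for every device with blk_update_request I/O errors.
# $1 = path to a syslog file
count_io_errors() {
    awk '
        /blk_update_request: I\/O error, dev / {
            for (i = 1; i < NF; i++)
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)      # strip the trailing comma after the name
                    errors[dev]++
                }
        }
        END { for (d in errors) print d, errors[d] }
    ' "$1" | sort
}
```

    If only sdaa shows up, that points at the drive or its path (cable, backplane slot, expander port) rather than the controller as a whole.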

     

    Any ideas as to what I might check next?  I super appreciate any help!

    tyraid-diagnostics-20220204-1359.zip

  10. The whole system has one nvidia gpu that's passed through to a vm and then the igpu that isn't passed through, per se, but is set up to be used in the plex docker container.

     

    The only thing I wish I had now is an IPMI port. That would make this whole build just about perfect. A couple of other notes: some of the PCIe slots on this motherboard share lanes with the SATA controller, so if you use some of the PCIe slots, it'll disable some of your onboard SATA ports. Of note though, I used a SAS expander in my last PCIe slot that only pulls power from the slot; it doesn't carry any actual data. That particular slot did not disable the corresponding SATA ports. If I think of any other gotchas, I'll post them here.

  11. Yep, I was able to use the igpu to transcode 4k content - it does the job just fine!

     

    I'm also currently passing a GPU I had lying around through to a VM for Ethereum mining, and it works great as well. I used the ACS Override setting to break up my IOMMU groups so all the devices have their own IOMMU group. I believe I had to do that in order to get the GPU to pass through to the VM.

     

    Overall, very happy with the build.

  12. 11 minutes ago, JorgeB said:

    You need to specify the partition at the end:

     

    
    xfs_repair -v /dev/sdc1

    Oops, thanks for that! 

     

    12 minutes ago, JorgeB said:

    to change the UUID you can use:

    
    xfs_admin -U generate /dev/sdc1

    That worked! It still doesn't mount though:

    Jul 5 08:59:29 Unraid kernel: XFS (sdc1): Corruption warning: Metadata has LSN (1:22491) ahead of current LSN (1:2). Please unmount and run xfs_repair (>= v4.3) to resolve.
    Jul 5 08:59:29 Unraid kernel: XFS (sdc1): log mount/recovery failed: error -22
    Jul 5 08:59:29 Unraid kernel: XFS (sdc1): log mount failed
    Jul 5 08:59:29 Unraid unassigned.devices: Mount of '/dev/sdc1' failed: 'mount: /mnt/disks/Hitachi_HDS723030ALA640_MK0311YHG033ZA: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error. '

     

     

    13 minutes ago, JorgeB said:
    30 minutes ago, Tydell said:

    ata4: SError: { UnrecovData 10B8B BadCRC }

    These errors are usually a bad SATA cable, please post a SMART report for that disk.

    Attached.  Lots of UDMA_CRC_Error_Count.

    Hitachi_HDS723030ALA640_MK0311YHG033ZA-20210705-0900.txt
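
    For reference, pulling that counter out of a saved SMART report can be sketched as below. The function name is made up, and it assumes the usual smartctl attribute table, where the raw value is the last column on the UDMA_CRC_Error_Count line.

```shell
# Print the raw UDMA_CRC_Error_Count value from a saved SMART report.
# $1 = path to the report text file
crc_error_count() {
    # Column 2 is the attribute name; the raw value is the last field on the line.
    awk '$2 == "UDMA_CRC_Error_Count" { print $NF }' "$1"
}
```

    Comparing the value before and after swapping the SATA cable shows whether the count is still climbing; the raw value itself normally doesn't reset.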

  13. 21 minutes ago, trurl said:

    Do you have any evidence that the drive itself had failed? More likely is that you disturbed its connections when replacing the other disk.

     

    Do you still have that original disk? Can you mount it with Unassigned Devices?

    I might have evidence now: it won't mount in Unassigned Devices and says it has a duplicate UUID. I haven't done what it suggests yet (running xfs_repair with the -L flag), but running xfs_repair -nv on it is not going well:

    xfs_repair -nv /dev/sdc
    Phase 1 - find and verify superblock...
    bad primary superblock - bad magic number !!!
    
    attempting to find secondary superblock...
    .found candidate secondary superblock...
    unable to verify superblock, continuing...
    .found candidate secondary superblock...
    unable to verify superblock, continuing...
    .........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

     

    The system log file also threw some errors during the check:

    Jul 5 08:30:37 Unraid kernel: ata4.00: exception Emask 0x10 SAct 0x180000 SErr 0x280100 action 0x6 frozen
    Jul 5 08:30:37 Unraid kernel: ata4.00: irq_stat 0x08000000, interface fatal error
    Jul 5 08:30:37 Unraid kernel: ata4: SError: { UnrecovData 10B8B BadCRC }
    Jul 5 08:30:37 Unraid kernel: ata4.00: failed command: READ FPDMA QUEUED
    Jul 5 08:30:37 Unraid kernel: ata4.00: cmd 60/3f:98:00:30:03/05:00:00:00:00/40 tag 19 ncq dma 687616 in
    Jul 5 08:30:37 Unraid kernel: res 50/00:c1:3f:35:03/00:02:00:00:00/40 Emask 0x10 (ATA bus error)
    Jul 5 08:30:37 Unraid kernel: ata4.00: status: { DRDY }
    Jul 5 08:30:37 Unraid kernel: ata4.00: failed command: READ FPDMA QUEUED
    Jul 5 08:30:37 Unraid kernel: ata4.00: cmd 60/c1:a0:3f:35:03/02:00:00:00:00/40 tag 20 ncq dma 360960 in
    Jul 5 08:30:37 Unraid kernel: res 50/00:c1:3f:35:03/00:02:00:00:00/40 Emask 0x10 (ATA bus error)
    Jul 5 08:30:37 Unraid kernel: ata4.00: status: { DRDY }
    Jul 5 08:30:37 Unraid kernel: ata4: hard resetting link
    Jul 5 08:30:37 Unraid kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Jul 5 08:30:37 Unraid kernel: ata4.00: configured for UDMA/133
    Jul 5 08:30:37 Unraid kernel: ata4: EH complete
    Jul 5 08:30:39 Unraid kernel: ata4.00: exception Emask 0x10 SAct 0x80800000 SErr 0x280100 action 0x6 frozen
    Jul 5 08:30:39 Unraid kernel: ata4.00: irq_stat 0x08000000, interface fatal error
    Jul 5 08:30:39 Unraid kernel: ata4: SError: { UnrecovData 10B8B BadCRC }
    Jul 5 08:30:39 Unraid kernel: ata4.00: failed command: READ FPDMA QUEUED
    Jul 5 08:30:39 Unraid kernel: ata4.00: cmd 60/3f:b8:00:00:0e/05:00:00:00:00/40 tag 23 ncq dma 687616 in
    Jul 5 08:30:39 Unraid kernel: res 50/00:c1:3f:05:0e/00:02:00:00:00/40 Emask 0x10 (ATA bus error)
    Jul 5 08:30:39 Unraid kernel: ata4.00: status: { DRDY }
    Jul 5 08:30:39 Unraid kernel: ata4.00: failed command: READ FPDMA QUEUED
    Jul 5 08:30:39 Unraid kernel: ata4.00: cmd 60/c1:f8:3f:05:0e/02:00:00:00:00/40 tag 31 ncq dma 360960 in
    Jul 5 08:30:39 Unraid kernel: res 50/00:c1:3f:05:0e/00:02:00:00:00/40 Emask 0x10 (ATA bus error)
    Jul 5 08:30:39 Unraid kernel: ata4.00: status: { DRDY }
    Jul 5 08:30:39 Unraid kernel: ata4: hard resetting link
    Jul 5 08:30:40 Unraid kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Jul 5 08:30:40 Unraid kernel: ata4.00: configured for UDMA/133
    Jul 5 08:30:40 Unraid kernel: ata4: EH complete
    Jul 5 08:31:50 Unraid kernel: ata4.00: exception Emask 0x10 SAct 0xc00 SErr 0x280100 action 0x6 frozen
    Jul 5 08:31:50 Unraid kernel: ata4.00: irq_stat 0x08000000, interface fatal error
    Jul 5 08:31:50 Unraid kernel: ata4: SError: { UnrecovData 10B8B BadCRC }
    Jul 5 08:31:50 Unraid kernel: ata4.00: failed command: READ FPDMA QUEUED
    Jul 5 08:31:50 Unraid kernel: ata4.00: cmd 60/3f:50:00:58:4c/05:00:01:00:00/40 tag 10 ncq dma 687616 in
    Jul 5 08:31:50 Unraid kernel: res 50/00:c1:3f:5d:4c/00:02:01:00:00/40 Emask 0x10 (ATA bus error)
    Jul 5 08:31:50 Unraid kernel: ata4.00: status: { DRDY }
    Jul 5 08:31:50 Unraid kernel: ata4.00: failed command: READ FPDMA QUEUED
    Jul 5 08:31:50 Unraid kernel: ata4.00: cmd 60/c1:58:3f:5d:4c/02:00:01:00:00/40 tag 11 ncq dma 360960 in
    Jul 5 08:31:50 Unraid kernel: res 50/00:c1:3f:5d:4c/00:02:01:00:00/40 Emask 0x10 (ATA bus error)
    Jul 5 08:31:50 Unraid kernel: ata4.00: status: { DRDY }
    Jul 5 08:31:50 Unraid kernel: ata4: hard resetting link
    Jul 5 08:31:51 Unraid kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Jul 5 08:31:51 Unraid kernel: ata4.00: configured for UDMA/133
    Jul 5 08:31:51 Unraid kernel: ata4: EH complete

     

    I think she's dead!

  14. Just now, trurl said:

    Do you have any evidence that the drive itself had failed? More likely is that you disturbed its connections when replacing the other disk.

     

    Do you still have that original disk? Can you mount it with Unassigned Devices?

     

    I have no evidence yet - I'll definitely be investigating.  I do still have the old drive, although it's unseated slightly from its hot swap bay. 

    It's weird - when the rebuild of that first disk was happening, the error count on that drive shot through the roof.  I noticed a bit later on that the drive was actually showing up in both the array AND unassigned devices at the same time. I didn't do anything at that time other than let the rebuild finish.  I then unseated it when I replaced it in the array.

     

    Also, xfs_repair appears to have done the trick! I'll be going through and seeing if I can find any corrupted files, but it had me run the repair twice and now the folder is back, so thank you for that! Now to investigate the "failed" drive...

  15. So there are movies in the media share on a bunch of other disks that are completely fine, but if I navigate to \\unraid\media\movies, that folder in the share appears empty, even though the files are there when I browse the individual disks. The other folders under the media share appear to be unaffected; just the Movies folder is affected. I'll certainly look at ddrescue though, much appreciated!
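
    The per-disk check I'm describing can be sketched like this. The function name is hypothetical, and the example path assumes the array disks are mounted at /mnt/disk1, /mnt/disk2, and so on, with the folder named media/Movies (case matters on the Linux side).

```shell
# For each array disk that holds the folder, print "<path> <number of entries>".
# $1 = mount root (assumption: /mnt on Unraid)
# $2 = folder relative to each disk, e.g. media/Movies
list_share_folder_per_disk() {
    root=$1
    rel=$2
    for dir in "$root"/disk*/"$rel"; do
        [ -d "$dir" ] || continue   # also skips the unexpanded glob when nothing matches
        count=$(ls -A "$dir" | wc -l | tr -d ' ')
        printf '%s %s\n' "$dir" "$count"
    done
}
```

    Something like list_share_folder_per_disk /mnt media/Movies would then show which disks actually hold files, independent of what the share view reports.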

  16. So yesterday, I was ready to swap in a spare drive to replace one that was starting to throw SMART errors. I'd already moved all files off of it, so there was no real risk of data loss, or so I thought. After I replaced it, during the 10-hour rebuild, another drive started throwing hundreds of thousands of errors. This drive contained a few dozen movies in my media share. Not really knowing what else to do, I let the rebuild complete, which it did overnight. I then replaced the newly failed drive as well and let that rebuild complete. Now it would appear that the Movies folder under my media share has become corrupt somehow. The files still exist spread throughout my drives, including the newly replaced drive, though I'd be shocked if there wasn't significant corruption in those files, which are quite easily replaceable. My logs are now full of "Unmount and run xfs_repair" and "Metadata corruption detected at xfs_buf....." messages, which eventually fill the log completely. Seems like at this point, I don't have many options other than to just let it sit there and do its thing. Anything else I can do? Potentially create a new share and use Krusader or Unbalance to move the missing files to the new share? I appreciate any insights!

    diagnostics-20210704-2126.zip
