daan_SVK

Members
  • Posts

    141

Everything posted by daan_SVK

  1. I appreciate the time you took to respond, but it was indeed an incomplete image on the USB stick. This was a new install in a new environment, so I suspected something was wrong with the network; I never had issues with the USB creator before. I will stick with manual USB creation, it's like we used to do it in the 90s anyway.
  2. The manual install indeed resolved it. I wish there was a way to tell that the USB creator failed to write the full image to the USB stick. All it says is "Writing done!", which implies the write was successful.
  3. I used the USB creator from the website, multiple USB keys as well. Should I just extract it manually? Doesn't the creator check the USB integrity once it writes the image?
  4. tower-diagnostics-20190101-1426.zip here is the zip, thanks for looking at it.
  5. Hi there, I'm just looking for some ideas on how to troubleshoot this further. I am trying to test a new build, a Lenovo P620 workstation. I imaged a new USB key and booted it up, but I get no webgui: connection refused.
     - I can ping the server by hostname and IP
     - the router shows the server by hostname as a connected device
     - I get no local gui because the server has a P2000 and it needs a driver first
     - I cannot ssh into the server as the root password hasn't been set in the webgui first
     - I tried different USB sticks and different USB ports, it's always the same
     The server runs headless. Is there anything else I can try?
  6. I will replace the cable, it's just weird the server started having all these odd issues all of a sudden.
  7. Sure, please see attached tower-diagnostics-20230207-1932.zip
  8. So the disk rebuild failed with read errors again on the same drive, so I replaced it and the replacement drive is rebuilding now. However, I now see this in the log:
     Feb 7 17:51:28 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
     Feb 7 17:51:28 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
     Feb 7 17:51:28 Tower kernel: ata9: SError: { UnrecovData 10B8B BadCRC }
     Feb 7 17:51:28 Tower kernel: ata9.00: failed command: READ DMA EXT
     Feb 7 17:51:28 Tower kernel: ata9.00: cmd 25/00:40:68:1e:da/00:05:4b:00:00/e0 tag 4 dma 688128 in
     Feb 7 17:51:28 Tower kernel: res 50/00:00:67:1e:da/00:00:4b:00:00/e0 Emask 0x10 (ATA bus error)
     Feb 7 17:51:28 Tower kernel: ata9.00: status: { DRDY }
     Feb 7 17:51:28 Tower kernel: ata9: hard resetting link
     Feb 7 17:51:28 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
     Feb 7 17:51:28 Tower kernel: ata9.00: configured for UDMA/133
     Feb 7 17:51:28 Tower kernel: ata9: EH complete
     Also, my parity drive's UDMA CRC error count just went from 0 to 1. I was originally thinking of replacing the SATA cable to the disabled drive, but now with the CRC error on the parity drive I'm wondering if I should just abandon the motherboard controller and move all the drives onto an LSI card.
  9. I replaced the RAM and disabled C-states, I can't believe they were enabled. The drive is rebuilding now onto itself, I will report back. thanks again.
  10. I did as you suggested, rebooted and the btrfs errors cleared. I was hoping that was the end of it, but the server locked up with a Kernel Panic two days later. After a reboot and a successful parity check, the server ran OK for another two or three days. Last night it locked up again with a Kernel Panic. After it was rebooted, a disk was disabled during the parity check, which never happened before. I have a spare disk that I can use to replace the faulty one, if it is indeed faulty. However, I cannot stop the array as the server is reporting that the parity check is running. It does not appear to be, as all the disks are spun down. Pressing the Cancel or Resume Parity check button does not re-enable the Stop array button, so I'm not sure how to proceed. The latest diagnostics are below, what's my best course of action here? Thanks in advance! tower-diagnostics-20230205-1050.zip
  11. sure, please see attached. the pool was rebalanced and scrubbed after the docker image was recreated. tower-diagnostics-20230127-1717.zip
  12. I deleted and re-created the docker container, reinstalled all my dockers, but immediately saw more btrfs errors in the log:
      Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
      Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
      Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
      Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
      I ran a scrub on the cache pool, no errors reported. Are the errors from within the docker container img? How do I resolve this for good? I'm rebalancing the pool now, as I saw a thread where the full FS allocation caused the same error on cache.
  13. just to be clear, are you saying recreating the docker image is a better approach than reformatting the cache pool? I'd like to address the possible cause of the corrupted image as well.
  14. Thanks for the reply. Yes, I am aware, but this corruption has been on the cache pool for some time without a changed count. I will run another memtest, but I still need to deal with the read-only docker image.
  15. Hi guys, I keep losing the ability to write to my docker image file. After seeing the below in my logs, I suspect the cache or the image file is corrupted and was hoping to get some guidance on what I should do next.
      Jan 25 17:29:04 Tower kernel: ---[ end trace 0000000000000000 ]---
      Jan 25 17:29:04 Tower kernel: BTRFS: error (device loop2: state A) in __btrfs_free_extent:3079: errno=-2 No such entry
      Jan 25 17:29:04 Tower kernel: BTRFS info (device loop2: state EA): forced readonly
      Jan 25 17:29:04 Tower kernel: BTRFS: error (device loop2: state EA) in btrfs_run_delayed_refs:2157: errno=-2 No such entry
      Jan 25 17:29:14 Tower flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
      Recreate the docker.img or nuke the whole cache pool and reformat it? Thanks in advance for reading, tower-diagnostics-20230125-1719.zip
  16. Can I use this method to migrate from the deprecated Paperless docker as well?
  17. perfect, thank you both for the quick confirmation!
  18. Hi there, this is an old thread, is the info still valid? I have a drive in my array that is still on REISERFS. Can I use the procedure described here to convert it to XFS? The drive will be empty at the time of conversion. I don't mind rebuilding the parity.
  19. the official Radarr forum has a few other things that can be attempted to repair corruption in existing database, I might try to do that before starting anew. The step by step guide is here: https://wiki.servarr.com/useful-tools#recovering-a-corrupt-db-ui
  20. I restored from backup and it seems OK. thanks once again. what's the plan going forward with the update?
  21. same for me, library empty, just the same error you're getting
  22. do I have to re-import the library after rolling back? My web interface came back up, but the library is empty now.
  23. Same issue here. Also, what I noticed is that if I open the webterminal while this is happening, the terminal just keeps disconnecting/reconnecting. After restarting the gui, the web terminal won't load any more, with a 503 Bad gateway error from the browser, while this is in the log:
      Oct 28 15:31:17 Tower nginx: 2021/10/28 15:31:17 [crit] 31423#31423: *20062687 connect() to unix:/var/run/ttyd.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.1.56, server: , request: "GET /webterminal/ HTTP/1.1", upstream: "http://unix:/var/run/ttyd.sock:/", host: "tower.local", referrer: "http://tower.local/Main"
  24. no, I'm not running any DB containers, my setup is fairly simple. yes, it stays UP for about 10 minutes, then it stops. I might try the Nvidia one as well just to try to isolate the issue.
  25. The Pihole container has 27hrs uptime, so the 10 minute shutdown interval is not caused by the Pihole docker restarting. Host Access is enabled on the Docker configuration page; before I enabled it I couldn't get Prometheus to connect to it. When I click the link in Prometheus for Pihole Explorer, it goes to the metrics site with all the parameters, as long as the PiHole explorer is running. Grafana still doesn't pull any data though. With my limited knowledge, I don't see anything obvious in the logs, I'm afraid. The standard Grafana dashboard works OK.
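
Several of the posts above (8, 12 and 15) quote kernel log lines by hand. As a rough aid for that kind of triage, here is a minimal Python sketch that counts three of the signatures seen in those excerpts per device: the ATA `BadCRC` SError, the BTRFS `parent transid verify failed` error, and the BTRFS `forced readonly` notice. The category names, the `summarize` helper and the regular expressions are assumptions made for this illustration; they are matched only against the lines quoted in this thread and are not an official Unraid or kernel tool.

```python
import re
from collections import Counter

# Signatures copied from the log excerpts quoted in the posts above.
# The category names (dictionary keys) are labels invented for this sketch.
PATTERNS = {
    "ata_crc": re.compile(r"kernel: (ata\d+): SError: \{.*BadCRC.*\}"),
    "btrfs_transid": re.compile(
        r"BTRFS error \(device (\S+?):.*parent transid verify failed"
    ),
    "btrfs_readonly": re.compile(r"BTRFS info \(device (\S+?):.*forced readonly"),
}


def summarize(lines):
    """Count matching log lines per (category, device) pair."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(name, match.group(1))] += 1
    return counts


# Sample input: lines taken verbatim from the posts above.
SAMPLE = [
    "Feb 7 17:51:28 Tower kernel: ata9: SError: { UnrecovData 10B8B BadCRC }",
    "Feb 7 17:51:28 Tower kernel: ata9: hard resetting link",
    "Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): "
    "parent transid verify failed on 335167488 wanted 2298130 found 2298088",
    "Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): "
    "parent transid verify failed on 335167488 wanted 2298130 found 2298088",
    "Jan 25 17:29:04 Tower kernel: BTRFS info (device loop2: state EA): forced readonly",
]

if __name__ == "__main__":
    for (category, device), n in sorted(summarize(SAMPLE).items()):
        print(f"{category:15} {device:6} {n}")
```

To run it against a real log, one could pass an open file object to `summarize` instead of `SAMPLE`. A count only shows which port or loop device the messages cluster on; it does not say whether the cable, controller, RAM or filesystem is at fault.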