socke, posted August 5, 2022: Can you give me some help with my log? The temperature of my NVMe is sometimes quite high, so I have to fix some problems with the environment my setup is placed in. I think there is a hardware problem. Can you give me some advice on whether this really is a hardware problem? Attached: syslog.txt, jarvis-diagnostics-20220805-1323.zip
JorgeB, posted August 5, 2022: nvme1 (cache2) failed:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
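The SMART excerpt in that reply has a simple shape: one overall-health line, then one "- " line per triggered warning. A minimal Python sketch that pulls out the verdict and the warnings, assuming the report text follows exactly the layout quoted above (for example, saved from `smartctl -H` or taken from the diagnostics zip); the function name `parse_smart_health` is hypothetical:

```python
import re


def parse_smart_health(report: str) -> tuple[bool, list[str]]:
    """Return (passed, warnings) from the SMART health section of a report.

    Assumes the layout quoted in the thread: a line
    'SMART overall-health self-assessment test result: PASSED' (or 'FAILED!')
    followed by one '- ...' line per triggered warning.
    """
    match = re.search(r"self-assessment test result:\s*(\w+)", report)
    if match is None:
        raise ValueError("no SMART overall-health line found")
    # \w+ stops before the '!', so 'FAILED!' is captured as 'FAILED'.
    passed = match.group(1).upper() == "PASSED"
    # Warning lines start with '- ' (after optional indentation).
    warnings = [
        line.strip()[2:]
        for line in report.splitlines()
        if line.strip().startswith("- ")
    ]
    return passed, warnings
```

Fed the excerpt above, this returns `False` plus the two warnings about available spare and read-only mode; a healthy device reports `PASSED` and no warning lines.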
socke (thread author), posted August 7, 2022: How does this match this view? I was able to get the Docker containers on these SSDs up and running again. I think I have to replace nvme1. Can I just unplug it after shutting down, and the cache will keep running with just one SSD, right? My VM fails to start with "bad system config"; none of the repair options work, and chkdsk /f fails because of "device in read only". Can you help me with how to proceed so that everything is up and running again in the correct way? Sorry, this is the first time I've hit any issue with my Unraid system, so I'm a noob at troubleshooting ^^ BR
JorgeB, posted August 7, 2022: Quoting socke: "How does this match this view?" The GUI won't show device errors for pool members; see here for more info and for better pool monitoring.
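The linked monitoring approach is not reproduced in the thread, so the following is only a sketch of the same idea: btrfs keeps per-device error counters that `btrfs device stats <mountpoint>` prints one per line, as `[/dev/...].write_io_errs    0`. The Python below parses that text and keeps only devices with non-zero counters. The mount point `/mnt/cache` and the function names are assumptions for illustration:

```python
import re
import subprocess
from collections import defaultdict

# One counter per line, e.g. "[/dev/nvme0n1p1].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(output: str) -> dict[str, dict[str, int]]:
    """Turn `btrfs device stats` text into {device: {counter: value}}."""
    stats: dict[str, dict[str, int]] = defaultdict(dict)
    for line in output.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m["dev"]][m["name"]] = int(m["value"])
    return dict(stats)


def nonzero_counters(stats: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Keep only devices that have at least one non-zero error counter."""
    result = {}
    for dev, counters in stats.items():
        bad = {name: value for name, value in counters.items() if value}
        if bad:
            result[dev] = bad
    return result


def check_pool(mountpoint: str = "/mnt/cache") -> dict[str, dict[str, int]]:
    """Run `btrfs device stats` on a mounted pool (mount point is an assumption)."""
    out = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    ).stdout
    return nonzero_counters(parse_dev_stats(out))
```

`check_pool()` returns an empty dict while every counter is zero; anything else points at a specific device. The counters are cumulative until reset with `btrfs device stats -z`, and btrfs-progs can also do the zero check itself with `btrfs device stats -c`, which reports a non-zero exit status when any counter is non-zero. A scheduled job built on either approach is one way to get the warning the GUI does not give.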
socke (thread author), posted November 10, 2022: It seems both SSDs have bad blocks or something similar. I got two new SSDs. As a first step, I replaced nvme1n1 with a new one, formatted it, and added it to the cache pool. My thinking was: after some time the cache would be in sync, and I could then replace nvme0n1 with the other new one. But when I started the system, all my Docker containers and everything else were gone. Btrfs was not up and running and reported too many device changes or something like that. Then I plugged the old one back in and started my containers, and everything was fine. How can I check that both NVMe SSDs are in sync before replacing the last one? Am I missing any important steps? BR
JorgeB, posted November 10, 2022: Quoting socke: "I formatted the new one and added it to the cache pool." No need to format. I would need the diagnostics from after the replacement attempt to see what's going on.
socke (thread author), posted November 14, 2022: I was not able to add the new disk to the cache pool without formatting it. I've attached the diagnostics. BR Mansoor Attached: jarvis-diagnostics-20221114-2254.zip
JorgeB, posted November 15, 2022: Cache1 is also failing; it's best to back up and re-format the pool with both new devices.
socke (thread author), posted November 15, 2022: I know, that's the reason I wanted to replace both, but I'm failing at replacing the first one. Should I just copy all the data to my array? Are there any important steps to consider?
JorgeB, posted November 16, 2022: Stop the VM and Docker services, and copy everything you can to the array or elsewhere.
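"Copy everything you can" matters on a device that has gone read-only or throws read errors: a copy that aborts on the first unreadable file leaves the rest behind. The sketch below illustrates the idea with a best-effort copy in Python that records failures and keeps going. It is not the procedure from the thread. The source `/mnt/cache` and the target `/mnt/disk1/cache_backup` are hypothetical paths, and the VM and Docker services should already be stopped as advised:

```python
import os
import shutil


def copy_what_you_can(src: str, dst: str) -> list[tuple[str, str]]:
    """Copy the tree at src into dst, skipping anything that fails.

    Returns (path, error message) pairs for everything that could not be
    copied. Symlinked directories are not followed (os.walk default), so
    they are not copied.
    """
    failures: list[tuple[str, str]] = []

    def on_walk_error(err: OSError) -> None:
        # A directory that cannot be listed is recorded, not fatal.
        failures.append((err.filename, str(err)))

    for root, _dirs, files in os.walk(src, onerror=on_walk_error):
        target_dir = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source_path = os.path.join(root, name)
            try:
                # copy2 also preserves timestamps and permission bits.
                shutil.copy2(source_path, os.path.join(target_dir, name))
            except OSError as err:
                failures.append((source_path, str(err)))
    return failures


# Hypothetical usage (paths are assumptions, adjust to your system):
# for path, error in copy_what_you_can("/mnt/cache", "/mnt/disk1/cache_backup"):
#     print(f"could not copy {path}: {error}")
```

The returned list shows exactly which files did not make it, which helps in deciding whether anything must be recovered some other way before the pool is re-formatted. rsync is a common alternative for the same job.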
socke (thread author), posted November 16, 2022: I'll do this.
socke (thread author), posted September 26, 2023: Hi, a year ago I replaced my two SSDs with two brand-new ones; not fun, and a little expensive. And here I am again with the same kind of errors, for example:
Sep 26 22:00:01 Jarvis kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 2
Sep 26 22:00:53 Jarvis kernel: btrfs_end_super_write: 5 callbacks suppressed
Sep 26 22:00:53 Jarvis kernel: BTRFS warning (device nvme1n1p1): lost page write due to IO error on /dev/nvme0n1p1 (-5)
Sep 26 22:00:53 Jarvis kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 2
Sep 26 22:00:55 Jarvis kernel: btrfs_dev_stat_print_on_error: 54 callbacks suppressed
Sep 26 22:00:55 Jarvis kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 79773674, rd 270614, flush 11103637, corrupt 0, gen 0
I've attached the diagnostics. I only became aware of this because of the high log load; the system itself gave no warning. I'm wondering whether this really is a hardware issue. Could the problem be that I'm writing a lot of time-series data (smart home) to an InfluxDB Docker container located on the cache? BR and thanks for your help. Socke Attached: jarvis-diagnostics-20230926-2152.zip
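The last log line is btrfs's running error summary for one device. As these messages read, the "(device nvme1n1p1)" prefix identifies the filesystem, while "bdev /dev/nvme0n1p1" names the device the counters belong to, which matches the "lost page write due to IO error on /dev/nvme0n1p1" warning. A minimal Python sketch that extracts the latest counters per device from log lines; the function name is hypothetical, and the regex assumes exactly the `wr/rd/flush/corrupt/gen` layout shown above:

```python
import re
from typing import Iterable

# Per-device error summary as printed in the kernel log above, e.g.
# "bdev /dev/nvme0n1p1 errs: wr 79773674, rd 270614, flush 11103637, corrupt 0, gen 0"
ERRS_LINE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_error_counters(log_lines: Iterable[str]) -> dict[str, dict[str, int]]:
    """Map each device to the most recent error counters seen in the log."""
    latest: dict[str, dict[str, int]] = {}
    for line in log_lines:
        m = ERRS_LINE.search(line)
        if m:
            # Later lines overwrite earlier ones, so the newest summary wins.
            latest[m["dev"]] = {
                key: int(value) for key, value in m.groupdict().items() if key != "dev"
            }
    return latest


# Hypothetical usage, assuming the log lives at /var/log/syslog:
# with open("/var/log/syslog", errors="replace") as log:
#     print(latest_error_counters(log))
```

On the lines quoted above this reports only `/dev/nvme0n1p1`, with millions of write and flush errors. That points at one device failing to complete I/O rather than at data corruption, since `corrupt` and `gen` stay at 0.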
JorgeB, posted September 27, 2023: One of the NVMe devices is dropping offline; this may help. Also see here for better pool monitoring.