BTRFS: Maybe a HW Issue?

socke · August 5, 2022

Can you give me some help with my log? The temperature of my NVMe is sometimes quite high, so I have to fix some environmental problems where my setup is placed.

I think there is a hardware problem. Can you give me some advice on whether this really is a hardware problem?

syslog.txt

jarvis-diagnostics-20220805-1323.zip

JorgeB · August 5, 2022

nvme1 (cache2) failed:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
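The diagnosis above comes from the SMART section of the diagnostics. As a rough sketch, assuming you have the text output of `smartctl -a` for a device laid out like the excerpt above, the overall verdict and the warning lines beneath it can be pulled out programmatically. The function name `smart_verdict` is hypothetical, not part of any Unraid or smartmontools API.

```python
import re
from typing import List, Optional, Tuple

# Captures the verdict word after "self-assessment test result:" (e.g. PASSED, FAILED)
VERDICT = re.compile(r"self-assessment test result:\s*(\w+)")


def smart_verdict(report: str) -> Tuple[Optional[str], List[str]]:
    """Return the overall-health verdict and the '- ...' warning lines after it.

    Assumes smartctl-style text laid out like the excerpt quoted in this thread.
    """
    verdict: Optional[str] = None
    warnings: List[str] = []
    collecting = False
    for line in report.splitlines():
        m = VERDICT.search(line)
        if m:
            verdict = m.group(1)
            collecting = True
            continue
        stripped = line.strip()
        if collecting and stripped.startswith("- "):
            warnings.append(stripped[2:])
        elif collecting and stripped:
            collecting = False  # first non-warning, non-blank line ends the list
    return verdict, warnings
```

Run on the excerpt above, this would return `"FAILED"` together with the two warnings about available spare and read-only mode.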

socke · August 7, 2022

How does this match this view?

I was able to get my Docker containers on these SSDs up and running again.

I think I have to replace nvme1. Can I just unplug it after shutdown, and the cache will be up and running with just one SSD, right?

My VM won't start ("bad system config"), none of the repair options work, and chkdsk /f fails because the "device is read only". Can you help me with how to proceed correctly to get everything up and running again?

Sorry, this is the first time I have hit any issue with my Unraid system, so I am a noob at troubleshooting ^^

BR

JorgeB · August 7, 2022

1 hour ago, socke said:

How does this match this view?

The GUI won't show device errors for pool members, see here for more info and better pool monitoring.
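The linked monitoring approach is not reproduced in this thread. As a hedged sketch of the same idea, assuming the pool is mounted at `/mnt/cache` (an assumption; adjust it to your pool) and that `btrfs device stats` prints lines of the form `[/dev/X].write_io_errs 0`, the Python below parses those per-device counters and lists any that are non-zero. The function names are hypothetical helpers, not an existing tool.

```python
import re
import subprocess
from typing import Dict, List, Tuple

# Matches lines such as "[/dev/nvme0n1p1].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text: str) -> Dict[str, Dict[str, int]]:
    """Parse `btrfs device stats` output into {device: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["name"]] = int(m["value"])
    return stats


def nonzero_counters(stats: Dict[str, Dict[str, int]]) -> List[Tuple[str, str, int]]:
    """Return (device, counter, value) for every counter above zero, in output order."""
    return [
        (dev, name, value)
        for dev, counters in stats.items()
        for name, value in counters.items()
        if value > 0
    ]


def check_pool(mount_point: str = "/mnt/cache") -> List[Tuple[str, str, int]]:
    """Run `btrfs device stats` on a mounted pool (needs root) and return non-zero counters.

    The default mount point is an assumption, not taken from this thread.
    """
    result = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        capture_output=True, text=True, check=True,
    )
    return nonzero_counters(parse_dev_stats(result.stdout))
```

A scheduled job could call `check_pool()` and raise an alert whenever the returned list is non-empty. For a plain shell script, `btrfs device stats -c` also exits with a non-zero status when any counter is non-zero, which may be simpler.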

socke · November 10, 2022

It seems both SSDs have bad blocks or something. I got two new SSDs. As a first step, I replaced nvme1n1 with a new one. I formatted the new one and added it to the cache pool.

My thinking was: after some time the cache would be in sync and I could replace nvme0n1 with a new one. But when I started the system, all my Docker containers and other data were gone. BTRFS was not up and running and reported too many device changes or something like that.

Then I plugged the old one back in and started my containers; everything was fine.

How can I check that both NVMe SSDs are in sync before replacing the last one? Am I missing any important steps?

BR

JorgeB · November 10, 2022

3 minutes ago, socke said:

I formatted the new one and added it to the cache pool

No need to format.

I would need the diags from after the replacement attempt to see what's going on.

socke · November 14, 2022

I was not able to add the new disk to the cache pool without formatting it.

I've attached the diagnostics.

BR

Mansoor

jarvis-diagnostics-20221114-2254.zip

JorgeB · November 15, 2022

Cache1 is also failing; best to back up and re-format the pool with both new devices.

socke · November 15, 2022

I know, that's why I wanted to replace both, but I am failing at replacing the first one. Should I just copy all the data to my array? Are there any important steps to consider?

JorgeB · November 16, 2022

Stop VM and docker services and copy everything you can to the array or elsewhere.

socke · November 16, 2022

I'll do this.

socke · September 26, 2023

Hi, I replaced my two SSDs a year ago with two brand-new ones (not funny, and a little expensive). And here I am again with the same kind of errors:

Examples:

Sep 26 22:00:01 Jarvis kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 2
Sep 26 22:00:53 Jarvis kernel: btrfs_end_super_write: 5 callbacks suppressed
Sep 26 22:00:53 Jarvis kernel: BTRFS warning (device nvme1n1p1): lost page write due to IO error on /dev/nvme0n1p1 (-5)
Sep 26 22:00:53 Jarvis kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 2
Sep 26 22:00:55 Jarvis kernel: btrfs_dev_stat_print_on_error: 54 callbacks suppressed
Sep 26 22:00:55 Jarvis kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 79773674, rd 270614, flush 11103637, corrupt 0, gen 0
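The last line carries the btrfs per-device error counters (write, read, flush, corruption, generation), the same counters `btrfs device stats` reports; they are cumulative until reset with `btrfs device stats -z`. In the quoted line, the errors are write, read and flush errors against /dev/nvme0n1p1, while the corruption and generation counters are 0. A minimal sketch for extracting these counters from a syslog, assuming the exact line format quoted above; `parse_errs_line` and `latest_counters` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Matches the counter tail of kernel lines like the last one quoted above
ERRS_LINE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_errs_line(line: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Return (device, counters) for a 'bdev ... errs:' line, or None for any other line."""
    m = ERRS_LINE.search(line)
    if m is None:
        return None
    return m["dev"], {key: int(m[key]) for key in COUNTERS}


def latest_counters(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map each device to the counters from its most recent 'errs:' line."""
    latest: Dict[str, Dict[str, int]] = {}
    for line in lines:
        parsed = parse_errs_line(line)
        if parsed is not None:
            dev, counters = parsed
            latest[dev] = counters  # later lines overwrite earlier ones
    return latest
```

Feeding a syslog file line by line into `latest_counters` gives one summary per device, which makes it easier to see whether the counts are still growing between two snapshots.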

I attached the diagnostics. I only noticed it because of my high log usage; there was no warning from the system itself. I really wonder whether this is a hardware issue. Could the problem be that I am writing a lot of time-series data (smart home) to an InfluxDB Docker container located on the cache?

BR and thanks for your help.

Socke

jarvis-diagnostics-20230926-2152.zip

JorgeB · September 27, 2023

One of the NVMe devices is dropping offline; this may help. Also see here for better pool monitoring.
