BTRFS maybe HW Issue

August 5, 2022

Can you give me some help with my log? The temperature of my NVMe is sometimes quite high, so I have to fix some problems with the environment where my setup is placed.

I think there is a HW problem. Can you give me some advice on whether this is really a hardware problem?

syslog.txt

jarvis-diagnostics-20220805-1323.zip

August 5, 2022

Community Expert

nvme1 (cache2) failed:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
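
For anyone who wants to check this from a script rather than by eye, here is a minimal Python sketch that scans smartctl text output for the overall-health line and collects the reason lines listed under it. The function name `smart_health` is hypothetical, and the sketch assumes the output has the same layout as the excerpt above; it does not run smartctl itself.

```python
def smart_health(report: str) -> tuple[str, list[str]]:
    """Return (result, reasons) from smartctl text output.

    Assumes the layout of the excerpt above: one overall-health line,
    optionally followed by reason lines that start with "- ".
    """
    marker = "SMART overall-health self-assessment test result:"
    result = "UNKNOWN"
    reasons = []
    in_reasons = False
    for line in report.splitlines():
        line = line.strip()
        if line.startswith(marker):
            # Keep only the verdict, e.g. "FAILED" or "PASSED".
            result = line[len(marker):].strip().rstrip("!")
            in_reasons = True
        elif in_reasons and line.startswith("- "):
            reasons.append(line[2:])
        else:
            in_reasons = False
    return result, reasons


sample = """=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
"""
result, reasons = smart_health(sample)
print(result, reasons)
```

A "FAILED" result with reasons like these points at the device itself rather than at the filesystem.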

August 7, 2022

Author

How does this match this view?

I was able to get my Docker containers on these SSDs up and running again.

I think I have to replace nvme1. Can I just unplug it after shutdown, and the cache will be up and running with just one SSD, right?

My VM is not able to start ("bad system config"). None of the repair options work, and chkdsk /f fails because of "device in read only". Can you give me some guidance on how to proceed to get everything up and running correctly?

Sorry, this is the first time I have hit any issue with my Unraid system, so I am a noob at troubleshooting ^^

August 7, 2022

Community Expert

1 hour ago, socke said:

How does this match this view?

The GUI won't show device errors for pool members; see here for more info and better pool monitoring.
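
The linked post is not reproduced in this thread. As one possible approach (an assumption, not necessarily what the link describes), the per-device error counters printed by `btrfs device stats <mountpoint>` can be checked from a script. Below is a minimal Python sketch that parses that text output and reports only the counters above zero; the function name `nonzero_stats` and the sample values are illustrative.

```python
import re

# One line of `btrfs device stats` output looks like:
#   [/dev/nvme0n1p1].write_io_errs    0
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<count>\d+)$")


def nonzero_stats(stats_text: str) -> dict[str, dict[str, int]]:
    """Map each device to its counters that are greater than zero."""
    bad: dict[str, dict[str, int]] = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["count"]) > 0:
            bad.setdefault(match["dev"], {})[match["name"]] = int(match["count"])
    return bad


# Illustrative sample, not taken from the diagnostics in this thread.
sample = """\
[/dev/nvme0n1p1].write_io_errs    12
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    3
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
[/dev/nvme1n1p1].write_io_errs    0
"""
print(nonzero_stats(sample))
```

On a live system the text could come from `subprocess.run(["btrfs", "device", "stats", "/mnt/cache"], capture_output=True, text=True).stdout`, where `/mnt/cache` is an assumed mount point. An empty result means no device in the pool has logged errors.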

3 months later...

November 10, 2022

Author

It seems both of the SSDs have bad blocks or something. I got 2 new SSDs. As a first step I replaced nvme1n1 with a new one. I formatted the new one and added it to the cache pool.

My thinking: after some time the cache is in sync and I can replace nvme0n1 with a new one. But when I started the system, all my Docker containers and other data were gone. BTRFS was not up and running and told me there were too many device changes or something.

Then I plugged the old one back in and started my containers and stuff; everything was fine.

How can I check that both NVMe SSDs are in sync before I replace the last one? Am I missing any important steps?

November 10, 2022

Community Expert

3 minutes ago, socke said:

I formatted the new one and added it to the cache pool

No need to format.

Would need the diags after the replacement attempt to see what's going on.

November 14, 2022

Author

I was not able to add the new disk to the cache pool without formatting it.

Added the diagnostics.

Mansoor

jarvis-diagnostics-20221114-2254.zip

November 15, 2022

Community Expert

Cache1 is also failing; best to back up and re-format the pool with both new devices.

November 15, 2022

Author

I know, that's why I wanted to replace both, but I am already failing at replacing the first one. Should I just copy all data to my array? Are there any important steps to consider?

November 16, 2022

Community Expert

Stop the VM and Docker services and copy everything you can to the array or elsewhere.

November 16, 2022

Author

I'll do this.

10 months later...

September 26, 2023

Author

Hi, I replaced my two SSDs one year ago with two brand new ones, which was no fun and a little expensive. And here I am again with the same kind of errors:

Examples:

Sep 26 22:00:01 Jarvis kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 2
Sep 26 22:00:53 Jarvis kernel: btrfs_end_super_write: 5 callbacks suppressed
Sep 26 22:00:53 Jarvis kernel: BTRFS warning (device nvme1n1p1): lost page write due to IO error on /dev/nvme0n1p1 (-5)
Sep 26 22:00:53 Jarvis kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 2
Sep 26 22:00:55 Jarvis kernel: btrfs_dev_stat_print_on_error: 54 callbacks suppressed
Sep 26 22:00:55 Jarvis kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 79773674, rd 270614, flush 11103637, corrupt 0, gen 0

I attached the diagnostics. I only became aware of it because of the high log volume; there was no hint from the system itself. I am wondering whether this is really a hardware issue. Maybe there's a problem because I am writing a lot of time series data (smart home) to an InfluxDB Docker container located on the cache?

BR and Thanks for your help.

Socke

jarvis-diagnostics-20230926-2152.zip

September 27, 2023

Community Expert

One of the NVMe devices is dropping offline; this may help. Also see here for better pool monitoring.
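
Since the problem was only noticed through log volume, the `errs:` summary that BTRFS writes to the kernel log (as in the lines quoted earlier in this thread) can also be read by a script. The following is a minimal Python sketch, not the method from the linked post: it extracts the last reported counters per device from syslog text. The function name `latest_counters` is hypothetical, and the pattern assumes the exact message layout seen in this thread.

```python
import re

# Matches the counter summary BTRFS logs per device, e.g.
#   bdev /dev/nvme0n1p1 errs: wr 1, rd 2, flush 3, corrupt 0, gen 0
ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_counters(syslog_text: str) -> dict[str, dict[str, int]]:
    """Return the last reported error counters for each device."""
    latest = {}
    for line in syslog_text.splitlines():
        match = ERRS.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            # Later lines overwrite earlier ones, so the newest values win.
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


line = (
    "Sep 26 22:00:55 Jarvis kernel: BTRFS error (device nvme1n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 79773674, rd 270614, flush 11103637, "
    "corrupt 0, gen 0"
)
print(latest_counters(line))
```

Write and flush counters that keep climbing while `corrupt` and `gen` stay at zero, as in the quoted line, fit a device that stops responding rather than one returning bad data.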
