[6.9.1-6.11.0] Possible hardware failure? - Samsung 980 Pro Cache disk

CryPt00n · October 21, 2022

Hello,

Since I started using the Samsung 980 Pro NVMe in my system, it sometimes crashes with the logs below. The messages repeat indefinitely and the system hangs; only a hard shutdown works. After restarting, the disk is sometimes not detected, so I have to reboot again.

Could this be a hardware issue? SMART doesn't report anything, and a scrub found no errors. Is there another way to test the drive?

Other Specs:

Intel® Core™ i9-10900

ASRock Z490 PG Velocita

32 GB RAM

GTX 1070

Dell H310

SMART report attached.

Scrub:

UUID:             c8c0b930-1ed7-47db-8bec-641f81b3a351
Scrub started:    Fri Oct 21 12:58:20 2022
Status:           finished
Duration:         0:02:51
Total to scrub:   242.47GiB
Rate:             1.42GiB/s
Error summary:    no errors found

Logs from the crash:

Oct 21 11:27:52 server kernel: blk_print_req_error: 103 callbacks suppressed
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 15745232 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:52 server kernel: I/O error, dev loop2, sector 29120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 21 11:27:56 server kernel: btrfs_dev_stat_print_on_error: 489 callbacks suppressed
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30929, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30930, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30931, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30932, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30933, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30934, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30935, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30936, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30937, flush 0, corrupt 0, gen 0
Oct 21 11:27:56 server kernel: BTRFS error (device dm-2: state EA): bdev /dev/mapper/nvme0n1p1 errs: wr 177, rd 30938, flush 0, corrupt 0, gen 0

Thanks in advance for your help

Samsung_SSD_980_PRO_1TB_S5G-20221021-1305.txt

Edited October 21, 2022 by CryPt00n
More information added

JorgeB · October 21, 2022

Please post the complete diagnostics.

CryPt00n · October 21, 2022

3 minutes ago, JorgeB said:

Please post the complete diagnostics.

I wasn't able to download the diagnostics during the crash. The system has already been rebooted and is running again.

server-diagnostics-20221021-1330.zip

Edited October 21, 2022 by CryPt00n

JorgeB · October 21, 2022

If it happens again, grab at least the syslog so we can see the beginning of the error.

CryPt00n · October 21, 2022


Okay, so there's nothing we can do for now? The last crash was about two months ago, so it doesn't happen often, but it does happen. Nothing special was running today, and the backups at 6 this morning also went through without problems.

JorgeB · October 21, 2022

Based on the log snippet you've posted, it looks like the device dropped offline, but without the rest of the log I can't say for sure. If it did drop, the following can sometimes help.

Some NVMe devices have issues with power states on Linux. Try this: on the Main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add the following to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Reboot and see if it makes a difference.
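If you'd rather make or verify this change with a script, here is a minimal Python sketch. It assumes you are working with the text of a single `append` line copied from the Syslinux configuration; the function names are hypothetical and not part of Unraid. For context, `nvme_core.default_ps_max_latency_us=0` disables NVMe autonomous power-state transitions (APST) and `pcie_aspm=off` disables PCIe Active State Power Management. After the reboot, the sketch checks `/proc/cmdline`, the standard Linux file holding the running kernel's command line, to confirm the parameters took effect.

```python
from pathlib import Path

# Parameters from the post above: 0 disables NVMe APST (autonomous power-state
# transitions) and pcie_aspm=off disables PCIe Active State Power Management.
NVME_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def add_boot_params(append_line: str, params=NVME_PARAMS) -> str:
    """Hypothetical helper: add any missing params to one syslinux 'append' line.

    Leading indentation is kept and parameters already present are not
    duplicated, so running it twice gives the same result.
    """
    indent = append_line[: len(append_line) - len(append_line.lstrip())]
    tokens = append_line.split()
    for param in params:
        if param not in tokens:
            tokens.append(param)
    return indent + " ".join(tokens)


def params_active(cmdline: str, params=NVME_PARAMS) -> bool:
    """Hypothetical helper: True if every param appears in a kernel command line."""
    tokens = cmdline.split()
    return all(param in tokens for param in params)


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running Linux kernel was booted with."""
    return Path(path).read_text()
```

For example, `add_boot_params("append initrd=/bzroot")` yields the same line as the example above, and after rebooting, `params_active(running_cmdline())` should return `True` if the change was picked up.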

CryPt00n · October 21, 2022

Alright, I will test this. I've also enabled the syslog server, so the next crash should be caught.

Thank you

CryPt00n · November 6, 2022

Hi,

It crashed again yesterday. I got the diagnostics this time.

The NVMe power-state fix is not applied yet.

server-diagnostics-20221105-1355.zip

JorgeB · November 6, 2022

On 10/21/2022 at 12:51 PM, CryPt00n said:

I will test this and also enabled syslog server for the future.

And where is this?

CryPt00n · November 7, 2022

Attached is the part of the syslog where the crash is logged.

syslog.log

JorgeB · November 7, 2022

The NVMe device dropped offline; you should add the boot line above.

Bcy · November 7, 2022

This is a 980 Pro firmware bug; you need to update the 980 Pro firmware.

The same issue occurs on Windows 10 with some firmware versions.

Good luck!

CryPt00n · November 11, 2022

Alright, I've added the line.

The 980 Pro firmware is up to date.


zipt · October 2

@CryPt00n, did you ever find a solution to this? I've had the same issue intermittently with a 980 Pro 1TB drive over the last year, with no solution.

CryPt00n · October 2

30 minutes ago, zipt said:

@CryPt00n, did you ever find a solution to this? I've had the same issue intermittently with a 980 Pro 1TB drive over the last year, with no solution.

Yes, by using this config:

On 10/21/2022 at 1:47 PM, JorgeB said:
Based on the log snippet you've posted, it looks like the device dropped offline, but without the rest of the log I can't say for sure. If it did drop, the following can sometimes help.

Some NVMe devices have issues with power states on Linux. Try this: on the Main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add the following to your default boot option, after "append initrd=/bzroot":
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
e.g.:
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
Reboot and see if it makes a difference.
