Constant errors in logs after NVMe upgrade


Moises


Sep 9 18:43:28 Tower kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:02:00.0
Sep 9 18:43:28 Tower kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 9 18:43:28 Tower kernel: nvme 0000:02:00.0: device [15b7:5006] error status/mask=00000001/0000e000

So after upgrading from an old generic SSD to a WD Black SN750, I am seeing this message constantly in my log. Nothing seems to be wrong with the server, but it makes any kind of debugging impossible since it happens every 5-10 seconds. Why is this happening and how do I fix it?
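Until the cause is found, one way to keep the log readable for debugging is to filter the corrected-error lines out when viewing it. The following is only a minimal sketch: the function name `drop_corrected_aer` and both regular expressions are made up for illustration and are based solely on the three lines quoted above.

```python
import re

# First lines of a corrected-AER report, as seen in the quoted log.
HEADER = re.compile(r"AER: Corrected error received|PCIe Bus Error: severity=Corrected")
# Follow-up detail lines (status/mask, and per-bit lines such as "[ 0] RxErr").
DETAIL = re.compile(r"error status/mask=|\[\s*\d+\]\s+\w+")

def drop_corrected_aer(lines):
    """Yield every line that is not part of a corrected-AER report.

    Detail lines are dropped only when they directly follow a corrected
    header, so the details of other (non-corrected) errors are kept.
    """
    in_corrected = False
    for line in lines:
        if HEADER.search(line):
            in_corrected = True
            continue
        if in_corrected and DETAIL.search(line):
            continue
        in_corrected = False
        yield line
```

It could be fed the lines of the syslog file and the result printed. This only hides the messages from view; the kernel still writes them to the log.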

Link to comment

Have a look at this post from Jorge.

 

https://lime-technology.com/forums/topic/72837-error-log-filled-to-100-in-1day-47min-with-this/?do=findComment&comment=669775

Link to comment

Sadly this did not fix it. I don't have another M.2 slot on my motherboard, so I can't move the drive around, and adding

pci=nommconf did not do anything.

 

Edit: not sure how much this matters, but I am booting in legacy mode on my motherboard. UEFI boot makes Unraid unable to see my NVMe drive for some reason.

Link to comment
  • 2 weeks later...

Did you manage to resolve this?

 

I'm getting the same thing for my NVMe drive as well, and it fills the log file in a matter of hours.

 

 kernel: pcieport 0000:00:01.1: AER: Corrected error received: 0000:02:00.0
 kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
 kernel: nvme 0000:02:00.0:   device [144d:a804] error status/mask=00000001/00006000
 kernel: nvme 0000:02:00.0:    [ 0] RxErr  
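For anyone wondering what the status/mask numbers mean, they can be decoded bit by bit. The sketch below assumes the two hex values are the PCIe AER Correctable Error Status and Mask registers and uses the short names the Linux kernel prints for each bit; the function name `decode_corrected` is made up for this example.

```python
import re

# Bit positions in the AER Correctable Error Status register,
# labelled with the short names the Linux kernel prints.
CORRECTABLE_BITS = {
    0: "RxErr",         # receiver error
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",      # REPLAY_NUM rollover
    12: "Timeout",      # replay timer timeout
    13: "NonFatalErr",  # advisory non-fatal error
    14: "CorrIntErr",   # corrected internal error
    15: "HeaderOF",     # header log overflow
}

STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")

def bit_names(value):
    """Return the names of all known bits that are set in value."""
    return [name for bit, name in CORRECTABLE_BITS.items() if (value >> bit) & 1]

def decode_corrected(line):
    """Decode a 'status/mask' log line; return None if the line has none."""
    match = STATUS_MASK.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return {"reported": bit_names(status), "masked": bit_names(mask)}
```

For the line above (00000001/00006000) this gives RxErr as the only reported error, which matches the `[ 0] RxErr` line the kernel prints, with NonFatalErr and CorrIntErr masked.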

Link to comment

I've managed to find a solution by using a different kernel parameter in the syslinux configuration.

 

append initrd=/bzroot pci=noaer

 

Apparently pci=noaer disables Advanced Error Reporting. Unfortunately it's a bit like pulling the bulb behind the check engine light in your car, but I'm not sure how else to disable these warnings. There are some other options as well, which I might try to see if they also work.

 

pci=noaer

or

pci=nomsi

or

pci=nommconf
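If you want to try these options one at a time, the edit to the append line can be scripted. This is just a sketch working on the text of the config: the helper name `add_kernel_param` is made up, and it assumes every boot entry has an `append` line like the one shown earlier. Keep a backup of the config before writing anything back.

```python
def add_kernel_param(cfg_text, param):
    """Return cfg_text with param added to every 'append' line.

    Lines that already contain the parameter are left unchanged, so
    running this twice gives the same result as running it once.
    """
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("append ") and param not in stripped.split():
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"
```

Note that this changes every boot entry in the file, not only the default one. The change only takes effect after a reboot.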

 

More details here: https://askubuntu.com/questions/1104219/what-does-pci-noaer-or-pci-nomsi-mean


Link to comment