PCIe Bus errors when adding new disks to my array



Hi,

Happy user of Unraid I decided to upgrade my system.

I added 4 disks to my server. I already had 17 disks which brought my array to 21 disks + 2 parity disks and 1 cache.

I precleared the 4 disks without any error.

Since I tried to add them to the array, I have been facing several problems.

First, I was unable to start the array; I got the message "Unmountable: no file system".

I tried rebooting the server and starting the array in maintenance mode, without success. I tried different things and eventually got the option to start the array and format 3 of the 4 drives.

When the format finished, my array wouldn't start; I got the following message: "Array starting: starting file activity".

I also started a new preclear for the 4th disk, which apparently did not go well the first time: the system offered to resume the preclear process for a disk that should already have been precleared.

I am now stuck with an array that does not start and a disk that is preclearing, for good this time I hope.

 

Here are some errors I got in my logs:

 

Sep 15 16:02:22 Ket kernel: pcieport 0000:00:1d.3: AER: Corrected error received: 0000:00:1d.3
Sep 15 16:02:22 Ket kernel: pcieport 0000:00:1d.3: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 15 16:02:22 Ket kernel: pcieport 0000:00:1d.3:   device [8086:a333] error status/mask=00000001/00002000




Sep 15 15:52:22 Ket kernel: pcieport 0000:00:1d.3:    [ 0] RxErr                  (First)
Sep 15 15:53:26 Ket kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 15 15:53:26 Ket kernel: sd 9:0:5:0: [sdg] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Sep 15 15:53:26 Ket kernel: sd 9:0:5:0: [sdg] tag#4 CDB: opcode=0x88 88 00 00 00 00 01 62 24 12 00 00 00 02 00 00 00
Sep 15 15:53:26 Ket kernel: print_req_error: I/O error, dev sdg, sector 5941498368
Sep 15 15:53:26 Ket kernel: sd 9:0:5:0: [sdg] Unaligned partial completion (resid=287736, sector_sz=512)
Sep 15 15:53:26 Ket kernel: sd 9:0:5:0: [sdg] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Sep 15 15:53:26 Ket kernel: sd 9:0:5:0: [sdg] tag#3 Sense Key : 0xb [current] 
Sep 15 15:53:26 Ket kernel: sd 9:0:5:0: [sdg] tag#3 ASC=0x47 ASCQ=0x3 
Sep 15 15:53:26 Ket kernel: sd 9:0:5:0: [sdg] tag#3 CDB: opcode=0x88 88 00 00 00 00 01 62 24 0e 00 00 00 04 00 00 00
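
For anyone reading the log above: the `error status/mask=00000001/00002000` pair is a pair of bitmasks from the PCIe AER Correctable Error Status and Mask registers. The Python sketch below maps the set bits to the short names the kernel prints. The table and function names are made up for this illustration, and the bit positions are taken from the standard AER register layout, so treat it as a reading aid rather than a diagnostic tool.

```python
# Bit positions in the PCIe AER Correctable Error Status/Mask registers,
# labelled with the short names the Linux kernel uses in its log output.
# (Hypothetical helper for illustration; not part of Unraid or the kernel.)
CORRECTABLE_BITS = {
    0: "RxErr",          # Receiver Error
    6: "BadTLP",         # Bad TLP
    7: "BadDLLP",        # Bad DLLP
    8: "Rollover",       # REPLAY_NUM Rollover
    12: "Timeout",       # Replay Timer Timeout
    13: "NonFatalErr",   # Advisory Non-Fatal Error
    14: "CorrIntErr",    # Corrected Internal Error
    15: "HeaderOF",      # Header Log Overflow
}


def decode_correctable(value_hex):
    """Return the names of the bits set in a correctable-error register value."""
    value = int(value_hex, 16)
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


# Values copied from the "error status/mask=" log line above.
status, mask = "00000001/00002000".split("/")
print("status:", decode_correctable(status))  # status: ['RxErr']
print("mask:  ", decode_correctable(mask))    # mask:   ['NonFatalErr']
```

The status value decodes to a receiver error, which matches the `[ 0] RxErr` line in the log; the mask value only says that advisory non-fatal errors are masked on that port.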

 

My system:

ASRock Rack motherboard E3C246D4M-4L

Intel i7 9700K

DDR4 64GB

LSI Megaraid 9305-24i

 

I have attached a diagnostics dump of my server to this message.

 

Help would be greatly appreciated.

 

Thanks.

 

Unraid_noob

ket-diagnostics-20190915-1418.zip


You're having issues with the flash drive:

Sep 15 14:20:41 Ket kernel: sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Sep 15 14:20:41 Ket kernel: sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
Sep 15 14:20:41 Ket kernel: sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
Sep 15 14:20:41 Ket kernel: sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 03 08 00 00 f0 00
Sep 15 14:20:41 Ket kernel: print_req_error: I/O error, dev sda, sector 776
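
For reference, the `CDB:` and sense lines in these kernel messages can be decoded by hand. The Python sketch below is only an illustration: the function and table names are invented for this example, it handles just the two read opcodes that appear in this thread (0x28 and 0x88), and the sense tables contain only the codes seen in the logs here, with descriptions taken from the standard SCSI code lists.

```python
# Hypothetical helpers for reading the kernel's SCSI error lines.
SENSE_KEYS = {0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}
ADDITIONAL_SENSE = {
    (0x28, 0x00): "NOT READY TO READY CHANGE, MEDIUM MAY HAVE CHANGED",
    (0x47, 0x03): "INFORMATION UNIT iuCRC ERROR DETECTED",
}


def decode_read_cdb(cdb_hex):
    """Decode a READ(10) or READ(16) CDB given as space-separated hex bytes.

    Returns (command name, starting LBA, number of blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] == 0x28 and len(cdb) == 10:      # READ(10)
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
        return "READ(10)", lba, blocks
    if cdb[0] == 0x88 and len(cdb) == 16:      # READ(16)
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
        return "READ(16)", lba, blocks
    raise ValueError("only READ(10) and READ(16) are handled in this sketch")


def describe_sense(key, asc, ascq):
    """Look up a sense key and ASC/ASCQ pair in the small tables above."""
    return (SENSE_KEYS.get(key, "unknown sense key"),
            ADDITIONAL_SENSE.get((asc, ascq), "unknown ASC/ASCQ"))


# CDBs copied from the sda (flash) and sdg log lines in this thread.
print(decode_read_cdb("28 00 00 00 03 08 00 00 f0 00"))
# ('READ(10)', 776, 240)
print(decode_read_cdb("88 00 00 00 00 01 62 24 12 00 00 00 02 00 00 00"))
# ('READ(16)', 5941498368, 512)
print(describe_sense(0x6, 0x28, 0x00))
print(describe_sense(0xB, 0x47, 0x03))
```

The decoded starting blocks (776 and 5941498368) match the sectors reported by `print_req_error`. The sdg sense data decodes to an aborted command with a CRC error on the link, which is consistent with a cabling or connection problem rather than a media fault, though it does not prove it.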

 

Disk sdg appears to be suffering from a connection problem; replace the cables. As for the PCIe errors, they are usually harmless but can flood the log. A BIOS update or using another PCIe slot can help; if not, try this.


Dear Jorge,

Thank you for your reply. I managed to solve almost all the problems.

After some further investigation I managed to mount the 4 additional disks but only when Unraid is started in safe mode.

This could be linked to an incompatible plugin.

Is there a way to start the server normally (not in safe mode) without starting the plugins? In other words, is it possible to uninstall the plugins without starting them first? I have seen that there is a "plugin command" which lets you uninstall a plugin, but I think the plugin needs to be started in order to uninstall it.

I would like to add the plugins back selectively in order to find out which one is causing me trouble.

 

Thank you for your reply.


Hi Jorge,

 

I managed to solve all my problems, thank you for your help.

However I am still getting the PCIe errors that are filling up my log files.

I tried to move the offending PCIe card (LSI SAS card) with no success.

I also tried adding the "pci=nommconf" kernel parameter, but still no success. Is there anything else I could try to solve my problem, or at least to clear those annoying messages?
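
One thing worth ruling out when a boot parameter seems to have no effect is that it never reached the running kernel, for example because of a typo or because it was added to a boot entry that is not the one in use. Here is a small Python sketch that checks `/proc/cmdline` for an exact match. The helper names are hypothetical, and it assumes a Linux system where `/proc/cmdline` is readable.

```python
from pathlib import Path


def has_kernel_param(cmdline, param):
    """True if `param` appears as a whole, space-separated token in `cmdline`."""
    return param in cmdline.split()


def running_cmdline(path="/proc/cmdline"):
    """Return the running kernel's command line, or '' if it cannot be read."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    active = has_kernel_param(running_cmdline(), "pci=nommconf")
    print("pci=nommconf active:", active)
```

Because the match is on whole tokens, a mistyped parameter such as `pci=nommconfto` would be reported as not active.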

 

Thank you


Hi Jorge,

 

Thank you again for your time.

I have 2 last questions:

- When you say BIOS update, is it for the PCIe card or for the motherboard?

- I still have the opportunity to replace the SAS card with the same make and model (A standard replacement). Is there a chance my errors will disappear or is this linked to the card model itself?

29 minutes ago, Unraid_Noob said:

When you say BIOS update, is it for the PCIe card or for the motherboard?

You should update both if an update is available.

 

30 minutes ago, Unraid_Noob said:

I still have the opportunity to replace the SAS card with the same make and model (A standard replacement). Is there a chance my errors will disappear or is this linked to the card model itself?

The same model/firmware will most likely have the same issues.

