Server randomly drops a drive


Jessie


It has done this since new, and the server is now a couple of years old.

It can run perfectly for weeks and then crash.

 

Tonight it dropped the parity drive, but any of the drives may be the one that drops.

Hardware = ASUSTeK COMPUTER INC. PRIME X370-PRO Version Rev X.0x motherboard

AMD Ryzen 7 2700 Eight-Core @ 3500 MHz

16 GiB DDR4 Multi-bit ECC (max. installable capacity 128 GiB)

(The ECC RAM was only recently installed. It behaved for a while, then started failing again.)

 

Currently running Unraid 6.9.2. I can't go higher yet because the VM is running SeaBIOS, and passthrough with SeaBIOS won't work beyond 6.9.2.

I have tried changing power supplies.

It currently runs a Windows 10 VM as a workstation and assorted Docker containers for Nextcloud and Collabora.

 

Can anyone shed some light on the syslog file?

The crash happened on 18 Dec 2023.

 

Thanks in advance

younghometower-syslog-20231218-0934.zip

Dec 18 20:22:52 YoungHomeTower kernel: ahci 0000:01:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0xd3219000 flags=0x0000]

 

This is a problem with the onboard SATA controllers. It was a rather common issue with Ryzen boards and older kernels, and I don't really see it now. If you cannot upgrade Unraid, I recommend using an add-on controller.
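To check how often this happens and which device is involved, the syslog can be scanned for the same kind of entry. The following is only a minimal sketch: the regular expression is modelled on the single line quoted above, so other kernel versions may format the message differently, and the function names `find_page_faults` and `count_by_device` are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one syslog line quoted in this thread.
# Assumption: other AMD-Vi IO_PAGE_FAULT entries use the same layout.
FAULT_RE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def find_page_faults(lines):
    """Return one dict per IO_PAGE_FAULT line found in an iterable of log lines."""
    faults = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


def count_by_device(faults):
    """Count faults per (driver, PCI address) pair."""
    return Counter((f["driver"], f["pci"]) for f in faults)
```

For example, after extracting the zip, something like `count_by_device(find_page_faults(open("syslog", errors="replace")))` (the file name is a placeholder) would show whether every fault comes from the `ahci` device at `0000:01:00.1` or from more than one controller.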

3 hours ago, JorgeB said:
Dec 18 20:22:52 YoungHomeTower kernel: ahci 0000:01:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0xd3219000 flags=0x0000]

This is a problem with the onboard SATA controllers. It was a rather common issue with Ryzen boards and older kernels, and I don't really see it now. If you cannot upgrade Unraid, I recommend using an add-on controller.

I am currently in the process of converting the VM to OVMF. If I get it to 6.12.x, will that fix it?

 

15 hours ago, JorgeB said:

It should. This was a very common issue, and I have not seen it since v6.11, IIRC.


 

Well, I converted the VM to OVMF and then upgraded to 6.12.6.

 

The VM works, but it is not fully tested yet.

Another question.

The system is configured with ACS override. With the ACS override off, IOMMU groups 13 and 14 are combined.

I have passed group 13 through to the VM. Is it possible that group 14 is causing issues?
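To see exactly which devices sit in groups 13 and 14, with and without the override, the grouping can be read from sysfs. This is a rough sketch, assuming the standard Linux layout `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`; the function name `list_iommu_groups` is made up for illustration, and the group numbers on any given boot may differ from the ones mentioned here.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Assumes the usual sysfs layout: <root>/<group>/devices/<pci-address>.
    Returns an empty dict if the directory does not exist (IOMMU disabled
    or a non-Linux system).
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    # Return the groups in numeric order for easier comparison between boots.
    return dict(sorted(groups.items()))


def format_groups(groups, only=None):
    """Render groups as text lines; `only` optionally limits the group numbers shown."""
    lines = []
    for number, devices in groups.items():
        if only is not None and number not in only:
            continue
        lines.append(f"IOMMU group {number}: {', '.join(devices)}")
    return lines
```

Running `print("\n".join(format_groups(list_iommu_groups(), only={13, 14})))` once with the override on and once with it off should make it clear which devices end up sharing a group with the ones passed to the VM.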

 

Edited by Jessie