Clayton Posted October 29, 2021

Hey, curious if I have a controller failure. I have had 4 disk failures this month. The first two were disabled in the array and unmountable. I replaced the drives and rebuilt the array. When the rebuild finished, the two disks were still unmountable. I used the filesystem check and ended up deleting the master log file, and I was back up and running. I still have the two original disks.

Fast forward a week or two and I lost two different disks. This time they were only disabled but still mountable. I removed them from the array and added them back in after moving them to a different drive bay. That is fixed now after the rebuild.

Today my log space keeps filling up. I see errors that I think are related to my pair of cache drives. All the problematic disks and my cache drives use an expansion/RAID card in JBOD mode. I forget the name off the top of my head. Would someone be able to give any insight into whether this is a card failure? It's a 12Gb/s card with 8 SATA connections. Since my last reboot I'm only having issues with my SSDs, and those are the only disks on that card. The rest of my drives are on the SATA headers on my system board.

darktower-diagnostics-20211029-1307.zip
JorgeB Posted October 29, 2021

Looks more like a board/PCIe problem:

Oct 29 00:47:47 DarkTower kernel: pcieport 0000:00:02.2: AER: Uncorrected (Fatal) error received: 0000:00:02.2
Oct 29 00:47:47 DarkTower kernel: pcieport 0000:00:02.2: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
Oct 29 00:47:47 DarkTower kernel: pcieport 0000:00:02.2: device [8086:2f06] error status/mask=00000020/00000000
Oct 29 00:47:47 DarkTower kernel: pcieport 0000:00:02.2: [ 5] SDES (First)
Oct 29 00:47:47 DarkTower kernel: mpt3sas_cm0: PCI error: detected callback, state(2)!!
Oct 29 00:47:48 DarkTower kernel: pcieport 0000:00:02.2: AER: Root Port link has been reset
Oct 29 00:47:48 DarkTower kernel: mpt3sas_cm0: PCI error: resume callback!!
Oct 29 00:47:48 DarkTower kernel: pcieport 0000:00:02.2: AER: device recovery successful

Try using a different PCIe slot if available.
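For anyone wanting to check how often these errors recur after reseating or moving the card, here is a minimal Python sketch that counts PCIe bus errors per root port. It assumes the syslog lines have the same shape as the excerpt above; the names `BUS_ERROR`, `count_bus_errors`, and `SAMPLE` are made up for illustration and are not part of Unraid or any kernel tooling.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the excerpt above, e.g.
#   pcieport 0000:00:02.2: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, ...
# The pattern is an assumption based on that excerpt only.
BUS_ERROR = re.compile(
    r"pcieport (?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+)"
)


def count_bus_errors(lines):
    """Count PCIe bus errors, keyed by (port, severity, layer)."""
    counts = Counter()
    for line in lines:
        match = BUS_ERROR.search(line)
        if match:
            key = (
                match["port"],
                match["severity"].strip(),
                match["layer"].strip(),
            )
            counts[key] += 1
    return counts


# Three lines copied from the log excerpt in this thread.
SAMPLE = [
    "Oct 29 00:47:47 DarkTower kernel: pcieport 0000:00:02.2: AER: Uncorrected (Fatal) error received: 0000:00:02.2",
    "Oct 29 00:47:47 DarkTower kernel: pcieport 0000:00:02.2: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)",
    "Oct 29 00:47:48 DarkTower kernel: pcieport 0000:00:02.2: AER: device recovery successful",
]

if __name__ == "__main__":
    for (port, severity, layer), n in count_bus_errors(SAMPLE).items():
        print(f"{port}: {n} x {severity} ({layer})")
```

To run it against a real log, pass an open file object instead of `SAMPLE`. If the count stays at zero for the same port after changing slots, that would be consistent with a slot or seating problem rather than a failing card, though it does not prove it.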
Clayton Posted October 30, 2021 Author

Reseated the card last night, same slot. It went through a mover cycle and so far there are no log entries. Looking good so far, but I will not count my chickens until it goes a few days to a week.

Kind of a random question, since I had so many drive failures that were unlikely to be real drive failures, but I replaced at least two of them anyway. If I have a drive failure in the future on a 12TB drive, can it be replaced by a 10TB drive if the 12TB is not used in full, i.e. holds under 10TB?
trurl Posted October 30, 2021

22 minutes ago, Clayton said: If I have a drive failure in the future on a 12TB drive, can it be replaced by a 10TB drive if the 12TB is not used in full, i.e. holds under 10TB?

You can only rebuild to a disk of the same size or larger. The entire disk is rebuilt, regardless of contents. Parity doesn't know anything about filesystems, only bits.
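To illustrate why the amount of used space does not matter, here is a toy Python sketch of a rebuild. It assumes single parity computed as a bytewise XOR across the data disks, with shorter disks treated as zero-padded up to the parity size. The functions `xor_parity` and `rebuild` and the tiny byte strings standing in for disks are hypothetical, not Unraid code.

```python
from functools import reduce
from operator import xor


def xor_parity(disks, size):
    """XOR byte by byte across disks, zero-padding shorter disks to `size`."""
    padded = [d + bytes(size - len(d)) for d in disks]
    return bytes(reduce(xor, column) for column in zip(*padded))


def rebuild(parity, surviving, failed_size, target_size):
    """Reconstruct every byte of the failed disk from parity and the survivors."""
    if target_size < failed_size:
        # Parity covers every bit position of the failed disk, used or not,
        # so a smaller target has nowhere to put the tail.
        raise ValueError("replacement is smaller than the failed disk")
    full = xor_parity([parity, *surviving], len(parity))
    return full[:failed_size]


# A 12-byte "disk" that is mostly empty, and a second, full 12-byte disk.
disk_a = b"used" + bytes(8)
disk_b = b"other data!!"
parity = xor_parity([disk_a, disk_b], 12)

# Rebuilding disk_a onto a same-size target restores all 12 bytes,
# including the 8 bytes that hold no file data.
restored = rebuild(parity, [disk_b], failed_size=12, target_size=12)
```

In this model the rebuild never looks at what the bytes mean, so it cannot skip the empty region. Asking it to rebuild the 12-byte disk onto a 10-byte target raises `ValueError`, which mirrors the same-size-or-larger rule.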