June 4, 2020 Every time I run a parity check, it looks like every single block fails. The corrected block count is huge. Nothing in the logs catches my attention. I replaced the parity drive, did a parity check, then ran a second parity check, which showed the same problem. smartctl -a looks clean. Everything is functioning, but I am obviously concerned about a drive failure. All drives are btrfs. Could a file system problem on one of the drives cause this? Attached are the diags. tower-diagnostics-20200604-1145.zip
June 4, 2020 Community Expert Please start another correcting check without rebooting and post new diags after it runs for a couple of minutes.
June 4, 2020 Author I did it twice just to make sure. No reboot. The correction always starts at 0, so it does not look like it is caused by a reboot. tower-diagnostics-20200604-1225.zip tower-diagnostics-20200604-1227.zip
June 4, 2020 Community Expert That's very strange. Try running memtest, though I would suspect that if the RAM were so bad that every single sector was incorrect, you'd see other issues.
June 4, 2020 Author Will do. Strange, as this machine has been running Unraid for many years. I will post the results.
June 6, 2020 Author I replaced the RAM just to eliminate that possibility and ran memtest for 8+ hours with no errors. I also rebuilt the USB boot drive (keeping /config) to rule that out. Same error, except it does not appear to be every block: there is a large group at the beginning that seems OK. I ran parity for a couple of minutes, stopped, then ran it again, and the same blocks had the problem. New diags attached. I am not going to swear to it, but it seems like this problem started when I went to 6.8.2 or 6.8.3. tower-diagnostics-20200606-1333.zip Edited June 6, 2020 by tasmaniac
June 6, 2020 Please check the BIOS: is the controller that the parity disk is attached to set to IDE mode instead of AHCI?

[1:0:1:0] disk ATA ST4000VN008-2DR1 SC60 /dev/sdc /dev/sg2
  state=running queue_depth=1 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/1:0:1:0 [/sys/devices/pci0000:00/0000:00:14.1/ata1/host1/target1:0:1/1:0:1:0]

00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller [1002:439c] (rev 40)
  Subsystem: Gigabyte Technology Co., Ltd SB7x0/SB8x0/SB9x0 IDE Controller [1458:5002]
  Kernel driver in use: pata_atiixp
  Kernel modules: pata_atiixp

Jun 6 13:26:14 Tower kernel: ata1.01: ATA-10: ST4000VN008-2DR166, ZM4166N6, SC60, max UDMA/133
Jun 6 13:26:14 Tower kernel: ata1.01: 7814037168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Jun 6 13:26:14 Tower kernel: ata1.01: limited to UDMA/33 due to 40-wire cable

Edited June 6, 2020 by Benson
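For anyone wanting to run the same check on their own diagnostics, here is a minimal Python sketch of the idea. It assumes the input is `lspci -knn`-style text like the excerpt above. The function name `ide_mode_controllers` and the matching rules (PCI class code `0101`, or a kernel driver whose name starts with `pata_`) are illustrative assumptions, not something Unraid itself provides.

```python
import re

# Device header line, e.g. "00:14.1 IDE interface [0101]: <name> [1002:439c] (rev 40)"
HEADER = re.compile(
    r"^(?P<slot>[0-9a-f:.]+) (?P<cls>.+?) \[(?P<code>[0-9a-f]{4})\]: (?P<name>.+)$"
)
# Indented detail line naming the driver bound to the device
DRIVER = re.compile(r"^\s+Kernel driver in use: (?P<driver>\S+)$")

# Text copied from the post above (indentation is illustrative)
SAMPLE = """\
00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller [1002:439c] (rev 40)
    Subsystem: Gigabyte Technology Co., Ltd SB7x0/SB8x0/SB9x0 IDE Controller [1458:5002]
    Kernel driver in use: pata_atiixp
    Kernel modules: pata_atiixp
"""


def ide_mode_controllers(lspci_text):
    """Return the controllers that look like they are running in IDE mode.

    Assumption: a controller counts as "IDE mode" if its PCI class code is
    0101 (IDE interface) or its kernel driver name starts with "pata_".
    """
    devices = []
    current = None
    for line in lspci_text.splitlines():
        match = HEADER.match(line)
        if match:
            # Start a new device record at each unindented header line
            current = {
                "slot": match["slot"],
                "code": match["code"],
                "name": match["name"],
                "driver": None,
            }
            devices.append(current)
            continue
        match = DRIVER.match(line)
        if match and current is not None:
            current["driver"] = match["driver"]
    return [
        dev
        for dev in devices
        if dev["code"] == "0101" or (dev["driver"] or "").startswith("pata_")
    ]


for dev in ide_mode_controllers(SAMPLE):
    print(dev["slot"], dev["driver"], dev["name"])
```

A disk whose sysfs path runs through a flagged slot (here `0000:00:14.1` for `/dev/sdc`) is attached to a port in IDE mode, so the BIOS SATA setting for that port is worth checking.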
June 6, 2020 Author Nice catch. Yes, the BIOS was set to IDE for a couple of SATA ports. Now all are AHCI. I ran parity for 10+ GB, restarted parity, and the first 10 GB came up clean. Doing a full parity check now. Thanks for the help. I will reply one last time once the complete parity check is done.
June 7, 2020 Just a thought... If the SATA settings in BIOS were originally set to AHCI and reverted without user intervention, then it's just possible that the battery that keeps the BIOS settings CMOS RAM alive is in need of replacement. It's easy to forget, since they often last for very many years, but I have had a couple die on me in the past.
June 7, 2020 Community Expert 13 hours ago, Benson said: "controller set to IDE mode instead AHCI which parity disk attach." Good catch! I've seen weird issues with those AMD controllers set to IDE before, like not being able to format or even assign a device, but this is the first time I've seen this!