May 24, 2020
Hello all, I'm getting regular file corruption/checksum errors on my cache drive SSD. The SSD is brand new and this is a fresh server build. It's a WD Blue 500GB connected via SATA. I first noticed the problem yesterday, the first time the cache drive was used: I woke up and files hadn't moved off the cache. I looked at the logs and saw that the files were corrupt, so I shut down the array and reformatted the cache (BTRFS again). I started some downloads last night and woke up to the exact same thing this morning. Is it something I'm doing, or a setting I have wrong, that's causing this? Or is my SSD possibly bad?
May 25, 2020
The bad block count seems high to me, unless I'm reading it wrong, for a drive that's been on less than 2 weeks. How's the cable? Is it new? I ask because cables are often overlooked and cause a lot of issues if bad or damaged.
May 25, 2020 · Community Expert
Checksum errors mean data corruption. Posting the diagnostics might give some clues, and it's also a good idea to run memtest.
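To illustrate why a checksum error points at corruption, here is a minimal sketch of a per-block checksum catching a single flipped bit, the kind of damage bad RAM or a bad cable can introduce. It uses `zlib.crc32` purely as a stand-in (BTRFS defaults to the related but different CRC32C); the block contents, the bit position, and the helper names `checksum` and `flip_bit` are illustrative assumptions, not anything taken from the diagnostics.

```python
import zlib


def checksum(data: bytes) -> int:
    """CRC32 of a block; a stand-in for the checksum a filesystem stores per block."""
    return zlib.crc32(data)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


block = bytes(range(256)) * 16       # 4 KiB of sample data (hypothetical contents)
stored = checksum(block)             # checksum recorded when the block was written
damaged = flip_bit(block, 12345)     # simulate one bit flipping somewhere in transit

print(checksum(block) == stored)     # intact block still matches
print(checksum(damaged) == stored)   # damaged block no longer matches
```

The checksum only says that the data read back differs from what was checksummed at write time. It does not say whether the SSD, the cable, or the RAM caused the change, which is why a memory test is still needed.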
May 25, 2020 · Author
Attached are the diagnostics from yesterday: blackbart-diagnostics-20200524-1107.zip
I started running a memtest yesterday (after dumping the diagnostics). Oddly, I was having issues when I had all 4x sticks in at once (see photo), so I have been testing 2x at a time. The first two passed 4/4 passes of memtest86. If the next two do as well, I may try something else like Karhu or HCI. I may also try a single pass with one stick of RAM at a time, cycling through each motherboard slot. As for the SATA cables themselves, they are new and there are no crazy bends in them.
May 25, 2020
32 minutes ago, subterminal said: "Attached are the diagnostics from yesterday: blackbart-diagnostics-20200524-1107.zip I started running a memtest yesterday (after dumping the diagnostics). Oddly, I was having issues when I had all 4x sticks in at once (see photo), so I have been testing 2x at a time. The first two passed 4/4 passes of memtest86. If the next two do as well, I may try something else like Karhu or HCI. I may also try a single pass with one stick of RAM at a time, cycling through each motherboard slot. As for the SATA cables themselves, they are new and there are no crazy bends in them."
Are all your sticks identical, or is it a mix and match?
May 25, 2020 · Community Expert
Many motherboards are more likely to have RAM errors if all the slots are populated. The only acceptable result from memtest is 0 errors reported.
May 25, 2020 · Author
All the sticks are identical: 4x G.Skill 16GB. It's odd that the motherboard would have issues with all 4 slots populated. The test I posted above eventually aborted because of too many errors (10k+). But if that's the issue, I can live with 2x 16GB.
Edited May 25, 2020 by subterminal
May 25, 2020 · Author
This was the result with all 4 sticks installed at once. But when tested 2x at a time, they work great. Would it still be worth paying for additional testing software like HCI or Karhu, just to make sure that the 2x sticks don't actually have errors?
Edited May 25, 2020 by subterminal
May 25, 2020
I don't think other software would help much, because it looks like your mainboard or RAM modules have a problem when all slots are populated. Do you have the latest BIOS version installed?
May 25, 2020 · Author
47 minutes ago, ryperx said: "Do you have the latest BIOS version installed?"
Probably not. I could try updating the BIOS, but I'm realizing that I don't actually need 64GB of RAM. If 2x16 doesn't cause errors, I may honestly stick with that and call it a day. The only reason I have 64GB is that I came by it for free. I was thinking of using another memory test program to verify 100% that the two sticks run without issue and that the errors did in fact come from having four sticks in. One thing to note is that the memtest runs for the two sets differed significantly in time to complete (~2+ hours). In theory they are the same RAM, but could a large timing/latency difference be contributing to this issue?
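As a complement to a memory tester, one rough end-to-end check is to copy a known file onto the cache and compare hashes of the source and the copy. The sketch below assumes nothing about the server: the helper names `sha256_of` and `copy_and_verify` are hypothetical, and the demo writes to a temporary directory rather than a real cache path. A matching hash is weak evidence only, since the read-back may be served from RAM rather than the SSD, so it does not replace memtest or a filesystem scrub.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy source to destination and report whether both files hash identically."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)


# Demo in a temporary directory; on a real server the destination would be
# a path on the cache drive (not shown here, since that path is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.bin"
    dst = Path(tmp) / "sample_copy.bin"
    src.write_bytes(b"\xa5" * (4 * 1024 * 1024))   # 4 MiB test pattern
    print(copy_and_verify(src, dst))
```

Repeating the copy with several large files, first with two sticks installed and then with four, would show whether mismatches only appear in the four-stick configuration.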