November 25, 2019 · Community Expert
G Speed said: "Is it IT mode?"
No, it's using the megaraid driver:

01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
    Subsystem: Dell PERC H310 [1028:1f78]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas

BTW, I've seen the same controller in RAID mode still using the HBA driver (mpt3sas). I'm not sure why one or the other gets used sometimes, but this one is using the megaraid driver. SMART might work if you set the correct options, but it would be best to flash the controller to IT mode.
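The controller listing above is the format `lspci -nnk` prints, so the bound driver can be checked from the shell. A minimal sketch, assuming a POSIX shell with `awk`; the helper name `raid_driver_in_use` is hypothetical. As for "the correct options": when the answer is `megaraid_sas`, smartmontools generally needs a `-d megaraid,N` device type to reach a physical disk behind the controller, where N is that disk's ID on the controller (an assumption to verify with `smartctl --scan` on your own system).

```shell
# Print the kernel driver bound to each RAID-class controller, reading
# `lspci -nnk` output on stdin. Device lines start at column 0; the
# "Kernel driver in use:" detail lines that follow them are indented.
raid_driver_in_use() {
  awk '
    /^[^ \t]/ { in_raid = ($0 ~ /RAID bus controller/) }
    in_raid && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use:[ \t]*/, "")
      print
    }
  '
}

# Usage on a live system:
#   lspci -nnk | raid_driver_in_use
# megaraid_sas -> MegaRAID driver; smartctl typically needs -d megaraid,N
# mpt3sas      -> HBA driver; disks are usually reachable by smartctl directly
```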
November 25, 2019 · Author
Replying to johnnie.black: Thanks for catching that, I will fix it, but I'm more focused on figuring out the disk issue. One thought: when I scrub, it only reads existing data, not free space, so the whole disk is not being read. Should I run an extended SMART test? At least the whole disk would be read. On that note, what command do I use for that, plus logging?
November 25, 2019 · Community Expert
Scrub is used to check data integrity, not the disk. Something is corrupting the data: even a failing disk should never return corrupt data, though, as mentioned, it has happened before. Running a SMART extended test is a good idea; alternatively, you can run a non-correcting parity check, which will also read the entire disk.
November 25, 2019 · Author
Replying to johnnie.black: Can I do a parity check on a single disk?
November 25, 2019 · Author
johnnie.black said: "No, it will check all disks."
Might as well just do an extended SMART test then. Is this correct?

smartctl -t long /dev/sdc

followed by

smartctl -a -A /dev/sdc >/boot/smart.txt

Edited November 25, 2019 by G Speed
November 25, 2019 · Community Expert
G Speed said: "Is this correct? smartctl -t long /dev/sdc"
Yes.
G Speed said: "smartctl -a -A /dev/sdc >/boot/smart.txt"
Use -x instead of -a; it gives more info.
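The two commands above can be wrapped so the report is written only after the test has finished. A minimal sketch, assuming smartmontools is installed and run as root; the function names are hypothetical, and the 60-second settle delay and 10-minute poll interval are arbitrary choices. It parses the "% of test remaining" line that `smartctl -c` prints while a self-test runs. It does not touch spin down, which, per the next reply, still has to be disabled when the test is started outside the GUI.

```shell
# Extract the "NN% of test remaining" figure from smartctl output on stdin.
# Prints nothing when no self-test is in progress.
selftest_remaining() {
  grep -o '[0-9]\{1,3\}% of test remaining' | head -n 1 | cut -d '%' -f 1
}

# Start an extended (long) self-test, poll until it is no longer reported
# as running, then save the full report. -x prints more than -a does.
long_test_and_report() {
  dev=$1
  out=$2
  smartctl -t long "$dev"
  sleep 60  # give the drive a moment to report the test as running
  while :; do
    pct=$(smartctl -c "$dev" | selftest_remaining)
    [ -z "$pct" ] && break
    echo "$dev: ${pct}% of test remaining"
    sleep 600
  done
  smartctl -x "$dev" > "$out"
}

# Usage (as root):
#   long_test_and_report /dev/sdc /boot/smart.txt
```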
November 26, 2019 · Author
Is this correct?

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 972 minutes for test to complete.

Just to confirm: I can't see any activity on the Unraid server, i.e., that disk shows as spun down.
November 26, 2019 · Community Expert
That's correct, but since you're not running the test from the GUI, you need to disable spin down, or the SMART test will be interrupted.
November 26, 2019 · Author
Replying to johnnie.black: Hmm, the disk is spun down, but it seems to be working?

Offline data collection status:  (0x00) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 243) Self-test routine in progress...
                                        30% of test remaining.
November 26, 2019 · Community Expert
Looks like it is; otherwise it would say "aborted by host". See if it finishes.
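The 243 in the status output above is the raw self-test execution status byte, and it shows why the test was still running: the upper four bits hold the status code (15 = in progress, 1 = aborted by host) and the lower four bits the remaining work in 10% steps. So 243 = 0xF3 decodes to "in progress, 30% remaining". A minimal sketch of that decoding in shell arithmetic; the function name is hypothetical, and only a few status codes are spelled out, with the rest lumped together.

```shell
# Decode the "Self-test execution status" byte shown by smartctl -c.
# Upper nibble: status code. Lower nibble: remaining work in 10% steps
# (only meaningful while a test is in progress).
decode_selftest_status() {
  byte=$1
  status=$(( byte >> 4 ))
  remaining=$(( (byte & 15) * 10 ))
  case $status in
    0)  echo "completed without error (or never run)" ;;
    1)  echo "aborted by host" ;;
    2)  echo "interrupted by host (reset)" ;;
    15) echo "in progress, ${remaining}% remaining" ;;
    *)  echo "did not complete cleanly (status code $status)" ;;
  esac
}

# Usage:
#   decode_selftest_status 243   # prints: in progress, 30% remaining
```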
November 26, 2019 · Author
Did I mess something up? The drive is FINE?

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5588         -

Attachment: smart.txt
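The line that matters in that log is entry "# 1", the most recent test. A minimal sketch that pulls its status out of `smartctl -l selftest` output; it assumes the column layout shown above, where columns are separated by runs of two or more spaces, and the function name is hypothetical.

```shell
# Print the Status column of the most recent self-test (entry "# 1")
# from `smartctl -l selftest` output on stdin. Splitting on two or more
# spaces keeps multi-word values such as "Extended offline" intact.
last_selftest_status() {
  awk -F '  +' '/^# *1  / { print $3; exit }'
}

# Usage:
#   smartctl -l selftest /dev/sdc | last_selftest_status
```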
November 27, 2019 · Community Expert
That's expected: except for the recent reported_unc error, SMART looked fine.
November 27, 2019 · Author
Replying to johnnie.black: Correct, but that happened previously and the count hasn't increased. So why can't I move files over? Back to the same problem... a bug in the RC?
November 27, 2019 · Community Expert
Not a bug; your data is getting corrupted, and you need to find out why. If you already scrubbed the other disks as recommended and the corruption is limited to disk2, it's likely a disk problem: not bad sectors, so not detectable by SMART, but something else. If the corruption also affects other disks, then it's likely a different problem, like bad RAM, the controller, the board/CPU, etc.
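To check whether the corruption is limited to disk2, each btrfs array disk can be scrubbed in turn and its error counters compared. A minimal sketch, assuming btrfs-formatted disks mounted at Unraid's usual /mnt/diskN paths; the helper names are hypothetical. `btrfs scrub start -B` runs the scrub in the foreground, and `btrfs device stats` prints per-device counters such as `corruption_errs`, which are cumulative until reset, so compare them against earlier values rather than reading a non-zero total as new damage.

```shell
# Sum all "*_errs" counters in `btrfs device stats` output on stdin,
# e.g. "[/dev/sdc1].corruption_errs   12". Prints the total (0 if none).
btrfs_error_total() {
  awk '$1 ~ /_errs$/ { total += $2 } END { print total + 0 }'
}

# Scrub every mounted array disk in turn and report its counter total.
# Each scrub reads all allocated data on that disk, so this takes hours.
scrub_all_disks() {
  for mnt in /mnt/disk[0-9]*; do
    [ -d "$mnt" ] || continue
    echo "== $mnt"
    btrfs scrub start -B "$mnt"
    echo "error counters total: $(btrfs device stats "$mnt" | btrfs_error_total)"
  done
}
```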
November 27, 2019 · Author
Replying to johnnie.black: Everything else is fine. Doing a non-correcting parity check now; 2TB left, no errors so far.
November 29, 2019 · Community Expert
Then the most logical answer would be a problem with that disk that's silently corrupting data; I would replace it.
September 28, 2020
Google brought me to this old thread because I get the exact same failed csum errors on my btrfs: "csum 0x2ac15d26". And it looks like we have the same model drive: Seagate Barracuda Compute ST8000DM004-2CX188. These cheap SMR drives clearly have some systematic problem that occasionally manifests as poorly written, then unreadable, bytes. I would guess that 0x2ac15d26 is the shash_digest of all 0x00s or 0xFFs. The drive itself never reports any problem; SMART tests always succeed. As a workaround I run a scrub weekly on this FS; it finds and fixes a few hundred errors each time (always on newly written data).
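One way to test the guess about 0x2ac15d26 is to tally the checksum values in the kernel log's btrfs "csum failed" lines: if one value dominates, the bad blocks likely share a fixed content pattern. A minimal sketch, assuming the hex message format quoted above, where the computed "csum 0x…" comes before the "expected csum 0x…" (older kernels print these values in decimal, which this sketch ignores); the function name is hypothetical.

```shell
# From kernel log text on stdin, print the first "csum 0x..." value of each
# btrfs "csum failed" line (the checksum actually computed, not the
# "expected csum" that follows it), then count how often each value occurs,
# most frequent first.
csum_failure_tally() {
  awk '/csum failed/ {
         for (i = 1; i < NF; i++)
           if ($i == "csum" && $(i + 1) ~ /^0x/) { print $(i + 1); break }
       }' | sort | uniq -c | sort -rn
}

# Usage:
#   dmesg | csum_failure_tally
```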