JorgeB Posted November 25, 2019
1 minute ago, G Speed said: It is IT mode?
No, it's using the megaraid driver:
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
Subsystem: Dell PERC H310 [1028:1f78]
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas
BTW, I've seen the same controller in RAID mode still using the HBA driver (mpt3sas). I'm not sure why sometimes one or the other is used, but that one is using the megaraid driver. SMART might work if you set the correct options, but it would be best to flash the controller to IT mode.
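For anyone checking their own controller, the line that matters in the output above is `Kernel driver in use:`. Here is a minimal sketch that pulls it out of `lspci -k` style text. The function name `driver_in_use` is made up for illustration, and the slot `01:00.0` is simply the one from the post.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k`-style text on stdin and print the value
# of every "Kernel driver in use:" line it finds.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Example (needs pciutils; adjust the slot to your own controller):
#   lspci -k -s 01:00.0 | driver_in_use
#
# megaraid_sas usually means RAID firmware; smartctl then typically needs
# a device type such as "-d megaraid,N" to reach the physical disks.
```

If this prints `megaraid_sas`, the controller is in the situation described above, where SMART may only work with extra options.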
G Speed (Author) Posted November 25, 2019
10 minutes ago, johnnie.black said: No, it's using the megaraid driver: 01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03) Subsystem: Dell PERC H310 [1028:1f78] Kernel driver in use: megaraid_sas Kernel modules: megaraid_sas BTW, I've seen the same controller in raid mode still using the HBA driver (mpt3sas), not sure why sometimes one or the other is used, but that one is using the megaraid driver, SMART might work if you set the correct options, but it would be best to flash the controller to IT mode.
Thanks for catching that, I will fix that up, but I'm more trying to figure out the disk issue. One thought: when I "scrub", it's only reading existing data, not free space, so the whole disk is not being read. Should I do an extended SMART test? At least the whole disk will be read. On that note, what command do I use for that, plus logging?
JorgeB Posted November 25, 2019
Scrub is used to check data integrity, not the disk. Something is corrupting the data; even a failing disk should never return corrupt data, though as mentioned it has happened before. Running a SMART extended test is a good idea. Alternatively you can run a parity check (non-correcting), which will also read the entire disk.
G Speed (Author) Posted November 25, 2019
35 minutes ago, johnnie.black said: Scrub is used to check data integrity, not the disk, something is corrupting the data, even if the disk is failing it should never return corrupt data, though like mentioned it's happened before, running a SMART extended test is a good idea, alternatively you can also run a parity check (non correct), that will also read the entire disk.
Can I do a parity check on a single disk?
JorgeB Posted November 25, 2019
No, it will check all disks.
G Speed (Author) Posted November 25, 2019
13 minutes ago, johnnie.black said: No, it will check all disks.
Might as well just do an extended SMART test then. Is this correct?
smartctl -t long /dev/sdc
followed by
smartctl -a -A /dev/sdc >/boot/smart.txt
Edited November 25, 2019 by G Speed
JorgeB Posted November 25, 2019
10 minutes ago, G Speed said: Is this correct? smartctl -t long /dev/sdc
Yes.
10 minutes ago, G Speed said: smartctl -a -A /dev/sdc >/boot/smart.txt
Use -x instead of -a; it gives more info.
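Putting the two commands together with the `-x` suggestion, a minimal sketch might look like this. The function names and the overridable `SMARTCTL` variable are assumptions added for illustration; the `-t long` and `-x` flags are the ones discussed in the thread. Note that `-a` already includes the attribute table, so the extra `-A` was redundant.

```shell
#!/bin/sh
# Hypothetical wrapper around the two smartctl calls discussed above.
# SMARTCTL can be overridden (for a dry run, say); it defaults to smartctl.
SMARTCTL="${SMARTCTL:-smartctl}"

# usage: start_long_test /dev/sdX
start_long_test() {
    "$SMARTCTL" -t long "$1"
}

# usage: save_smart_report /dev/sdX /boot/smart.txt
save_smart_report() {
    # -x prints all SMART and non-SMART device information (more than -a)
    "$SMARTCTL" -x "$1" > "$2"
}
```

Typical use would be `start_long_test /dev/sdc`, then `save_smart_report /dev/sdc /boot/smart.txt` once the test has finished.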
G Speed (Author) Posted November 26, 2019
Is this correct?
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 972 minutes for test to complete.
Just to confirm, I can't see anything on the Unraid server, as in that disk shows as spun down.
JorgeB Posted November 26, 2019
That's correct, but since you're not running the test from the GUI you need to disable spin down, or the SMART test will be interrupted.
G Speed (Author) Posted November 26, 2019
3 hours ago, johnnie.black said: That's correct but since you're not running the test from the GUI you need to disable spin down, or SMART test will be interrupted.
Hmmm, the disk is spun down, but it seems to be working?
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: (243) Self-test routine in progress... 30% of test remaining.
JorgeB Posted November 26, 2019
Looks like it is, otherwise it would say "aborted by host". See if it finishes.
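To keep an eye on a running test without the GUI, the "% of test remaining" text in the status output can be extracted. This is a minimal sketch, assuming the output wording matches what was posted above; the function name `selftest_remaining` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -c` (or -a / -x) output on stdin and
# print the percentage of the self-test still remaining. Prints nothing when
# no "NN% of test remaining" text is present (no test in progress).
selftest_remaining() {
    grep -Eo '[0-9]+% of test remaining' | head -n 1 | cut -d'%' -f1
}

# Example:
#   smartctl -c /dev/sdc | selftest_remaining
```

If the test is interrupted, the number disappears from the status, and the self-test log should then show an aborted entry instead.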
G Speed (Author) Posted November 26, 2019
Did I mess something up? Drive is FINE?
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        5588             -
Attached: smart.txt
JorgeB Posted November 27, 2019
That's expected. Except for the recent reported_unc error, SMART looked fine.
G Speed (Author) Posted November 27, 2019
1 hour ago, johnnie.black said: That's expected, except for the recent reported_unc error SMART looked fine.
Correct, but that happened previously and the count has not increased. So why can't I move files over? Back to the same problem... is it a bug with the RC?
JorgeB Posted November 27, 2019
Not a bug. Your data is getting corrupted, and you need to find out why. If you already scrubbed the other disks as recommended and corruption is limited to disk2, it's likely a disk problem: not bad sectors, so not detectable by SMART, but something else. If corruption also affects other disks, then it's likely another problem, like bad RAM, controller, board/CPU, etc.
G Speed (Author) Posted November 27, 2019
51 minutes ago, johnnie.black said: Not a bug, your data is getting corrupted, you need to find why, if you already scrubbed the other disks like recommended and corruption is limited to disk2 it's likely a disk problem, not bad sectors so not detectable by SMART, but something else, if corruption also affects other disks then it's likely other problem, like bad RAM, controller, board/CPU, etc.
Everything else is fine. Doing a non-correcting parity check now; 2TB left, no errors so far.
G Speed (Author) Posted November 29, 2019
So I scrubbed all my drives: 0 errors...
JorgeB Posted November 29, 2019
Then the most logical answer would be a problem with that disk that's silently corrupting data. I would replace it.
dvanders Posted September 28, 2020
Google brought me to this old thread because I get the exact same failed csum errors on my btrfs: "csum 0x2ac15d26". And it looks like we have the same model drive: Seagate Barracuda Compute ST8000DM004-2CX188.
These cheap SMR drives clearly have some systematic problem which manifests occasionally as poorly written, then unreadable, bytes. I would guess that 0x2ac15d26 is the shash_digest of all 0x00s or 0xFFs. The drive itself never reports any problem; SMART tests always succeed.
As a workaround I run scrub weekly on this FS. It finds and fixes a few hundred errors each time (always on newly written data).
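For a scheduled scrub like the one described, it can help to record how many checksum errors btrfs has counted. This is a minimal sketch, assuming the usual `btrfs device stats` line format of `[/dev/sdX].corruption_errs N`. The function name `corruption_total` and the mount point `/mnt/disk2` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs device stats <mountpoint>` output on stdin
# and print the sum of all corruption_errs counters (0 if none are found).
corruption_total() {
    awk '$1 ~ /\.corruption_errs$/ { sum += $2 } END { print sum + 0 }'
}

# Example (mount point is an assumption; adjust to your own pool or disk):
#   btrfs scrub start -B /mnt/disk2    # -B: run in the foreground until done
#   btrfs device stats /mnt/disk2 | corruption_total
```

The counters are cumulative until they are reset, so comparing the total before and after each scrub shows how many new errors that run found.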