btrfs drive problems etc.

November 25, 2019

Community Expert

1 minute ago, G Speed said:

Is it in IT mode?

No, it's using the megaraid driver:

01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
    Subsystem: Dell PERC H310 [1028:1f78]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas

BTW, I've seen the same controller in RAID mode still using the HBA driver (mpt3sas). I'm not sure why sometimes one or the other is used, but this one is using the megaraid driver. SMART might work if you set the correct options, but it would be best to flash the controller to IT mode.
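For anyone checking their own controller, here is a minimal sketch of reading the driver name out of `lspci -k` output and turning it into a hint. The function names `driver_in_use` and `smart_hint` are made up for illustration, and the `N` in `-d megaraid,N` is the device ID behind the controller, which you would have to find for your own setup.

```shell
# Print the kernel driver named in `lspci -k` style text read from stdin.
driver_in_use() {
    awk -F': *' '/Kernel driver in use/ { print $2; exit }'
}

# Map a driver name to a rough hint about SMART access (illustrative only).
smart_hint() {
    case "$1" in
        megaraid_sas)
            echo "RAID driver: try smartctl -d megaraid,N /dev/sdX" ;;
        mpt2sas|mpt3sas)
            echo "HBA driver: plain smartctl /dev/sdX normally works" ;;
        *)
            echo "unknown driver: $1" ;;
    esac
}

# Usage on a live system (needs pciutils; 01:00.0 is the slot from the post above):
#   smart_hint "$(lspci -k -s 01:00.0 | driver_in_use)"
```

With the output posted above this would report the RAID driver case, which matches the advice to either pass the right device type to smartctl or flash to IT mode.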

November 25, 2019

Author

10 minutes ago, johnnie.black said:
No, it's using the megaraid driver:
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
    Subsystem: Dell PERC H310 [1028:1f78]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas
BTW, I've seen the same controller in RAID mode still using the HBA driver (mpt3sas). I'm not sure why sometimes one or the other is used, but this one is using the megaraid driver. SMART might work if you set the correct options, but it would be best to flash the controller to IT mode.

Thanks for catching that.
I will fix that, but I'm more focused on figuring out the disk issue.

One thought: when I "scrub", it's only reading existing data, not free space.
So the whole disk is not being read.

Should I do an extended SMART test?
At least the whole disk will be read...

On that note, what command do I use for that, plus logging?

November 25, 2019

Community Expert

Scrub is used to check data integrity, not the disk. Something is corrupting the data; even a failing disk should never return corrupt data, though as mentioned it has happened before. Running a SMART extended test is a good idea. Alternatively, you can run a parity check (non-correcting), which will also read the entire disk.

November 25, 2019

Author

35 minutes ago, johnnie.black said:

Scrub is used to check data integrity, not the disk. Something is corrupting the data; even a failing disk should never return corrupt data, though as mentioned it has happened before. Running a SMART extended test is a good idea. Alternatively, you can run a parity check (non-correcting), which will also read the entire disk.

Can I do a parity check on a single disk?

November 25, 2019

Community Expert

No, it will check all disks.

November 25, 2019

Author

13 minutes ago, johnnie.black said:

No, it will check all disks.

Might as well just do an extended SMART test then.
Is this correct?
smartctl -t long /dev/sdc
followed by
smartctl -a -A /dev/sdc >/boot/smart.txt

Edited November 25, 2019 by G Speed

November 25, 2019

Community Expert

10 minutes ago, G Speed said:

Is this correct?
smartctl -t long /dev/sdc

Yes

10 minutes ago, G Speed said:

smartctl -a -A /dev/sdc >/boot/smart.txt

Use -x instead of -a; it gives more info.
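Putting the two commands together, a minimal sketch could look like this. The function names are hypothetical; `/dev/sdc` and `/boot/smart.txt` are just the values from the posts above. Note that `-t long` returns immediately and the drive runs the test in the background, so the report is only meaningful once the test has finished.

```shell
# Start an extended self-test; the command returns right away while the
# drive carries on testing in the background.
start_long_test() {
    smartctl -t long "$1"
}

# Save the full report to a file. -x prints everything -a does plus the
# extended logs, so -a and -A are not needed alongside it.
save_smart_report() {
    smartctl -x "$1" > "$2"
}

# Usage (as root), with the device and path from this thread:
#   start_long_test /dev/sdc
#   ...wait for the test to finish...
#   save_smart_report /dev/sdc /boot/smart.txt
```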

November 26, 2019

Author

Is this correct?

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 972 minutes for test to complete.
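As a rough guide to how long that is, here is a small sketch that pulls the minute count out of that smartctl message and converts it to hours and minutes. Both function names are made up for illustration.

```shell
# Extract the minute count from smartctl's "Please wait N minutes" line (stdin).
wait_minutes() {
    sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p'
}

# Convert a number of minutes to "Hh Mm".
to_hm() {
    echo "$(( $1 / 60 ))h $(( $1 % 60 ))m"
}

# Usage:
#   to_hm 972    # prints "16h 12m"
```

So the test above should take a bit over 16 hours if nothing interrupts it.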

Just to confirm, I can't see anything on the Unraid server; that disk shows as spun down.

November 26, 2019

Community Expert

That's correct, but since you're not running the test from the GUI you need to disable spin down, or the SMART test will be interrupted.

November 26, 2019

Author

3 hours ago, johnnie.black said:

That's correct, but since you're not running the test from the GUI you need to disable spin down, or the SMART test will be interrupted.

Hmmm, the disk is spun down, but it seems to be working?

Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 243) Self-test routine in progress...
                                        30% of test remaining.

November 26, 2019

Community Expert

Looks like it is, otherwise it would say "aborted by host". See if it finishes.
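The number in parentheses on the "Self-test execution status" line can be decoded by hand: in the ATA status byte the high nibble is the state of the last test and the low nibble is the remaining portion in tenths. A minimal sketch, with a hypothetical function name and only the most common states spelled out:

```shell
# Decode the self-test execution status value that smartctl -c prints
# in parentheses, e.g. "( 243)".
decode_selftest_status() {
    state=$(( $1 >> 4 ))                # high nibble: state of the test
    remaining=$(( ($1 & 15) * 10 ))     # low nibble: tenths still to run
    case "$state" in
        0)  echo "completed without error (or never run)" ;;
        1)  echo "aborted by host" ;;
        2)  echo "interrupted by host reset" ;;
        15) echo "in progress, ${remaining}% remaining" ;;
        *)  echo "failed or unknown (code $state)" ;;
    esac
}

# Usage:
#   decode_selftest_status 243    # prints "in progress, 30% remaining"
```

A value of 243 is 0xF3, so the test is still running with 30% left, which agrees with the text smartctl printed.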

November 26, 2019

Author

Did I mess something up? Drive is FINE?

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5588         -

smart.txt

November 27, 2019

Community Expert

That's expected. Except for the recent reported_unc error, SMART looked fine.

November 27, 2019

Author

1 hour ago, johnnie.black said:

That's expected. Except for the recent reported_unc error, SMART looked fine.

Correct, but that happened previously and the count hasn't increased.

So why can't I move files over? Back to the same problem... is it a bug in the RC?

November 27, 2019

Community Expert

Not a bug, your data is getting corrupted and you need to find out why. If you already scrubbed the other disks as recommended and corruption is limited to disk2, it's likely a disk problem: not bad sectors, so not detectable by SMART, but something else. If corruption also affects other disks, then it's likely another problem, like bad RAM, controller, board/CPU, etc.
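To compare disks, one option is to look at the per-device error counters btrfs keeps. This is a minimal sketch that sums the `corruption_errs` counters from `btrfs device stats` output; the function name is hypothetical, and `/mnt/disk2` is only an assumption about where the disk is mounted.

```shell
# Sum the corruption_errs counters from `btrfs device stats` text on stdin.
# Each line looks like "[/dev/xxx].corruption_errs  N".
corruption_total() {
    awk '/\.corruption_errs/ { sum += $2 } END { print sum + 0 }'
}

# Usage (as root), repeated for each data disk's mount point:
#   btrfs device stats /mnt/disk2 | corruption_total
```

A non-zero total on one disk only points at that disk; non-zero totals across several disks point at something shared, such as RAM or the controller.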

November 27, 2019

Author

51 minutes ago, johnnie.black said:

Not a bug, your data is getting corrupted and you need to find out why. If you already scrubbed the other disks as recommended and corruption is limited to disk2, it's likely a disk problem: not bad sectors, so not detectable by SMART, but something else. If corruption also affects other disks, then it's likely another problem, like bad RAM, controller, board/CPU, etc.

Everything else is fine, doing a non correcting parity check now.. 2TB left; no errors so far..

November 29, 2019

Author

So I scrubbed all my drives: 0 errors...

November 29, 2019

Community Expert

Then the most logical answer would be a problem with that disk that's silently corrupting data. I would replace it.

September 28, 2020

Google brought me to this old thread because I get the exact same failed csum errors on my btrfs: "csum 0x2ac15d26"

And it looks like we have the same model drive: Seagate Barracuda Compute ST8000DM004-2CX188

These cheap SMR drives clearly have some systematic problem that occasionally manifests as poorly written, then unreadable, bytes. I would guess that 0x2ac15d26 is the shash_digest of all 0x00s or 0xFFs. The drive itself never reports any problem -- SMART tests always succeed.

As a workaround I run scrub weekly on this FS -- it finds and fixes a few hundred errors each time (always on newly written data).
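For reference, a weekly scrub along these lines could be scripted roughly as follows. This is only a sketch, not the exact setup described above: the function name, mount point, log path and schedule are all placeholders.

```shell
# Run a scrub in the foreground (-B) with per-device statistics (-d)
# and append a timestamp plus the summary to a log file.
# $1 = mount point of the btrfs filesystem, $2 = log file.
scrub_and_log() {
    {
        date
        btrfs scrub start -B -d "$1"
    } >> "$2" 2>&1
}

# Usage (as root):
#   scrub_and_log /mnt/data /var/log/btrfs-scrub.log
#
# Hypothetical crontab entry to run a script containing that call
# every Sunday at 03:00:
#   0 3 * * 0 /path/to/scrub.sh
```

Keeping the log makes it easy to see whether the error count per run is stable or growing.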
