
btrfs drive problems etc.


G Speed


1 minute ago, G Speed said:

Is it in IT mode?

No, it's using the megaraid driver:

01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
    Subsystem: Dell PERC H310 [1028:1f78]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas

BTW, I've seen the same controller in RAID mode still using the HBA driver (mpt3sas), and I'm not sure why sometimes one driver is used and sometimes the other, but this one is using the megaraid driver. SMART might work if you set the correct options, but it would be best to flash the controller to IT mode.
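For anyone wanting to check this on their own box, here is a minimal Python sketch that pulls the "Kernel driver in use" value out of lspci output like the one pasted above. The function names and the driver-to-mode mapping are my own assumptions for illustration; as noted above, seeing mpt3sas does not by itself prove IT firmware.

```python
import re

# Assumption: these driver names are the ones discussed in this thread.
# mpt2sas/mpt3sas = HBA driver (could be IT or IR firmware),
# megaraid_sas    = MegaRAID firmware.
HBA_DRIVERS = {"mpt2sas", "mpt3sas"}
RAID_DRIVERS = {"megaraid_sas"}

SAMPLE = """\
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
    Subsystem: Dell PERC H310 [1028:1f78]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas
"""


def controller_drivers(lspci_text):
    """Map each device line to the driver named on its 'Kernel driver in use' line."""
    drivers = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device entry.
            current = line.strip()
            continue
        match = re.match(r"\s*Kernel driver in use:\s*(\S+)", line)
        if match and current is not None:
            drivers[current] = match.group(1)
    return drivers


def driver_family(driver):
    """Classify a driver name; 'hba' still needs a firmware check to confirm IT mode."""
    if driver in HBA_DRIVERS:
        return "hba"
    if driver in RAID_DRIVERS:
        return "megaraid"
    return "unknown"


if __name__ == "__main__":
    for device, driver in controller_drivers(SAMPLE).items():
        print(driver, "->", driver_family(driver))
```

On a live system the input would typically come from something like `lspci -nnk`, saved to a file or captured from the command.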

Link to comment
10 minutes ago, johnnie.black said:

No, it's using the megaraid driver:


01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
    Subsystem: Dell PERC H310 [1028:1f78]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas

BTW, I've seen the same controller in RAID mode still using the HBA driver (mpt3sas), and I'm not sure why sometimes one driver is used and sometimes the other, but this one is using the megaraid driver. SMART might work if you set the correct options, but it would be best to flash the controller to IT mode.


Thanks for catching that :)
I will fix that up, but I'm more concerned with figuring out the disk issue :(

One thought: when I "scrub", it only reads existing data, not free space.
So the whole disk is not being read.

Should I do an extended SMART test?
At least the whole disk will be read...

On that note, what command do I use for that, plus logging?
 

Link to comment

Scrub checks data integrity, not the disk. Something is corrupting the data. Even a failing disk should never return corrupt data, though as mentioned it has happened before. Running a SMART extended test is a good idea; alternatively, you can run a non-correcting parity check, which will also read the entire disk.
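As a rough sketch of the command side of this, the Python below builds smartctl invocations for starting an extended test and reading back the self-test log. `-t long`, `-l selftest` and `-d megaraid,N` are standard smartctl options; the device path `/dev/sdX` and the MegaRAID disk number `0` are placeholders, not values from this system, and the helper names are hypothetical.

```python
import shlex


def _base_cmd(megaraid_id=None):
    """Start a smartctl command, adding -d megaraid,N for disks behind a MegaRAID controller."""
    cmd = ["smartctl"]
    if megaraid_id is not None:
        cmd += ["-d", f"megaraid,{megaraid_id}"]
    return cmd


def smart_long_test_cmd(device, megaraid_id=None):
    """Command that starts an extended (long) SMART self-test."""
    return _base_cmd(megaraid_id) + ["-t", "long", device]


def smart_selftest_log_cmd(device, megaraid_id=None):
    """Command that prints the drive's self-test log once the test has finished."""
    return _base_cmd(megaraid_id) + ["-l", "selftest", device]


if __name__ == "__main__":
    # /dev/sdX and disk number 0 are placeholders; substitute your own values.
    print(shlex.join(smart_long_test_cmd("/dev/sdX")))
    print(shlex.join(smart_long_test_cmd("/dev/sdX", megaraid_id=0)))
    print(shlex.join(smart_selftest_log_cmd("/dev/sdX")))
```

For logging, redirecting the output of the log command to a file on persistent storage should be enough, since the test itself runs inside the drive.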

Link to comment
35 minutes ago, johnnie.black said:

Scrub checks data integrity, not the disk. Something is corrupting the data. Even a failing disk should never return corrupt data, though as mentioned it has happened before. Running a SMART extended test is a good idea; alternatively, you can run a non-correcting parity check, which will also read the entire disk.

Can I do a parity check on a single disk?

Link to comment

Is this correct?

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 972 minutes for test to complete.

 

Just to confirm, I can't see anything on the Unraid server, as in that disk shows as spun down.

Link to comment
3 hours ago, johnnie.black said:

That's correct, but since you're not running the test from the GUI you need to disable spin down, or the SMART test will be interrupted.

Hmm, the disk is spun down, but it seems to be working?

 

Offline data collection status:  (0x00) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 243) Self-test routine in progress...
                                        30% of test remaining.
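The 243 in that output can be decoded by hand. As far as I know, the ATA self-test execution status byte keeps the state in the high four bits and the remaining percentage, in steps of ten, in the low four bits. A small sketch, with a function name of my own choosing:

```python
def decode_selftest_status(value):
    """Split the SMART self-test execution status byte into (state, percent_remaining)."""
    state = value >> 4                        # high nibble: 15 means a test is in progress
    percent_remaining = (value & 0x0F) * 10   # low nibble: tenths of the test still to run
    return state, percent_remaining


if __name__ == "__main__":
    state, remaining = decode_selftest_status(243)  # 243 == 0xF3
    print(f"state={state}, remaining={remaining}%")
```

So 243 is 0xF3: state 15 (in progress) with 30% remaining, which matches the text smartctl printed next to it.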

Link to comment

Not a bug; your data is getting corrupted and you need to find out why. If you already scrubbed the other disks as recommended and corruption is limited to disk2, it's likely a disk problem: not bad sectors, so not detectable by SMART, but something else. If corruption also affects other disks, then it's likely another problem, like bad RAM, controller, board/CPU, etc.

Link to comment
51 minutes ago, johnnie.black said:

Not a bug; your data is getting corrupted and you need to find out why. If you already scrubbed the other disks as recommended and corruption is limited to disk2, it's likely a disk problem: not bad sectors, so not detectable by SMART, but something else. If corruption also affects other disks, then it's likely another problem, like bad RAM, controller, board/CPU, etc.

Everything else is fine. I'm doing a non-correcting parity check now: 2TB left, no errors so far.
Link to comment
  • 9 months later...

Google brought me to this old thread because I get the exact same failed csum errors on my btrfs: "csum 0x2ac15d26"

And it looks like we have the same model drive: Seagate Barracuda Compute ST8000DM004-2CX188

 

These cheap SMR drives clearly have some systematic problem which occasionally manifests as poorly written, then unreadable, bytes. I would guess that 0x2ac15d26 is the shash_digest of all 0x00s or 0xFFs. The drive itself never reports any problem -- SMART tests always succeed.
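That guess can be tested rather than assumed. Here is a sketch that computes CRC32C (the default btrfs checksum) over an all-0x00 and an all-0xFF block and compares both against the reported value. The 4096-byte block size is an assumption, and so is checking the byte-swapped form, which I include only in case the kernel message prints the checksum bytes in the other order.

```python
def _make_table():
    """Build the lookup table for reflected CRC32C (Castagnoli polynomial)."""
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Standard CRC32C: initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def byteswap32(value):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


REPORTED = 0x2AC15D26   # csum value from the error message above
BLOCK_SIZE = 4096       # assumption: typical btrfs sector size

if __name__ == "__main__":
    candidates = {"all 0x00": bytes(BLOCK_SIZE), "all 0xFF": b"\xff" * BLOCK_SIZE}
    for name, block in candidates.items():
        value = crc32c(block)
        match = REPORTED in (value, byteswap32(value))
        print(f"{name}: {value:#010x} (swapped {byteswap32(value):#010x}) match={match}")
```

If neither candidate matches, the unreadable blocks contain something other than uniform fill, which would itself be worth knowing.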

As a workaround I run scrub weekly on this FS -- it finds and fixes a few hundred errors each time (always on newly written data).
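A weekly scrub like this could be wrapped in a small script and run from cron or a scheduler plugin. This is only a sketch: the mount point `/mnt/disk2` is a placeholder rather than my actual path, and the function names are hypothetical. `btrfs scrub start -B` is the real command; `-B` keeps it in the foreground so the summary is printed when it finishes.

```python
import subprocess


def scrub_cmd(mount_point):
    """Build a foreground btrfs scrub command for the given mount point."""
    return ["btrfs", "scrub", "start", "-B", mount_point]


def run_scrub(mount_point):
    """Run the scrub and return (exit_code, combined_output) for logging."""
    result = subprocess.run(
        scrub_cmd(mount_point),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Placeholder mount point; needs root and an actual btrfs filesystem.
    code, output = run_scrub("/mnt/disk2")
    print(output)
    print("exit code:", code)
```

A crontab entry such as `0 3 * * 0` in front of the script path would run it at 03:00 every Sunday.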

Link to comment
