Timbiotic Posted September 8, 2019

On the latest Unraid (6.7.2), and I just moved to new hardware as an upgrade. Everything came up great except parity, which is disabled. Any ideas why? Attaching diagnostics: lillis.69.mu-diagnostics-20190908-0018.zip
Timbiotic Posted September 8, 2019

I stopped the array, set no device for parity, started the array, stopped it again, set parity back, and started the array, and it's rebuilding. Hopefully there isn't anything more serious wrong. I'm wondering if it was the manual sdparm spin-down I did before shutting down the array; I sure hope they figure out SAS spin-down. If someone wants to check the diagnostics and can explain to me why parity was disabled, that would be helpful.
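[Editor's note: for reference, a manual SAS spin-down of the kind mentioned above is done with sdparm, which issues SCSI START STOP UNIT commands. This is a sketch only; the device name is hypothetical and the real sdparm calls are left in comments so it runs without hardware attached.]

```shell
#!/bin/sh
# Sketch of a manual SAS spin-down/up via sdparm (device name hypothetical).
DEV="${1:-/dev/sdg}"

# Spin down (SCSI STOP UNIT):  sdparm --command=stop "$DEV"
# Spin back up (START UNIT):   sdparm --command=start "$DEV"
# Check readiness first:       sdparm --command=ready "$DEV"
# The functions below only echo the commands so the sketch is runnable
# without a real drive.
spin_down() { echo "sdparm --command=stop $1"; }
spin_up()   { echo "sdparm --command=start $1"; }

spin_down "$DEV"
```

Note the caveat later in this thread: a drive spun down this way stays stopped until something sends START UNIT, which may matter if Unraid writes to it while it is asleep.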
JorgeB Posted September 8, 2019

4 hours ago, Timbiotic said:
if someone wants to check diagnostic and can explain to me why parity was disabled that would be helpful.

The diags are from after rebooting, when parity was already disabled, so we can't see what happened.
Timbiotic Posted September 9, 2019

It rebuilt and everything seemed fine; then it disabled again. Attaching recent diagnostics from before the reboot: lillis.69.mu-diagnostics-20190909-1403.zip
JorgeB Posted September 9, 2019

Looks more like a connection problem. Replace or swap the cables/backplane slot and try again; you can also run an extended SMART test.
Timbiotic Posted September 9, 2019

When you say try again, do you mean rebuild parity? I reseated everything and made sure the cable was tight. Do I need to remove and re-add it again for a whole new rebuild?
JorgeB Posted September 9, 2019

31 minutes ago, Timbiotic said:
When you say try again do you mean rebuild parity ?

Yes.

31 minutes ago, Timbiotic said:
I reseated everything and made sure cable was tight.

I would recommend replacing the cables or swapping in another disk to rule them out; otherwise, if the same thing happens, you still won't know whether it's the disk or the cables.
Timbiotic Posted September 10, 2019

12 hours ago, johnnie.black said:
Looks more like a connection problem, replace/swap cables/backplane slot and try again, you can also run an extended SMART test.

I ran an extended SMART test, but how do you see the results? It says completed in the downloaded report but not much else. I'll try swapping the disk tomorrow to rule out cables.
Vr2Io Posted September 10, 2019

On 9/8/2019 at 11:52 AM, Timbiotic said:
im wondering if it was the manual sdparm spin down i did before shutting down array. sure hope they figure out sas spin down.

Suggest disabling the "IDLE SAS" script:

Sep 8 15:00:43 lillis emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/startScript.sh /tmp/user.scripts/tmpScripts/IDLE SAS/script
Sep 8 15:04:52 lillis kernel: mdcmd (55): ?
Sep 8 15:05:23 lillis kernel: mdcmd (56): spindown 0
JorgeB Posted September 10, 2019

3 hours ago, Timbiotic said:
I ran an extended smart test but how do you see results?

In the SMART report.
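[Editor's note: concretely, on a SAS drive the finished test appears in the self-test log section of the SMART report. A sketch, with a hypothetical device path and the parsing done on a sample log line so it needs no hardware:]

```shell
#!/bin/sh
# View the self-test log on a SAS drive:
#   smartctl -l selftest /dev/sdg     # just the self-test log
#   smartctl -a /dev/sdg              # the full report, as posted below
# A "Completed" status with no LBA_first_err means the extended test passed.
# Here we classify a sample log line instead of querying a real drive:
line='# 1  Background long   Completed   -   648   - [-   -    -]'
case "$line" in
  *Completed*) result="extended test passed" ;;
  *)           result="check the drive" ;;
esac
echo "$result"
```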
Timbiotic Posted September 10, 2019

15 hours ago, Benson said:
Suggest disable the "IDLE SAS" script
Sep 8 15:00:43 lillis emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/startScript.sh /tmp/user.scripts/tmpScripts/IDLE SAS/script
Sep 8 15:04:52 lillis kernel: mdcmd (55): ?
Sep 8 15:05:23 lillis kernel: mdcmd (56): spindown 0

It is disabled; I only run it manually while I'm still trying to figure it out. The mdcmd spindown 0 was me doing it manually in the CLI to see if the SAS drive responded. Surprisingly, it did.
Timbiotic Posted September 10, 2019

12 hours ago, johnnie.black said:
In the SMART report.

Does anything in here jump out at you? It says healthy in the SMART column on the main dashboard, but it is still disabled. I think I will swap out the drive now, but it's "new" from eBay. I can see from the SMART report that the reseller just pulled it from something and resold it.

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.56-Unraid] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST6000NM0285
Revision:             EF02
Compliance:           SPC-4
User Capacity:        6,001,175,126,016 bytes [6.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c5008696d16b
Serial number:        ZAD0Q1E70000C721BAUY
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Sep 10 15:00:10 2019 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     28 C
Drive Trip Temperature:        60 C

Manufactured in week 07 of year 2017
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  715
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  1679
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2973709184
  Blocks received from initiator = 2118791472
  Blocks read from cache and sent to initiator = 48672929
  Number of read and write commands whose size <= segment size = 271022
  Number of read and write commands whose size > segment size = 30068

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 664.45
  number of minutes until next internal SMART test = 12

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3967346996        0         0  3967346996          0       8119.609           0
write:           0        0         0           0          0      12080.227           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                                  number   (hours)
# 1  Background long   Completed                        -      648              - [-   -    -]
# 2  Background short  Completed                        -      529              - [-   -    -]
# 3  Background long   Completed                        -      503              - [-   -    -]
# 4  Background short  Completed                        -      493              - [-   -    -]
# 5  Background short  Aborted (by user command)        -      460              - [-   -    -]

Long (extended) Self-test duration: 28041 seconds [467.4 minutes]

Background scan results log
  Status: no scans active
    Accumulated power on time, hours:minutes 664:27 [39867 minutes]
    Number of background scans performed: 0,  scan progress: 0.00%
    Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5008696d169
    attached SAS address = 0x500605b0094e6eb2
    attached phy identifier = 6
    Invalid DWORD count = 8
    Running disparity error count = 8
    Loss of DWORD synchronization = 21
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 8
     Running disparity error count: 8
     Loss of dword synchronization count: 21
     Phy reset problem count: 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5008696d16a
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
JorgeB Posted September 11, 2019

11 hours ago, Timbiotic said:
Does anything in here jump out to you?

Looks fine, and the long test completed without error. You'll need to re-sync parity to re-enable the disk.
Timbiotic Posted September 11, 2019

Do you think my manually spinning down the drive could have caused it? Is there any time only parity would legitimately be active? I only spun it down when all the other members were spun down.
JorgeB Posted September 11, 2019

Normal spin-down shouldn't be a problem, but since it's SAS, who knows; we would need the diags from that time to confirm.
Timbiotic Posted September 11, 2019

Benson found it in the diagnostics I posted earlier:

Sep 8 15:00:43 lillis emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/startScript.sh /tmp/user.scripts/tmpScripts/IDLE SAS/script
Sep 8 15:04:52 lillis kernel: mdcmd (55): ?
Sep 8 15:05:23 lillis kernel: mdcmd (56): spindown 0
JorgeB Posted September 11, 2019

Unrelated; the disk was only disabled the next day, and there's no spin-down command before that:

Sep 9 08:58:27 lillis kernel: sd 9:0:3:0: [sdg] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Sep 9 08:58:27 lillis kernel: sd 9:0:3:0: [sdg] tag#1 Sense Key : 0x2 [current] [descriptor]
Sep 9 08:58:27 lillis kernel: sd 9:0:3:0: [sdg] tag#1 ASC=0x4 ASCQ=0x2
Sep 9 08:58:27 lillis kernel: sd 9:0:3:0: [sdg] tag#1 CDB: opcode=0x8a 8a 08 00 00 00 00 ae a8 73 68 00 00 00 08 00 00
Sep 9 08:58:27 lillis kernel: print_req_error: I/O error, dev sdg, sector 2930275176
Sep 9 08:58:27 lillis kernel: md: disk0 write error, sector=2930275104
Sep 9 08:58:28 lillis kernel: md: disk0 write error, sector=2930275112

Looks more like a connection/power problem to me, because of this:

Sep 9 08:58:27 lillis kernel: print_req_error: I/O error, dev sdg, sector 0

It started with an error on sector 0.
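[Editor's note: for reference, the sense bytes in that trace have standard meanings in the SCSI SPC spec. This sketch is only a lookup of the two codes seen above, not an Unraid tool, and it draws no conclusion about the root cause discussed in this thread.]

```shell
#!/bin/sh
# Decode the sense data from the syslog (standard SCSI SPC meanings):
#   Sense Key 0x2        -> NOT READY
#   ASC 0x04 / ASCQ 0x02 -> Logical unit not ready, initializing
#                           command (START UNIT) required
decode_sense() {
  key="$1"; asc="$2"; ascq="$3"
  if [ "$key" = "0x2" ] && [ "$asc" = "0x4" ] && [ "$ascq" = "0x2" ]; then
    echo "NOT READY: logical unit needs a START UNIT before I/O"
  else
    echo "unrecognized in this sketch"
  fi
}

decode_sense 0x2 0x4 0x2
```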
Timbiotic Posted September 15, 2019

I know it's "SAS" and Unraid doesn't spin down SAS. But I just put in an identical 4TB drive with the same firmware as my others, and Unraid will spin down the others but not this one. Is it possible these M1015 controllers get weird with both ports connected? Or maybe it's the disk settings themselves in the pa page. lillis.69.mu-diagnostics-20190915-1413.zip
Timbiotic Posted September 15, 2019

Never mind; I downloaded the SMART report and it's running a background SMART scan. I didn't trigger it. Is that normal when adding a new drive, or is it just a disk timer that says now is the time?
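[Editor's note: SAS drives schedule Background Media Scans on an internal timer; the report above even shows "number of minutes until next internal SMART test", so no host trigger is needed. On SCSI/SAS devices smartctl can read the scan log. A sketch, with a hypothetical device path and the status check done on a sample string:]

```shell
#!/bin/sh
# SAS drives start Background Media Scans (BMS) on their own schedule.
# The scan log is readable on SCSI/SAS devices with:
#   smartctl -l background /dev/sdg
# Classifying a sample status line (no hardware needed for the sketch):
status='Status: background medium scan is active'
case "$status" in
  *active*) verdict="drive-initiated scan running" ;;
  *)        verdict="no scan active" ;;
esac
echo "$verdict"
```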
Timbiotic Posted September 15, 2019

Again, I think something is whacky: another 4TB SAS drive is doing the background scan thing and spins down just fine... same model, same firmware, same settings... wtf
Archived
This topic is now archived and is closed to further replies.