drive disabled again (brand new drive)


So I recently had two 4TB WD Red CMR drives die (or I believe they died).
Both had been giving corrupt file system errors for weeks, not at the same time but one on one day and the other on another. It kept happening, but I just ran xfs_repair and carried on. Then I noticed that disks 3 & 4 had both dropped offline, and xfs_repair was giving I/O errors and was unable to find a superblock. I tried multiple different cables, different power connection combinations, etc. In the end I replaced them with two brand new 6TB Seagate IronWolf drives. I lost a lot of data in the process as I only had single parity, but it's all media that can be redownloaded, which I'm in the process of doing.

Anyway, the new drives have been powered for maybe a total of 3 days now, and I've just had a notification that disk 4 has been disabled due to errors.

I don't believe it has failed, but am I right in thinking I can just stop the array, remove disk 4, start in maintenance mode, stop, and then re-add it to rebuild and re-enable it?
I did come across this thread while doing some quick googling. Has this been found to be a fix, and is it worth doing?

Diags attached in case anything in there points to the cause. I'm using an LSI 9201 in IT mode if my memory is correct, but I may be wrong on the exact model.

 

tower-diagnostics-20230609-0945.zip

Edited by Mattlevant
Link to comment

Looks more like a power/connection problem, check/replace cables and try again.

 

These are logged like a disk problem, so it's a good idea to run an extended SMART test on disk1:

 

Jun  8 12:29:56 Tower kernel: sd 9:0:0:0: [sdb] tag#205 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=6s
Jun  8 12:29:56 Tower kernel: sd 9:0:0:0: [sdb] tag#205 Sense Key : 0x3 [current] 
Jun  8 12:29:56 Tower kernel: sd 9:0:0:0: [sdb] tag#205 ASC=0x11 ASCQ=0x0 
Jun  8 12:29:56 Tower kernel: sd 9:0:0:0: [sdb] tag#205 CDB: opcode=0x88 88 00 00 00 00 00 01 64 66 98 00 00 02 f0 00 00
Jun  8 12:29:56 Tower kernel: critical medium error, dev sdb, sector 23357080 op 0x0:(READ) flags 0x0 phys_seg 94 prio class 0
Jun  8 12:29:56 Tower kernel: md: disk1 read error, sector=23357016
Jun  8 12:29:56 Tower kernel: md: disk1 read error, sector=23357024
Jun  8 12:29:56 Tower kernel: md: disk1 read error, sector=23357032
Jun  8 12:29:56 Tower kernel: md: disk1 read error, sector=23357040
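
For anyone who wants to cross-check these lines, here is a minimal Python sketch that decodes the CDB bytes from the log. Opcode 0x88 is the SCSI READ(16) command, which carries the starting LBA in bytes 2-9 and the transfer length in bytes 10-13. The variable names are only illustrative, and reading the gap between the `sdb` sector and the first `md` sector as a partition start offset is an assumption, not something the log states.

```python
# CDB bytes copied from the syslog line above.
cdb = bytes.fromhex("88 00 00 00 00 00 01 64 66 98 00 00 02 f0 00 00")

opcode = cdb[0]                             # 0x88 is READ(16)
lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: starting LBA
length = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: sectors requested

failed_sector = 23357080    # from "critical medium error, dev sdb"
first_md_sector = 23357016  # from the first "md: disk1 read error" line

print("opcode:", hex(opcode), "start LBA:", lba, "sectors:", length)
print("failed sector inside request:", lba <= failed_sector < lba + length)
# Assumption: a constant gap like this would fit md counting sectors from
# the start of the partition rather than from the start of the disk.
print("sdb sector minus md sector:", failed_sector - first_md_sector)
```

The decoded starting LBA matches the sector in the "critical medium error" line, which is the same address the drive reports as UNC in its own SMART error log.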

 

Link to comment
4 minutes ago, JorgeB said:

Looks more like a power/connection problem, check/replace cables and try again.

 


missed quote from response
see above

Link to comment
4 minutes ago, JorgeB said:

No, for disk4, like mentioned disk1 is logged as an actual disk problem

 

Ah sorry, I misunderstood initially as disk 4 wasn't specified in your response. I will try a different cable (I have 2 spares on my LSI card and 2 regular SATA cables I can plug into the mainboard).

running extended self test on disk 1 now

for reference smart attributes attached for disk 1
 

disk 1.PNG

Link to comment

SMART attributes look fine, but it's logged as a disk issue both on the syslog and on extended SMART info, and there are multiple errors from earlier, all these UNC @ LBA are not a good sign:

Error 55 [6] occurred at disk power-on lifetime: 57993 hours (2416 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 64 66 98 40 00  Error: UNC at LBA = 0x01646698 = 23357080

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 02 f0 00 08 00 00 01 64 66 98 40 00  1d+22:23:27.698  READ FPDMA QUEUED
  60 00 10 00 00 00 00 80 ac a6 20 40 00  1d+22:23:27.698  READ FPDMA QUEUED
  60 01 f0 00 10 00 00 01 64 64 a8 40 00  1d+22:23:27.690  READ FPDMA QUEUED
  60 04 00 00 08 00 00 01 64 60 a8 40 00  1d+22:23:27.690  READ FPDMA QUEUED
  60 03 80 00 00 00 00 01 64 5d 28 40 00  1d+22:23:27.689  READ FPDMA QUEUED

Error 54 [5] occurred at disk power-on lifetime: 57786 hours (2407 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 54 ad 34 f8 40 00  Error: UNC at LBA = 0x54ad34f8 = 1420637432

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 00 00 00 54 ad 34 f8 40 00  2d+11:10:42.878  READ FPDMA QUEUED
  60 04 00 00 08 00 00 54 a7 54 f8 40 00  2d+11:10:41.508  READ FPDMA QUEUED
  60 04 00 00 00 00 00 54 a7 50 f8 40 00  2d+11:10:41.508  READ FPDMA QUEUED
  60 04 00 00 00 00 00 54 a7 4c f8 40 00  2d+11:10:41.502  READ FPDMA QUEUED
  60 04 00 00 00 00 00 54 a7 48 f8 40 00  2d+11:10:41.499  READ FPDMA QUEUED

Error 53 [4] occurred at disk power-on lifetime: 57317 hours (2388 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 2a ac 0e 68 40 00  Error: UNC at LBA = 0x2aac0e68 = 715918952

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 20 00 00 00 00 2a ac 0e 68 40 00  5d+06:18:02.593  READ FPDMA QUEUED
  60 00 20 00 18 00 00 2a a3 ff 28 40 00  5d+06:18:02.510  READ FPDMA QUEUED
  60 00 20 00 10 00 00 2a ab ff 28 40 00  5d+06:18:02.510  READ FPDMA QUEUED
  60 00 20 00 08 00 00 2a a3 ff 48 40 00  5d+06:18:02.510  READ FPDMA QUEUED
  60 00 20 00 00 00 00 2a ab ff 48 40 00  5d+06:18:02.510  READ FPDMA QUEUED

Error 52 [3] occurred at disk power-on lifetime: 57128 hours (2380 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 ad a0 e7 38 40 00  Error: UNC at LBA = 0x1ada0e738 = 7207970616

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 00 00 01 ad a0 e7 38 40 00  2d+04:22:59.028  READ FPDMA QUEUED
  60 00 08 00 00 00 00 e8 fc c7 d0 40 00  2d+04:22:55.579  READ FPDMA QUEUED
  60 00 20 00 18 00 00 2a a0 87 e8 40 00  2d+04:22:55.552  READ FPDMA QUEUED
  60 00 20 00 10 00 00 2a a8 87 e8 40 00  2d+04:22:55.552  READ FPDMA QUEUED
  60 00 20 00 08 00 00 2a a0 88 08 40 00  2d+04:22:55.552  READ FPDMA QUEUED

Error 51 [2] occurred at disk power-on lifetime: 56999 hours (2374 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 00 00 40 40 00  Error: UNC at LBA = 0x00000040 = 64

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 00 00 00 00 00 00 40 40 00  2d+16:09:06.123  READ FPDMA QUEUED
  60 00 08 00 00 00 00 e8 e8 15 d8 40 00  2d+16:09:02.836  READ FPDMA QUEUED
  60 00 20 00 18 00 00 2b 70 1f a8 40 00  2d+16:09:02.758  READ FPDMA QUEUED
  60 00 20 00 10 00 00 2b 78 1f a8 40 00  2d+16:09:02.758  READ FPDMA QUEUED
  60 00 20 00 08 00 00 2b 70 1f c8 40 00  2d+16:09:02.758  READ FPDMA QUEUED

Error 50 [1] occurred at disk power-on lifetime: 54448 hours (2268 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 02 77 7c 46 08 40 00  Error: UNC at LBA = 0x2777c4608 = 10594567688

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 00 00 02 77 7c 46 08 40 00 15d+09:54:26.733  READ FPDMA QUEUED
  60 04 00 00 00 00 02 77 76 92 08 40 00 15d+09:54:24.824  READ FPDMA QUEUED
  60 04 00 00 00 00 02 77 76 8e 08 40 00 15d+09:54:24.822  READ FPDMA QUEUED
  60 04 00 00 18 00 02 77 76 8a 08 40 00 15d+09:54:24.818  READ FPDMA QUEUED
  60 04 00 00 10 00 02 77 76 86 08 40 00 15d+09:54:24.818  READ FPDMA QUEUED

Error 49 [0] occurred at disk power-on lifetime: 53679 hours (2236 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 84 21 6a 40 40 00  Error: UNC at LBA = 0x184216a40 = 6511749696

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 10 00 01 84 21 6a 40 40 00 11d+20:38:22.772  READ FPDMA QUEUED
  60 04 00 00 08 00 01 84 21 66 40 40 00 11d+20:38:22.772  READ FPDMA QUEUED
  60 04 00 00 00 00 01 84 21 62 40 40 00 11d+20:38:22.772  READ FPDMA QUEUED
  60 04 00 00 08 00 01 84 21 5e 40 40 00 11d+20:38:22.771  READ FPDMA QUEUED
  60 04 00 00 00 00 01 84 21 5a 40 40 00 11d+20:38:22.770  READ FPDMA QUEUED

Error 48 [23] occurred at disk power-on lifetime: 51049 hours (2127 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 38 74 01 28 40 00  Error: UNC at LBA = 0x38740128 = 947126568
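
To see how these errors are spread over the life of the disk, the entries can be pulled apart with a short script. This is a rough Python sketch that works on three entries pasted from the log above; the regular expressions and the `parse` helper are my own illustration, not part of smartctl, and they assume the text layout shown here.

```python
import re

# Three entries excerpted from the extended error log above (trimmed).
log = """\
Error 55 [6] occurred at disk power-on lifetime: 57993 hours (2416 days + 9 hours)
  40 -- 51 00 00 00 00 01 64 66 98 40 00  Error: UNC at LBA = 0x01646698 = 23357080
Error 54 [5] occurred at disk power-on lifetime: 57786 hours (2407 days + 18 hours)
  40 -- 51 00 00 00 00 54 ad 34 f8 40 00  Error: UNC at LBA = 0x54ad34f8 = 1420637432
Error 53 [4] occurred at disk power-on lifetime: 57317 hours (2388 days + 5 hours)
  40 -- 51 00 00 00 00 2a ac 0e 68 40 00  Error: UNC at LBA = 0x2aac0e68 = 715918952
"""

header = re.compile(r"Error (\d+) \[\d+\] occurred at disk power-on lifetime: (\d+) hours")
unc = re.compile(r"Error: UNC at LBA = (0x[0-9a-f]+) = (\d+)")

def parse(text):
    """Return one dict per error entry: error number, power-on hours, LBA."""
    entries, current = [], None
    for line in text.splitlines():
        if m := header.search(line):
            current = {"error": int(m.group(1)), "hours": int(m.group(2))}
        elif (m := unc.search(line)) and current is not None:
            current["lba"] = int(m.group(1), 16)
            # The log prints the LBA twice; both forms should agree.
            assert current["lba"] == int(m.group(2))
            entries.append(current)
            current = None
    return entries

entries = parse(log)
# The log lists the newest error first, so compare each entry with the next one.
for newer, older in zip(entries, entries[1:]):
    gap = newer["hours"] - older["hours"]
    print(f"Error {newer['error']} came {gap} power-on hours after error {older['error']}")
```

Uncorrectable reads at widely separated LBAs, recurring over thousands of power-on hours, are what makes this look like the disk itself rather than a cable.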

 

Link to comment
1 hour ago, JorgeB said:

SMART attributes look fine, but it's logged as a disk issue both on the syslog and on extended SMART info, and there are multiple errors from earlier, all these UNC @ LBA are not a good sign:


so are you saying disk 1 is likely to be the next one to fail?

Link to comment
1 hour ago, JorgeB said:

That means the disk is OK, at least for now, keep monitoring but any more read errors I would consider replacing it.

I've just come back to find disk 3 has errors now, so that's disks 1, 3 & 4 giving errors.

Can you just check disk 3?
3 & 4 are my new Seagate disks.

I'm running an extended self test on 3 now, but maybe you can see something in the diagnostics.
 

tower-diagnostics-20230610-1258.zip

Link to comment
17 minutes ago, JorgeB said:

It's not logged as a disk problem, and the disk looks healthy, most likely power/connection, check/replace cables and/or try a different PSU.

I've just lost disk 2 now this morning. I'm beginning to think it's an issue with my LSI card causing dropouts.

 

Do the diagnostics show anything for that, or is that something that isn't logged?

 

I'm currently bodging together a power and SATA connector from my old Dell server, which was a proprietary 8-pin to a female Molex, because I've run out of spare SATA cables.

16864749899246249613144881992861.jpg

Link to comment
1 minute ago, JorgeB said:

Though not as common it could be a bad controller, if power/cables don't help try a different one.

I just don't see how I can be dropping so many drives so often 

All my drives can't be bad and all the sas-sata cables coming from the controller can't be bad either

 

The common denominator is the LSI card, since everything goes through that. I'm going to try bypassing it completely and see what happens.

Link to comment
2 hours ago, Mattlevant said:

But 800w supply should be plenty

Not relevant to cable count. Every added drive on a single feed drops the voltage available to all drives on that feed by a small amount. If you reach a point where that voltage sags enough during simultaneous drive spinup, you will see communication errors on those drives at that point.

 

The amount of voltage drop on a feed is affected by the thickness and length of the wire, and also by any connections, like the slip fit between adapters and the modular connections on the PSU.
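
To put rough numbers on that, here is a back-of-the-envelope Python sketch using Ohm's law (drop = current × resistance). Every figure in it is an illustrative assumption rather than a measurement from this server: the per-drive spin-up current, the cable length, the wire gauge and the contact resistance are all placeholders, and the model simply treats all drives as hanging off the end of one feed.

```python
# All values below are illustrative assumptions, not measurements.
SUPPLY_V = 12.0          # nominal 12 V rail
MIN_V = SUPPLY_V * 0.95  # 11.4 V, the usual ATX -5% tolerance on 12 V
SPINUP_A = 2.0           # assumed 12 V spin-up current per 3.5" drive
WIRE_OHM_PER_M = 0.021   # approximate resistance of 18 AWG copper wire
CABLE_M = 0.6            # assumed one-way length of the feed
CONNECTOR_OHM = 0.01     # assumed contact resistance per connection

def voltage_at_drives(n_drives, n_connections):
    """Estimate the 12 V level at the drives while all spin up together."""
    current = n_drives * SPINUP_A
    # Current goes out on the 12 V wire and returns on ground: 2x length.
    resistance = 2 * CABLE_M * WIRE_OHM_PER_M + n_connections * CONNECTOR_OHM
    return SUPPLY_V - current * resistance

for drives, connections in [(2, 2), (4, 3), (6, 3)]:
    volts = voltage_at_drives(drives, connections)
    status = "ok" if volts >= MIN_V else "below tolerance"
    print(f"{drives} drives, {connections} connections: {volts:.2f} V ({status})")
```

With these made-up values, two drives on a feed stay comfortably in tolerance, while six on one feed through an extra adapter dip below it. That is the point about total PSU wattage not being the limiting factor: the sag happens on the individual feed.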

Link to comment
13 hours ago, JonathanM said:

Not relevant to cable count. Every added drive on a single feed drops the voltage available to all drives on that feed by a small amount. If you reach a point where that voltage sags enough during simultaneous drive spinup, you will see communication errors on those drives at that point.

 

The amount of voltage drop on a feed is affected by the thickness and length of the wire, and also by any connections, like the slip fit between adapters and the modular connections on the PSU.

Makes sense I suppose

But it was random which drive got the errors. It's not like it was the last drive on the string every time.

 

Anyway

I've currently got 4 drives powered via my bodged Molex to SATA power adapter.

The way I've connected it, there are 2 drives on each string, on a Y split from one 4-pin Molex.

The other 2 drives are on a single string from the dedicated SATA power cable from the PSU.

 

So that's 3 strings of 2 now, and I've also totally bypassed the LSI card.

I'm still unsure whether it was power related or the LSI card was at fault, but the array is currently online and ran a whole parity check (it took 18 hours) without any drive errors. There were quite a few parity errors, 500k or so, which I'm currently running a correcting check to fix.

Link to comment
