Yet More Drive Issues - Any ideas? - General Support

January 22, 2021

Disk 2: twice this week this drive has been disabled.

Both diagnostics are attached. The drive passes an extended SMART test, and I can't see any errors in the log apart from:

Jan 22 00:02:35 Tower kernel: sd 6:0:3:0: [sdl] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00

The first time, it showed an absurdly high number of "Reads", as can be made out in the screenshot below (sorry about the red filter).

[Screenshot: Disk 2 showing the high Reads count]

The first time, whilst it was disabled, I copied the emulated data to another drive while running an extended SMART test. Once the SMART test passed, I rebuilt onto the same drive (note there was little to rebuild, as I had moved the data off).

Two evenings later the same disk went down, albeit without the ridiculous number of reads.

The drive is attached to an LSI 8-port card. Three other drives are on the same SAS-to-4x-SATA breakout cable. It has been like this for 6-8 weeks without any issues (on these drives/controller at least).

Expert opinion, please?

tower-diagnostics-20210122-1019.zip tower-diagnostics-20210120-0351.zip

January 22, 2021

Community Expert

The drive dropped offline both times, so there's no SMART report. Assuming SMART looks good, it's likely a connection problem: try swapping cables with another drive, on the same or a different controller.
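If the drive is visible again (for example after a reboot), one way to test the connection theory is SMART attribute 199, UDMA_CRC_Error_Count, which counts interface CRC errors and is commonly read as a cable or backplane symptom rather than a media fault. A minimal sketch, assuming smartctl's ATA attribute table layout with RAW_VALUE in the 10th column; the function name crc_errors is hypothetical:

```sh
#!/bin/sh
# crc_errors (hypothetical helper): read `smartctl -A` output on stdin and
# print the raw value of attribute 199 (UDMA_CRC_Error_Count).
# Assumption: ATA attribute table layout, where RAW_VALUE is column 10.
# Exits non-zero if attribute 199 is not present in the input.
crc_errors() {
  awk '$1 == 199 { print $10; found = 1 } END { exit !found }'
}

# Example (the sdX letter can change between boots):
#   smartctl -A /dev/sdl | crc_errors
```

Noting the value before and after swapping cables shows whether new CRC errors are still being logged on the new path.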

January 25, 2021

Author

@JorgeB Thanks for the reply, as always.

I haven't yet done what you suggested, as the box is hard to get to and I have to empty a cupboard before I can even attempt to open it up. In the meantime I rebuilt onto the same disk.

Alas it disabled itself again the next night!

However, I've noticed that all three errors occurred at almost the same time...

20 JAN 01:17 - Tower: Alert [TOWER] - Disk 2 in error state (disk dsbl) SAMSUNG_HD103SJ_S2C8J9GZA00505 (sdl)

22 JAN 01:20 - Tower: Alert [TOWER] - Disk 2 in error state (disk dsbl) SAMSUNG_HD103SJ_S2C8J9GZA00505 (sdl)

24 JAN 01:22 - Tower: Alert [TOWER] - Disk 2 in error state (disk dsbl) SAMSUNG_HD103SJ_S2C8J9GZA00505 (sdl)

This can't be a coincidence.

Log attached from the latest error. Any clues as to what the server is doing at this time every night?

Shouldn't be Mover, CA AppData Backup or SSD Trim.
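To see what else was logged around the failure window, one option is to filter the syslog by time of day. A minimal sketch, assuming the classic syslog line format (e.g. "Jan 22 00:02:35 Tower kernel: ...") with the time in the third field, and the log at /var/log/syslog; the helper name syslog_window is hypothetical:

```sh
#!/bin/sh
# syslog_window FROM TO (hypothetical helper): print syslog lines read on stdin
# whose HH:MM lies between FROM and TO inclusive, e.g. 01:00 and 01:30.
# Assumption: classic syslog format "Mon DD HH:MM:SS host ...", so field 3 is
# the time. Zero-padded HH:MM strings compare correctly as plain strings.
# A window that wraps past midnight (e.g. 23:50 to 00:10) is not handled.
syslog_window() {
  awk -v from="$1" -v to="$2" '{
    t = substr($3, 1, 5)          # "01:17:42" -> "01:17"
    if (t >= from && t <= to) print
  }'
}

# Example: syslog_window 01:00 01:30 < /var/log/syslog
```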

tower-diagnostics-20210125-1422.zip

January 25, 2021

Community Expert

6 minutes ago, air_marshall said:

what the server is doing at this time every night?

What do you get from the command line with this?

crontab -l

January 25, 2021

Author

Here is the cron output:

Linux 4.19.107-Unraid.
root@Tower:~# crontab -l
# If you don't want the output of a cron job mailed to you, you have to direct
# any output to /dev/null.  We'll do this here since these jobs should run
# properly on a newly installed system.  If a script fails, run-parts will
# mail a notice to root.
#
# Run the hourly, daily, weekly, and monthly cron jobs.
# Jobs that need different timing may be entered into the crontab as before,
# but most really don't need greater granularity than this.  If the exact
# times of the hourly, daily, weekly, and monthly cron jobs do not suit your
# needs, feel free to adjust them.
#
# Run hourly cron jobs at 47 minutes after the hour:
47 * * * * /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
#
# Run daily cron jobs at 4:40 every day:
40 4 * * * /usr/bin/run-parts /etc/cron.daily 1> /dev/null
#
# Run weekly cron jobs at 4:30 on the first day of the week:
30 4 * * 0 /usr/bin/run-parts /etc/cron.weekly 1> /dev/null
#
# Run monthly cron jobs at 4:20 on the first day of the month:
20 4 1 * * /usr/bin/run-parts /etc/cron.monthly 1> /dev/null
0 1 * * 2 /usr/local/emhttp/plugins/ca.backup2/scripts/backup.php &>/dev/null 2>&1
root@Tower:~#

Any clues?

January 25, 2021

Community Expert

8 hours ago, trurl said:
What do you get from the command line with this?
crontab -l

Might you not also want the contents of the file /etc/cron.d/root to see if that is running anything at those times?
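Reading both sources together and keeping only the entries that could fire in a given hour is one way to do this check. A minimal sketch; the helper name jobs_in_hour is hypothetical, and it only understands a plain number or a '*'-based hour field (lists such as 1,13 and ranges such as 0-3 are not expanded):

```sh
#!/bin/sh
# jobs_in_hour H (hypothetical helper): print cron entries read on stdin that
# could run during hour H. Blank lines and comment lines are skipped.
# Any hour field containing '*' (e.g. '*' or '*/2') is printed as a candidate.
# Assumption: lists (1,13) and ranges (0-3) in the hour field are not expanded.
jobs_in_hour() {
  awk -v h="$1" '
    NF == 0 || $1 ~ /^#/ { next }
    $2 ~ /\*/ || $2 == h { print }'
}

# Example: { crontab -l; cat /etc/cron.d/root; } | jobs_in_hour 1
```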

January 28, 2021

Author

On 1/25/2021 at 10:45 PM, itimpi said:

Might you not also want the contents of the file /etc/cron.d/root to see if that is running anything at those times?

# Generated docker monitoring schedule:
10 0 * * 1 /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/dockerupdate.php check &> /dev/null

# Generated system monitoring schedule:
*/1 * * * * /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null

# Generated mover schedule:
40 3 * * 5 /usr/local/sbin/mover &> /dev/null

# Generated parity check schedule:
0 0 1 * * /usr/local/sbin/mdcmd check NOCORRECT &> /dev/null

# Generated plugins version check schedule:
10 0 * * 1 /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugincheck &> /dev/null

# Generated Unraid OS update check schedule:
11 0 * * 1 /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/unraidcheck &> /dev/null

# Generated cron settings for docker autoupdates
0 0 * * 0 /usr/local/emhttp/plugins/ca.update.applications/scripts/updateDocker.php >/dev/null 2>&1
# Generated cron settings for plugin autoupdates
0 0 * * * /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1

# CRON for CA background scanning of applications
0 * * * * php /usr/local/emhttp/plugins/community.applications/scripts/notices.php > /dev/null 2>&1

# Generated ssd trim schedule:
0 2 * * 1 /sbin/fstrim -a -v | logger &> /dev/null

# Generated system data collection schedule:
*/1 * * * * /usr/local/emhttp/plugins/dynamix.system.stats/scripts/sa1 1 1 &> /dev/null

Any clues?

January 28, 2021

Community Expert

Looks like the only thing scheduled to start around then is the CA Backup plug-in scheduled to start at 1:00 am. That should not lead to your problem though unless it is triggering something non-obvious.

January 28, 2021

Community Expert

1 minute ago, itimpi said:

CA Backup plug-in scheduled to start at 1:00 am

That only runs on Tuesdays (day-of-week 2); his syslog timestamps are on Wednesday.

I always just go to corntab.com instead of trying to remember how to parse these.
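Checking a schedule against a date, as done here by hand, can also be scripted. A minimal sketch, assuming GNU date (for -d) and only the simplest cron fields, a single number or '*'; ranges, lists, steps and names such as Tue are rejected. The helper name cron_matches is hypothetical:

```sh
#!/bin/sh
# cron_matches "SCHEDULE" "DATETIME" (hypothetical helper): succeed if the
# 5-field cron SCHEDULE fires at DATETIME (any string GNU `date -d` accepts).
# Assumption: every field is a plain number or '*'; other syntax exits with 2.
cron_matches() {
  # minute hour day-of-month month day-of-week (0 = Sunday)
  cm_now=$(date -d "$2" '+%M %H %d %m %w') || return 2
  awk -v sched="$1" -v now="$cm_now" '
    function ok(want, have) { return want == "*" || want + 0 == have + 0 }
    BEGIN {
      if (split(sched, w, " ") != 5) exit 2
      split(now, h, " ")
      for (i = 1; i <= 5; i++)
        if (w[i] != "*" && w[i] !~ /^[0-9]+$/) exit 2
      if (w[5] == "7") w[5] = "0"      # 7 is also Sunday in cron
      if (!ok(w[1], h[1]) || !ok(w[2], h[2]) || !ok(w[4], h[4])) exit 1
      dom = ok(w[3], h[3]); dow = ok(w[5], h[5])
      # Classic cron rule: if both day fields are restricted, either may match.
      if (w[3] != "*" && w[5] != "*") exit !(dom || dow)
      exit !(dom && dow)
    }'
}
```

For example, cron_matches '0 1 * * 2' '2021-01-20 01:00' fails, because 2021-01-20 was a Wednesday and that entry only fires on Tuesdays.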

February 12, 2021

Author

Woes continue.

First, I remade the SATA power cables for this bank of drives, as I didn't like them. I connected everything back up, ran a short SMART test and rebuilt onto the same drive. That night the same drive was disabled at 01:40.

Then I moved it to a different SAS port and rebuilt onto the same disk; that night the same drive was disabled at 01:42.

WTF is going on here? I'm happy to accept the drive might be bad despite passing SMART tests, but I don't believe it disabling itself at such consistent times is just a co-incidence....

The latest diagnostics are attached, but I doubt they tell us anything new.

Shall I just give up and remove the drive?

tower-diagnostics-20210212-2340.zip

February 13, 2021

Community Expert

Nothing is assigned as disk1, is that expected?

Disk2 is disabled and doesn't appear to be connected since there is no SMART report for it.

February 13, 2021

Community Expert

8 hours ago, air_marshall said:

but disabling itself at such consistent times I don't believe is just a co-incidence....

You may have a ghost in the machine...

There are signs of the problem earlier:

Feb 12 00:01:28 Tower kernel: sd 10:0:2:0: attempting task abort! scmd(0000000003923bdb)
Feb 12 00:01:28 Tower kernel: sd 10:0:2:0: [sdk] tag#6275 CDB: opcode=0x85 85 09 0e 00 00 00 02 00 07 00 00 00 00 00 2f 00
Feb 12 00:01:28 Tower kernel: scsi target10:0:2: handle(0x000b), sas_address(0x4433221104000000), phy(4)
Feb 12 00:01:28 Tower kernel: scsi target10:0:2: enclosure logical id(0x500605b00991da10), slot(7)
Feb 12 00:01:32 Tower kernel: sd 10:0:2:0: task abort: SUCCESS scmd(0000000003923bdb)
Feb 12 00:02:08 Tower kernel: sd 10:0:2:0: device_block, handle(0x000b)
Feb 12 00:02:10 Tower kernel: sd 10:0:2:0: device_unblock and setting to running, handle(0x000b)
Feb 12 00:02:10 Tower kernel: sd 10:0:2:0: [sdk] Synchronizing SCSI cache
Feb 12 00:02:10 Tower kernel: sd 10:0:2:0: [sdk] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Feb 12 00:02:10 Tower kernel: mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221104000000)
Feb 12 00:02:10 Tower kernel: mpt2sas_cm0: enclosure logical id(0x500605b00991da10), slot(7)

I would swap that disk with another from the onboard SATA controller to see if it changes anything.

February 13, 2021

Author

11 hours ago, trurl said:

Nothing is assigned as disk1, is that expected?

Disk2 is disabled and doesn't appear to be connected since there is no SMART report for it.

I had to drop disk1 because it failed a while ago, and I shrank the array.

February 13, 2021

Author

5 hours ago, JorgeB said:

You may have a ghost in the machine...

I would swap that disk with another from the onboard SATA controller to see if it changes anything.

Thanks again @JorgeB. Given time and case constraints, I'll shrink the array for now and investigate further later.

PITA, that'll be 3 drives I've had to drop in as many months since my re-casing project 😞
