[6.8.3] Disk 7 failed, after reboot disk 1 missing so can't start server


mrbens


Hello, in need of assistance please.

 

Attached diagnostics from time of issue before reboot and today.

 

Had an error pop up saying disk 7 failed.

 

Saved diagnostic logs and rebooted. Plan was to try a preclear of disk 7 to see if I could reintroduce it again.

 

Syslog is full of disk 1 errors but can't see anything for disk 7.

 

Disk 1 then showed as missing after the reboot and has been missing ever since, so I can't start the server or try to rebuild either disk.

 

[Screenshot: disk 1 missing]

 

Further reboot showed disk 3 also missing but that's been showing up ok again since that one time.

 

[Screenshot: disk 1 and disk 3 missing]

 

Struggling to get the server to see disk 1. Clicking 'no device' doesn't show any disks in the drop down menu.

 

Swapped round SATA cables between disk 1 and disk 2 (both go into same 2 port SATA card) but disk 1 still shows missing.

 

Swapped round SATA cables between disk 1 and parity (parity goes straight into motherboard) and disk 1 still shows missing. 

 

Replaced the 4 port SATA power splitter cable that goes into parity, disk 1, 2 & 3 and disk 1 still shows missing.

 

Swapped PSU SATA power cable that goes into the 4x SATA splitter with another splitter but disk 1 still missing.

 

Connected a PSU SATA power cable without a splitter to disk 1 and disk 1 still missing.

 

Disk 7 SMART data:

 

[Screenshot: disk 7 SMART attributes]

 

Disk 7 SMART overall-health: Passed

 

When the server powers up there is a hard disk clicking sound for a few seconds. It is probably coming from disk 1, as it still happens when disk 7's power cable is removed before booting.

 

I wonder if disk 7 might be OK to reintroduce somehow so I can rebuild disk 1?

 

Is there a way to try to recover from this without losing all the data on both disks please?

 

Thanks,

Ben

tower-diagnostics-20210816-0608 [before reboot].zip tower-diagnostics-20210911-1400 [latest info].zip

6 hours ago, JorgeB said:

Disk1 is likely dead and disk7 appears to be failing. If that's the case, you can't rebuild disk1 with single parity. Still, since there's no SMART report for disk7, I can't see whether a SMART test was run or not, so try running an extended test to confirm whether the disk has really failed.

Thanks for the reply. The extended SMART test fails at 10%. Tried twice. Attached the output.

 

Short test also won't run and says "Errors occurred - Check SMART report".

tower-smart-20210912-1530.zip

  • 3 months later...

I'm going to try ddrescue on the disk 7 that was previously disabled due to errors. Current SMART report:

 

[Screenshot: current SMART report for disk 7]

 

It doesn't show a file system and mount is grayed out. When the array is stopped it does show the disk as being available to re-add to the array but I guess that's not a good idea.

 

[Screenshot: Unassigned Devices list]

 

The other unassigned disk, sdl, is a spare that was previously used in the array. I replaced it with a larger disk and rebuilt, so it still has its old data on it. What's the best way to run ddrescue from sdd onto sdl please?

 

Should I preclear sdl and add it to the array first, to save having to copy the recovered data over afterwards? Or is it better to leave it unassigned, without preclearing, and manually copy whatever data can be recovered onto the array afterwards?

 

If I leave it unmounted, is this definitely the correct command:

ddrescue -f /dev/sdd /dev/sdl /boot/ddrescue.log
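
Since -f tells ddrescue to overwrite the output device, swapping the two arguments or picking a destination that is too small would be costly. Below is a minimal sketch of a pre-flight check to run before that command. It assumes Python 3 is available on the server (it may not be on a stock install), and the function names are made up for illustration. It reads device sizes from /sys/class/block, where the kernel reports them in 512-byte units.

```python
import os


def device_size_bytes(dev, sysfs_root="/sys/class/block"):
    """Return the size of a block device in bytes, as reported by sysfs."""
    name = os.path.basename(dev)
    with open(os.path.join(sysfs_root, name, "size")) as f:
        sectors = int(f.read().strip())
    return sectors * 512  # sysfs always counts in 512-byte units


def preflight(src, dst, sysfs_root="/sys/class/block"):
    """Return a list of problems that should stop a `ddrescue -f src dst` run."""
    if os.path.basename(src) == os.path.basename(dst):
        return ["source and destination are the same device"]
    problems = []
    src_size = device_size_bytes(src, sysfs_root)
    dst_size = device_size_bytes(dst, sysfs_root)
    if dst_size < src_size:
        problems.append(
            f"destination is smaller than source ({dst_size} < {src_size} bytes)"
        )
    return problems


# Example call for the devices in this thread (not run here):
# print(preflight("/dev/sdd", "/dev/sdl") or "no problems found")
```

An empty list only means the sizes and names look sane. It does not confirm that sdd is really the failing disk or that sdl is really the spare, so the serial numbers still need checking by eye.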

 

Thanks


Thanks again. I got this far before it stopped:

 

root@Tower:~# ddrescue -f /dev/sdd /dev/sdl /boot/ddrescue.log
GNU ddrescue 1.23
Press Ctrl-C to interrupt
     ipos:    1327 GB, non-trimmed:   58475 MB,  current rate:       0 B/s
     opos:    1327 GB, non-scraped:        0 B,  average rate:  53796 kB/s
non-tried:    1672 GB,  bad-sector:        0 B,    error rate:   2370 MB/s
  rescued:    1269 GB,   bad areas:        0,        run time:  6h 33m 12s
pct rescued:   42.29%, read errors:   892328,  remaining time:         n/a
                              time since last successful read:         39s
Copying non-tried blocks... Pass 5 (forwards)
ddrescue: Input file disappeared: No such file or directory
 

Both disks are still showing under Unassigned Devices.

 

Should I try running the same command again to see if it resumes?

6 minutes ago, mrbens said:

Input file disappeared

This means the disk dropped.

 

6 minutes ago, mrbens said:

Both disks are still showing under Unassigned Devices.

With the same identifier?
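
The identifier matters because a disk that drops and reconnects can come back under a different sdX letter, and the original command would then read from or write to the wrong device. As a rough sketch (assuming Python 3 is available, with a made-up function name), the persistent names under /dev/disk/by-id, which normally include the model and serial number, can be mapped back to a device like this:

```python
import os


def ids_for_device(dev, by_id_dir="/dev/disk/by-id"):
    """Return the sorted by-id names whose symlinks resolve to `dev`."""
    target = os.path.realpath(dev)
    names = []
    for name in os.listdir(by_id_dir):
        link = os.path.join(by_id_dir, name)
        # Each by-id entry is a symlink to the current kernel device node.
        if os.path.realpath(link) == target:
            names.append(name)
    return sorted(names)


# Example call for the source disk in this thread (not run here):
# print(ids_for_device("/dev/sdd"))
```

Noting the result for both disks before the first run, and comparing it after a drop, shows whether /dev/sdd and /dev/sdl still refer to the same physical disks.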

 

7 minutes ago, mrbens said:

Should I try running the same command again to see if it resumes?

It will resume if you use the same log file.
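
Resuming works because ddrescue records the state of every block in that file (/boot/ddrescue.log in the command above), so it is worth confirming the file survived the drop before re-running. The sketch below totals the bytes in each state. It assumes Python 3 is available and that the file follows the usual GNU ddrescue layout: comment lines starting with #, one current-position line, then "pos size status" lines in hex. The function names are illustrative.

```python
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad-sector",
    "+": "rescued",
}


def summarize_mapfile(text):
    """Total the bytes recorded for each block status in a ddrescue mapfile."""
    totals = {name: 0 for name in STATUS_NAMES.values()}
    seen_position_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not seen_position_line:
            # The first data line is the current position, not a block entry.
            seen_position_line = True
            continue
        _pos, size, status = line.split()[:3]
        totals[STATUS_NAMES[status]] += int(size, 0)  # base 0 accepts 0x... hex
    return totals


def percent_rescued(totals):
    """Return rescued bytes as a percentage of all bytes in the mapfile."""
    total = sum(totals.values())
    return 100.0 * totals["rescued"] / total if total else 0.0


# Example call for the file used in this thread (not run here):
# with open("/boot/ddrescue.log") as f:
#     print(summarize_mapfile(f.read()))
```

If the rescued total roughly matches what the last status screen showed, the next run with the same command and the same file should carry on from there rather than start over.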
