(Solved) Failed Disk (remove?)

My server has become unstable (crashed and parity checks lead to more crashes). After it came back up, I had a failed disk.

The disk looks "unmountable". As I have plenty of space (50% usage) I'm wondering if I should remove the disk, try to put it back into the array or do something else.

Is there a way to find out how "dead" the disk is?

knowlage-diagnostics-20190426-1816.zip

Edited by Jaster

  • Community Expert

There are a few relatively recent UNC @ LBA errors, you should run an extended SMART test.

  • Author

It's a very old 2TB disk, I guess it's time to remove it.

Is there a way to remove it and have the array reallocate the missing data somewhere else (as said, I do have plenty of space left)?

  • Community Expert

Not automatically, either check filesystem on the emulated disk and move the data to other disks, or mount the old disk with UD (Unassigned Devices) and copy to the array after doing a new config and re-syncing parity.

  • Author

 

5 minutes ago, johnnie.black said:

Not automatically, either check filesystem on the emulated disk and move the data to other disks

I do this by...? Hit the "check" button and then use unbalance?

  • Community Expert

Then you'll still need to do a new config and re-sync parity without that disk, though this doesn't bode well:

26 minutes ago, Jaster said:

crashed and parity checks lead to more crashes

 

  • Author

So what do you suggest?

  • Community Expert

I would first try to find out why the server is crashing, run memtest, check cooling, power supply, etc.

  • Author

It seems to be the disk, everything else is doing well. So I either replace or remove it. I'm sure it won't pass another parity run as I tried that a couple of times.

  • Community Expert

A bad disk shouldn't make Unraid crash, and it doesn't look bad. Still, move the data first, remove the disk and then try to re-sync parity; if it still crashes, it wasn't the disk.

  • Author
#	Attribute Name	Flag	Value	Worst	Threshold	Type	Updated	Failed	Raw Value
1	Raw read error rate	0x002f	200	200	051	Pre-fail	Always	Never	0
3	Spin up time	0x0027	176	174	021	Pre-fail	Always	Never	4158
4	Start stop count	0x0032	097	097	000	Old age	Always	Never	3449
5	Reallocated sector count	0x0033	200	200	140	Pre-fail	Always	Never	0
7	Seek error rate	0x002e	200	200	000	Old age	Always	Never	0
9	Power on hours	0x0032	040	040	000	Old age	Always	Never	43936 (5y, 4d, 16h)
10	Spin retry count	0x0032	100	100	000	Old age	Always	Never	0
11	Calibration retry count	0x0032	100	100	000	Old age	Always	Never	0
12	Power cycle count	0x0032	100	100	000	Old age	Always	Never	257
192	Power-off retract count	0x0032	200	200	000	Old age	Always	Never	153
193	Load cycle count	0x0032	199	199	000	Old age	Always	Never	3295
194	Temperature celsius	0x0022	117	088	000	Old age	Always	Never	30
196	Reallocated event count	0x0032	200	200	000	Old age	Always	Never	0
197	Current pending sector	0x0032	200	200	000	Old age	Always	Never	0
198	Offline uncorrectable	0x0030	100	253	000	Old age	Offline	Never	0
199	UDMA CRC error count	0x0032	200	200	000	Old age	Always	Never	0
200	Multi zone error rate	0x0008	200	200	000	Old age	Offline	Never	0

I'm running the check now, let's see what happens. If it passes, I'll try to reset the config and run a parity check.
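For readers who want to screen a SMART attribute table like the one above programmatically, here is a minimal Python sketch. It assumes the table is tab-separated exactly as pasted; the set of "watched" attribute IDs and the function names are editorial choices for illustration, not anything Unraid defines. Note that the UNC @ LBA errors mentioned earlier live in the SMART error log, which this table does not include, so a clean result here does not rule them out.

```python
# Attribute IDs whose raw value is normally zero on a healthy disk.
# This selection is an assumption made for the sketch.
WATCHED_IDS = {5, 196, 197, 198, 199}


def parse_smart_table(text):
    """Parse tab-separated SMART rows into dicts, skipping header lines."""
    rows = []
    for line in text.splitlines():
        cols = line.split("\t")
        # Data rows have 10 columns and start with a numeric attribute ID.
        if len(cols) < 10 or not cols[0].strip().isdigit():
            continue
        rows.append({
            "id": int(cols[0]),
            "name": cols[1],
            "value": int(cols[3]),
            "threshold": int(cols[5]),
            "raw": cols[9],
        })
    return rows


def flag_attributes(rows):
    """Return names of attributes at/below threshold or with nonzero watched raw values."""
    flagged = []
    for row in rows:
        raw_number = int(row["raw"].split()[0])  # "43936 (5y, ...)" -> 43936
        at_threshold = row["threshold"] > 0 and row["value"] <= row["threshold"]
        if at_threshold or (row["id"] in WATCHED_IDS and raw_number > 0):
            flagged.append(row["name"])
    return flagged


# A few rows copied from the table above.
SAMPLE = "\n".join([
    "5\tReallocated sector count\t0x0033\t200\t200\t140\tPre-fail\tAlways\tNever\t0",
    "9\tPower on hours\t0x0032\t040\t040\t000\tOld age\tAlways\tNever\t43936 (5y, 4d, 16h)",
    "197\tCurrent pending sector\t0x0032\t200\t200\t000\tOld age\tAlways\tNever\t0",
    "198\tOffline uncorrectable\t0x0030\t100\t253\t000\tOld age\tOffline\tNever\t0",
])

print(flag_attributes(parse_smart_table(SAMPLE)))  # prints []
```

Nothing is flagged for these rows, which matches the later comment in this thread that the disk "doesn't look bad" on attributes alone.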

Edited by Jaster

  • Community Expert

I already saw that on the diags.

  • Author
1 minute ago, johnnie.black said:

I already saw that on the diags.


Phase 1 - find and verify superblock...
        - block cache size set to 3062096 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 8 tail block 4
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
agf_freeblks 122094651, counted 122094649 in ag 3
agf_freeblks 94873563, counted 94873565 in ag 1
agf_freeblks 121856187, counted 121856185 in ag 2
agi_freecount 1, counted 0 in ag 3
agi_freecount 1, counted 0 in ag 3 finobt
agi_freecount 1, counted 0 in ag 1
agi_freecount 1, counted 0 in ag 1 finobt
inode chunk claims untracked block, finobt block - agno 2, bno 3901920, inopb 8
inode chunk claims untracked block, finobt block - agno 2, bno 3901921, inopb 8
inode chunk claims untracked block, finobt block - agno 2, bno 3901922, inopb 8
inode chunk claims untracked block, finobt block - agno 2, bno 3901923, inopb 8
inode chunk claims untracked block, finobt block - agno 2, bno 3901924, inopb 8
inode chunk claims untracked block, finobt block - agno 2, bno 3901925, inopb 8
inode chunk claims untracked block, finobt block - agno 2, bno 3901926, inopb 8
inode chunk claims untracked block, finobt block - agno 2, bno 3901927, inopb 8
undiscovered finobt record, ino 2178699008 (2/31215360)
finobt ir_freecount/free mismatch, inode chunk 2/31215360, freecount 30 nfree 32
invalid inode count, inode chunk 2/31215360, count 0 ninodes 64
undiscovered finobt record, ino 2147483712 (2/64)
finobt ir_freecount/free mismatch, inode chunk 2/64, freecount 54 nfree 24
invalid inode count, inode chunk 2/64, count 0 ninodes 64
undiscovered finobt record, ino 2147484608 (2/960)
finobt ir_freecount/free mismatch, inode chunk 2/960, freecount 6 nfree 28
invalid inode count, inode chunk 2/960, count 0 ninodes 64
agi_freecount 1, counted 0 in ag 2
agi_freecount 1, counted 90 in ag 2 finobt
sb_ifree 9, counted 6
sb_fdblocks 338655161, counted 339149767
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
found inodes not in the inode allocation tree
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
Maximum metadata LSN (1:3751) is ahead of log (1:8).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Fri Apr 26 19:06:04 2019

Phase		Start		End		Duration
Phase 1:	04/26 19:06:03	04/26 19:06:04	1 second
Phase 2:	04/26 19:06:04	04/26 19:06:04
Phase 3:	04/26 19:06:04	04/26 19:06:04
Phase 4:	04/26 19:06:04	04/26 19:06:04
Phase 5:	Skipped
Phase 6:	Skipped
Phase 7:	Skipped

Total run time: 1 second

run xfs_repair with -L?

  • Community Expert

First without -n, and if it still asks for it, and likely it will, use -L.
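The sequence described here (dry run with -n, then a real run, then -L only if the log still blocks the repair) can be sketched in Python. This is an illustrative helper, not an Unraid tool: the function names are hypothetical, the /dev/mdN device naming follows the md1/md4 names used in this thread, and the string matching relies on the "valuable metadata changes in a log" phrase shown in the output above. It only builds the argument list and does not execute anything, so a wrong disk number can be caught before a command is run.

```python
def repair_command(disk_number, dry_run=True, zero_log=False):
    """Build an xfs_repair argument list for an array disk (does not run it)."""
    if not isinstance(disk_number, int) or disk_number < 1:
        raise ValueError("disk_number must be a positive integer")
    if dry_run and zero_log:
        # This sketch keeps the check-only pass and the log-zeroing pass separate.
        raise ValueError("choose either a dry run or -L, not both")
    args = ["xfs_repair"]
    if dry_run:
        args.append("-n")   # check only, modify nothing
    if zero_log:
        args.append("-L")   # zero the log; pending metadata updates are lost
    args.append(f"/dev/md{disk_number}")
    return args


def next_step(output):
    """Suggest a next step from xfs_repair output using simple string checks."""
    dry_run = "-n option was used" in output or "No modify flag set" in output
    dirty_log = "valuable metadata changes in a log" in output
    if dry_run:
        return "rerun without -n"
    if dirty_log:
        return "mount to replay the log, or rerun with -L if it cannot be mounted"
    return "review output"


# Two lines copied from the dry-run output above.
DRY_RUN_SNIPPET = (
    "ALERT: The filesystem has valuable metadata changes in a log which is being\n"
    "ignored because the -n option was used.  Expect spurious inconsistencies\n"
)

print(next_step(DRY_RUN_SNIPPET))   # prints: rerun without -n
print(" ".join(repair_command(4)))  # prints: xfs_repair -n /dev/md4
```

Printing the command and reading the disk number back before running it is a cheap guard against repairing the wrong device.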

  • Author

I think I screwed it up (a bit): I copy/pasted the repair command with drive md1 instead of md4.

I canceled ([ctrl]+[C]) and, as everything looked fine, I went on and fixed md4.

After I made a new config, it told me disk1 is unmountable. Trying to stop the array, it "hangs" with  Array Stopping•Retry unmounting disk share(s)... argh.

 

  • Community Expert

disk1 should be fixable with xfs_repair, new config should be done only after you copy disk4's data.

  • Author

I can't run xfs_repair as I can't get the array into maintenance mode.

As d4 was repaired, I'll do a new config and include it in order to run a parity check and hope. If it works, I'll unbalance all data off d4 and remove it.

  • Community Expert

Only the file system was repaired (and it was the emulated disk filesystem, not the actual disk), it won't make any difference for a parity check, or if it crashes or not, though like I said I doubt it's disk related, still if you plan to remove disk4 no point in doing a new config with it.

Edited by johnnie.black

  • Author

I put all the disks back and am trying to run a parity check. Let's see what it does...

Is there anything I can enable for some kind of "extended" monitoring?

  • Community Expert

System notifications are enough to monitor usual disk warning signs.

  • Author
Apr 27 20:19:16 tower kernel: ata3: hard resetting link
Apr 27 20:19:16 tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 27 20:19:16 tower kernel: ata3.00: supports DRM functions and may not be fully accessible
Apr 27 20:19:16 tower kernel: ata3.00: NCQ Send/Recv Log not supported
Apr 27 20:19:16 tower kernel: ata3.00: supports DRM functions and may not be fully accessible
Apr 27 20:19:16 tower kernel: ata3.00: NCQ Send/Recv Log not supported
Apr 27 20:19:16 tower kernel: ata3.00: configured for UDMA/133
Apr 27 20:19:16 tower kernel: ata3: EH complete
Apr 27 20:19:16 tower kernel: ata3.00: exception Emask 0x10 SAct 0x200 SErr 0x400100 action 0x6 frozen
Apr 27 20:19:16 tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error
Apr 27 20:19:16 tower kernel: ata3: SError: { UnrecovData Handshk }
Apr 27 20:19:16 tower kernel: ata3.00: failed command: WRITE FPDMA QUEUED
Apr 27 20:19:16 tower kernel: ata3.00: cmd 61/80:48:40:2f:76/00:00:03:00:00/40 tag 9 ncq dma 65536 out
Apr 27 20:19:16 tower kernel: res 40/00:48:40:2f:76/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
Apr 27 20:19:16 tower kernel: ata3.00: status: { DRDY }
Apr 27 20:19:16 tower kernel: ata3: hard resetting link
Apr 27 20:19:17 tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 27 20:19:17 tower kernel: ata3.00: supports DRM functions and may not be fully accessible
Apr 27 20:19:17 tower kernel: ata3.00: NCQ Send/Recv Log not supported
Apr 27 20:19:17 tower kernel: ata3.00: supports DRM functions and may not be fully accessible
Apr 27 20:19:17 tower kernel: ata3.00: NCQ Send/Recv Log not supported
Apr 27 20:19:17 tower kernel: ata3.00: configured for UDMA/133
Apr 27 20:19:17 tower kernel: ata3: EH complete
Apr 27 20:19:18 tower kernel: ata3.00: exception Emask 0x10 SAct 0x4 SErr 0x400100 action 0x6 frozen
Apr 27 20:19:18 tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error
Apr 27 20:19:18 tower kernel: ata3: SError: { UnrecovData Handshk }
Apr 27 20:19:18 tower kernel: ata3.00: failed command: WRITE FPDMA QUEUED
Apr 27 20:19:18 tower kernel: ata3.00: cmd 61/80:10:40:cc:eb/00:00:02:00:00/40 tag 2 ncq dma 65536 out
Apr 27 20:19:18 tower kernel: res 40/00:10:40:cc:eb/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Apr 27 20:19:18 tower kernel: ata3.00: status: { DRDY }
Apr 27 20:19:18 tower kernel: ata3: hard resetting link
Apr 27 20:19:18 tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 27 20:19:18 tower kernel: ata3.00: supports DRM functions and may not be fully accessible
Apr 27 20:19:18 tower kernel: ata3.00: NCQ Send/Recv Log not supported
Apr 27 20:19:18 tower kernel: ata3.00: supports DRM functions and may not be fully accessible
Apr 27 20:19:18 tower kernel: ata3.00: NCQ Send/Recv Log not supported
Apr 27 20:19:18 tower kernel: ata3.00: configured for UDMA/133
Apr 27 20:19:18 tower kernel: ata3: EH complete

Array is back and parity seems to be valid, but I do get some errors... how can I dig deeper?

knowlage-diagnostics-20190427-2138.zip

  • Community Expert

ata3 is the SSD; replace the cables. Samsung SSDs are particularly picky about cable quality.
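To see at a glance which port is producing link errors, a syslog excerpt like the one above can be summarized per ata port. The following Python sketch is a rough illustration under the assumption that lines follow the format pasted in this thread; the function name and regular expressions are editorial, and it counts only "exception Emask" and "SError" lines.

```python
import re
from collections import Counter

# Patterns match the kernel lines as they appear in the excerpt above.
EXCEPTION_RE = re.compile(r"kernel: (ata\d+)\.\d+: exception Emask")
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{ (.*?) \}")


def summarize_ata_errors(syslog_text):
    """Count exceptions per ata port and collect the SError flags seen."""
    exceptions = Counter()
    serror_flags = {}
    for line in syslog_text.splitlines():
        match = EXCEPTION_RE.search(line)
        if match:
            exceptions[match.group(1)] += 1
            continue
        match = SERROR_RE.search(line)
        if match:
            flags = serror_flags.setdefault(match.group(1), set())
            flags.update(match.group(2).split())
    return exceptions, serror_flags


# Lines copied from the syslog excerpt above.
SAMPLE = "\n".join([
    "Apr 27 20:19:16 tower kernel: ata3.00: exception Emask 0x10 SAct 0x200 SErr 0x400100 action 0x6 frozen",
    "Apr 27 20:19:16 tower kernel: ata3: SError: { UnrecovData Handshk }",
    "Apr 27 20:19:18 tower kernel: ata3.00: exception Emask 0x10 SAct 0x4 SErr 0x400100 action 0x6 frozen",
    "Apr 27 20:19:18 tower kernel: ata3: SError: { UnrecovData Handshk }",
])

counts, flags = summarize_ata_errors(SAMPLE)
print(counts["ata3"], sorted(flags["ata3"]))
```

Repeated "interface fatal error" and "ATA bus error" entries on a single port, as seen here on ata3, point at the link (cable, connector, port) rather than at the array disks.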

  • Author

Ok, thanks.

Where can I see the ata/port/drive mapping?

  • Community Expert

On the syslog, search for the ata#, you can also click on the little disk icon next to each disk on the main page to see that device's related log info.
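Searching the syslog for the ata# can also be scripted. The Python sketch below assumes the kernel's usual device identification line at boot, of the form "ataN.00: ATA-x: MODEL, FIRMWARE, max UDMA/133". That line format is an assumption based on typical libata boot messages and does not appear in the excerpt posted in this thread; the sample model and firmware strings are hypothetical placeholders.

```python
import re

# Assumed format of the libata identification line written at boot.
IDENTIFY_RE = re.compile(r"kernel: (ata\d+)\.\d+: ATA-\d+: (.+?), (\S+), max \S+")


def map_ata_devices(syslog_text):
    """Return a dict mapping ata port names to the reported drive model."""
    mapping = {}
    for line in syslog_text.splitlines():
        match = IDENTIFY_RE.search(line)
        if match:
            mapping[match.group(1)] = match.group(2).strip()
    return mapping


# Hypothetical identification line; the model and firmware are placeholders.
SAMPLE = (
    "Apr 27 20:10:01 tower kernel: ata3.00: ATA-9: EXAMPLE SSD MODEL, FW1234, max UDMA/133\n"
    "Apr 27 20:19:16 tower kernel: ata3: hard resetting link\n"
)

print(map_ata_devices(SAMPLE))
```

If the identification lines in a given syslog look different, the pattern needs adjusting; the per-disk log icon mentioned above remains the simpler route.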

Archived

This topic is now archived and is closed to further replies.
