September 11, 2025

This all started a week or two ago. I have two disks (one parity, one data), both 18 TB Toshiba drives.

First the data drive was disabled and emulated. I switched SATA ports and SATA cables on both drives. Then I rebuilt the data.

Now the parity drive is disabled. I'm not sure which disk is having issues, if any. Short SMART tests don't reveal any obvious issues.

Attaching diagnostics. I'm not sure what to do from here. Any help is appreciated.

istanbul-diagnostics-20250911-1959.zip

Unfortunately the diagnostics were created AFTER a reboot, because the web GUI started acting strangely: it seemed to reload the list of disks every second. I couldn't even reboot the system for some reason; I had to shut it off (from the web GUI).

Edited September 11, 2025 by domidomi
September 12, 2025 Community Expert

Diags don't show parity getting disabled, so we can't see what happened, but disk1 is showing a pending sector. Run an extended SMART test and post new diags.
September 17, 2025 Author

On 9/12/2025 at 8:27 AM, JorgeB said: "Diags don't show parity getting disabled, so we can't see what happened, but disk1 is showing a pending sector, run an extended SMART test and post new diags"

Attaching new diagnostics. Thanks!

istanbul-diagnostics-20250917-1229.zip
September 17, 2025 Community Expert

SMART test passed, and it's no longer showing pending sectors. Try resyncing parity again; if there are more errors, grab new diags before rebooting.
September 17, 2025 Author

5 minutes ago, JorgeB said: "SMART test passed, and it's not longer showing pending sectors. Try resyncing parity again; if there are more errors, grab new diags before rebooting."

Thanks. Just to be clear though, it seems that the data disk (disk 1) shows read errors, doesn't it? Should I still resync parity?

    Error 1 [0] occurred at disk power-on lifetime: 26868 hours (1119 days + 12 hours)
      When the command that caused the error occurred, the device was in standby mode.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 43 00 08 00 00 23 79 e6 a8 40 00  Error: UNC at LBA = 0x2379e6a8 = 595191464

      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
      -- -- -- == == == -- -- -- -- --  ---------------  --------------------
      60 05 40 00 68 00 00 23 79 fe 00 40 00  00:34:33.558  READ FPDMA QUEUED
      60 00 80 00 60 00 00 23 79 fd 80 40 00  00:34:31.486  READ FPDMA QUEUED
      60 05 40 00 58 00 00 23 79 f8 40 40 00  00:34:31.486  READ FPDMA QUEUED
      60 05 40 00 50 00 00 23 79 f3 00 40 00  00:34:31.483  READ FPDMA QUEUED
      60 05 40 00 48 00 00 23 79 ed c0 40 00  00:34:31.480  READ FPDMA QUEUED

    SMART Extended Self-test Log Version: 1 (1 sectors)
    Num  Test_Description  Status                    Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline  Completed: read failure   10%        26952            595191464
    # 2  Short offline     Completed without error   00%        26868            -
    # 3  Short offline     Completed without error   00%        26509            -
    # 4  Short offline     Completed without error   00%        26429            -
    # 5  Short offline     Completed without error   00%        16895            -
    # 6  Short offline     Completed without error   00%        10394            -
    # 7  Short offline     Completed without error   00%        5341             -
    # 8  Short offline     Completed without error   00%        4876             -
    # 9  Short offline     Interrupted (host reset)  50%        4876             -
    #10  Short offline     Completed without error   00%        235              -
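Editor's aside: in a self-test log like the one in this post, the entries that matter are those whose Status is something other than "Completed without error". Below is a minimal sketch that pulls the LBA of the first error out of such a log. The function name `failed_selftest_lbas` is made up for illustration, and the sketch assumes the log layout shown in this post, where the LBA is the last column of each entry.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools).
# Reads a SMART self-test log on stdin and prints the LBA_of_first_error
# value for every entry whose status mentions a failure.
# Assumes the LBA is the last whitespace-separated field, as in this thread.
failed_selftest_lbas() {
    awk '/^# *[0-9]+/ && /failure/ { print $NF }'
}

# Possible use on a live system (/dev/sdX is a placeholder):
#   smartctl -l selftest /dev/sdX | failed_selftest_lbas
```

An extended test is started with `smartctl -t long /dev/sdX`; the log only gains a new entry once that test finishes or aborts.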
September 17, 2025 Community Expert

5 minutes ago, domidomi said: "Just to be clear though, it seems that the data disk (disk 1)"

Sorry, I didn't read back far enough and got tricked by the thread title; the problem was disk1, not parity, and that one failed the SMART test.

See if the parity disk mounts with UD; since there's only one data disk, parity will be a mirror of it.
September 17, 2025 Author

1 minute ago, JorgeB said: "Sorry, I didn't read back far enough; the problem was disk1, not parity, and that one failed the SMART test. See if the parity disk mounts with UD, since there's only a data disk, it will be a mirror."

No worries. Sorry, but I don't understand your last sentence. What exactly is UD, and how do I check if the parity disk mounts with UD?
September 17, 2025 Author

Assuming that UD means the "Unassigned Devices" application, should I now stop the array, then unassign the parity disk, and try to mount it using UD? If so, should I start the array (with no parity) before doing anything with UD?
September 17, 2025 Community Expert

2 hours ago, domidomi said: "should I now stop the array, then unassign the parity disk, and try to mount it using UD"

Correct.

If the disk mounts and the contents look OK, you can do a new config (Tools - New config) and assign the old parity as disk1; you can then add a new parity later.
September 17, 2025 Author

1 hour ago, JorgeB said: "Correct. If the disk mounts and contents looks OK, you can do a new config (Tools - New config) and assign old parity as disk1, and you can then add a new parity later."

It seems to me that the contents do NOT look OK, in the sense that files added to the array since August 26th do not appear on the parity disk when it is mounted using UD. Thanks for your help, and please advise me on how to proceed from here.

EDIT: I became unsure whether I had actually replaced BOTH the cables AND the ports, as I reported in my initial post. I have now definitely replaced both and I'm running a read check again. It will take about 18 hours. Still not sure what to do.

EDIT 2: Only 2.3% into the read check there are already 77 read errors detected. Not sure what that means for me in this situation. I'm still unsure which of the two disks is actually failing.

Edited September 17, 2025 by domidomi
September 17, 2025 Community Expert

If the data from parity is incomplete, I would recommend using ddrescue to clone the old disk1 to a new disk; that will be the way to try and recover as much data as possible.
September 17, 2025 Author

50 minutes ago, JorgeB said: "If the data from parity is incomplete, I would recommend ddrescue to clone old disk1 to a new disk; that will be the way to try and recover as much data as possible."

Thanks again for your continued support! So if I understand correctly:

1. Install a new disk
2. ddrescue -f /dev/failing /dev/replacement /boot/ddrescue.log
3. Set the replacement drive as the NEW PARITY disk
4. Set the OLD PARITY disk as the NEW DATA disk
5. Rebuild the NEW DATA disk (previously parity disk) from the NEW PARITY disk (replacement disk)

Please let me know if this is correct.
September 18, 2025 Community Expert (Solution)

12 hours ago, domidomi said: "Set the replacement drive as the NEW PARITY disk"

Nope, use the cloned disk as the new disk1.
September 18, 2025 Author

The ddrescue command failed after a while; the replacement disk disappeared somehow. No idea why or what happened. /dev/sdc just disappeared. It doesn't show up in the terminal or in the UnRAID GUI.

    root@istanbul:~# ddrescue -f /dev/sdb /dev/sdc /boot/ddrescue.log
    GNU ddrescue 1.29.1
    Press Ctrl-C to interrupt
    ipos: 2640 GB, non-trimmed: 45056 B, current rate: 122 MB/s
    opos: 2640 GB, non-scraped: 0 B, average rate: 265 MB/s
    non-tried: 15359 GB, bad-sector: 0 B, error rate: 0 B/s
    rescued: 2640 GB, bad areas: 0, run time: 2h 45m 55s
    pct rescued: 14.66%, read errors: 1, remaining time: 21h 27m
    time since last successful read: n/a
    Copying non-tried blocks... Pass 1 (forwards)
    ddrescue: /dev/sdc: Write error: Invalid argument (errno=22)

    root@istanbul:~# cat /boot/ddrescue.log
    # Mapfile. Created by GNU ddrescue version 1.29.1
    # Command line: ddrescue -f /dev/sdb /dev/sdc /boot/ddrescue.log
    # Start time: 2025-09-18 09:43:52
    # Current time: 2025-09-18 12:29:47
    # Copying non-tried blocks... Pass 1 (forwards)
    # current_pos current_status current_pass
    0x266CC6D0000 ? 1
    # pos size status
    0x00000000 0x46F3CD5000 +
    0x46F3CD5000 0x0000B000 *
    0x46F3CE0000 0x0ABB0000 ?
    0x46FE890000 0x21FCDE40000 +
    0x266CC6D0000 0xDF833930000 ?

Attaching new diagnostics.

istanbul-diagnostics-20250918-1740.zip

Edited September 18, 2025 by domidomi
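Editor's aside: the mapfile in this post can be read mechanically. Each block line is "pos size status", and the ddrescue manual defines the status characters as `?` non-tried, `*` non-trimmed, `/` non-scraped, `-` bad sector and `+` finished. The sketch below sums the sizes per status and converts a byte position to a sector number. Both function names are made up for illustration, and the 512-byte sector size is an assumption that should be checked against the drive's logical sector size.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ddrescue) for reading a GNU ddrescue mapfile.

# Sum block sizes per status character and print one summary line (bytes).
summarize_mapfile() (
    rescued=0; non_tried=0; non_trimmed=0; non_scraped=0; bad_sector=0
    while read -r pos size status || [ -n "$pos" ]; do
        # Skip blank lines and comment lines.
        case "$pos" in '#'*|'') continue ;; esac
        # Skip the "current_pos current_status current_pass" line, whose
        # second field is a status character rather than a hex size.
        case "$size" in 0x*|0X*) ;; *) continue ;; esac
        bytes=$(( $size ))
        case "$status" in
            '+') rescued=$(( rescued + bytes )) ;;
            '?') non_tried=$(( non_tried + bytes )) ;;
            '*') non_trimmed=$(( non_trimmed + bytes )) ;;
            '/') non_scraped=$(( non_scraped + bytes )) ;;
            '-') bad_sector=$(( bad_sector + bytes )) ;;
        esac
    done < "$1"
    echo "rescued=$rescued non_tried=$non_tried non_trimmed=$non_trimmed non_scraped=$non_scraped bad_sector=$bad_sector"
)

# Convert a byte offset (decimal or 0x-prefixed hex) to a sector number.
# The 512-byte default is an assumption, not something ddrescue reports.
offset_to_lba() {
    echo $(( $1 / ${2:-512} ))
}

# Possible use:
#   summarize_mapfile /boot/ddrescue.log
#   offset_to_lba 0x46F3CD5000
```

With 512-byte sectors, `offset_to_lba 0x46F3CD5000` gives 595191464, so the single non-trimmed block in this mapfile starts at the same LBA that the extended SMART test on disk1 reported as its first error.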
September 18, 2025 Community Expert

The disk may have dropped offline; try again using the same log file and it will resume from where it was.
September 18, 2025 Author

I tried it again (with no changes, no reboot), and the following happened:

    root@istanbul:/dev# ddrescue -f /dev/sdb /dev/sdc /boot/ddrescue.log
    GNU ddrescue 1.29.1
    Press Ctrl-C to interrupt
    Initial status (read from mapfile)
    rescued: 2640 GB, tried: 45056 B, bad-sector: 0 B, bad areas: 0
    Current status
    ipos: 2640 GB, non-trimmed: 45056 B, current rate: 0 B/s
    opos: 2640 GB, non-scraped: 0 B, average rate: 0 B/s
    non-tried: 15359 GB, bad-sector: 0 B, error rate: 0 B/s
    rescued: 2640 GB, bad areas: 0, run time: 0s
    pct rescued: 14.66%, read errors: 0, remaining time: n/a
    time since last successful read: n/a
    Copying non-tried blocks... Pass 1 (forwards)
    ddrescue: /dev/sdc: Write error: No space left on device (errno=28)

Looking under /dev/disk/by-id, the disk doesn't even appear. I'm not sure why the disk would drop offline like that, but I shut down the system, re-seated the disk with yet another new cable in a new SATA port, and now I'm running ddrescue in tmux again. It looks like it was able to continue where it left off.

Is it "safer" (less chance of corruption) to restart ddrescue from zero again, or does ddrescue have safeguards against corruption when continuing from the log file?
September 18, 2025 Author

Ugh, I'm not sure anymore which ports and which cables I've tried, but I started getting UDMA CRC issues again, so ultimately I removed the log file and I'm now trying to restart the process with this command:

    # ddrescue -n -v -f /dev/sdd /dev/sdb /boot/ddrescue.log

Hopefully it will actually finish the process this time. If not, at least I know what to keep track of, and I can hopefully make this process more structured.
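Editor's aside: the commands in this thread use /dev/sdb as the source in one run and as the destination in a later one, which is easy to get wrong when `-f` allows overwriting a whole device. One way to keep track is to refer to disks by their persistent names under /dev/disk/by-id and to confirm each one is present before starting. The sketch below is only an illustration: `resolve_by_id` is a made-up helper, the `ata-EXAMPLE_*` names are placeholders, and the optional second argument exists only so the function can be tried without real disks.

```shell
#!/bin/sh
# Hypothetical helper: resolve a persistent by-id name to its current
# kernel device node (for example /dev/sdb). Returns non-zero if the
# link is missing, which is what happens when a disk drops offline.
resolve_by_id() {
    set -- "$1" "${2:-/dev/disk/by-id}"
    if [ ! -e "$2/$1" ]; then
        echo "not present: $2/$1" >&2
        return 1
    fi
    readlink -f "$2/$1"
}

# Possible use (placeholder names, substitute the real by-id entries):
#   src=$(resolve_by_id ata-EXAMPLE_FAILING_DISK) &&
#   dst=$(resolve_by_id ata-EXAMPLE_REPLACEMENT_DISK) &&
#   ddrescue -f "$src" "$dst" /boot/ddrescue.log
```

Because the two lookups are chained with `&&`, ddrescue is not started at all if either disk is missing.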