aurevo Posted February 18, 2020

Hello, after I changed the processor and mainboard and put the server back into operation, one hard disk was not available. As far as I could see, the SMART values were OK, but I wanted to remove the hard disk anyway, because a first rebuild attempt with the same hard disk was much too slow. I created a new configuration and now the rebuild is running, but the speed is absolutely not OK:

Total size: 8 TB
Elapsed time: 1 minute
Current position: 40.8 MB (0.0 %)
Estimated speed: 429.1 KB/sec
Estimated finish: 215 days, 6 hours, 57 minutes

Parity-Sync/Data-Rebuild was much faster in the past, and these values look like the result of an error or some other problem. The first log is from before removing the disk, the second from after removing it and starting the rebuild. Can anyone see anything in the logs?

tower-diagnostics-20200218-2321.zip tower-diagnostics-20200218-2022.zip

Edited February 18, 2020 by aurevo
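As a rough sanity check on the figures in this post, the estimated finish is roughly the remaining size divided by the current speed. The sketch below assumes decimal units (1 KB = 1000 bytes), which is a guess about how the GUI reports sizes; the function name `estimate_finish` and the 100 MB/s comparison speed are made up for illustration.

```python
from datetime import timedelta


def estimate_finish(total_bytes: float, done_bytes: float, bytes_per_sec: float) -> timedelta:
    """Time left if the rebuild keeps running at the given speed."""
    if bytes_per_sec <= 0:
        raise ValueError("speed must be positive")
    return timedelta(seconds=(total_bytes - done_bytes) / bytes_per_sec)


# Figures from the post, converted with decimal units (an assumption).
slow = estimate_finish(8e12, 40.8e6, 429.1e3)
print(f"at 429.1 KB/sec: about {slow.days} days")

# Hypothetical healthy speed, for comparison only.
healthy = estimate_finish(8e12, 0, 100e6)
print(f"at 100 MB/sec: about {healthy.total_seconds() / 3600:.1f} hours")
```

The first result lands in the same range as the GUI's estimate, so the reported finish time follows directly from the very low speed and is not a display glitch.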
JorgeB Posted February 19, 2020

You're using a controller with a SATA port multiplier (or it's connected to one), and these are not recommended for various reasons. In this case it keeps resetting the disks:

Feb 18 23:22:33 Tower kernel: ata10.00: failed to read SCR 1 (Emask=0x40)
Feb 18 23:22:33 Tower kernel: ata10.00: failed to read SCR 0 (Emask=0x40)
Feb 18 23:22:33 Tower kernel: ata10.01: failed to read SCR 1 (Emask=0x40)
Feb 18 23:22:33 Tower kernel: ata10.02: failed to read SCR 1 (Emask=0x40)
Feb 18 23:22:33 Tower kernel: ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 18 23:22:33 Tower kernel: ata10.00: failed command: READ DMA EXT
Feb 18 23:22:33 Tower kernel: ata10.00: cmd 25/00:40:80:76:04/00:05:00:00:00/e0 tag 10 dma 688128 in
Feb 18 23:22:33 Tower kernel: res 50/00:00:3f:86:04/00:00:00:00:00/e0 Emask 0x4 (timeout)
Feb 18 23:22:33 Tower kernel: ata10.00: status: { DRDY }
Feb 18 23:22:34 Tower kernel: ata10.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 18 23:22:34 Tower kernel: ata10.00: hard resetting link
Feb 18 23:22:35 Tower kernel: ahci 0000:01:00.0: FBS is disabled
Feb 18 23:22:35 Tower kernel: ahci 0000:01:00.0: FBS is enabled
Feb 18 23:22:35 Tower kernel: ata10.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 18 23:22:35 Tower kernel: ahci 0000:01:00.0: FBS is disabled
Feb 18 23:22:35 Tower kernel: ahci 0000:01:00.0: FBS is enabled
Feb 18 23:22:35 Tower kernel: ata10.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 18 23:22:36 Tower kernel: ahci 0000:01:00.0: FBS is disabled
Feb 18 23:22:36 Tower kernel: ahci 0000:01:00.0: FBS is enabled
Feb 18 23:22:36 Tower kernel: ata10.02: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 18 23:22:36 Tower kernel: ata10.00: configured for UDMA/33
Feb 18 23:22:36 Tower kernel: ata10.01: configured for UDMA/133
Feb 18 23:22:36 Tower kernel: ata10.02: configured for UDMA/133
Feb 18 23:22:36 Tower kernel: ata10: EH complete
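To see how often a port is being reset without reading the whole syslog by eye, the relevant lines can be tallied per ATA port. This is only a sketch: the regular expression is written against the log lines quoted in this post, the function name `count_ata_events` is made up, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Matches lines such as "kernel: ata10.00: hard resetting link" and
# "kernel: ata10.00: failed command: READ DMA EXT" (wording taken from this thread).
EVENT_RE = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (hard resetting link|failed command: .+)"
)


def count_ata_events(lines):
    """Count link resets and failed commands per ATA port."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            port, event = match.group(1), match.group(2)
            kind = "reset" if event.startswith("hard resetting") else "failed command"
            counts[(port, kind)] += 1
    return counts


sample = [
    "Feb 18 23:22:33 Tower kernel: ata10.00: failed command: READ DMA EXT",
    "Feb 18 23:22:34 Tower kernel: ata10.00: hard resetting link",
    "Feb 18 23:22:36 Tower kernel: ata10: EH complete",
]
for (port, kind), n in sorted(count_ata_events(sample).items()):
    print(port, kind, n)
```

The same function accepts an open syslog file in place of `sample`. Devices behind a port multiplier share one port number (ata10.00, ata10.01, ata10.02 here), so they are grouped under the same port in the tally.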
aurevo Posted February 20, 2020

On 2/19/2020 at 8:33 AM, johnnie.black said:
"You're using a controller with a SATA port multiplier (or it's connected to one) and these are not recommended for various reasons, in this case it keeps resetting the disks: [...]"

This controller worked for several weeks and months without such problems, so why do the problems start now? So at the moment it is not any of the disks?
JorgeB Posted February 20, 2020

2 minutes ago, aurevo said:
"so why do the problems start now?"

I don't know; I just know that those problems are typical when using port multipliers.
aurevo Posted February 20, 2020

4 hours ago, johnnie.black said:
"Don't know, just know that those problems are typical when using port multipliers."

Do you have any other solutions? It really worked for months without such problems. The controller can't just start failing now; I only changed the mainboard, so I see the problem there.
JorgeB Posted February 20, 2020

9 minutes ago, aurevo said:
"Do you have any other solutions?"

My solution would be to replace the known problematic controller; other than that, I can't help.
aurevo Posted February 26, 2020

On 2/20/2020 at 6:50 PM, johnnie.black said:
"My solution would be to replace the known problematic controller, other than that can't help."

Even though I absolutely can't understand why the controller worked for months and now suddenly doesn't, I swapped it for a Fujitsu D2607-8i and crossflashed it to an LSI 9211-8i. Now I have the next problem. The hard disks were all recognized again and I wanted to start the array to do a rebuild / parity sync. Now it shows me an unmountable filesystem for disk 3. Is there anything I can do here, or have I lost data again thanks to Unraid?

tower-diagnostics-20200226-1508.zip
JorgeB Posted February 26, 2020

There's a problem with disk3:

Feb 26 15:06:30 Tower kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Feb 26 15:06:30 Tower kernel: sd 1:0:6:0: [sdl] tag#1698 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 26 15:06:30 Tower kernel: sd 1:0:6:0: [sdl] tag#1698 Sense Key : 0x2 [current]
Feb 26 15:06:30 Tower kernel: sd 1:0:6:0: [sdl] tag#1698 ASC=0x4 ASCQ=0x0
Feb 26 15:06:30 Tower kernel: sd 1:0:6:0: [sdl] tag#1698 CDB: opcode=0x88 88 00 00 00 00 00 ae a8 a8 58 00 00 01 08 00 00
Feb 26 15:06:30 Tower kernel: print_req_error: I/O error, dev sdl, sector 2930288728
Feb 26 15:06:30 Tower kernel: md: disk3 read error, sector=2930288664
Feb 26 15:06:30 Tower kernel: md: disk3 read error, sector=2930288672
Feb 26 15:06:30 Tower kernel: md: disk3 read error, sector=2930288680
Feb 26 15:06:30 Tower kernel: md: disk3 read error, sector=2930288688
Feb 26 15:06:30 Tower kernel: md: disk3 read error, sector=2930288696
Feb 26 15:06:30 Tower kernel: md: disk3 read error, sector=2930288704

Start by replacing/swapping cables.
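The CDB line in this log can be decoded to confirm which sectors the failed read covered. Opcode 0x88 is SCSI READ(16), where bytes 2 to 9 hold the starting LBA and bytes 10 to 13 the transfer length, both big-endian. The sketch below uses a made-up helper name, `decode_read16`, and takes the hex string straight from the log.

```python
def decode_read16(cdb_hex: str):
    """Return (start LBA, sector count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2..9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, count


lba, count = decode_read16("88 00 00 00 00 00 ae a8 a8 58 00 00 01 08 00 00")
print(lba, count)  # 2930288728 264

# Gap between the device sector and the first md sector reported in the log.
print(lba - 2930288664)  # 64
```

The decoded LBA matches the sector in the `print_req_error` line. The 64-sector gap to the first `md: disk3 read error` sector is simply what this log shows; it would be consistent with the data partition starting 64 sectors into the disk, but that is an inference, not something the log states.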
JorgeB Posted February 26, 2020

Looking again at the original diags, it's possible, even probable, that disk3 really is having issues. Before, because of the port multiplier, the problem appeared to be on multiple disks: all the ones connected to the same port.
aurevo Posted February 26, 2020

58 minutes ago, johnnie.black said:
"Looking again at the original diags it's possible, even probable, that disk3 is really having issues, but before and because of the port multiplier the problem appeared to be on multiple disks, all the ones connected to the same port."

I changed the cable and moved the disk to an internal port on the mainboard.

Total size: 8 TB
Elapsed time: less than a minute
Current position: 22.0 MB (0.0 %)
Estimated speed: 759.3 KB/sec
Estimated finish: 121 days, 9 hours, 38 minutes

Now it is incredibly slow again, but the filesystem can be read again.
JorgeB Posted February 26, 2020

Please post new diags.
aurevo Posted February 26, 2020

1 hour ago, johnnie.black said:
"Please post new diags."

Sure, I forgot.

tower-diagnostics-20200226-1821.zip
JorgeB Posted February 26, 2020

These are from just after a reboot with the array stopped. Start the array and the parity sync, then post new ones.
aurevo Posted February 26, 2020

1 hour ago, johnnie.black said:
"These are just after a reboot with the array stopped, start the array and the parity sync then post new ones."

tower-diagnostics-20200226-1955.zip
JorgeB Posted February 26, 2020

Again there are issues with disk3. Since you've already used a different controller and SATA cable, also try a different power cable. If the issues persist, it's likely a disk problem, but because parity is invalid you can't rebuild the disk. See if you can copy any important data to another disk; failing that, try ddrescue.
aurevo Posted February 26, 2020

1 hour ago, johnnie.black said:
"Again issues with disk3, since you've already used a different controller/SATA cable try also a different power cable, if issues persist it's likely a disk problem, but because parity is invalid you can't rebuild it, see if you can copy any important data to another disk, failing that try ddrescue."

I also changed the power cable. The copy started at a good speed, then got slower and aborted in Krusader. Interesting that the disk was not recognized at all on the Fujitsu controller and is now recognized, but partially defective. Does ddrescue also work on encrypted volumes? And in any case I need another disk with the same capacity or more, right?
JorgeB Posted February 27, 2020

11 hours ago, aurevo said:
"Interesting, that the disk was completely not recognized on the Fujitsu Controller and now is recognized,"

It was recognized, but it gave errors immediately; it didn't even mount. Different drivers can behave differently when there are issues.

11 hours ago, aurevo said:
"Does the ddrescue also works on encrypted volumes? And any way I need another disk with same space or more, right?"

Yes to both.
aurevo Posted February 29, 2020

On 2/27/2020 at 8:27 AM, johnnie.black said:
"It was recognized, but it gave errors immediately, it didn't even mount, different drivers can behave differently when there are issues. Yes to both."

I'll ask one more time before I make a mistake and the data is lost for good. My defective disk is part of the array, my new disk isn't. The disks are not mounted and I use ddrescue as follows:

ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log

How do I decrypt the data, or is the hard disk copied to the new hard disk in encrypted form? And which hard disk is my source and which is the destination?
JorgeB Posted March 2, 2020

On 2/29/2020 at 3:42 PM, aurevo said:
"How do I decrypt the data, or is the hard disk copied encrypted to the new hard disk?"

dd/ddrescue makes a bit-by-bit copy, including the encryption.

On 2/29/2020 at 3:42 PM, aurevo said:
"And which hard disk is my source and which destination?"

Source first, then destination.
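Because swapping source and destination would overwrite the failing disk, a small guard around the command can help. The sketch below only builds the argument list for the command quoted in this thread (`ddrescue -f SOURCE DESTINATION MAPFILE`) and does not run it. The function name `ddrescue_command`, the size parameters, and the 4 TB and 6 TB figures in the usage lines are made up for illustration; the caller is assumed to have looked up the real device sizes.

```python
import os


def ddrescue_command(source: str, destination: str, mapfile: str,
                     source_bytes: int, destination_bytes: int) -> list:
    """Build the argument list for: ddrescue -f SOURCE DESTINATION MAPFILE."""
    if os.path.realpath(source) == os.path.realpath(destination):
        raise ValueError("source and destination are the same device")
    if destination_bytes < source_bytes:
        raise ValueError("destination is smaller than source")
    # -f (--force) allows ddrescue to overwrite an existing device or partition.
    return ["ddrescue", "-f", source, destination, mapfile]


# Placeholder device names as used in the thread; sizes are illustrative.
cmd = ddrescue_command("/dev/sdX", "/dev/sdY", "/boot/ddrescue.log",
                       4 * 10**12, 6 * 10**12)
print(" ".join(cmd))
```

The third argument is ddrescue's mapfile, which records which areas have already been copied, so rerunning the same command with the same mapfile picks up from the recorded progress.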
aurevo Posted March 2, 2020

3 hours ago, johnnie.black said:
"dd/ddrescue is a bit by bit copy, including encryption. Source first, then destination."

Yes, from source to destination. But how do I find the right /dev/sdX?
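One command-line way to cross-check device letters is the `/dev/disk/by-id` directory, where Linux keeps symlinks named after each disk's model and serial number. The sketch below lists those links with the device node each one points at; the function name `devices_by_id` is made up, and the serial printed on the disk label still has to be matched by hand.

```python
import os


def devices_by_id(by_id_dir: str = "/dev/disk/by-id") -> dict:
    """Map each persistent disk ID (model and serial) to its device node."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        # Skip partition links such as "...-part1"; only whole disks are listed.
        if os.path.islink(path) and "-part" not in name:
            mapping[name] = os.path.realpath(path)
    return mapping


if os.path.isdir("/dev/disk/by-id"):
    for disk_id, node in devices_by_id().items():
        print(f"{node}  {disk_id}")
```

Device letters can change between boots, so it is worth repeating the check right before starting a copy.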
aurevo Posted March 3, 2020

24 minutes ago, johnnie.black said:
"On the GUI's main page."

So it has been running now for at least six hours. When it finishes, how do I get UnRAID to accept the new cloned disk as the old one? Do I assign the new hard disk to the old slot, or can I leave it in the new slot and UnRAID notices that the old data is there?

If the SSH connection terminates unexpectedly, is ddrescue still running in the background or do I need to restart the process? And is it then aborted, or does it continue from where it left off?

Edited March 3, 2020 by aurevo
JorgeB Posted March 3, 2020

25 minutes ago, aurevo said:
"If it finishes, how to show UnRAID to take the new cloned disk as the old one?"

Tools -> New Config -> keep all assignments
Then unassign the old disk and assign the new one in its place (the new disk needs to be the same capacity as the old one to be used in the array)
Check "parity is already valid" before starting the array (but if there are errors on ddrescue a parity check will be needed)

25 minutes ago, aurevo said:
"If the SSH connection terminates unexpectedly, is ddrescue still running in the background or do I need to restart the process?"

You'll need to restart; using "screen" can get around that.
aurevo Posted March 3, 2020

2 hours ago, johnnie.black said:
"Tool -> new config -> keep all assignments Then unassign old disk and assign new one in its place (new disk need to be same capacity as the old one to be used in the array) Check "parity is already valid" before stating array (but if there are errors on ddrescue a parity check will be needed) You'll need to restart, using "screen" can get around that."

The old one was 4TB, the new one is 6TB, so can I only mount it and copy the data over, or can the new disk be the same size or bigger? A parity check / rebuild is needed anyway because parity was not valid; that was the reason to use ddrescue.

Screen is not installed. Can I use the one from the Nerd Pack?

Edited March 3, 2020 by aurevo
JorgeB Posted March 3, 2020

You can still mount it with UD (Unassigned Devices) and then copy the data to the array. Unraid won't accept it in the array, since the partition won't be using the full disk. You could also sync parity with the disk unmountable and then rebuild; the partition would be correctly recreated during that, but it will take longer.

4 minutes ago, aurevo said:
"Screen is not installed. How to install it?"

Install the Nerd Pack plugin; you can then install screen from there.