trurl Posted March 4, 2020
1 hour ago, aurevo said: "Yes, then I actually had a wrong understanding of how it works."
So what did you think? It might be worth giving you a better understanding, since misunderstandings can lead to costly mistakes.
aurevo Posted March 4, 2020 (Author)
18 minutes ago, johnnie.black said: "Looks more like a connection/power issue, but it can be the controller."
I used the same power cables that had been working for several months without problems. The SAS-to-SATA cables are brand new, and I have already replaced one of them. Do you have a recommendation for me? Otherwise I would reinstall the old port multiplier and test whether it works with that. The current controller is a crossflashed D2607-A21, but I don't know how to test whether the controller is the problem.
JorgeB Posted March 4, 2020
Just now, aurevo said: "But I don't know how I can test or see if it's the controller."
The easiest way would be to test with another controller, but you could also test that controller in another computer if possible.
JorgeB Posted March 4, 2020
Using another PSU would also be a good idea, to rule out a power issue.
aurevo Posted March 6, 2020 (Author)
On 3/4/2020 at 6:41 PM, johnnie.black said: "Easiest way would be to test with another controller, but you could test that controller on another computer if possible."
In the next few days I will first reinstall the old controller and test it. To swap the power supply I would have to take all the components apart again.
I flashed the controller according to the following instructions: https://forums.unraid.net/topic/12114-lsi-controller-fw-updates-irit-modes/?do=findComment&comment=522038
Are these instructions still current, or are there new options in the meantime?
JorgeB Posted March 7, 2020
10 hours ago, aurevo said: "I flashed the controller according to the following instructions:"
Those are current.
aurevo Posted March 7, 2020 (Author)
16 hours ago, johnnie.black said: "Those are current."
I just switched to the old SATA controller; diagnostics are attached. Now everything is available, but I can't mount the ddrescue-dest disk. It was mountable before I changed the adapter, but now the only option offered is to format it. The four Toshiba HDDs are still showing an error state.
tower-diagnostics-20200308-0036.zip
Edited March 7, 2020 by aurevo
JorgeB Posted March 9, 2020
On 3/7/2020 at 11:39 PM, aurevo said: "but I can't mount the ddrescue-dest"
Which disk is that one? All array devices appear to be mounting correctly, though there are still ATA errors on at least two devices.
aurevo Posted March 9, 2020 (Author)
4 hours ago, johnnie.black said: "Which disk is that one? All array devices appear to be mounting correctly, though there are still ATA errors on at least two devices."
It's the unassigned device ST6000...; I cannot mount it through UD.
JorgeB Posted March 9, 2020
That's one of the disks having ATA errors (a hardware problem):
Mar 8 00:35:30 Tower kernel: ata5.00: status: { DRDY SENSE ERR }
Mar 8 00:35:30 Tower kernel: ata5.00: error: { ABRT }
Mar 8 00:35:30 Tower kernel: ata5.00: configured for UDMA/133 (device error ignored)
Mar 8 00:35:30 Tower kernel: ata5: EH complete
Mar 8 00:35:30 Tower kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 8 00:35:30 Tower kernel: ata5.00: irq_stat 0x40000001
Mar 8 00:35:30 Tower kernel: ata5.00: failed command: READ DMA EXT
Mar 8 00:35:30 Tower kernel: ata5.00: cmd 25/00:08:00:f4:a0/00:00:ba:02:00/e0 tag 3 dma 4096 in
Mar 8 00:35:30 Tower kernel: res 53/04:08:00:f4:a0/00:00:ba:02:00/e0 Emask 0x1 (device error)
Mar 8 00:35:30 Tower kernel: ata5.00: status: { DRDY SENSE ERR }
Mar 8 00:35:30 Tower kernel: ata5.00: error: { ABRT }
Mar 8 00:35:30 Tower kernel: ata5.00: configured for UDMA/133 (device error ignored)
Mar 8 00:35:30 Tower kernel: sd 5:0:0:0: [sdf] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Mar 8 00:35:30 Tower kernel: sd 5:0:0:0: [sdf] tag#3 Sense Key : 0x5 [current]
Mar 8 00:35:30 Tower kernel: sd 5:0:0:0: [sdf] tag#3 ASC=0x21 ASCQ=0x4
Mar 8 00:35:30 Tower kernel: sd 5:0:0:0: [sdf] tag#3 CDB: opcode=0x88 88 00 00 00 00 02 ba a0 f4 00 00 00 00 08 00 00
Mar 8 00:35:30 Tower kernel: print_req_error: I/O error, dev sdf, sector 11721044992
Mar 8 00:35:30 Tower kernel: Buffer I/O error on dev sdf, logical block 1465130624, async page read
You still have some hardware issue causing problems with devices on different controllers. I don't remember: did you try another power supply?
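The numbers in this log can be cross-checked by hand. In a SCSI READ(16) command (opcode 0x88), bytes 2 to 9 of the CDB carry the starting LBA and bytes 10 to 13 the number of blocks. The POSIX sh sketch below uses hypothetical helper names and assumes 512-byte sectors; it decodes the CDB and converts the failing sector to a byte offset and to the 4096-byte block index that the last log line appears to use.

```shell
# Hypothetical helpers for cross-checking the kernel log (POSIX sh).

# Print the starting LBA from a READ(16) CDB (opcode 0x88).
# Arguments: the 16 CDB bytes as two-digit hex strings; bytes 2..9 are $3..$10.
read16_lba() {
    lba_hex=$(printf '%s' "$3" "$4" "$5" "$6" "$7" "$8" "$9" "${10}")
    echo $(( 0x$lba_hex ))
}

# Print the transfer length in blocks from the same CDB; bytes 10..13 are $11..$14.
read16_len() {
    len_hex=$(printf '%s' "${11}" "${12}" "${13}" "${14}")
    echo $(( 0x$len_hex ))
}

# Byte offset of a 512-byte sector, and its index in 4096-byte blocks.
sector_bytes()   { echo $(( $1 * 512 )); }
sector_block4k() { echo $(( $1 * 512 / 4096 )); }

# Usage with the CDB from the log:
#   read16_lba 88 00 00 00 00 02 ba a0 f4 00 00 00 00 08 00 00
#   read16_len 88 00 00 00 00 02 ba a0 f4 00 00 00 00 08 00 00
#   sector_bytes 11721044992
#   sector_block4k 11721044992
```

With this CDB, `read16_lba` prints 11721044992, matching the sector in the `print_req_error` line, and `read16_len` prints 8, i.e. 8 * 512 = 4096 bytes, matching `dma 4096`. `sector_block4k` gives 1465130624, the logical block in the last line. The byte offset, 6001175035904, is about 6.0 TB into the disk, so the failed read sits close to the end of the 6 TB ST6000 drive.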
aurevo Posted March 9, 2020 (Author)
3 hours ago, johnnie.black said: "That's one of the disks having ATA errors (hardware problem): [...] You still have some hardware issue causing problems with the devices using different controllers, I don't remember, did you try another power supply?"
Did these errors occur with both controllers? This log is for sdf; which other disks are showing these errors too?
No, I did not change the power supply. I don't actually have a spare one at hand.
Edited March 9, 2020 by aurevo
JorgeB Posted March 9, 2020
IIRC there were errors with all controllers: on the original add-on controller with a port multiplier, on the LSI, and this time on the onboard SATA ports, which is why I suspect there could be a power issue.
aurevo Posted March 19, 2020 (Author)
On 3/9/2020 at 6:34 PM, johnnie.black said: "IIRC there were errors with all controllers, on the original add-on controller with a port multiplier, on the LSI, and this time with the onboard SATA ports, hence why I suspect there could be a power issue."
The current status is as follows: I have put the old CPU and mainboard back in, and at the same time I replaced the power supply with an identical one. After that I had parity rebuilt, and it currently looks good, even though the hard disks mentioned earlier still show errors (though not always; the errors had temporarily disappeared).
Would you be so kind as to take another look at the logs? The hard drives are currently connected to the onboard ports and to the LSI controller. In normal use, the server is running as intended again and no longer reports errors.
tower-diagnostics-20200319-0125.zip
JorgeB Posted March 19, 2020
Everything looks normal in the diags posted; there aren't any disk errors.
aurevo Posted March 29, 2020 (Author)
On 3/19/2020 at 8:30 AM, johnnie.black said: "Everything looks normal on the diags posted, there aren't any disk errors."
Hello, for ten days the server ran without problems and with good performance, as it should, until last night. The problems started when the JDownloader Docker container was no longer available. This morning I restarted the server, and the 2TB disk was unreachable (disabled and emulated). Since there is no important data on this disk, I created a new configuration and started a parity sync. Now the log is filling up with errors while reading Disk3, so I have paused the process for now.
tower-diagnostics-20200329-1333.zip
JorgeB Posted March 30, 2020
The disk looks mostly OK, but there are recent UNC @ LBA errors; you should run an extended SMART test.
aurevo Posted April 1, 2020 (Author)
On 3/30/2020 at 11:56 AM, johnnie.black said: "Disk looks mostly OK but there are recent UNC @ LBA errors, you should run an extended SMART test."
I have removed the (possibly) defective 2TB and 4TB hard disks, as they have caused problems in the past. Now I would like to try to rescue the data with ddrescue according to your instructions. The 4TB hard disk is still installed, and as the target I have connected a 4TB external hard disk. On the Dashboard the old disk is shown as "sdd" (but not under UD or Main) and the new one as "sdm". Unfortunately, ddrescue does not copy anything at the moment:
root@Tower:/# ddrescue -f /dev/sdd /dev/sdm /boot/ddrescue.log
GNU ddrescue 1.23
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 4000 GB, tried: 0 B, bad-sector: 0 B, bad areas: 0
Current status
ipos: 0 B, non-trimmed: 0 B, current rate: 0 B/s
opos: 0 B, non-scraped: 0 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 0 B, error rate: 0 B/s
rescued: 4000 GB, bad areas: 0, run time: 0s
pct rescued: 100.00%, read errors: 0, remaining time: n/a
time since last successful read: n/a
tower-diagnostics-20200402-0106.zip
JorgeB Posted April 2, 2020
Reboot, try ddrescue again (re-check the disk identifiers; they could change), and if it fails, post new diags.
aurevo Posted April 2, 2020 (Author)
1 hour ago, johnnie.black said: "Reboot, try ddrescue again (re-check disk identifiers, they could change), and if it fails post new diags."
Rebooted and tried again. Same message.
tower-diagnostics-20200402-1229.zip
JorgeB Posted April 2, 2020
Have you used ddrescue before? If so, delete the log file (/boot/ddrescue.log) or use a different name.
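GNU ddrescue treats its third argument as a mapfile: if the file already exists, ddrescue reads it and resumes from the recorded state, which is why the earlier run reported `Initial status (read from mapfile)` and 100.00% rescued without copying anything. One way to follow the "use a different name" advice while keeping the old record is to move the stale mapfile aside first. This is a minimal POSIX sh sketch; `archive_mapfile` is a hypothetical helper, and sdX/sdY are placeholders for the re-checked device names.

```shell
# Hypothetical helper: rename an existing ddrescue mapfile (adding a timestamp)
# so the next run starts a new rescue instead of resuming the one recorded in it.
archive_mapfile() {
    map=$1
    if [ -e "$map" ]; then
        mv -- "$map" "$map.$(date +%Y%m%d-%H%M%S)"
    fi
}

# Usage (replace sdX/sdY after re-checking the device identifiers):
#   archive_mapfile /boot/ddrescue.log
#   ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log
```

Renaming rather than deleting keeps the previous mapfile available in case the earlier rescue needs to be inspected later; if the mapfile does not exist, the helper does nothing.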
aurevo Posted April 3, 2020 (Author)
On 4/2/2020 at 1:05 PM, johnnie.black said: "Did you ever use ddrescue before? If yes delete the log file (/boot/ddrescue.log) or use a different name."
Yes, I had used it before. After deleting the log file, it worked. Then I got this error; I think the disk I copied to is smaller than the disk I was copying from. The copied disk is not mountable with UD; it only offers "Format".
root@Tower:~# ddrescue -f /dev/sde /dev/sda /boot/ddrescue.log
GNU ddrescue 1.23
Press Ctrl-C to interrupt
ipos: 786902 MB, non-trimmed: 0 B, current rate: 184 MB/s
ipos: 814066 MB, non-trimmed: 0 B, current rate: 186 MB/s
ipos: 1861 GB, non-trimmed: 0 B, current rate: 158 MB/s
ipos: 4000 GB, non-trimmed: 0 B, current rate: 4325 kB/s
opos: 4000 GB, non-scraped: 0 B, average rate: 50349 kB/s
non-tried: 34430 kB, bad-sector: 0 B, error rate: 0 B/s
rescued: 4000 GB, bad areas: 0, run time: 22h 4m 18s
pct rescued: 99.99%, read errors: 0, remaining time: 1s
time since last successful read: n/a
Copying non-tried blocks... Pass 1 (forwards)
ddrescue: Write error: No space left on device
tower-diagnostics-20200403-1802.zip
Edited April 3, 2020 by aurevo
JorgeB Posted April 3, 2020
35 minutes ago, aurevo said: "ddrescue: Write error: No space left on device"
Yes, there is not enough space on the target disk. You can use a disk larger than 4TB (mounted with UD, not in the array).
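Unless an output offset is given, ddrescue writes each block at the same position on the target as on the source, so the target must hold at least as many bytes as the source. Disks sold with the same nominal capacity are not guaranteed to match exactly, and the `non-tried: 34430 kB` left at the end of the run above is consistent with a target that was a few tens of MB short. A minimal POSIX sh pre-flight check, assuming util-linux `blockdev --getsize64` for reading device sizes; `target_fits` and `shortfall_bytes` are hypothetical names:

```shell
# Hypothetical pre-flight check before cloning a source disk onto a target.
# Arguments are sizes in bytes: $1 = source, $2 = target.
# Succeeds (exit status 0) when the target is at least as large as the source.
target_fits() {
    [ "$2" -ge "$1" ]
}

# Print how many bytes the target is short by (0 if it fits).
shortfall_bytes() {
    if target_fits "$1" "$2"; then
        echo 0
    else
        echo $(( $1 - $2 ))
    fi
}

# Usage on the server (sdX = source, sdY = target; placeholders):
#   src=$(blockdev --getsize64 /dev/sdX)
#   dst=$(blockdev --getsize64 /dev/sdY)
#   target_fits "$src" "$dst" || echo "target too small by $(shortfall_bytes "$src" "$dst") bytes"
```

Comparing exact byte counts, rather than the sizes shown in the GUI, catches a target that is only slightly smaller before a run of many hours.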
aurevo Posted April 8, 2020 (Author)
On 4/3/2020 at 6:39 PM, johnnie.black said: "Yes, not enough space on target disk, you can use a disk larger than 4TB (to mount with UD, not in the array)."
Buying a hard disk larger than the one I own is pointless and too expensive for this purpose. I had removed the 2TB and the 4TB hard disks because I thought they were the ones causing the errors. However, I was able to clone all the data up to just before the end (the target disk was too small), and on the second attempt I copied all the data that was on the 4TB disk to the array. So I thought I might be able to add that disk back into the array, but again I get tons of errors. I ran a SMART check and an extended SMART test, and the disk passed both. Is the disk broken? If not, why do I only get these errors when it is in the array, while everything else works fine?
aurevo Posted April 9, 2020 (Author)
3 hours ago, johnnie.black said: "Need diags"
Sorry, now attached.
tower-diagnostics-20200409-1409.zip