dvaldez, posted July 21, 2019 (edited)

Here is the sequence of events:

1. I saw an alert saying the health of the array had failed; it turned out a disk had read errors.
2. I read that this kind of read error is often a cable issue, so I got a new cable and used a different SATA port on the motherboard.
3. After booting up again, the disk was reported as too small to be added back into the array (it's the same disk). I have a Gigabyte motherboard, and I read about the HPA (Host Protected Area).
4. I went into the BIOS and disabled the HPA. I also noticed the system clock was off, so I corrected that as well.
5. After booting up again, the disk that previously had read errors now shows as a 'New Device', BUT an additional disk now shows as 'Wrong'.

What can I do here? I've read of people suggesting a 'New Config' in some situations, but I'm also reading that this will prevent you from rebuilding any existing failed disks, which is what the 'read errors' disk is considered to be, right?

I've attached the before diagnostics (...-0805.zip) and the after diagnostics (...-1103.zip).

This is very stressful. It makes me want to add a second parity disk for dual parity once I hopefully get this resolved.

Attachments: mainstore-diagnostics-20190721-0805.zip, mainstore-diagnostics-20190721-1103.zip
dvaldez, posted July 22, 2019

Any ideas? I don't know what to do at this point.
Vr2Io, posted July 22, 2019 (edited)

You mean HPA was originally enabled? Anyway, keep the original BIOS setting for now.

If you assume disk3 is in a good state, then just do a New Config and preserve all settings. Start the array and check for anything abnormal, such as an unmountable file system, then run a non-correcting parity check, etc.

Remember: set parity as valid, and don't format any disk.

If anything is wrong, you can invalidate disk3 again and rebuild it (before this step, all other disks must be mountable and their contents readable), and don't write anything to the array.
JorgeB, posted July 22, 2019

Make sure the BIOS backup function is disabled and remove the HPA from disk4. Other disks also have an HPA enabled, but only remove it from disk4 for now.
dvaldez, posted July 22, 2019 (edited)

Disk 3 should be fine, but it was out of the array for at least 18 hours after the 'read errors' started, and I wrote backups to the array within that time. Is it still OK to do a New Config?

Also, yes, HPA was originally enabled; I didn't know. My motherboard has 2 options in the BIOS, HPA or BIOS Backup, and it seems I cannot disable the feature entirely. I changed it from HPA to BIOS Backup.
Vr2Io, posted July 22, 2019 (edited)

dvaldez said (19 minutes earlier): "Disk 3 should be fine, but it was removed from the array for at least 18 hours when the 'read errors' started, I wrote backups to the array within that time, is it still ok to do a New Config on?"

If disk 3 is fine, then at least it still holds its data and you won't lose everything on disk 3. If its data is lost, it will need to be rebuilt from the other disks. If you have a backup, things could be easier.

The important thing is that the other disks end up readable and mountable. I am not sure whether the HPA will leave a disk mountable or not, so you had best follow the others' suggestions. In general, you should keep the original state.
dvaldez, posted July 24, 2019

I put the BIOS back to the original config and did a New Config. Now I have all of the original disks assigned in all of the original slots, but I cannot start the array because "The parity drive is not the biggest".

I tried to remove the HPA using hdparm, but I am getting an error:

    root@mainstore:~# hdparm -N p7814037168 /dev/sdb

    /dev/sdb:
     setting max visible sectors to 7814037168 (permanent)
    SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 10 51 40 01 21 00 00 00 a0 af 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     max sectors = 7814035055/7814037168, HPA is enabled

I guess the next step is to try the 2nd HPA removal method.

If I can say that all of the data disks are good and have the data that I want, is it possible to just buy a larger disk (6TB or something) and put that in the parity slot instead? Would I retain all of the data?

How could I check whether the original "read errors" disk is good? I definitely was not doing any rebuilding when it failed, but there may have been a parity check running. I have not written any new data to the array to my knowledge, but I do have a few things that automatically mount the array via NFS, so I'm not completely sure.
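A note on the `max sectors` line in the hdparm output above: the two numbers are the currently visible sector count and the drive's native sector count, and the gap between them is the size of the HPA. The short Python sketch below only does that arithmetic on the quoted line. The helper name `hpa_summary` is hypothetical, and the 512-byte sector size is an assumption, since the post does not show the sector size of /dev/sdb.

```python
import re


# Hypothetical helper (not part of hdparm): parse the "max sectors" line
# printed by `hdparm -N` and report how much space the HPA hides.
def hpa_summary(line, sector_bytes=512):  # 512-byte logical sectors assumed
    match = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", line)
    if match is None:
        raise ValueError("no 'max sectors = visible/native' pair found")
    visible, native = int(match.group(1)), int(match.group(2))
    hidden = native - visible
    return {
        "visible_sectors": visible,
        "native_sectors": native,
        "hidden_sectors": hidden,
        "hidden_bytes": hidden * sector_bytes,
        "hpa_present": hidden > 0,
    }


if __name__ == "__main__":
    # Line copied from the post above.
    print(hpa_summary("max sectors = 7814035055/7814037168, HPA is enabled"))
```

For the line in the post this gives 2113 hidden sectors, or 1,081,856 bytes at 512 bytes per sector: a small gap, but enough for the disk to report a different size from an otherwise identical disk without an HPA.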
dvaldez, posted July 24, 2019

OK, I tried the HDAT2 method. When I used the "set max" command, it gave an error; I think my motherboard may not support these commands. I then tried the "Auto set max" command, and it said it was successful. I then booted back into Unraid and set all the disks in the proper slots, and it still says the parity disk is not the biggest disk. I know I changed the correct disk in HDAT2 because I disconnected all of the other disks to be sure.

At this point, if possible, I'll just buy a larger disk (6/8TB) and put it in the parity slot. Will this be OK? If the data on the "read errors" disk was never corrupted or damaged, and it was in fact just a cable issue, will it retain all of the data that was on there?

What should I do?
JorgeB, posted July 24, 2019

dvaldez said (1 hour earlier):

    SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 10 51 40 01 21 00 00 00 a0 af 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     max sectors = 7814035055/7814037168, HPA is enabled

This is usually controller related. Try using a different controller or removing the HPA on another computer.
Vr2Io, posted July 24, 2019 (edited)

dvaldez said (5 hours earlier): "If I can say that all of the data disks are good and have the data that I want, is it possible to just buy and add a larger (6TB or something) disk and put that in the parity slot instead? Would I retain all of the data?"

Yes. You can even start the array now without parity and add parity later.

The reason for keeping all disks in their original state is really the hope of having a last chance to recover data from parity and the other disks. But I still can't confirm whether a data disk modified by the HPA is still mountable or not.
dvaldez, posted July 24, 2019

Well, luckily I was never able to properly remove the HPA on the 'read errors' data disk, so it should be in the same state it was in when the read errors occurred, unless the HPA was changed automatically afterwards...
dvaldez, posted July 26, 2019 (edited)

I have a 6TB drive coming in the mail to use as a replacement parity disk, but I tried to start the array without parity, with just the remaining 4 disks. At this time the last 2 disks are unmountable.

I didn't make any changes on disk 3, but I previously tried to remove the HPA on disk 4; that attempt failed, however. How can I fix this?

I have attached an updated diagnostic in case it matters.

Attachment: mainstore-diagnostics-20190726-0225.zip
dvaldez, posted July 26, 2019

Should I try the xfs_repair method on the 2 disks?
dvaldez, posted July 26, 2019

So I went for an xfs_repair on disk 3, the disk that I think (if I remember correctly) has the least amount of data:

    root@mainstore:~# xfs_repair -v /dev/md3
    Phase 1 - find and verify superblock...
    error reading superblock 4 -- seek to offset 4000785907712 failed
    couldn't verify primary superblock - attempted to perform I/O beyond EOF !!!

    attempting to find secondary superblock...
    .found candidate secondary superblock...
    error reading superblock 4 -- seek to offset 4000785907712 failed
    unable to verify superblock, continuing...
    .found candidate secondary superblock...
    error reading superblock 4 -- seek to offset 4000785907712 failed
    unable to verify superblock, continuing...
    .found candidate secondary superblock...
    error reading superblock 4 -- seek to offset 4000785907712 failed
    unable to verify superblock, continuing...
    ..found candidate secondary superblock...
    error reading superblock 4 -- seek to offset 4000785907712 failed
    unable to verify superblock, continuing...
    .found candidate secondary superblock...
    error reading superblock 4 -- seek to offset 4000785907712 failed
    unable to verify superblock, continuing...
    .found candidate secondary superblock...
    error reading superblock 4 -- seek to offset 4000785907712 failed
    unable to verify superblock, continuing...
    ......................................................................................................................................

The dots have continued ever since, for at least 40 minutes, and are still going. Should I cancel at this point? There hasn't been any other output.
Vr2Io, posted July 26, 2019

dvaldez said (33 minutes earlier, after the xfs_repair output for /dev/md3 shown in the previous post): "the .....'s have continued ever since for at least 40 mins and is still continuing...should I cancel at this point? there hasn't been anything else"

Stop it.
Vr2Io, posted July 26, 2019 (edited)

dvaldez said (on 7/24/2019 at 1:25 PM):

    SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 10 51 40 01 21 00 00 00 a0 af 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     max sectors = 7814035055/7814037168, HPA is enabled

    I guess the next step is to try and use the 2nd HPA removal method.

How about trying the HPA removal for disk3 and disk4 on another disk controller?

Disks 1, 3 and 4 have an HPA, but disk 1 is mountable.

What is the output of the following? (I want to check whether the removal fails because the disk is frozen.)

    hdparm -I /dev/sde    (disk3)
    hdparm -I /dev/sdd    (disk4)
JorgeB, posted July 26, 2019

"Unsupported partition layout" means the current partition isn't using the full disk, because of the HPA. If you had valid parity, you could rebuild one disk at a time. Without it, I would use UD to mount the disks and copy the data to other disk(s) before re-formatting them.
dvaldez, posted July 26, 2019

Here are the outputs of hdparm -I for both of the unmountable disks:

    root@mainstore:~# hdparm -I /dev/sde

    /dev/sde:

    ATA device, with non-removable media
        Model Number:       WDC WD40EFRX-68N32N0
        Serial Number:      WD-WCC7K5JCVK5E
        Firmware Revision:  82.00A82
        Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
    Standards:
        Used: unknown (minor revision code 0x006d)
        Supported: 10 9 8 7 6 5
        Likely used: 10
    Configuration:
        Logical         max     current
        cylinders       16383   0
        heads           16      0
        sectors/track   63      0
        --
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  7814035055
        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:     3815446 MBytes
        device size with M = 1000*1000:     4000785 MBytes (4000 GB)
        cache/buffer size  = unknown
        Form Factor: 3.5 inch
        Nominal Media Rotation Rate: 5400
    Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
    Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    64-bit World wide name
           *    IDLE_IMMEDIATE with UNLOAD
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
           *    Idle-Unload when NCQ is active
           *    NCQ priority information
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
                DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Write Same (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12] (vendor specific)
                unknown 206[13] (vendor specific)
           *    DOWNLOAD MICROCODE DMA command
           *    WRITE BUFFER DMA command
           *    READ BUFFER DMA command
    Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        480min for SECURITY ERASE UNIT. 480min for ENHANCED SECURITY ERASE UNIT.
    Logical Unit WWN Device Identifier: 50014ee21119cfe8
        NAA         : 5
        IEEE OUI    : 0014ee
        Unique ID   : 21119cfe8
    Checksum: correct

    root@mainstore:~# hdparm -I /dev/sdd

    /dev/sdd:

    ATA device, with non-removable media
        Model Number:       WDC WD40EFRX-68N32N0
        Serial Number:      WD-WCC7K5XR9FEE
        Firmware Revision:  82.00A82
        Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
    Standards:
        Used: unknown (minor revision code 0x006d)
        Supported: 10 9 8 7 6 5
        Likely used: 10
    Configuration:
        Logical         max     current
        cylinders       16383   0
        heads           16      0
        sectors/track   63      0
        --
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  7814035055
        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:     3815446 MBytes
        device size with M = 1000*1000:     4000785 MBytes (4000 GB)
        cache/buffer size  = unknown
        Form Factor: 3.5 inch
        Nominal Media Rotation Rate: 5400
    Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
    Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    64-bit World wide name
           *    IDLE_IMMEDIATE with UNLOAD
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
           *    Idle-Unload when NCQ is active
           *    NCQ priority information
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
                DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Write Same (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12] (vendor specific)
                unknown 206[13] (vendor specific)
           *    DOWNLOAD MICROCODE DMA command
           *    WRITE BUFFER DMA command
           *    READ BUFFER DMA command
    Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        490min for SECURITY ERASE UNIT. 490min for ENHANCED SECURITY ERASE UNIT.
    Logical Unit WWN Device Identifier: 50014ee2bb3783a5
        NAA         : 5
        IEEE OUI    : 0014ee
        Unique ID   : 2bb3783a5
    Checksum: correct
    root@mainstore:~#
BRiT, posted July 26, 2019

UD is the Unassigned Devices plugin.
Vr2Io, posted July 26, 2019

dvaldez said (3 hours earlier): "Here are the outputs of hdparm -I for both the unmountable disks:"

Both are not frozen.
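The two things being read off the `hdparm -I` dumps in this exchange are the Security "frozen" flag and the LBA48 sector count. As a rough illustration, the Python sketch below pulls both out of a saved copy of that text. The helper name `parse_identify` and the trimmed `SAMPLE` string are made up for this example, and the parsing is only an assumption about the layout shown in the posts, not a general hdparm parser.

```python
import re


# Hypothetical helper (not part of hdparm): extract the frozen flag and the
# LBA48 sector count from text printed by `hdparm -I`.
def parse_identify(text, sector_bytes=512):
    info = {"frozen": None, "lba48_sectors": None, "size_mb": None}

    # The Security section prints "not frozen" or just "frozen".
    frozen = re.search(r"\b(not[ \t]+)?frozen\b", text)
    if frozen:
        info["frozen"] = frozen.group(1) is None

    sectors = re.search(r"LBA48\s+user addressable sectors:\s*(\d+)", text)
    if sectors:
        count = int(sectors.group(1))
        info["lba48_sectors"] = count
        # Decimal megabytes, as in "device size with M = 1000*1000".
        info["size_mb"] = count * sector_bytes // 1_000_000
    return info


# Trimmed excerpt in the same layout as the dumps posted above.
SAMPLE = """\
    LBA48  user addressable sectors:  7814035055
    Logical  Sector size:                   512 bytes
Security:
    not     locked
    not     frozen
"""

if __name__ == "__main__":
    print(parse_identify(SAMPLE))
```

On the excerpt, this reports the disk as not frozen with 7814035055 visible sectors, which works out to the same 4000785 MB that hdparm prints. That sector count is the lower of the two values in the earlier `max sectors = 7814035055/7814037168` line.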
dvaldez, posted July 26, 2019

How do I use the UD plugin to copy the data off the 'unmountable' disks? I have it installed, but if I unassign the 2 "unmountable" disks, I cannot start the array, even in maintenance mode.
Vr2Io, posted July 26, 2019 (edited)

If you can't mount disks 3 and 4 in Unraid, whether in the array or in UD, I would suggest you use the bootable HDAT2 to remove the HPA for disks 3 and 4 only, then try mounting them in Unraid or running the XFS repair again.
Vr2Io, posted July 26, 2019 (edited)

If you want to try removing the HPA for disks 3 and 4 in Unraid again, please:

1. Power off the whole system.
2. Disconnect the SATA links of disks 3 and 4.
3. Power on and boot into Unraid.
4. Connect the SATA links back.
5. Try again to remove the HPA on disks 3 and 4.
dvaldez, posted July 26, 2019

Finally, a positive update: I ran the hdparm -N command on the 2 unmountable disks, as well as on the parity disk that was "not the largest disk in the array", while they were plugged into a DIFFERENT storage controller, and it was successful this time.

I did a New Config with the 4 data disks in the proper positions (as 2 of the disks were showing 'wrong') and was able to see all of the data.

I then assigned the parity disk, and it seems to have been recognized, as it DIDN'T show the message "all data on this disk will be erased when array started". Parity was showing invalid (as expected, due to the 18 hours of downtime of disk 3), and it is now running a parity sync/data rebuild.