J0my

Members
  • Posts: 26

Posts posted by J0my

  1. 16 hours ago, JorgeB said:

    Looks like a SATA cable problem; replace the cable, and if the emulated disk is mounting and the contents look correct, you can rebuild on top.

    Thanks for the reply. The cable comes from the RAID card and is a mini-SAS to 4x SATA breakout cable (that is not the exact cable I have, just an example of the type). Should I replace the whole cable, or just swap to another of the three spare SATA connectors on it? The case is a Silverstone DS380B, so the cable plugs into the hot-swap PCB.

     

    Also, for the rebuild, is the process to deselect the drive, start the array, stop the array, then re-select the drive?

  2. Hello, just today one of my drives (Disk 2) became disabled with the red X (Unraid 6.9.2). I have attached the diagnostics for it as well. I was transferring some files onto it when it started to throw errors, for example (Jun 26 09:27:53 Tower kernel: md: disk2 write error, sector=194517880). A little before this started happening I had also gotten a notification about the "UDMA CRC error count" for the drive.

     

    I am not sure how to proceed and would appreciate any advice on what the issue appears to be and how to resolve it, e.g. whether the drive is starting to die and should be replaced, or whether some other action needs to be taken.
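
    For anyone checking the same counter later, a minimal sketch of reading it from the console; /dev/sdX here is a placeholder for whatever the affected drive actually is:

    # Read the SMART attribute table for the affected drive (device name is a placeholder)
    smartctl -A /dev/sdX
    # Attribute 199 (UDMA_CRC_Error_Count) counts interface errors; a rising raw
    # value usually points at the cable or connector rather than the platters.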

    tower-diagnostics-20230626-1009.zip

  3. 35 minutes ago, trurl said:

    For future reference, Diagnostics zip already includes SMART for all attached disks and syslog since reboot, so no need to attach separately.

     

    
    # 1  Extended offline    Completed: read failure       50%      9570         8595749280
    

     

    Replace disk.

    Thanks bud, good to know it is only the diagnostics that I need to attach; I will look at replacing the disk.

     

    I really appreciate the help :)

  4. Hello, I recently got a notification in Unraid of "Offline uncorrectable" / "Current pending sector" errors, where both have a value of 16 on the drive "ST8000DM004-2CX188_WCT00DET - 8 TB". I ran a SMART extended self-test and got an "Errors occurred - Check SMART report" result. I have included the SMART report as well as the diagnostics and syslog, as I don't really know how severe an issue it is.
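
    For reference, this is roughly how the extended test can be run and read from the console; it is just a sketch, and /dev/sdX is a placeholder for the actual device:

    smartctl -t long /dev/sdX      # start the extended (long) self-test in the background
    smartctl -l selftest /dev/sdX  # view the self-test log once it finishes
    smartctl -a /dev/sdX           # full report, incl. pending/uncorrectable sector counts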

    I have seen in other posts about this issue that it may not mean the drive is dying, but I would appreciate some feedback, if possible, on whether the attached logs can help determine the severity and whether the drive should be replaced.

    Any help would be appreciated :)

    tower-diagnostics-20210812-1009.zip

  5. 1 hour ago, johnnie.black said:

     

    @johnnie.black Thanks for that. As per the wiki I ran -nv and it gave me these results; I put them in a text file to save this page from getting super long: xfs_repair status -nv .txt

     

    I am not going to pretend I know what any of that means, so if possible, what would the next step be based on its findings?

  6. Hello, earlier this month one of my drives was spitting out errors and Unraid ended up disabling it (Disk 1) tower-diagnostics-20190502-1415.zip. After previous issues with this happening, and on recommendations from other members, I replaced my RAID card with an LSI 9211-8i.

     

    Once I got it installed, I went through the process of deselecting the drive, starting then stopping the array, reselecting the drive, and rebuilding it tower-diagnostics-20190521-1004.zip. The drive has since shown "Unmountable: No file system" in its details (see attached screenshot).

     

    Being relatively new to Unraid, I was not sure whether rebuilding the array would make the drive function normally again. What should I be doing to correct this? Or is the drive potentially just not functioning properly, meaning I have likely lost the data that was on it (as it is not on any of the other drives)?

    Array 2019-05-21 100750.jpg

  7. 4 hours ago, Benson said:

    Rebooting won't brick it.

     

    I don't think NTFS is the problem; otherwise you couldn't execute sas2flash at all.

     

    To my understanding, flashing the firmware won't help with the PAL error issue.

    So even though I have erased the flash memory, and everything says not to reboot, for whatever reason I am unable to get it to execute the flash commands, yet maddeningly it let me do the erase one.

     

    3 hours ago, Abnorm said:

    A summary of what I did in terms of flashing the card; just check the last post, hopefully it helps:

     

    Hi, thanks for the link. I am just not sure it can help all that much; as stated above, whenever I run "sas2flash.efi -o -f 2118it.bin" it just gives me the syntax error and will not let me do anything. I still have the machine on in case there is anything that can be done.

  8. Hello, I have spent all day following guides on how to flash an LSI 9211-8i into IT mode via USB, as I had just bought the card. I am unable to do the flashing via the BIOS method due to the PAL errors, which from everything I have read means the mobo is UEFI (Gigabyte Z170N-WIFI). I had also tried every way online to get my USB key set up for UEFI, but for whatever reason the computer just would not enter the shell: I would get the option to boot the USB in UEFI mode, but the screen would just flash and go back to the boot device menu (Rufus > FreeDOS > FAT32 etc., and the USB drive is only 8 GB), or if I selected the non-UEFI option it would just load into FreeDOS. (I had tried every variant: sas2flash.efi in the root directory of the FreeDOS USB or of a non-bootable USB, the /efi/root/ "efi goes here" layout, and none of them would ever load the UEFI option that was set as boot priority; it would just do that flicker thing.)

    For whatever brain aneurysm I had at the time, I noticed a UEFI:NTFS option in Rufus, which for the hell of it I tried, putting sas2flash.efi as well as shellx64.efi in the efi/root folder. It actually booted up, so I started following the UEFI guide's commands:

     

    fs0:/> sas2flash.efi -listall
    fs0:/> sas2flash.efi -o -e 6
    fs0:/> sas2flash.efi -o -f 2118it.bin -b mptsas2.rom
    fs0:/> sas2flash.efi -listall

    2118it.bin and mptsas2.rom (latest versions) are in the root directory alongside sas2flash.efi.

     

    Basically, I got to the "-o -e 6" command and it worked; I was ecstatic, as it was finally working. Then I ran the flash command and got:

    ERROR: Could not open file: 2118it.bin, Syntax Error see Command Usage Below: 
    
    -f x: Flash firmware image
    x = the firmware file to flash

    I got that for both files; it would not let me flash them.
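
    (For anyone hitting the same thing: "Could not open file" from sas2flash usually just means the shell is not sitting on the filesystem that holds the file. A minimal check from the EFI shell, assuming the USB stick maps as fs0:)

    map -r    # re-scan and list mapped filesystems (fs0:, fs1:, ...)
    fs0:      # switch to the first mapped filesystem
    ls        # confirm 2118it.bin, mptsas2.rom and sas2flash.efi are all listed
    # if they are not listed here, try fs1:, fs2:, etc. before re-running sas2flash.efi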

     

    So at the moment I still have the machine running, as everything says not to turn it off. I have likely bricked the card, given the USB drive is NTFS and not FAT32. The main question is: is there any way to salvage it and somehow get the files to flash, or failing that, if I have to reboot, is there any way to unbrick the card? Any help would be appreciated.

  9. 21 hours ago, jonathanm said:

    Depends, and to expand on what johnnie said: if it's an automatic check, non-correcting is always best. IF there are errors, you need to find out WHY there are errors, then correct the condition that caused them, and only then run a correcting check followed by another non-correcting one to see if the issue has been resolved.

     

    One example where a correcting check may be needed is after an unclean shutdown.

    Thanks for expanding on that point, it is really helpful :)

  10. 5 minutes ago, johnnie.black said:

    Start in normal mode, disks aren't mounted in safe mode.

    Done, it is all back and present. *deep sigh* Thank you very much, mate, you really are a lifesaver. I am not really knowledgeable in this stuff, so I really appreciate your patience with me.

     

    One last question, just as a general knowledge thing: when doing a parity check, should the "Write corrections to parity" box be checked or not, and what is the recommended frequency of the checks?

     

    Will definitely be looking to buy a new RAID card, or just use the 4 SATA ports on my mobo and not use a cache drive for the time being.

  11. 4 minutes ago, johnnie.black said:

    Use -L

    OK, have done that and now have this:

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ALERT: The filesystem has valuable metadata changes in a log which is being
    destroyed because the -L option was used.
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 1
            - agno = 3
            - agno = 2
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
    Phase 5 - rebuild AG headers and trees...
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify and correct link counts...
    Maximum metadata LSN (1:1048697) is ahead of log (1:2).
    Format log to cycle 4.
    done

    tower-diagnostics-20190402-2016.zip So it is still in maintenance mode and for now still shows the drive as "Unmountable: No file system". Is there something that should be done next to test whether it worked?
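
    (For reference: disks stay unmounted while the array is in maintenance mode, so "Unmountable" there is not conclusive. A minimal way to verify the repair, assuming disk1 maps to /dev/md1 on this Unraid version:)

    xfs_repair -n /dev/md1   # re-run in no-modify mode; a clean pass means the repair held
    # then stop maintenance mode and start the array normally so the disk actually mounts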

  12. 2 hours ago, johnnie.black said:

    You need to remove the -n flag (no modify)

    Removing the -n flag gets me this:
     

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ERROR: The filesystem has valuable metadata changes in a log which needs to
    be replayed.  Mount the filesystem to replay the log, and unmount it before
    re-running xfs_repair.  If you are unable to mount the filesystem, then use
    the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount
    of the filesystem before doing this.
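
    (For reference: that message means the XFS journal still holds unreplayed changes. The safer path is to let a normal mount replay the log, and only fall back to -L if the mount fails. A minimal sketch, assuming disk1 maps to /dev/md1:)

    # safer first attempt: mount so the log replays, then unmount and repair normally
    mkdir -p /tmp/disk1
    mount -t xfs /dev/md1 /tmp/disk1 && umount /tmp/disk1
    xfs_repair /dev/md1
    # only if the mount itself fails:
    # xfs_repair -L /dev/md1   # destroys the log and may lose the most recent changes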

     

  13. 31 minutes ago, johnnie.black said:

    Let the sync finish, then run a filesystem check on disk1.

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ALERT: The filesystem has valuable metadata changes in a log which is being
    ignored because the -n option was used.  Expect spurious inconsistencies
    which may be resolved by first mounting the filesystem to replay the log.
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan (but don't clear) agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 1
            - agno = 0
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
    No modify flag set, skipping phase 5
    Phase 6 - check inode connectivity...
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify link counts...
    Maximum metadata LSN (1:1048685) is ahead of log (1:1048661).
    Would format log to cycle 4.
    No modify flag set, skipping filesystem flush and exiting.

    This is what came up from the filesystem check on disk 1 (tower-diagnostics-20190402-1712.zip). Not sure what the next step is; the array is still in maintenance mode.
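
    (For anyone following along: the output above is the non-destructive check, run from the console while the array is in maintenance mode. A minimal sketch, assuming disk1 maps to /dev/md1:)

    xfs_repair -nv /dev/md1   # -n = no modify (check only), -v = verbose output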

  14. 11 hours ago, johnnie.black said:

    Correct, you can stop it immediately after start.

    OK, have done that; it is currently parity syncing/rebuilding with about 6 hrs left. I have also just noticed that one of my drives, Disk 1, now says "Unmountable: No file system" tower-diagnostics-20190402-0834.zip Should I stop the rebuild, given that it is reading from that drive (which it is) to rebuild the parity drive, reboot to see if the disk comes back, and then start the rebuild again?

    Unraid2.jpg

  15. 1 hour ago, johnnie.black said:

    Parity disk dropped offline, likely controller related. You can reboot and post new diags so there's a SMART report for parity, but your best bet would likely be to replace that controller with one of the recommended LSI HBAs.

    I rebooted and got the diagnostics (tower-diagnostics-20190401-2027.zip). I did not start the array though, as the drive is currently disabled. I am trying to remember the procedure for getting it re-enabled: it was to unassign the drive, start the array, stop the array (at what point do you stop it, though?), re-assign the drive, and then restart the array, isn't it?

  16. Hello, I booted up my NAS today and started the array. I copied over some files, and after I was done I noticed notifications that there were errors on my parity drive (see attached image). I have since stopped the array and downloaded the diagnostics file (tower-diagnostics-20190401-1040.zip). I should mention that when booting up the NAS, it initially did not go into the "RocketRAID BIOS setting utility" (the attached image is just from Google to show what I am talking about) as it normally would, and I had to reboot a few times for it to come up. Once it came up, everything started normally.

    I had been told by @johnnie.black, in a previous post I made about drive errors, that it could very likely be the RAID card and that I should replace it with another controller he recommended.

     

    Just so I know (as I am not very experienced with Unraid), in what order should I proceed? Do I power down the NAS and just wait for a new RAID card, do I run a parity check now, or is there some diagnostic I should be doing to make sure there is nothing actually wrong with the drive? Just to mention, I do not have an extra 10 TB HDD on hand to be able to swap it out.

     

    Would the files that I copied over while these errors occurred need to be copied again? Please let me know and I will get it all under way so I can get my NAS back up and running :)

    Unraid.png

    hqdefault.jpg

  17. Thanks for getting back to me. I will do as you mentioned and try one of the other cables; I bought a new case for the NAS, so now is as good a time as any to do it all at once and rebuild like you mentioned. I will indeed keep a closer eye on the drives to see if any others have the same thing happen to them. If it does turn out to be the controller, is there perhaps a standard go-to one that you could recommend?

     

    I will also update once I have tried your suggested solutions.
