Marky


Posts posted by Marky

  1. Thanks for the info, guys. I was thinking it was a glitch myself.

    Going to replace it anyway and work on it outside of unRAID.

     

    A possibility (that I read online, so I'm not sure if it's true) is that it's a spare sector that has not been brought into use, but the drive firmware has flagged it as potentially bad.

     

    May call HGST and see if they have an explanation. Will report back if I find out anything.

     

    Mark

     

  2. Just recently upgraded to unRAID 6 from 5 and immediately saw that one of my disks has 8 pending sectors.

     

    Since this is an AF (Advanced Format) drive, I understand that to mean there is one 4K sector pending reallocation.
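    The arithmetic behind that reading, for anyone checking: SMART counts pending sectors in 512-byte logical units, so one 4K physical sector on an Advanced Format drive shows up as eight pending entries.

    ```shell
    # 512-byte logical sectors per 4 KiB physical sector on an AF drive
    echo $((4096 / 512))   # prints 8
    ```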

     

    I've run both the SMART short and long tests and they both pass. I see no errors, and no LBA is reported for where this bad sector is.

    I would like to find that out, to see whether it was in use and what file might be affected.

     

    I've copied all the data off the drive to another machine as a failsafe, and again I had no read errors.

     

    Anyone have any ideas why, if there is indeed a bad sector, the SMART reports don't show the LBA?

     

    The drive is an HGST 6TB NAS.
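    For anyone wanting to pull the raw count themselves, it lives in SMART attribute 197 (Current_Pending_Sector). A sketch of extracting it with awk; the sample line below is a stand-in mimicking typical `smartctl -A` table output, since the real command needs the physical drive (e.g. `smartctl -A /dev/sdb`):

    ```shell
    # Extract the raw Current_Pending_Sector value (attribute 197).
    # Live usage would pipe smartctl -A /dev/sdX into the same awk filter.
    sample='197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 8'
    pending=$(printf '%s\n' "$sample" | awk '$2 == "Current_Pending_Sector" {print $NF}')
    echo "$pending"   # prints 8
    ```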

     

    Any help would be appreciated.

     

    Mark

     

  3. So I wanted to give an update on this, as it is strange.

     

    I compared all the files and only one file was different, so I replaced the bad file on the new disk.

    Ran a correcting parity check to fix anything.

    Did a non-correcting check to make sure, and there were no sync errors.

     

    Here's where it gets strange.

     

    I've upgraded another disk and the same thing happened.

    It rebuilt fine, no errors.

    Did a non-correcting check and again got 5 sync errors.

    Compared all the files and again only one file is different.

     

    This time I had a look at the contents of the bad file. There was a whole section of zeros, but in part of that block was the string EFI PART.
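    For what it's worth, EFI PART is the GPT partition-table header signature, and `grep -abo` will report the byte offset of a string like that inside a file. A self-contained sketch (the scratch file and the 512-byte offset are invented for illustration; on the real file it would just be `grep -abo 'EFI PART' /path/to/file`):

    ```shell
    # Build a scratch file: 512 zero bytes, then the GPT signature.
    tmp=$(mktemp)
    dd if=/dev/zero bs=512 count=1 2>/dev/null > "$tmp"
    printf 'EFI PART' >> "$tmp"
    # -a treat binary as text, -b print byte offset, -o print only the match
    grep -abo 'EFI PART' "$tmp"   # prints 512:EFI PART
    ```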

     

    This seems odd to me, as though the wrong data was written during the rebuild.

    I wonder if I'm somehow hitting an edge case of a bug?

     

    Before I upgrade any more drives, I think I'm going to move to V6.

     

    Mark

     

     

  4. It's Reiser.

     

    I've just been looking at Linux Reader; unfortunately, it does not look like you can use any external tools with it. No drive letter or mount point is exposed externally.

     

    I'm thinking of the network route; it's safest to leave the new drive where it is.

     

    I think it's easier to just mount the old drive read-only in a Linux install.

     

    Any recommended utility on Linux to do a full directory compare of all the files?
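    One candidate is plain `diff -rq`, which recurses and compares byte-for-byte (or `rsync -rcn` for a checksum-based dry run). A minimal sketch, with throwaway directories standing in for the two mounted disks:

    ```shell
    # Two scratch directories stand in for the old and new disk mounts.
    old=$(mktemp -d); new=$(mktemp -d)
    printf 'same data' > "$old/a.bin"; printf 'same data' > "$new/a.bin"
    printf 'good data' > "$old/b.bin"; printf 'mangled??' > "$new/b.bin"
    # -r recurse, -q only report which files differ; diff exits non-zero
    # when differences are found, hence the || true.
    diff -rq "$old" "$new" || true   # reports only b.bin as differing
    ```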

     

    Thanks for your help.

     

    Mark

     

  5. Hard to know the exact cause, I guess. Parity was good before the rebuild. No read errors were shown on any disk after the rebuild.

    The sync errors only showed up after a non-correcting check once the rebuild was complete.

    Parity should only be read during a rebuild, so I'm not sure how there would be parity errors.

     

    I think my plan, to be safe, is to put the old and new disks in a different box and binary-diff the files.

    If all the files match, then I guess these are parity errors, so I can run a correcting parity check.

     

    If they don't match, then I'll do another rebuild, check parity, and then binary-diff again.

     

    Anything wrong with this plan?

     

     

  6. 5 parity errors after a data disk rebuild would seem to imply that the rebuilt data doesn't match parity. I would say rebuild again except for these 2 questions:

     

    Did you preclear or otherwise test the new disk?

     

    What about SMART for all drives?

     

    I did run a full preclear on the new disk, all clear.

    SMART checks on all drives are good.

     

    It was my assumption that the rebuild had a glitch, as parity would not be written to during a rebuild unless there were errors.

    No errors were shown in the UI after the rebuild on any disk.

     

    This is the first time I have seen this happen, and I've upgraded many disks in the same system over the years.

     

    Mark

     

  7. I'm after some advice on the best way to proceed. Here is what I did.

     

    1) Ran a parity check: all fine, no errors.

    2) Upgraded a data disk to a larger one and rebuilt the data.

    3) Ran a non-correcting check to make sure everything was good: 5 sync errors.

     

    No errors showing in the UI on any disk, so I'm wondering what caused these. It has never happened before.

     

    Would it be best to do another data rebuild and run the parity check again?

     

    I'm also thinking of connecting the old and new disks to another box and running a binary diff on the files to check them.

     

    Any advice would be appreciated.

     

    Mark

     

     

  8. I agree with garycase; these are great drives. I would not have posted the link if I did not use them myself. I bought another couple of them and luckily got my local Fry's to price-match.

     

    For me, the speed difference between these and 7200rpm drives has not been noticeable. Because they spin more slowly, they have the advantage of lower power use and less heat generation, which is what you want in a box with many drives close together.

     

    Mark

     

  9. Agreed, jaybee.

     

    Especially since this is still on the website, about the company:

     

    Lime Technology, LLC is a privately held company founded in 2005. Our company strives for the highest possible Customer satisfaction and Business integrity.

     

    They are failing at that, in my opinion.

     

  10. I've not tried RC8 yet to see if the speed problems are still there, but all the RC releases I've tried have the problem.

     

    4.7 parity check ~55MB/s

    B14 parity check ~50MB/s (still using this version due to speed issues on the RCs)

     

    RC? parity check ~17MB/s!!!!!

     

    Here's the hardware config.

     

    20 Hitachi drives

    Supermicro C2SEA Motherboard

    2x AOC-SASLP-MV8 Controllers

     

    4 drives on the motherboard (1 of which is the parity) and the other 16 on the 2 SASLP controllers

     

    Mark

     

  11. I'm using 2 SASLP cards and have had speed issues with all the RCs I've tried. Parity checks at just 17MB/sec are pathetic.

     

    I've gone back to B14 and turned off all drive spindowns (which I hate), but it does make it reliable.

     

    Parity checks on B14 are ~50MB/sec.

     

    So system is capable of decent performance.

     

    Mark

     

    I myself heeded tyrindor's advice and went back to RC6. While of course I want the best performance I can get, the risk of data corruption is far, far worse than poor performance.

     

    I'm curious, though: you said that you turned off drive spindowns. Has that been proven to be the downfall of B14? Drive spindowns = inevitable problems?

     

    I was getting problems with drives being red-balled for no real reason. I believe it might have been a timing issue between the time needed to spin a drive up and unRAID thinking the drive was not responding. Since leaving all drives spun up, I've not had an issue.

     

    I'm staying with what works until there is a proven release that works and does not have speed problems.

     

    Mark

     

  12. I found /sbin, and it has poweroff as a command, as opposed to powerdown. I guess I should use that.

     

    My issue now is where I can put the log files other than on the flash drive, i.e. boot. I have a large USB stick internal to the case that I am using as my "scratch" drive. It is not part of the array, so I do not understand why it is holding up stopping the array. Could someone explain? I will do a test eliminating its use, leaving the log files wherever they normally go, and see if I can then stop the array.

     

    Will report back later

     

    You should not use /sbin/poweroff, since it will not cleanly stop the array first, and the next time you power up, the array will start a parity check. The correct command to use is /root/powerdown, which is a symlink to /usr/local/sbin/powerdown; this is also the same script invoked when you press the power button on your chassis. However, the current -rc4 has a bug, fixed in -rc5, that also doesn't cleanly stop the array first  :P

     

    As for the continuous "Retry unmounting..." messages - I don't see how your mounted scratch device can cause this.  As mbryanr mentioned, try the "lsof | grep mnt" command to find out what has one of the disk mount points open.

     

    I think I'm right in thinking, though, that powerdown only works if emhttp is running. Correct? Is there a safe way to power down if emhttp has crashed? It's a problem in that case, seeing as emhttp cannot be restarted.

     

  13. Far more likely it is emhttp incorrectly handling unexpected (null) entries in /proc/mdcmd.

    What makes you think that?

     

    To others:

    You cannot kill emhttp and then restart it; it is not designed to do that, and you will get a segfault every time.

     

    Actually, you can do a powerdown with the script by opening a terminal session with the server and entering "powerdown". That's the only way I've been able to get the WebGUI back: a powerdown and reboot.

    I had an issue like this on Saturday, and of course it's hard to check the wiki and docs when there is a site outage.

     

    What do you do if emhttp has died or become unresponsive and can't be restarted, and you want to shut the server down?

     

    I believe the powerdown script requires the web interface to be running; is there another way to safely shut down?

     

    I tried just that, through telnet and on the server itself. The powerdown script does nothing, as it needs the web interface running, which it wasn't, as it had bombed out. If the web interface can't be restarted, due to the way it was written, then you're stuck.

     

    I had to use poweroff, which of course then wanted to run a parity check. When the WebGUI died all disk operations had completed, so I stopped the parity check and just let it run a non-correcting check; everything was OK.

     

    If there is no way to really shut down the server when the WebGUI dies and can't be restarted, then maybe an enhancement is needed to add this. Ideally, of course, the WebGUI should not have these issues, but as can be seen from all the reports of it being unresponsive, it's quite common.