
NAS

Moderators

  • Posts: 5,042

Posts posted by NAS

  1. I too have read the article. It's obviously written to get a reaction, and TBH that's a good thing.

     

    The one thought I keep coming back to is the relationship between rebuild speed and array size. unRAID parity creation and disk recovery are relatively slow, and realistically the average user will have a 10TB+ array within 3 years, so the chance of a failure during the unprotected period increases. Disk size is increasing WAY faster than parity speed.

     

    Add to that the fact that during these periods the disks are thrashing their asses off, which increases the chance of failure further.

     

    Working out the real probability is beyond me, but it is my real worry.
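    A rough way to put numbers on this worry: rebuild time grows with the size of the disk being rebuilt, and exposure grows with the number of surviving disks that must stay healthy until the rebuild finishes. This is a minimal sketch, assuming independent failures at a constant rate derived from an annual failure rate (AFR), with a hypothetical `stress_factor` standing in for the extra wear of a rebuild. The 50 MB/s rebuild speed, 5% AFR, 2x stress and nine surviving disks are made-up inputs, not measured unRAID figures.

```python
import math

HOURS_PER_YEAR = 24 * 365


def rebuild_hours(disk_tb: float, mb_per_s: float) -> float:
    """Hours needed to read or rebuild one whole disk at a sustained rate.

    Uses decimal units as drive vendors do: 1 TB = 1,000,000 MB.
    """
    return disk_tb * 1_000_000 / mb_per_s / 3600


def p_second_failure(surviving_disks: int, afr: float, hours: float,
                     stress_factor: float = 1.0) -> float:
    """Probability that at least one surviving disk fails during the window.

    Assumes independent failures at a constant rate derived from the annual
    failure rate (AFR); stress_factor scales that rate to model the extra
    load of a rebuild (1.0 means no extra wear).
    """
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR * stress_factor
    return 1.0 - math.exp(-surviving_disks * rate_per_hour * hours)


if __name__ == "__main__":
    # Hypothetical inputs: 50 MB/s rebuild, 5% AFR, 2x stress, 9 surviving disks.
    for disk_tb in (0.5, 1.0, 2.0, 4.0):
        hours = rebuild_hours(disk_tb, mb_per_s=50)
        risk = p_second_failure(9, afr=0.05, hours=hours, stress_factor=2.0)
        print(f"{disk_tb:>4} TB disk: {hours:5.1f} h rebuild, {risk:.3%} risk")
```

    Real failures are not independent (same-batch drives, shared heat and power), so a model like this likely understates the risk rather than overstating it.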

  2. VERY NICE POST. Mitigation comments inline:

     

    1.  “Doh!” I didn’t mean to delete those files! Stupid mistakes still rank #1 in terms of losing data. If only we could protect ourselves from ourselves.

     

    In theory, a recycle bin could be added, though I know of no device that does this.

     

    2.  Trying to resolve a problem in the heat of the moment without giving yourself time to weigh the options and discuss them with forum members. (Knowing what to do in the event of a drive failure is very important to a successful outcome. Following your instincts when you are scared and stressed almost guarantees you are going to make a mistake that will cost you data. A cool head can overcome almost any single failure.)

     

    Mitigating stupidity isn't possible :)

     

    3.  Press the restore button thinking that it is going to help you restore data. (Although this can make data loss more likely, even if you press the restore button at a bad time, there is still a good chance that you can restore your data if you don't start the array.)

     

    unRAID should mitigate this with sanity checks and better explanations.

     

    4.  Running with bad parity.  (Some users never perform a final parity check after the array is all set up to ensure that the array is integral.  There was an unRAID bug several versions ago that could cause drives to not be cleared properly and resulted in corrupted parity.  Run periodic parity checks!  It is the only way to know parity is good.  It also has positive side benefits).

     

    unRAID should mitigate this with scheduled and required parity checks.

     

    5.  Discovering you have bad sectors on other disks when trying to rebuild a failed or upsized disk.  (It is much more likely for a drive to develop a bad sector than for it to fail.  The drive itself, with some help from unRAID, can fix bad sectors if you run parity checks.  But if you don’t, and experience a true drive failure, trying to rebuild it and getting sector errors on another disk will make the recovered disk inaccurate in some very hard to figure out ways.)

     

    unRAID should mitigate this with scheduled and required sector checks.

     

    6.  Have a drive that is known to be formatted show up as unformatted, but press the format button anyway. (unRAID can mistakenly report a drive as unformatted in certain situations. If you know the drive IS formatted, don’t reformat it! If you accidentally do reformat, seek guidance in the forums. Data recovery is still possible in many cases.)

     

    unRAID should mitigate this by fixing the bug.

     

    7.  Install a new beta on your production array and not know how to handle unexpected errors.  (Be careful installing beta versions of unRAID on production arrays.  The chances of losing data as a result are low, but the beta may contain bugs that make it look as though there are serious issues with your array and push you to take corrective actions (like pressing restore or format) that you would not otherwise do.  If you wish to experiment with betas, know there is a risk.  Consider waiting several days for other users with test arrays to report positive experiences before taking the plunge.)

     

    The user should mitigate this with common sense.

     

    8.  Let the computer overheat, e.g., don’t install enough fans, leave it in a hot location, have an overnight guest throw a blanket on the server to kill the noise (this actually happened!), etc. (Heat is a major issue for all electronics, and computer equipment is especially vulnerable to premature failure when operating at high heat. Pay particular attention to getting your drive temps below 45C if at all possible. Otherwise you are asking for trouble!)

     

    unRAID should mitigate this with temperature-triggered events.

     

    9.  Not realize that a drive has failed for a very long time and then have a second drive fail. (unRAID will simulate a failed disk, and it may not be easy for you to determine that this has happened if you are not checking the management page periodically. Worse, a drive can be spun down and fail, and you might not know for a very long time (until another drive fails) that it has failed. Another good reason to run monthly parity checks. The chance of two drives actually failing at the same time without some underlying cause is very low, but the chance of two drives failing before you realize is substantially higher.)

     

    unRAID should mitigate this with audible and email warnings.

     

    10.  Accidentally assign a data drive to the parity slot and start the array. (This is a rather deadly mistake. The only way to recover from such a blunder is to immediately stop the procedure and run the reiserfsck tool to rebuild the directory structure. Chances of complete data recovery are low, but chances of significant data recovery are high.)

     

    unRAID should mitigate this by checking the drive and warning before assigning it to the parity slot.
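    Item 10's suggested mitigation could look something like this minimal sketch: before starting the array, read the spot where a ReiserFS superblock would live and refuse the parity assignment if a filesystem signature is found. It assumes the data disks use ReiserFS (as the reiserfsck advice implies) and that it is pointed at the partition, such as a hypothetical /dev/sdX1, rather than the whole disk, because the 64 KiB offset is measured from the start of the filesystem. The function names are hypothetical, only the 64 KiB superblock location is checked, and other filesystems would need their own signatures.

```python
import os
import tempfile

# ReiserFS keeps its superblock 64 KiB into the partition, and the magic
# string (the s_magic field) starts 52 bytes into that superblock.
SUPERBLOCK_OFFSET = 64 * 1024
MAGIC_OFFSET = 52
REISERFS_MAGICS = (b"ReIsErFs", b"ReIsEr2Fs", b"ReIsEr3Fs")


def has_reiserfs_signature(partition_path: str) -> bool:
    """True if the partition (or an image of it) carries a ReiserFS magic string."""
    with open(partition_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET + MAGIC_OFFSET)
        magic = dev.read(10)   # short or empty read if the device/file is smaller
    return any(magic.startswith(m) for m in REISERFS_MAGICS)


def safe_to_assign_as_parity(partition_path: str) -> bool:
    """Hypothetical pre-start check: refuse a partition that looks like a data disk."""
    if has_reiserfs_signature(partition_path):
        print(f"WARNING: {partition_path} holds a ReiserFS filesystem; "
              "assigning it to parity would overwrite that data.")
        return False
    return True


if __name__ == "__main__":
    # Demonstrate on a scratch image file rather than a real disk.
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk1.img")
        with open(image, "wb") as f:
            f.write(bytes(SUPERBLOCK_OFFSET + 1024))   # zero-filled image
            f.seek(SUPERBLOCK_OFFSET + MAGIC_OFFSET)
            f.write(b"ReIsEr2Fs")                       # fake superblock magic
        print(safe_to_assign_as_parity(image))         # prints the warning, then False
```

    Reading a raw partition device normally requires root privileges.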

     

     

    To my eye, the majority of these extremely valid Top 10 items could be partially or completely mitigated by unRAID. Perhaps once the current beta goes stable, there's a case to be made for the next version to focus on these?
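    Several of the mitigations above (items 4, 5, 8 and 9) boil down to a scheduled job that looks at a few numbers and raises an alarm. Here is a minimal sketch of that decision logic, assuming something else (for example SMART data and the array status) supplies the disk temperatures, the list of disabled disks and the date of the last parity check. `health_warnings` and its inputs are hypothetical; the 45C limit and the monthly interval come from items 8 and 9.

```python
from datetime import datetime, timedelta

TEMP_LIMIT_C = 45                            # item 8: keep drives below 45C
PARITY_CHECK_INTERVAL = timedelta(days=31)   # item 9: run monthly parity checks


def health_warnings(disk_temps_c: dict, disabled_disks: list,
                    last_parity_check: datetime, now: datetime) -> list:
    """Collect warnings a scheduled job could email or sound an alarm for."""
    warnings = []
    # Item 8: temperature-triggered warnings.
    for disk, temp in sorted(disk_temps_c.items()):
        if temp >= TEMP_LIMIT_C:
            warnings.append(f"{disk}: {temp}C is at or above the {TEMP_LIMIT_C}C limit")
    # Item 9: a disabled disk means the array is running without protection.
    for disk in sorted(disabled_disks):
        warnings.append(f"{disk}: disabled, array is unprotected until it is rebuilt")
    # Items 4 and 5: nag when the last parity check is overdue.
    age = now - last_parity_check
    if age > PARITY_CHECK_INTERVAL:
        warnings.append(f"last parity check was {age.days} days ago")
    return warnings


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    for w in health_warnings({"disk1": 38, "disk2": 47}, ["disk3"],
                             datetime(2008, 1, 1), datetime(2008, 3, 1)):
        print(w)
```

    Each returned string could then go out by email or trigger a beep; those delivery mechanisms are left out here.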

  3.  

    The bolded item below (Emulation Type: FORCED FDD) is most likely your problem.

     

    Advanced

    - USB Configuration

    ---- USB Functions ENABLED

    ---- Legacy USB Support AUTO

    ---- Port 64/60 Emulation DISABLED

    ---- USB 2.0 Controller Mode HISPEED

    ---- BIOS EHCI Hand-off ENABLED

    ---- USB Mass Storage Device Configuration

    -------- Device #1 {your USB drive type should be listed here}

    -------- Emulation Type FORCED FDD (options are Auto, Floppy, Forced FDD, Hard Disk, CDROM)

     

    Boot

    - Removable Drives

    (make sure your USB is at the top of the list - otherwise move it to the top)

    - Boot Device Priority

    (make sure your USB is at the top of the list - otherwise move it to the top)

     

    Good luck!

     

    And we have a definitive winner!

     

    The reason I wasn't seeing the removable drive option is that, once you have set the USB to "Forced FDD", you need to save the changes and let the BIOS reboot. Once this is done, you get the Removable Media menu and the ability to make the required boot order changes. I wasn't seeing it because during my experiments I never saved and rebooted each time.

     

    Now I can add and remove disks without having to alter the BIOS.

     

    Superb thanks all ! ;D

  4. Gents, thanks for all the advice. The Mrs is having a music-a-thon right now, so downing the server to try it wouldn't go down too well. :) Will try in the AM.

     

    It really surprises me that such a simple thing has escaped motherboard BIOS coders. This really shouldn't be complicated, but perhaps I don't know enough to fully comprehend how complicated the back end is.

     

    bjp999, thanks for doing this for me; it's really appreciated.

  5. ...It's not a problem until you add more drives and you lose the ability to pick the USB as the bootable HDD.

     

    It seems the magic number of drives is 12 on the P5B. Once you get to that, as you predicted, you cannot select the USB from the list and therefore cannot boot.

     

    I reformatted the key using the correct HP tool, but for the life of me I cannot get any menu choices for Removable Media. I am sure I've seen this option somewhere once during this adventure, but I cannot make it happen again.

     

    Interestingly, if you press F8 during the BIOS boot sequence, you are presented with a list of every drive of all types on your system to boot from. There is no 12-drive limit on this menu. The downside is that, even with a working USB flash install, it hangs when you choose USB boot from this menu.

     

    The ASUS manual doesn't seem to help me here, as it's written with normal users in mind.

     

    So my quandary is: how do I get the P5B to see the USB flash drive as removable media? Any ideas?

  6. A success story.

     

    The shipped BIOS was 801, so I flashed it to 1002. This was trivially easy using the built-in flashing tool.

     

    This, however, did not fix the problem. The Lexar USB was still in the list of Hard Drives (which is NOT the list of boot devices; it is a different menu item). The "Boot Device Priority" BIOS list had 4 random devices.

     

    The fix in the end is easy, though. In the list of Hard Drives (the option immediately below the "Boot Device Priority" menu item), you can change the order of disks. Select the USB Lexar and make it device 1. It immediately replaces the SCSI boot option in the "Boot Device Priority" list, and unRAID will boot.

     

    I will add this to the Base Build document now.

     

    None of my other drives are recognised, but that's another problem for another thread, should I not find the solution myself.

    Update: after changing the SATA setting from IDE to AHCI, all my drives were detected. I've updated the wiki.

     

    Thanks all

  7. Using base build from the wiki.

     

    With all components installed except the physical hard drives, I can select the Lexar USB drive to boot from. It boots quickly with no problems.

     

    However, if I add hard drives to the Supermicro cards, the BIOS forgets that I want to boot from USB and tries to boot from SCSI-0 by default. Normally this would be fine, as I should be able to go and change the boot order.

     

    The thing is, the Lexar is no longer in the bootable device list.

     

    If I go to the general Hard Drive list in the BIOS, the Lexar is still in there, but it's right at the end.

     

    It's as if, once the list gets too long, the BIOS just pushes options off the end.

     

    This is indeed a conundrum. Anyone got any ideas?

  8. Google says it better than I ever could:

     

    /lost+found - Linux should always go through a proper shutdown. Sometimes your system might crash or a power failure might take the machine down.

    Either way, at the next boot, a lengthy filesystem check using fsck will be done. Fsck will go through the system and try to recover any corrupt files that it finds. The result of this recovery operation will be placed in this directory. The files recovered are not likely to be complete or make much sense, but there is always a chance that something worthwhile is recovered.

  9. In simple terms, it's stuff that's been lost and then found again :P

     

    Essentially, it's data that is on the drive but has been lost due to filesystem errors of some sort. When you checked your partition, the tool found this data and stuck it in this directory rather than ignoring or deleting it.

     

    That's the good news... the bad news is that I've never seen anything recovered that is of any use.

     

    I believe (but wait for confirmation) that it is safe to delete.

     

    [I'm sure someone will post a far more detailed explanation]
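    Before deleting lost+found, a quick survey can show whether anything in it is worth keeping. This minimal sketch walks the directory and guesses each file's type from its leading bytes, since recovered files usually come back without their original names. The signature table covers only a few common formats, and `/mnt/disk1/lost+found` is just an example path for the disk that was checked.

```python
import os

# Well-known leading bytes ("magic numbers") for a few common file types.
# Recovered files usually lose their original names, so content is the
# best clue to what they are.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive (or DOCX/XLSX/ODT)",
    b"ID3": "MP3 audio with ID3 tag",
}


def sniff(path: str) -> str:
    """Guess a file's type from its first bytes; 'unknown' if nothing matches."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return "unknown"


def survey_lost_found(root: str):
    """Yield (path, size_in_bytes, guessed_type) for each regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path, os.path.getsize(path), sniff(path)


if __name__ == "__main__":
    root = "/mnt/disk1/lost+found"   # example path; point it at the disk you checked
    for path, size, kind in survey_lost_found(root):
        print(f"{size:>12,}  {kind:<32}  {path}")
```

    If everything comes back as "unknown" or as tiny fragments, that matches the experience above, and deleting the directory loses little.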
