Joe L.
Posts posted by Joe L.
Google no longer supports downloads of individual files from code.google.com.
unmenu cannot be installed using those instructions.
Joe L.
-
I only (very) recently put the 6.2 beta on my server.

The preclear_disk.sh script is failing on the v6.2 betas because sfdisk -R is no longer supported. preclear_disk.sh reports the disk as busy and will not preclear it. It looks like 'blockdev --rereadpt' is the replacement, according to the sfdisk man page here: http://man7.org/linux/man-pages/man8/sfdisk.8.html
" Since version 2.26 sfdisk no longer provides the -R or --re-read
option to force the kernel to reread the partition table. Use
blockdev --rereadpt instead."
EDIT: There is also an issue with reads failing. I changed the following:
read_entire_disk( ) {
    # Get the disk geometry (cylinders, heads, sectors)
    fgeometry=`fdisk -l $1 2>/dev/null`
    units=`echo "$fgeometry" | grep Units | awk '{ print $9 }'`
to
read_entire_disk( ) {
    # Get the disk geometry (cylinders, heads, sectors)
    fgeometry=`fdisk -l $1 2>/dev/null`
    units=`echo "$fgeometry" | grep Units | awk '{ print $8 }'`
and the reads will work.
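To see why the awk field index had to change from $9 to $8, compare the "Units" line printed by the two generations of fdisk. This is a minimal sketch; the two sample lines below are illustrative of each format, not captured from any particular system:

```shell
# Why the awk field changed from $9 to $8: newer util-linux fdisk prints a
# shorter "Units" line, so the byte count moved one field to the left.
old_line="Units = cylinders of 16065 * 512 = 8225280 bytes"
new_line="Units: sectors of 1 * 512 = 512 bytes"

old_units=$(echo "$old_line" | awk '{ print $9 }')   # old format: field 9
new_units=$(echo "$new_line" | awk '{ print $8 }')   # new format: field 8

echo "old: $old_units  new: $new_units"   # prints "old: 8225280  new: 512"
```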
Joe L. - Can we get an official fix and an update from you?
I did not have any issue pre-clearing the second parity disk I have just added to my array.
The fix will need to wait until I add/replace one of the existing disks with a larger one.
(Otherwise, I have no way to test the process. )
Whatever the fix might be, it must be backwards compatible with the older releases of unRAID.
In the interim, you can type this command to "patch" the preclear_disk.sh command
First change directory to the directory holding the preclear_disk.sh command. For most, it will be
cd /boot
then type (or copy from here and paste) the following:
sed -i -e "s/print \$9 /print \$8 /" -e "s/sfdisk -R /blockdev --rereadpt /" preclear_disk.sh
Your preclear disk script will be edited and should work with the two changes you mentioned. (Actually, each change occurs in two places, so a total of four lines are changed.)
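If you want to see exactly what the one-liner rewrites before running it on the real script, here is a scratch demonstration. preclear_test.sh is a throwaway file (not the real script) containing one example of each kind of line the sed command touches:

```shell
# Create a throwaway file with the two patterns the patch rewrites.
cat > preclear_test.sh <<'EOF'
units=`echo "$fgeometry" | grep Units | awk '{ print $9 }'`
sfdisk -R $theDisk
EOF

# The same two substitutions as the real patch command.
sed -i -e "s/print \$9 /print \$8 /" -e "s/sfdisk -R /blockdev --rereadpt /" preclear_test.sh

patched=$(cat preclear_test.sh)
echo "$patched"
rm -f preclear_test.sh
```

After the sed run, the awk call reads field 8 and the partition-table reread uses blockdev --rereadpt.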
Joe L.
-
You are both missing an important part of the equation.

Many of us feel that running two or three preclear cycles will get the drive past the 'infant mortality' portion of the bathtub curve (Google it for further discussion). Uncovering an early hard-drive failure before putting that drive into an array is much less stressful than finding a compromised array in the first week after introducing a new drive into the mix.
P.S. I could tell a story about how the concept of infant mortality came into general knowledge in the military during WWII, but that would be completely off topic...
I agree that there is value in stress testing the drive and checking to make sure nothing is failing after the first few writes.
That said, maybe this signals that a new plugin needs to be made that removes the clearing portion of the plugin and instead focuses entirely on stress testing. Leave the clearing entirely to the OS since that's not an issue anymore.
This should allow more cycles of stress testing without that long post-read cycle (which verifies the drive is zeroed), meaning you can do more cycles faster... I think.
I think you are missing a part of the equation. It is not only the stress introduced by the testing; the elapsed time is an integral part of the entire process.
Un-readable sectors are ONLY marked as un-readable when they are read. Therefore, unRAID's writing of zeros to the disk does absolutely nothing to ensure all the sectors on the disk can be read. (Brand new disks have no sectors marked as un-readable)
Sectors marked as un-readable are ONLY re-allocated when they are subsequently written to. It is the reason the preclear process I wrote first reads the entire disk and then writes zeros to it. (It allows it to identify un-readable sectors, and fix them where possible)
The entire reason for the post-read phase is because quite a number of disks failed when subsequently read after being written.
If you rely on unRAID to write zeros to the disk and then put it into service, the first time you'll learn of an un-readable sector error is when you go to read the disk after you've put your data on it. (or during a subsequent parity check)
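The read/zero/post-read sequence described above can be sketched against a small temp file standing in for a disk. This is only an illustration of the logic, not the actual preclear script (block sizes, counts, and file names are all stand-ins; a real preclear operates on the whole raw device):

```shell
# A temp file plays the role of the disk.
img=$(mktemp)
dd if=/dev/urandom of="$img" bs=1024 count=64 2>/dev/null

# Phase 1: read every sector. Reading is what causes a drive to flag
# unreadable sectors as "pending".
dd if="$img" of=/dev/null bs=1024 2>/dev/null && read_ok=yes

# Phase 2: write zeros over everything. Writing is what triggers
# re-allocation of sectors flagged in phase 1.
dd if=/dev/zero of="$img" bs=1024 count=64 conv=notrunc 2>/dev/null

# Phase 3: post-read. Verify the device reads back all zeros, since some
# drives only fail when read again after being written.
zeros=$(mktemp)
dd if=/dev/zero of="$zeros" bs=1024 count=64 2>/dev/null
cmp -s "$img" "$zeros" && postread_ok=yes

echo "read=$read_ok postread=$postread_ok"
rm -f "$img" "$zeros"
```

Skipping phase 1 (as unRAID 6.2's built-in clearing effectively does) means no sector is ever read, so pending-sector detection never happens.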
The new feature in this release of unRAID will help some users avoid a lengthy, unanticipated server outage if they had not pre-cleared a disk, and for that it is a great improvement. This improvement in unRAID 6.2 does not, however, test the disk's reliability in any way, nor identify un-readable sectors (since it only writes them, and does not read them at all).
Additional discussion about the difference between the unRAID 6.2 initial zeroing of drives replacing the preclear process should continue in another thread... and not clutter up this thread in the announcement forum.
Joe L.
-
I personally would run it again. As I understand it, because unRAID runs from RAM, logs don't survive a reboot (or loss of power), so there is no way to tell what the previous preclear indicated. If you really don't want to, you could run a SMART test; if everything there is OK, you will probably be fine. What you won't be able to tell is whether there were any changes between the start of the preclear and the end, which can indicate possible problems.
*edit* Looking at that SMART report, nothing stands out to me, but I'm no expert. Current_Pending_Sector and Reallocated_Sector_Ct are both 0, though, so it will probably be OK.
Actually, the preclear script logs its reports on the flash drive in
/boot/preclear_reports
You might look there. If the report is not there, then it finished the clearing step, but not the post-read phase to see if it was successfully zeroed.
Joe L.
-
No, not moved on... I just don't have the precious free time to be as heavily involved as I was a few years ago (when I was not working).

Yes, I think Joe L. has moved on. What changes have you made to the cache_dirs script, and are there any incompatibilities with the original script?
EDIT: So far all I see is that you changed the 'B' to 'b' for disks busy.
Ha! I think you need glasses if that's all the change you see! But I totally get the need for a better description than the change-log.
I don't have the time right now to go into details and I don't remember everything I did but this is what I wrote earlier:
I have added an adaptive depth level, to prevent cache_dirs from thrashing disks when they are otherwise occupied and the cache is evicted. I found the cache was often evicted, with the number of files I had, when the system became occupied with other things.

I added the ability to adjust depth automatically based on whether scans are judged to cause disk access or not. It judges that a disk has been accessed during a scan if the scan takes a long time, or if any recent disk access was made (and no recent disk access was made before scanning). The purpose is to avoid the situations where cache_dirs continuously searches through my files, keeping disks busy all the time. Before, it was also rather difficult to tell if cache_dirs was thrashing my disks; now it's quite clear from the log if logging is enabled (though the log is rather large at the moment). If disks are kept spinning for some consecutive scans, the depth is decreased, and a future rescan is scheduled at a higher depth.
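The back-off logic described above might look roughly like the following. This is only a sketch of the idea; the variable names and thresholds are illustrative, not taken from the actual cache_dirs code:

```shell
# Illustrative adaptive-depth step: if the last scan was slow (judged to
# have hit the disks), reduce the find depth and schedule a deeper rescan.
depth=9                # current find depth
min_depth=3            # never back off below this
scan_seconds=12        # stand-in for a measured scan duration
slow_threshold=10      # scans longer than this are judged to have hit disk

if [ "$scan_seconds" -gt "$slow_threshold" ] && [ "$depth" -gt "$min_depth" ]; then
  # The cache was likely evicted: back off so routine scans stay in RAM,
  # and schedule a later rescan one level deeper to try to regain depth.
  depth=$((depth - 1))
  rescan_depth=$((depth + 1))
fi
echo "depth=$depth rescan_depth=$rescan_depth"
```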
If the file '/var/log/cache_dirs_lost_cache.log' exists, then it will write a log that is easily imported into a spreadsheet (Excel), so it's easier to check whether it thrashes disks with the current settings. I also added the kill I mentioned and some other quite minor bug-fixes.

If you need more, let me know, and I might supply more detail over Christmas. If you think it looks good and useful, I might do a clean-up run on the script. I haven't felt like spending more time on the script if nobody but me used it.
Best Alex
My servers are both built with out-dated hardware. I cannot contribute in the same way I did in the past. (One is an original server sold by Limetech, with IDE-based drives; the second is newer, but incapable of handling virtualization.)
I do follow the threads... and respond occasionally...
Joe L.
-
Or, the SATA cables are picking up noise from adjacent cables. (adjacent power OR SATA cables)
This often occurs when a user attempts to make their server look neat by bundling all the SATA cables together.
Doing so creates a situation where induced noise is very likely.
Therefore, cut the tie-wraps bundling the cables together. Yes, it looks less neat, but you'll see far fewer noise-induced CRC errors.
Joe L.
-
If the errors are in different places each time, it is more likely to be a memory problem, disk controller problem, or a power supply problem.

Thanks for the info, I had never seen this. I am trying it now, although I think it is not the perfect choice for my case, since the errors appear in different places on the HDDs.
The very first thing to check is a memory test, preferably overnight (or at least several full passes). As often as not, a bad memory stick is the issue.
Joe L.
-
Even though the pre-clear had failed (it detected it had not filled the disk as expected), it could have written what looks to the BIOS like a valid master-boot-record to the hard-disk being cleared.

Now the strange problem:
My initial (1st) drive seemed to exhibit a similar problem, so I was not particularly worried, albeit frustrated at the loss of time. So I decided I would just re-boot (when the drive fails preclear it ceases to be seen) and, lo and behold, the Windows machine would not boot off any USB stick, regardless of location and/or boot-selection attempt??? Very strange behavior. I finally re-attached my Windows HD and booted successfully, then totally cleared the 2nd 5TB drive thinking that something was amiss. Still no change, despite 3 different valid bootable USB sticks.
Finally, this evening, I went into the BIOS again and "Restored Default Settings", and lo and behold, unRAID now boots and I've re-started the PreClear plugin.
So the "burning" question is: how did a PreClear failure somehow write "code" to the BIOS?! Is there anything this will do to the system in the future?
First, no software (including Preclear) writes to the BIOS.
This is actually a common problem with many motherboards. Whenever you change the installed drives list for the system, the BIOS may decide to "help" you, and reorder the boot order so that the most likely hard drive will be booted, which is usually NOT the USB drive you had configured! You did the right thing by going into the BIOS and correcting the boot order, making sure the right drive is booted, not what the BIOS *thinks* is the right drive.
Thanks Rob - I agree in a sense, but I actually selected a "seen" USB bootable drive and it still failed. Maybe the BIOS still changed it to the cleared (not pre-cleared) hard drive, as it showed "no bootable disc found".
Still an interesting and "freaky" thing to witness. It worked fine until the PreClear "failed" then would not boot until it was reset.
Dave
In other words, as RobJ said, your bios was trying to "help" you by choosing one of your hard-disks to boot from that it thought had a valid master-boot-record, and since none contain actual code to boot from, nothing would boot until you set the bios back to boot from the correct usb-flash-drive.
-
Since lime-tech is at release-candidate-2 of 6.1, I'd not expect new features, but instead just tiny bug-fixes so they can get to 6.1 final.

Is NUT UPS support still planned for 6.1? Really looking forward to this.
(I can't speak for lime-tech, as I'm a customer, just like you, so it is always possible they would throw in something at the last moment... but I would look to a community plugin rather than something in 6.1 natively)
Joe L.
-
you can see all the available options by typing
preclear_disk.sh -?
In any case, you can specify multiple cycles on the command line easily, just use "-c N"
where N = the number of cycles desired.
example:
preclear_disk.sh -c 3 /dev/sdX
-
Yes,
I think ANY problem with the preclear is an issue.

/boot/preclear_bjp.sh -f -A -D /dev/sdz
Thanks,
Dave
The results of the preclear run should be stored on the flash drive in the preclear_reports folder.
Hi Guys,
About 10 days ago, I wrote about my problem pre-clearing a 5TB drive with a V6 key, using a separate computer. The first drive got all the way through 2x, but indicated that it could not preclear the MBR, hence itimpi's comment.
I've checked; there are no log reports at all. After rebooting and adding the PreClear plugin & script, I started over several times, but at some point 20+ hours into the preclear, the drive/system disconnects (loses communication with the drive), rendering the preclear useless.
This has now happened to a new 2nd 5TB Drive (fresh out of the box). Anyone have a thought?
Dave
Your drive is losing contact with the disk controller.
You are lucky you are discovering the issue before you start loading your data to the drive.
In many cases in the past, the issue was poor or intermittent SATA cabling to the drive, or an intermittent power splitter or power connection, or intermittent drive tray, or back-plane, or a power supply inadequate for the number of drives connected. Occasionally, it was traced to a flaky SATA controller port.
What exact power supply are you using? How many disks are being powered from it?
Do not get confused by the preclear report stating it could not clear the MBR. It must write a protective MBR to the drive, for older utilities that expect it to be there, even though the actual partition is located further up on the disk. Apparently, at the point where the MBR is being written the drive is already not communicating with the disk controller. (So no writes to the drive would work, regardless of what they were for)
Joe L.
-
That board is not in the server I just upgraded, so I cannot answer whether it works or not.

I don't know for SURE, but if I recall correctly Joe L.'s configuration includes a 1430SA ... and I know he just updated to v6 with no problem. If he noticed this, perhaps he'll confirm that he's using a 1430SA. [I'll send him a PM and a link to this comment.]
-
That was it. I'll update the wiki to make it clearer for the next "expert".

I'll try removing those two files and let you know what happens when I reboot.
Joe L.
-
Yes, I know, and I followed its instructions specifically (and followed the section for ADVANCED users who did not wish to re-format the flash drive).

There is this wiki: Upgrading to UnRAID v6
Tom...
root@Tower:~# ls -l /boot/extra
total 0
root@Tower:~# ls -l /boot/plugins
total 352
-rwxrwxrwx 1 root root 1510 Aug 10 2013 webGui-latest.plg*
-rwxrwxrwx 1 root root 333600 Aug 10 2013 webGui-latest.txz*
root@Tower:~# grep -v "#" /boot/config/go
/usr/local/sbin/emhttp &
root@Tower:~#
I'm guessing the files in the plugins directory should not be there. (I've never used unRAID plugins, so I'm guessing these were for the stock unRAID interface)
I did not install them specifically. I did copy the "plugins" folder from the distribution to the flash drive, but that would have left the previous contents.
I'll try removing those two files and let you know what happens when I reboot.
Joe L.
-
I did not. But clearing it made no difference. (I just tried)
Did you clear your browser cache?

Upgrading from 5.0.6 to 6.0-rc4.
Which method did you use to upgrade? Did you format your flash drive, or manually move things around?
I did the whole format process moving from 5.0.6 a few weeks ago, and it went very smoothly.
I did not do a complete reformat. I did rename everything and disable everything, and never used dynamix ever previously.
-
Upgrading from 5.0.6 to 6.0-rc4.
Many small issues...
First, "failed to load COM32 file menu.c32" when I first attempted to boot my flash drive.
To get it to boot I had to copy menu.c32 from the syslinux folder on the flash drive to the root of the flash drive.
Then, once that was resolved, it booted and allowed me to see the disk assignment page by invoking //tower
After assigning each of the data drives, I get:
URL: tower/undefined
404 File Not Found
Hitting the back button and then refreshing the browser, I see the assigned disk, but at the bottom of the main disk-assignment page is:
Fatal error: Call-time pass-by-reference has been removed in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(278) : eval()'d code on line 56
After assigning all the disks as they were previously, I'm stuck...
I cannot start the array now that I've assigned my drives because of this error, as there are no buttons present to start the array.
Help is requested. This is my first attempt to boot 6.X, and I'm not impressed so far with the experience. I can click around on the various tabs in the interface and can get to all the pages... Apparently it is just the main disk-assignment page with the error so far.
It appears to me as if going straight from 5.0.6 to 6-rc4 is going to be an issue for some.
(I should add I did not completely re-format the flash drive. I did disable all the packages/add-ons, etc. in the config/go script, re-named the packages folder to packages_v5, etc. There was an older menu.c32 in the root directory of the flash drive... perhaps it was confusing syslinux. I did run the make_bootable.bat script from within Windows Vista, using "run as administrator", before my first attempt to boot 6-rc4.)
Joe L.
-
It might be because the correct way to supply the "recursive" argument (or any argument, for that matter) is:

I'm trying to execute a simple copy command in my go script to copy an sabnzbd skin into the right location. Anyone know why this wouldn't work?
cp /mnt/cache/applications/sabnzbd/skinsholding/Knockstrap /usr/local/sabnzbd/interfaces/ -r
I can literally copy and paste that text into the terminal from the go script and the copy works fine. Is the cache not mounted yet when the script runs?
Thanks!
cp -r source destination
options to the copy command must come before the source and destination directories/file-names
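The point about option placement can be demonstrated with throwaway directories (the paths below are scratch locations created just for the example, not the sabnzbd paths from the question). GNU cp happens to be lenient about trailing options when run interactively, but options-first is the portable form:

```shell
# Scratch source and destination directories for the demonstration.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/skin"
echo "body { }" > "$src/skin/style.css"

# Options BEFORE the source and destination operands -- the portable form.
cp -r "$src/skin" "$dst/"

[ -f "$dst/skin/style.css" ] && copied=yes
echo "copied=$copied"
rm -rf "$src" "$dst"
```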
-
Yes, you can do exactly as you stated.

So I've started a preclear on a 4TB drive, but I just read that my building is shutting off the power sometime overnight for some maintenance. The preread will finish, but I'll have to shut the server down during the zeroing. Is it OK to just do a ctrl-c on the script during the zeroing, shut down the server, start up in the morning, and preclear again but skipping the preread? I presume that if I get to the zeroing stage tonight, it'll mean the preread was error-free?
Thanks in advance.
-
You are wrong. If stopped as described in that link to the wiki instructions, the array will not need to perform a parity check upon restart.

Depends on what you call clean... This basically makes sure all writing is stopped. I think (but could be wrong) that a parity check would still start after the reboot.
The key command is
/root/mdcmd stop
which you'll only be able to perform successfully after un-mounting all the disks. (those are the first steps in the wiki link)
Joe L
-
I would try running the short SMART test before doing anything that would power-cycle the disk.

Disk definitely virginal. Never seen such low numbers!
Self-tests are a good idea as was already mentioned.
If the self-test passes, the behavior might be due to bad cabling. Although this doesn't have the normal symptoms, I'd definitely try replacing the SATA cable.
type
smartctl -t short /dev/sdi
then wait for the time it indicates and get a new smart report
smartctl -A /dev/sdi
followed by the same steps for the long test, waiting several hours or more (as indicated when it is invoked) before getting a subsequent smartctl -A report.
(Don't forget to disable any spin-down timers, as spinning down the disk will terminate the long test.)
smartctl -t long /dev/sdi
waiting hours as needed, then
smartctl -A /dev/sdi
It might have currently stopped responding to read requests, but might start again if power cycled. The actual issue could be with the disk controller OR the disk itself.
(That is not good behavior; ceasing to be able to read a disk is a very bad thing in any network storage device.)
-
I would not use either drive.
The first has over 900 sectors already re-allocated (it will not get better with use), and the second is already FAILING the smart test.
184 End-to-End_Error 0x0032 099 099 099 Old_age Always FAILING_NOW 1
Joe L.
-
Actually, it said "0 bytes copied" so it could not read the disk when it was trying to.
Might be fine in operation, but I expect you might want to keep an eye on it.
Can you get a smart report on the drive right now?
(does it respond at all to read requests?)
What do you see when you run this command that attempts to read the disk's first 195 sectors:
(it will print, at most, 30 lines of text)
dd if=/dev/sdi count=195 | od -c -A d | sed 30q
-
Based on the SMART report, it is highly likely to be bad cabling to the drive, or cabling picking up noise from adjacent cables.
(If you neatly tie-wrapped all the drive cables together, you've caused the problem. Do NOT run the cables parallel to each other or to power cables.)
It might also possibly indicate a power supply at its limits, with the power supplied to the drive being noisy causing the checksum errors in communicating with the drive that are showing in the SMART report:
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 172
Better to RMA the drive now than to have it stop responding when you attempt to load it with your data (or have it fail when you are using it to recover another failed disk).
Then it indicates your disk is dropping off-line in some way and can no longer be accessed (either a bad disk, a power supply that cannot supply proper power to the disk, a disk controller that stops responding, a loose cable or connector, a loose drive tray, or a back-plane).

I am having trouble preclearing a Seagate 4TB drive. Every time I attempt to perform a preclear, it hangs on step two. When it freezes, if I go to the console, the monitor keeps refreshing "No such file or directory exists dev/sde."
Please see my original thread here with error logs - http://lime-technology.com/forum/index.php?topic=38014.0
Sorry to say, it is difficult to isolate which it might be.
Thanks Joe. I think I am just going to RMA the drive, even though it passes all of the Seagate SeaTools tests.
I am using a Norco 4224 case and I have tried preclearing the drive in multiple slots to eliminate the possibility of a bad cable, or PCI-E card with no luck. I was able to preclear an old 2 TB drive just fine so I am suspecting the drive.
Joe L.
Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add
in User Customizations
Posted
Yes, and I fixed my original post. (and I've used "sed" for over 40 years)