Everything posted by Joe L.

  1. One of your disks (/dev/sdi) is having lots of errors, as seen in the excerpt from your syslog below. Either the drive died, or a cable came loose. In either case, I doubt the pre-clear will finish on its own. Joe L.

     Feb 26 16:51:40 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
     Feb 26 16:51:40 Tower kernel: ata8.00: irq_stat 0x00020002, device error via SDB FIS
     Feb 26 16:51:40 Tower kernel: ata8.00: cmd 60/00:00:e0:7e:37/02:00:05:00:00/40 tag 0 ncq 262144 in
     Feb 26 16:51:40 Tower kernel:          res 41/40:00:05:80:37/00:00:05:00:00/40 Emask 0x409 (media error) <F>
     Feb 26 16:51:40 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 26 16:51:40 Tower kernel: ata8.00: error: { UNC }
     Feb 26 16:51:40 Tower kernel: ata8.00: configured for UDMA/100
     Feb 26 16:51:40 Tower kernel: ata8: EH complete
     Feb 26 16:51:40 Tower kernel: sd 8:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)
     Feb 26 16:51:40 Tower kernel: sd 8:0:0:0: [sdi] Write Protect is off
     Feb 26 16:51:40 Tower kernel: sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
     Feb 26 16:51:40 Tower kernel: sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     Feb 26 16:51:44 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
     Feb 26 16:51:44 Tower kernel: ata8.00: irq_stat 0x00060002, device error via SDB FIS
     Feb 26 16:51:44 Tower kernel: ata8.00: cmd 60/00:00:e0:7e:37/02:00:05:00:00/40 tag 0 ncq 262144 in
     Feb 26 16:51:44 Tower kernel:          res 41/40:00:05:80:37/00:00:05:00:00/40 Emask 0x409 (media error) <F>
     Feb 26 16:51:44 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 26 16:51:44 Tower kernel: ata8.00: error: { UNC }
     Feb 26 16:51:44 Tower kernel: ata8.00: configured for UDMA/100
     Feb 26 16:51:44 Tower kernel: ata8: EH complete
     Feb 26 16:51:44 Tower kernel: sd 8:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)
     Feb 26 16:51:44 Tower kernel: sd 8:0:0:0: [sdi] Write Protect is off
     Feb 26 16:51:44 Tower kernel: sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
     Feb 26 16:51:44 Tower kernel: sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     Feb 26 16:51:48 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
     Feb 26 16:51:48 Tower kernel: ata8.00: irq_stat 0x00060002, device error via SDB FIS
     Feb 26 16:51:48 Tower kernel: ata8.00: cmd 60/00:00:e0:7e:37/02:00:05:00:00/40 tag 0 ncq 262144 in
     Feb 26 16:51:48 Tower kernel:          res 41/40:00:05:80:37/00:00:05:00:00/40 Emask 0x409 (media error) <F>
     Feb 26 16:51:48 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 26 16:51:48 Tower kernel: ata8.00: error: { UNC }
     Feb 26 16:51:48 Tower kernel: ata8.00: configured for UDMA/100
     Feb 26 16:51:48 Tower kernel: ata8: EH complete
     Feb 26 16:51:48 Tower kernel: sd 8:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)
     Feb 26 16:51:48 Tower kernel: sd 8:0:0:0: [sdi] Write Protect is off
     Feb 26 16:51:48 Tower kernel: sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
     Feb 26 16:51:48 Tower kernel: sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     Feb 26 16:51:53 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
     Feb 26 16:51:53 Tower kernel: ata8.00: irq_stat 0x00060002, device error via SDB FIS
     Feb 26 16:51:53 Tower kernel: ata8.00: cmd 60/00:00:e0:7e:37/02:00:05:00:00/40 tag 0 ncq 262144 in
     Feb 26 16:51:53 Tower kernel:          res 41/40:00:05:80:37/00:00:05:00:00/40 Emask 0x409 (media error) <F>
     Feb 26 16:51:53 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 26 16:51:53 Tower kernel: ata8.00: error: { UNC }
     Feb 26 16:51:53 Tower kernel: ata8.00: configured for UDMA/100
     Feb 26 16:51:53 Tower kernel: ata8: EH complete
     Feb 26 16:51:53 Tower kernel: sd 8:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)
     Feb 26 16:51:53 Tower kernel: sd 8:0:0:0: [sdi] Write Protect is off
     Feb 26 16:51:53 Tower kernel: sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
     Feb 26 16:51:53 Tower kernel: sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     Feb 26 16:51:57 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
     Feb 26 16:51:57 Tower kernel: ata8.00: irq_stat 0x00060002, device error via SDB FIS
     Feb 26 16:51:57 Tower kernel: ata8.00: cmd 60/00:00:e0:7e:37/02:00:05:00:00/40 tag 0 ncq 262144 in
     Feb 26 16:51:57 Tower kernel:          res 41/40:00:05:80:37/00:00:05:00:00/40 Emask 0x409 (media error) <F>
     Feb 26 16:51:57 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 26 16:51:57 Tower kernel: ata8.00: error: { UNC }
     Feb 26 16:51:57 Tower kernel: ata8.00: configured for UDMA/100
     Feb 26 16:51:57 Tower kernel: ata8: EH complete
     Feb 26 16:51:57 Tower kernel: sd 8:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)
     Feb 26 16:51:57 Tower kernel: sd 8:0:0:0: [sdi] Write Protect is off
     Feb 26 16:51:57 Tower kernel: sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
     Feb 26 16:51:57 Tower kernel: sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     Feb 26 16:52:01 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
     Feb 26 16:52:01 Tower kernel: ata8.00: irq_stat 0x00060002, device error via SDB FIS
     Feb 26 16:52:01 Tower kernel: ata8.00: cmd 60/00:00:e0:7e:37/02:00:05:00:00/40 tag 0 ncq 262144 in
     Feb 26 16:52:01 Tower kernel:          res 41/40:00:05:80:37/00:00:05:00:00/40 Emask 0x409 (media error) <F>
     Feb 26 16:52:01 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 26 16:52:01 Tower kernel: ata8.00: error: { UNC }
     Feb 26 16:52:01 Tower kernel: ata8.00: configured for UDMA/100
     Feb 26 16:52:01 Tower kernel: sd 8:0:0:0: [sdi] Result: hostbyte=0x00 driverbyte=0x08
     Feb 26 16:52:01 Tower kernel: sd 8:0:0:0: [sdi] Sense Key : 0x3 [current] [descriptor]
     Feb 26 16:52:01 Tower kernel: Descriptor sense data with sense descriptors (in hex):
     Feb 26 16:52:01 Tower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
     Feb 26 16:52:01 Tower kernel:         05 37 80 05
     Feb 26 16:52:01 Tower kernel: sd 8:0:0:0: [sdi] ASC=0x11 ASCQ=0x4
     Feb 26 16:52:01 Tower kernel: end_request: I/O error, dev sdi, sector 87523333
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940416
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940417
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940418
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940419
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940420
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940421
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940422
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940423
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940424
     Feb 26 16:52:01 Tower kernel: Buffer I/O error on device sdi, logical block 10940425
     Feb 26 16:52:01 Tower kernel: ata8: EH complete
     Feb 26 16:52:01 Tower kernel: sd 8:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)
     Feb 26 16:52:01 Tower kernel: sd 8:0:0:0: [sdi] Write Protect is off
     Feb 26 16:52:01 Tower kernel: sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
     Feb 26 16:52:01 Tower kernel: sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     Feb 26 16:52:06 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6
     Feb 26 16:52:06 Tower kernel: ata8.00: irq_stat 0x00060002, device error via SDB FIS
     Feb 26 16:52:06 Tower kernel: ata8.00: cmd 60/00:00:e0:80:37/01:00:05:00:00/40 tag 0 ncq 131072 in
     Feb 26 16:52:06 Tower kernel:          res 68/02:00:00:00:00/00:00:00:00:68/00 Emask 0x2 (HSM violation)
     Feb 26 16:52:06 Tower kernel: ata8.00: status: { DRDY DF DRQ }
     Feb 26 16:52:06 Tower kernel: ata8.00: cmd 60/08:08:00:80:37/00:00:05:00:00/40 tag 1 ncq 4096 in
     Feb 26 16:52:06 Tower kernel:          res 41/40:00:05:80:37/00:00:05:00:00/40 Emask 0x409 (media error) <F>
     Feb 26 16:52:06 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 26 16:52:06 Tower kernel: ata8.00: error: { UNC }
     Feb 26 16:52:06 Tower kernel: ata8: hard resetting link
  2. Since the one process is a child of the other, I don't think you started two... it is normal to see two processes while the drive is clearing, as the clear is done in a background process while a foreground process updates the display. On the other hand, it sure looks as if the process has stopped (assuming no read or write activity is actually occurring). You might just abort it by typing Control-C in the window where it was started and try once more after running a smartctl test on the drive. I have seen drives stop and look locked up like this when other activity occurred concurrently. It is a matter of a "deadlock," where two processes each wait for a resource the other is holding. If you have time, and anything else is going on on the server, I'd just try waiting (overnight). Then, I'd let it know who's boss. Joe L.
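     The parent/child relationship mentioned above is easy to confirm with ps. A minimal sketch, using a throwaway sleep as a stand-in for the two preclear processes (for the real thing you would look up the PIDs of preclear_disk.sh instead):

     ```shell
     #!/bin/sh
     # Sketch: confirm one process is the child of the other rather than
     # a second, independent run. A background sleep stands in for the
     # background clearing process here.
     sleep 2 &
     child=$!

     # The child's PPID should be this shell's PID. For a real preclear,
     # grep the output of "ps -ef | grep preclear_disk.sh" instead.
     ppid=$(ps -o ppid= -p "$child" | tr -d ' ')
     echo "child=$child parent=$ppid shell=$$"
     wait
     ```

     If the second PID's parent is the first, it is one run, not two.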
  3. The first step unRAID performs in upgrading a drive is to resize the file-system on the in-memory version of the drive being replaced. From that point onward, during the rebuild of the new drive onto its replacement, you have access to the entire new drive up to its physical limits. It is entirely possible to write to ANY block on the drive being upgraded while it is being rebuilt on the replacement. For that reason, it is possible that the 750Gig portion of the 1T drive being used as a replacement is not all zeros as you might expect. You are not computing parity when replacing an existing drive; you are instead computing the "data" using the existing parity and remaining data drives. As I already said, the opportunity to save a bit of time when re-constructing a drive might not be there, as the expanded file-system is already in place. When replacing/upgrading an existing drive, the pre-clear script will only make it easier to burn in and identify a defective drive before you use it to replace the existing drive. There is no logic in unRAID to make the rebuild go faster if the upper portion of the disk was already filled with zeros. (And I'm not sure it could regardless, since that portion of the drive has already been "formatted" to its final expanded size, and the corresponding bits on the physical drive need to be brought into sync.) Joe L.
  4. Although the preclear script might have done a nice burn-in test of the drive, I don't think it benefits you on a drive rebuild (unless it prevents clearing of the rest of the disk, which I am doubtful about). The preclear script does nothing that will save time when rebuilding an existing drive. The array downtime it saves is only when adding an additional drive to an existing array where parity is already configured and calculated. As you said, the preclear script will let you have some confidence that the drive you will be rebuilding onto is working properly. True, a drive that is "rebuilt" onto does not get cleared or formatted. It gets the original drive's contents (and formatting). Joe L.
  5. If the shfs process is incrementally growing, it will eventually not be able to allocate memory and crash the server. If it is not using up memory, then it will loop forever... (if it indeed is in a loop)
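     One hedged way to check whether shfs is actually growing is to sample its resident set size with ps a few times. A sketch, watching the current shell instead of shfs so it runs anywhere (swap in pid=$(pidof shfs) on a real server):

     ```shell
     #!/bin/sh
     # Sketch: sample a process's resident memory twice to see whether it
     # is growing. Watching this shell itself keeps the example portable;
     # on an unRAID box you would use: pid=$(pidof shfs)
     pid=$$

     rss1=$(ps -o rss= -p "$pid" | tr -d ' ')
     sleep 1
     rss2=$(ps -o rss= -p "$pid" | tr -d ' ')

     echo "RSS before: ${rss1} KB, after: ${rss2} KB"
     # A steadily increasing RSS over many samples suggests a leak;
     # a flat RSS with no CPU progress suggests a hang/loop instead.
     ```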
  6. The recursive "rm -r" is a very dangerous command if any of the links still exist or if any directories exist under Movies with actual movie files in them... I hope you did an ls -R Movies to ensure you knew what the recursive remove was going to remove. Otherwise, you might have just asked your server to remove all your movies... You would have gotten rid of your problem, but also gotten rid of your collection of movies... (Ouch...) Joe L.
  7. You will probably need to cd /mnt/disk3/Movies and then remove the symbolic links in it to the other drives. Joe L.
  8. The odds are the "Movies" directory is still a link to /mnt/user/movies, so it looks empty. Did you have movie files in the /mnt/disk3/Movies folder before you created the symbolic link? Joe L.
  9. You seem to have re-discovered that infinite loops still seem to take forever to complete, even with today's faster CPUs. Your best bet would be to disable user shares, then remove the link you created, then re-enable the user shares. You can kill user shares with the following command:
     killall shfs
     You can then remove the link you created (but be careful here...). First change directory to /mnt/disk3, then do:
     ls -l Movies
     and see if it is a symbolic link. If it is, it will have a "->" pointing to the actual directory. If Movies is a symbolic link, a simple
     rm Movies
     will delete it. Then, after you clean up the link, you can re-start user shares with:
     /usr/local/sbin/shfs /mnt/user
     Good luck... Joe L.
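     The check-then-remove sequence above can be rehearsed safely on a throwaway directory first. A sketch (all paths are scratch files, not /mnt/disk3):

     ```shell
     #!/bin/sh
     # Sketch of "check it is a symlink, then rm only the link",
     # rehearsed in a temp directory instead of /mnt/disk3.
     demo=$(mktemp -d)
     mkdir "$demo/real_movies"
     ln -s "$demo/real_movies" "$demo/Movies"

     ls -l "$demo/Movies"        # shows: Movies -> .../real_movies

     if [ -L "$demo/Movies" ]; then
         # rm on a symlink removes only the link, never the target's contents
         rm "$demo/Movies"
     fi

     [ -d "$demo/real_movies" ] && echo "real_movies untouched"
     rm -rf "$demo"
     ```

     The [ -L ] test is the safety net: it refuses to run the rm if Movies turned out to be a real directory.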
 10. root@Tower:/boot/scripts# preclear_disk.sh -t /dev/sdb
     Pre-Clear unRAID Disk
     ########################################################################
     Device Model:     ST31500341AS
     Serial Number:    9VS0HE2T
     Firmware Version: CC1H
     User Capacity:    1,500,301,910,016 bytes
     Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
     255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
     Units = sectors of 1 * 512 = 512 bytes
     Disk identifier: 0x00000000
        Device Boot      Start         End      Blocks   Id  System
     /dev/sdb1               63  2930277167  1465138552+   0  Empty
     Partition 1 does not end on cylinder boundary.
     ########################################################################
     ============================================================================
     ==                                                                        ==
     ==  DISK /dev/sdb IS PRECLEARED                                           ==
     ============================================================================
     root@Tower:/boot/scripts#

     All I did was boot the server, install the smartctl libraries, install the preclear script, and kick it off in three different telnet windows on the three different drives. 2 completed successfully, 1 didn't. I would do a thorough memory test then, and/or replace the cable to the disk with another, as there would be no reason why reading a disk one day would give a different result than reading it the next. In any case, you will want to run it through another pre-clear disk cycle, just to make sure it is working well before you add it to the array. That is one of the major reasons you are burning in the drives... to detect errors that are much harder to deal with once you start using the disks for data. Joe L.
 11. Type:
     fdisk -l /dev/sdb
     dd if=/dev/sdb count=1 | od -x -A d
     Post the output of both commands. Joe L.

     Tower login: root
     Linux 2.6.27.7-unRAID.
     root@Tower:~# fdisk -l /dev/sdb
     Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
     255 heads, 63 sectors/track, 182401 cylinders
     Units = cylinders of 16065 * 512 = 8225280 bytes
     Disk identifier: 0x00000000
        Device Boot      Start         End      Blocks   Id  System
     /dev/sdb1                1      182402  1465138552+   0  Empty
     Partition 1 does not end on cylinder boundary.
     root@Tower:~# dd if=/dev/sdb count=1 | od -x -A d
     1+0 records in
     1+0 records out
     512 bytes (512 B) copied, 0.000298241 s, 1.7 MB/s
     0000000 0000 0000 0000 0000 0000 0000 0000 0000
     *
     0000448 0000 0000 0000 003f 0000 7af1 aea8 0000
     0000464 0000 0000 0000 0000 0000 0000 0000 0000
     *
     0000496 0000 0000 0000 0000 0000 0000 0000 aa55
     0000512
     root@Tower:~#

     It sure looks to me as if the geometry is identical, and the "od" output looks the same as mine for the 1.5TB disk. What do you get if you type:
     preclear_disk.sh -t /dev/sdb
     I'll be shocked if it does not indicate the clearing worked as it was supposed to. I'm a bit at a loss to figure out what is happening... You also had odd numbers in your elapsed time calculation. It is almost as if the "shell" was having memory problems. Were you doing anything else at the time the preclear was occurring on the same disk? Did you reset the time-zone and/or time while the pre-clear was in progress? Could you have had a second preclear_disk.sh running on the same disk at the same time? If the output of preclear_disk.sh -t /dev/sdb indicates it is cleared, then you might want to look at your syslog for any indications of disk read errors. Something made it think the data was different the last time it looked. Joe L.
 12. Type:
     fdisk -l /dev/sdb
     dd if=/dev/sdb count=1 | od -x -A d
     Post the output of both commands. Should be interesting to see what happened. The "dd" output should look like this for a 1.5TB drive (assuming your geometry is the same as my 1.5TB drive):
     root@Tower:/boot# dd if=/dev/sdb count=1 | od -x -A d
     1+0 records in
     1+0 records out
     512 bytes (512 B) copied, 0.00120228 s, 426 kB/s
     0000000 0000 0000 0000 0000 0000 0000 0000 0000
     *
     0000448 0000 0000 0000 003f 0000 7af1 aea8 0000
     0000464 0000 0000 0000 0000 0000 0000 0000 0000
     *
     0000496 0000 0000 0000 0000 0000 0000 0000 aa55
     0000512
     The fdisk output, something like this:
     root@Tower:/boot# fdisk -l /dev/sdb
     Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
     255 heads, 63 sectors/track, 182401 cylinders
     Units = cylinders of 16065 * 512 = 8225280 bytes
     Disk identifier: 0x00000000
        Device Boot      Start         End      Blocks   Id  System
     /dev/sdb1                1      182402  1465138552+   0  Empty
     Partition 1 does not end on cylinder boundary.
     Joe L.
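     If you want to see what a cleared MBR looks like in od without touching a real disk, you can build one in a scratch file. A simplified sketch: only the trailing 0xAA55 boot signature is written here, not the full geometry-dependent preclear signature the script checks:

     ```shell
     #!/bin/sh
     # Sketch: build a 512-byte "cleared MBR" image and inspect it with the
     # same dd | od pipeline shown above. The real preclear signature also
     # encodes the partition layout; only the trailing 55 AA is shown here.
     img=$(mktemp)
     dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null

     # Write the two-byte boot signature 55 AA at offset 510
     # (octal \125 = 0x55, \252 = 0xAA)
     printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null

     # od -x groups bytes into little-endian 16-bit words, so the final
     # word prints as "aa55", exactly as in the listings above.
     dd if="$img" count=1 2>/dev/null | od -x -A d
     rm -f "$img"
     ```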
 13. Interesting... I just loaded 4.4.2 myself the other day, but I don't think I've pre-cleared a disk since then. I'll need to give it a try. What "telnet" client are you using? Are you using "putty" or the command built into Windows? I'm in the same time-zone as you, so my server should act the same. Joe L.
 14. Yes. Oops, I see what you are talking about now... Looks like your time-zone might not be set on your server. What do you get when you type:
     date '+%s'
     in another telnet window? I'll bet it is not just a number of "seconds" it returns. It should look like this:
     root@Tower:/boot# date '+%s'
     1231539768
     This was fixed in the most recent 4.4.2 unRAID release, and broken in 4.4 and 4.5beta. The pre-clear will still work, but the elapsed time might need to be tracked manually. Joe L.
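     Tracking the elapsed time manually, as suggested, is just a subtraction of two epoch timestamps. A minimal sketch (the sleep stands in for the long-running preclear):

     ```shell
     #!/bin/sh
     # Sketch: manual elapsed-time tracking with epoch seconds, as a
     # workaround when the script's own elapsed-time display misbehaves.
     start=$(date '+%s')
     sleep 2                      # stand-in for the long-running preclear
     end=$(date '+%s')
     elapsed=$((end - start))
     echo "elapsed: ${elapsed}s"
     ```

     Note the start value: if date '+%s' does not print a plain number, the time-zone problem described above is present.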
 15. Nothing you can do until the drives get added to the next version of smartctl. It happens with lots of new drives. I took a look a few hours ago; 5.38 is the most current version of smartctl unless you want to go to their development CVS tree and compile it yourself. Fortunately, most of the SMART parameters are common between the manufacturers and drive models, so the SMART reports will still help to know if the drive is acting up. I'll be curious to learn how quickly the drives clear on your server. On my array it took about 20 hours to do two concurrent 1.5TB drives while it was also doing a monthly parity check I had scheduled. All I can say is the PCI bus on my poor server was probably very glad when it was over. Joe L.
 16. You should be able to scroll backwards (and forwards) on the console by using Shift-PgUp and Shift-PgDn. Yes, if there are a lot of differences in the "smart" output, it will scroll the rest off the top of the screen. The actual "smart" output files are in /tmp/smart_startNNNN and /tmp/smart_finishNNNN, where NNNN = the process ID of the clearing script. Type
     ls -l /tmp/smart*
     to see their names. You can re-create the "diff" with:
     diff /tmp/smart_startNNNN /tmp/smart_finishNNNN
     The actual "SMART" output is also saved in your syslog. You can look in /var/log/syslog for it. You can use the "syslog" viewer built into unMENU to see it there. The "Ghost" entry in unMENU is not a ghost; it is an actual partition. In fact, it was the most difficult part of the pre-clear script to get correct. It has to be exactly as if unRAID had set up the partition, skipping the first cylinder on the disk and extending for the entire remainder of the drive. The pre-clear process creates that partition on the cleared disk. It does not put a file-system on it, but the partition is there, and it would be /dev/sdb1 (for /dev/sdb). If you are using unMENU you can use the "Smart" view of the myMain plug-in page to see how the drive did as far as SMART goes. Most important are any re-allocated sectors, and any pending re-allocation. I recently purchased two 1.5TB drives and have been putting them through pre-clear cycles to burn them in. Below is a screen capture of the myMain "Smart view" for two of my new drives I am burning in. One of them (sdb) initially had a bad cable, so the "reported_uncorrect" errors are not as bad as it might seem. That same drive re-allocated three sectors the first time I did a pre-clear. I've been running it again and again, and the number of reallocated sectors has not increased, so the drive is probably stable. (In any case, it has a 5-year warranty, so I'll keep an eye on it.) It sounds like everything went as expected with your preclear. You can test it, of course, by typing:
     preclear_disk.sh -t /dev/sdb
     Joe L.

     What I find most interesting is that unless you get SMART reports on the drives, you have no idea these errors are happening... That means "some" of the MS-Windows errors we see might be a disk acting up, and not the Microsoft OS. Of course, they should give you the tools to monitor the disk's health... but they don't. <rant> (A crashed disk/computer often leads to a NEW sale of a Microsoft OS. They really don't have a huge incentive to keep the existing OS working; besides, they give no easy way to replace the disk anyway when it starts to go bad.) </rant>
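     Re-creating the "diff" works the same on any pair of text files. A self-contained sketch, using stand-in files in place of the real /tmp/smart_startNNNN and /tmp/smart_finishNNNN reports:

     ```shell
     #!/bin/sh
     # Sketch: re-create the preclear "differences" display from two saved
     # SMART reports. Two tiny stand-in files are used here; the real ones
     # are /tmp/smart_startNNNN and /tmp/smart_finishNNNN.
     start=$(mktemp)
     finish=$(mktemp)
     printf 'Reallocated_Sector_Ct 0\nTemperature_Celsius 34\n' > "$start"
     printf 'Reallocated_Sector_Ct 3\nTemperature_Celsius 36\n' > "$finish"

     # Non-empty diff output means SMART attributes changed during the cycle
     diff "$start" "$finish"

     rm -f "$start" "$finish"
     ```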
 17. You can install "screen" as a supplemental package. When you invoke it and then start a command, you can disconnect and later re-connect to the running process. Otherwise, there is no other way I know of if you don't have a system console. Both these packages are needed. Use "installpkg package_name.tgz" to install each in turn, as shown below.
     http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz
     http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz
     Most of us have a "packages" directory to hold downloaded packages. Create it by typing:
     mkdir /boot/packages
     Download the two files by typing:
     cd /boot/packages
     wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz
     wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz
     Or download them to your Windows PC by clicking on the links above, and then move them to the packages folder on your flash drive using Windows file-explorer (you will need to create the "packages" folder if it does not exist):
     \\tower\flash\packages
     To install these packages, log onto the unRAID server as root and then type:
     cd /boot/packages
     installpkg utempter-1.1.4-i486-1.tgz
     installpkg screen-4.0.3-i486-1.tgz
     cd /boot
     Then type:
     screen
     Then start up the preclear_disk.sh process. To detach, leaving the preclear_disk.sh process running, type Control-A d. Then, 10 hours later, you can re-attach to the running process by logging in and typing:
     screen -r
     To create another screen window for a second/third concurrent preclear, type Control-A c. To switch between the screen windows, type Control-A p or Control-A n for the previous or next screen session. A good article on "screen" can be found here: http://www.linuxjournal.com/article/6340 The manual page for screen is here: http://ss64.com/bash/screen.html It can do a lot more. You can "name" the screen sessions and list the sessions with Control-A " (Control-A followed by a double-quote). Edit: updated links to screen packages. Joe L.
 18. Cool... have fun. It sure will make it start the clearing process. There are only three situations where the second array would not clear the drive:
     1. No parity drive is defined.
     2. The disk has a valid reiserfs that starts at cylinder 63 and extends to the end of the disk as partition 1, with no other partitions. (The normal way unRAID creates a data disk.)
     3. The disk has a special "pre-clear" signature in the first 512 bytes of the first sector on the disk. (This differs based on disk size and geometry, and is the whole purpose of the preclear script.)
     If you want to erase just the MBR you can use the -n option and type Control-C once the zeroing of the bulk of the drive starts. That should be enough to make it look unformatted to a new unRAID server. They can be pre-cleared in any system. The script does look for the unRAID-specific files and folders to ensure you do not shoot yourself in the foot, but it will probably even work on a non-unRAID Linux box. It might give an error or two to stdout, as it will not be able to open /boot/config/disks.cfg or run the "mdcmd status" command, but it should work. It will still make sure the disk is not mounted. Joe L.
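     Zeroing just the MBR, which is effectively what the -n / Control-C trick above accomplishes, can be demonstrated on a scratch image file. A sketch; on a real disk the same dd line is destructive (it erases the partition table), so a device path is deliberately not used here:

     ```shell
     #!/bin/sh
     # Sketch: zero only the MBR (first 512 bytes) of a scratch image,
     # leaving the rest of the "disk" untouched. Pointed at a real device,
     # this erases the partition table and makes the drive look
     # unformatted to unRAID. DESTRUCTIVE on real hardware.
     img=$(mktemp)

     # Build a 4-sector image with non-zero data everywhere
     dd if=/dev/urandom of="$img" bs=512 count=4 2>/dev/null

     # Zero just the first sector; conv=notrunc keeps the rest intact
     dd if=/dev/zero of="$img" bs=512 count=1 conv=notrunc 2>/dev/null

     # First sector is now all zeros
     dd if="$img" bs=512 count=1 2>/dev/null | od -An -x | head -n 2
     rm -f "$img"
     ```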
 19. Are you both using the newer version of the pre-clear script, or the older original one? The older script exited the pre/post read when the "dd" command failed to read the expected number of blocks. I originally thought that would only happen when you reached the end of the disk and it tried to read disk blocks past the end. Apparently, a "read" error is occurring before the end of the disk is reached. On the original script this exits the read phase early. When I was told this was happening, I re-wrote that logic and had the loop continue reading until the end, even if errors occurred. (We really want the errors, to identify bad blocks and bad hardware.) The newer version of preclear_disk.sh should never exit the read phase early. Joe L.
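     The "continue reading until the end" logic can be sketched as a block-by-block dd loop that counts failures instead of exiting. A simplified illustration against a scratch file (the real script reads much larger chunks from the raw device, and a failed dd there indicates an unreadable region):

     ```shell
     #!/bin/sh
     # Sketch: read a file/device in fixed-size chunks and keep going even
     # if an individual read fails, so bad blocks are counted and reported
     # rather than silently ending the read phase early.
     src=$(mktemp)
     dd if=/dev/urandom of="$src" bs=4096 count=8 2>/dev/null

     blocks=8
     bs=4096
     errors=0
     i=0
     while [ "$i" -lt "$blocks" ]; do
         if ! dd if="$src" of=/dev/null bs="$bs" skip="$i" count=1 2>/dev/null
         then
             errors=$((errors + 1))
             echo "read error at block $i (continuing)"
         fi
         i=$((i + 1))
     done
     echo "done: $errors read errors"
     rm -f "$src"
     ```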
 20. At the command prompt type:
     ls -l /dev/disk/by-id
     A listing of all your drives will appear, looking like the listing below. The "device" is at the very end of the line. A device that is brand new is likely to have only one entry matching its model/serial number. A disk that has been partitioned will have an additional entry per partition in the listing, with a trailing "1" on its device name. You want to use the base device name, not the name of the first partition or subsequent partitions. If the last field on the line matching your model/serial number is ../../sdb, then your device is /dev/sdb. (Partial listing of my array follows.)
     root@Tower:~# ls -l /dev/disk/by-id
     total 0
     lrwxrwxrwx 1 root root  9 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG3V5LD -> ../../hdb
     lrwxrwxrwx 1 root root 10 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG3V5LD-part1 -> ../../hdb1
     lrwxrwxrwx 1 root root  9 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG4V99D -> ../../hdd
     lrwxrwxrwx 1 root root 10 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG4V99D-part1 -> ../../hdd1
     lrwxrwxrwx 1 root root  9 Dec 31 12:10 scsi-SATA_WDC_WD10EACS-00_WD-WCAU44206983 -> ../../sde
     lrwxrwxrwx 1 root root 10 Dec 31 12:10 scsi-SATA_WDC_WD10EACS-00_WD-WCAU44206983-part1 -> ../../sde1
     lrwxrwxrwx 1 root root  9 Dec 31 12:10 usb-SanDisk_Corporation_MobileMate_200445269218B56190D7-0:0 -> ../../sda
     lrwxrwxrwx 1 root root 10 Dec 31 12:10 usb-SanDisk_Corporation_MobileMate_200445269218B56190D7-0:0-part1 -> ../../sda1
     To confirm, type:
     smartctl -i -d ata /dev/sdb
     It should print the drive size, model, and serial number matching your new drive, as below...
     root@Tower:~# smartctl -i -d ata /dev/sdb
     smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
     Home page is http://smartmontools.sourceforge.net/

     === START OF INFORMATION SECTION ===
     Model Family:     Seagate Barracuda 7200.10 family
     Device Model:     ST3750640AS
     Serial Number:    5QD2ZR29
     Firmware Version: 3.AAE
     User Capacity:    750,156,374,016 bytes
     Device is:        In smartctl database [for details use: -P show]
     ATA Version is:   7
     ATA Standard is:  Exact ATA specification draft version not indicated
     Local Time is:    Fri Jan  2 15:00:51 2009 EST
     SMART support is: Available - device has SMART capability.
     SMART support is: Enabled
     Joe L.
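     Picking the base device out of the by-id listing can be scripted with readlink and basename. A sketch against a fake /dev/disk/by-id tree built in a temp directory, so it runs anywhere (the model/serial in the link name is taken from the drive discussed earlier and is only illustrative):

     ```shell
     #!/bin/sh
     # Sketch: resolve a by-id symlink to its base device name.
     # A fake /dev/disk/by-id layout is built under a temp dir.
     root=$(mktemp -d)
     mkdir -p "$root/dev/disk/by-id"
     touch "$root/dev/sdb" "$root/dev/sdb1"
     ln -s ../../sdb  "$root/dev/disk/by-id/ata-ST31500341AS_9VS0HE2T"
     ln -s ../../sdb1 "$root/dev/disk/by-id/ata-ST31500341AS_9VS0HE2T-part1"

     # Take the entry WITHOUT a -part suffix and strip the ../../ prefix
     target=$(readlink "$root/dev/disk/by-id/ata-ST31500341AS_9VS0HE2T")
     dev="/dev/$(basename "$target")"
     echo "base device: $dev"
     rm -rf "$root"
     ```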
 21. Nope, no special procedure. In fact, if the drive is mounted in any way the pre-clear script will refuse to run on it. You do need to know its proper device name, and that is it. Before you pre-clear it you just need to physically connect it to your server, but DO NOT assign it to your array. The pre-clear will allow you to burn in a drive. I'd try it for 1 cycle first, then for a few more. Before you do, make sure the "smartctl" program is functional on your version of unRAID. (A library needed for it to work is missing on the 4.4 and 4.5beta versions, but can easily be added once it is downloaded.) A 1TB drive might take between 6 and 10 hours to pre-read/clear/post-read for 1 cycle, depending on the speed of your array. The smartctl program is not needed to burn in a drive, but it is the only way to know if the burn-in turns up any issues. Joe L.
 22. Drive went back today. Replacement should be here Monday. Hopefully I will fare better with the new one. I have a few worries though. I wonder if this is typical for this new series of drives. As you can see from reviews and other user reports, there have been quite a few problems encountered. I am currently using one as my parity drive. Though its labeling indicates that it is not in the affected batch with firmware problems, I need to check it out anyway. I hope the new drive coming on Monday will test fine. Assuming that it does, I will replace the parity drive with the new one and then run your tool on the older one to verify its health.

     That sounds like a very good plan. While it is running, it uses bandwidth to and from the disks on whatever bus they are on. It basically sits waiting on the disk to respond most of the time. Other than that, because it is reading and writing so much, it uses memory from the buffer cache, making less memory available to other processes. As soon as it stops running, those blocks of memory in the buffer cache will be re-used by Linux once they become the "least-recently-used" blocks (the first, therefore, to be re-used). To restore performance, stop running the script... it is as simple as that. It uses no resources when not running. Think of it this way: a high bit-rate HD movie might have a bit rate of 35MB/s. This script is reading and writing to disks at roughly twice that speed (75MB/s or so). It really puts a strain on the disk... probably even more than a parity check (on a single disk). In fact, part of the "read" routine is to perform 5 read requests in parallel, moving the disk head randomly all over the disk. That situation is much more difficult on the disk hardware than simply playing a linear set of blocks of a movie. (But then, I am trying to determine marginal hardware issues... before you add the disk to the array.) You are welcome. Passing this test does not guarantee the disk will not crash the following week; it could happen. It is less likely to crash, at least in my mind, if it does pass. I initially wrote this routine to add drives more quickly to the array and minimize down time. (Amazing how dependent upon the server my wife has become... it is *way* more convenient to watch all the holiday movies.) If not pre-cleared, I'd be facing 4 or more hours of downtime as a disk is cleared. The ability to use the same routine to burn in a drive was a natural addition. Joe L.
 23. You are correct... the number of reallocated sectors did not change from the last pre-clear cycle. You can be sure the bad sectors are still there; the preclear script just does not show them when it displays the "differences" between the SMART report it takes at the start of the cycle and the SMART report it takes at the end of the pre-clear. In your previous test, at the beginning there were 187 reallocated sectors and 4 sectors pending re-allocation. At the end, there were 191 reallocated sectors and 0 pending re-allocation. It re-affirms what the script was designed to do: identify the un-readable sectors during its "read" phases, allowing them to be re-mapped during the writing of zeros to the drive. Although it might have found all the currently un-readable sectors, the "High Fly Writes" are still incrementing. It is my understanding they are not a sign of a healthy drive. If you want to see the full SMART reports, they are also saved in the syslog... I feel pretty good in that the script has proven its worth, both in allowing you to add a drive to your array with minimal down-time and, as in this case, in allowing you to burn in a drive before you add it to your array and get it replaced if it shows signs of an early failure. In my array I've got a very old 250Gig drive, way out of warranty, that has 100 sectors reallocated. That number has not changed since I started running it through pre-clear cycles, so I'm guessing it will last until I eventually replace it with a larger disk. I don't have the luxury of returning it for replacement. Send the drive in for replacement... It is not one you want to start out with in your array. Thanks for running it through another test cycle for me. Joe L.
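     Comparing attribute counts between the start and finish reports can be automated with awk. A sketch, using stand-in report fragments carrying the 187 -> 191 and 4 -> 0 values from this post (real input would come from smartctl -a or the saved /tmp/smart_* files):

     ```shell
     #!/bin/sh
     # Sketch: pull Reallocated_Sector_Ct and Current_Pending_Sector out of
     # two saved SMART reports and compare them. Stand-in fragments are
     # used here; real ones come from smartctl -a.
     before=$(mktemp)
     after=$(mktemp)
     cat > "$before" <<'EOF'
       5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always   -   187
     197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -   4
     EOF
     cat > "$after" <<'EOF'
       5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always   -   191
     197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -   0
     EOF

     # $2 is the attribute name, $NF the raw value (last column)
     get() { awk -v attr="$2" '$2 == attr { print $NF }' "$1"; }

     echo "reallocated: $(get "$before" Reallocated_Sector_Ct) -> $(get "$after" Reallocated_Sector_Ct)"
     echo "pending:     $(get "$before" Current_Pending_Sector) -> $(get "$after" Current_Pending_Sector)"
     rm -f "$before" "$after"
     ```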
 24. I noticed an increasing number of reallocated sectors, and still more pending re-allocation. I would RMA the drive... It is marginal at best, and a prime candidate for problems as it gets older. Since errors continue to occur, there is no way to know if they will ever stop. If you have not removed the disk yet from your array, please download and use the newer version of the preclear_disk.sh script on it. Your disk is a perfect test case for me. (You will help me, and others who follow.) Joe L.
  25. I uploaded a new version of the script, slightly modified to not abort the pre/post-read loop early on a read-failure. Please download again if you are using an earlier version. Other than that small difference, it is exactly the same. Joe L.