Joe L. Posted January 5, 2009

Quote: Here's a really dumb question: If I have a headless unRAID box, I can use Putty to connect to my server and start the preclear script. But, do I have to leave the Putty session open for 10+ hours in order to view the status? Is there any way to start preclear, disconnect from my box, connect again and view the preclear status? I bought myself 2 WD Green drives that I am going to preclear. Thanks for any info.

You can install "screen" as a supplemental package. When you invoke it and then start a command, you can disconnect and later re-connect to the running process. Otherwise, there is no other way I know of if you don't have a system console.

Both of these packages are needed. Use "installpkg package_name.tgz" to install each in turn as shown below.
http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz
and
http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz

Most of us have a "packages" directory to hold downloaded packages. Create it by typing
mkdir /boot/packages

Download the two files by either typing:
cd /boot/packages
wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz
and
cd /boot/packages
wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz

Or download them to your Windows PC by clicking on the links above, and then move them to the packages folder on your flash drive using Windows file explorer. (You will need to create the "packages" folder if it does not exist.)
\\tower\flash\packages

To install these packages, log onto the unRAID server as root and then type:
cd /boot/packages
installpkg utempter-1.1.4-i486-1.tgz
installpkg screen-4.0.3-i486-1.tgz
cd /boot

Then type
screen

Then start up the preclear_disk.sh process.
To detach, leaving the preclear_disk.sh process running, type
Control-A d

Then, 10 hours later, you can re-attach to the running process by logging in and typing
screen -r

To create another screen window for a second/third concurrent preclear, type
Control-A c

To switch between the screen windows, type
Control-A n or Control-A p
for the next or previous screen window.

A good article on "screen" can be found here: http://www.linuxjournal.com/article/6340
The manual page for screen is here: http://ss64.com/bash/screen.html

It can do a lot more. You can "name" the screen sessions, and list the open windows with
Control-A " (Control-A followed by a double-quote)

Edit: updated links to screen packages

Joe L.
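The screen workflow above could also be scripted. The following is only a minimal sketch, assuming screen is installed and preclear_disk.sh is on the PATH; the function names and the "preclear_" naming scheme are made up for illustration. It uses screen's -d -m (start detached) and -S (name the session) options. If the script asks for confirmation before clearing, you would still need to re-attach to answer it.

```shell
#!/bin/bash
# Sketch: run one preclear per disk in its own named, detached screen session.
# Assumptions: screen is installed, preclear_disk.sh is on the PATH.
# session_name and start_preclear are hypothetical helper names.

# Turn a device path such as /dev/sdb into a session name such as preclear_sdb.
session_name() {
    echo "preclear_$(basename "$1")"
}

# Start a detached session: -d -m starts it detached, -S gives it a name.
start_preclear() {
    screen -dmS "$(session_name "$1")" preclear_disk.sh "$1"
}

# Usage (left commented out so nothing touches a disk by accident):
# start_preclear /dev/sdb
# screen -ls                  # list the running sessions
# screen -r preclear_sdb      # re-attach; Control-A d detaches again
```

Naming each session after its disk makes it easier to re-attach to the right one when several preclears run at once.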
abq-pete Posted January 5, 2009

Joe,

How about an option to output to a text file and copy the text file to the flash root when completed?

Regards,
Peter
bill_in_socal Posted January 5, 2009

Thanks for that golden nugget, Joe! I can imagine that someday preclear will be integrated into unMenu. But until then, "screen" looks like a great solution. I am going to give it a shot.

Thanks for your time, and I hope you got to do your new server build this weekend. Thanks again.
bill_in_socal Posted January 5, 2009

I ran my first preclear last night. I connected a console to my unRAID server. It appeared to be running steps 1 & 2 as expected when I retired for the evening. This morning, the console displays what looks like a lot of double-spaced SMART info. I can't see it all on the screen. And I have a "ghost" entry "sdb1" in my unMenu drive listing which wasn't there before preclear completed. So, I don't know if preclear was successful or not. Is there a log of the preclear output someplace?
Joe L. Posted January 5, 2009

Quote: I ran my first preclear last night. I connected a console to my unRAID server. It appeared to run as expected steps 1 & 2 as I retired for the evening. This morning, the console displays what looks like a lot of double spaced SMART info. I can't see it all on the screen. And, I have a "ghost" entry "sdb1" in my unMenu drive listing which wasn't there before preclear completed. So, I don't know if preclear was successful or not. Is there a log of the preclear output someplace?

You should be able to scroll backwards (and forwards) on the console by using Shift-PgUp and Shift-PgDown.

Yes, if there are a lot of differences in the "smart" output, it will scroll the rest off the top of the screen.

The actual "smart" output files are in /tmp/smart_startNNNN and /tmp/smart_finishNNNN, where NNNN = the process ID of the clearing script. Type
ls -l /tmp/smart*
to see their names. You can re-create the "diff" with
diff /tmp/smart_startNNNN /tmp/smart_finishNNNN

The actual "SMART" output is also saved in your syslog. You can look in /var/log/syslog for it, or use the "syslog" viewer built into unMENU to see it there.

The "ghost" entry in unMENU is not a ghost; it is an actual partition. In fact, it was the most difficult part of the pre-clear script to get correct. It has to be exactly as if unRAID had set up the partition, skipping the first cylinder on the disk and extending for the entire remainder of the drive. The pre-clear process creates that partition on the cleared disk. It does not put a file-system on it, but the partition is there, and it would be /dev/sdb1 (for /dev/sdb).

If you are using unMENU, you can use the "Smart" view of the myMain plug-in page to see how the drive did as far as SMART goes. Most important are any re-allocated sectors, and any sectors pending re-allocation.

I recently purchased two 1.5TB drives and have been putting them through pre-clear cycles to burn them in.
Below is a screen capture of the myMain "Smart view" for two of my new drives I am burning in. One of them (sdb) initially had a bad cable, so the "reported_uncorrect" errors are not as bad as they might seem. That same drive re-allocated three sectors the first time I did a pre-clear. I've been running it again and again, and the number of reallocated sectors has not increased, so the drive is probably stable. (In any case, it has a 5-year warranty, so I'll keep an eye on it.)

It sounds like everything went as expected with your preclear. You can test it, of course, by typing
preclear_disk.sh -t /dev/sdb

Joe L.

What I find most interesting is that unless you get SMART reports on the drives, you have no idea these errors are happening... That means "some" of the MS-Windows errors we see might be a disk acting up, and not the Microsoft OS. Of course, they should give you the tools to monitor the disk's health... but they don't.

<rant> (A crashed disk/computer often leads to a NEW sale of a Microsoft OS. They really don't have a huge incentive to keep the existing OS working; besides, they give no easy way to replace the disk anyway when it starts to go bad.) </rant>
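Joe's tip about re-creating the diff from the saved report files could be wrapped in a small helper. This is only a sketch: the function name smart_diffs and its optional directory argument are assumptions added for illustration (the thread only says the files live in /tmp and end in the process ID of the clearing script).

```shell
#!/bin/bash
# Sketch: diff every smart_start/smart_finish pair left behind by preclear.
# smart_diffs and its directory argument are hypothetical; the thread only
# states that the files are /tmp/smart_startNNNN and /tmp/smart_finishNNNN.

smart_diffs() {
    local dir="${1:-/tmp}" start pid finish
    for start in "$dir"/smart_start*; do
        [ -e "$start" ] || continue          # no matches: the glob stays literal
        pid="${start##*/smart_start}"        # NNNN, the process ID suffix
        finish="$dir/smart_finish$pid"
        if [ -e "$finish" ]; then
            echo "== preclear process $pid =="
            # diff exits with status 1 when the files differ; that is not an error here
            diff "$start" "$finish" || true
        fi
    done
    return 0
}

# Usage:
# smart_diffs            # looks in /tmp
# smart_diffs | less     # page through long output instead of losing it off-screen
```

Piping the result through a pager avoids the problem of the report scrolling off the top of the console.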
bill_in_socal Posted January 5, 2009

Excellent, Joe. I didn't even think to look at the myMain SMART page. Looks like my drive is OK. I was also unaware of the console scrolling hotkeys. I'm expecting 2 more 1TB WD drives today, so preclear is going to be busy.

I really appreciate all your help. Thanks!
prostuff1 Posted January 7, 2009

Just started this script on a 1TB Seagate drive. I am going for 3 cycles and will let everyone know how long it takes (expect to hear back sometime tomorrow night, most likely).

This is a great little script that would be even better if it were included in unMenu (which I still need to get working with my BubbaRaid install).

Thanks for the work you have done, Joe!!
JDGJr Posted January 7, 2009

Starting to preclear 2 1.5TB Seagates. Brand new box, brand new unRAID user.

Similar to a previous smartctl post on this thread, my drives both show this in the report:
Device is: Not in smartctl database

Does this mean I have to configure something differently to take advantage of SMART? tia

And, Joe - thanks for this tool, looks like a real timesaver!
Joe L. Posted January 7, 2009

Quote: starting to preclear 2 1.5TB Seagates. brand new box, brand new unRAID user. similar to a previous smartctl post on this thread, my drives both show this in the report: Device is: Not in smartctl database Does this mean i have to configure something differently to take advantage of SMART? tia And, Joe - thanks for this tool, looks like a real timesaver!

There is nothing you can do until the drives get added to the next version of smartctl. It happens with lots of new drives. I took a look a few hours ago; 5.38 is the most current version of smartctl unless you want to go to their development CVS tree and compile it yourself. Fortunately, most of the SMART parameters are common between the manufacturers and drive models, so the SMART reports will still help you know if the drive is acting up.

I'll be curious to learn how quickly the drives clear on your server. On my array it took about 20 hours to do two concurrent 1.5TB drives while it was also doing a monthly parity check I had scheduled. All I can say is the PCI bus on my poor server was probably very glad when it was over.

Joe L.
JDGJr Posted January 7, 2009

Quote: I'll be curious to learn how quickly the drives clear on your server. On my array it took about 20 hours to do two concurrent 1.5TB drives while it was also doing a monthly parity check I had scheduled.

Looks like the 1st one I kicked off will complete in about 14:20. The 2nd completed in 12:35.

I think I only enabled SMART on the drives after the process started, so I'll rerun both in a bit - doesn't hurt to be sure.
prostuff1 Posted January 9, 2009

Mine got done in 26 hours and 35 minutes. That was 3 cycles on a 1TB Seagate drive.

The script worked great and it stressed the drive like I wanted. Once I get done with this I might run it on the old parity drive. I'm not sure I really want or need to, as the old parity drive was running fine.
JonathanM Posted January 9, 2009

I just kicked off three telnet preclear sessions on three new 1.5TB drives. Is the time display supposed to show the dashes?
Joe L. Posted January 9, 2009

Quote: I just kicked off three telnet preclear sessions on three new 1.5TB drives. Is the time display supposed to show the dashes?

Yes. Oops, I see what you are talking about now... It looks like the time zone might not be set on your server. What do you get when you type
date '+%s'
in another telnet window? I'll bet it does not return just a number of "seconds". It should look like this:
root@Tower:/boot# date '+%s'
1231539768

This was fixed in the most recent 4.4.2 unRAID release, and broken in 4.4 and 4.5beta. The pre-clear will still work, but the elapsed time might need to be tracked manually.

Joe L.
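The check Joe describes, confirming that date '+%s' returns a plain number of seconds before doing elapsed-time arithmetic with it, might look like the sketch below. It is not taken from preclear_disk.sh; the helper names is_epoch and format_elapsed are hypothetical.

```shell
#!/bin/bash
# Sketch: confirm `date '+%s'` printed a plain integer, then format elapsed time.
# is_epoch and format_elapsed are hypothetical names, not part of preclear_disk.sh.

# Succeeds only if the argument is a non-empty string of digits.
is_epoch() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print a number of seconds as H:MM:SS.
format_elapsed() {
    local secs=$1
    printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

start=$(date '+%s')
if is_epoch "$start"; then
    now=$(date '+%s')
    format_elapsed $((now - start))
else
    echo "date '+%s' did not return seconds: $start"
fi
```

With a check like this, a broken date command would produce an explicit message rather than a garbled time display.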
JonathanM Posted January 10, 2009

This is a fresh (as in rolled 1/2 hour before use) install of 4.4.2, with the only customizations being the download and install of the smartctl libraries, and the download of the New York timezone file. Timezone is set to custom in the configuration. The date command as you specified returned 1231549459 as of 8:05 Eastern.
Joe L. Posted January 10, 2009

Quote: This is a fresh (as in rolled 1/2 hour before use) install of 4.4.2 with the only customizations being the download and install of the smartctl libraries, and download of the new york timezone file. Timezone is set to custom in the configuration. The date command as you specified returned 1231549459 as of 8:05 eastern.

Interesting... I just loaded 4.4.2 myself the other day, but I don't think I've pre-cleared a disk since then. I'll need to give it a try.

What "telnet" client are you using? Are you using "putty" or the command built into Windows? I'm in the same time zone as you, so my server should act the same.

Joe L.
JonathanM Posted January 10, 2009

Quote: What "telnet" client are you using? Are you using "putty" or the command built into windows? I'm in the same time-zone as you, so my server should act the same.

This was the stock w2k command line telnet. I normally use putty, but this server is not on my home LAN.
JonathanM Posted January 10, 2009

The first disk didn't complete successfully. What logs and/or other info do I need to look at to figure out why?
Joe L. Posted January 10, 2009

Quote: The first disk didn't complete successfully. What logs and or other info do I need to look at to figure out why?

Type:
fdisk -l /dev/sdb
dd if=/dev/sdb count=1 | od -x -A d

Post the output of both commands. It should be interesting to see what happened.

The "dd" output should look like this for a 1.5TB drive (assuming your geometry is the same as my 1.5TB drive):
root@Tower:/boot# dd if=/dev/sdb count=1 | od -x -A d
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00120228 s, 426 kB/s
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000448 0000 0000 0000 003f 0000 7af1 aea8 0000
0000464 0000 0000 0000 0000 0000 0000 0000 0000
*
0000496 0000 0000 0000 0000 0000 0000 0000 aa55
0000512

The fdisk output should look something like this:
root@Tower:/boot# fdisk -l /dev/sdb

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      182402  1465138552+   0  Empty
Partition 1 does not end on cylinder boundary.

Joe L.
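The od dump above can be read as a standard MBR partition table entry. The sketch below is an interpretation under stated assumptions, not part of the preclear script. It assumes the classic MBR layout (first partition entry at byte 446, starting sector at bytes 454-457, sector count at bytes 458-461) and a little-endian machine, where od -x shows each 32-bit value as its low 16-bit word followed by its high word. The helper name words_to_u32 is made up.

```shell
#!/bin/bash
# Sketch: decode two 32-bit fields of the first partition entry from the od -x dump.
# Assumptions: classic MBR layout, little-endian host, 512-byte sectors.
# words_to_u32 is a hypothetical helper name.

words_to_u32() {           # usage: words_to_u32 LOW_WORD HIGH_WORD
    echo $(( 0x$2$1 ))
}

start=$(words_to_u32 003f 0000)      # bytes 454-457: first sector of partition 1
size=$(words_to_u32 7af1 aea8)       # bytes 458-461: number of sectors in partition 1
total=$(( 1500301910016 / 512 ))     # disk size in bytes as reported by fdisk

echo "start=$start size=$size total=$total"
[ $(( start + size )) -eq "$total" ] && echo "partition runs to the end of the disk"
```

Read this way, the dump describes a partition that starts at sector 63 and extends to the last sector of the drive, and the aa55 word at offset 510 is the usual MBR signature.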
barbapapa Posted January 12, 2009

Hi, thanks for a great script! I used it on a 1 TB WD10EADS yesterday and it seemed to get through it OK. It came up with one mildly worrisome error:
UDMA_CRC_Error_Count : 1

Is that something worth worrying about?

Jan 12 00:54:05 Tower preclear_disk-finish[1004]: SMART Attributes Data Structure revision number: 16
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: Vendor Specific SMART Attributes with Thresholds:
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 3 Spin_Up_Time 0x0027 170 169 021 Pre-fail Always - 6483
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 25
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 12
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 23
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 20
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 25
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 1
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

It took about 11.5 hours to get through one cycle. I'm hoping that since it's just one UDMA CRC error, I should be OK. I'm going to do another WD10EADS shortly.

Next question: I'd like to stress-test a drive that is already part of my array (2 data drives, no parity drive), but doesn't have any data on it yet. I thought about using the preclear utility on it, but if I remove it from the array, then I can't restart the array - I get the "Too many wrong/missing disks" error. Can I just use the restore function? Is there a better way to stress-test a drive that is already part of the array? Actually, I wouldn't mind testing the drive that DOES have data on it as well.
RobJ Posted January 12, 2009

Quote: UDMA_CRC_Error_Count : 1 Is that something worth worrying about?

No, unless it continues to increase. If it rises further, then you may want to replace its SATA cable with a better one.
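To watch whether a counter such as UDMA_CRC_Error_Count keeps rising, the raw value can be pulled out of two reports and compared. The following is a rough sketch: the function name smart_raw is hypothetical, and it assumes the raw value is the last column of the attribute's line, as in the report posted above.

```shell
#!/bin/bash
# Sketch: print the RAW_VALUE (last column) of one SMART attribute from a report.
# smart_raw is a hypothetical helper. It searches every field for the attribute
# name, so it works on plain attribute lines and on the syslog-prefixed lines
# shown in the post above.

smart_raw() {              # usage: smart_raw ATTRIBUTE_NAME < report
    awk -v name="$1" '{ for (i = 1; i <= NF; i++) if ($i == name) { print $NF; exit } }'
}

# Usage, with NNNN standing for the process ID of the clearing script:
# before=$(smart_raw UDMA_CRC_Error_Count < /tmp/smart_startNNNN)
# after=$(smart_raw UDMA_CRC_Error_Count < /tmp/smart_finishNNNN)
# [ "$after" -gt "$before" ] && echo "CRC errors rose from $before to $after: check the cable"
```

The same helper could be pointed at Reallocated_Sector_Ct or Current_Pending_Sector, the attributes Joe singles out as most important.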
JonathanM Posted January 12, 2009

Quote: Type: fdisk -l /dev/sdb dd if=/dev/sdb count=1 | od -x -A d Post the output of both commands.

Tower login: root
Linux 2.6.27.7-unRAID.
root@Tower:~# fdisk -l /dev/sdb

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      182402  1465138552+   0  Empty
Partition 1 does not end on cylinder boundary.
root@Tower:~# dd if=/dev/sdb count=1 | od -x -A d
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000298241 s, 1.7 MB/s
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000448 0000 0000 0000 003f 0000 7af1 aea8 0000
0000464 0000 0000 0000 0000 0000 0000 0000 0000
*
0000496 0000 0000 0000 0000 0000 0000 0000 aa55
0000512
root@Tower:~#
Joe L. Posted January 12, 2009

It sure looks to me as if the geometry is identical, and the "od" output looks the same as mine for the 1.5TB disk too.

What do you get if you type:
preclear_disk.sh -t /dev/sdb

I'll be shocked if it does not indicate the clearing worked as it was supposed to. I'm a bit at a loss to figure out what is happening... You also had odd numbers with your elapsed time calculation. It is almost as if the "shell" was having memory problems.

Were you doing anything else to the same disk while the preclear was running? Did you reset the time zone and/or time when pre-clear was in progress? Could you have had a second preclear_disk.sh running on the same disk at the same time?

If the output of
preclear_disk.sh -t /dev/sdb
indicates it is cleared, then you might want to look at your syslog for any indications of disk read errors. Something made it think the data was different the last time it looked.

Joe L.
JonathanM Posted January 12, 2009

Quote: What do you get if you type: preclear_disk.sh -t /dev/sdb

root@Tower:/boot/scripts# preclear_disk.sh -t /dev/sdb
Pre-Clear unRAID Disk
########################################################################
Device Model: ST31500341AS
Serial Number: 9VS0HE2T
Firmware Version: CC1H
User Capacity: 1,500,301,910,016 bytes
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  2930277167  1465138552+   0  Empty
Partition 1 does not end on cylinder boundary.
########################################################################
============================================================================
==
== DISK /dev/sdb IS PRECLEARED
==
============================================================================
root@Tower:/boot/scripts#

All I did was boot the server, install the smartctl libraries, install the preclear script, and kick it off in three different telnet windows on the three different drives. 2 completed successfully, 1 didn't.
Joe L. Posted January 12, 2009

Quote: All I did was boot the server, install the smartctl libraries, install the preclear script, and kick it off in three different telnet windows on the three different drives. 2 completed successfully, 1 didn't.

I would do a thorough memory test then, and/or replace the cable to the disk with another, as there would be no reason why reading a disk one day would give a different result than reading it the next.

In any case, you will want to run it through another pre-clear cycle, just to make sure it is working well before you add it to the array. That is one of the major reasons you are burning in the drives... to detect errors that are much harder to deal with once you start using the disks for data.

Joe L.
JonathanM Posted January 12, 2009

I'm kicking off another set of 3 preclears on all 3 disks. We'll see in a couple of days.