PeteAron

Members
  • Posts: 265

Everything posted by PeteAron

  1. So, if you were in my position, what would you do? I have about 300 GB left to copy, but it is moving at less than 50 KB/sec, and the error count keeps growing. Would you stop the copy where it is, replace the failing disk with a new one and rebuild it, then do the copy again? I think if I continue as is it will take another day or so at this speed. I'm not sure what the best course of action is.
  2. @Frank: yes, that is what I thought was happening. @trurl: the disk in "Dashboard" has a yellow triangle next to it, and the pop-up shows 3xxxx reallocated sectors. In "Main" this disk shows 46xxx errors. No other disks show errors. The failing disk does not have a red X next to it.
  3. When a read error occurs, am I writing the bad data from the disk with the read error, or the good data reconstructed from parity? If the latter, I should be OK. Is parity being updated when I get a read error and write the data to the new disk? If so, then I'm already screwed, aren't I?
  4. trurl, I used the command rsync -avPX /mnt/disk10/ /mnt/disk11/ and it looks like the array is copying to the XFS swap disk using parity protection. It is nearly complete: 2.54 of 2.73 TB have been copied, and I am up to 43,000 errors on the failing disk. I have a new disk on the way; after this copy is complete, I will probably use the procedure to remove the failing disk.
  5. I have a failing disk: it shows over 1000 reallocated sectors. So instead of doing the usual rebuild by replacing the disk, I was trying this method to copy the files to the new XFS disk. I had just completed a parity check with no errors, so I thought I would be OK. I began the disk copy last night following the published procedure. About two-thirds of the way through, the copy rate dropped to about 5-30 KB/sec, and I am seeing 36,000 or so errors on the unRAID display during this copy. Should I abort the copy, remove the failing disk, rebuild it onto a new replacement, and only after that work on the XFS conversion? Or should I just wait for the copy to complete? If I need to abort, how do I do that? I am running the copy as a command directly at the console.
  6. Can I ask what the driver is for this preference? Power drain and heat generation. I only need one 6 TB parity drive. I currently have a "full" array with 15 drives total. My case holds 15 drives and I have 16 SATA ports. I have room for another card to bring me up to 24, but there's a significant expense involved in moving from a 16- to a 24-drive array. In my case, I would need the new SATA card, a larger power supply (OK, maybe not; I have a 650 W unit), and a larger case, so I would need to throw away about half of my non-drive server investment. So I am thinking ahead, something I didn't do last year when I bought two $90 3 TB 7200 rpm drives; these will last another 5 years or so. I need to be replacing older drives with big, cool, slow 6 TB drives and keep my server the way it is, for the next few years anyway.
  7. I am on the fence. I was just about to buy a couple of these, but they are 7200 rpm; I'd prefer 5900 rpm for my non-parity drives.
  8. That's just it: I don't know where the SMART log is. I ran the test from the web interface and it simply said the test is complete, with no report. Maybe I didn't do this correctly?
  9. Thank you, WeeboTech. I ran the long test, but I have no idea what the output is. Is there a report somewhere? Thanks, kf
  10. Hi all, I have a quick question. Looking at the SMART report, this drive is throwing a couple of errors, and I'm not sure when to pull the plug. This is just my cache drive, so little data is at risk, except when the mover is taking data to my array. Can you please have a look at this SMART report and give me your opinion: what should I watch for as time goes on, and when is it time to literally throw this in the trash? Thanks!

      smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
      Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

      === START OF INFORMATION SECTION ===
      Model Family:     Hitachi Deskstar 7K1000.C
      Device Model:     Hitachi HDS721010CLA332
      Serial Number:    JP2930HQ1N9AVH
      LU WWN Device Id: 5 000cca 35dd75206
      Firmware Version: JP4OA39C
      User Capacity:    1,000,204,886,016 bytes [1.00 TB]
      Sector Size:      512 bytes logical/physical
      Rotation Rate:    7200 rpm
      Device is:        In smartctl database [for details use: -P show]
      ATA Version is:   ATA8-ACS T13/1699-D revision 4
      SATA Version is:  SATA 2.6, 3.0 Gb/s
      Local Time is:    Sun Feb 15 12:03:06 2015 EST
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled

      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED

      General SMART Values:
      Offline data collection status: (0x82) Offline data collection activity was completed without error.
              Auto Offline Data Collection: Enabled.
      Self-test execution status: (   0) The previous self-test routine completed without error or no self-test has ever been run.
      Total time to complete Offline data collection: ( 9812) seconds.
      Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
              Auto Offline data collection on/off support.
              Suspend Offline collection upon new command.
              Offline surface scan supported.
              Self-test supported.
              No Conveyance Self-test supported.
              Selective Self-test supported.
      SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
              Supports SMART auto save timer.
      Error logging capability: (0x01) Error logging supported.
              General Purpose Logging supported.
      Short self-test routine recommended polling time: (   1) minutes.
      Extended self-test routine recommended polling time: ( 164) minutes.
      SCT capabilities: (0x003d) SCT Status supported.
              SCT Error Recovery Control supported.
              SCT Feature Control supported.
              SCT Data Table supported.

      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       65537
        2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       93
        3 Spin_Up_Time            0x0007   116   116   024    Pre-fail  Always       -       325 (Average 325)
        4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1652
        5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
        7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
        8 Seek_Time_Performance   0x0005   130   130   020    Pre-fail  Offline      -       35
        9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       30960
       10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
       12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       509
      192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1720
      193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1720
      194 Temperature_Celsius     0x0002   240   240   000    Old_age   Always       -       25 (Min/Max 11/42)
      196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
      197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
      198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
      199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

      SMART Error Log Version: 0
      No Errors Logged

      SMART Self-test log structure revision number 1
      No self-tests have been logged.  [To run self-tests, use: smartctl -t]

      SMART Selective self-test log data structure revision number 1
       SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
          1        0        0  Not_testing
          2        0        0  Not_testing
          3        0        0  Not_testing
          4        0        0  Not_testing
          5        0        0  Not_testing
      Selective self-test flags (0x0):
        After scanning selected spans, do NOT read-scan remainder of disk.
      If Selective self-test is pending on power-up, resume after 0 minute delay.
  11. Thank you, Frank. I ran the parity check and it is fine. Darryl
  12. Hi everyone, it seems like a stupid question: can I run a parity check while I am pre-clearing my new 4 TB drive? Thanks, -KF
  13. Thanks, Gary and dgaschk. I will check those out. I use SyncBack SE running on a Windows box. I've got a scheduled task that runs a small script that (a) sends a WOL command to the backup server; (b) sends WOL commands to the two unRAID servers (which should already be on; this is just in case); (c) waits 3 minutes (longer than needed); and then (d) runs SyncBack with a "Backup UnRAID Servers" profile that automatically backs up the servers. I've got the task set to run once a month, but could clearly run it at whatever interval I wanted. I think once a month is plenty, as the backup server is actually a second complete backup of everything: I also back up all data to a dedicated backup disk, which I swap out and store securely whenever it fills up.
  14. Gary, could you please tell me what software you use to back up your servers to your BUS? Thanks!
  15. I apologize for being thick. I think you asked me to do the following, and that's what I did. No dice:
          chmod +x dtest.sh
          detest.sh 1565565768 40
      Should I run this from a different directory? I'm running it from /boot. It tells me "command not found".
      Quoted from earlier replies: "Here you go." / "I see where you changed permissions to make the script executable, but I don't see where you actually ran it." / "Yes. Next enter "detest.sh 1565565768 40" to run the test. chmod is only needed once."
  16. I was following the advice of some here in running a non-correcting parity check, avoiding the blanket assumption that the data drives are always correct. At least this way you have the opportunity to rebuild the data in the event that parity is actually correct. I never found a method for reaching that conclusion, though, so I usually run a second check to confirm and then a third to rebuild, not that this has happened often.
  17. I don't see a /hashes directory in /var/log...
  18. dgaschk, I ran the script from the /boot folder. It ran, but nothing happened. Should there be output to the screen? There is nothing written to the syslog, either. Thanks,
  19. One more thing. In the syslog you will see that this last parity check clearly says "NOCORRECT"; however, the main display says parity was checked two days ago and 5 corrections were written to the parity disk. I'm running another check now. I will do the script testing if someone can help me with that.
  20. I think I should try the scripting suggestion listed in the FAQ, but I am not comfortable just copying what is there, since I don't really know what I am doing and this is a pretty full server. Can someone walk me through this? What do I do: create a text file like this for each sdX?
          #!/bin/bash
          LOG_DIR=/var/log/hashes
          cd $LOG_DIR
          for i in {1..5}
          do
              echo "Begin sdc for the $i time."
              dd if=/dev/sdc skip=156556570 count=200 | md5sum -b >> sdc.log
          done
          exit
      How do I execute it?
  21. Here is a zip file with the SMART reports and my syslog. Thanks for your help. I am looking through the FAQ. Attachments: Smart_Reports_1229.zip, syslog-2013-12-29.txt
  22. OK, I have just run another routine monthly parity check, and I see the exact same five errors. The array has been rebooted twice since my last post. Now I'm worried. How do I find out where these errors are? Maybe I have a bad disk. Any thoughts?
      Dec 27 13:46:18 Repository kernel: NTFS driver 2.1.30 [Flags: R/W MODULE]. (System)
      Dec 27 13:46:38 Repository unmenu-status: Exiting unmenu web-server, exit status code = 141
      Dec 27 13:46:38 Repository unmenu-status: Starting unmenu web-server
      Dec 27 14:47:46 Repository kernel: mdcmd (805): spindown 6 (Routine)
      Dec 27 15:04:17 Repository kernel: mdcmd (806): spindown 1 (Routine)
      Dec 27 15:04:57 Repository kernel: mdcmd (807): spindown 2 (Routine)
      Dec 27 15:04:58 Repository kernel: mdcmd (808): spindown 5 (Routine)
      Dec 27 22:19:59 Repository kernel: mdcmd (809): check NOCORRECT (unRAID engine)
      Dec 27 22:19:59 Repository kernel: md: recovery thread woken up ... (unRAID engine)
      Dec 27 22:19:59 Repository kernel: md: recovery thread checking parity... (unRAID engine)
      Dec 27 22:19:59 Repository kernel: md: using 8000k window, over a total of 2930266532 blocks. (unRAID engine)
      Dec 28 00:29:47 Repository kernel: md: parity incorrect, sector=1565565768 (Errors)
      Dec 28 00:29:47 Repository kernel: md: parity incorrect, sector=1565565776 (Errors)
      Dec 28 00:29:47 Repository kernel: md: parity incorrect, sector=1565565784 (Errors)
      Dec 28 00:29:47 Repository kernel: md: parity incorrect, sector=1565565792 (Errors)
      Dec 28 00:29:47 Repository kernel: md: parity incorrect, sector=1565565800 (Errors)
      Dec 28 05:45:06 Repository kernel: mdcmd (810): spindown 1 (Routine)
      Dec 28 05:45:07 Repository kernel: mdcmd (811): spindown 3 (Routine)
      Dec 28 05:45:07 Repository kernel: mdcmd (812): spindown 6 (Routine)
      Dec 28 05:45:08 Repository kernel: mdcmd (813): spindown 7 (Routine)
      Dec 28 05:45:09 Repository kernel: mdcmd (814): spindown 8 (Routine)
      Dec 28 07:03:29 Repository kernel: md: sync done. time=31410sec (unRAID engine)
      Dec 28 07:03:29 Repository kernel: md: recovery thread sync completion status: 0 (unRAID engine)
  23. Thanks for the info, hexen. I didn't realize that. Why would the Seagate 2 TB drives not show the same behavior? Thanks for the link, Dale. I do need to update my firmware. It sounds like I need to shut down the array and hook the drives up to a Windows computer to do this?
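Post 1 estimates "another day or so" for the remaining 300 GB at under 50 KB per second. A quick way to sanity-check that kind of estimate is the arithmetic below, written as a small shell function. The name copy_eta_days is hypothetical, and the sketch assumes the post's "kb/sec" means kilobytes per second, decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), and a constant rate.

```shell
#!/bin/sh
# Hypothetical helper: whole days needed to copy REMAINING_GB at a steady RATE_KB_PER_S.
# Assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes) and a constant rate.
copy_eta_days() {
  local remaining_gb="$1" rate_kb_per_s="$2" seconds
  seconds=$(( remaining_gb * 1000000000 / (rate_kb_per_s * 1000) ))
  echo $(( seconds / 86400 ))   # 86400 seconds per day, rounded down
}
```

Under those assumptions, copy_eta_days 300 50 prints 69 and copy_eta_days 300 30 prints 115, so finishing 300 GB in about a day would need the rate to recover to roughly 3.5 MB/s (300 GB / 86,400 s).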
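Post 3 asks whether data served "from parity" is good data. Assuming unRAID's single parity is a plain XOR across the data disks at each offset (which is how single parity is usually described), a block that cannot be read from one disk can be recomputed from parity plus every other data disk, provided those are all correct. The toy sketch below models only that arithmetic, one byte per disk, with hypothetical function names; it does not model unRAID's error handling.

```shell
#!/bin/sh
# Toy model of single parity, one byte per disk at the same offset.
# Assumption: the parity disk holds the XOR of every data disk at each offset.

# XOR together all arguments (decimal or 0x-prefixed hex) and print the result in decimal.
xor_all() {
  local acc=0 b
  for b in "$@"; do
    acc=$(( acc ^ b ))
  done
  echo "$acc"
}

# Parity of a stripe is the XOR of its data bytes.
parity_of() { xor_all "$@"; }

# A missing byte is rebuilt by XORing parity with all of the *other* data bytes.
reconstruct() { xor_all "$@"; }
```

For example, parity_of 0x3C 0xA5 0x0F prints 150, and reconstruct 150 0x3C 0x0F prints 165 (0xA5), the byte from the "unreadable" disk. Because XOR is its own inverse, the same operation both computes parity and rebuilds a disk, which also means a rebuilt value is only as good as parity and the remaining disks.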
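Once the rsync copy in post 4 finishes, one way to confirm the XFS disk holds the same data is to compare a checksum of every file on both sides. A minimal sketch with hypothetical function names, assuming you pass the two mount points (for example /mnt/disk10 and /mnt/disk11 from post 4). It re-reads every file on both disks, which for 2.7 TB is slow and puts more load on the failing drive.

```shell
#!/bin/sh
# Hypothetical helpers: compare two directory trees by MD5 checksum of every regular file.

# Print "<md5>  <relative path>" for each regular file under $1, sorted by path.
tree_checksums() {
  ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 )
}

# Print "match" (status 0) if both trees hold the same files with identical contents,
# otherwise print "MISMATCH" (status 1). Status 2 if either argument is not a directory.
verify_copy() {
  if [ ! -d "$1" ] || [ ! -d "$2" ]; then
    echo "usage: verify_copy SRC_DIR DST_DIR" >&2
    return 2
  fi
  if [ "$(tree_checksums "$1")" = "$(tree_checksums "$2")" ]; then
    echo "match"
  else
    echo "MISMATCH"
    return 1
  fi
}
```

Usage: verify_copy /mnt/disk10 /mnt/disk11. For reference, in rsync -avPX, -a is archive mode, -v is verbose, -P is shorthand for --partial --progress, and -X preserves extended attributes; none of those compare file contents after the copy, which is the gap this check fills.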
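In posts 8 and 9 the web interface only reported that the long test finished. smartctl keeps the result in the drive's self-test log, which smartctl -l selftest /dev/sdX prints (the report in post 10 shows the matching hint, smartctl -t, for starting a test). A small sketch with a hypothetical function name that checks whether the newest entry in that log passed:

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -l selftest /dev/sdX` output on stdin and
# succeed only if the most recent entry ("# 1") completed without error.
# A log that says "No self-tests have been logged." has no "# 1" line, so it fails.
selftest_passed() {
  grep -m 1 '^# *1 ' | grep -q 'Completed without error'
}
```

Usage: smartctl -l selftest /dev/sdc | selftest_passed && echo "last self-test OK". Reading the full log by eye is still worthwhile, since a failed entry also records the LBA of the first error.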
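For the question in post 10 about what to watch over time: the attributes most often tracked on a drive that may be starting to fail are the reallocated, pending, and offline-uncorrectable sector counts, while UDMA CRC errors usually point at cabling rather than the disk itself. A minimal sketch (hypothetical function name) that pulls just those raw values out of smartctl -A output so they can be logged and compared from month to month:

```shell
#!/bin/sh
# Hypothetical helper: pick commonly watched attributes out of a `smartctl -A /dev/sdX`
# attribute table on stdin and print "<name> <raw value>".
# In that table, column 2 is ATTRIBUTE_NAME and column 10 is RAW_VALUE.
smart_watch() {
  awk '$2 == "Reallocated_Sector_Ct"   ||
       $2 == "Reallocated_Event_Count" ||
       $2 == "Current_Pending_Sector"  ||
       $2 == "Offline_Uncorrectable"   ||
       $2 == "UDMA_CRC_Error_Count"    { print $2, $10 }'
}
```

Usage: smartctl -A /dev/sdX | smart_watch. For the report in post 10 this gives Reallocated_Sector_Ct 1, Reallocated_Event_Count 1, Current_Pending_Sector 0, Offline_Uncorrectable 0, and UDMA_CRC_Error_Count 0; a steady climb in these between runs generally says more than any single reading.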
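The script in post 13 starts by sending Wake-on-LAN commands. A WOL "magic packet" is 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast (port 9 is a common choice). The sketch below only builds that payload as a hex string (the function name is hypothetical); sending it is left to whatever WOL tool the script already uses.

```shell
#!/bin/sh
# Hypothetical helper: print a Wake-on-LAN magic-packet payload as hex.
# Payload = 6 bytes of 0xFF, then the target MAC address repeated 16 times (102 bytes).
wol_payload_hex() {
  local mac payload i
  # Accept AA:BB:CC:DD:EE:FF or aa-bb-cc-dd-ee-ff; normalize to 12 lowercase hex digits.
  mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-Z' 'a-z')
  case $mac in
    *[!0-9a-f]*) echo "invalid MAC: $1" >&2; return 1 ;;
  esac
  if [ "${#mac}" -ne 12 ]; then
    echo "invalid MAC: $1" >&2
    return 1
  fi
  payload=ffffffffffff
  i=0
  while [ "$i" -lt 16 ]; do
    payload=$payload$mac
    i=$((i + 1))
  done
  printf '%s\n' "$payload"
}
```

The result is 204 hex digits (102 bytes); the 3-minute wait in step (c) of post 13 then simply gives the woken machines time to boot before SyncBack connects.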
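The hashing script quoted in post 20 hard-codes the disk and sector range and changes into /var/log/hashes, which post 17 found does not exist. A hypothetical, parameterized variant is below: it creates the log directory, takes the device, start sector, and sector count as arguments, and prints how many distinct hashes it saw. It uses the sector numbers exactly as given; whether a sector reported by the md driver needs a partition offset before reading the raw /dev/sdX device is not settled by this sketch. On post 15's "command not found": /boot is normally not on the PATH, so a script there has to be run as ./name.sh or by its full path, and note that chmod +x was applied to dtest.sh while the command run was detest.sh.

```shell
#!/bin/sh
# Hypothetical, parameterized take on the hashing script quoted in post 20.
# Reads the same sector range RUNS times, logs an MD5 per read, and prints how many
# distinct hashes were seen (1 means every read returned identical data).
# Usage: hash_sectors DEVICE START_SECTOR SECTOR_COUNT [RUNS] [LOG_DIR]
hash_sectors() {
  local dev="$1" start="$2" count="$3" runs="${4:-5}" log_dir="${5:-/var/log/hashes}"
  local log i=1
  log="$log_dir/$(basename "$dev")-$start.log"
  mkdir -p "$log_dir" || return 1   # the directory may not exist yet (see post 17)
  : > "$log"                        # start a fresh log for this range
  while [ "$i" -le "$runs" ]; do
    echo "Begin $dev read $i" >&2   # progress goes to stderr, not the log
    # dd's default block size is 512 bytes, so skip/count are in 512-byte sectors.
    dd if="$dev" skip="$start" count="$count" 2>/dev/null | md5sum -b >> "$log"
    i=$((i + 1))
  done
  sort -u "$log" | wc -l
}
```

Usage, with the device from post 20 and the range from post 15: hash_sectors /dev/sdc 1565565768 40. A result of 1 means five reads of that range agreed; a larger number means the disk returned different data on different reads. The script only reads from the device.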
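Post 22 asks how to find where the parity errors are. The syslog already names them in its "md: parity incorrect, sector=N" lines. A minimal sketch with hypothetical function names that extracts those sectors and the range they cover; it assumes each report covers 8 sectors (4 KiB), which matches the spacing in the posted log.

```shell
#!/bin/sh
# Hypothetical helpers for a syslog like the one in post 22.

# Print each sector number from "md: parity incorrect, sector=N" lines on stdin.
parity_error_sectors() {
  sed -n 's/.*md: parity incorrect, sector=\([0-9][0-9]*\).*/\1/p'
}

# Print "<first sector> <sector count>" spanning all reported errors (nothing if none).
# Assumption: each report covers 8 sectors, matching the spacing seen in the log.
parity_error_span() {
  parity_error_sectors | sort -n |
    awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print first, last + 8 - first }'
}
```

Usage: parity_error_span < /var/log/syslog. For the log in post 22 this prints 1565565768 40, the same start sector and count used as arguments in post 15. Since the same five sectors came back after two reboots, the range is stable enough to test disk by disk.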