spasszeit

Members
  • Posts: 62

Everything posted by spasszeit

  1. I have been having problems with 4.7 that I did not have before. I run two servers, and ever since upgrading to 4.7 a drive occasionally drops off the array. The first time it happened I thought the drive was bad. Re-seating the cables did not help: during system POST I saw that one port was not being detected, and when the array started the disk was missing. I replaced the drive with a new one and the array rebuilt itself just fine. I put the 'bad' drive into the other tower and ran pre-clear on it: no SMART errors. Last night I had the same problem on the other server. This time the symptom was that I could not access the web interface, though access to shares via Windows and via Telnet worked fine. I captured the syslog - see attached. There are an awful lot of messages related to, I think, my SM AOC-SASLP-MV8 and a disk. I run two AOC-SASLP-MV8 cards in each server. Now, a friend of mine built a server recently as well. He is running one AOC-SASLP-MV8 card, and he mentioned to me that ever since upgrading to 4.7 he has also experienced a problem like mine. I hope Tom can look into this and advise on what's going on here and how we can address it. Syslog_Apr_14.zip
  2. Yes, I did two builds this year based on X8SIL-F motherboards and ordered most of the parts from them. No complaints. Very quick shipping. Packaged well.
  3. Also noticed that you could save substantially on the Supermicro card and 3Ware cables by buying from Provantage... at least $45 in aggregate. Did not check your other hardware.
  4. I have a very strong suspicion that, like you said before, it may be PSU related. The disks are dropping like flies. Something in the hardware must be causing this epidemic of failures. I did not mention this before, but during one of the reboots the other night another two drives went missing. I powered down the server, re-seated the power and SATA cables and powered it up. The drives, thank God, came back up. It finally dawned on me that I should not take any more chances, so I stopped the rebuild and powered down the server. I don't rule out other hardware problems either, especially since I know I've had memory/mobo issues before. So, I decided to rebuild the entire machine. It's been in my plans anyway. The new hardware is on the way, so I should have the new build next week.

    As of right now, my plan is as follows:

    1. Take out disks 1 and 9 and copy the files off them to my second unRaid machine. I am looking for a way to do it via some kind of GUI, as my command line skills are lacking and I don't trust them 100%. Your suggestion to copy files to a Windows machine sounds like something I'd like to try, but I am not sure I want to do that as I have two empty 2TB drives in my second unRaid.
    2. Once the new machine is ready I will put disks 1 and 9 back in and try to rebuild disk 7, or maybe I can extract files from the 'virtual' disk 7. Yes, I still have the physical disk 7, but it is bricked.
    3. If step 2 fails, I will back up the files from another 1TB disk that is the same model with the same firmware, and swap its controller board onto the dead drive. Perhaps I can revive it.
    4. If that fails as well... well, I hope I can somehow figure out what was on it and get that content back. The good thing is that all of my critical files are backed up on the second server.

    Question: suppose I can copy the files off disks 1 and 9. Is there an efficient way to tell whether all the files are there and whether any are corrupted? Or should I just compare the 'used' space in the original config with the one on the new drive, and then go check each individual file manually?

    Thanks for your support, bjp999.
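On the question of checking a copy: one common approach is to hash every file on both sides and compare the results, rather than comparing 'used' space. The following is a minimal sketch only, assuming both the original disk and the copy are reachable as directories from one machine; the function names and the example paths in the usage note are hypothetical, not part of unRaid.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(source, copy):
    """Return (missing, extra, changed) lists of relative file paths."""
    src = tree_digests(source)
    dst = tree_digests(copy)
    missing = sorted(set(src) - set(dst))   # on the source, not in the copy
    extra = sorted(set(dst) - set(src))     # in the copy only
    changed = sorted(name for name in src.keys() & dst.keys()
                     if src[name] != dst[name])
    return missing, extra, changed


if __name__ == "__main__":
    # Hypothetical usage: python compare.py /mnt/disk1 /path/to/copy
    if len(sys.argv) == 3:
        missing, extra, changed = compare_trees(sys.argv[1], sys.argv[2])
        print("missing:", missing)
        print("extra:", extra)
        print("changed:", changed)
```

This only shows that the copy matches what could be read from the source at the time. It cannot tell whether a file was already damaged on the source disk, and a file that fails to read on a failing drive will raise an error here rather than be reported as changed.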
  5. Would it maybe make sense to stop the rebuild, upgrade the hardware and then simply copy the files off Disk 1 and Disk 7 to another server? At this point I don't care how clumsy or time-consuming the process will be; the only thing I want is to not lose any, or much, of my data. I desperately need some expert advice.
  6. Yes, last night I was able to access Disk 7. Though it was extremely slow. After clicking on the drive it took several minutes before the directory under the drive appeared.
  7. The errors next to disk 1 are increasing. Initially the count went to 27000 and then stopped. Later, when I came home, the count was 47K. It was about the same this morning, and then it increased. Currently the count is at 49,600. So, I would say it is sporadic. Here is the SMART test report on Disk 1:

    Statistics for /dev/sdl 00R_WD-WCAVY0252674
    smartctl -a -d ata /dev/sdl
    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF INFORMATION SECTION ===
    Device Model:     WDC WD20EADS-00R6B0
    Firmware Version: 01.00A01
    User Capacity:    2,000,398,934,016 bytes
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   8
    ATA Standard is:  Exact ATA specification draft version not indicated
    Local Time is:    Wed Feb 9 09:57:47 2011 EST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status: (0x85) Offline data collection activity was aborted by an interrupting command from host. Auto Offline Data Collection: Enabled.
    Self-test execution status: ( 41) The self-test routine was interrupted by the host with a hard or soft reset.
    Total time to complete Offline data collection: (40800) seconds.
    Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
    SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
    Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
    Short self-test routine recommended polling time: ( 2) minutes.
    Extended self-test routine recommended polling time: ( 255) minutes.
    Conveyance self-test routine recommended polling time: ( 5) minutes.
    SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f 199   199   051    Pre-fail Always  -           204446
      3 Spin_Up_Time            0x0027 149   148   021    Pre-fail Always  -           9541
      4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           75
      5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
      7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
      9 Power_On_Hours          0x0032 084   084   000    Old_age  Always  -           12088
     10 Spin_Retry_Count        0x0032 100   253   000    Old_age  Always  -           0
     11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
     12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           53
    192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           7
    193 Load_Cycle_Count        0x0032 200   200   000    Old_age  Always  -           67
    194 Temperature_Celsius     0x0022 127   114   000    Old_age  Always  -           25
    196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
    197 Current_Pending_Sector  0x0032 196   196   000    Old_age  Always  -           1374
    198 Offline_Uncorrectable   0x0030 199   196   000    Old_age  Offline -           394
    199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
    200 Multi_Zone_Error_Rate   0x0008 025   001   000    Old_age  Offline -           35182

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    Num  Test_Description  Status                    Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline     Interrupted (host reset)  90%        12088            -
    # 2  Short offline     Interrupted (host reset)  90%        12085            -
    # 3  Short offline     Completed: read failure   10%        12023            461689921
    # 4  Short offline     Completed: read failure   10%        12023            551849266
    # 5  Short offline     Completed without error   00%        6976             -
    # 6  Short offline     Completed without error   00%        5520             -
    # 7  Short offline     Completed without error   00%        4787             -
    # 8  Short offline     Completed without error   00%        4761             -
    # 9  Short offline     Completed without error   00%        4711             -
    #10  Short offline     Completed without error   00%        4710             -

    SMART Selective self-test log data structure revision number 1
    SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
       1        0        0  Not_testing
       2        0        0  Not_testing
       3        0        0  Not_testing
       4        0        0  Not_testing
       5        0        0  Not_testing
    Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
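For anyone skimming a report like this one, the raw values of a few attributes can be pulled out programmatically. The sketch below is illustrative only: it assumes one attribute row per line in the ten-column layout that smartctl prints, and the choice of which attributes to watch is a common rule of thumb rather than anything stated in the report. The sample rows are copied from the report above.

```python
# Attributes whose non-zero raw value is often treated as a warning sign.
# This watch list is an assumption for illustration, not a smartctl feature.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")

SAMPLE_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 196 196 000 Old_age  Always  - 1374
198 Offline_Uncorrectable   0x0030 199 196 000 Old_age  Offline - 394
199 UDMA_CRC_Error_Count    0x0032 200 200 000 Old_age  Always  - 0
"""


def parse_attributes(report):
    """Return {attribute_name: raw_value} for rows that look like attributes."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # An attribute row has an ID, a name, and the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attributes[fields[1]] = int(fields[9])
    return attributes


def warnings(attributes):
    """Return the watched attributes that have a non-zero raw value."""
    return {name: attributes[name] for name in WATCHED
            if attributes.get(name, 0) > 0}


if __name__ == "__main__":
    print(warnings(parse_attributes(SAMPLE_ROWS)))
```

On the sample rows this flags the pending and offline-uncorrectable counts while the reallocated count stays at zero, which matches the picture in the report: the overall health check says PASSED, yet there are sectors the drive could not read.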
  8. The rebuild has not been going too well, guys. It seems Disk 1 is also having problems. I captured the current syslog and it's pretty much all red. Can someone please take a look and give me a general idea of what the problem is? I have new hardware on the way. Should I just stop now and perform the rebuild on the new hardware, or let this one finish? I am afraid that something else will break by the time it's done. The speed goes up to 14000-23000KB/s briefly and stays below 500KB/s most of the time. Overnight the progress went from 32.3% to 39%. At this rate it will take a month to complete...

    Feb 9 07:14:52 Tower kernel: ata11: hard resetting link
    Feb 9 07:14:53 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 07:14:53 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 07:14:53 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 07:14:53 Tower kernel: ata11: EH complete
    Feb 9 07:15:23 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 07:15:23 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 07:15:23 Tower kernel: ata11.00: cmd 25/00:00:27:d9:0c/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 07:15:23 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 07:15:23 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 07:15:23 Tower kernel: ata11: hard resetting link
    Feb 9 07:15:25 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 07:15:25 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 07:15:25 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 07:15:25 Tower kernel: ata11: EH complete
    Feb 9 07:15:55 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 07:15:55 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 07:15:55 Tower kernel: ata11.00: cmd 25/00:00:27:d9:0c/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 07:15:55 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 07:15:55 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 07:15:55 Tower kernel: ata11: hard resetting link
    Feb 9 07:15:56 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 07:15:56 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 07:15:56 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 07:15:56 Tower kernel: ata11: EH complete
    Feb 9 07:16:27 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 07:16:27 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 07:16:27 Tower kernel: ata11.00: cmd 25/00:00:27:d9:0c/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 07:16:27 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 07:16:27 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 07:16:27 Tower kernel: ata11: hard resetting link
    ..........................
    Feb 9 08:01:16 Tower kernel: handle_stripe read error: 1529839464/1, count: 1
    Feb 9 08:01:16 Tower kernel: md: disk1 read error
    Feb 9 08:01:16 Tower kernel: handle_stripe read error: 1529839472/1, count: 1
    Feb 9 08:02:02 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:02:02 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:02:02 Tower kernel: ata11.00: cmd 25/00:00:b7:9e:2f/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:02:02 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 08:02:02 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:02:02 Tower kernel: ata11: hard resetting link
    Feb 9 08:02:02 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:02:02 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:02:02 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 08:02:02 Tower kernel: ata11: EH complete
    Feb 9 08:04:26 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Feb 9 08:04:26 Tower kernel: ata11.00: irq_stat 0x40000001
    Feb 9 08:04:26 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:04:26 Tower kernel: ata11.00: cmd 25/00:00:57:02:31/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:04:26 Tower kernel: res 51/40:2f:21:03:31/00:03:5b:00:00/e0 Emask 0x9 (media error)
    Feb 9 08:04:26 Tower kernel: ata11.00: status: { DRDY ERR }
    Feb 9 08:04:26 Tower kernel: ata11.00: error: { UNC }
    Feb 9 08:04:26 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:04:26 Tower kernel: ata11: EH complete
    Feb 9 08:10:53 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Feb 9 08:10:53 Tower kernel: ata11.00: irq_stat 0x40000001
    Feb 9 08:10:53 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:10:53 Tower kernel: ata11.00: cmd 25/00:00:af:15:79/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:10:53 Tower kernel: res 51/40:7f:26:19:79/00:00:5b:00:00/e0 Emask 0x9 (media error)
    Feb 9 08:10:53 Tower kernel: ata11.00: status: { DRDY ERR }
    Feb 9 08:10:53 Tower kernel: ata11.00: error: { UNC }
    Feb 9 08:10:53 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:10:53 Tower kernel: ata11: EH complete
    Feb 9 08:19:04 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:19:04 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:19:04 Tower kernel: ata11.00: cmd 25/00:00:27:04:9f/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:19:04 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 08:19:04 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:19:04 Tower kernel: ata11: hard resetting link
    Feb 9 08:19:05 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:19:05 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:19:05 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 08:19:05 Tower kernel: ata11: EH complete
    Feb 9 08:19:36 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:19:36 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:19:36 Tower kernel: ata11.00: cmd 25/00:00:27:04:9f/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:19:36 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 08:19:36 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:19:36 Tower kernel: ata11: hard resetting link
    Feb 9 08:19:36 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:19:36 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:19:36 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 08:19:36 Tower kernel: ata11: EH complete
    Feb 9 08:20:52 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:20:52 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:20:52 Tower kernel: ata11.00: cmd 25/00:00:27:0d:9f/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:20:52 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 08:20:52 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:20:52 Tower kernel: ata11: hard resetting link
    Feb 9 08:20:53 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:20:53 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:20:53 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 08:20:53 Tower kernel: ata11: EH complete
    Feb 9 08:21:23 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:21:23 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:21:23 Tower kernel: ata11.00: cmd 25/00:00:27:0d:9f/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:21:23 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 08:21:23 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:21:23 Tower kernel: ata11: hard resetting link
    Feb 9 08:21:24 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:21:24 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:21:24 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 08:21:24 Tower kernel: ata11: EH complete
    Feb 9 08:23:58 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:23:58 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:23:58 Tower kernel: ata11.00: cmd 25/00:00:27:02:bf/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:23:58 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Feb 9 08:23:58 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:23:58 Tower kernel: ata11: hard resetting link
    Feb 9 08:23:59 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:23:59 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:23:59 Tower kernel: ata11.00: device reported invalid CHS sector 0
    Feb 9 08:23:59 Tower kernel: ata11: EH complete
    Feb 9 08:25:03 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Feb 9 08:25:03 Tower kernel: ata11.00: irq_stat 0x40000001
    Feb 9 08:25:03 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:25:03 Tower kernel: ata11.00: cmd 25/00:00:27:22:bf/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:25:03 Tower kernel: res 51/40:ff:26:23:bf/00:02:5b:00:00/e0 Emask 0x9 (media error)
    Feb 9 08:25:03 Tower kernel: ata11.00: status: { DRDY ERR }
    Feb 9 08:25:03 Tower kernel: ata11.00: error: { UNC }
    Feb 9 08:25:03 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:25:03 Tower kernel: ata11: EH complete
    Feb 9 08:25:20 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Feb 9 08:25:20 Tower kernel: ata11.00: irq_stat 0x40000001
    Feb 9 08:25:20 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:25:20 Tower kernel: ata11.00: cmd 25/00:00:27:22:bf/00:04:5b:00:00/e0 tag 0 dma 524288 in
    Feb 9 08:25:20 Tower kernel: res 51/40:9f:78:22:bf/00:03:5b:00:00/e0 Emask 0x9 (media error)
    Feb 9 08:25:20 Tower kernel: ata11.00: status: { DRDY ERR }
    Feb 9 08:25:20 Tower kernel: ata11.00: error: { UNC }
    Feb 9 08:25:20 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:25:20 Tower kernel: ata11: EH complete
    Feb 9 08:32:12 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Feb 9 08:32:12 Tower kernel: ata11.00: irq_stat 0x40000001
    Feb 9 08:32:12 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:32:12 Tower kernel: ata11.00: cmd 25/00:a8:07:cd:2d/00:03:5c:00:00/e0 tag 0 dma 479232 in
    Feb 9 08:32:12 Tower kernel: res 51/40:27:7f:cf:2d/00:01:5c:00:00/e0 Emask 0x9 (media error)
    Feb 9 08:32:12 Tower kernel: ata11.00: status: { DRDY ERR }
    Feb 9 08:32:12 Tower kernel: ata11.00: error: { UNC }
    Feb 9 08:32:12 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:32:12 Tower kernel: ata11: EH complete
    Feb 9 08:32:35 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Feb 9 08:32:35 Tower kernel: ata11.00: irq_stat 0x40000001
    Feb 9 08:32:35 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:32:35 Tower kernel: ata11.00: cmd 25/00:a8:07:cd:2d/00:03:5c:00:00/e0 tag 0 dma 479232 in
    Feb 9 08:32:35 Tower kernel: res 51/40:37:75:cf:2d/00:01:5c:00:00/e0 Emask 0x9 (media error)
    Feb 9 08:32:35 Tower kernel: ata11.00: status: { DRDY ERR }
    Feb 9 08:32:35 Tower kernel: ata11.00: error: { UNC }
    Feb 9 08:32:35 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:32:35 Tower kernel: ata11: EH complete
    Feb 9 08:33:05 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:33:05 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:33:05 Tower kernel: ata11.00: cmd 25/00:a8:07:cd:2d/00:03:5c:00:00/e0 tag 0 dma 479232 in
    Feb 9 08:33:05 Tower kernel: res 40/00:37:75:cf:2d/00:01:5c:00:00/e0 Emask 0x4 (timeout)
    Feb 9 08:33:05 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:33:05 Tower kernel: ata11: hard resetting link
    Feb 9 08:33:06 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:33:06 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:33:06 Tower kernel: ata11: EH complete
    Feb 9 08:33:36 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Feb 9 08:33:36 Tower kernel: ata11.00: failed command: READ DMA EXT
    Feb 9 08:33:36 Tower kernel: ata11.00: cmd 25/00:a8:07:cd:2d/00:03:5c:00:00/e0 tag 0 dma 479232 in
    Feb 9 08:33:36 Tower kernel: res 40/00:37:75:cf:2d/00:01:5c:00:00/e0 Emask 0x4 (timeout)
    Feb 9 08:33:36 Tower kernel: ata11.00: status: { DRDY }
    Feb 9 08:33:36 Tower kernel: ata11: hard resetting link
    Feb 9 08:33:37 Tower kernel: ata11: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    Feb 9 08:33:37 Tower kernel: ata11.00: configured for UDMA/33
    Feb 9 08:33:37 Tower kernel: ata11: EH complete
    Total lines: 3000
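With 3000 lines of this, a quick tally by error type makes the pattern easier to see than scrolling. The sketch below is a rough illustration, assuming the message wording stays exactly as in the excerpt above; the function name and category labels are my own. It searches the whole text rather than going line by line, so it works even if the log was pasted without line breaks.

```python
import re
from collections import Counter

# Matches the result line of a failed ATA command, e.g.
# "res 40/00:ff:... Emask 0x4 (timeout)" or "... Emask 0x9 (media error)".
RES_PATTERN = re.compile(r"res \S+ Emask 0x[0-9a-f]+ \(([^)]+)\)")
MD_PATTERN = re.compile(r"md: disk\d+ read error")

SAMPLE = """\
Feb 9 08:01:16 Tower kernel: md: disk1 read error
Feb 9 08:02:02 Tower kernel: ata11.00: cmd 25/00:00:b7:9e:2f/00:04:5b:00:00/e0 tag 0 dma 524288 in
Feb 9 08:02:02 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 9 08:02:02 Tower kernel: ata11: hard resetting link
Feb 9 08:04:26 Tower kernel: ata11.00: cmd 25/00:00:57:02:31/00:04:5b:00:00/e0 tag 0 dma 524288 in
Feb 9 08:04:26 Tower kernel: res 51/40:2f:21:03:31/00:03:5b:00:00/e0 Emask 0x9 (media error)
"""


def summarize(log_text):
    """Count failed-command results by type, plus link resets and md errors."""
    counts = Counter(RES_PATTERN.findall(log_text))
    counts["link reset"] += log_text.count("hard resetting link")
    counts["md read error"] += len(MD_PATTERN.findall(log_text))
    return counts


if __name__ == "__main__":
    for kind, number in summarize(SAMPLE).most_common():
        print(f"{kind}: {number}")
```

In the excerpt, the "media error" results come with `{ UNC }` (the drive reporting it could not read a sector), while the "timeout" results are followed by a link reset. Seeing how the two split across the full log can help when weighing a failing disk against cabling, power, or controller trouble.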
  9. Good to know that I can repeat the process again with the new hardware. As for reallocated sectors, I checked every drive and I see 5 drives with reallocated sectors, with counts ranging from 1 to 5. The drive that failed first, which I checked later, had 1700 sectors pending reallocation but no reallocated sectors.
  10. I bet you are right about the hardware. But I have no one but myself to blame. I have been having spates of problems with this server from time to time, like once a year, but then it stabilizes and I keep delaying the hardware upgrade. A while ago, after some troubleshooting, I discovered that my mobo (P5BVMDO) all of a sudden refused to work with the RAM that was installed. Eventually I replaced the RAM but left only one stick in, as with two sticks the system would not boot. As soon as I recover the data I will upgrade the hardware. Perhaps I should have done that before following the steps that you outlined.
  11. The speed dropped to 2,00KB/s... then back up to 6,000KB/s... keeps jumping up and down every time I refresh the browser. There is one error on disk 1.
  12. bjp999, thank you so much for the detailed response. I followed the steps and so far so good... I think. The writes on drive 7 are increasing, and the reads on all other drives are increasing, though I do see some minor write increases on other drives as well. The speed started off at 16,000KB/s, then went down to 200KB/s and then ramped back up to 19,500KB/s... I am letting it run, so let's see what happens.
  13. Need some help here. I am running version 4.5.6. Two days ago a 2TB drive (1736 in slot 9) spun down by itself and had a blinking green ball next to it. I rebooted the server and after the reboot the drive had a red ball next to it. I got a new drive, precleared it on a different server, then replaced the bad drive and started the rebuild process. One day into the rebuild I noticed the speed was 14KB/s, barely crawling. The drive in slot 2 had numerous errors. I refreshed the page and the server became unresponsive, so I hard rebooted it. This time another drive (PHBV in slot 7) turned red and was reported as missing. I powered the server down, re-seated the cables and powered the server on. No change. I took both drives out and put them into my other server. The PHBV is completely dead; perhaps the controller board died. The 1736 I can mount; a short SMART test shows 1700 pending reallocations. I decided to put the 1736 back into the original server and try to rebuild the PHBV drive, but now slot 9 wants the 1007 drive, which is the replacement that I bought for the 1736 in slot 9. I attached the Disk Status page. Any advice on how I can rebuild the PHBV drive now and then rebuild the 1736 drive? Probably wishful thinking. In that case, how can I copy the data from this drive? If I am able to mount it, I should be able to copy. I just can't figure out how. As for the other drive, perhaps the controller got fried and I may be able to revive it... Unfortunately, I did not capture the system log originally.
  14. Congrats on finishing off your build. Enjoy it. Since SM's measurement of CPU temps is strange (all it says is 'low' under the PC health tab in the IPMI dashboard), the only other way I can check the temps is in Unmenu. Here you go:

    coretemp-isa-0000
    Adapter: ISA adapter
    Core 0: +35.0 C (high = +84.0 C, crit = +100.0 C)

    coretemp-isa-0001
    Adapter: ISA adapter
    Core 1: +33.0 C (high = +84.0 C, crit = +100.0 C)

    coretemp-isa-0002
    Adapter: ISA adapter
    Core 2: +35.0 C (high = +84.0 C, crit = +100.0 C)

    coretemp-isa-0003
    Adapter: ISA adapter
    Core 3: +32.0 C (high = +84.0 C, crit = +100.0 C)

    The ambient temps are about 22-24 C, I would guess. I am using the stock HSF as well. The front of the case has 2x 120mm intake fans, and the rear has 2x 80mm exhaust fans.
  15. Now that I have my second server up and running, I would like to set up a scheduled backup of certain shares on Tower1 to Tower2. I think I have the basics of the rsyncd.conf syntax down, and I am able to sync the Photos share (for now) manually, but when it comes to automating all this I am in quite over my head, so I'd really appreciate some guidance. Here is what I am doing and the questions I have:

    1. Following JoeL's examples, I set up the rsyncd.conf file on Tower2:

    uid = root
    gid = root
    use chroot = no
    max connections = 4
    pid file = /var/run/rsyncd.pid
    timeout = 600
    log file = /var/log/rsyncd.log

    [Photos]
    path = /mnt/user/media/Backups/Photos
    comment = /mnt files
    read only = FALSE

    2. Automatically invoke the rsync daemon process on Tower2 every time the server is rebooted. Manually, the daemon is invoked with this command:

    rsync --daemon --config=/boot/config/rsyncd.conf

    Should it be added to the 'go' script? It would make sense, but I am curious why I don't see this command in the 'go' script in the example from this thread - http://lime-technology.com/forum/index.php?topic=3417.0

    3. To start the rsync process on some schedule, I understand I need to add something similar to this cron job to the 'go' script on Tower1:

    #set up rsync between the two servers every other day at 3 am - will be commented out for Server2 go script
    echo "0 3 2-6,8-13,15-20,21-31 * * /usr/bin/rsync rsync://Server2/disk1/*" >>/tmp/crontab
    echo "0 3 2-6,8-13,15-20,21-31 * * /usr/bin/rsync rsync://Server2/disk2/*" >>/tmp/crontab
    echo "0 3 2-6,8-13,15-20,21-31 * * /usr/bin/rsync rsync://Server2/disk3/*" >>/tmp/crontab

    Say, in my case, I want to build upon this manual command to do daily backups:

    cd /mnt
    rsync -avrH user/media/Photos tower2::Photos

    What should my entry be? I am trying to make sense of the example above, but I am not sure I get all the syntax yet. Anything else I am missing?
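To make the crontab syntax in question 3 concrete: a crontab line is five time fields (minute, hour, day of month, month, day of week) followed by the command, and `*` means "every". The sketch below just assembles such a line for a daily 3 am run of the manual command; it is an illustration under assumptions, not a tested unRaid recipe. The helper names are hypothetical, the absolute source path replaces the `cd /mnt` step, and appending to /tmp/crontab simply mirrors the quoted example, which presumably loads that file into cron afterwards.

```python
def cron_entry(minute, hour, command,
               day_of_month="*", month="*", day_of_week="*"):
    """Join the five crontab time fields and the command into one line."""
    return f"{minute} {hour} {day_of_month} {month} {day_of_week} {command}"


def go_script_line(entry, crontab_file="/tmp/crontab"):
    """Wrap a crontab entry in the echo-append form used in the quoted example."""
    return f'echo "{entry}" >>{crontab_file}'


# The manual command from the post, with an absolute source path so that
# no "cd /mnt" is needed (an assumption about how the job would be run).
RSYNC_COMMAND = "/usr/bin/rsync -avrH /mnt/user/media/Photos tower2::Photos"

if __name__ == "__main__":
    # Minute 0, hour 3, every day of every month.
    print(go_script_line(cron_entry(0, 3, RSYNC_COMMAND)))
```

Writing the day-of-month field as `*` is what makes the job daily; the `2-6,8-13,15-20,21-31` list in the quoted example restricts it to those days of the month instead.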
  16. I, on the other hand, have had my share of problems with standard RAM sticks on my first unRaid built on P5BVM-DO. Still not sure what went wrong there. All of a sudden I started seeing numerous errors in the log, system freezes, etc. Eventually I narrowed the problem down to RAM, and ended up exchanging it, but running only one stick as with two sticks of new RAM the system wouldn't boot. Took me two weeks to get the server stable and problem free. But from what I see on the forums the issues I experienced are very uncommon.
  17. My basic reason for going with Xeon was that since I am buying a server grade mobo with ECC memory, I might as well buy a server grade CPU and take advantage of ECC. I've read somewhere that ECC memory provides greater stability and reliability, hence it is a must for mission critical applications. My unRaid has become pretty mission critical for the members of my family:-) Whenever it is down for maintenance or break-fixing, I get bombarded by complaints.
  18. LOL... that was me... I misread Kode's post and thought he'd said he wasn't sure if the i3-530 would work with ECC memory. Since you mentioned it, here is the link to that review. It really is very thoughtful and nicely written: http://www.servethehome.com/supermicro-x8silf-motherboard-v102-review-whs-v2-diy-server/
  19. Another alternative for you would be the L3406, although it is priced much higher than the i3.
  20. No problem. Channel 1 (blue) slots don't work with one stick; all I get is long beeps and no POST. Channel 2 (black) slots each work with 1 stick. From the get-go I put sticks (both, and 1 at a time) into the channel 1 slots and assumed the same behavior for the channel 2 slots... Now, looking at the manual, I see a reference to one channel taking 2 populated slots and one channel also taking 1... but I am a typical guy, I hate reading manuals:-)
  21. I have this board and am running it with a single 2GB UDIMM in slot DIMM1A.
    Brand? Model? Link?
    Micron MT18JSF25672AY from eBay. It was on Supermicro's tested memory list.
    Interesting. It refused one stick of my Crucial memory. Did you get the long beep at all?
    No long beep. Did you use slot DIMM1A as it states in the manual? Also what revision is your board? Mine is 1.02.
    Not sure. It's quite possible I stuck it in DIMM2A. I did not reference the manual for that.
  22. I have this board and am running it with a single 2GB UDIMM in slot DIMM1A.
    Brand? Model? Link?
    Micron MT18JSF25672AY from eBay. It was on Supermicro's tested memory list.
    Interesting. It refused one stick of my Crucial memory. Did you get the long beep at all?
  23. I'm assuming these are onboard SATA rates. I wonder if there is any performance boost going through the SASLP-MV8. My Atom averages about 55000K/sec on a parity check with 7200 rpm Hitachis, so I'm betting you could see 80 - 90 M/sec with non-green drives.
    Actually, 2 are on board, and 4 connected to the SASLP card. Wanted to test the card and left it like that afterward.
  24. Added WD20EARS as parity and recalculated the parity.

    Aug 25 20:09:59 Tower2 kernel: mdcmd (379): spinup 0
    Aug 25 20:09:59 Tower2 kernel:
    Aug 25 20:10:00 Tower2 kernel: mdcmd (383): spinup 0
    Aug 25 20:10:00 Tower2 kernel:
    Aug 25 20:10:26 Tower2 kernel: mdcmd (388): spinup 0
    Aug 25 20:10:26 Tower2 kernel:
    Aug 26 03:00:46 Tower2 kernel: md: sync done. time=28083sec rate=69562K/sec
    Aug 26 03:00:46 Tower2 kernel: md: recovery thread sync completion status: 0

    A bit of improvement vs the original sync rate using the Seagate 500GB for parity:

    Aug 24 22:41:27 Tower2 kernel: md: sync done. time=9469sec rate=51577K/sec
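As a sanity check on those two "sync done" lines, multiplying the elapsed time by the reported rate should give roughly the size of the parity drive. The short sketch below does that arithmetic; it assumes "K" in the log means 1024 bytes, which is my reading rather than something the log states, and the function name is made up for illustration.

```python
def synced_bytes(seconds, rate_k_per_sec, bytes_per_k=1024):
    """Total bytes covered by a sync, from its duration and average rate."""
    return seconds * rate_k_per_sec * bytes_per_k


if __name__ == "__main__":
    new_parity = synced_bytes(28083, 69562)   # WD20EARS run
    old_parity = synced_bytes(9469, 51577)    # Seagate 500GB run
    print(f"new parity run: {new_parity / 1e12:.2f} TB")
    print(f"old parity run: {old_parity / 1e9:.1f} GB")
    print(f"rate ratio: {69562 / 51577:.2f}")
```

Under that assumption the totals land very close to 2 TB and 500 GB, which fits the two parity drives mentioned, and the average rate works out to roughly a third higher with the WD20EARS. The two runs covered different amounts of disk, so the comparison is indicative rather than exact.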
  25. UnRaid vs WHS

    I am not going to repeat the pros and cons that were already mentioned, nor am I going to describe my very painless experience of recovering from 1 failed disk, or how easy it is to expand the array or replace a drive with a larger one. I'll just say that I did recently seriously contemplate running different home server software than unRaid. The reason was that I had a hard time booting unRaid from a flash drive on the new hardware I purchased for a second unRaid system. I was so frustrated that I started looking at alternatives for a while. I considered WHS, FlexRaid, FreeNAS, Openfiler and ZFS. I have to tell you, to me absolutely nothing came even close to unRaid, which I have been using since 2007. It totally satisfies my needs and is very simple to set up and maintain for a non-Linux-savvy user like myself.

    My only concern is that if I lose more than one drive, the data on those drives will be lost. I have some data that I cannot lose, like family videos in HD. Until recently I had it backed up elsewhere, but since the size is growing fast, I need another solution. I decided to build a second server, which will be located in a different location and where I will keep duplicates of critical data. In a way, it is similar to WHS's duplication, but I don't have to duplicate everything to have some kind of protection. To me unRaid is a much more elegant solution, more stable and feature rich, and over the past three years it has not let me down. I will not go any other way unless I absolutely have to.

    Finally, I just don't think that duplication on WHS is worth much. In a properly protected system (non-fail surge protector, UPS and good ventilation) the chance of hard drive failure is very small. I have had only one failed hard drive in unRaid over the past 3 years, out of the 20 drives I am currently running, and I easily recovered from it. This tells me that duplication would have been a total waste of a lot of money. Now, if lightning struck the house, or there was a fire or some other disastrous event, the whole system would have been destroyed and, again, that duplication would have been worth nothing in the end. A more prudent way to use duplication is to set up different servers and keep them as far away from each other as reasonably possible. Two unRaid servers would do the trick.