June 1, 2013

Hello,

I've been trying to do a parity build for the last day or so, but my system stops and becomes completely unresponsive after ~6 hours.

The first time, I attempted a parity build while running a preclear as well, and right around the 6-hour mark everything froze completely. I couldn't telnet in, I couldn't reach unRAID through the web interface, and I couldn't access IPMI either. The second time was the same (same situation, with preclear running). Both times I was unable to capture a log, since I could not access the machine at all.

The third time I tried it without preclear running, and I also ran a tail on the log, which I've captured below. It doesn't show much.

All 3 times, after it stops, there is no disk activity (or at least the green light is no longer spinning).

I've run 5 passes of memtest so far, with no errors. I'll continue running it until I have some suggestions on what to try.

Thanks!

My unRAID version is 5.0-rc11

Build:
Mobo: Supermicro MBD-X9SCM-F-O LG
PSU: CORSAIR TX Series CMPSU-750TX 750W
CPU: Intel Core i3-2100 Sandy Bridge
RAM: Kingston 8GB (2 x 4GB) 240-Pin DDR3 SDRAM ECC Unbuffered
Hot Swap Drive Bays: Norco SS-500 x 4
SATA Card: Supermicro AOC-SASLP-MV8

May 31 23:52:31 Alex-MEDIA emhttp: shcmd (2453): chown nobody:users '/mnt/cache'
May 31 23:52:31 Alex-MEDIA emhttp: shcmd (2454): mkdir /mnt/user0
May 31 23:52:31 Alex-MEDIA emhttp: shcmd (2455): /usr/local/sbin/shfs /mnt/user0 -disks 16777214 -o noatime,big_writes,allow_other,use_ino
May 31 23:52:31 Alex-MEDIA emhttp: shcmd (2456): mkdir /mnt/user
May 31 23:52:31 Alex-MEDIA emhttp: shcmd (2457): /usr/local/sbin/shfs /mnt/user -disks 16777215 2000000 -o noatime,big_writes,allow_other,use_ino
May 31 23:52:31 Alex-MEDIA emhttp: shcmd (2458): crontab -c /etc/cron.d - <<< "# Generated mover schedule: 40 3 * * * /usr/local/sbin/mover |& logger"
May 31 23:52:31 Alex-MEDIA emhttp: shcmd (2459): /usr/local/sbin/emhttp_event disks_mounted
May 31 23:52:31 Alex-MEDIA emhttp_event: disks_mounted
May 31 23:52:40 Alex-MEDIA sudo: root : TTY=console ; PWD=/tmp ; USER=nobody ; COMMAND=/usr/bin/python /usr/local/couchpotato_v2/CouchPotato.py --daemon --config_file /boot/config/plugins/couchpotato_v2/settings.conf --pid_file /var/run/couchpotato_v2/couchpotato_v2.pid
May 31 23:52:48 Alex-MEDIA sudo: root : TTY=console ; PWD=/tmp ; USER=nobody ; COMMAND=/usr/bin/python /usr/local/headphones/Headphones.py -d -p 8084 --datadir /mnt/cache/apps/headphones --nolaunch --pidfile /var/run/headphones/headphones-8084.pid
May 31 23:52:51 Alex-MEDIA pms: Starting Plex...
May 31 23:52:51 Alex-MEDIA su[9727]: Successful su for unraid-plex by root
May 31 23:52:51 Alex-MEDIA su[9727]: + root:unraid-plex
May 31 23:52:53 Alex-MEDIA pms: Plex Media Server IS running
May 31 23:53:06 Alex-MEDIA in.telnetd[9876]: connect from 192.168.1.6 (192.168.1.6)
May 31 23:53:10 Alex-MEDIA login[9877]: ROOT LOGIN on '/dev/pts/1' from '192.168.1.6'
May 31 23:53:33 Alex-MEDIA sudo: root : TTY=console ; PWD=/tmp ; USER=nobody ; COMMAND=/usr/bin/python -OO /usr/local/sabnzbd/SABnzbd.py -d -s 0.0.0.0:8081 --config-file /mnt/cache/apps/sabnzbd --pid /var/run/sabnzbd/ > /dev/null 2>&1
May 31 23:53:45 Alex-MEDIA sudo: root : TTY=unknown ; PWD=/ ; USER=nobody ; COMMAND=/usr/bin/python /usr/local/sickbeard/SickBeard.py --daemon --forceupdate --port 8083 --datadir /mnt/cache/apps/sickbeard --pidfile /var/run/sickbeard/sickbeard.pid > /dev/null 2>&1
May 31 23:53:49 Alex-MEDIA sudo: root : TTY=unknown ; PWD=/boot/config/plugins/subsonic ; USER=nobody ; COMMAND=/bin/sh ./subsonic.sh
May 31 23:53:49 Alex-MEDIA kernel: mdcmd (43): check CORRECT
May 31 23:53:49 Alex-MEDIA kernel: md: recovery thread woken up ...
May 31 23:53:49 Alex-MEDIA kernel: md: recovery thread syncing parity disk ...
May 31 23:53:50 Alex-MEDIA kernel: md: using 1536k window, over a total of 3907018532 blocks.
May 31 23:53:50 Alex-MEDIA emhttp: shcmd (2460): :>/etc/samba/smb-shares.conf
May 31 23:53:50 Alex-MEDIA emhttp: shcmd (2461): cp /etc/netatalk/AppleVolumes.default- /etc/netatalk/AppleVolumes.default
May 31 23:53:50 Alex-MEDIA emhttp: get_config_idx: fopen /boot/config/shares/Thumbnails.cfg: No such file or directory - assigning defaults
May 31 23:53:50 Alex-MEDIA emhttp: get_config_idx: fopen /boot/config/shares/www.cfg: No such file or directory - assigning defaults
May 31 23:53:50 Alex-MEDIA emhttp: Restart SMB...
May 31 23:53:50 Alex-MEDIA emhttp: shcmd (2462): killall -HUP smbd
May 31 23:53:50 Alex-MEDIA emhttp: shcmd (2463): ps axc | grep -q rpc.mountd
May 31 23:53:50 Alex-MEDIA emhttp: _shcmd: shcmd (2463): exit status: 1
May 31 23:53:50 Alex-MEDIA emhttp: shcmd (2464): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service
May 31 23:53:50 Alex-MEDIA avahi-daemon[22638]: Files changed, reloading.
May 31 23:53:50 Alex-MEDIA avahi-daemon[22638]: Service group file /services/smb.service changed, reloading.
May 31 23:53:50 Alex-MEDIA emhttp: shcmd (2465): /usr/local/sbin/emhttp_event svcs_restarted
May 31 23:53:50 Alex-MEDIA emhttp_event: svcs_restarted
May 31 23:53:51 Alex-MEDIA avahi-daemon[22638]: Service "Alex-MEDIA-SMB" (/services/smb.service) successfully established.
Jun 1 03:40:01 Alex-MEDIA logger: mover started
Jun 1 03:40:01 Alex-MEDIA logger: skipping Thumbnails/
Jun 1 03:40:01 Alex-MEDIA logger: skipping apps/
Jun 1 03:40:01 Alex-MEDIA logger: skipping mysql/
Jun 1 03:40:01 Alex-MEDIA logger: skipping www/
Jun 1 03:40:01 Alex-MEDIA logger: mover finished

My unMenu indicates that one of my drives has
» current_pending_sector=1

The SMART report for that drive is here:

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2484837
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Jun 1 09:57:36 2013 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x85)
    Offline data collection activity was aborted by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0)
    The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (50460) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035)
    SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   184   154   021    Pre-fail  Always       -       7783
  4 Start_Stop_Count        0x0032   088   088   000    Old_age   Always       -       12081
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2747
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   089   089   000    Old_age   Always       -       11657
192 Power-Off_Retract_Count 0x0032   187   187   000    Old_age   Always       -       10243
193 Load_Cycle_Count        0x0032   194   194   000    Old_age   Always       -       19396
194 Temperature_Celsius     0x0022   116   103   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error   00%        2747             -
# 2  Short offline       Completed without error   00%        2737             -
# 3  Short offline       Completed without error   00%        2573             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
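For anyone who wants to pick the worrying numbers out of a report like the one above without reading it by eye, here is a minimal Python sketch. It only parses text shaped like the attribute table in this post. The choice of which attributes to watch is an assumption (they are the ones commonly expected to have a raw value of 0 on a healthy drive), and `parse_attributes` and `nonzero_watched` are hypothetical helper names, not part of smartctl or unMenu.

```python
import re

# A few attribute rows copied from the SMART report in this post.
REPORT = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

# Assumption: these are the attributes worth flagging when their raw value is non-zero.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} for every row that looks like an attribute."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(3))
    return attrs


def nonzero_watched(attrs):
    """Keep only the watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in attrs.items() if name in WATCHED and raw > 0}


if __name__ == "__main__":
    print(nonzero_watched(parse_attributes(REPORT)))
```

Run on the rows above, this reports only `Current_Pending_Sector` with a raw value of 1, which matches what unMenu flagged.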
June 1, 2013 (Author)

To add some more information: it freezes at around the same point every time (about 6 hours in).
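To compare that freeze time against what the log actually captured, a small sketch like the one below works out how long after the parity sync started the last log entry was written. It is only an illustration: the sample lines are copied from the log in the first post, the year 2013 is an assumption taken from the post date (syslog timestamps don't include a year), and `timestamp` and `elapsed_since_sync` are hypothetical helper names.

```python
from datetime import datetime

# Sample lines copied from the captured syslog in the first post.
LOG = """\
May 31 23:53:49 Alex-MEDIA kernel: md: recovery thread syncing parity disk ...
May 31 23:53:50 Alex-MEDIA kernel: md: using 1536k window, over a total of 3907018532 blocks.
Jun 1 03:40:01 Alex-MEDIA logger: mover started
Jun 1 03:40:01 Alex-MEDIA logger: mover finished
"""

YEAR = 2013  # assumption: the log lines carry no year, so use the year of the post


def timestamp(line):
    """Parse the leading 'Mon DD HH:MM:SS' fields of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def elapsed_since_sync(log_text, marker="recovery thread syncing parity disk"):
    """Time between the first line containing `marker` and the last line in the log."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    start = next(timestamp(line) for line in lines if marker in line)
    return timestamp(lines[-1]) - start


if __name__ == "__main__":
    print(elapsed_since_sync(LOG))
```

On the captured excerpt this comes out to 3:46:12, so the last thing logged (the mover run) was written well before the roughly 6-hour mark, and nothing further was captured before the freeze.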
June 2, 2013 (Author)

Update: Booted into unRAID... that drive is failing badly now.

» current_pending_sector=53
» offline_uncorrectable=52

Anyone know the procedure for, uh... reattaching its external case and hoping for a warranty replacement?
June 2, 2013

Clearly this drive isn't going to work in UnRAID. The risk you take when you buy an external drive and pull it from the case is exactly what you've now encountered. You can type the drive's serial number into the manufacturer's warranty check site and see if they'll cover it under warranty ... but you may simply be out of luck unless you removed it VERY carefully and can reassemble it in the original case for a warranty repair.
June 2, 2013 (Author)

Yup, that's exactly what happened. These drives actually came from before I had unRAID. It is still under warranty... but I'd have to put it back together.