August 14, 2014

Hello,

I initiated a parity sync last night and went to bed. I have a monitor attached to my UNRAID server, and when I woke up the monitor was scrolling this over and over again:

reiserfs error (device sds1): zam-7001 reiserfs_find_entry: io error

I ran the command cp /var/log/syslog /boot/syslog.txt to grab my log, but it's empty.

UNRAID login: root
Linux 3.9.11p-unRAID.
root@UNRAID:~# cp /var/log/syslog /boot/syslog.txt
root@UNRAID:~#
root@UNRAID:~# cp /var/log/syslog /boot/syslog.txt
root@UNRAID:~# cp /var/log/syslog /boot/syslog1.txt
root@UNRAID:~# cat /car/log/syslog
cat: /car/log/syslog: No such file or directory
root@UNRAID:~# cat /var/log/syslog
root@UNRAID:~# tail -f /var/log/syslog
^C
root@UNRAID:~#

The webgui is still available, and parity sync is still going. Does anyone have any suggestions? Thanks for the help!
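
Since the copied syslog came out empty, it may help to test the file before copying it. The following is only a minimal sketch: `save_log` is a hypothetical helper name, not part of unRAID, and it relies on nothing beyond the standard `[ -s file ]` test and `cp`.

```shell
# Hypothetical helper: copy a log file only if it actually contains data.
# Returns non-zero (and copies nothing) when the source is missing or empty.
save_log() {
    src=$1
    dst=$2
    if [ -s "$src" ]; then      # -s: file exists and is larger than zero bytes
        cp "$src" "$dst"
    else
        echo "$src is missing or empty; nothing copied" >&2
        return 1
    fi
}
```

It could be used as `save_log /var/log/syslog /boot/syslog.txt || dmesg > /boot/dmesg.txt`. The fallback assumes the kernel ring buffer printed by `dmesg` still holds the reiserfs messages, which is not guaranteed.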
August 15, 2014 - Author

I just got home and have had some time to look into this a bit further. I don't have a resolution yet, but I found this thread: http://lime-technology.com/forum/index.php?topic=8386.0

In that thread Joe L. tells Teamhood that his file system looks to have become corrupt. He was receiving the same error I am, plus an error that his file system was read-only. I am not receiving the read-only error. Should I attempt a file system repair?

Thank You
August 16, 2014 - Author

Hello dgaschk,

You are always the one that replies and helps me. It's happened several times over the years and I really appreciate your assistance.

I think my cache drive is dying. I noticed that Simple Features showed a SMART error on the drive, and it would go away and come back. I ran

smartctl -a -A /dev/sds | todos >/boot/smart.txt

on the drive and got this output:

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               /10:0:7:
Product:              0
Physical block size:  0 bytes
Lowest aligned LBA:   14138
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

I did what smartctl told me and ran the following:

smartctl -a -A -T permissive /dev/sds | todos >/boot/smart.txt

and got this:

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Short INQUIRY response, skip product id
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Read defect list: asked for grown list but didn't get it
Error Counter logging not supported
Device does not support Self Test logging

Do you think I should still run the check disk? Or just replace the drive?
August 16, 2014 - Author

I took out the disk and put it in another computer. I then ran a SeaTools quick scan on the drive. SeaTools claims the drive is good and that S.M.A.R.T. has not been tripped.
August 16, 2014

Quote: "I took out the disk and put it in another computer. I then ran a SeaTools quick scan on the drive. SeaTools claims the drive is good and that S.M.A.R.T. has not been tripped."

If you have file system corruption of any sort, the SMART report will not show it. Running reiserfsck is the only way to fix such issues.

Having said that, a disk can pass the SMART check and still be failing. It would be useful if you provided the full output of the SMART report so that we can see if there are signs of problems. One item of particular interest is whether the value of pending reallocated sectors is non-zero.
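
As a rough illustration of checking that one attribute, the sketch below pulls the raw value of a named attribute out of `smartctl -A` output. `smart_raw` is a hypothetical helper name, and it assumes the usual 10-column ATA attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE).

```shell
# Hypothetical helper: print the RAW_VALUE of one SMART attribute.
# Reads `smartctl -A` style text on stdin; the attribute name is the argument.
# Assumes the 10-column ATA table, so the raw value starts in field 10.
# Only the first token of a multi-part raw value is printed.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

For example, `smartctl -A /dev/sds | smart_raw Current_Pending_Sector` would print the pending sector count, and anything other than 0 would be worth reporting here.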
August 16, 2014 - Author

Hello itimpi,

After putting the disk back in and running smartctl again, I was able to get the full output.

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1CH164
Serial Number:    <REDACTED>
LU WWN Device Id: 5 000c50 04f347dfa
Firmware Version: CC24
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Aug 16 03:02:48 2014 PDT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 219) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       127972408
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       137
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       955454005
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13189
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       134
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       11 11 11
189 High_Fly_Writes         0x003a   042   042   000    Old_age   Always       -       58
190 Airflow_Temperature_Cel 0x0022   063   050   045    Old_age   Always       -       37 (Min/Max 26/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       104
193 Load_Cycle_Count        0x0032   073   073   000    Old_age   Always       -       54273
194 Temperature_Celsius     0x0022   037   050   000    Old_age   Always       -       37 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10424h+34m+24.623s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       125385362821
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       122287978581

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host               90%     13184         -
# 2  Short offline       Completed without error       00%     13184         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
August 16, 2014

That SMART report does not show anything obvious that indicates the disk is dying! On that basis it is quite likely that something happened that caused file system corruption. You might therefore want to follow the earlier suggestion to run a reiserfsck file system check on the drive. Initially, make sure that you only run a check and do not attempt to fix any errors. I would suggest that you report back here with the output of the check before attempting any suggested recovery action.
August 18, 2014 - Author

Hello Again,

I have done as requested. I started the array in maintenance mode and ran the following:

reiserfsck --check /dev/md1

###########
reiserfsck --check started at Sun Aug 17 14:17:47 2014
###########
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 711607
        Internal nodes 4334
        Directories 3021
        Other files 41133
        Data block pointers 714805349 (37 of them are zero)
        Safe links 0
###########
reiserfsck finished at Sun Aug 17 15:14:01 2014
###########

I also ran a check just on the drive in question, sds:

reiserfsck --check /dev/sds1

Replaying journal: Done.
Reiserfs journal '/dev/sds1' in blocks [18..8211]: 534 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 217811
        Internal nodes 1427
        Directories 638172
        Other files 426507
        Data block pointers 138123478 (0 of them are zero)
        Safe links 0
###########
reiserfsck finished at Sun Aug 17 17:17:03 2014
###########

It does not look like any errors were found.
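
For anyone repeating this, a saved copy of the check output can be tested for the clean result instead of being read by eye. This is only a sketch: `fsck_clean` is a hypothetical helper name, and it does nothing more than look for the "No corruptions found" line that appears in the output above.

```shell
# Hypothetical helper: succeed only if a saved reiserfsck log reports a clean check.
# The argument is the path to a text file holding the reiserfsck output.
fsck_clean() {
    grep -q 'No corruptions found' "$1"
}
```

One way to produce such a file is `reiserfsck --check /dev/md1 2>&1 | tee /boot/reiserfsck-md1.txt` (the file name is just an example), followed by `fsck_clean /boot/reiserfsck-md1.txt && echo clean`.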
August 18, 2014 - Author

I think I may have figured out my issue. On boot I got an error message that sda1 was not unmounted properly and that I should run fsck on it. That didn't work; I got an error that it couldn't find the vfat.sys file or something. I did some digging and found that there is a command called dosfsck. I ran the following:

dosfsck -av /dev/sda/

It showed me a bunch of columns of numbers on screen, with a comment like 'will not repair automatically'. Then it gave me three choices (paraphrasing here):

1. Copy backup to original
2. Copy original to backup
3. Do nothing.

Originally I figured the safest option would be 'copy backup to original'. I chose that option and nothing happened. So I ran it again and chose 'copy original to backup'. It then continued and looked like it repaired the file system on my USB thumb drive. After that I rebooted, and my issues seem to have gone away.

Thank You