January 10, 2017

I'm not sure what happened, so I'll post a narrative.

Unraid has been running great for a year on version 6.1.3, with an uptime of 28 days. One docker's web GUI became inaccessible, so I reimaged it and set it up again. I had also been getting 'not enough space' errors. I noticed there was an update, so I updated my dockers and then updated Unraid to 6.2.4. All updates went fine and all dockers were working. I rechecked for updates and every docker showed updates again. I tried to update them all, but only one succeeded. Everything was still working, so I let it go.

Last night I was transferring several pictures and reorganizing the folder structure over SMB. Around the time Lightroom finished importing ~60 GB, the server lost its local share connections and everything seemed dead. The display terminal seemed normal and showed a login prompt. I rebooted through the web GUI.

Once everything had loaded, the dockers were orphaned and the cache was not moving, even when prompted several times. Reimaging the dockers didn't work. A parity check returned 20 errors, which were then reported as fixed. I started rebuilding the dockers and got further errors, reported as input/output errors. I decided to update my backup, and a disk became unmountable. I ran the Fix Common Problems plugin, which found several errors about permissions.

A complete log dump can be downloaded here: https://1drv.ms/u/s!Aq_mwOzBP2wauJtpc-MyJgvPDtBksg

I am currently running in maintenance mode and doing a parity check. I'll try to extract the data so I don't lose it.

I appreciate the help. Thank you.
January 10, 2017 (Community Expert)

Check the file system on disk3 (md3): https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS
January 10, 2017 (Community Expert)

Also, disk2 is showing some warning signs. There are no pending or reallocated sectors, but there were recent read errors that look like bad sectors. Keep an eye on it and/or run an extended SMART test.

Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N0J7FNJH

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       18
  3 Spin_Up_Time            0x0027   182   180   021    Pre-fail  Always       -       5866
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       261
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11659
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       85
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       715
194 Temperature_Celsius     0x0022   126   104   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 5
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 11644 hours (485 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 c0 50 99 b3 ef  Error: UNC 192 sectors at LBA = 0x0fb39950 = 263428432

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 c0 30 99 b3 ef 08  1d+00:59:28.146  READ DMA
c8 00 c0 30 93 b3 ef 08  1d+00:59:28.139  READ DMA
c8 00 c0 30 8d b3 ef 08  1d+00:59:28.135  READ DMA

Error 4 occurred at disk power-on lifetime: 9997 hours (416 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
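For anyone reading the error entry by hand, the numbers in the log can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes the usual 28-bit LBA layout for READ DMA (low four bits of DH are LBA bits 24-27, followed by CH, CL and SN), and the function name `lba28_from_registers` is made up for this example.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA registers into a 28-bit LBA (assumed LBA28 layout)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after Error 5, copied from the log: SN=50 CL=99 CH=b3 DH=ef
lba = lba28_from_registers(0x50, 0x99, 0xB3, 0xEF)
print(hex(lba), lba)

# Error 5 was logged at 11644 power-on hours; split that into days + hours.
days, hours = divmod(11644, 24)
print(f"{days} days + {hours} hours")

# Power_On_Hours in the attribute table reads 11659, so the error is recent.
hours_since_error = 11659 - 11644
print(f"Error 5 happened about {hours_since_error} power-on hours before this report")

# Powered_Up_Time is a millisecond counter; a 32-bit counter wraps here:
wrap_days = 2**32 / (1000 * 60 * 60 * 24)
print(f"wraps after {wrap_days:.3f} days")
```

The decoded LBA matches the value smartctl prints, and the 15-hour gap is why the read errors count as recent.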
January 10, 2017 (Author)

Thanks for the quick response. xfs_repair is asking me to mount the disk again because of a log:

ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Do I go ahead and destroy it?
January 10, 2017 (Community Expert)

If it doesn't mount, it's the only option. It's normal in these situations, and usually there's no data loss.
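The order of operations implied by the xfs_repair message and the reply above can be written down as a small decision helper. This is a rough sketch and not an official procedure: the function name `next_step` and the returned wording are invented for illustration, and `/dev/md3` is assumed only because disk3 (md3) is the disk named earlier in the thread. It does not run anything; it just returns the suggested next action.

```python
from typing import Optional

# Phrase taken from the xfs_repair error quoted in this thread.
LOG_DIRTY_MARKER = "valuable metadata changes in a log"
DEVICE = "/dev/md3"  # assumption: disk3 (md3), as named earlier in the thread


def next_step(repair_output: str, mount_ok: Optional[bool] = None) -> str:
    """Suggest the next action after an xfs_repair run (illustrative only)."""
    if LOG_DIRTY_MARKER not in repair_output:
        return "no dirty log reported: follow xfs_repair's own output"
    if mount_ok is None:
        # The message itself says to try a mount first so the log is replayed.
        return "try to mount the filesystem to replay the log"
    if mount_ok:
        return f"unmount, then re-run: xfs_repair {DEVICE}"
    # Last resort: -L destroys the log, which may cost recent metadata changes.
    return f"mount failed, last resort: xfs_repair -L {DEVICE}"


message = "ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed."
print(next_step(message))
print(next_step(message, mount_ok=False))
```

The point of the ordering is that `-L` is only reached after a mount attempt has failed, which matches the warning in the error text.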
January 10, 2017 (Author)

I am able to mount it now. It found a 0 B file. Do I go ahead and try to resume normal functions, or should I finish my backup and rebuild the server?

Update: the mover seems to be working. Assuming everything runs correctly, do I go ahead and restore my dockers and run extended SMART checks?
January 10, 2017 (Community Expert)

If everything seems fine, use it normally, and do an extended test on disk2 when you can.
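For keeping an eye on disk2 after the extended test, one option is to watch the three sector-health counters in the attribute table. The sketch below assumes the ten-column row layout shown in the SMART report earlier in this thread; the helper names and the choice of watched IDs (5, 197, 198) are illustrative, not a rule from the thread. The sample rows are copied from that report.

```python
# Attribute IDs commonly watched for failing sectors (an assumption for this sketch).
WATCHED_IDS = (5, 197, 198)

SAMPLE_ROWS = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       18
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""


def parse_attributes(text: str) -> dict:
    """Map attribute ID -> (name, raw value) for rows in the layout above."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        try:
            rows[int(parts[0])] = (parts[1], int(parts[9]))
        except ValueError:
            continue  # raw value is not a plain integer; ignore in this sketch
    return rows


def nonzero_watched(rows: dict) -> list:
    """Return the watched attribute IDs whose raw value is above zero."""
    return [i for i in WATCHED_IDS if i in rows and rows[i][1] > 0]


rows = parse_attributes(SAMPLE_ROWS)
print(nonzero_watched(rows))
```

On the report posted here all three counters are still zero even though the error log shows UNC read errors, so the error log is worth checking alongside the counters.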