February 28, 2017 I've been a dedicated unRAID user for years. That said, I am running unRAID Server Pro version 5.0-rc16c (my motto: if it ain't broke, don't fix it :)). I have a 3TB parity drive and 4x3TB data drives. I noticed today 55 errors on Disk 3. From the log, it appears this happened on 2/18. Here is a cut from syslog:

Feb 18 20:25:47 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 20:25:47 Tower kernel: ata4.00: BMDMA stat 0x25
Feb 18 20:25:47 Tower kernel: ata4.00: failed command: READ DMA EXT
Feb 18 20:25:47 Tower kernel: ata4.00: cmd 25/00:00:68:ca:04/00:04:5d:01:00/e0 tag 0 dma 524288 in
Feb 18 20:25:47 Tower kernel: res 51/40:af:b0:cc:04/40:01:5d:01:00/e0 Emask 0x9 (media error)
Feb 18 20:25:47 Tower kernel: ata4.00: status: { DRDY ERR }
Feb 18 20:25:47 Tower kernel: ata4.00: error: { UNC }
Feb 18 20:25:47 Tower kernel: ata4.00: configured for UDMA/133
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf] Unhandled sense code
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]
Feb 18 20:25:47 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]
Feb 18 20:25:47 Tower kernel: Sense Key : 0x3 [current] [descriptor]
Feb 18 20:25:47 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Feb 18 20:25:47 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01
Feb 18 20:25:47 Tower kernel: 5d 04 cc b0
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]
Feb 18 20:25:47 Tower kernel: ASC=0x11 ASCQ=0x4
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf] CDB:
Feb 18 20:25:47 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 5d 04 ca 68 00 00 04 00 00 00
Feb 18 20:25:47 Tower kernel: end_request: I/O error, dev sdf, sector 5855562928
Feb 18 20:25:47 Tower kernel: ata4: EH complete
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562864
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562872
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562880
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562888
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562896
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562904
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562912
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562920
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562928
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562936
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562944
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562952
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562960
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562968
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562976
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562984
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562992
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563000
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563008
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563016
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563024
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563032
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563040
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563048
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563056
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563064
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563072
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563080
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563088
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563096
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563104
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563112
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563120
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563128
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563136
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563144
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563152
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563160
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563168
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563176
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563184
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563192
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563200
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563208
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563216
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563224
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563232
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563240
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563248
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563256
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563264
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563272
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563280
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563288
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563296
Feb 18 21:41:31 Tower kernel: mdcmd (108): spindown 0
Feb 18 22:28:02 Tower kernel: mdcmd (109): spindown 2
Feb 18 22:29:23 Tower kernel: mdcmd (110): spindown 1
Feb 18 22:29:33 Tower kernel: mdcmd (111): spindown 4
Feb 19 00:37:15 Tower kernel: mdcmd (112): spindown 3

Not being all that familiar with what to do when things go wrong, what would be my next steps other than panic?
1) Copy all data to another drive, on or off the array?
2) Run a parity check? (last parity check was about a year ago) Check the box "Correct any Parity-Check errors by writing the Parity disk with corrected parity"?
3) Replace drive and rebuild?
Thanks!
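For anyone wanting to check the "55 errors" figure against the syslog rather than the web UI, here is a minimal Python sketch. The function name `summarize_md_read_errors` is hypothetical, and the sample holds only three lines copied from the excerpt above, not the full log. The last lines redo the arithmetic on the first and last sector numbers from the post; the 64-sector difference between the device sector and the first md sector is just an observation from these numbers (it would be consistent with a partition starting at sector 64, but that is an inference).

```python
import re

# Matches lines such as "md: disk3 read error, sector=5855562864".
MD_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")

def summarize_md_read_errors(log_text):
    """Return {disk: (count, first_sector, last_sector)} for md read errors."""
    sectors = {}
    for disk, sector in MD_ERROR.findall(log_text):
        sectors.setdefault(int(disk), []).append(int(sector))
    return {d: (len(s), min(s), max(s)) for d, s in sectors.items()}

# Three lines copied from the excerpt above (the real log has many more).
sample = """\
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562864
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562872
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563296
"""
print(summarize_md_read_errors(sample))

# Cross-check using the first and last sectors reported in the post.
# The md driver logged one error per 8-sector (4 KiB) block.
first, last, step = 5855562864, 5855563296, 8
error_count = (last - first) // step + 1
print(error_count)  # 55, matching the error count shown for Disk 3

# Device sector from the "end_request: I/O error" line minus the first md sector.
offset = 5855562928 - first
print(offset)  # 64
```

On a live server the same function could be fed the contents of the syslog file instead of the sample string.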
February 28, 2017 That looks like an uncorrectable read error on the disk. There may or may not be one or more pending sectors; a SMART report would help diagnose. 19 minutes ago, WannaTheater said: "last parity check was about a year ago" Parity checks should be run more regularly; most people run one once a month.
February 28, 2017 Author Thanks. I just learned how to run a SMART report, which is attached. smart.txt
February 28, 2017 There are pending sectors, so you should replace that disk ASAP.
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10
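To make the reading of that SMART line explicit, here is a small Python sketch that splits it into fields. It assumes the usual `smartctl -A` column order (ID, name, flag, value, worst, threshold, type, updated, when failed, raw value); `parse_smart_attribute` is a hypothetical helper, not part of any tool.

```python
def parse_smart_attribute(line):
    """Split one attribute row into named fields (assumed smartctl -A layout)."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flag": fields[2],
        "value": int(fields[3]),      # normalized value
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "updated": fields[7],
        "when_failed": fields[8],     # "-" means the attribute has not failed
        "raw": int(fields[9]),        # raw count reported by the drive
    }

# The row quoted in the reply above.
row = "197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10"
attr = parse_smart_attribute(row)

# The normalized value (200) is still above the threshold (0), so the
# attribute is not flagged as failed, yet the raw count is non-zero.
has_pending = attr["name"] == "Current_Pending_Sector" and attr["raw"] > 0
print(attr["raw"], has_pending)
```

The point of the example is that the raw value in the last column (10 here) is the number to watch, even when the normalized value still looks healthy.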
February 28, 2017 Author Thank you for your help. I've done this before, but not in years. Shut down, physically replace the disk, and then I assume I can follow the prompts to rebuild it? Or do I need to do anything else to prepare?
March 3, 2017 Author Should the replacement drive be precleared? I thought I read a post stating that drives only need to be precleared when expanding the array. What if I am replacing a currently installed drive because it is showing signs of failure? Is the preclear necessary, or will it happen automatically? (I am running 5.0-rc16c.) My thinking is to preclear in a spare PC I have (booting from an unRAID USB), then swap out the bad drive for the precleared good one. (I am trying to prevent array downtime, as I have visitors this weekend.)
March 3, 2017 In this case the only purpose of preclear would be to test the disk. There are other ways to test the disk before use, including the manufacturer's diagnostics. And a successful rebuild followed by a successful non-correcting parity check would be a pretty good test in itself.
March 5, 2017 Author Purchased a WD Red to replace the failing WD Green. Used the WD tools to do a fast check, then a full check, then wrote to the full disk, then did another fast check. Everything tested OK. Followed the instructions to replace the drive. When re-assigning, I get a red bubble with "disc 3, Wrong." In Array Status: "Stopped. Replacement disk is too small." unRAID is running on an Intel Atom D525 with a Supermicro MBD-X7SPE-HF-D525-O motherboard. I'm reading the forums, trying to figure out where to go from here without messing up my data.
March 5, 2017 Author From the forum:
hdparm -N /dev/sdf
results in
/dev/sdf: max sectors = 5860531055/5860533168, HPA is enabled
On all other disks HPA is disabled. It looks like this would solve the problem:
hdparm -N p5860533168 /dev/sdf
Or is there a safer way?
Edited March 5, 2017 by WannaTheater
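As a rough illustration of why such a small HPA is enough to trigger "Replacement disk is too small", the Python sketch below parses the `hdparm -N` line quoted above and computes how many sectors the HPA hides. The helper name `hidden_by_hpa` is hypothetical, and the byte figure assumes 512-byte logical sectors, which is not stated in the thread.

```python
import re

# Matches "max sectors = <visible>/<native>" in hdparm -N output.
HPA_LINE = re.compile(r"max sectors\s*=\s*(\d+)/(\d+)")
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

def hidden_by_hpa(hdparm_output):
    """Return (visible, native, hidden) sector counts from hdparm -N output."""
    match = HPA_LINE.search(hdparm_output)
    if match is None:
        raise ValueError("no 'max sectors' line found")
    visible, native = (int(group) for group in match.groups())
    return visible, native, native - visible

# The line reported in the post above.
reported = "/dev/sdf: max sectors = 5860531055/5860533168, HPA is enabled"
visible, native, hidden = hidden_by_hpa(reported)
hidden_bytes = hidden * SECTOR_BYTES

print(hidden)        # 2113 sectors hidden by the HPA
print(hidden_bytes)  # 1081856 bytes, roughly 1 MiB
```

Only about 1 MiB is hidden, but the array requires the replacement to be at least as large as the original disk, so any shortfall at all is rejected. The native count (the second number) is the value passed to `hdparm -N p...` in the post.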
March 5, 2017 7 minutes ago, WannaTheater said: "Purchased a WD Red to replace the failing WD Green. Used the WD tools to do a fast check, then a full check, then wrote to the full disk, then did another fast check. Everything tested OK. Followed the instructions to replace the drive. When re-assigning, I get a red bubble with 'disc 3, Wrong.' In Array Status: 'Stopped. Replacement disk is too small.' unRAID is running on an Intel Atom D525 with a Supermicro MBD-X7SPE-HF-D525-O motherboard. I'm reading the forums, trying to figure out where to go from here without messing up my data." You must use a replacement disk at least as large as the original.
March 5, 2017 1 minute ago, WannaTheater said: "Thanks, but the replacement is 3TB, same as the original." Yes, I posted while you were finding the HPA. You must remove it to get the disk back to full size.
March 5, 2017 Author Thanks, trurl. Is it OK to just use the command line above to do it? Or must I remove the disk from unRAID and jump through hoops on a Windows PC, booting from Ultimate Boot CD, etc.?
March 5, 2017 Author The command completed successfully. I unassigned the device (it showed as missing), then reassigned it: still a red bubble and "replacement disc too small." Unassigned again, powered down, powered back on, and assigned the new drive. The red bubble still says WRONG, but the array is stopped and now it looks like I can move forward. Currently rebuilding. Thank you! Edited March 5, 2017 by WannaTheater