WannaTheater, posted February 28, 2017:

I've been a dedicated unRAID user for years. That being said, I am running unRAID Server Pro version 5.0-rc16c (my motto: if it ain't broke, don't fix it :)). I have a 3TB parity drive and 4x3TB data drives.

Today I noticed 55 errors on Disk 3. From the log, it appears this happened on 2/18. Here is an excerpt from the syslog:

Feb 18 20:25:47 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 20:25:47 Tower kernel: ata4.00: BMDMA stat 0x25
Feb 18 20:25:47 Tower kernel: ata4.00: failed command: READ DMA EXT
Feb 18 20:25:47 Tower kernel: ata4.00: cmd 25/00:00:68:ca:04/00:04:5d:01:00/e0 tag 0 dma 524288 in
Feb 18 20:25:47 Tower kernel: res 51/40:af:b0:cc:04/40:01:5d:01:00/e0 Emask 0x9 (media error)
Feb 18 20:25:47 Tower kernel: ata4.00: status: { DRDY ERR }
Feb 18 20:25:47 Tower kernel: ata4.00: error: { UNC }
Feb 18 20:25:47 Tower kernel: ata4.00: configured for UDMA/133
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf] Unhandled sense code
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]
Feb 18 20:25:47 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]
Feb 18 20:25:47 Tower kernel: Sense Key : 0x3 [current] [descriptor]
Feb 18 20:25:47 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Feb 18 20:25:47 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01
Feb 18 20:25:47 Tower kernel: 5d 04 cc b0
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf]
Feb 18 20:25:47 Tower kernel: ASC=0x11 ASCQ=0x4
Feb 18 20:25:47 Tower kernel: sd 4:0:0:0: [sdf] CDB:
Feb 18 20:25:47 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 5d 04 ca 68 00 00 04 00 00 00
Feb 18 20:25:47 Tower kernel: end_request: I/O error, dev sdf, sector 5855562928
Feb 18 20:25:47 Tower kernel: ata4: EH complete
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562864
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562872
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562880
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562888
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562896
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562904
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562912
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562920
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562928
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562936
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562944
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562952
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562960
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562968
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562976
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562984
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855562992
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563000
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563008
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563016
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563024
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563032
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563040
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563048
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563056
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563064
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563072
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563080
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563088
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563096
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563104
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563112
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563120
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563128
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563136
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563144
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563152
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563160
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563168
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563176
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563184
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563192
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563200
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563208
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563216
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563224
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563232
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563240
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563248
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563256
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563264
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563272
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563280
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563288
Feb 18 20:25:47 Tower kernel: md: disk3 read error, sector=5855563296
Feb 18 21:41:31 Tower kernel: mdcmd (108): spindown 0
Feb 18 22:28:02 Tower kernel: mdcmd (109): spindown 2
Feb 18 22:29:23 Tower kernel: mdcmd (110): spindown 1
Feb 18 22:29:33 Tower kernel: mdcmd (111): spindown 4
Feb 19 00:37:15 Tower kernel: mdcmd (112): spindown 3

Not being all that familiar with what to do when things go wrong, what should my next steps be, other than panic?
1) Copy all data to another drive, on or off the array?
2) Run a parity check? The last parity check was about a year ago.
3) Check the box "Correct any Parity-Check errors by writing the Parity disk with corrected parity"?
4) Replace the drive and rebuild?

Thanks!
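For anyone wanting to cross-check a log like the one above, here is a minimal Python sketch. It decodes the descriptor-format sense bytes the kernel printed to recover the failing LBA, and it collects the "md: diskN read error" sectors. The function names are hypothetical, and the 64-sector partition offset is an assumption inferred from this log alone (device sector 5855562928 in the end_request line vs. the first md sector 5855562864), not something the log states. Sense key 3 is a medium error, and ASC 0x11 / ASCQ 0x04 appear in the SCSI tables as an unrecovered read error (auto reallocate failed).

```python
import re

# Assumption: the data partition starts at sector 64; inferred from this log
# (end_request sector 5855562928 minus first md sector 5855562864).
PARTITION_START = 64

MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def md_read_errors(log_text: str) -> dict[str, list[int]]:
    """Collect sectors from 'md: diskN read error, sector=S' lines, grouped by disk."""
    errors: dict[str, list[int]] = {}
    for disk, sector in MD_READ_ERROR.findall(log_text):
        errors.setdefault(disk, []).append(int(sector))
    return errors


def decode_descriptor_sense(hex_bytes: str) -> dict[str, int]:
    """Decode descriptor-format SCSI sense data (response code 0x72/0x73).

    Returns sense key, ASC and ASCQ, plus the LBA from the Information
    descriptor (type 0x00) when one is present and marked valid.
    """
    data = bytes.fromhex(hex_bytes)
    if (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {"sense_key": data[1] & 0x0F, "asc": data[2], "ascq": data[3]}
    end = min(len(data), 8 + data[7])  # byte 7 = additional sense length
    pos = 8
    while pos + 1 < end:
        desc_type, desc_len = data[pos], data[pos + 1]
        if desc_type == 0x00 and desc_len >= 0x0A and data[pos + 2] & 0x80:
            # Information descriptor: bytes 4..11 hold a 64-bit big-endian LBA.
            result["lba"] = int.from_bytes(data[pos + 4 : pos + 12], "big")
        pos += 2 + desc_len
    return result


sense = decode_descriptor_sense(
    "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01 5d 04 cc b0"
)
print(sense)                           # {'sense_key': 3, 'asc': 17, 'ascq': 4, 'lba': 5855562928}
print(sense["lba"] - PARTITION_START)  # 5855562864
```

Fed the full excerpt, md_read_errors returns 55 sectors for disk3, stepping by 8 from 5855562864 to 5855563296, which lines up with the 55 errors reported on Disk 3.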
JorgeB, posted February 28, 2017:

That looks like an uncorrectable read error on the disk. It may or may not mean there are one or more pending sectors; a SMART report would help diagnose it.

Quoting WannaTheater: "last parity check was about a year ago"

Parity checks should be run more regularly; most people run one once a month.
WannaTheater (thread author), posted February 28, 2017:

Thanks. I just learned how to run a SMART report; it is attached. (Attachment: smart.txt)
JorgeB, posted February 28, 2017:

There are pending sectors, so you should replace that disk ASAP:

197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10
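The attachment itself isn't reproduced here, so the sketch below works only from the single attribute row quoted above. It assumes the usual smartctl attribute-table columns (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and pulls out the raw count, which is the number that matters for attribute 197. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One row of the smartctl attribute table (column layout assumed)."""
    attr_id: int
    name: str
    flag: int
    value: int
    worst: int
    thresh: int
    attr_type: str
    updated: str
    when_failed: str
    raw: str

    @property
    def raw_count(self) -> int:
        """Leading integer of RAW_VALUE (some raw values carry extra text)."""
        match = re.match(r"\d+", self.raw)
        if match is None:
            raise ValueError(f"non-numeric raw value: {self.raw!r}")
        return int(match.group())


def parse_attribute_row(line: str) -> SmartAttribute:
    # Split into at most 10 fields so RAW_VALUE keeps any embedded spaces.
    f = line.split(None, 9)
    if len(f) != 10:
        raise ValueError(f"unexpected attribute row: {line!r}")
    return SmartAttribute(
        int(f[0]), f[1], int(f[2], 16), int(f[3]), int(f[4]), int(f[5]),
        f[6], f[7], f[8], f[9],
    )


def has_pending_sectors(attr: SmartAttribute) -> bool:
    # Attribute 197 counts unreadable sectors the drive is waiting to remap.
    return attr.attr_id == 197 and attr.raw_count > 0


row = parse_attribute_row(
    "197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10"
)
print(row.raw_count, has_pending_sectors(row))  # 10 True
```

Note that the normalized VALUE (200) is still far above THRESH (000) here, so the drive would not flag itself as failing; the raw pending count of 10 is what prompts the advice to replace it.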
WannaTheater (thread author), posted February 28, 2017:

Thank you for your help. I've done this before, but not in years. Shut down, physically replace the disk, and then I assume I can follow the prompts to rebuild it? Or do I need to do anything else to prepare?
JorgeB, posted February 28, 2017:

https://lime-technology.com/wiki/index.php/Replacing_a_Data_Drive
WannaTheater (thread author), posted March 3, 2017:

Should the replacement drive be precleared? I thought I read a post stating that drives only need to be precleared when expanding the array. What if I am replacing a drive because the currently installed one is showing signs of failure? Is the preclear necessary, or will it happen automatically? (I am running 5.0-rc16c.)

My thinking is to preclear the new drive in a spare PC I have (booting from an unRAID USB stick), then swap out the bad drive for the precleared good one. I am trying to prevent array downtime, as I have visitors this weekend.
trurl, posted March 3, 2017:

In this case the only purpose of a preclear would be to test the disk. There are other ways to test the disk before use, including the manufacturer's diagnostics. And a successful rebuild followed by a successful non-correcting parity check would be a pretty good test in itself.
WannaTheater (thread author), posted March 5, 2017:

I purchased a WD Red to replace the failing WD Green. I used the WD tools to do a fast check, then a full check, then wrote to the full disk, then did another fast check. Everything tested OK.

I followed the instructions to replace the drive. When reassigning it, I get a red bubble on disk 3 marked "Wrong", and Array Status shows: "Stopped. Replacement disk is too small."

unRAID is running on an Intel Atom D525 with a Supermicro MBD-X7SPE-HF-D525-O motherboard. I'm reading the forums, trying to figure out where to go from here without messing up my data.
WannaTheater (thread author), posted March 5, 2017 (edited March 5, 2017):

From the forum: running

hdparm -N /dev/sdf

results in

/dev/sdf: max sectors = 5860531055/5860533168, HPA is enabled

On all the other disks, HPA is disabled. It looks like this would solve the problem:

hdparm -N p5860533168 /dev/sdf

Or is there a safer way?
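The hdparm output above can be read mechanically. Per the hdparm man page, `-N` with no value reports the current visible sector count followed by the drive's native count, a new value is given as a sector count, and a leading `p` makes the change permanent rather than volatile (the man page also warns that changing it is very dangerous). The sketch below, with hypothetical names, parses that report and only builds the command string; it does not run hdparm. Byte figures assume 512-byte logical sectors.

```python
import re
from typing import NamedTuple

# Matches hdparm's -N report, e.g. "max sectors = 5860531055/5860533168, HPA is enabled".
HDPARM_N = re.compile(r"max sectors\s*=\s*(\d+)/(\d+),\s*HPA is (enabled|disabled)")


class HpaStatus(NamedTuple):
    visible: int   # sectors the OS currently sees
    native: int    # the drive's real sector count
    enabled: bool  # True when a Host Protected Area hides part of the disk

    @property
    def hidden_sectors(self) -> int:
        return self.native - self.visible


def parse_hdparm_n(output: str) -> HpaStatus:
    m = HDPARM_N.search(output)
    if m is None:
        raise ValueError("unrecognized `hdparm -N` output")
    return HpaStatus(int(m.group(1)), int(m.group(2)), m.group(3) == "enabled")


def hpa_removal_command(device: str, status: HpaStatus) -> str:
    # Only builds the command string; nothing is executed here.
    # The "p" prefix requests a permanent (non-volatile) change.
    return f"hdparm -N p{status.native} {device}"


status = parse_hdparm_n("/dev/sdf: max sectors = 5860531055/5860533168, HPA is enabled")
print(status.hidden_sectors, status.hidden_sectors * 512)  # 2113 1081856
print(hpa_removal_command("/dev/sdf", status))             # hdparm -N p5860533168 /dev/sdf
```

The generated command matches the one proposed in the post. The kernel generally reads a disk's capacity when it detects the disk, which is presumably why a power cycle was needed before unRAID saw the new size.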
trurl, posted March 5, 2017, replying to WannaTheater's "Replacement disk is too small" post:

You must use a replacement disk at least as large as the original.
WannaTheater (thread author), posted March 5, 2017:

Thanks, but the replacement is 3TB, the same as the original.
trurl, posted March 5, 2017, replying to WannaTheater ("replacement is 3TB, same as the original"):

Yes, I posted while you were finding the HPA. You must remove it to get the disk back to its full size.
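Both statements above are true at once: the two disks are "3TB", yet the new one is too small while the HPA is in place. This sketch shows the arithmetic, assuming 512-byte sectors and assuming the failing WD Green exposed the full 5,860,533,168 sectors that hdparm reports as native for the new WD Red (the thread never states the old disk's exact sector count). The names are hypothetical.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors, as hdparm counts them

# Assumption: the original 3 TB disk exposed its full native size, the same
# sector count hdparm reports as native for the replacement.
ORIGINAL_SECTORS = 5_860_533_168


def can_replace(replacement_sectors: int, original_sectors: int) -> bool:
    """A rebuild needs the replacement to be at least as large as the disk it replaces."""
    return replacement_sectors >= original_sectors


def shortfall_bytes(replacement_sectors: int, original_sectors: int) -> int:
    """How many bytes the replacement is missing (0 if it is large enough)."""
    return max(0, original_sectors - replacement_sectors) * SECTOR_BYTES


def as_tb(sectors: int) -> str:
    """Decimal terabytes, the way drive capacities are marketed."""
    return f"{sectors * SECTOR_BYTES / 1e12:.2f} TB"


for label, sectors in [("with HPA", 5_860_531_055), ("HPA removed", 5_860_533_168)]:
    print(label, as_tb(sectors), can_replace(sectors, ORIGINAL_SECTORS),
          shortfall_bytes(sectors, ORIGINAL_SECTORS))
# with HPA 3.00 TB False 1081856
# HPA removed 3.00 TB True 0
```

Both sizes round to 3.00 TB, so the labels match, but roughly 1 MB hidden by the HPA is enough for the size check to fail.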
WannaTheater (thread author), posted March 5, 2017:

Thanks, trurl. Is it OK to just use the command line above to do it? Or must I remove the drive from unRAID and jump through hoops on a Windows PC, boot using Ultimate Boot CD, etc.?
WannaTheater (thread author), posted March 5, 2017 (edited March 5, 2017):

The command completed successfully. I unassigned the device (missing), then reassigned it: still a red bubble, and "Replacement disk is too small". I unassigned it again, powered down, powered back on, and assigned the new drive. The red bubble still says "Wrong", but the array is stopped and now it looks like I can move forward.

Currently rebuilding. Thank you!