September 15, 2019

Hi, some backstory. I added a new disk8 and was clearing it when the connection to my disk1 died, so I didn't format disk8 and wanted to shut down the server to fix the disk1 connection. After I managed to fix it (apparently the cable died for some reason), it prompted for a data rebuild (even though I don't think it was required).

Now, while it went through the parity check/data rebuild process, several health issues popped up on various drives. Disk 2 had a Current Pending Sector count of 1 but a Reallocated Event Count of 0 (it didn't grow further, though). Disk 7 also had its UDMA CRC Error Count grow from 4 to 10 (I am going to try replacing this SATA cable with a new one).

The data rebuild completed, but with 54 errors! Does it mean some parts of the data were not restored? Or does it mean the parity drive was updated? I'm a bit worried here... usually it's 0 errors.

Attached logs.
tower-diagnostics-20190915-1235.zip
tower-syslog-20190915-1251.zip

Edited September 15, 2019 by CyberMew: add syslog
September 15, 2019 Community Expert

Without dual parity, errors on another disk during a rebuild mean there will be some corruption on the rebuilt disk, unless you were lucky and there wasn't any data on those sectors.
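To illustrate the point above, here is a minimal sketch of single-parity reconstruction. It assumes parity is a plain byte-wise XOR across the data disks, and it models an unreadable sector as zeros; both are simplifying assumptions for illustration. The function name and the toy 4-byte "sectors" are hypothetical and are not taken from Unraid itself.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: three data disks, one tiny "sector" each (hypothetical data).
disk1 = b"\x11\x22\x33\x44"
disk2 = b"\xaa\xbb\xcc\xdd"
disk3 = b"\x01\x02\x03\x04"
parity = xor_blocks([disk1, disk2, disk3])

# Rebuilding disk1 needs every other data disk plus parity to be readable.
rebuilt_ok = xor_blocks([disk2, disk3, parity])

# If disk2 cannot be read, there is no valid input for that sector.
# Assumption: the unreadable sector is modelled as zeros.
disk2_unreadable = bytes(len(disk2))
rebuilt_bad = xor_blocks([disk2_unreadable, disk3, parity])

print(rebuilt_ok == disk1)   # True: clean rebuild
print(rebuilt_bad == disk1)  # False: the rebuilt sector is corrupt
```

With only one parity equation per sector, two unknowns (the disk being rebuilt and the unreadable sector) cannot both be solved for, which is why a second parity disk helps in this situation.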
September 15, 2019 Author

I see... yeah, I didn't have dual parity because I remember reading some limitations or some issues with it early on. Is it highly suggested and problem free to add a second parity drive nowadays? Will it prevent this kind of rebuild error in the future?

I'm now doing a parity check (without writing corrections) to see what happens. Currently 14% in with no sync errors detected.

Is it possible to check Disk1 again to make sure the parity drive writes to it correctly? I'm afraid because in the same session I added a new drive, and it sort of screwed up. To confirm, a correcting parity check does nothing now if the data on disk1 is already not 100% error free?

What are my moves here? Possible to recommend me what to do next? Is it also possible to find out what files were affected by this?

Edited September 15, 2019 by CyberMew
September 15, 2019 Community Expert

13 minutes ago, CyberMew said: "I remember reading some limitations or some issues with it early on."

Don't remember that.

13 minutes ago, CyberMew said: "Is it highly suggested and problem free to add a second parity drive nowadays? Will it prevent this kind of rebuild error in the future?"

Problem free, yes; it always was AFAIK. Whether it's suggested depends mostly on array size, but I say it's a very small price to pay for the added redundancy even for smaller arrays, and yes, it would save you from a situation like this in the future.

16 minutes ago, CyberMew said: "Is it possible to check Disk1 again to make sure the parity drive writes to it correctly?"

You can try rebuilding again, but disk2 appears to be failing, so it will likely have the same or even more errors.

17 minutes ago, CyberMew said: "To confirm, a correcting parity check does nothing now if the data on disk1 is already not 100% error free?"

You don't want to run a correcting check with a known bad disk; it can corrupt parity.

18 minutes ago, CyberMew said: "What are my moves here? Possible to recommend me what to do next? Is it also possible to find out what files were affected by this?"

Replace disk2. The only way to find out which files are affected on disk1 would have required creating checksums before this happened, or using btrfs.
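As a rough sketch of the "checksums before this" idea: record a hash for every file while the disk is known to be good, then compare against a fresh manifest after an incident. This is a generic illustration using Python's standard `hashlib`, not a specific Unraid plugin; the helper names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so large media files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Files from the old manifest that are now missing or hash differently."""
    return sorted(
        name for name, digest in before.items() if after.get(name) != digest
    )
```

In use, one would call `build_manifest` on the disk's mount point while it is healthy, store the result somewhere off that disk, and after a rebuild pass the old and new manifests to `changed_files`. Files legitimately edited in between will also show up, so the manifest needs refreshing whenever data changes.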
September 15, 2019 Author

Unfortunately I am using XFS and not BTRFS... well, I guess my disk1 data is now set in stone. I assume 54 errors means 54 sectors not filled in/corrupted, and assuming they are 4K aligned, that's 216 KB of (non-)continuous data lost?

As I am still running the parity check (non-correcting), I should not be seeing any errors since disk1 is already set in stone, so can I cancel it now?

I will:
1. Replace the SATA cable/controller for disk7.
2. Order 2x 10TB drives: one to add as second parity, another to replace disk2.

Are there any instructions on how best to do this together? Actually, since my brand new disk8 is still unformatted and unused, can I convert it to the second parity drive for now? Or would you recommend I replace disk2 first?

In the meantime, do you think my array is safe to use normally (if no more critical errors appear)?

Thank you very much for your help.
September 15, 2019 Community Expert

They were on continuous sectors, and each sector in this case is 512 bytes, so it's just a little data. If you have mostly media files, it will most likely translate to a little glitch during playback of a single file.

I would replace disk2 first and add dual parity later; you can't do both at the same time.
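For reference, the arithmetic behind the two size estimates discussed so far can be sketched as follows. The error count of 54 comes from the rebuild report; whether each error stands for a 512-byte sector or a 4 KiB unit is exactly the open question, so both are computed. The function name is hypothetical.

```python
def affected_bytes(error_count: int, unit_size: int) -> int:
    """Upper bound on affected data if each error covers one unit."""
    return error_count * unit_size


ERRORS = 54  # from the rebuild report in this thread

for label, unit in (("512-byte sectors", 512), ("4 KiB units", 4096)):
    total = affected_bytes(ERRORS, unit)
    print(f"{label}: {total} bytes = {total / 1024:.0f} KiB")
# 512-byte sectors: 27648 bytes = 27 KiB
# 4 KiB units: 221184 bytes = 216 KiB
```

Either way the amount is small relative to a typical media file.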
September 15, 2019 Author

Got it. I hope it's really part of some media files; about 27 KB (54 × 512 bytes) will be insignificant if so.

Is there a set of instructions that I can refer to for replacing disk2? It doesn't seem straightforward. I need to remove disk8 completely (update the parity drive?), then remove disk2, and put the 10TB disk (from disk8) in its place (the disk2 slot). Is this correct? Do I need to create a new config? 😱
September 16, 2019 Community Expert

Removing disk8 while there is a known bad disk in the array is not a good option; you should use a new disk to replace disk2.
September 16, 2019 Author

OK. I'll proceed to format my disk8 then and make it usable. In the meantime I have ordered a new 10TB drive to replace disk2. Hopefully the next rebuild on disk2 will be error free!

By the way, my parity check completed without issues. So I guess that's good(?). I will update here if things go wrong. Thanks a lot for your help!
September 16, 2019 Community Expert

8 minutes ago, CyberMew said: "by the way my parity check completed without issues."

Without either read errors on disk2 or sync errors? One or the other would be expected.
September 16, 2019 Author

Yes, without any read errors on disk2 or sync errors. I'll generate diagnostics for reference in a moment. Reaching home soon.
September 16, 2019 Community Expert

I guess that could happen if there was no data on the rebuilt disk sectors where there were read errors before; that's the only way it would make sense to me.

Edited September 16, 2019 by johnnie.black
September 16, 2019 Author

🤷‍♂️ Or maybe it could be a bug somewhere... Anyway, I have attached fresh diagnostics taken after the parity check, in case they are useful.

tower-diagnostics-20190917-0046.zip
September 16, 2019 Community Expert

20 minutes ago, CyberMew said: "or maybe it could be a bug somewhere.."

Don't think so. The only explanation that makes sense is that there wasn't any data on the sectors Unraid couldn't read. I would expect it to write zeros to those sectors on the rebuilt disk, and if there are no sync errors now, it means those sectors were already all zeros before, i.e., no data.
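The reasoning above can be checked with a small sketch. It assumes XOR single parity and, following the expectation stated in the post (not confirmed behaviour), that the rebuild writes zeros where it could not compute a sector. Sector contents are reduced to single integers, and all names and values are hypothetical.

```python
def parity_of(*sectors: int) -> int:
    """XOR parity over one sector position, each sector reduced to an int."""
    result = 0
    for sector in sectors:
        result ^= sector
    return result


def check_passes_after_rebuild(original_disk1: int, disk2: int, disk3: int) -> bool:
    """Would a later parity check report no sync error at this position?"""
    # Parity as it was computed before disk1 dropped out.
    parity = parity_of(original_disk1, disk2, disk3)
    # Assumption: the rebuild wrote zeros because disk2 was unreadable then.
    rebuilt_disk1 = 0
    # During the later check, disk2 is readable again.
    return parity_of(rebuilt_disk1, disk2, disk3) == parity


print(check_passes_after_rebuild(0x00, 0xAB, 0x5C))  # True: sector was empty
print(check_passes_after_rebuild(0x7F, 0xAB, 0x5C))  # False: sector held data
```

Under these assumptions, a clean non-correcting check is only consistent with the original disk1 sectors having been all zeros.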
September 29, 2019 Author

I got a new drive recently and was trying to replace disk2, but near the middle of the rebuild it seems that my USB drive disconnected/died. Is the restoration process still going on, or should I just shut it down and check/replace the USB drive?

syslog
September 29, 2019 Author

This is the first time the USB drive has disconnected. I ended up rebooting the server, and all seems OK again. Disk2 is rebuilding from 0%. Attached diagnostics from after the reboot.

tower-diagnostics-20190930-0141.zip
October 5, 2019 Author

The USB drive is my flash/boot drive; I hope it's not dying. Any way to check?

Ever since I replaced the data cables and replaced disk2 with a bigger drive, the array has been working fine. I will order a second drive as parity once the drives are on sale. But before that, I need to clear my old disk2 before I send it off for recycling. Is it possible to clear the disk using Unraid via a USB3 cable or just a direct cable, without adding it to the array?
October 5, 2019 Community Expert

17 minutes ago, CyberMew said: "Is it possible to clear the disk using unraid via a usb3 cable or just direct cable, without adding it onto the array?"

Yes, use the preclear script, plugin or docker.
October 22, 2020

On 9/15/2019 at 12:54 PM, JorgeB said: "They were on continuous sectors,"

In which of the diagnostic files can I find this information?

Edited October 22, 2020 by mgutt
October 22, 2020 Community Expert

42 minutes ago, mgutt said: "In which of the diagnostic files can I find this information?"

In the syslog, though I should have said continuous blocks. Parity is checked on a standard 4k Linux block, and each block has 8 sectors (for standard 512e drives), so when the errors are logged for every 8th sector, they are on continuous blocks.
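The "every 8th sector" observation can be sketched as a quick check. The sector numbers below are made up for illustration and are not taken from the attached syslog; the 512-byte sector and 4096-byte block sizes are the ones stated in the post above, and the function name is hypothetical.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 4096
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE  # 8


def blocks_are_contiguous(sectors):
    """True if the logged sectors fall on consecutive 4 KiB blocks."""
    blocks = [s // SECTORS_PER_BLOCK for s in sorted(sectors)]
    return all(later - earlier == 1 for earlier, later in zip(blocks, blocks[1:]))


# Hypothetical sector numbers, one logged error per block.
print(blocks_are_contiguous([1000, 1008, 1016, 1024]))  # True
print(blocks_are_contiguous([1000, 1008, 2048]))        # False
```

Sector numbers that step by 8 map to block numbers that step by 1, which is what "continuous blocks" means here.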