Acps Posted November 26, 2019

So I'll go ahead and say it: I was clicking buttons I didn't understand, lol. This is what I did: unraid-syslog-20191126-1931.zip unraid-diagnostics-20191126-1919.zip

So one of my 5TB drives is definitely dead dead, but I'm comfortable holding out till next week to maximize Cyber Week deals. What I was messing around with, though, was my cache pool. I had 4 SSDs in RAID 0: 2x 250GB and 2x 128GB. I yanked one last night for another PC I needed access to, and the setup was overkill with 4 anyway. But I was having issues rebuilding my btrfs cache pool with the 3 SSDs I have left. Luckily I had my appdata folders backing up every week, so the only loss from the cache drives would be whatever was still waiting for the mover schedule to kick in.

But now I'm afraid to touch it. I'm fairly certain I haven't totally bricked all my data yet, and I think that, done correctly, this is still salvageable, but one more screwup and I could be dead in the water, losing roughly 20TB of data, a lot of it irreplaceable. I spent some time trying to figure it out on my own, but I'm worried I could misinterpret something and still ruin all the data. So I wanted to check in with the pros here who can point me in the right direction to salvage my data. Thanks in advance for the input! ~Acps
ashman70 Posted November 26, 2019

Do you have a backup or second copy of your data?
Frank1940 Posted November 26, 2019

I would suspect that the unformatted 5TB drive (in the unassigned section) is actually your parity drive. The other six unassigned drives, which are formatted, are the data drives. I would also suspect that the TOSHIBA MD04ACA500 (sdg) drive with the serial number ending in FS9A is on its last legs.

Do you think that your parity was valid before this hiccup occurred? That would affect how you might approach the recovery operation. If parity was good, then you could just rebuild this disk onto a replacement. But be careful with what you do, as you could lose that option by not doing it correctly. (In my opinion, the last thing you want to do at this point is attempt to rebuild parity with that disk (serial# ...FS9A) installed, as I suspect that the parity operation will fail.)
Acps Posted November 29, 2019 (Author)

On 11/26/2019 at 3:59 PM, ashman70 said: "Do you have a backup or second copy of your data?"

I wish, but my array was 25TB; I thought by running dual parity I was protecting my data. I wasn't expecting to mess up the array by getting all the disks unassigned from their slots. I wish I had the internet connection to back it up to the cloud, but for me to upload 25TB at a speed of 0.75 MB/s would take roughly 300+ days, and no one would be able to use the internet during that time, lol.

I haven't tried to start the array since the disks got unassigned. I haven't even tried to reboot anything. I thought it could be put back together if done correctly, but I don't want to try anything out of fear of the one mistake that would cost me all the data. The cache pool data is gone and I'm fine with that; I backed up my appdata weekly, assuming I can recover the array. I did put my HDDs back in the same slots they were in before.

Now Disk2, which is sdg, is definitely bad; it's been dead dead for several weeks. I tried 3-4 times to rebuild the array with it to see if it was a fluke, but it always ended up failing and being emulated. I had dual parity, though, so I wasn't worried. Parity2, sde, the drive that looks unformatted, was the other drive I had an issue with, but that was a one-time fluke: it hasn't had a single issue since it reported a few errors.

Disk2: sdg
Parity2: sde

Looking at the SMART health, though, it's showing they both have errors. But I think that's because they got unassigned and reassigned, and because I acknowledged the errors after each rebuild attempt to see if they'd occur again. Parity2 (sde), the one showing unformatted, was confusing me too, but once I assigned it back into its slot in the array it showed as formatted just fine. As far as I know, my parity was always valid; I cancelled a few parity syncs by hand.
But I back up all those logs as well. My syslog, assuming a clean shutdown, always backs up; I've got 2 years of them so far, lol. Here are the rest of the syslogs from Nov 26, which is when this all occurred: syslog-20191126-100643.txt syslog-20191126-105428.txt

So after all the pics and walls of text, I hope that clarifies what I did, and hopefully I just need to start the array and let the parity sync run. But I was worried because in the first pic I posted, of the drives assigned but not mounted, the two RAID arrows are contradicting each other. That red text saying I'd lose all my data didn't pop up until after I assigned both parity drives and then assigned Disk1. So I'm afraid to try it until I hear back from someone who has a much better understanding of this than me. Thank you for the help so far, and hopefully someone can confirm I can rebuild it just fine.

If that's not the case, I would want to try and recover as much data as possible from the disks without rebuilding the array and possibly writing over the good data that's still there. ~Acps
ashman70 Posted November 29, 2019

You may have heard this before, but RAID (or unRAID) is not a backup, meaning you should always have a second copy of your data. I know this isn't easy for many people, as it costs $$, but if your data is worth anything to you, then please seriously consider it.
Frank1940 Posted November 29, 2019

10 hours ago, ashman70 said: "You may have heard this before, but RAID (or unRAID) is not a backup, meaning you should always have a second copy of your data."

Or divide your data into two categories: Irreplaceable Data and everything else. Make sure you have a well-thought-out strategy for the Irreplaceable Data, with an off-site location being part of that strategy. (I personally use portable hard drives stored in a safety deposit box for off-site.)
Frank1940 Posted November 29, 2019

I think you can recover from this without data loss. Do you have the replacement disks yet? The procedure is relatively simple, but it has to be done in exactly the proper sequence. I am going to ping @johnnie.black, as he has led several other folks through it.

Now for a word of caution: never run for several weeks depending on dual parity to bail you out. Either fix the problem immediately or power the server down until you can address it. The principal advantage of dual parity is that it protects against a second disk failure occurring during the recovery process from the first failure!
JorgeB Posted November 29, 2019

The easiest way forward is to check "parity is already valid" and start the array, assuming all assignments are known to be correct. Then, if you already have a spare to replace disk2, do it now; if not, unassign disk2 and use the array like that until you have a replacement, but try to replace it as soon as possible.
Acps Posted December 1, 2019 (edited)

OK, so I was able to bring my array back online without disk2, checking "parity is valid"; however, disk2's data isn't being emulated like it was before. I do have a replacement HDD that'll be here Tuesday. Is the data stored on disk2 still recoverable? Or how do I emulate the data from the parity drives?

Edited December 1, 2019 by Acps
Acps Posted December 1, 2019

On 11/29/2019 at 8:34 AM, Frank1940 said: "Or divide your data into two categories: Irreplaceable Data and everything else. Make sure you have a well-thought-out strategy for the Irreplaceable Data, with an off-site location being part of that strategy."

I knew I was at risk with one disk offline, and the thought had crossed my mind to power down till I had a replacement. I know good practice is to have an off-site backup for disaster recovery, whether on LTO tapes or in the cloud; I just never thought I'd be the cause of the disaster, lol. But I didn't think the cost was worth it to back up 25TB of data. It didn't occur to me that backing up just the sensitive data would be another option. Most of my data is media/software, which, while important, is replaceable; my irreplaceable data is mostly medical/legal/personal documents. So I think I can easily set up some new shares and move data around so I can back up to an external drive, or even online with the right encryption. I rarely delete anything and would be considered a data hoarder by far.

I was an IT in the Coast Guard for 10 years with TS/SCI clearance, so I got to work on enterprise-level networks, and when I got out this started as a hobby. But I was never involved with the implementation of our networks, just administering and maintaining them. A mistake like this would have cost me my job in the military, possibly with criminal charges for negligence. So hopefully I can recover from this, learn, and make changes so it doesn't happen again. I was waiting for Black Friday/Cyber Monday deals to pick up a few drives, to replace the 2 I had issues with as well as a spare or 2 for the future.
Frank1940 Posted December 1, 2019

1 hour ago, Acps said: "Ok so i was able to bring my array back online without disk2, and checking parity is valid, however disk2 data isnt being emulated like it was before. I do have a replacement hdd thatll be here tuesday. Is the data being stored on disk2 still recoverable?"

Stop at this point. Don't allow any writes to the array. What happened to the physical disk2? Did you take it out? I believe you should have included it in the assignment of disks if the server could detect it (even if it is bad!), so that its contents could be emulated. I am going to ping @johnnie.black again, but I believe he is in Europe, so he probably won't see this until tomorrow. (I have had very few issues with my servers, so I don't have much experience fixing the more complex problems. He seems to have many servers running and has accumulated a lot of knowledge about how to deal with these types of things.)
itimpi Posted December 1, 2019

1 hour ago, Acps said: "Ok so i was able to bring my array back online without disk2, and checking parity is valid, however disk2 data isnt being emulated like it was before. I do have a replacement hdd thatll be here tuesday. Is the data being stored on disk2 still recoverable? Or how do i emulate the data from the parity drives?"

If disk2 is not being emulated, then its data is not recoverable by simply rebuilding onto a new disk. Without knowing exactly what steps you have already taken to get your array operable on the other drives, it is impossible to determine whether there is any path forward that might lead to the contents of disk2 being recovered, or whether you have already made this impossible.
Acps Posted December 1, 2019

On 11/29/2019 at 8:59 AM, johnnie.black said: "The easiest way forward is to check 'parity is already valid' and start the array, assuming all assignments are known to be correct. Then, if you already have a spare to replace disk2, do it now; if not, unassign disk2 and use the array like that until you have a replacement, but try to replace it as soon as possible."

I literally did this. Since I did not have a spare, I unassigned disk2. I didn't remove it; it's still sitting in my server, plugged in, just as an unassigned disk.
itimpi Posted December 1, 2019

It sounds as if you did not start the array with disk2 assigned, then stop it to unassign the disk, and then restart the array? That sequence would have left you with disk2 being emulated. If you did the unassignment without that initial start, then it would not be emulated (and parity would be invalid). Has anything been written to the array in the meantime? Have you tried to rebuild parity?
Acps Posted December 1, 2019 Author Share Posted December 1, 2019 THats exactly what i did, i guess i missunder stood johnny. i did some appdata backups, but as far as i know thats all to the cache drive. so i dont think anything beens wrriten array. Will rebuilding the parity make disk2 data unrecoverable? If theres no one to salvage the raid, as a last resort id like to try and pull the drive and see if i can recover any data off what was written to it if any. Quote Link to comment
itimpi Posted December 1, 2019

25 minutes ago, Acps said: "Will rebuilding the parity make disk2 data unrecoverable?"

Yes.

Appdata backups are normally made to the array (in case there is a problem with the cache drive), so check what you have set for the target. If it is the array, then it will no longer be possible to rebuild disk2, and you should rebuild parity to make sure it matches your current drive assignments in case another drive has issues (I suspect that at the moment it does not match).

25 minutes ago, Acps said: "If theres no one to salvage the raid, as a last resort id like to try and pull the drive and see if i can recover any data off what was written to it if any."

Can the drive be mounted when it is shown as an Unassigned Device?
JorgeB Posted December 1, 2019

You should be able to mount the old disk2 with the UD plugin and copy everything you can; alternatively, you can clone it with ddrescue.
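The ddrescue route can be sketched roughly as follows. This is a hedged example, not a tested recipe for this server: the device names and map-file path are assumptions (verify yours with lsblk and serial numbers first), and the commands are only echoed here, since actually running them overwrites the destination disk.

```shell
# Hypothetical device names -- confirm against lsblk/serial numbers first.
SRC=/dev/sdg                  # the failing old disk2 (assumed)
DST=/dev/sdh                  # a blank disk of equal or larger size (assumed)
MAP=/boot/ddrescue-disk2.map  # map file lets an interrupted copy resume

# Pass 1 (-n): copy everything readable, skipping bad areas quickly.
# Pass 2 (-r3): go back and retry the bad areas up to 3 times.
# -f is required because the destination is a block device.
CMD1="ddrescue -f -n $SRC $DST $MAP"
CMD2="ddrescue -f -r3 $SRC $DST $MAP"

# Echoed instead of executed: cloning is destructive to the destination.
echo "$CMD1"
echo "$CMD2"
```

Once cloned, the copy can be mounted read-only for file recovery, leaving the failing original untouched.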
Acps Posted December 1, 2019 Author Share Posted December 1, 2019 Cant mount,no file system and unraids MBR is missing so I think data recovery is no longer an option. Atleast I didn't loose my entire array. Frustrating part is not knowing exactly what I lost. Quote Link to comment
Acps Posted December 1, 2019 Author Share Posted December 1, 2019 To make things worse, Disk 1 and 3 now are unmountable with no file system present. After trying to add disk 2 back to the array: Niether disk shows any errors and smart reports are clean. I got a feeling the raid controller i was told needed to be replace is coming back to haunt me... unraid-diagnostics-20191201-1755.zip Quote Link to comment
JorgeB Posted December 2, 2019

The controller driver crashed; reboot and post new diags.
Acps Posted December 2, 2019

I believe in you, Johnnie; resurrect my data as much as possible! I've got 2x 12TB drives ordered for the server as future backups. unraid-diagnostics-20191202-1058.zip
JorgeB Posted December 2, 2019

Forgot to say: we need diags after the array is started.
Acps Posted December 2, 2019 Author Share Posted December 2, 2019 unraid-diagnostics-20191202-1148.zip Quote Link to comment
JorgeB Posted December 2, 2019

Check the filesystem on disks 1 and 3: https://wiki.unraid.net/Check_Disk_Filesystems
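From the command line, the read-only check that page describes boils down to something like the following. A hedged sketch: the md device numbers assume which slots disk1 and disk3 occupy, and the commands are echoed rather than run, since they require the array to be started in maintenance mode.

```shell
# xfs_repair -n only inspects and reports; it modifies nothing.
# Running it against /dev/mdX (the parity-protected device for that
# array slot) rather than the raw /dev/sdX device keeps parity in sync.
# Slot numbers here are assumptions.
CHECK_DISK1="xfs_repair -n /dev/md1"
CHECK_DISK3="xfs_repair -n /dev/md3"

# Echoed instead of executed: these need maintenance mode to run safely.
echo "$CHECK_DISK1"
echo "$CHECK_DISK3"
```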
Acps Posted December 2, 2019

Disk1:

Phase 1 - find and verify superblock...
        - block cache size set to 1496680 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3466131 tail block 3425645
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:3467420) is ahead of log (1:3466131).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Mon Dec 2 14:22:18 2019

Phase           Start           End             Duration
Phase 1:        12/02 14:21:58  12/02 14:21:58
Phase 2:        12/02 14:21:58  12/02 14:21:59  1 second
Phase 3:        12/02 14:21:59  12/02 14:22:10  11 seconds
Phase 4:        12/02 14:22:10  12/02 14:22:10
Phase 5:        Skipped
Phase 6:        12/02 14:22:10  12/02 14:22:18  8 seconds
Phase 7:        12/02 14:22:18  12/02 14:22:18

Total run time: 20 seconds

Disk3:

Phase 1 - find and verify superblock...
        - block cache size set to 1504136 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1594967 tail block 1594963
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:1595140) is ahead of log (1:1594967).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Mon Dec 2 14:30:26 2019

Phase           Start           End             Duration
Phase 1:        12/02 14:30:05  12/02 14:30:05
Phase 2:        12/02 14:30:05  12/02 14:30:05
Phase 3:        12/02 14:30:05  12/02 14:30:16  11 seconds
Phase 4:        12/02 14:30:16  12/02 14:30:16
Phase 5:        Skipped
Phase 6:        12/02 14:30:16  12/02 14:30:26  10 seconds
Phase 7:        12/02 14:30:26  12/02 14:30:26

Total run time: 21 seconds

So do I go ahead and run the xfs_repair tool with these results?
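Since both read-only checks report only the log-replay warning that -n itself causes, the usual follow-up is to repeat xfs_repair without -n so the journal gets replayed during the repair. This is a sketch of that step under assumptions (placeholder slot numbers, commands echoed rather than run), not confirmed advice for this specific array; -L is strictly a last resort.

```shell
# Real repair: the same command minus -n, run in maintenance mode on
# the parity-protected md device. Slot numbers are assumptions.
REPAIR="xfs_repair /dev/md1"          # repeat with /dev/md3 for disk3

# Only if xfs_repair aborts because the log is dirty and the filesystem
# cannot be mounted to replay it: -L zeroes the log. That can lose the
# few in-flight transactions and may leave files in lost+found.
LAST_RESORT="xfs_repair -L /dev/md1"

# Echoed instead of executed: these write to the filesystem.
echo "$REPAIR"
echo "$LAST_RESORT"
```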