[SOLVED] Double Data Drive Failure During Parity Check (Dual Parity)



6 minutes ago, Keexrean said:

I'm trying to make the script foolproof and caffeine-deprivation-proof. And I'm having fun, hence why it's taking some time, I'm writing a lot of stuff :D

Error handling, syntax, case sensitivity, menus, etc.!

Sweet. Take your time. Looking forward to seeing the finished product.

 


Mmmmh... when running on the cache it seems to stop listing folders after bumping into the dolphin docker folder, and skips to the array.

Any idea on that? You think it's maybe because it's denied access? I have to find a way to make it handle denied-access issues. (It also pooped out on disk2 because of the .Trash folder.)

 

Also, yeah, I see why you see cache as a share, but I see it more as a disk, since I have a RAID1 cache pool and actually have some shares that live solely on the cache, not the array (including appdata, system, a "CachedDomains" folder for some VMs, etc.).

Edited by Keexrean
17 minutes ago, Keexrean said:

Mmmmh... when running on the cache it seems to stop listing folders after bumping into the dolphin docker folder, and skips to the array.

Any idea on that? You think it's maybe because it's denied access?

 

Also, yeah, I see why you see cache as a share, but I see it more as a disk, since I have a RAID1 cache pool and actually have some shares that live solely on the cache, not the array (including appdata, system, a "CachedDomains" folder for some VMs, etc.).

Yeah I can see both sides. Yes, when it stops it most likely is a permissions issue. I've seen the same thing on my cache share script. Certain appdata folders can cause issues.

 

For data disks, I usually run the "Docker Safe New Perms" script to reset permissions across the array. It is safe, and it resolves similar issues, especially when uploading files to the array from OSX.

 

[screenshot]

 

[screenshot]

 

You also might be running into issues if certain dockers are running and have file locks. I usually run my cache share script when all dockers are disabled and the array is stopped for that reason.

 

 

Edited by falconexe

Oh well, I knew about this option, but I'll concede to messing up some perms I set myself for the sake of that script, and I'll let you know ;)

 

 

Edit: Dockers stopped, perms reset, Dolphin's share still denying access. I feel like this one will tickle my pickle quickly and yeet off of my server.

Edited by Keexrean
23 minutes ago, Keexrean said:

Oh well, I knew about this option, but I'll concede to messing up some perms I set myself for the sake of that script, and I'll let you know ;)

 

 

Edit: Dockers stopped, perms reset, Dolphin's share still denying access. I feel like this one will tickle my pickle quickly and yeet off of my server.

Hmm. I don't use that docker so I am stumped. Any chance you could skip the cache share for now and get the main disk script working fully dynamically?


Update: I yeeted Dolphin out. And no, it won't do.

 

Really have to figure out how to make the dir command handle access-denied issues better, because another docker is in the way now in the appdata folder: the nginx folder of my ownCloud.

 

Question: Have you managed to audit the appdata folder in the past?

Question 2: If yes, are you sure it didn't just skip to the next folder, and thus mess up sections of your audit?

Wondering: it might be because I'm using it through the array share, and not individual disk shares.

 

Update: Activated Disk Shares, still some folders denied. For the nginx folder, I totally understand why. But it looks like your audit may have suffered the same issue then, because so far I'm using the same dir /s /b command as you do.

Edited by Keexrean

Well, I'll take a break. Don't worry, I'll dive back into it when I have a bit of time and energy to spare.

 

Issue is just that DIR has no way to handle lack of access to a folder it can see. It just craps out. The findstr tricks and such that I tried only affect the output... which doesn't solve the issue.

 

I caved in and went QQ on Stack Overflow, to see what their collective mindfulness can eventually give as propositions.


Yeah, I just performed a full audit on my appdata share without any issues. Natively my appdata folder is about 30GB in size. For the audit results, it produced txt files hundreds of MB in size with millions of records/lines inside each (both Raw and Meta). I have different dockers than you, so I must have ones that work without throwing exceptions.

 

Let me know if you figure it out. Sorry you are running into issues...🤷‍♂️

 

I am very interested to see how you loop through a set number of disks via a single variable though. If you figure out that snippet of code, please send it my way. I can incorporate it back into my script and send it out to the masses if you are cool with that.

Edited by falconexe

Freaking Sweet. So after about 10 minutes of Google searching, I was able to solve the dynamic loop question. I can now dynamically tell the script the total number of data drives I have and it will loop through a single block of code until completed. Work smarter not harder! 😅

 

 

Variable:

SET DataDisks=28

 

Code:

Echo Performing Audit on Disk: 1...
DIR /s /b \\%ServerName%\disk1>"%OutputPath%%FileNameRaw%"
DIR /s     \\%ServerName%\disk1>"%OutputPath%%FileNameMeta%"
 

FOR /L %%i IN (2,1,%DataDisks%) DO (
Echo Performing Audit on Disk: %%i...
DIR /s /b \\%ServerName%\disk%%i>>"%OutputPath%%FileNameRaw%"
DIR /s     \\%ServerName%\disk%%i>>"%OutputPath%%FileNameMeta%"
)

 

Note: The IN (#,#,#) DO loop parameters map out to (Start, Step, End). So in this case, we start at disk2 since the script is now appending to the files disk1 created. We are then stepping by 1 disk at a time. And finally, we end at the variable value of 28 (my last disk).

 

 

I Also Added Some New Bells and Whistles:

  • Dynamic Audit Output Folder Creation Based on Current Date in YYYYMMDD Format
    • Only Creates This Folder If It Does Not Already Exist
  • Audit Start/End Timestamps During Run-Time

 

 

I have attached the FULLY DYNAMIC Version of the UNRAID DISK AUDIT script. I have also fully set all text contents to variables which will make @Keexrean happy.

 

Anyone in the community should now be able to run this (On Windows) for any UNRAID server by simply changing out the variables at the top of the script.

 

 

Prerequisites/Instructions:

  • Set Your Windows hosts File (Make a Backup First!)
    • Allows You to Call UNRAID Server by Name Instead of IP (For NET USE Command)
    • Location of hosts File in Windows 10: C:\Windows\System32\drivers\etc\hosts
    • Open hosts File with Notepad
    • Enter This At Bottom of hosts File (Change IP Address and Host Name Accordingly):
      • #Custom Defined Hosts

        192.168.#.#   YOURSERVERNAME

    • Save the New hosts File

  • Turn on Per Disk Shares in UNRAID (Do This For Every Data Drive)
    • I Set My Disk Share Type To:
      • Export: Yes (hidden)
      • Security: Secure
    • I Set My Disk Share Access To:
      • User: Read-only
  • Right Click and Edit the Batch Script
  • Set Variables Accordingly
  • Sit Back and Relax, This Will Take a While...

 

 

Use Cases:

  • Run Regularly to Have an Exact Mapping of Every File on the Array
  • Run Prior to Parity Checks
  • Run Prior to Reboots/Shutdowns
  • If You Ever Experience Data Loss, At Least You Will Know Which Files Were On That Drive!

 

 

ENJOY!

 

Updated Code 2020-01-19

 

UNRAID_Audit_All_Disks.bat

Edited by falconexe

Oh, the dynamic loop issue I solved before diving into the access-denied skip!

 

If you don't have any permission issues, the script should work fine for you already, but I'm in heated brainstorming on Stack Overflow, so the following version of the script is to be considered temporary.

 

I want it to work without having a network user with GAWD PAWA who can access really everything. I have to find a way to make it handle & skip, or just avoid, a set list of folders or folder/subfolder couples like the plague, wherever they are, without hard-coding them to a specific location.

 

It supports a cache audit toggle, as well as asking OR detecting the number of drives, and also supports adding the path to an array share instead of relying on disk shares.

 

 

UNRAID_Audit_All_Disks.bat

Edited by Keexrean

Also, yeah, I knew about just doing a FOR /L, but since I wanted to allow a cache+array audit, I made it set the %writeinsert% value to >>, then test if it was on drive 1; if yes, %writeinsert%=> , then check if %cacheaudit%=1, and if yes, %writeinsert%=>> .


That's the crappy way, but since I'm eyeing the option of using a .vbs drop to make the audit without it giving up the ghost on access-denied files, it allows me to avoid wrapping the audit commands in something and unties it from what is actually performing the loop, just to avoid nested nightmares :D
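A minimal sketch of that %writeinsert% idea, with the variable names from the description above (%disknum% is a hypothetical loop variable I added; the CALL re-parse trick is my assumption for making a redirect operator stored in a variable take effect, and may differ from the actual script):

```batch
@echo off
REM Sketch: choose the redirect operator per disk, then CALL re-parses the
REM line so the > or >> held in %writeinsert% is treated as a redirection.
REM Default: append. Disk 1 overwrites, unless the cache was audited first.
set "writeinsert=>>"
if "%disknum%"=="1" if not "%cacheaudit%"=="1" set "writeinsert=>"
call dir /s /b "\\%ServerName%\disk%disknum%" %%writeinsert%% "%OutputPath%%FileNameRaw%"
```

The double %% around writeinsert survives the first expansion pass, so the operator is only expanded on the CALL re-parse, where cmd will honor it as a redirect.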

Edited by Keexrean
9 hours ago, Keexrean said:

Oh, the dynamic loop issue I solved before diving into the access-denied skip!

 

If you don't have any permission issues, the script should work fine for you already, but I'm in heated brainstorming on Stack Overflow, so the following version of the script is to be considered temporary.

 

I want it to work without having a network user with GAWD PAWA who can access really everything. I have to find a way to make it handle & skip, or just avoid, a set list of folders or folder/subfolder couples like the plague, wherever they are, without hard-coding them to a specific location.

 

It supports a cache audit toggle, as well as asking OR detecting the number of drives, and also supports adding the path to an array share instead of relying on disk shares.

 

 

UNRAID_Audit_All_Disks.bat 9.41 kB · 1 download

Dude, this is impressive. It is way more crazy and complex than I could have imagined. You definitely taught me some new techniques. And I noticed your PowerShell injection. 

 

I just finished running the audits using your script and got the EXACT same results, down to the number of lines in each file and the exact same size of output files down to the byte. So both of our scripts match up output-wise exactly. Peer Review Complete, ha ha.

 

Nice work!

Edited by falconexe

Well, thanks, but don't take too much of it as any kind of teaching; there is a LOOOOOT of bad habits in there, from when I started spaghetti-writing batch as a teenager in the early 2000s.

 

But yah, that's a decent preview of how logic works in most of my batches.

Namely I'm using a lot what I label as "modules" in my code.

Like using the same piece of code to both do a full array detection, and also to just check if the manually set disk count is at least valid, just calling it over, and using logical steps to link it all together, going back to one or more title screens, clearing the console each time to keep it humanly readable.
(In some cases it's useful to be able to scroll through a mile-long console to check stuff, but in that case I send the output to another cmd window.)

 

Note though, I think I forgot to add a cache-exists detection; it doesn't check before asking you if you want to audit the cache.
I'll do it in the next iteration, when a solution for the ">dir crashes on restricted files/directories" issue just pops into my head.

Edited by Keexrean
15 minutes ago, Keexrean said:

Well, thanks, but don't take too much of it as any kind of teaching; there is a LOOOOOT of bad habits in there, from when I started spaghetti-writing batch as a teenager in the early 2000s.

 

But yah, that's a decent preview of how logic works in most of my batches.

Namely I'm using a lot what I label as "modules" in my code.

Like using the same module to both do a full array detection, and also to just check if the manually set disk count is at least valid.

 

Note though, I think I forgot to add a cache-exists detection; it doesn't check before asking you if you want to audit the cache.
I'll do it in the next iteration, when a solution for the ">dir crashes on restricted files/directories" issue just pops into my head.

No worries. I have been having some weird stuff with my scripts too. The Meta audit sometimes runs into files dated 1969 that are missing the creation date attribute. When this is encountered it outputs "The parameter is incorrect." into my screen output. I spent many hours last night touching files to reapply these attributes.

 

I was able to find these erroneous files and folders by going to each disk share and running the following command on the UNRAID terminal: find . -type f -ls | grep ' 1969 '

 

Then you "touch" the files and it should fix the issue.
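The find-and-touch fix above can be sketched as a single pass. This is a sketch run in a scratch directory, assuming GNU find/touch (as on UNRAID); the `-newermt` cutoff is my substitute for grepping ' 1969 ' out of the `ls` output. On a real disk share you would run the `find` from the share root instead.

```shell
# Demo in a scratch directory so nothing real gets re-stamped.
cd "$(mktemp -d)"

# Simulate a file whose timestamp underflowed to 1969 (pre-epoch):
touch -d '1969-12-31 23:00:00 UTC' broken

# Find every regular file with a pre-1970 mtime and re-stamp it to now:
find . -type f ! -newermt '1970-01-01 UTC' -exec touch {} +
```

After re-stamping, re-running the same `find` should return nothing, and the Meta audit should stop printing "The parameter is incorrect." for those paths.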

 

Here is what it looks like when it happens in the script. You can see this exact output by just running the command in CMD natively, so it is not script-related. Super freakin' annoying...

 

[screenshot]

 

I had this completely fixed last night after fixing all of the files in question. I then re-ran the audit and I had clean screen output.

 

Then this morning I ran it again and I have more of these lines on different disks. So I'm not sure if it is always the "1969" issue. This does not happen on the RAW "DIR /s /b" command, only the META "DIR /s" command.

 

Anywho, I'll keep trying to track the issue down. I hate intermittent issues! If anyone else is seeing this on their server, I would be very curious. Again the scripts work perfectly. It is the actual data throwing the exception.

 

Worst-case scenario, I would love to just suppress the error line on my output screen. Any thoughts on how to do this?

Edited by falconexe

Yep!

This might do the trick:

dir /s /b "\\server\path\" 2>nul 1>>output.txt

 

 

And I'll check the grep on my server and let you know if I have the same type of file.

 

But namely, the issue I had wasn't with files missing attributes (that I know of), but with files that are also inaccessible through the Windows file explorer, and all that because:

> I don't want an all-powerful user on the network that can access some of the appdata subfolders (like nginx folders or some of letsencrypt's files)

> I didn't run it with dockers shut down

So I really do need to figure out a way to detect these files before the dir command hits them, to make it skip them without touching them, or use a different command / a VBS script as the auditing module.
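For what it's worth, since the script already shells out to PowerShell, one hedged option for that auditing module is to let Get-ChildItem swallow access-denied errors instead of letting DIR crap out. A sketch, not the actual script, reusing the variable names from the batch code earlier in the thread:

```batch
REM Sketch: list every file under a disk share, silently skipping
REM folders the network user cannot enter (-ErrorAction SilentlyContinue).
powershell -NoProfile -Command ^
 "Get-ChildItem -LiteralPath '\\%ServerName%\disk1' -Recurse -Force -ErrorAction SilentlyContinue | ForEach-Object FullName" >> "%OutputPath%%FileNameRaw%"
```

-Force includes hidden files, and SilentlyContinue means inaccessible subfolders are skipped rather than aborting the listing, so no GAWD PAWA user is needed.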

1 hour ago, falconexe said:

I was able to find these erroneous files and folders by going to each disk share and running the following command on the UNRAID terminal: find . -type f -ls | grep ' 1969 '

 

So I forgot to mention that the "type" parameter can be set to the following:

 

File:           find . -type f -ls | grep ' 1969 '

Directory:  find . -type d -ls | grep ' 1969 '

 

I have seen both files and folders have the missing creation date attribute where it defaults to 1969.

 

Here is some more info on the subject: https://www.a2hosting.com/blog/whats-the-deal-with-12-31-1969/

Edited by falconexe

Ran both commands.

Can confirm it's not what was actually messing things up on my side.

 

The directory search returned nothing.

The file search returned just old files from damaged drives and/or NTFS partitions that dropped to RAW, and none of these files were the ones making the dir command crash; I can even find them in the disk audit log I ran yesterday.

26 minutes ago, falconexe said:

Thanks trying this now...

Yep that worked. Thanks.

 

Echo Performing Audit on Disk: 1...
DIR /s /b "\\%ServerName%\disk1" 2>NUL 1>"%OutputPath%\%CurrentDate%\%FileNameRaw%"
DIR /s    "\\%ServerName%\disk1" 2>NUL 1>"%OutputPath%\%CurrentDate%\%FileNameMeta%"


FOR /L %%i IN (2,1,%DataDisks%) DO (
Echo Performing Audit on Disk: %%i...
DIR /s /b "\\%ServerName%\disk%%i" 2>NUL 1>>"%OutputPath%\%CurrentDate%\%FileNameRaw%"
DIR /s    "\\%ServerName%\disk%%i" 2>NUL 1>>"%OutputPath%\%CurrentDate%\%FileNameMeta%"
)

 

[screenshot]

 

Edited by falconexe
13 minutes ago, Keexrean said:

Ran both commands.

Can confirm it's not what was actually messing things up on my side.

 

The directory search returned nothing.

The file search returned just old files from damaged drives and/or NTFS partitions that dropped to RAW, and none of these files were the ones making the dir command crash; I can even find them in the disk audit log I ran yesterday.

No worries. At least you know that is not the issue.

