Posts posted by Pourko

  1. 26 minutes ago, ryann said:

    I'm not sure if I successfully got this enabled or not. I took my array down, Docker hung, so it never went fully down. I had to force power-cycle the server. Now it's going through a parity check and I found out the Recycle Bin modified the SMB config so now I have two [global] sections and I cannot change that until the parity check finishes.

    I don't know anything about Docker.  I was talking about the stock Samba service that comes with Unraid. It uses two config files -- one is recreated in RAM every time the server boots, and the other (i.e., smb-extra.conf) is located in the config folder on your boot flash drive.  There are no recycle bins in any of that.

  2. On 8/3/2020 at 12:05 PM, ryann said:

    What's funny is I can setup a share in my VM running on the server and transfer at near 10Gb speeds, but using SMB, I'm hitting around 1.2Gbps

    I've been struggling with that for years!  Last week it occurred to me to look into ACL support in Samba.  It turned out to be enabled by default.  I disabled it, and BOOM!  For the first time in my life I saw Unraid saturate the network adapter!

     

    Try adding this in the global section of smb-extra.conf, and let us know how it goes.

        nt acl support = No
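
    If your smb-extra.conf doesn't already have a [global] section, the whole addition can be as small as this (any other lines you already have in that file stay as they are):

        [global]
            nt acl support = No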

     

  3. 15 minutes ago, trott said:

    problem here is when there is an error detected, how can we know if there is only one corrupt disk

    I have demonstrated, in an admittedly clumsy example, that if you leave the offending disk out of the parity calculation, the remaining calculation comes out clean, which tells you exactly which disk it is.  Now that we know it CAN be known, we can look for better ways of finding it out.

  4. 3 hours ago, testdasi said:

    I have already stated my argument that Unraid should stay on the safe side and focus on the general case, especially when there is a safer alternative (btrfs scrub) already implemented.

    Speaking of the safe side, imagine for a moment that what I'm asking for has already been implemented, i.e., Unraid reports the disk number in syslog when it finds a single-disk mismatch. Now you run your read-only parity check, and on this occasion you find numerous parity errors, but to your surprise you notice that they all seem to be coming from disk #5.  Which would be the safe road on that occasion?  The one where Unraid propagates the wrong bytes onto the parity, or the one that gives you the golden chance to yank disk #5 out of your server and rebuild your good data from parity?

     

    And about all the other features that you keep saying are higher priority for Unraid... They are not the core of Unraid; the "md" driver is.  The way Unraid does parity is the only reason I bought Unraid.  Everything else is bells and whistles, and you can get those on any other distro.

  5. 2 hours ago, testdasi said:

    I have already read that paper quite a while ago.

    Quote from the paper (underline added for emphasis): "If two disks are corrupt in the same byte positions, the above algorithm will (again, in the general case) introduce additional data corruption by corrupting a third drive. "

    Keep the two things separate -- detecting something, and doing something about it.  I am only talking about the first part.  Yes, with two disks corrupt in the same byte positions, you can't tell which disks they are. But with one corrupt disk you can.  Report that in syslog.

  6. 18 minutes ago, testdasi said:

    You proposed an experiment, which is not a proof. An experiment can disapprove a theory (by providing a counter example) but it cannot prove a theory.

    For example, I can crash a car to a wall at exactly 10mph and have 100% survival rate. But it doesn't prove the theory that crashing a car always have a 100% survival rate.

    Dude, with your example you can prove the theory that a car can be crashed.  After I see that happen once, I know for a fact that a car can be crashed. In my example, I prove the theory that with dual parity you can know exactly which disk is carrying the mismatched byte, provided it is mismatched on only one disk at that particular byte/sector.  Nothing else.

  7. 6 hours ago, BRiT said:

    You went through the situation where you control all the other variables in the algorithm and only change one of them. In reality, you can not control all of the variables and be certain that only 1 change was made.

    Right. For the sake of simplicity, and proof of concept.

     

    We check parity one byte/sector at a time. Suppose we have just arrived at a byte/sector which does not match what we expected, and we report the sector number in the syslog. There are a few possible situations here:

    A) Single parity: we reasonably assume that the parity disk should be synced, since in 99% of cases that is indeed the correct thing to do, and since we have no way of knowing otherwise anyway.

    B) Dual parity, with more than one disk mismatched at that particular byte/sector: again, we have to assume that the parity disks need syncing; that's the only reasonable thing we can do.

    C) -- and this is the interesting one -- dual parity, with only one disk mismatched at that particular byte/sector. Here we CAN know exactly which disk that is, and at the very least we can report it in the syslog. (The sketch below shows the math behind this case.)

    And once we've dealt with that particular byte/sector, we proceed with the parity check, on to the next mismatched sector we may find, dealing with them one at a time.
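
    For anyone who wants to see the math behind case C, here is a tiny Python sketch of the single-disk identification trick from the RAID-6 paper quoted earlier. It only illustrates the math; it is not Unraid's md driver, the "disks" are single bytes, and all the values are made up:

        # Sketch of RAID-6 single-error location: P is plain xor parity,
        # Q is the g^i weighted parity over GF(2^8).  One byte per "disk".

        def gf_mul(a, b):
            """Multiply two bytes in GF(2^8) using the RAID-6 polynomial 0x11d."""
            r = 0
            while b:
                if b & 1:
                    r ^= a
                b >>= 1
                a <<= 1
                if a & 0x100:
                    a ^= 0x11d
            return r

        def gf_pow(a, n):
            """Raise a to the n-th power in GF(2^8)."""
            r = 1
            for _ in range(n):
                r = gf_mul(r, a)
            return r

        def make_pq(data):
            """Compute the P (xor) and Q (weighted) parity bytes for a stripe."""
            p = q = 0
            for i, d in enumerate(data):
                p ^= d
                q ^= gf_mul(gf_pow(2, i), d)
            return p, q

        def locate_single_error(data, p, q):
            """If exactly one data byte is wrong, return (disk index, original byte)."""
            p_syn, q_syn = make_pq(data)
            p_syn ^= p                        # xor syndrome
            q_syn ^= q                        # weighted syndrome
            if p_syn == 0 and q_syn == 0:
                return None                   # stripe is consistent
            # With exactly one bad data disk z, q_syn == g^z * p_syn; find z.
            for z in range(len(data)):
                if gf_mul(gf_pow(2, z), p_syn) == q_syn:
                    return z, data[z] ^ p_syn
            return None                       # a parity disk is wrong, or more than one disk mismatched

        disks = [0x11, 0x22, 0x33, 0x44]      # four "data disks"
        P, Q = make_pq(disks)
        disks[2] ^= 0x5a                      # the "illegal" one-byte modification
        print(locate_single_error(disks, P, Q))   # prints (2, 51): disk 2, original byte 0x33

    With single parity you only have the xor syndrome, which says that some byte is wrong but not where; it is the second, weighted parity that pins the mismatch to one disk.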

  8. 4 hours ago, testdasi said:

    No, all proofs are not equal. The reason I asked for mathematical proof is because it must state all assumptions upfront then use logic to arrive at a set of guaranteed conclusions. To be brutally honest, you statement "proof is proof" and casual use of "Q.E.D." give me the impression that you don't seem to appreciate the rigid standard that the science in computer science has to adhere to.

    I can't believe what I'm reading here.  A proof is either a proof or it isn't.  There is no gray area in that, and that applies to any hypothesis that ever needed a proof. You can choose numerous different ways of proving something, and in the end they would all be equally valid. Choosing a complicated and obscure way to prove something doesn't make it any more of a proof. So I went out of my way to give you the simplest possible setup that you can replicate yourself. Unless you find a flaw in the steps I presented, you should be completely satisfied with the validity of my conclusion.

  9. On 8/11/2020 at 6:03 PM, testdasi said:

    I don't see how dual parity on its own can identify which disk is wrong so please provide support behind your claim.

    I will give you a very easy practical proof. (Proof is proof, right?)

    Set up a test server with dual parity. Run a parity check, and verify that everything is good.

     

    For the purposes of the following exercise, to eliminate any interference, only use "maintenance mode", and only run "read-only" parity checks. Note: there's a bug in the current UI where it ignores the read-only checkbox, so only start read-only parity checks from the command line (mdcmd check nocorrect).

     

    For extra proof, collect the md5 checksums of the disks, and save them somewhere, like...  md5sum /dev/mdX >/boot/mdX.before.md5

     

    Now, let's "illegally" modify one byte on one of the disks that are behind the "md" devices. Something like:  dd if=/dev/random of=/dev/sdX1 bs=1 count=1 seek=10000000

     

    Run a "read-only" parity check, and you will see the error reported in the syslog, like...  kernel: md: recovery thread: PQ incorrect, sector=19528

     

    Shut down the server, and pull out one of the disks.  Start the server.  The missing disk will of course be simulated.  But here is the interesting part: with dual parity, you can still run a parity check at this point.  And if you have been "lucky" enough to have pulled out the offending disk, the parity check will show no errors.  With any other disk pulled, you will see exactly the same error as before.  We have now correctly identified exactly which disk is the one carrying the "wrong" byte, and we don't need to assume that it's the parity that's wrong.

     

    Q.E.D.

     

    Now shut down the server, replace the missing disk, and start the server.  Unraid will offer to rebuild that disk from parity.  Proceed.  When done, you'll have all the data on your disk exactly as it was before our "illegal" modification.  If you want extra verification of that, just run the new md5 sums and compare them with the ones you saved before. (Provided that you were careful to only use "maintenance mode", as advised earlier.)

     

    Note: the "illegal" modification could have been made on any of the disks, data or parity, with exactly the same result.

     

    Also note: this whole exercise was designed only as a proof that the offending disk can indeed be identified with the help of dual parity, and that the good information can be restored back to that disk, instead of just assuming that the parity is wrong and propagating the wrong byte onto the parity disks. This exercise is not meant as practical advice about how you should deal with such situations on real servers, although you could, if it comes to that.

     

    Now that the proof is out of the way, Limetech can think of some efficient programmatic ways of doing it. My point is that it would be extremely useful if the report in the syslog showed something like:

    kernel: md: recovery thread: PQ incorrect, rdevName.2=sdc, sector=19528

     

  10. Works for me too.

    0 root@ToyVB:~# wget ftp://ftp.slackware.com/pub/slackware/slackware64-current/slackware64/n/net-tools-*.txz
    
    0 root@ToyVB:~# installpkg net-tools-*.txz
    
    0 root@ToyVB:~# which -a netstat
    /bin/netstat
    
    0 root@ToyVB:~# netstat --version | line
    net-tools 2.10-alpha
    
    0 root@ToyVB:~# netstat -tunlp
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 10.22.33.88:23613       0.0.0.0:*               LISTEN      1671/sshd
    tcp        0      0 10.22.33.88:80          0.0.0.0:*               LISTEN      9062/nginx: master
    tcp        0      0 10.22.33.88:445         0.0.0.0:*               LISTEN      8921/smbd

    This output reassures me that I have nicely hardened all my services to only listen on my private "admin-only" interface. :-)

  11. I don't really get what all this fuss about "supporting" wireless is about.  The wheel has already been invented, you know.

     

    So, a month ago, I blindly picked up a bunch of used wifi adapters at a garage sale. Having never played with wifi, I was curious to see what I could do with them on my Unraid test server.  It turned out to be no big deal.  What I did was enable their drivers in the kernel .config and recompile the kernel. Then I downloaded one little firmware file, installed a couple of packages from slackware.org, and everything was up and running.  All in all, with googling and whatnot, it took me about a day.  As we speak, my Unraid test server is running as a Wireless Access Point, with a separate smbd process configured to listen only on the wifi interface.  Fun!

     

    Just a little note:  I have modified all the regular Unraid services (sshd, smbd, nginx, etc.) to only listen on my admin-only ethGreen ethernet interface, so at this point no service other than that dedicated samba daemon is listening on the wifi interface.  Also, for now, this box is not acting as a gateway; it's not forwarding any packets to the internet. I may decide to do that later, after playing some more with this thing and after setting up some sane iptables rules.
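
    In case anyone wants to replicate the interface-binding part, it's plain Samba config. A minimal sketch of the dedicated daemon's config file would be something like the following (the interface name wlan0, the share, and the file paths are just placeholders for whatever you actually use), started with something like smbd -s /boot/config/smb-wifi.conf:

        [global]
            # second smbd instance, bound only to the wifi interface
            bind interfaces only = Yes
            interfaces = wlan0

        [wifi-share]
            path = /mnt/user/wifi-share
            read only = No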

  12. So, I put that server in "storage" and set up a test server, to play with things and see how I can get myself out of this mess.

     

    I think I ran into a bug with the UI. (Maybe I should post a bug report somewhere?)

     

    So here's the bug, and how to reproduce:

    You start the array from the UI, and the first thing you want is a read-only parity check. So you dutifully uncheck the "Write corrections to parity" checkbox and click the "Check" button. To your great surprise, in syslog you see:

     

    Aug 13 19:36:15 ToyVB kernel: mdcmd (142): check
    Aug 13 19:36:15 ToyVB kernel: md: recovery thread: check P Q ...
    Aug 13 19:36:15 ToyVB kernel: md: recovery thread: PQ corrected, sector=19464
    Aug 13 19:36:16 ToyVB kernel: md: sync done. time=1sec
    Aug 13 19:36:16 ToyVB kernel: md: recovery thread: exit status: 0
     

    On subsequent tries, the UI does honor the unchecked checkbox, and you see:

     

    Aug 13 19:36:24 ToyVB kernel: mdcmd (143): check nocorrect
    Aug 13 19:36:24 ToyVB kernel: md: recovery thread: check P Q ...
    Aug 13 19:36:25 ToyVB kernel: md: sync done. time=1sec
    Aug 13 19:36:25 ToyVB kernel: md: recovery thread: exit status: 0
     

    To see the bug again, just stop the array, start it again, and try another "read-only" parity check.
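
    Until this is fixed, the workaround is the one mentioned above: start the read-only check from the command line with mdcmd check nocorrect rather than from the UI.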

     

    Unraid version 6.8.3 by the way.

  13. 4 hours ago, sota said:

    if you pull a disk out of the array and mount it anyplace else, you should automatically assume it's not integral with respect to parity.

    It is not quite true that you should automatically assume that. You can mount it externally as read-only and not disturb a single bit on the disk.  And if you have any doubts, that's what a parity check is for.

  14. 2 hours ago, jonathanm said:

    99% of the time, a disk failure presents with read errors, which unraid already logs, followed by write failures, which kick the disk out of the array. That is a clear cut reason to investigate the health of a specific disk, but the write errors can also be caused by controller, cable, PSU or RAM issues. Discarding a disk because a parity bit was wrong is not productive.

    Jonathan, you completely misunderstood my question. I am not talking about disk failures or read/write errors, or anything of the kind. The disks are all in good health, but one disk (and I don't know exactly which one) may have been inadvertently modified while outside of the array. So no, I do not want to discard parity -- exactly the opposite: I want to trust the known-good parity to restore the correct data onto that disk, in case it has indeed been modified. All I need for that is a report-only parity check that reports the actual disk on which it finds the mismatched byte(s) -- and that is something that only dual parity can do.

  15. 46 minutes ago, BRiT said:

    It comes down to being mathematically uncertain that you only have 1 unknown in an algorithm set of N variables.

     

    I do think it would be nice to have a better presentation of where or what Data Integrity issue (corruptions) your array has. Ideal world would have something like the following (massive amount of functionality required):

    • UI display the list of corruptions detected 
    • For each corruption, a way to view the data stored there for each drive
    • For each corruption, a way to possibly view the filename located there for each drive
    • For each corruption, an indicator of likely which drive has the corruption if only one drive is corrupt (with huge warnings and caveats about dragons)
    • For each corruption, a means of backing up current data values for each corruption for a designated drive
    • For each corruption, a means of restoring the data from the previous backup for a designated drive
    • For each corruption, a means of selecting a drive to attempt a rebuild of the data which then marks this corruption as possibly fixed and has yet to be verified by the user

     

    This doesn't do anything automatic, except generating the list of corruptions during the scan.

    Throwing in the UI and all the other things you listed doesn't help us cut through the fog. File systems have nothing to do with this conversation. Let's forget about the UI for now, and let's not talk about massive multi-drive failures.  Trying to keep things as simple as possible, imagine the problem like this:


    We had a good parity protected array, for which we had run parity checks, and everything was OK.


    Now we will start a parity check, which will only report any errors it finds.


    A single-parity array can only report something like this:
    "A mismatched byte was found at byte position NNNNNNNN"


    A dual-parity array could report:
    "A mismatched byte was found on disk #5 at byte position NNNNNNNN"


    Are you not seeing the possibilities in this?
     

  16. 2 hours ago, BRiT said:

     

    So you're backing away from your Cosmic Data Change example that simply was never possible. So that's a good sign.

    Yes, in my initial post I used the word "magically" for the sake of saving us time and getting straight to the point.  But now I see how I got you distracted by that. :-)  Please don't fixate on that unfortunate wording.  I am counting on your long-time experience to find a real solution to a real problem. (Even if that means me putting that server in storage for 12 months while I try to convince Limetech that what I am talking about actually makes some sense. :)

  17. 1 hour ago, johnnie.black said:

    If you continue to read the quoted post you see why Unraid doesn't support that.

    Yes, I have read that post a few times. Limetech confirms that it could be done. But the reason he gives for not doing it is kind of flaky: he cites an extremely improbable scenario in which two disks are corrupt in the same byte positions. I don't find that to be a valid argument, because the same logic could be used to argue against having any parity protection at all -- even with single parity there is the hypothetical possibility that two disks are corrupt in the same byte position in such a way that the existing parity checksum "looks" good. So how is that valid reasoning?

     

    The thing is, identifying something, and deciding what to do about it, if anything, are two completely different things. That posting over there boils down to: "we're not going to try to identify it, because if we do, we will not know how to handle one hypothetical extreme scenario".  In my scenario, however, if you can identify for me (with the help of dual parity) exactly which physical disk has been illegally modified, then I could just throw that disk in the trash and restore my good data onto a new disk.

  18. 5 hours ago, BRiT said:

    And I will point you to the details of how hard drives work in reality, where the drives themselves will report CRC errors, and thus you know which drive is damaged.

    BRiT, I am not talking about "CRC errors". I am talking about a single byte changed on one data disk, a change which to that disk looked legitimate, so it has nothing to do with disk errors.

     

    Here's an example that may help you see what I am talking about... During a recent border crossing, my server was out of my hands for a short period of time.  I am allowing for the possibility that one of the disks may have been mounted somewhere. (Read-only, hopefully! :)  If they had mistakenly changed even a single byte on that disk, to the disk that would look like a "legitimate" write, so nothing to do with CRC errors. Now, in that scenario, if I were to start the array and do a parity check, the discrepancy in that byte would be found, and it would be automatically "validated" onto the parity disk, under the assumption that it is the parity disk that's wrong. Am I explaining this a little better now?

     

    Although this example is a little extreme, you can think of various scenarios in which a byte on a data disk could be changed. With a single-parity setup, you really have no way of knowing exactly which physical disk is the one with the wrong byte. So, back to the point:

    5 hours ago, BRiT said:

    Not with the algorithm unraid uses for dual parity.

    Are you really familiar with the algorithm Unraid uses for dual parity, or are you making that up?  Has there been any discussion on the matter that you can point me to?

  19. On 8/9/2020 at 9:23 AM, BRiT said:

    Others have already brought up this point (here or elsewhere), so why are you ignoring it for your cosmic example? The drive itself will know what data is wrong since the hardware supports checksums of sectors, so if a data bit is flipped and it doesn't change the checksum information, the drive will report it.

    I used a cosmic hypothetical only to save us time, as that's not the main point.  If you insist, I can give you some real-life examples of how this can happen.

     

    On 8/9/2020 at 9:23 AM, BRiT said:

    Mathematically it is impossible to know which disk is wrong with the algorithms used by single and dual parity protections.

    Mathematically it is impossible to know which disk is wrong with the algorithm used by single-parity protection.  That is not true for dual parity, though.  Which gets us straight to the point.

  20. On 5/27/2019 at 11:18 AM, Squid said:

    Yeah.  Anything and everything that I've ever done in my life ever is fair game to her.  But god forbid I bring up something from yesterday without her losing it and complaining that it was in the past.

     

    On 5/27/2019 at 12:59 AM, BRiT said:

    In 10 years this thread will be old enough to drink even in the US.

     

    On 5/26/2019 at 2:21 PM, strike said:

    I think maybe common sense invented it. In real life, do you suddenly quote something someone said 10+ years ago and expect them to continue the conversation? 

    Yes, I am an example of someone who came here 12 years later, hoping to find some answers on this topic.  In 2008 Limetech announced their intention to implement dual parity. Now, 12 years later, I have come back to Unraid, and I find that dual parity is already here, but for the life of me I can't find any useful documentation about it. (I googled a lot.)  And this thread is linked from the wiki.

     

    So, let me give you a specific example of what's been bugging me for over 10 years now:

     

    Suppose you have a two-disk array -- one parity and one data.  And suppose that some cosmic particle has magically flipped just one single bit on one of the two disks.  Either one.  You do a parity check and you find the discrepancy, but with single parity you have no way but to ASSUME that the parity disk is the wrong one, so with 50 percent probability you will be writing the wrong bit to the parity disk, and everything will look "good".

     

    Now, I was hoping that once dual parity got implemented, the system would actually be able to KNOW which disk has the wrong bit, and not just blindly assume that the parity disk is wrong.  If Unraid is indeed doing this now, can somebody please point me to the discussion about it?

     
