December 18, 2008

Hi,

with 4.4 final I am experiencing a problem that was fixed in 4.3.3 and that existed in individual releases before 4.3.3. Consider the following scenario:

1.) I'm writing from a Windows box to a user share (1 drive). This drive is active and also hosts a second user share (see 2.)).
2.) Another process on the same machine needs to read from a big share that is spread over a lot of drives, and a lot of these drives are spun down. The drive that is actually being written to (see 1.)) is part of this second user share.
3.) Both processes freeze. After some time Windows comes back for the writing process with a "... data lost ... " message. And the data is lost, trust me.

This has happened to me several times since 4.4 final. It did not happen on 4.3.3. This is a show stopper for me. Freezing may be OK for some time, but lost data is not acceptable. I will go back to 4.3.3 now.

Currently I see only one workaround: drives shouldn't be part of more than one user share. That seems to help, but it is not possible on my scrambled machine ...

Is anybody else seeing these "... data lost ... " messages?

Thanks
Harald
December 18, 2008

I have to start by saying I really like your subject prefix [4.4 final]. I think it should be recommended for this forum.

There is another workaround: either disable spin down for the drives that might be involved, or use something like Joe's script that wakes up a group of drives if a certain app is 'awake'. In this case, the script would watch a selected group of drives, and if any one of them was spun up, it would spin the rest up in an orderly and safe way. I think Joe's script would continue to keep them up until that external app was gone, but I don't know how that should be handled in this case. It would be a problem.

I think you will find corresponding sequences of exception Emask errors, of the frozen and timeout type, in your syslog. Your scenario sounds very plausible, although I'm surprised that v4.3.3 was better. I would have assumed that v4.3.3 would have behaved as badly or worse. It appears that the Linux kernel gurus have not resolved this issue yet. And/or we don't fully understand what is going on. I have a theory, but it's too speculative to be presentable.

This kind of issue, which I think is similar to cases where drives have been disabled because they could not respond while other drives were spinning up, is somewhat unique to unRAID users, as well as possibly Drobo and maybe Windows Home Server. That is, it comes from the use of an array of drives AND the independent spinning up and down of individual drives in that array. Almost all of the drive array world uses traditional RAID, with almost no spin down ever. So this has not been an issue for the rest of the Linux world. In a sense, we are the leading edge here.
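For illustration only, the "if any one drive in the group is up, wake the rest in an orderly way" idea could be sketched roughly as below in Python. This is not Joe's script, and every name in it (`drives_to_wake`, `wake_group`, the `is_spun_up` and `spin_up` callables, the 5 second delay) is a hypothetical placeholder. How a drive's state is actually probed and how it is woken is left to the caller; on a typical Linux box that might be built on something like `hdparm -C` and a small read from the device, but that is an assumption, not something tested on unRAID.

```python
import time
from typing import Callable, Dict, List, Sequence


def drives_to_wake(states: Dict[str, bool]) -> List[str]:
    """Return the sleeping drives, but only if at least one drive is awake.

    `states` maps a drive name to True (spun up) or False (spun down).
    """
    any_awake = any(states.values())
    asleep = [drive for drive, up in states.items() if not up]
    # If the whole group is asleep, leave it alone; if all are awake,
    # the list of sleeping drives is already empty.
    return asleep if any_awake else []


def wake_group(
    drives: Sequence[str],
    is_spun_up: Callable[[str], bool],   # hypothetical probe supplied by caller
    spin_up: Callable[[str], None],      # hypothetical wake action supplied by caller
    delay_s: float = 5.0,                # assumed gap between spin-ups
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Wake the sleeping drives one at a time and return the ones woken."""
    states = {drive: is_spun_up(drive) for drive in drives}
    woken = []
    for drive in drives_to_wake(states):
        spin_up(drive)
        woken.append(drive)
        # Stagger the spin-ups so the drives do not all start at once.
        sleep(delay_s)
    return woken
```

Waking the drives one after another instead of all at once is the "orderly" part; whether a staggered spin-up actually avoids the freezes described in this thread is untested.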
December 18, 2008, Author

RobJ,

I can definitely tell that 4.3.3 was better here. Sometimes I had to wait for a minute until my system was responsive again (all required drives spun up), but it worked.

An hour ago my Windows was telling me that my H: drive was not accessible. There was no write activity. All drives were simply spun down and had to spin up. I tried again and everything was fine.

I have several options:

1.) Go back to 4.3.3 (I hope BubbaRAID will work with 4.3.3)
2.) Stay with 4.4 but keep the drives alive.

Hmm, I think if I go with 2.) I do have a new feature request. At least during the night the drives should spin down. So a time pattern like "spin down between 00:00 and 06:00" would be great.

Thanks
Harald
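As a rough sketch of what such a time pattern could look like, the check below decides whether a given clock time falls inside a "spin down allowed" window. It is written in Python purely for illustration; the function name `in_spin_down_window` and its parameters are made up here and do not correspond to any existing unRAID setting. The window is treated as start-inclusive and end-exclusive, and a window that wraps past midnight (for example 22:00 to 06:00) is handled as well.

```python
import datetime as dt


def in_spin_down_window(now: dt.time, start: dt.time, end: dt.time) -> bool:
    """Return True if `now` lies in the window [start, end).

    If start is later than end, the window is assumed to wrap past midnight.
    """
    if start <= end:
        # Ordinary window within one day, e.g. 00:00 to 06:00.
        return start <= now < end
    # Wrapping window, e.g. 22:00 to 06:00.
    return now >= start or now < end
```

For the "00:00 to 06:00" example, `in_spin_down_window(dt.time(3, 0), dt.time(0, 0), dt.time(6, 0))` is true, while the same call with `dt.time(12, 0)` is false.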