RobJ

Members
  • Posts: 7135
  • Days Won: 4

Everything posted by RobJ

  1. Have you tried the new unRAID boot GUI? You can then use anything you install into unRAID.
  2. I still have some of my dialup modems, although not sure where the 300 baud one went... If I can find the right phone number, I might be able to play Global Nuclear Warfare! Or Tic-Tac-Toe ...
  3. The Lime cat pic has been around the Internet since sometime in the early 2000's. Actually, it's probably not a lime. But I like it anyway. What cat would be happy, made to wear something on its head?
  4. This is Linux, those are completely allowable. It's only a problem when you interface to Windows stations, where it's not allowable. Actually, I personally think it's a problem everywhere, a huge source of confusion! But Linux allows it.
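     A quick illustration of the point above, assuming the question was about characters like ':' or '?' in file names (the names and paths below are made up):

     ```shell
     # Linux file systems accept characters that Windows forbids in names,
     # such as ':' and '?' (hypothetical example, run in a scratch directory).
     mkdir -p /tmp/nametest
     touch '/tmp/nametest/report: draft?.txt'   # perfectly legal on Linux
     ls /tmp/nametest
     ```

     Such a file works fine between Linux machines, but causes errors the moment a Windows client tries to read or sync it.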
  5. You can format a drive even if it's full. The only thing that slows you down from formatting a drive is the checkbox that Squid mentioned, and that's an important safety feature, long needed. In the past, too many users have been too quick to format drives that shouldn't have been formatted.
  6. Wiki formatting is tricky, but I found a way to spread the rsync command parameters apart better. I assume the 2 rsync commands are what you were referring to? I could not see anything else where spacing mattered, and I didn't find where "there appear to be spaces but in fact it is a continuous string". Let me know. While Trurl's answer is probably enough, it seems to me you may be unclear how unRAID works, so here are some basics:
     * An unRAID array is a set of data disks plus parity disks. Each data disk is very like a Windows disk, with its own file system, folders, and files. Each data disk can be shared as a disk share, or not shared. Sharing is about providing controlled network access to the folders included in the share.
     * The unRAID User Share system is a whole 'nother level above the data drives: a virtual file system that combines all of the data drive file systems and provides a different view of your data files. It provides a User Share for every top-level folder, with controls for sharing them.
     * Global Share Settings should probably be renamed Global User Share Settings, as it is strictly settings for User Shares, and is not involved at all with the flash share, Unassigned Devices shares, disk shares, Samba shares, NFS shares, etc.
     * If a drive is excluded in Global Share Settings, then it is automatically excluded from all User Shares. That's the whole purpose of the global setting.
     When you select a swap disk, you are selecting a data disk in the array, and it stays a data disk in the array, with full parity protection. I hope this helps.
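     Outside the wiki, the same readability trick works in a plain shell script: backslash continuations let you put one rsync parameter per line. The paths and flags below are hypothetical examples, not the wiki article's actual commands:

     ```shell
     # Spreading rsync parameters across lines with backslash continuations
     # (source/destination paths here are made-up scratch directories).
     mkdir -p /tmp/rsync_src /tmp/rsync_dst
     echo "hello" > /tmp/rsync_src/file.txt

     rsync -av \
           --delete \
           /tmp/rsync_src/ \
           /tmp/rsync_dst/
     ```

     Note the trailing slash on the source: it tells rsync to copy the directory's contents rather than the directory itself.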
  7. I recommend changing that to either 60, 120, or 180 (1, 2, or 3 minutes).
  8. I've updated the FAQ entry to use poweroff instead. Thanks for pointing it out.
  9. I spent some time going through the linked papers, and I do appreciate their provision; I did learn a little. But I'm afraid I have yet to find one bit of evidence against the strength of the ECC bits to preserve data integrity. I have more reading to do; I especially want to read up on the newer technologies like the AF format and how it does ECC, compared with the old 512-byte sectors. All of the papers were old, roughly 10 years old, and based on older technologies with 512-byte sectors.

     The Toronto study was a good one, probably the best of the papers, but irrelevant to data corruption. It was heavy on discussion of bit error rates, SSDs vs hard drives, and AFRs, but every bit of it was about drive reliability. There was a clear assumption that while there are numerous bit errors from various sources, they are ALL caught, and are either corrected as 'transparent errors' or caught as 'non-transparent errors'. If anything, the study reinforces the idea that bit errors are either corrected or caught. Admittedly it's indirect, as it's not directly stated, but no other possibility is entertained.

     The CERN paper was frustrating, and I don't know why several sources are quoting it or referring to it. CERN has great scientists, and I'll always be interested in whatever is published in connection with CERN, but this was more of a vague announcement of the discovery of data corruption. It contains scientific data, but gathered and analyzed with a sad lack of rigor. They produced some numbers, but nothing you could actually draw any conclusions from, at all. And they openly admit that they deduced there were hardware and firmware issues that dramatically affected the numbers and results. The tests were run against whole systems (CPUs, RAM, drives, etc), with little effort to determine and distinguish between the many sources of errors. They state that a vague 80% of the errors detected were probably from issues between their 3Ware controllers and their WD drives, and that they were now going to replace the firmware on 3000 of them! They thought that about 10% of the other errors were related to memory, but those were detected only because it was ECC RAM. And the last 10% was from something that wasn't explained, with no physical source mentioned. Nowhere is there any mention of, let alone a clear attribution to, sector bit errors not being caught. I don't think this paper should be cited at all.

     The NEC paper is good, and worth reading by everyone for the ideas about other sources of silent data corruption. At no point does it implicate a weakness in ECC; rather, it points out other possibilities. The 'torn writes' I discount, as while they would corrupt data and parity, they would be detected by a huge ECC mismatch. I may be wrong, but I think I can ignore the 'misdirected write' issue too, as I've never heard of its possibility EVER, and I feel fairly sure it's a problem long solved by current drive vendors (the NEC paper was from 2008). That leaves the 'data path corruption' issue, the interesting one here, and the one that is consistent with our own experience. It's about the many sources of corruption between the drive interface and the media surface and back again, including the memory buffers, busses, registers, and firmware issues. We had basically concluded we saw it quite a while back, when a few users had repeating parity errors with the only possible source being the memory caches or registers on the physical drive itself. This one needs some thought. In this paper, see also the nice diagram of what they call 'parity pollution', something that's applicable to us. I don't know how much confidence to put in their frequency numbers though; they're based on 512-byte technology. My apologies for only draft-quality writing here, a bit rushed.
  10. It just means operations with I/O significant enough to affect the speed measurements, like the ones you mentioned.
  11. Please see Need help? Read me first!, and attach the diagnostics zip. I suspect it might be an unsupported networking chipset. We can see if that's true in the diagnostics.
  12. The testing section in the SMART report for that new Hitachi (Disk 2, SN=...R7R) has improved, now showing the test that was in progress before, plus a completed test. No issues on the tests, but it still has the age discrepancy. Was this drive a refurbished one? You have another Hitachi (Disk 4, SN=...YKP) with an odd age discrepancy too. Possibly another refurb?

      You have another Hitachi (Disk 1, SN=...5TV) with 2 Pending sectors. They have been there for a while, and you've acknowledged them, to ignore them. That's a very bad idea! There are many SMART attribute changes you can safely ignore, but not this one. This one will block drive rebuilds, and parity checks and builds. If you can't use the drive for rebuilds, then having parity is useless. (And if you don't care about parity, then you don't really need unRAID!)

      You have IDE emulation turned on for some of your onboard SATA drives. When you next boot, go into the BIOS settings, look for the SATA mode, and change it to a native SATA mode, preferably AHCI if available; anything but IDE emulation mode. AHCI should be faster and safer (and as you'll see below, IDE proved disastrous!). Worse, the syslog reported "limited to UDMA/33 due to 40-wire cable" for them, so performance is likely to be terrible for those 2 drives. In the earlier syslog, they were the Seagate ST1000 (Disk 2, sdc, ...M5Y) and the Sandisk SSD (Cache, sdb, ...343); in the current syslog, they are Disk 5 (Hitachi ...5EA) and Parity (WD30EZRX ...492). You don't want any of those on the ports configured to be the slowest!

      Your tunables are not optimal, which will result in much slower parity check performance. md_sync_thresh should be raised to either 512 or about 990, depending on whether you have certain SATA controllers (I don't remember which). Test for yourself.

      The syslog is filling and rolling over with BTRFS errors, indicating a corrupt file system on the Cache pool. On March 5 at 14:55:15, after days of BTRFS errors related to the corrupted BTRFS file system on the Cache drives, the SSD holding the Cache drive stopped responding. Multiple attempts were made to recover it, all unsuccessful, so it was dropped from the system. Because it was on an emulated IDE channel instead of a SATA channel, the whole channel was dropped, which included Disk 2, the Seagate ST1000! There was no fault with it; it just happened to be tied to a drive that was dropped, so it was dropped too. You removed it, but I believe you will find that it's completely fine. Check it to be sure. After the SSD was dropped, there were several Call Traces, but they were from BTRFS because the Cache drive was gone, and therefore loop0 was gone! So you can ignore them.

      Now for the new syslog! Drives have moved and changed. You started then stopped, assigned the Hitachi ...R7R to Disk 2, then restarted, and Disk 2 is about 2 hours into rebuilding. I'm not optimistic it will finish, because of the bad sectors on Disk 1. You still have the BTRFS corruption in the Cache pool. The CrashPlan container complains that the path "/mnt/disks" does not exist. So ...
      - The BTRFS corruption in the Cache pool *has* to be fixed. I'm not sure you can completely trust anything stored on it. I suspect you will have to stop Dockers and VMs, then copy whatever you can off of it, and reformat it. I believe there are FAQ entries about this. It might be best to disable Dockers, and delete and recreate docker.img.
      - AHCI needs to be set in the BIOS.
      - The Pending sectors on Disk 1 *have* to be fixed, probably by rebuilding Disk 1 onto itself.
      - And you are trying to rebuild Disk 2 now. If it does not go through, you might try putting back the original Disk 2. You would have to do a New Config with Retain all and set "Parity is already valid", in order to put the smaller 1TB drive back into Disk 2. Once the array is working fine again, you can then try again to add the 3TB drive.

      I may have forgotten something, forgive me.
  13. The Fix Common Problems plugin has a special Troubleshooting Mode, that if enabled will constantly monitor and save diagnostics and syslogs to the flash drive, until you reboot. You don't want to run it normally, because it bloats the syslog hugely, but it's perfect for this kind of issue, when the machine can't be instructed to save a last diagnostics and syslog. First, let the plugin run its tests and see if it notes anything that might be causing trouble.
  15. This one needs to be highlighted! It's a VERY useful tip, for users to make sure the right person sees the current topic. When you create a 'mention', as Endy demonstrated above with jonp, you also create a notification to them about that particular post, with a link directly to it that makes it very easy for them to come check it out. In other words, if you need or want attention from a particular user, to your post, then include a 'mention' of them. But PLEASE do not abuse this! (If you abuse it, they are likely to turn off mention notifications, or just set you to be ignored!)
  16. @gfjardim is not here very often, I'm sure he will take care of it when he can.
  17. By the way, I don't want to completely wave off the dd write error that is concerning some. I was just trying to offer a possible explanation, but only gfjardim can say for sure, once he's back. He's a busy guy, generally only seems to be available to us a couple of times a month. I do think it's a harmless one, but let's wait for him to say for sure.
  18. The NZXT Grid+ V2 now works on Linux
      In the post editor, on the upper toolbar, use the chain-link icon to form your linkable URL, with your choice of text.
  19. I suspect you are going to get some love for that! But you'll have to tell them that since the array isn't running yet, there aren't many services running. Other than the boot drive, there are no storage devices available. And networking could be unreliable, as it gets reconfigured on array start. I think I would want to add a note with it, that the feature is there but you aren't providing support for its use!
  20. Check Disk File systems
      The section you want is near the bottom.
  21. It's a good option, but I think you will get complaints from some who actually want a true "At boot" option. Might be better to call it something like "On first array start only". Perhaps you can add a note that currently a true "at boot" option probably has to use the go file.
  22. I don't know if you have tried the Help button, but it's generally pretty helpful. The New Config function has nothing to do with your data, that is completely unchanged. It's only about configuration changes, such as which disks are assigned, and where they are assigned. When you start a fresh array config, you have Retain options, as to which of the assignments you want retained and which you want cleared. Since you are starting a fresh array config, you don't want to retain any of the old config.
  23. I think what I would do is:
      - add and format the 1TB drives
      - copy all of the data on the 3TB drive to any of the other drives you wish
      - Tools -> New Config with Retain nothing
      - assign the data drives you want, in the order you want, but not the 3TB drive
      - start the array and check everything; make sure all your data is there on the new array
      - assign the old parity drive, and assign the 3TB as parity2
      - start the array and let parity build
      Until you start the parity build, the data still on the 3TB will serve as a backup, in case anything goes wrong.
  24. No, there's no failure, no shortfall. Please see my comments here. And a few posts up from that, I have a comment about the apparent "dd error", which may not be an error but just the flag that there's nothing left to write.
  25. No, they are exactly correct, if you look closely at the commands. In the pre-read, he reads the entire disk, including the first block. In the zeroing and post-read, he skips the first block, then zeroes and tests the rest. He's using a block size of 2MB (bs=2097152), and what is zeroed and post-read is exactly the entire disk minus one block. You can subtract the 2 numbers and see. To skip the first block, he uses skip=1 on the read and seek=1 on the write. He must be handling the zeroing and signature writing in that first 2MB in another code section that isn't logged above. I don't know why, but apparently the read to the end stops without reporting an error, whereas the write does report an error, when it cannot write any more.
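      The block arithmetic described above can be sketched against a small scratch file instead of a real disk. The sizes and paths below are made up for illustration; this is not the actual preclear code:

      ```shell
      # Reproduce the skip=1 / seek=1 pattern on a 16 MiB scratch file.
      BS=2097152                              # 2 MiB block size, as in the logs
      IMG=/tmp/fakedisk.img
      dd if=/dev/zero of="$IMG" bs=$BS count=8 2>/dev/null   # the whole "disk"

      # Pre-read: the entire disk, first block included
      dd if="$IMG" of=/dev/null bs=$BS 2>/dev/null

      # Zeroing: seek=1 skips the first block of the OUTPUT, zeroing the rest
      dd if=/dev/zero of="$IMG" bs=$BS seek=1 count=7 conv=notrunc 2>/dev/null

      # Post-read: skip=1 skips the first block of the INPUT, reading the rest
      dd if="$IMG" of=/dev/null bs=$BS skip=1 2>/dev/null

      # What was zeroed and post-read is exactly the disk size minus one block
      TOTAL=$(stat -c %s "$IMG")
      echo $(( TOTAL - BS ))                  # prints 14680064 (16 MiB - 2 MiB)
      ```

      On a real device the zeroing pass has no count=; dd simply writes until the device is full, which is why the write pass ends with an "error" while the read passes stop cleanly at end-of-device.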