RobJ

Everything posted by RobJ

  1. I was tempted to literally move your post straight to the FAQ! And add some FAQ-ness to it, preceding it with questions like "Please explain Linux file and folder permissions!" and "What's it take to delete a file?", and probably others that users ask. Would you be interested in starting a FAQ entry based on the above?
  2. Is there any possibility that one or more of these confused drives were part of a BTRFS pool before, either as a Cache pool in unRAID or a BTRFS pool outside of unRAID? Tom has recently said something relevant to this, which seems to indicate that BTRFS has functionality that preserves the BTRFS-ness of a drive, which might include its pool size. If your 1TB drive still thinks it's part of a 4TB pool ... Which also may mean that to properly format a BTRFS drive to something else, you may need to take an extra step to un-BTRFS the drive before the format. It could be as easy as just zeroing the early sectors, but I don't know where BTRFS stores its registration info. It could be hidden in the MBR, or near it in the empty unused sectors at the beginning of the partition, or at the end of the drive, etc.
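      If it does come down to just clearing the old signature, something like this would probably do it (the device name is only an example, and it's destructive, so only on a drive you intend to wipe anyway):

        wipefs --all /dev/sdX     # removes known file system signatures, including the BTRFS superblock magic

      That's a guess on my part though, not something I've tested against a former pool member.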
  3. I didn't know it was restricted, but I have to say there are a number of us that prefer it that way! That's because we don't like having to scroll scroll scroll past someone's long novel! I suppose there is a happy middle ground we can all agree to. ( but 5 lines and no flashy things sounds nice and plenty to me! )
  4. At this point, parity is going to be good, because whatever state it was in, it's been 'written' into the rebuilt drive. In a way, rebuilding a drive 'corrects' parity, because the drive *has* to be written consistent with whatever parity is. You can check it again after the update. In general, it's always good to know everything is fine before starting anything major, so normally we would probably recommend doing a parity check first. But I don't think there is any point now, unless you know of any issues that have occurred in the last week, since the rebuild.
  5. You aren't getting a gateway either. It's as if the router isn't working. See if you can log into the router and see the server from there. You might try turning off bridging, and see if it can work. Then you can turn it back on again later, if needed.
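      One quick check from the server console, just to confirm what the server thinks it was given:

        ip route      # should include a "default via <router address>" line; if it doesn't, no gateway was handed out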
  6. But SMART only tells you about the drive's health. Many drive errors have to do with other components, and for that you would need to check the syslog. Please see Need help? Read me first!, and attach the diagnostics zip.
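      If a console session is handier than the webGui, I believe recent 6.x versions can also generate the same zip from the command line (the output path may differ by version):

        diagnostics      # writes a diagnostics zip to the flash, under /boot/logs if I remember right

      Otherwise Tools -> Diagnostics in the webGui does the same thing.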
  7. An interesting SMART report! It looks fine at first, like a brand new drive, with only 74 Power On Hours. But there are 2 big discrepancies in the report! One is that the test section mentions there were 2 vendor tests at 21639 hours! Both completed without errors, and there's no indication there of any other testing. Yet in the General Values section near the top, it says there's a test in progress, with 90% yet to complete. So one part says no test is in progress, another part says there is. One part says the drive is new with only 74 hours, another part says there was testing at 21639 hours! Is this drive possibly a refurb? It looks like they may have reset most but not all of the SMART attributes and values.
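      For anyone who wants to compare against their own drive, the full report comes from smartctl (the device name is just an example):

        smartctl -a /dev/sdX      # prints the attribute table, the self-test log, and the "Self-test execution status" field that disagrees here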
  8. There's a safety feature built into the post editor, that has saved quite a bit of work for me. Every now and then, something goes wrong when I'm editing a reply to an existing thread, and I'm thrown completely out of the editor *and* the thread. But if I go back in and start another reply, the edit box will be pre-filled with all my previous work, apparently still saved in a buffer somewhere, at least for a short time. Then, you are given a chance to keep it, or clear the editor and start anew. So if you think you may have lost your work (it just happens, don't know why), first try starting a fresh reply, and see if it all magically reappears in the edit box!
  9. If you go to your Account Settings, then look for Signature on the far left. Click it, and you should see an edit box with your signature in it, and you can edit it as you like. Up above, there is a switch for whether all signatures are visible. Turn it on, and you should see everyone's signature, including your own.
  10. The commands look completely correct, and the numbers look correct too. Except for the first block, the entire drive was zeroed and then post-read. I see no reason for an error. The only thing I can think of is that the zeroing command has no defined end point; it simply keeps zeroing blocks until it runs off the end of the drive, which produces an error, so that error is expected and can be ignored. I think you can safely ignore that line in the log, as it did complete all of the zeroing and re-reading.
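      For what it's worth, that is exactly how dd behaves when it isn't given a count, and I believe that's what the script uses for the zeroing pass (device name is only an example):

        dd if=/dev/zero of=/dev/sdX bs=1M
        # with no count= limit, dd only stops when the device is full, so it always
        # ends with a "No space left on device" error; that message just marks the
        # end of the drive, not a failure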
  11. I've never done it, and I haven't read of others doing it, so you're on your own here! But LimeTech has added Linux support to the flash preparation tool make_bootable. You don't need to format the drive, but you should still clean it off completely just as if you were formatting it, making sure though that you preserve the volume label of UNRAID. Then extract the 6.3.2 distro onto it, with its folders. Then in a console session (like SSH or Telnet), change into the /boot directory and run make_bootable_linux. That should make it bootable with the latest syslinux. You can then restore your config folder and files to it, using the instructions. I'll be interested to see how it works. Let us know of any quirks, and modifications needed to any instructions.
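      Roughly, something like this from a console, assuming the flash is already mounted at /boot (the partition name in the label step is only an example, and the label step is only needed if the UNRAID label was lost):

        fatlabel /dev/sdX1 UNRAID        # from dosfstools; restores the volume label if needed
        cd /boot
        bash ./make_bootable_linux       # installs syslinux so the flash will boot

      That's a sketch based on the steps above, not something I've run myself, so treat it accordingly.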
  12. I don't think anyone was saying that, at least I didn't intend to. I think what you are saying is that you believe the problem above could have been caused by an ECC collision, that the data was corrupted in such a way that it still matched the ECC info, but was caught by the checksum. That's a plausible explanation. It's the first time though that I've ever heard that ECC collisions could be statistically likely. I'm still an amateur, and if you can point me to any studies about this, I would really appreciate it. I don't believe I said that. I'm sorry if I wasn't clear, but what I was trying to say was not that the reallocation caused the corruption, but that the reallocation caused the just previously or simultaneously corrupted data to be written to the new replacement sector. We don't know whether that occurred or not, but within the brief one-day window, there was no report of a pending sector or any other issue.
  13. methanoid said: That sounds more like a fan than a drive. Edit: thanks Squid, I copied from the wrong post. Plus my guess was far off, a lousy guess.
  14. Need more information. That's not cosmetic, that's something that is really wrong. Need to see the diagnostics from the first post, as well as what actions you may have taken before that, as well as what steps you took to replace the drive with the 4TB. Parity is probably completely invalid. Is it possible you restored super.dat from an older backup?
  15. Are you absolutely certain that both drives were formatted with XFS? I noticed that you used both xfs_repair and reiserfsck, as if you were not sure which file system was in use. Using the wrong tool can both cause additional damage and produce errors that may look like hardware errors. You need to find previous evidence of the actual file systems, like older syslogs or notes or screen captures, that indicate the correct file system for each. Once you know which one it is (ReiserFS or XFS), then you can retry on both drives with the correct tool. And if it is ReiserFS, then you should try the --rebuild-tree option together with the scan-whole-partition option (-S), which searches the entire partition for files and folders.
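      A quick, non-destructive way to check what is actually on each drive before running anything further (device names are just examples):

        blkid /dev/sdb1 /dev/sdc1        # reports TYPE="xfs" or TYPE="reiserfs" if a signature is still readable
        xfs_repair -n /dev/sdb1          # -n means check only, no changes are made
        reiserfsck --check /dev/sdc1     # read-only check for ReiserFS
        # and only as a last resort, on a confirmed ReiserFS partition:
        # reiserfsck --rebuild-tree -S /dev/sdc1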
  16. That puts SanDisk back on the hook. Since most SSD users are probably Windows users, definitely not BTRFS or checksum users, there may well be a fair amount of undiscovered corruption out there, once the drives are old enough to have retired sectors. This should perhaps be a news item. Certainly needs more investigation.
  17. For folders/directories, I believe you would use -type d instead of -type f.
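      For example (the path and name pattern are only illustrations):

        find /mnt/user -type d -iname "backup*"      # -type d matches directories instead of regular files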
  18. Not sure, but try Guides and Videos
  19. After reading the title, naturally you thought "I don't think so". All I have to say is, PLEASE go view gridrunner's fast-paced video about using rclone with scripts! And consider all of the capabilities it adds to unRAID! Perhaps the configuration of it could be similar to Notification agents? How to setup and use rclone. Copy, sync, and encrypt files to the cloud. Even stream media - video guide by gridrunner
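      Just a taste of what it enables once a remote has been configured (the remote name and paths here are only placeholders):

        rclone sync /mnt/user/Backups gdrive:unraid-backups     # one-way sync of a share up to cloud storage
        rclone copy gdrive:unraid-backups /mnt/user/Restore     # pull files back down later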
  20. Better would be: And I'm sure a bash scripter could shorten it with a loop, working on a list.
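      Purely as an illustration of what I mean, since the original one-liner isn't quoted here (the command and disk names are hypothetical):

        for d in disk1 disk2 disk3; do
            du -sh /mnt/"$d"        # stand-in for whatever command is being repeated per disk
        done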
  21. An interesting report, and your point is good: backups and checksums are vital. But something about what happened bothered me, so I gave it some thought, and realized this is not at all how we are used to seeing normal hard drives behave. Because of sector ECC, the drive knows whether the data read is correct or not, and even tries to correct it if it only has a few wrong bits. But it never returns the data to ANY reader (including the file system code) if the data cannot be read perfectly, corrected or not. This means you CANNOT get corrupted data back; you only get perfect data or an error code.

      With normal hard drives, behaving the way we are familiar with, data that can't pass its ECC test cannot be read, but the sector is not reallocated until you give up on the sector and write to it. That initiates the drive's testing of the sector, a possible remapping, and then the writing of the new data. Based on what you have written, this SSD has done something different, and I don't think it should have. It decided on its own when to test and reallocate, something that at first *sounds* like a good idea, but it then also wrote what it could recover of the old data, corrupt data, to the new sector. That's unacceptable. If the drive decides to help you by *fixing* the sector behind the scenes, it has to come up with *something* to write to the new sector, and that's why this corruption occurred. If the old data was corrupt, you can't use it, and that's why you can't fix a sector until you're given new data for it.

      I don't like this. It sounds like an inexperienced developer with a *bright idea*, who did not think it through. You CANNOT write to a sector with anything less than perfect data. This time, the built-in checksumming caught it, but generally there isn't any checksumming, and this would result in silent corruption. I don't see how we could recommend any SSDs with that firmware. But I could be off-base; we could be misinterpreting what actually happened. Or there was a terrible coincidence here - the sector went bad but was safely replaced, and then the same sector was corrupted somehow (might want to avoid that bad luck sector!).

      Edit: in taking another look above, I don't think we can conclude for sure that the remapped sector was the one with the corruption, and that may change everything. If it wasn't, then that takes SanDisk off the hook.

      Off-topic, but this made me wonder why we ever worry about bit-rot. Something that occurs on the order of one bit in a thousand terabytes would be caught and easily handled by the sector ECC info. What am I missing?
  22. Really appreciate the testing. It answers questions I had, as to whether the constant polling could impact performance. Sounds well implemented.
  23. Added to Guides and Videos! I just watched it, and have to say this was very impressive! You move fast, but that's fine, the viewer can pause and rewind as needed. That makes these videos short and information dense, just the way I like it! I can't watch most tutorials and classes, as they are usually talking down to the lowest common denominator, and I generally give up trying to pay attention. I also want to say that I think this video could be very empowering for many users. It introduces a number of very good and useful capabilities.