c3

Everything posted by c3

  1. Unfortunately, I do see drives returning corrupt data on a regular basis. Enterprise storage manufacturers actually compete over who handles this better. Enterprise storage uses drives which offer variable sector sizes (512/520/524/528 bytes per sector) to make room for additional checksum storage. They don't do this just because they want you to buy more drives. Additional sources:

       https://bartsjerps.wordpress.com/2011/12/14/oracle-data-integrity/
       http://blog.fosketts.net/2014/12/19/big-disk-drives-require-data-integrity-checking/
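    As an aside, SAS/SCSI drives that support it can be reformatted between these sector sizes with sg3_utils. A hedged sketch (sdX is a placeholder, the drive must support the target size, and this destroys all data on it):

       # reformat to 520-byte sectors, leaving 8 bytes per sector for protection info
       sg_format --format --size=520 /dev/sdX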
  2. Look at the OP. First line: SMART errors, and the disconnect from them meaning data loss. For me, data corruption equals data loss. The drive is reporting uncorrected errors; these are not ECC collisions, these are known bad/lost data. The topic title talks about reallocation and corruption. I was trying to make it clear that reallocation is the best thing that happened in this whole mess. The drive has learned not to use the bad place. But the learning was costly. Again, the OP, first line: the reallocation is not the problem. It was the errors before the reallocation. I don't believe the SMART data was polled fast enough to state that there were no pending sectors. Pending is not a permanent state. I have not quoted anyone, and would hope people understand that nothing here is directed at anyone in particular.
  3. I have said this many times before: SMART counters do not predict errors, they only report errors which have already occurred, after data loss. ECC has a very limited ability to detect errors, and an even more limited ability to correct them. What a reported uncorrected error means is that the drive has data it knows is bad because it failed ECC, but was unable to correct. But a collision (easy with ECC) means you cannot really trust data just because it passes ECC. Read up on SHA collisions if you want to know what this means or how bad it can be.

    Enter checksumming. BTRFS is one filesystem which has checksums for the data (ZFS is another). XFS and others have checksums for the metadata only, which is deemed more important. Filesystems like WAFL/CephFS/OneFS actually do correction in the event of a checksum error. The general direction is less reliable storage devices, at lower costs. If this event had been a video file, the data loss would probably be unnoticeable. And if you don't want to lose data, a higher level protection scheme (replication (backup)/RAID/EC) needs to be in place. Which is exactly what happened in this case: the data was protected by a higher level scheme, a backup.

    One last thing: a reallocation does not corrupt or lose data. The reallocation happens long after the loss. Reallocation is done on the write of new data to a location previously found to be unreliable. The loss is what caused the reallocation. Prior to a reallocation you might see a pending sector, or ECC corrected and uncorrected errors.
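    You can get the same end-to-end detection yourself, even on filesystems without data checksums. A minimal sketch with standard tools (the share path and manifest location are placeholders):

       # build a checksum manifest for everything on a share
       find /mnt/user/share -type f -exec sha256sum {} + > /boot/checksums.txt
       # later, re-verify; anything reported as FAILED is silent corruption
       sha256sum -c --quiet /boot/checksums.txt

    With --quiet, sha256sum only prints the files that fail, which is what you care about.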
  4. There was a trick question last year: what's the largest disk you can buy? The answer is the Samsung 15.3TB SSD. This came as a surprise to many as they tried to defend their answer of 10TB, but the reality is the 15.3TB goes exactly where the 10TB answer was. This is the TB/unit question, which is not really meaningful. So, the new questions are: what is the best $/TB, or $/IOP? You need to understand what you are looking for, and ask the right question. As you mentioned, HDDs have several technologies on the roadmap which will allow for increased capacity/density. These will continue the $/TB curve, where HDDs have the lead. On the other side, SSDs are adding things like compression and deduplication. It is quite the competition. The first to vanish will be the SAS high performance HDDs, 10k and 15k. HDDs are becoming purely a capacity play, $/GB. Next, desktops and laptops will move to SSDs. Frankly, I am surprised how many are still listed with HDDs. To answer your question: no, HDDs are not going away. But you probably won't be seeing them. They are moving into the data center, which is where most >4TB HDDs already are. And in this space, the question becomes how many TB/U and $/TB. 4U for a PB? Done. Seagate, Hitachi, and WD are in the packaged HDD business. Once HDDs are out of the laptops and desktops, these packages will become the focus of the $/GB and TB/U optimization. One of the first things to go will probably be the 3.5 inch z-height. More platters per (spindle motor)+(head servo) is a great way to drive down $/TB.
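    To make the metric concrete, a quick back-of-envelope (prices here are made up purely for illustration):

       # $/TB: a hypothetical $280 10TB HDD vs a hypothetical $6000 15.3TB SSD
       echo "scale=2; 280/10" | bc       # 28.00 $/TB
       echo "scale=2; 6000/15.3" | bc    # 392.15 $/TB

    Same units, very different answers, which is why HDDs still own the capacity play while SSDs compete on $/IOP.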
  5. The persistent cache is a write cache; there is no "hit" involved. The persistent cache is measured in GB, while the cache you are referring to is measured in MB. It might be better to call it a disk buffer. The way that buffer is used is different on SMR vs PMR. Using it for writes puts the data at risk during power failures. The persistent cache does not, hence the name persistent.
  6. Actually, SMR drives like Seagate's 8TB Archive offer better performance than PMR for simultaneous random writes, up to the persistent cache size. The SMR drives put those writes into the persistent cache without the seek overhead of the PMR drives. This is very similar to the performance of MLC/TLC/QLC NAND used as SLC cache on SSDs; for example the Hellfire has ~17GB of cache, less than the Seagate 8TB Archive, and the WD Black went with ~5.5GB.
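    If you want to watch the cache fill and the cliff that follows, a rough fio sketch (sdX is a placeholder; this writes raw to the device, so only point it at a scratch drive):

       fio --name=smr-burst --filename=/dev/sdX --rw=randwrite --bs=4k \
           --direct=1 --ioengine=libaio --iodepth=32 --size=40G \
           --runtime=600 --time_based --status-interval=10

    Throughput stays high until the persistent cache is consumed, then drops hard once the drive has to destage into the shingled zones.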
  7. 1) SSD write endurance changes with each generation. Any research is limited to the generation studied. This is still a rapidly advancing area. Today, many MLC/TLC/QLC based SSDs use a portion as SLC for cache, in varying amounts. 2) A "reliable" device is not a substitute for a backup. Everything fails.
  8. Pending is probably worse, since that means data was unable to be read, i.e. something was lost. Reallocated means the data was written, but to a spare sector. As mentioned, continued increases are worrisome. This drive is only a few weeks old; was it precleared? Preclear normally gets all the early defects located and out of the way. If the array is not busy, maybe run a long test. Overall, the parity drive is like any other: its failure does not result in data loss. It does get more writes.
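    To watch the counters and kick off the long test (sdX is a placeholder for your drive):

       # the attributes that matter here
       smartctl -A /dev/sdX | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
       # start the long self-test, then check the result once it completes
       smartctl -t long /dev/sdX
       smartctl -l selftest /dev/sdX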
  9. http://hothardware.com/news/intel-reacting-to-amd-ryzen-apparently-cutting-prices-on-core-i7
  10. Interesting that your first is such a corner case... What you are talking about is called plaid, nested, or layered RAID. The most common is RAID 10, striping data over mirrored drives, but RAID 50 and RAID 60 are not uncommon. There are commercial products offering plaid levels like 56 and 66; again, think of systems with hundreds of drives. There are two problems with actually doing this with unRAID, though you can do it. 1) As you mentioned, the number of drives needed to do it. Since you would need at least a parity and a data drive in unRAID, and in the lower controller RAID, that's 4 drives with a yield of 2x drive size; it just multiplies up from there (see the arithmetic below). 2) The number of controllers/slots will also be a factor. To get those 30 arrays, how many controllers will be needed, and thus slots? I don't know what kind of controller is in your R710, but you'll want to get those drives switched over to JBOD so each is directly addressable from unRAID.
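     Back-of-envelope for the striped-pair version (numbers are illustrative): with D drives of size S arranged as D/2 two-drive RAID0 stripes, and one stripe used as unRAID parity, usable space is (D/2 - 1) * 2S.

       # 8 drives of 4TB: 4 stripes, 1 as parity -> 24TB usable of 32TB raw
       D=8; S=4
       echo "($D/2 - 1) * 2 * $S" | bc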
  11. This behavior can be controller+firmware specific. It would be best to grab a sandbox drive and test the conversion to and from JBOD.
  12. I could make better guesses with some logs... Is the kernel panic exactly the same? If so, it's probably not defective hardware. If it is a bit random, I would check cooling.
  13. Filerun? I will try it, as I have been looking for something like this.
  14. I'm seeing the 5TB for $99 (WD external), under $20/TB; the downward price cycle continues. As the article points out, the demand for HDDs is in decline. Thus it makes sense to reduce the assets assigned to assembling HDDs. Hopefully, HDDs will be completely out of non-server machines shortly. Then HDDs can seriously take on server requirements. The above 5TB is a good example; it will just go away. 4TB drives are extremely popular, hence less flexible pricing. 5TB is not really a common size outside consumer gear. 6TB and 8TB battle for some middle ground, a weak space. And 10TB currently owns the capacity storage frontline. 12TB is expected to take that spot in late 2017. (And yes, the 1TB/2TB/3TB just go away too.) All of this is in regard to 3.5 inch HDDs.
  15. I asked the same question and got a quick answer indicating no problem. http://lime-technology.com/forum/index.php?topic=55422.0
  16. The restores (and their failure) do not use parity data, unless there is a failed disk. Since there is no failed disk, the validity of parity is irrelevant. Your restore problem remains. Were you able to downsize the cache drive?
  17. You need an option for both.
  18. It should be noted that hot adding and hot swapping are very different. Hot swapping requires much more than AHCI. unRAID does not support hot swapping, or even hot adding drives to the array. On the Intel chipsets, under AHCI, the hot swap feature must be enabled via BIOS and software. That being said, I recently hot plugged a drive into a Linux machine (which I have done thousands of times, a la echo "- - -"), and had it crash. The machine crashed at the time of insertion, not detection, so I suspect a physical interaction. Power cycled and never looked back. Given the lack of support by unRAID, and the potential for mishap, best practice would be to avoid hot plugging. At the very least, be aware there is risk, and with unRAID, limited upside. unRAID will not know about the drive until you config it. But Linux will likely detect the drive after insertion; if not, you can use the echo "- - -" trick to have the HBA rescan, as shown below.
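     For reference, the rescan trick writes to sysfs (host0 is a placeholder; repeat for each HBA host as needed):

       # ask the HBA to rescan all channels/targets/LUNs
       echo "- - -" > /sys/class/scsi_host/host0/scan
       # then check what turned up
       dmesg | tail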
  19. I was going to recommend proxmox based on your earlier statements. What you would be getting for the unRAID license fee is an unconventional RAID-like protection for data. Everything else is not Limetech.
  20. All 4224s are not equal. Depending on the version you have, and whether you changed the mid-plane, there are some mods for proper airflow. The air is drawn across the drives, which means the fans need to be a suction (airflow) design, not a static pressure design; hence your external fan does not help.

      You need to take care that air leakage around the mid-plane is minimized. The mid-plane needs to be sealed to make sure no air from the pressurized side is short-cutting into the space between the drives and fans. Back in the day, there were socks inserted into the cable holes of the mid-plane. You can even go to the extreme and put a foam strip along the top of the mid-plane so the cover (which must be in place for proper airflow) tightly seals to the mid-plane. Any screw hole forward of the mid-plane on the bottom or sides should be taped over. Any empty drive bays need fillers; again, socks have been used.

      I use the rather noisy, but powerful, Delta fans on the mid-plane, and make sure the CPU section has plenty of exhaust venting. I do not use exhaust fans on the rear panel. The 8TB drives run hotter because they are physically fuller, which reduces the airflow between them.
  21. Any port open on the internet will be scanned and probed. If it is a well-known port, it will be continuously attacked. Changing the port for things like ftp, ssh, telnet, even http and https, will not avoid the attacks. The open port is found by scanning, and the service is easily detected. You can try it yourself: telnet hostname 22, and the first thing you get back is what service is running. Your "attacker" is running Ubuntu exposed on the internet, probably hacked and used as a bot.

       telnet 116.31.116.41 22
       Trying 116.31.116.41...
       Connected to 116.31.116.41.
       Escape character is '^]'.
       SSH-2.0-OpenSSH_6.9p1 Ubuntu-2
       Connection closed by foreign host.

      173-10-58-34-Michigan.hfc.comcastbusiness.net is running a web server, likely compromised and again used as a bot.

       curl 173-10-58-34-Michigan.hfc.comcastbusiness.net
       <!DOCTYPE HTML>
       <html>
       <head>
       <title>IVSWeb 2.0 - Welcome</title>
       <link rel="stylesheet" href="css/login.css">
       <script type="text/javascript" src="js/clientinfo.js"></script>
       <script type="text/javascript">
       var os = clientinfocontext.GetOSInfo();
       if(os=='Windows'){
           window.location.href = "/old/index.htm?mxy="+Math.random();
       }else{
           window.location.href = "/new/index.jsp";
       }
       </script>
       </head>
       <body>
       </body>
       </html>

      These services running on a different port are trivial to detect and automatically attack. Fail2ban is useful to detect and block attackers.
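      A minimal fail2ban jail.local sketch for sshd (the values are illustrative, not a recommendation):

        [sshd]
        enabled  = true
        port     = ssh
        maxretry = 5
        findtime = 600
        bantime  = 3600

      Five failures within ten minutes gets the source IP banned for an hour, which takes the steam out of the brute-force bots without trying to hide anything.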
  22. Was just thinking through some processes, and thought of this situation. System has all 2TB drives. One drive fails. You visit the local disk drive seller, don't want to invest in another 2TB drive, and home comes a new 4TB drive, twice the size for practically the same price. How does that play out? Somehow the new larger drive needs to end up as parity, and hopefully the old parity drive gets used to replace the failed drive. Each path I imagine seems rather long, and most require another purchase. Dual parity to the rescue?
  23. One thing you mentioned is the memtest hang when testing in parallel, which, along with this log entry:

      Jan 7 12:01:06 Tower kernel: XFS (md2): Corruption of in-memory data detected. Shutting down filesystem

      makes me wonder about the details of your system; not that you have a bad component, but whether there is some architectural interaction causing this. My guess is an ASUS P8B using the C206 chipset and an E3 processor. That should have no trouble running memtest in parallel. There are bugs in XFS, and this might be one of them. They are hard to make progress on when, as in your case, the log was dumped and you moved forward.
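      If it happens again, a non-destructive check is worth running before anything else (with the array stopped or the filesystem unmounted; md2 is taken from your log line):

        # no-modify mode: reports what xfs_repair would fix, changes nothing
        xfs_repair -n /dev/md2

      And save the logs before rebooting next time, so there is something to chase.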