blockhead

Posts
  1. "You read your syslog every day because finding the 'Errors' column is too hard next to all those big numbers?" My point was that it's an easy mistake to make. It's like a floor covered in banana peels: if I'm super careful, I'll be fine. But wouldn't it be better if the banana peels weren't there? Why does the system say the parity check completed with no errors when there were 47 read errors on one disk, all during the parity check? Why does the drive have a green ball next to it? Doesn't that deserve a yellow ball? Am I being unreasonable?
  2. Right? I think it should be agreed that unRAID must achieve the above baseline without Docker, Xen, or KVM. As long as it does this without any add-ons, then it's a real NAS. Pile on the virtualization all you want after that. It's called gravy! I should say my definition of "reliable" means that I can RELY on it to safeguard my crap in a reasonably intelligent manner, not just sit there without crashing. At this very moment I'm moving data off of a failing disk. How do I know it's failing? Because I check the syslog every day. Why? Because I've learned that's what I have to do to keep my stuff safe on my unRAID box (a sketch of that daily check is tacked on after this list). When last night's parity check completed with zero errors and all green balls, most users would think that's a damn good indication of overall system awesomeness. But I know the deal. I know how this works. So I go to the syslog, like I do every day, and there are a bunch of read errors on disk 4. But the green balls!! I go back to Main... oh. Yes. There it is, the number forty-seven, nestled snugly next to other numbers like fifty-four million three hundred and fourteen thousand seven hundred and thirty-eight, and one billion eight hundred and ninety million blah blah blah. It's in the Errors column. That's how y'know it's bad. HOW can that be the only indication of a problem? I don't need it to do the RMA for me; I just need it to tell me something is wrong AND take some proactive steps to protect my stuff. I know this isn't a support thread, but can someone tell me why a drive with 47 errors can get a green ball? The drive is obviously not dead because I'm reading it right now, but isn't there a yellow ball? Can we get some yellow balls in here? Just a few tiny yellow balls!?! If I understand this debate at all, it's that the V-word folks are thinking the way to build the more reliable / intelligent server we all want is through the V-word. Do I have that wrong? I'm hoping that's not the only way. If that ends up being the only way to do it, then I guess I'll do it. But get ready, because just as between your grandparents and every electronic device they own, there are gonna be frickin problems, and you are all now my grandkids. (I'm just attaching my syslog in case anyone thinks I'm lame enough to make sh*t up.) syslog-2014-07-08.txt
  3. That's true. September is not that far away and I can wait. It does seem a wee bit optimistic though, given that it's July and we don't yet have a beta with the slew of new features that have been announced. Oh well. What happens happens.
  4. Oh man that is low. Haven't you ever lost a show? Did you watch Firefly?! Some shows deserve a good finale. That's what we in the "small subset" would like. And as for the size of our subset, please don't be insulting. Yours *might* be bigger, but that's no reason to point and laugh. I've been told our subset is "a good size," and "more than adequate." I don't understand why you would underestimate / minimize / denigrate it. I believe this is how the forum works: most people only post when they can't find the solution to their particular issue, or, in threads like these, when they don't see their viewpoint being expressed. The rest of the time it's just reading for most of us. Imagine this thread if all 60k members wanted to throw in 2 cents! Maybe if the forum had Like / Dislike buttons on our posts we'd get a better idea of just how much support there is for various viewpoints. In the meantime, c'mon man, give us a little respect. Many of us want a better v5 while we wait for at least an RC of the 6.0 variety. Call us mad, kooky, irrational, nonsensical, superstitious, nutty, wacky, bone-headed, reasonless, brainless, cockamamie, demented, or (yes I went to the thesaurus for this) injudicious, but we don't like running betas. And the current release version needs stuff. If they could just do one last upgrade with that stuff, maybe call it 5.1, keep the bug fixes going until at least the first 6.0 RC, then I think we'd be done with 5, and then we could concentrate on fighting over the direction of 6.
  5. I'd like to chime in from the somewhat less technical side of things... I'm far from clueless with PCs but I'm no guru, which I *think* makes me Limetech's target demographic. When I started with unraid I think 4.6 was the latest stable release, and at that time, when I bought my licenses, I knew I was buying an unfinished product. I knew there was stuff missing, the core features which I won't list because everybody knows, but it did the one thing I most wanted: it let me throw a bunch of random drives into whatever box I chose and presented those drives back to me as one logical space with a little fault tolerance. And it did this without requiring me to expand my limited knowledge of Linux. So hell yeah I was willing to bet a hundred bucks or so that unraid would evolve into what I considered a finished product, something less fragile, something I could recommend to my friends. Nobody promised it would happen. It was just a bet. I'm not writing this to complain. I just wanted to say that after reading the "complaint dept" thread and this one, it's clear that in the next few months I will learn whether I won that bet. And also that if this were a movie, Grumpy would be the guy the audience knows the other characters should be listening to. The audience would know any characters who choose to ignore him are all gonna be eaten by frickin sharks or dinosaurs or possibly sharkosaurs. Could happen. And since this is a roadmap thread I'll go ahead and post my dreamlist. In addition to the aforementioned core features, my dream unraid would (I take no credit for these ideas):
     - automatically move data from a failed disk to a designated warm spare.
     - automatically move data from a failed disk to any available free space if a warm spare doesn't exist.
     - allow me to EASILY delete a disk from the array if I have plenty of free space to move the data to, and I don't intend to replace the disk anytime soon (a rough sketch of that feasibility check sits after this list).
     I know some of you cannot possibly imagine reducing the number of drives in your array, but (and yer gonna hafta trust me here) the above situation MAKES PERFECT SENSE to regular humans. And now that those perfectly reasonable requests are out of the way, I'll throw out an unreasonable one: it's funny, I was already thinking about this kinda thing when Grumpy posted that stuff about Ceph, but anyway ZOMG I want that!!! How cool it would be to take multiple unraid boxes and have them all replicating and fault tolerating each other over encrypted channels. Yes, fault "tolerating" - I said that. They can do other naughty stuff to each other too, I don't care. It would just be so awesome. But I dream.
  6. So I swapped the drives on the motherboard with the drives on the sas2, ran another check, and the error came up again on one of the drives on the sas2:
     Mar 30 12:56:00 Tower kernel: mdcmd (45): check NOCORRECT
     Mar 30 12:56:00 Tower kernel:
     Mar 30 12:56:00 Tower kernel: md: recovery thread woken up ...
     Mar 30 12:56:00 Tower kernel: md: recovery thread checking parity...
     Mar 30 12:56:00 Tower kernel: md: using 4096k window, over a total of 3907018532 blocks.
     Mar 30 16:15:47 Tower kernel: sd 6:0:0:0: [sdg] command f3de86c0 timed out
     Mar 30 16:15:47 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
     Mar 30 16:15:47 Tower kernel: sas: trying to find task 0xf7705400
     Mar 30 16:15:47 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf7705400
     Mar 30 16:15:47 Tower kernel: sas: sas_scsi_find_task: task 0xf7705400 is aborted
     Mar 30 16:15:47 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xf7705400 is aborted
     Mar 30 16:15:47 Tower kernel: sas: ata6: end_device-6:0: cmd error handler
     Mar 30 16:15:47 Tower kernel: sas: ata6: end_device-6:0: dev error handler
     Mar 30 16:15:47 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
     Mar 30 16:15:47 Tower kernel: ata6.00: failed command: READ DMA EXT
     Mar 30 16:15:47 Tower kernel: ata6.00: cmd 25/00:00:08:79:db/00:04:6f:00:00/e0 tag 0 dma 524288 in
     Mar 30 16:15:47 Tower kernel: res 40/00:00:17:a4:b4/00:00:6f:00:00/e0 Emask 0x4 (timeout)
     Mar 30 16:15:47 Tower kernel: ata6.00: status: { DRDY }
     Mar 30 16:15:47 Tower kernel: ata6: hard resetting link
     Mar 30 16:15:47 Tower kernel: sas: ata7: end_device-6:1: dev error handler
     Mar 30 16:15:47 Tower kernel: sas: ata8: end_device-6:2: dev error handler
     Mar 30 16:15:47 Tower kernel: sas: ata9: end_device-6:3: dev error handler
     Mar 30 16:15:47 Tower kernel: sas: sas_form_port: phy0 belongs to port0 already(1)!
     Mar 30 16:15:49 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[0]:rc= 0
     Mar 30 16:15:50 Tower kernel: ata6.00: configured for UDMA/133
     Mar 30 16:15:50 Tower kernel: ata6: EH complete
     Mar 30 16:15:50 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
     Mar 31 00:47:13 Tower kernel: md: sync done. time=42672sec
     Mar 31 00:47:13 Tower kernel: md: recovery thread sync completion status: 0
     So that's a total of 3 different drives that have had this error, only while attached to the sas2. But the parity check had no errors, so I guess I won't worry too much. Though if someone could walk me through what's happening in the log there, I'd appreciate it (there's a rough annotated read of these events after this list). Thanks!
  7. The first time around the errors started several hours into a nocorrect check. The second time was just a matter of minutes. I cancelled the checks because I was too worried something would start corrupting data if it hadn't already.
     Mar 29 20:38:26 Tower kernel: mdcmd (45): check NOCORRECT
     Mar 29 20:38:26 Tower kernel:
     Mar 29 20:38:26 Tower kernel: md: recovery thread woken up ...
     Mar 29 20:38:26 Tower kernel: md: recovery thread checking parity...
     Mar 29 20:38:26 Tower kernel: md: using 4096k window, over a total of 3907018532 blocks.
     Mar 30 01:08:26 Tower kernel: sd 6:0:3:0: [sdj] command f7482a80 timed out
     Mar 30 01:08:26 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
     Mar 30 01:08:26 Tower kernel: sas: trying to find task 0xf7501a00
     Mar 30 01:08:26 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf7501a00
     Mar 30 01:08:26 Tower kernel: sas: sas_scsi_find_task: task 0xf7501a00 is aborted
     Mar 30 01:08:26 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xf7501a00 is aborted
     Mar 30 01:08:26 Tower kernel: sas: ata9: end_device-6:3: cmd error handler
     Mar 30 01:08:26 Tower kernel: sas: ata6: end_device-6:0: dev error handler
     Mar 30 01:08:26 Tower kernel: sas: ata7: end_device-6:1: dev error handler
     Mar 30 01:08:26 Tower kernel: sas: ata8: end_device-6:2: dev error handler
     Mar 30 01:08:26 Tower kernel: sas: ata9: end_device-6:3: dev error handler
     Mar 30 01:08:26 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
     Mar 30 01:08:26 Tower kernel: ata9.00: failed command: READ FPDMA QUEUED
     Mar 30 01:08:26 Tower kernel: ata9.00: cmd 60/00:00:98:ef:2c/04:00:97:00:00/40 tag 0 ncq 524288 in
     Mar 30 01:08:26 Tower kernel: res 40/00:0c:78:4a:23/00:00:97:00:00/40 Emask 0x4 (timeout)
     Mar 30 01:08:26 Tower kernel: ata9.00: status: { DRDY }
     Mar 30 01:08:26 Tower kernel: ata9: hard resetting link
     Mar 30 01:08:26 Tower kernel: sas: sas_form_port: phy3 belongs to port3 already(1)!
     Mar 30 01:08:28 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[3]:rc= 0
     Mar 30 01:08:28 Tower kernel: ata9.00: configured for UDMA/133
     Mar 30 01:08:28 Tower kernel: ata9: EH complete
     Mar 30 01:08:28 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
     Mar 30 01:10:35 Tower kernel: mdcmd (46): nocheck
     Mar 30 01:10:35 Tower kernel: md: md_do_sync: got signal, exit...
     Mar 30 01:10:35 Tower kernel: md: recovery thread sync completion status: -4
     Mar 30 01:39:18 Tower kernel: NTFS driver 2.1.30 [Flags: R/W MODULE].
     Mar 30 01:39:21 Tower kernel: mdcmd (47): check NOCORRECT
     Mar 30 01:39:21 Tower kernel:
     Mar 30 01:39:21 Tower kernel: md: recovery thread woken up ...
     Mar 30 01:39:21 Tower kernel: md: recovery thread checking parity...
     Mar 30 01:39:21 Tower kernel: md: using 4096k window, over a total of 3907018532 blocks.
     Mar 30 01:46:30 Tower kernel: sd 6:0:2:0: [sdi] command f3d47540 timed out
     Mar 30 01:46:30 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
     Mar 30 01:46:30 Tower kernel: sas: trying to find task 0xefd6e300
     Mar 30 01:46:30 Tower kernel: sas: sas_scsi_find_task: aborting task 0xefd6e300
     Mar 30 01:46:30 Tower kernel: sas: sas_scsi_find_task: task 0xefd6e300 is aborted
     Mar 30 01:46:30 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xefd6e300 is aborted
     Mar 30 01:46:30 Tower kernel: sas: ata8: end_device-6:2: cmd error handler
     Mar 30 01:46:30 Tower kernel: sas: ata6: end_device-6:0: dev error handler
     Mar 30 01:46:30 Tower kernel: sas: ata7: end_device-6:1: dev error handler
     Mar 30 01:46:30 Tower kernel: sas: ata8: end_device-6:2: dev error handler
     Mar 30 01:46:30 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x6 frozen
     Mar 30 01:46:30 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED
     Mar 30 01:46:30 Tower kernel: ata8.00: cmd 60/00:00:00:9f:c6/04:00:03:00:00/40 tag 4 ncq 524288 in
     Mar 30 01:46:30 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Mar 30 01:46:30 Tower kernel: ata8.00: status: { DRDY }
     Mar 30 01:46:30 Tower kernel: ata8: hard resetting link
     Mar 30 01:46:30 Tower kernel: sas: ata9: end_device-6:3: dev error handler
     Mar 30 01:46:30 Tower kernel: sas: sas_form_port: phy2 belongs to port2 already(1)!
     Mar 30 01:46:32 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[2]:rc= 0
     Mar 30 01:46:32 Tower kernel: ata8.00: configured for UDMA/133
     Mar 30 01:46:32 Tower kernel: ata8.00: device reported invalid CHS sector 0
     Mar 30 01:46:32 Tower kernel: ata8: EH complete
     Mar 30 01:46:32 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
     Mar 30 01:58:38 Tower kernel: mdcmd (48): nocheck
     Mar 30 01:58:38 Tower kernel: md: md_do_sync: got signal, exit...
     Mar 30 01:58:38 Tower kernel: md: recovery thread sync completion status: -4
     sdj and sdi are both on my newly installed sas2lp-mv8. And yes I had to reflash it to get it working in the first place. Int13 is disabled. One odd thing I noticed in the setup menu is under "RAID mode" it says "N/A" instead of "JBOD" like the old saslp-mv8 which is still installed, if that matters. Probably unrelated: one thing in the full syslog I've always wondered about is this line
     Mar 29 20:33:48 Tower kernel: AMD_IDE: probe of 0000:00:06.0 failed with error -12
     But that's been happening for years since I started with 4.x (only moved to 5.0.5 a couple weeks ago). Any advice? Need more info? Am I being paranoid? No errors in parity were detected as far as it got, and the check did continue past the error on its own. Maybe I should have let it finish. Just hate to see red in unmenu. syslog-2014-03-30.txt
  8. And I think it's all but impossible for LimeTech to benefit to any degree from the hundreds or thousands of people out there right now looking to build servers with 3TB disks. That's all I'm saying. Until support is added, none of that money goes to LT, and that's bad for us.
  9. So your hypothesis is that a significant portion of respondents are being tricked into saying they want something they don't really want? Perhaps they are deciding how to vote by spinning a dreidel. Or, my hypothesis, perhaps they can read and are picking the exact option they think is best. Good idea. And if you put that question to a group of a few hundred would-be users, what would the answer be? Why not just ask whether the continued development of this software is beneficial to its current users? Because if the answer is yes, then you'll want what puts the most motivation in Tom's pocket. I have plenty of room in my 12 bay (pathetic I know!) server, and I want what's most likely to keep unraid in the forefront of the developer's mind, because in the long run that's what's best for everyone involved here.
  10. Wowza! I can't imagine a better reason to register than to respond to that statement. What the poll clearly shows is that you do not know how to read polls. The poll clearly shows 51.5% of a few hundred current unraid users who inhabit this forum and like to vote in polls think 3T support should not be a priority. That leaves 48.5% expressing a desire for 3T support. 9 votes - that's your "wide margin"? (The arithmetic is sketched after this list.) If these are the numbers generated by current users, imagine a poll of would-be users. How would the numbers shift if you put this question to a few hundred prospective buyers? Yes, that's right, it would be a landslide for 3T support, because no one is looking to build a new massive storage system based on software that doesn't support the largest readily available hard drives. As a happy user of unraid for several months now, I am grateful for the existence of this software more than I can adequately express. I bought two licenses because I'm far more afraid of Tom losing interest in unraid than I am of a flash drive failure. As a current user with a 2T based system with plenty of room, the best thing for me is a stable 5.0 with no 3T support, since I won't need it for quite some time. But the best thing for LimeTech is to snag as many fresh fish as it can by supporting the drives the fish want to use. That's my vote, and if you care at all about unraid succeeding as an enterprise, that should be your vote too.
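A minimal sketch of the daily syslog check posts 1 and 2 describe, in Python. The log path is the standard Linux one, and the regex assumes the md driver logs lines like "md: disk4 read error" -- the exact wording varies across unRAID releases, so treat both as assumptions to adjust:

```python
#!/usr/bin/env python3
# Daily read-error watchdog (sketch). Counts kernel read-error lines per
# array disk so a 47-error drive gets shouted about instead of hiding
# behind a green ball. The log path and regex are assumptions.
import re
from collections import Counter

SYSLOG = "/var/log/syslog"                      # assumed log location
PATTERN = re.compile(r"md: (disk\d+).*read error", re.IGNORECASE)

def count_read_errors(path=SYSLOG):
    """Return {diskN: number of read-error lines seen in the syslog}."""
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    for disk, n in sorted(count_read_errors().items()):
        print(f"WARNING: {disk} logged {n} read error(s) -- wants a yellow ball")
```

Dropped into a daily cron job, that covers the "tell me something is wrong" half; the "take proactive steps" half is the part only Limetech can build in.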
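Post 5's "delete a disk" wish starts with a feasibility check before a single file moves. A back-of-the-envelope sketch, assuming data disks are mounted at /mnt/diskN; this is only the arithmetic, not how unRAID actually reconfigures an array:

```python
# Can the rest of the array absorb the disk I want to retire?
import shutil

def can_evacuate(disk_to_remove, remaining_disks):
    """True if free space on the remaining disks covers the retiree's data."""
    used = shutil.disk_usage(disk_to_remove).used
    free = sum(shutil.disk_usage(d).free for d in remaining_disks)
    return free >= used

# e.g. can_evacuate("/mnt/disk3", ["/mnt/disk1", "/mnt/disk2", "/mnt/disk4"])
# If True, the data could be copied off disk3 and the array rebuilt without
# it; the actual move and the parity rebuild are the hard parts.
```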
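As for the logs in posts 6 and 7, each incident has the same shape: a READ command to a drive behind the SAS2LP times out, the SAS layer aborts the task, libata hard-resets the link, the drive comes back at UDMA/133, and the error handler exits with "failed: 0" -- which is why the parity checks stay clean. No media error is ever reported, which (guessing from these excerpts alone) points at the controller, cabling, or mvsas driver rather than the platters. A hypothetical triage script, not an unRAID tool, that flags those stages in a saved syslog:

```python
#!/usr/bin/env python3
# sas_triage.py (hypothetical): summarize timeout/reset/recovery events.
import re
import sys

TIMEOUT = re.compile(r"\[(sd\w+)\] command \w+ timed out")
RESET = re.compile(r"(ata\d+): hard resetting link")
DONE = re.compile(r"(ata\d+): EH complete")

def triage(lines):
    for line in lines:
        if (m := TIMEOUT.search(line)):
            print(f"timeout on {m.group(1)}")
        elif (m := RESET.search(line)):
            print(f"  {m.group(1)}: link hard reset (controller/cable suspect)")
        elif (m := DONE.search(line)):
            print(f"  {m.group(1)}: error handling finished, drive recovered")

if __name__ == "__main__":
    triage(sys.stdin)   # usage: python3 sas_triage.py < syslog-2014-03-30.txt
```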
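Finally, the poll numbers in posts 9 and 10 check out: a 51.5% / 48.5% split decided by 9 votes implies roughly 300 ballots, i.e. "a few hundred" voters and nothing like a wide margin:

```python
# Recover the poll size from the reported split and margin.
margin_votes = 9
margin_fraction = 0.515 - 0.485        # a 3-point spread
total_votes = margin_votes / margin_fraction
print(round(total_votes))              # -> 300 ballots
print(round(total_votes * 0.515))      # -> ~154 against prioritizing 3T support
print(round(total_votes * 0.485))      # -> ~145 for 3T support now
```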