hwilker

Members
  • Posts

    142

Everything posted by hwilker

  1. Thank you! That's extremely helpful info, and explains why these errors are suddenly appearing. I'm going to try a rebuild and watch the array closely. As you say, if they keep increasing, I'll replace the SAS cables. Hopefully it's not the backplanes. Thanks again.
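For "watching closely", one option is to compare the raw UDMA CRC error count between two SMART snapshots. The following is a minimal sketch, assuming the attribute table has the usual `smartctl -A` column layout with the raw value in the last column. The sample rows and the helper names (`sample_report`, `crc_error_count`, `crc_delta`) are hypothetical and are not taken from the attached diagnostics.

```python
from typing import Optional


def sample_report(crc_count: int) -> str:
    """Build a made-up attribute table in the usual `smartctl -A` layout."""
    return (
        "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
        "  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0\n"
        f"199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       {crc_count}\n"
    )


def crc_error_count(report: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if the row is absent."""
    for line in report.splitlines():
        fields = line.split()
        # A full attribute row has 10 whitespace-separated columns.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_delta(before: str, after: str) -> int:
    """How many new CRC errors were logged between two snapshots."""
    b, a = crc_error_count(before), crc_error_count(after)
    if b is None or a is None:
        raise ValueError("attribute 199 not found in one of the reports")
    return a - b


if __name__ == "__main__":
    before = sample_report(12)   # hypothetical snapshot before the rebuild
    after = sample_report(15)    # hypothetical snapshot after the rebuild
    print("new CRC errors:", crc_delta(before, after))
```

The raw count is normally cumulative, so a delta of zero across a rebuild suggests the link is currently clean even if the absolute number is non-zero.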
  2. Thanks. My build uses a Norco 4224 case, with two m1015 SAS controllers and eight onboard SATA ports. The drive in question is connected to one of these SAS controllers, so replacing the cable to disk2 means replacing an SAS cable (SFF-8087, if I remember correctly) that serves four drives. So, before dealing with replacing the cable, I thought I'd try a few things. I followed the instructions outlined in reconstructing the drive, up until the final step of actually starting to reconstruct the drive. Upon re-assigning the disk I received a warning notice that there were UDMA CRC errors on disk 2 (see image006). So I shut down the system, moved the disk into one of the slots controlled directly from one of the SATA ports on the motherboard, and rebooted the system. This time I received no notices or warnings (see image007). Now I'm left with a couple of questions and some decisions to make. If you look at the first post in this thread, immediately after upgrading to 6.6 I received notifications of UDMA CRC errors on two other drives (drive6 and drive8; after that initial boot these drives haven't reported any problems). Those two drives are also attached to SAS controllers but are on different cables from disk2 and different cables from each other. This would mean three of my four SAS cables all started reporting problems at the same time, since none of this was being reported until the upgrade to 6.6 (dumb question: does 6.6 perhaps incorporate diagnostics that are, in some way, more 'sensitive' to UDMA CRC errors than 6.1?). It does seem odd to me that these errors would suddenly start to appear just after upgrading the OS, and to do so on three different cables. At any rate, I'm now confronted with the question of whether, given the UDMA CRC errors that the system is apparently throwing off, it is safe or advisable to attempt a rebuild with disk2 in its new slot.
Or, given the possibility of new UDMA CRC errors occurring during the rebuild, should I just go the safe route and replace the SAS cables before proceeding? I know there is probably no definitive answer, but the fact that all these errors are coming up on different cables all at once and at exactly the moment that I'm upgrading the software makes me wary.
  3. Thanks. I've attached the diagnostics zip file as suggested. By the way, I've gone through the rebuild process over the years, to both replace a failed disk and to replace a disk with a larger disk. But, from a UI perspective, I'm unclear how to tell the array to rebuild an existing disk, as, if I recall correctly, you place the new, unrecognized disk into the old disk's slot in the array (disk2 in this case), start the array and the system does the rest. How do you tell unraid to rebuild using an existing disk? Anyway, I'll await the analysis of the diagnostic file, before proceeding. I, too, think there is nothing wrong with the disk itself, as prior to upgrading to 6.6 (from 6.1) there were no indications of problems, and SMART status indicates it's OK. (see image005). But, if there are problems with it, I've got an 8 TB hot spare ready to go. I just don't want to 'waste' that spare at this point if I don't have to, since I've got plenty of free space in my array. Thanks for your help. tower-diagnostics-20190223-1502.zip
  4. Thanks. I started the array, and received the following message (image004), and an indication of problems on disk2 in the Device column (the red 'X' on the left). But the array itself seems intact. I can mount the array on my PC, and if I access disk2 directly from the network via the TOWER, I can see disk2 and access the content on disk2. Can you advise what I should do next? Thanks.
  5. Do you mean just dismiss the error popup and start the array? It won't attempt to rebuild disk 2? Just want to make sure. Thanks.
  6. I just upgraded from v6.1 to v6.6.6. The upgrade itself went smoothly, but upon rebooting the system after upgrading, a series of events left me unsure how to proceed. (Diagnostics zip file attached for the last boot of the system described below.) Please note that nowhere in the actions I describe below did I mount the array. When the system first came up after upgrading it indicated that two disks had UDMA CRC errors (see image001). I did some rooting around on the internet, which seemed to indicate that such problems were related to the transfer of data from the disk to the host, and that the best course of action was to check the connections on the cables and the seating of the disks. I did this and rebooted the system. The indication of UDMA CRC errors didn't reappear, but a potentially more serious problem was reported: disk 2 was reported as missing. More curious, the disk that represented disk 2 (serial number ending in 5HSG) wasn't even made available as an option to remount (I do have a hot spare that was presented as available to replace the 'missing' disk 2). I shut down the system and tried moving disk 2 to another empty bay (this may have been a mistake, but I thought that the physical position of the disk no longer mattered in V6) and restarted the system. Disk 2 was still reported as missing, so I shut down the system again, put disk 2 back into its original location and rebooted. When the system rebooted it still indicated that disk 2 was missing, but the physical drive (5HSG) was now presented as an option to place in the disk 2 slot (see image002). I once again shut the system down and rebooted. Again, disk 2 was reported as missing and in an error state, but now the disk was shown as selected for Disk 2 (image003). (This is the current state of the system and is the basis for the diagnostics file that is attached to this post.) My general question is how best to proceed.
Specifically, I presume that if I start the array in this state, it would start to rebuild the array as if the physical disk (5HSG) were a replacement disk for the disk that used to occupy disk 2. I presume that would be OK, but is there a better way, presuming that the data on disk 2 is valid; some way to just rebuild the configuration? If there is, is it more risky than just rebuilding the disk? Any advice or suggestions about the best way to proceed would be appreciated. Thanks tower-diagnostics-20190222-1320.zip
  7. I'm confused. The link you gave points to an exe file not a zip file. It appears to be intended to do a fresh install for a new user (without any prior data.)
  8. I am trying to upgrade from v6.1.6 to the latest version of unraid. I'm running a plain vanilla version of 6.1.6. I have no VMs, no Dockers, etc. I've looked at the wiki, checked that I have the room on the flash, and that my network bridge name is 'br0'. But I'm unclear as to what to do next. Do I first upgrade to 6.2 and then follow the directions for upgrading from 6.2? If so, I'm not clear how to get to 6.2. Or am I just supposed to do a fresh install of the latest version? If so, are there directions on how to do this correctly from 6.1? Can someone point me in the right direction? Thanks
  9. "no errors during the rebuild" If that FAIL in the red array health report popup isn't an error, what is it?
  10. Thx. Hope you're right. Would you do a parity check afterwards to be certain?
  11. I've been running an array of 12 disks, with a cache drive and a hot spare, for several years now with no complaints and great ease. The box I use is a Norco 4224 with IBM SAS cards and native SATA slots on an ASRock Z77 Extreme with an Intel Celeron G1610 and 4GB of memory, running since about 2013 with virtually no incidents. Tonight I wanted to replace an old 2TB drive with a new 8TB drive. I had already updated my parity drive to 8TB successfully some six months ago. I took the array offline, removed the old disk, placed the new larger disk in the same physical slot that the old one had been in (just for a sense of safety) and restarted the array. Everything was going according to plan until I got a notice that the rebuilding disk was 'warm', 45 degrees C. Several minutes later I got the same notice but it now read 46 degrees. I took the top off the case and discovered that the fan closest to the drive being rebuilt had stopped. I removed the lid completely, placed a desk fan pointing at the problem disk and waited for the temperature to go down. It did about 20 minutes later and is now running at a comfortable 41 degrees. But after about another 3 hours I got a popup fail message, which read "Notice [TOWER] = array health report [FAIL]. Array has 14 disks (including parity and cache)." I looked in the log but couldn't see anything untoward. The array was still rebuilding so I let it continue to see what would happen. Now, about another 1-1.5 hrs later, it seems to be humming along. My plan, such as it is, is to let it finish and then run a parity check (assuming nothing further happens). I also purposely didn't repurpose the old disk that I was replacing, and since I didn't add any new content to the unraid system during this process, I hope that if something is wrong I can simply put the old disk back in the slot where it was, temporarily, until I know what's wrong.
I've attached both the syslog from right after I noticed the problem, though my eye sees nothing wrong with it, and the diagnostics file requested by the how-to post. I also captured an image of the toastr notification that warned me of the fail. What I would like to know is: 1. Is there any point in letting this process complete? If not, should I set aside the new hard drive and do a new preclear on it before trying to use it again (I did two cycles before starting this process)? 2. If not, what should I do? 3. What do more trained eyes than mine glean from the log and diagnostics report? I'm anxious of course to get my array back up and running, but I'm hoping that being deliberate will keep me from doing something rash, hence allowing it to complete and keeping the old disk intact and untouched so it could be put back in the array. Any help or guidance would be much appreciated. hwilker tower-diagnostics-20180810-0057.zip tower-syslog-20180810-0033.zip
  12. Thanks. I'm using a Norco 4220. The bay that this disk is in is attached via an SAS backplane to one of the SATA ports on the motherboard. I believe the cable I'm using is a SAS to SATA Reverse Breakout Cable. I can replace that but certainly don't have a spare around. Here's my plan: I just reseated the drive in the bay and made sure the bay was secure in the chassis. I pushed on all the connectors just in case one was loose, and started the preclear again. If it completes, then all is well. If it fails again, I'm going to try to preclear it again in a different slot. I still have plenty of empty bays, including several that are attached to IBM M1015 cards. If it fails there, I'll send the disk back. If it passes, I'll replace the cable on the problematic bay and then preclear the drive again in the original bay. I usually make several cycles through preclear, and as this is in service of a hot spare, I don't have to rush as nothing in the array is depending on this drive being cleared. I'll report back how it goes. Thanks again for your help.
  13. I updated to unraid v6.1.6 a few days ago. Everything seems to have gone fine. Two days ago I started to preclear a new 4 TB drive using the preclear plug-in. All was going fine until about 23 hours into it, about 22% into the post-read cycle. At that point it just stopped. I looked at a few things that I didn't really understand, and decided to just start over with a traditional preclear using screen and the command line. It just stalled again at 25-plus hours, 51% into the post-read. If I go to Unassigned Devices and click on the drive in question, I see that it's telling me that the disk must be spun up before getting a "Last SMART test result". But when I try to spin it up it won't let me. I'm attaching a syslog which appears to start throwing errors at about 15:33 on Dec 21st, right about when the preclear froze. I'm also attaching two SMART reports. The first, which appears to terminate early, was taken before a system reboot. The second was taken after a reboot and appears OK to me. Can someone help me understand what's going on here? Do I have a defective disk? If so, I can easily return it as it was bought via Amazon. If not, what's going on, and what do I need to do? Thanks in advance. syslog.zip ST4000DM000-1F2168_Z303KBBL-20151221-1630.txt ST4000DM000-1F2168_Z303KBBL-20151221-1638.txt
  14. Nice call! As you suggested I had to toggle it. Now it's working. Thanks.
  15. I just upgraded from v5.0 to v6.1.6. After doing so, I copied some files from my PC to my unraid server. I then tried to move them from the cache drive to my array using the "move now" command from the Move Settings page in the Scheduler. I have two folders at the root of my cache drive. One is a cache only share named DataBU. It is not supposed to be copied and mover appropriately skipped it. The other is called "media" and holds files I copied to the array. I expected them to be moved from the cache drive to the array when the mover ran. It has always worked that way until now. But this time the files in "media" did not get moved. Here's the log segment:
      Dec 18 19:46:58 Tower logger: mover started
      Dec 18 19:46:58 Tower logger: skipping "DataBU"
      Dec 18 19:46:58 Tower logger: skipping "media"
      Dec 18 19:46:58 Tower logger: mover finished
      As can be seen, the mover simply skipped the files in the media folder. Is there some new setting that I haven't enabled, or something else I need to do to make the mover function like it did under 5.0? Thanks
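To spot this kind of thing quickly in a longer syslog, the skipped shares can be pulled out and compared against the ones that are supposed to stay on the cache. This is a minimal sketch, assuming the `logger: skipping "<share>"` line format quoted in the post above. The function names and the `cache_only` set are hypothetical, for illustration only.

```python
import re

# The four mover lines quoted in the post.
LOG = """\
Dec 18 19:46:58 Tower logger: mover started
Dec 18 19:46:58 Tower logger: skipping "DataBU"
Dec 18 19:46:58 Tower logger: skipping "media"
Dec 18 19:46:58 Tower logger: mover finished
"""

# Matches the share name inside the quotes of a "skipping" line.
SKIP_RE = re.compile(r'logger: skipping "([^"]+)"')


def skipped_shares(log_text):
    """Return the share names the mover reported skipping, in log order."""
    return SKIP_RE.findall(log_text)


def unexpected_skips(log_text, cache_only):
    """Return skipped shares that were NOT expected to stay on the cache."""
    return [name for name in skipped_shares(log_text) if name not in cache_only]


if __name__ == "__main__":
    # DataBU is the cache-only share, so only "media" should be flagged.
    print(unexpected_skips(LOG, cache_only={"DataBU"}))
```

Any share this flags is one whose cache setting is worth checking first.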
  16. I've just installed V6.1.6 and I'm trying to get everything set up. I'm currently dealing with setting up notifications and have seen several references to a toggle for "basic view/advanced view" which adds a section for SMART notifications. I've seen a thread (http://lime-technology.com/forum/index.php?topic=40852.msg387310#msg387310) that indicates that you have to have the GUI in tabbed mode, so I did that, but I still don't see the toggle (see the attached screen shot). Can someone tell me how to get this button to be visible? Thanks.
  17. Just a follow up. I did the procedure outlined. I replaced the problematic disk and rebuilt with a new disk. Then I did a non-correcting parity check, receiving no errors. So all appears well. Now on to installing V6, which everyone says is simple but looks intimidating to me. Thanks for the assistance.
  18. "The fact that you were running a parity check would seem to imply that you weren't completely confident in it?" No. I was confident in it. It's just that I had just finished replacing another drive. I needed another drive on my PC, and so replaced a healthy 2TB from my array with a 4TB to buy some extra capacity. That went fine and didn't indicate any errors at all. But out of an abundance of caution, whenever I've replaced a drive, I've always followed it up with a non-correcting parity check. Just seemed like good form. I guess it paid off this time. FYI I probably should have mentioned that the parity errors I got were exactly 256. I don't know the sector size, but that looks suspiciously like the size of something, either a sector or perhaps a byte at the same offset in 256 sectors. I don't have the expertise to know, but such a distinct number seems to imply to me that those sectors really are the problem. The disk is also rather old. It's been out of warranty for over 3 years, which makes it 6-8 years old. Not terribly surprising that it might be bad. Swapping it out is fine with me if it's likely to solve the problem. I'd only do a long SMART test if I were desperate to save the disk. I'm not, as I'm slowly getting rid of all my 2 TB drives and repurposing them on PCs if they're healthy. Thanks for your help.
  19. "No memory errors, those are false positives. That color coding and syslog line categorizing was added a long time ago, and has not been kept up to date, so is a little unreliable at times." Thanks for the info. They certainly looked like a problem. "Once you've upgraded to v6, you should get better error descriptions. However as Trurl said, the SMART report shows 2 bad sectors, probably the cause of the issues. You might try a SMART long test to verify. And unfortunately, you can't trust your parity drive now for rebuilding data drives, until this is cleared up." Does this make sense as a plan? 1. Let the parity check complete (I'm at 86%) just to see if there are further issues. (FYI, it's a non-correcting check.) 2. I have a hot spare. Use it to replace and rebuild disk 8. 3. Run another parity check after replacing the disk. 4. (Long overdue) upgrade to V6. Thanks for the help.
  20. Thanks. Missed that. Any insight on the memory errors in the log?
  21. I am running unRaid 5.0. I am in the final process of replacing a 2 TB drive with a 4 TB drive. I mention this only for context as the process went fine. As the final step, I started a parity check just to make sure that all had gone well. That parity check is about 60% complete, but it started throwing parity errors on a drive unrelated to the replacement drive. I went to look at the syslog (attached) and discovered what looks to me like memory errors at the start of the syslog. Lines 63, 65, 67, 69, 70, 72 and 74 all show in red in the syslog, and have the word '(Errors)' at the end of the line. (Curiously, when printing out the syslog via unmenu it doesn't show the '(Errors)' at the end of the line.) What I'd like to know is whether these indicate a problem with my RAM, and if so, is the first step in correcting the problem to run a mem test after the parity check finishes? I'd also be curious to know if it is possible that the memory problems are the source of the parity errors that I got during this check, starting at line 1347. They are all on disk 8, which appears OK to me (SMART report attached). Thank you in advance, hwilker syslog-2015-12-16.zip smart_report.txt
  22. Just upgraded from rc11 using this method. All went fine. One question regarding 5.0. I see the added menu item for booting into "Safe Mode". What exactly happens if you boot into Safe Mode? I couldn't find any reference to it in this thread. Thanks. If you boot in Safe Mode it will not load any plugins or packages you have on 'extra' folder. Thanks