hwilker

Members
  • Posts: 142
  • Joined

  • Last visited

Converted

  • Gender: Undisclosed


hwilker's Achievements

Apprentice (3/14)

Reputation: 0

  1. Thank you! That's extremely helpful info, and it explains why these errors are suddenly appearing. I'm going to try a rebuild and watch the array closely (see the CRC-monitoring sketch after this list). As you say, if the counts keep increasing, I'll replace the SAS cables. Hopefully it's not the backplanes. Thanks again.
  2. Thanks. My build uses a Norco 4224 case, with two M1015 SAS controllers and eight onboard SATA ports. The drive in question is connected to one of the SAS controllers, so replacing the cable to drive 2 means replacing an SAS cable (SFF-8087, if I remember correctly) that serves four drives. So, before dealing with the cable, I thought I'd try a few things. I followed the instructions for reconstructing the drive, up until the final step of actually starting the rebuild. Upon re-assigning the disk I received a warning that there were UDMA CRC errors on disk 2 (see image006). So I shut down the system, moved the disk into one of the slots served directly by one of the SATA ports on the motherboard, and rebooted. This time I received no notices or warnings (see image007).

     Now I'm left with a couple of questions and some decisions to make. If you look at the first post in this thread, immediately after upgrading to 6.6 I received notifications of UDMA CRC errors on two other drives (drive 6 and drive 8; after that initial boot those drives haven't reported any problems). Those two drives are also attached to the SAS controllers, but on different cables from disk 2 and from each other. That would mean three of my four SAS cables all started reporting problems at the same time, since none of this was being reported until the upgrade to 6.6. (Dumb question: does 6.6 perhaps incorporate diagnostics that are, in some way, more 'sensitive' to UDMA CRC errors than 6.1?) It does seem odd to me that these errors would suddenly start appearing just after an OS upgrade, and on three different cables at that.

     At any rate, I'm now confronted with the question: given the UDMA CRC errors the system is apparently throwing off, is it safe or advisable to attempt a rebuild with disk 2 in its new slot? Or, given the possibility of new UDMA CRC errors occurring during the rebuild, should I just go the safe route and replace the SAS cables before proceeding? I know there is probably no definitive answer, but the fact that all these errors are coming up on different cables at once, at exactly the moment I'm upgrading the software, makes me wary.
  3. Thanks. I've attached the diagnostics zip file as suggested. By the way, I've gone through the rebuild process over the years, both to replace a failed disk and to replace a disk with a larger one. But, from a UI perspective, I'm unclear how to tell the array to rebuild onto an existing disk: if I recall correctly, you place the new, unrecognized disk into the old disk's slot in the array (disk 2 in this case), start the array, and the system does the rest. How do you tell Unraid to rebuild using an existing disk? Anyway, I'll await the analysis of the diagnostics file before proceeding. I, too, think there is nothing wrong with the disk itself, as prior to upgrading to 6.6 (from 6.1) there were no indications of problems, and its SMART status indicates it's OK (see image005). But if there are problems with it, I've got an 8 TB hot spare ready to go. I just don't want to 'waste' that spare at this point if I don't have to, since I've got plenty of free space in my array. Thanks for your help. tower-diagnostics-20190223-1502.zip
  4. Thanks. I started the array and received the following message (image004), along with an indication of problems on disk 2 in the Device column (the red 'X' on the left). But the array itself seems intact. I can mount the array on my PC, and if I access disk 2 directly over the network via TOWER, I can see disk 2 and access its content. Can you advise what I should do next? Thanks.
  5. Do you mean just dismiss the error popup and start the array? It won't attempt to rebuild disk 2? Just want to make sure. Thanks.
  6. I just upgraded from v6.1 to v6.6.6. The upgrade itself went smoothly, but upon rebooting the system after the upgrade a series of events left me unsure how to proceed. (Diagnostics zip file attached for the last boot of the system described below.) Please note that nowhere in the actions I describe below did I mount the array. When the system first came up after upgrading, it indicated that two disks had UDMA CRC errors (see image001). Some rooting around on the internet seemed to indicate that such problems relate to the transfer of data from the disk to the host, and that the best course of action is to check the cable connections and the seating of the disks. I did this and rebooted the system. The indication of UDMA CRC errors didn't reappear, but a potentially more serious problem was reported: disk 2 was missing. More curious, the drive that had been disk 2 (serial number ending in 5HSG) wasn't even offered as an option to reassign (see the serial-number listing sketch after this list); I do have a hot spare, which was presented as available to replace the 'missing' disk 2.

     I shut down the system and tried moving disk 2 to another empty bay (this may have been a mistake, but I thought the physical position of a disk was no longer significant in v6) and restarted the system. Disk 2 was still reported as missing, so I shut down again, put disk 2 back into its original location, and rebooted. This time the system still indicated that disk 2 was missing, but the physical drive (5HSG) was now presented as an option to place in the disk 2 slot (see image002). I once again shut the system down and rebooted. Again, disk 2 was reported as missing and in an error state, but now the drive was shown as selected for disk 2 (see image003). (This is the current state of the system and the basis for the diagnostics file attached to this post.)

     My general question is how best to proceed. Specifically, I presume that if I start the array in this state, it will rebuild the array as if the physical disk (5HSG) were a replacement for the disk that used to occupy disk 2. I presume that would be OK, but is there a better way, presuming the data on disk 2 is valid; some way to just rebuild the configuration? If there is, is it more risky than just rebuilding the disk? Any advice or suggestions about the best way to proceed would be appreciated. Thanks tower-diagnostics-20190222-1320.zip
  7. I'm confused. The link you gave points to an exe file, not a zip file. It appears to be intended for a fresh install by a new user (without any prior data).
  8. I am trying to upgrade from v6.1.6 to the latest version of Unraid. I'm running a plain vanilla 6.1.6: no VMs, no Dockers, etc. I've looked at the wiki, checked that I have room on the flash, and confirmed that my network bridge name is 'br0'. But I'm unclear what to do next. Do I first upgrade to 6.2 and then follow the directions for upgrading from 6.2? If so, I'm not clear how to get to 6.2. Or am I just supposed to do a fresh install of the latest version? If so, are there directions on how to do this correctly from 6.1? Can someone point me in the right direction? Thanks
  9. "no errors during the rebuild" If that FAIL in the red array health report popup isn't an error, what is it?
  10. Thx. Hope you're right. Would you do a parity check afterwards to be certain?
  11. I've been running an array of 12 disks, with a cache drive and a hot spare, for several years now with no complaints and great ease. The box I use is a Norco 4224 with IBM SAS cards and native SATA slots on an ASRock Z77 Extreme with an Intel Celeron G1610 and 4GB of memory, in service since about 2013 with virtually no incidents. Tonight I wanted to replace an old 2 TB drive with a new 8 TB drive (I had already upgraded my parity drive to 8 TB successfully some six months ago). I took the array offline, removed the old disk, placed the new larger disk in the same physical slot that the old one had been in (just for a sense of safety), and restarted the array. Everything was going according to plan until I got a notice that the rebuilding disk was 'warm' at 45 degrees C. Several minutes later I got the same notice, but it now read 46 degrees. I took the top off the case and discovered that the fan closest to the drive being rebuilt had stopped. I removed the lid completely, pointed a desk fan at the problem disk, and waited for the temperature to go down. It did, about 20 minutes later, and the drive is now running at a comfortable 41 degrees (see the temperature-watch sketch after this list).

      But after about another 3 hours I got a popup fail message, which read "Notice [TOWER] = array health report [FAIL]. Array has 14 disks (including parity and cache)." I looked in the log but couldn't see anything untoward. The array was still rebuilding, so I let it continue to see what would happen. Now, about another 1-1.5 hours later, it seems to be humming along. My plan, such as it is, is to let it finish and then run a parity check (assuming nothing further happens). I also purposely didn't repurpose the old disk that I was replacing, and since I didn't add any new content to the Unraid system during this process, I hope that if something is wrong I can simply put the old disk back in its slot temporarily until I know what's wrong. I've attached both the syslog from right after I noticed the problem (though my eye sees nothing wrong in it) and the diagnostics file requested by the how-to post. I also captured an image of the toastr popup that warned me of the fail.

      What I would like to know is: 1. Is there any point in letting this process complete? If not, should I set the new hard drive aside and do a new preclear on it before trying to use it again (I did two cycles before starting this process)? 2. If not, what should I do? 3. What do more trained eyes than mine glean from the log and diagnostics report? I'm anxious, of course, to get my array back up and running, but I'm hoping that being deliberate will keep me from doing something rash; hence letting it complete and keeping the old disk intact and untouched so it could be put back in the array. Any help or guidance would be much appreciated. hwilker tower-diagnostics-20180810-0057.zip tower-syslog-20180810-0033.zip
  12. Thanks. I'm using a Norco 4220. The bay that this disk is in is attached via an SAS backplane to one of the SATA ports on the motherboard. I believe the cable I'm using is an SAS-to-SATA reverse breakout cable. I can replace that, but I certainly don't have a spare around. Here's my plan: I just reseated the drive in the bay, made sure the bay was secure in the chassis, pushed on all the connectors in case one was loose, and started the preclear again. If it completes, then all is well. If it fails again, I'm going to try preclearing it in a different slot; I still have plenty of empty bays, including several that are attached to IBM M1015 cards. If it fails there, I'll send the disk back. If it passes, I'll replace the cable on the problematic bay and then preclear the drive again in the original bay. I usually run several preclear cycles, and as this drive is meant to be a hot spare, I don't have to rush; nothing in the array depends on it being cleared. I'll report back how it goes. Thanks again for your help.
  13. I updated to Unraid v6.1.6 a few days ago, and everything seems to have gone fine. Two days ago I started to preclear a new 4 TB drive using the preclear plugin. All was going fine until about 23 hours in, roughly 22% into the post-read cycle, at which point it just stopped. I looked at a few things that I didn't really understand and decided to just start over with a traditional preclear using screen and the command line. It stalled again at 25-plus hours, 51% into the post-read. If I go to Unassigned Devices and click on the drive in question, it tells me the disk must be spun up before it can get a "Last SMART test result", but when I try to spin it up it won't let me (see the SMART self-test sketch after this list).

      I'm attaching a syslog, which appears to start throwing errors at about 15:33 on Dec 21st, right about when the preclear froze. I'm also attaching two SMART reports: one, taken before a system reboot, appears to terminate early; the second, taken after a reboot, looks OK to me. Can someone help me understand what's going on here? Do I have a defective disk? If so, I can easily return it, as it was bought via Amazon. If not, what's going on, and what do I need to do? Thanks in advance. syslog.zip ST4000DM000-1F2168_Z303KBBL-20151221-1630.txt ST4000DM000-1F2168_Z303KBBL-20151221-1638.txt
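
CRC-monitoring sketch (for posts 1 and 2): a minimal way to watch the UDMA CRC counters while the rebuild runs. This is only a sketch, assuming smartctl is on the path (Unraid ships smartmontools) and that a Python 3 interpreter is available on the server, which is not the case on a stock install; the device paths and polling interval are placeholders to replace with the real array members.

```python
#!/usr/bin/env python3
"""Poll the UDMA_CRC_Error_Count SMART attribute for a few drives and
report whenever a raw value changes (placeholder device paths)."""
import subprocess
import time

DRIVES = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]  # replace with the actual array members
POLL_SECONDS = 300                             # how often to re-read the counters

def crc_count(dev):
    """Return the raw UDMA_CRC_Error_Count for one drive, or None if absent."""
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "UDMA_CRC_Error_Count" in line:
            return int(line.split()[-1])   # raw value is the last column
    return None

last = {dev: crc_count(dev) for dev in DRIVES}
print("baseline:", last)
while True:
    time.sleep(POLL_SECONDS)
    for dev in DRIVES:
        now = crc_count(dev)
        if now != last[dev]:
            print(f"{dev}: UDMA CRC count changed {last[dev]} -> {now}")
            last[dev] = now
```

A count that keeps climbing on drives behind the same SFF-8087 cable would point at that cable or its backplane connector rather than at the drives themselves.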
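Serial-number listing sketch (for post 6): a quick way to confirm which device node the drive whose serial ends in 5HSG lands on after a bay move. Again just a sketch, assuming Python 3 on the server; it relies only on the standard /dev/disk/by-id symlinks.

```python
#!/usr/bin/env python3
"""Print each whole-disk /dev/disk/by-id entry (which embeds the serial
number) together with the device node it currently points at."""
import os

BY_ID = "/dev/disk/by-id"
for name in sorted(os.listdir(BY_ID)):
    # skip partition links and non-ATA entries to keep the output short
    if name.startswith("ata-") and "-part" not in name:
        target = os.path.realpath(os.path.join(BY_ID, name))
        print(f"{name} -> {target}")
```

If the 5HSG entry shows up here but not in the webGUI's drop-down, the drive is at least visible to the kernel, which narrows the problem to the assignment rather than the hardware.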
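Temperature-watch sketch (for post 11): a small loop that could run in a screen session during a rebuild and warn when the drive gets hot. Same assumptions as above (smartctl present, Python 3 available); the device path and threshold are placeholders, and some drives report the attribute under a slightly different name.

```python
#!/usr/bin/env python3
"""Warn when a drive's SMART temperature reaches a threshold (e.g. while
it is being rebuilt and a case fan may have failed)."""
import subprocess
import time

DEVICE = "/dev/sdX"   # placeholder: the drive being rebuilt
LIMIT_C = 45          # warn at or above this temperature
POLL_SECONDS = 60

def temperature(dev):
    """Return the current Temperature_Celsius raw value, or None."""
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Temperature_Celsius" in line:
            # the first token of the raw-value column is the temperature in C
            return int(line.split()[9])
    return None

while True:
    temp = temperature(DEVICE)
    if temp is not None and temp >= LIMIT_C:
        print(f"WARNING: {DEVICE} is at {temp} C (limit {LIMIT_C} C)")
    time.sleep(POLL_SECONDS)
```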
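SMART self-test sketch (for post 13): pulling the overall SMART verdict and the self-test log into one piece of output is usually the quickest way to see whether the drive itself logged anything around the stall. A sketch under the same assumptions; the device path is a placeholder.

```python
#!/usr/bin/env python3
"""Dump the overall SMART health verdict and the self-test log for one
drive so the output can be saved or attached to a post."""
import subprocess
import sys

dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdX"   # placeholder default
result = subprocess.run(["smartctl", "-H", "-l", "selftest", dev],
                        capture_output=True, text=True)
print(result.stdout)
```

If the reallocated or pending sector counts in the full smartctl output start moving, the drive itself is suspect; a stall with clean SMART data points more toward the controller, cable, or slot.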