June 16, 2008

I'm using UnRAID 4.1 and have a 4-disc (3 + parity) server. Yesterday I added another data disc. I powered down, installed the disc, powered up, stopped the array, assigned the disc, started the array, and then clearing began. It took nearly 24 hours to clear the 1TB disc. When it finished, I tried to start the array. It would show all discs as mounting and then fail to start. I tried several times, but it never started.

I then rebooted, and the parity and old data discs all showed green. The new disc was blue. I clicked Start and it has once again started clearing. I suspect it will take another 24 hours and still may not mount and start. Any ideas on what I can do while I'm waiting?

One other thing: all of my old discs are showing temperatures between 48 and 53 degrees, except the new one, which is around 70. Is that a serious problem or a sign of potential trouble? The disc doesn't actually feel hotter to the touch than any of the others, but it has been reading 10-15 degrees warmer since it was first installed.

Thanks,
Dave
June 16, 2008 (Author)

Here's the end of my syslog. The bit I can see just repeats this section over and over.

Jun 16 11:27:16 Tower kernel: [ 3518.160403] md0: import [8,0] (sda) ST31000340A S 9QJ02YF0 offset: 63 size: 976762552
Jun 16 11:27:16 Tower kernel: [ 3518.160517] md1: import [8,32] (sdc) Hitachi HD S72101 GTH000PAGW489H offset: 63 size: 976762552
Jun 16 11:27:16 Tower kernel: [ 3518.160592] md2: import [8,16] (sdb) Hitachi HD S72101 GTH000PAGW44HH offset: 63 size: 976762552
Jun 16 11:27:16 Tower kernel: [ 3518.160651] md3: import [8,48] (sdd) Hitachi HD S72101 GTH000PAGTXBLH offset: 63 size: 976762552
Jun 16 11:27:17 Tower kernel: [ 3519.112555] md4: import [8,64] (sde) ST31000340 AS 9QJ0MV3L offset: 63 size: 976762552
Jun 16 11:27:17 Tower kernel: [ 3519.112563] md4: new disk
Jun 16 11:27:17 Tower kernel: [ 3519.112565] md5: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112568] md6: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112570] md7: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112572] md8: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112574] md9: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112577] md10: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112579] md11: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112581] md12: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112584] md13: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112586] md14: import: no device
Jun 16 11:27:17 Tower kernel: [ 3519.112588] md15: import: no device
Jun 16 11:27:58 Tower in.telnetd[1269]: connect from 192.168.0.174 (192.168.0.174)
Jun 16 11:28:01 Tower login[1270]: ROOT LOGIN on `pts/0' from `192.168.0.174'
Jun 16 11:28:17 Tower kernel: [ 3579.030108] md0: import [8,0] (sda) ST31000340A S 9QJ02YF0 offset: 63 size: 976762552
Jun 16 11:28:17 Tower kernel: [ 3579.030227] md1: import [8,32] (sdc) Hitachi HD S72101 GTH000PAGW489H offset: 63 size: 976762552
Jun 16 11:28:17 Tower kernel: [ 3579.030303] md2: import [8,16] (sdb) Hitachi HD S72101 GTH000PAGW44HH offset: 63 size: 976762552
Jun 16 11:28:17 Tower kernel: [ 3579.030377] md3: import [8,48] (sdd) Hitachi HD S72101 GTH000PAGTXBLH offset: 63 size: 976762552
Jun 16 11:28:18 Tower kernel: [ 3580.057751] md4: import [8,64] (sde) ST31000340 AS 9QJ0MV3L offset: 63 size: 976762552
Jun 16 11:28:18 Tower kernel: [ 3580.057760] md4: new disk
Jun 16 11:28:18 Tower kernel: [ 3580.057762] md5: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057765] md6: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057767] md7: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057769] md8: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057771] md9: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057773] md10: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057775] md11: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057778] md12: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057780] md13: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057782] md14: import: no device
Jun 16 11:28:18 Tower kernel: [ 3580.057784] md15: import: no device
root@Tower:~#
June 16, 2008 (Author)

And here's the syslog. I never realized I could attach it! I cut out the repetitive two-thirds (see above), which started once I began clearing the new disc.
June 16, 2008

The section you are seeing repeated (twice in your sample above) was normal for versions of unRAID prior to v4.2, and you are using v4.1. It was logged periodically (once a minute) while the array was stopped, and more often while array management was in progress, such as changes to the array assignments, clearing, and formatting. In v4.2, Tom removed that section from the syslog. I actually miss seeing it because, although not strictly necessary, it was quite helpful to techies like me when troubleshooting a problem. I don't miss all the repetition that you saw, but I would be happy to see Tom add it back if he can limit it to displaying only when the list changes.

The fact that it restarted the clearing after the reboot was normal too. unRAID never assumes a drive is clear before it is formatted and added to the array. You know it was, because you watched it clear and you know you had not done anything else to the drive, but unRAID won't make that assumption. After a reboot, it treats the drive as new, unformatted, and uncleared if it does not find a ReiserFS partition on it, which the formatting step would have created.

I can't tell what actually went wrong, but I can speculate on a couple of things. Clearing for 24 hours is way too long, and a temp of 70C is WAY WAY WAY too high! So either the drive itself is defective, overheating and producing errors that slowed the process way down, or the drive was installed in a hot spot with no airflow and is running far too hot, causing errors and slow operation. It would have been good to see the syslog from that time, before the reboot. There are no errors in the current syslog.

See this thread for a discussion of drive temps. I strongly recommend shutting down and determining what is causing the high temps on all of the drives, but most of all the extremely high temp on the new drive. It is remotely possible that it has a bad temp sensor, but the fact that it took so long to clear and then could not format and start indicates much more trouble than a defective sensor. I would get a SMART report and consider RMA'ing it. I recommend additional fans/airflow for all of the drives.

One other anomaly, probably harmless: the partitions of your 'SanDisk U3 Cruzer Micro' were identified with this line:

sdf: sdf4

That denotes sdf4 as the partition found on the drive sdf. Normally, that would be sdf1, not sdf4. It does not list 4 partitions, only one, so presumably there were 3 partitions before this one that were deleted and are now empty, and the only partition now sits in the fourth position of the partition table. That is OK, but really strange!
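As a rough illustration of the "only display it when the list changes" idea, here is a minimal Python sketch that post-processes a saved syslog and drops any md import block identical to the last one kept. It is not how unRAID itself logs anything; the function name `collapse_import_blocks` and the regular expression are assumptions based only on the log lines posted earlier in this thread.

```python
import re

# Matches kernel lines such as
#   "Jun 16 11:27:16 Tower kernel: [ 3518.160403] md0: import [8,0] (sda) ..."
# and captures the part after the kernel timestamp ("md0: import ...").
# The pattern is an assumption derived from the log sample in this thread.
MD_LINE = re.compile(r"kernel: \[\s*\d+\.\d+\]\s+(md\d+: .*)$")


def collapse_import_blocks(lines):
    """Return the log lines, skipping md import blocks that repeat unchanged."""
    kept = []
    previous_sig = None      # timestamp-free content of the last block kept
    block, sig = [], []      # block currently being collected

    def flush():
        nonlocal previous_sig, block, sig
        if block and tuple(sig) != previous_sig:
            kept.extend(block)
            previous_sig = tuple(sig)
        block, sig = [], []

    for line in lines:
        line = line.rstrip("\n")
        match = MD_LINE.search(line)
        if match:
            # An "md0:" line is treated as the start of a new import block.
            if match.group(1).startswith("md0:"):
                flush()
            block.append(line)
            sig.append(match.group(1))
        else:
            flush()
            kept.append(line)
    flush()
    return kept
```

Calling `collapse_import_blocks(open("syslog.txt"))` on a saved copy of the log would return the first import block plus any later block whose disk list differs, while leaving all other lines (such as the telnet login) untouched. The file name here is only an example.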
June 16, 2008

I forgot to mention that I recommend upgrading to unRAID v4.3.1. I believe it may handle the clearing and formatting better, and it has other improvements as well.
June 17, 2008 (Author)

RobJ,

Thanks very much for your help and insights. It's been a big help for me to understand what is a concern and what isn't.

I hadn't focused on the drive temps, as I thought I'd seen a comment saying that anything up to about 55 degrees is acceptable. The statements in the other thread certainly shoot that down. I'll take a look at fans and airflow. My server will only get hotter as I install more drives over time. I also agree that the new drive may well be faulty. It's been at 70 degrees since the first boot-up.

One more question: if faced with the same situation again, where UnRAID identifies a cleared drive as being new and needing to be cleared, could I use the Restore command and skip the clearing process without affecting any other drives or data?

I'll get things settled down and then try 4.3.1. Thanks again for your help.

Dave
June 17, 2008

"One more question, if faced with the same situation again, where UnRAID identifies a cleared drive as being new and needing to be cleared, could I use the Restore command and skip the clearing process without affecting any other drives or data?"

Using the "Restore" button would skip the clearing step, but it would immediately throw away the existing parity data and start re-calculating parity on your array. Your array would not be protected from a disk failure until the parity creation process was complete. For me, this would take over 10 hours.

As long as all the disks are working, and you don't mind the extended period of time where your array is not protected by parity, you can use the "Restore" button, but I would not. I would just let the clearing step run, and the array will remain protected by the existing parity. It really depends on what you are storing on the array, and how difficult it would be to replace it if a disk were to fail.

The only time the "Restore" button is needed is when you are removing a drive from the array and do not intend to immediately replace it with another. For every other situation, the "Start" button is the correct choice. Accidentally using the "Restore" button when a data disk has failed will cause the loss of the data on that drive, because you asked the unRAID array to throw away its parity data (usually the exact opposite of why you have an unRAID array).

With all the fixes in more current releases to both the parity swap process and the disk clearing process, I also strongly recommend you upgrade to 4.3.1. You should be able to upgrade by following these instructions from the release notes:

1. Referring to the System Management Utility 'Main' page, make a note of each disk's model/serial number; you will need this information later.
2. From the System Management 'Main' page, stop the array, then power down your server, remove the Flash, and plug it into your PC.
3. Right-click your Flash device listed under My Computer and select Properties. Make sure the volume label is set to "UNRAID" (without the quotes) and click OK. (It is probably already set correctly on your flash drive if you are running unRAID 4.1.) You do NOT need to format the Flash. You do NOT need to reload syslinux.
4. Copy the bzroot and bzimage files from the new release to the root of your Flash device. You can even rename the existing bzroot and bzimage to bzroot41 and bzimage41 first, in case you want to revert to the older release for any reason (just rename them back to their original names and reboot).
5. Right-click your Flash device listed under My Computer and select Eject. Remove the Flash, install it in your server, and power up.

If the array starts up on its own, use the list of drive model/serial numbers you made earlier to double-check that all the disks are in their correct respective slots in the array. Different versions of Linux will sometimes scan the hardware in a different order, so what used to be /dev/hda could end up as /dev/hdc. This is not a problem unless the parity slot ends up assigned a disk that used to hold your data. The unRAID array will not start itself if this occurs. You will need to go to the "Devices" page and assign the disks to their correct slots in the array, then return to the main page, check the box under the "Start" button, and press "Start".

Joe L.
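For anyone who prefers to script the file copy in step 4 rather than drag files around, here is a minimal Python sketch, to be run on the PC with the flash plugged in. The function name `stage_upgrade` and its parameters are made up for illustration. It only handles the two files named above, and the release notes for a given version may list additional files, so treat it as a starting point and not as the official procedure.

```python
import shutil
from pathlib import Path


def stage_upgrade(release_dir, flash_root, backup_suffix="41"):
    """Back up bzroot/bzimage on the flash, then copy in the new ones.

    release_dir:   folder holding the unpacked new release (assumed layout)
    flash_root:    root of the mounted flash drive
    backup_suffix: appended to the old file names, e.g. bzroot -> bzroot41
    """
    release_dir, flash_root = Path(release_dir), Path(flash_root)
    names = ("bzroot", "bzimage")

    # Refuse to touch the flash unless the new release has both files.
    missing = [n for n in names if not (release_dir / n).is_file()]
    if missing:
        raise FileNotFoundError("release is missing: " + ", ".join(missing))

    for name in names:
        current = flash_root / name
        backup = flash_root / (name + backup_suffix)
        # Keep the old file for a possible revert, but never overwrite
        # a backup that already exists.
        if current.exists() and not backup.exists():
            current.rename(backup)
        shutil.copyfile(release_dir / name, current)
```

A call might look like `stage_upgrade("C:/unraid-new", "E:/")`, where both paths are placeholders for wherever the release was unpacked and whichever drive letter the flash received. Reverting is then a matter of renaming the backups to their original names, as described in step 4.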
June 17, 2008

His situation is a bit different, and I'm not sure he would see a "Restore" button, because he has a drive with unknown contents that, from unRAID's point of view, has not been cleared and formatted. I don't think a parity check or sync should be allowed in this situation, but I don't know if anyone has faced this before. If someone were to have both a new disk and a failed disk, and saw a Restore button, they should probably get carefully thought-out advice, and probably remove the new disk and fix the array first.

If this were my system, I would:

1. Get the temps down pronto. If nothing else, take the side off, point a house fan into it, and flood it with cool air, especially around the drives.
2. Upgrade the flash drive to unRAID v4.3.1. Joe's instructions are good, but check the upgrade instructions in the v4.3.1 Release Notes. Since you are upgrading from 4.1 to 4.3.1, there are 3 additional files to copy.
3. After starting unRAID v4.3.1 and checking for improvement in the temps, get a SMART report on your new drive (I'm assuming it is Disk 4, sde, the new Seagate) with this command:

smartctl -a -d ata /dev/sde > /boot/smart.txt

This will create a report called smart.txt on your flash drive that should make the status of this drive clear and make it easier to obtain an RMA for it, if needed. You really can't proceed further until you know whether this drive is good or bad.
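Once smart.txt is on the flash drive, the temperature can be pulled out of it on any PC with Python. The sketch below assumes the report contains the usual smartctl attribute table with a row named Temperature_Celsius whose tenth column is the raw value. Attribute names and raw-value formats vary by drive and smartctl version, so the helper names and the parsing here are illustrative only.

```python
def smart_temperature(report_text):
    """Return the raw Temperature_Celsius value from a smartctl report, or None."""
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows are assumed to look like:
        #   ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE ...
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            try:
                return int(fields[9])
            except ValueError:
                return None   # raw value in a format this sketch does not handle
    return None


def read_temperature(path):
    """Read a saved report (for example smart.txt) and return its temperature."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return smart_temperature(handle.read())
```

For example, `read_temperature("smart.txt")` returns an integer number of degrees if the row is present and `None` otherwise. A `None` result does not mean the drive is healthy, only that this sketch did not find the attribute.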
June 18, 2008 (Author)

Hey guys,

I'm basically doing a combination of all of the above. I pulled the new 1TB Seagate and replaced it with an old 300GB drive. That drive is currently clearing and looks like it may take about two hours. It's also running in the mid-40s, so big improvements all the way around. I wanted to get this drive installed and make sure that everything is working properly on 4.1 before doing the 4.3.1 upgrade. If there's one thing I've learned over the years, it's 'reduce the number of variables'! If all is well, I will move up to 4.3.1 tonight.

I'm going to run some Windows diagnostics on the new 1TB drive. If I can't find any problem, I'll reinstall it, run the SMART report in Linux, and then see if it clears any better under 4.3.1. I doubt I'll get that far, however, because I strongly suspect this drive has a problem.

I'll attack the cooling issues over the weekend. I may just buy a big, high-quality case that has lots of fans and more drive bays for future expansion. I'm not sure there's much I can do with the current case to get the temps down.

And yes, I did have the option to Restore or Start the array. I thought Restore might have been a time saver, but it's not worth being out of parity sync, as Joe pointed out.

Thanks again for your help and comments. Over the last six months my UnRAID box has gone from being a "Proof of Concept" project thrown together with leftover parts to a core part of my media system. With you guys' help, it's only getting better!

Dave