September 11, 2008
My array had pretty much filled up, so I bought a new 500 GB hard drive and installed it. I stopped the array, clicked shutdown, installed the drive, and powered it up. Once it was up and running I went into the GUI and noticed it said disk2 was missing (this is common and happens almost every time I move or install a drive). I went to Devices, clicked the drop-down menu, and the missing drive was listed (usually the case when unRAID says a disk is missing), so I selected it. Strangely, it now says the replacement disk is too small and reports it as 1056 bytes smaller all of a sudden. It is the exact same drive that has been there, and I never even disconnected it. Here's a screenshot:
I tried rebooting and it still says the drive is too small. What the heck happened and how do I correct it? Also, why does unRAID report untouched drives as missing even though they are showing under Devices?
September 11, 2008
RD, this sounds exactly like a problem I experienced: http://lime-technology.com/forum/index.php?topic=2230.0
You're missing the same number of sectors that I was missing. What motherboard are you using? Are you using a combo of IDE and SATA drives? As near as I could tell, some of my drives would consistently lose this capacity when my Gigabyte motherboard was not in AHCI mode (I guess it was in IDE mode). All my drives are SATA, so I switched to AHCI mode. I haven't lost capacity since the switch. I'd really like to know what the root cause of the issue is so as to totally eliminate it in the future.
September 11, 2008
I forgot to mention some crucial info. In my case, something was definitely creating a Host Protected Area on some of my drives. Since I had only ever used unRAID on the setup, I could only surmise that it was the BIOS creating it. I used tools such as SeaTools to remove the HPA, but it would recreate itself when I rebooted (until I switched to AHCI mode).
September 11, 2008 (Author)
It's a Gigabyte, but I don't remember the model. I'll have to check. It's SATA drives with one IDE.
September 11, 2008 (Author)
I checked the log and there definitely seems to be some odd stuff in there, but it's all foreign to me. Here's the log: http://pastebin.com/m675b988d
A sample of some strange-looking lines:
Sep 10 21:29:46 Tower2 kernel: attempt to access beyond end of device
Sep 10 21:29:46 Tower2 kernel: sdd: rw=0, want=976772928, limit=976771055
Sep 10 21:29:46 Tower2 kernel: Buffer I/O error on device sdd1, logical block 976772864
Sep 10 21:29:46 Tower2 kernel: attempt to access beyond end of device
Sep 10 21:29:46 Tower2 kernel: sdd: rw=0, want=976772929, limit=976771055
Sep 10 21:29:46 Tower2 kernel: Buffer I/O error on device sdd1, logical block 976772865
Sep 10 21:29:46 Tower2 kernel: attempt to access beyond end of device
Sep 10 21:29:46 Tower2 kernel: sdd: rw=0, want=976772930, limit=976771055
Sep 10 21:29:46 Tower2 kernel: Buffer I/O error on device sdd1, logical block 976772866
sdd is the drive that's showing as too small.
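To make the quoted kernel lines easier to read, here is a minimal sketch in C that pulls the `want=` and `limit=` values out of such a line and reports how far the request runs past the end of the device. The function names `parse_want_limit` and `sectors_past_end` are hypothetical, and the sketch assumes both values are counted in the same unit (typically 512-byte sectors in these kernel messages).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the "want=" and "limit=" numbers from a
 * kernel detail line such as
 *   "sdd: rw=0, want=976772928, limit=976771055"
 * Returns 1 on success, 0 if either field is missing or malformed. */
int parse_want_limit(const char *line,
                     unsigned long long *want,
                     unsigned long long *limit)
{
    assert(line != NULL && want != NULL && limit != NULL);

    const char *w = strstr(line, "want=");
    const char *l = strstr(line, "limit=");
    if (w == NULL || l == NULL)
        return 0;
    if (sscanf(w, "want=%llu", want) != 1)
        return 0;
    if (sscanf(l, "limit=%llu", limit) != 1)
        return 0;
    return 1;
}

/* How far the request extends beyond the device limit (same unit as the
 * inputs, assumed to be sectors); 0 when the request is within range. */
unsigned long long sectors_past_end(unsigned long long want,
                                    unsigned long long limit)
{
    return want > limit ? want - limit : 0;
}
```

Feeding it the first `sdd:` line quoted above gives a difference of 976772928 - 976771055 = 1873, which is consistent with something still expecting the drive to be larger than the size the kernel currently sees.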
September 11, 2008 (Author)
I meant to say that I don't see anything about HPA in the log. I'm not concerned about the lack of what I assume is a minuscule amount of space, I just want to be sure it's not the sign of a bigger problem.
September 11, 2008
"Also why does unraid report untouched drives as missing even though they are showing under devices?"
When unRAID examines the disks, it first checks whether each disk is there at all. If it is not responsive, it is marked as "missing". If it does respond, the following code is involved.
[pre]
else if (!same_disk_info( disk, rdev)) {
    printk("md%d: wrong\n", disk->number);
    rdev->status = DISK_WRONG;

    /* count missing disk(s) */
    mddev->num_missing++;
    mddev->missing_disk = disk->number;
}
else if (!disk_valid( disk)) {
    rdev->status = DISK_INVALID;
}
[/pre]
It checks that the disk is still the same disk you used to have in that slot in the array. If not, it marks it as WRONG, increments the count of missing disks, and records that slot as the missing disk. The function that checks whether the disk is still the same makes three tests: it checks whether the MODEL, SERIAL-NUMBER, and SIZE of the disk are still the same in that slot in the array. Two out of three isn't good enough; if the size changed, to unRAID it is a different disk.
[pre]
/* check if disk info & size matches recorded info & size */
static int same_disk_info( mdp_disk_t *disk, mdk_rdev_t *rdev)
{
    char str1[41];
    char str2[41];

    /* model: only the first 16 characters are compared */
    set_str( str1, disk->model, 40);
    set_str( str2, rdev->model, 40);
    if (strncmp( str1, str2, 16) != 0) return 0;

    /* serial number: must match exactly */
    set_str( str1, disk->serial_no, 20);
    set_str( str2, rdev->serial_no, 20);
    if (strcmp( str1, str2) != 0) return 0;

    /* size: any difference means a different disk */
    if (disk->size != rdev->size) return 0;

    return 1;
}
[/pre]
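As a rough illustration of the three-part test described above, here is a self-contained sketch that can be compiled outside the driver. The `struct disk_identity` type and the `same_identity` function are hypothetical stand-ins, not the actual unRAID types, and the sketch assumes the strings are already NUL-terminated (the driver's `set_str` step is left out).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the identity fields that the
 * driver records for each array slot. */
struct disk_identity {
    char model[41];
    char serial_no[21];
    unsigned long long size;   /* unit assumed identical on both sides */
};

/* Returns 1 only if the model (first 16 characters), the serial number,
 * and the size all match; any single difference returns 0. */
int same_identity(const struct disk_identity *recorded,
                  const struct disk_identity *found)
{
    assert(recorded != NULL && found != NULL);

    if (strncmp(recorded->model, found->model, 16) != 0)
        return 0;
    if (strcmp(recorded->serial_no, found->serial_no) != 0)
        return 0;
    if (recorded->size != found->size)
        return 0;
    return 1;
}
```

With two records that share the same model and serial number, shrinking only the `size` field of one of them is enough for `same_identity` to return 0, which mirrors how a drive that suddenly reports a smaller capacity stops being recognized as the disk previously assigned to that slot.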
September 11, 2008 (Author)
I see. This is the first time I've received the disk too small error. Every other time unRAID reports a drive as missing, it is available in a drop-down under the Devices tab, I assign it, and everything is OK. unRAID reports the exact same make, model, and size, so I'm still at a loss as to why it does that.
Anyway, as to my current problem... nobody has any idea from the log why unRAID did this to this drive? Either way, what's the best way to proceed from here? Like I said, I'm not concerned with the little loss of space, but how do I know that lost area didn't originally contain data? Should I just select restore and let it do a parity sync?
September 11, 2008
"I meant to say that I don't see anything about HPA in the log. I'm not concerned about the lack of what I assume is a minuscule amount of space, I just want to be sure it's not the sign of a bigger problem."
I think my log explicitly referenced an HPA. I never had any problems with my array when I had the missing space, but then again, I never made any configuration changes after my drives were in use. I had rebuilt my unRAID to lower my number of drives, and the first time I addressed the issue was when I decided to add a drive. So I never had a drive lose space after data was on it.
My fear would be that I totally fill a drive (I mean TOTALLY) and then make some change that causes the loss of space to occur. Would/could I lose some data or corrupt a file?
Then there's the annoyance factor. In my case, on the same port, some drives would work (i.e., not lose space) and some drives wouldn't. It was just a PIA to try to change to a faster parity drive, only to have it not be seen as the biggest drive.
September 11, 2008
I took a look at the piece of the syslog you linked above, manually unwrapped the wrapped lines, and I can say I have never before seen the particular errors you are getting. The theory about Gigabyte using an HPA, perhaps for BIOS copies, does sound plausible to me. It's too bad there does not seem to be a way to turn that off in the BIOS menus.
Would it be possible to capture the syslog by copying it, so we can look at the whole thing, and without word-wrap? The part above appears to be the second half of the syslog, and begins part way through the setup of sde, and it is the setup of sdd that is at issue. I think it very likely that there will be an HPA or Host Protected Area logged as part of the setup of sdd, the Seagate, Disk 2.
September 12, 2008 (Author)
Also, I've been running this Gigabyte mobo for a couple of years now and this has never happened before, so it seems odd that it would be my problem now. This drive has been in this server for about a year and many boot-ups, and this has never happened before.
September 12, 2008
"Also, I've been running this Gigabyte mobo for a couple years now and this has never happened before so it seems odd that it would be my problem now. This drive has been in this server for about a year and many boot ups and this has never happened before."
Can you post the output of smartctl on that disk drive? Perhaps it has had some kind of error that resulted in it needing to use some of the space on the drive.
smartctl -a -d ata /dev/sdd
Joe L.
September 12, 2008
"In that same post I posted a link to the whole log on pastebin. Perhaps it did not upload correctly?"
If you check the pastebin link, you will see that it is just a highly wrapped piece of the syslog, the last half, and begins part way through the setup of sde.
September 12, 2008 (Author)
Now this is really weird. As I was typing that last post I was thinking to myself, "The mobo never did this before to any of my disks, and everything is exactly the same except for adding a new drive, which I never activated in unRAID because of this problem." That's when I thought I should go down and unplug the new drive to see if it was somehow contributing to the problem.
I powered down the server, removed the power and data cables from the new drive, and rebooted. Same problem. I hooked it back up and rebooted. Same problem. I then decided to check the BIOS settings. I didn't see anything amiss, so I rebooted, and now the problem drive was shown as missing. I went under Devices, selected the same drive, and now unRAID reports everything is fine! The drive is back to being the right size!
I never changed a single thing. All I did was disconnect and reconnect the new drive and reboot 3 times. I didn't change anything in the BIOS, and all power-downs were done the correct way through the web GUI. It's weird that it also exhibited the other problem I mentioned having for some time, where untouched drives will show as missing after a reboot but show under Devices.
I'm stumped, and both a little happy and a little nervous. I'm happy my new drive is added to the array and formatting so I can transfer some data to it, but I'm a little nervous as to what this all means about the reliability and stability of my array. Anyone have any ideas what it means or what I should do?
September 12, 2008 (Author)
"In that same post I posted a link to the whole log on pastebin. Perhaps it did not upload correctly? If you check the pastebin link, you will see that it is just a highly wrapped piece of the syslog, the last half, and begins part way through the setup of sde."
I'm sorry, I now see it's only part of the log. I must've screwed up when I was highlighting the text. As for the wrapping, that is exactly the way it shows in my telnet window. I can't seem to maximize or widen the window.
September 12, 2008
"how do I know that lost area didn't originally contain data?"
Once we confirm that it is an HPA, then I can reassure you about data loss. HPAs are formed from the very last tracks on the disk, so unless you filled the very last megabyte of the drive, you are safe.
For syslogs, we always recommend copying the file. You get all of it, intact, without the line wrapping introduced by the telnet box, and ready to attach. See http://lime-technology.com/wiki/index.php?title=Troubleshooting#Capturing_your_syslog.
September 12, 2008 (Author)
Thanks, I'll remember that in the future. How can we confirm it was an HPA if it's fine now?
September 12, 2008
Probably can't now. Since it is working correctly, it is probably back to the correct full size, with no HPA. If it happens again, capture the syslog, and search it for both HPA and Host Protected Area. Some drivers refer to it as HPA, and others refer to it by the longer name. The lines that contain either term will show the discrepancy in the total size of the drive.
I do recommend capturing a syslog now, as a baseline for comparison if/when the problem occurs again. You will be able to see the changes in the lines where sdd is first set up.
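The search suggested above can be sketched in C as follows. This is only an illustration: the function names `line_mentions_hpa` and `count_hpa_lines` are hypothetical, the match is a plain case-sensitive substring test for the two spellings mentioned in the post, and the exact wording that a given driver writes to the syslog is not assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line contains "HPA" or "Host Protected Area"
 * (case-sensitive substring match), otherwise 0. */
int line_mentions_hpa(const char *line)
{
    assert(line != NULL);
    return strstr(line, "HPA") != NULL ||
           strstr(line, "Host Protected Area") != NULL;
}

/* Reads an already opened syslog stream, echoes every matching line to
 * stdout, and returns the number of matches. Lines longer than the
 * buffer are processed in pieces, so a very long line could be counted
 * more than once. */
int count_hpa_lines(FILE *fp)
{
    char buf[1024];
    int hits = 0;

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (line_mentions_hpa(buf)) {
            fputs(buf, stdout);
            hits++;
        }
    }
    return hits;
}
```

A caller would open a saved copy of the syslog with `fopen`, pass the stream to `count_hpa_lines`, and close it afterwards; a count of zero on a baseline syslog and a nonzero count on a later one would point at the HPA having reappeared.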
September 12, 2008 (Author)
OK. It's currently clearing my new drive. Should I wait for it to finish, then reboot and capture the log?
September 12, 2008
I tend to capture more syslogs than most, at any excuse. I especially would capture 2 syslogs here, before and after reboot, just in case... You can always delete the extra ones later.
A syslog contains most of the unRAID setup, especially of the drives, plus most or all of the issues reported for that session. But once you reboot, it is gone forever, unless the syslog was saved.
Having multiple syslogs allows you to use file comparison tools that help you quickly isolate what is different between them. I use Total Commander's built-in comparison tool for quick analysis, and WinMerge (with a prediffer of 16 columns) for better analysis and better handling of moved lines. Quick isolation of just the changes is important when reading syslogs, because it's often more about what to ignore than what to look for. 'Before and after' syslogs or 'baseline and problem' syslog pairs are ideal for syslog analysis.
September 12, 2008 (Author)
Alright, I'll save one after the clearing is finished and another after a reboot. I'll try some repeated reboots in a couple of days to see if I can replicate the original problem and capture a log of that. Thanks for all the help. It was beneficial even if I still don't know what happened.