
Some drive issues & various bugs



I've been using UnRAID for quite a while now, and I really do like it a lot. It meets all my needs, and I'm using it as a media server - storing all my DVDs, music CDs and HD disks backed up as mkv files. Lately I've been encountering a couple of issues, and I'm wondering whether an upgrade to a later version may address one or both of them. BTW...I'm currently running UnRAID Server Pro - version 4.3.3.

 

Issue 1) This one's a bit annoying - but it's not a major issue. I'm finding that if I haven't connected to the server for a while, I can't bring up the webpage. I can still access all the drives OK (via Windows Explorer) but if I want to do any kind of maintenance, the webpage just won't come up. The only way I've found to fix this issue is to restart the server.

 

Issue 2) This one is REALLY scaring me! Because I'm backing up my HD movies, the file sizes tend to be quite large. About a month ago I used my automation system to start a movie out in the lounge (as I'd normally do) but nothing happened - the player came up OK just....no movie. I went out to my main PC in my office and I found I couldn't play it from there either - it just wouldn't start. Using Windows Explorer I found the file on the UnRAID server OK but it just wouldn't start playing. I checked a couple other movies on the same drive and they started just fine. I ended up having to re-rip, prepare, test & transfer the file to the server again - this process takes quite a while so I wasn't happy about having to do it, but I figured maybe something had gone wrong during the transfer process and I hadn't picked it up.

 

Since that time I now test movie playback before AND after transferring the movie to the server. Last night I found 2 movies I'd transferred to the server in the last couple of weeks both refused to start. I checked other movies on the same disk and, sure enough, they're playing back fine. I was sure I had tested these 2 movie files before AND after transferring them to the UnRAID server, so something somewhere is corrupting the files AFTER they've been transferred to the server. The prospect of this happening scares the hell outta me!!!

 

Bottom line, I'm probably going to have to upgrade the UnRAID version I'm on anyway, but can anyone tell me whether they've encountered any of these issues before, and what fixed them? Also, what version of UnRAID should I move to? This is our "production" server, but we tend to use it quite simply - i.e. we don't really use any of the bells & whistles - but, like most UnRAID users, I've a serious aversion to data loss (or in this case corruption).

 

Any help, suggestions or advice is appreciated, but please bear in mind that I'm a Windows user and my knowledge of Linux is VERY small.

 

Thanks again!

Link to comment

First off run a memtest, 24hrs or so.

 

Grab a copy of md5 or a similar checksum tool. Check the files pre and post copy. Run a parity check and check the SMART status. If/when the issue recurs, re-run md5, parity and SMART. A media file can generally be heavily corrupted yet still play, but md5 will spot 1-bit errors. Save the md5 in the directory for future reference. TeraCopy with the test option will generate a checksum and check the file via a verify operation.
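For anyone who would rather script the check than use a GUI tool, here is a minimal sketch in Python of the "checksum before, checksum after, keep the md5 in the directory" routine. It assumes Python is installed on the Windows PC doing the copying (not on the unRAID box). The function names and the `.md5` sidecar naming are my own choices for illustration, not something any of the tools above require.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time so multi-GB rips never sit in RAM


def md5_of(path):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def write_sidecar(path):
    """Save '<digest>  <filename>' next to the file as <filename>.md5."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".md5")
    sidecar.write_text(f"{md5_of(path)}  {path.name}\n")
    return sidecar


def verify_sidecar(path):
    """Return True if the file still matches the digest in its sidecar."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".md5")
    stored = sidecar.read_text().split()[0]
    return stored == md5_of(path)
```

Typical use would be to call `write_sidecar()` on the file on the local disk, copy both the movie and its `.md5` file to the server share, then call `verify_sidecar()` on the copy. Re-running the verify later, when a file refuses to play, tells you whether the bytes changed after they landed on the server.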

 

A syslog would be useful. Does 4.3.3 have syslog? I have only ever used 4.4.2 or later, so I can't comment on anything earlier.

Link to comment

Thanks for the info so far.

 

OK - I'm running a total of 8 disks - including the parity disk. 2 x Seagate 500GB drives, 1 x Seagate 750GB drive, 4 x Seagate 1TB drives and a WD Green 1.5TB drive as the parity.

The M/Board is an Asus P5PE-VM - I can't recall what CPU is in there or how much memory I'm running - it's probably 1GB. I've got 2 drives connected directly to the M/Board (I just checked and neither is the parity drive), and I use 2 x Promise SATA 300 TX4 SATA controllers. One is fully populated, and one still has another 2 ports spare.

 

In the Seagate drive issue that surfaced a while back I had (I think) 3 drives affected, but I was able to successfully update the firmware on all 3 of them without any issues.

 

If I have to investigate further I'll do the memtest, but the server runs headless and without a CD drive so it's a bit tricky to set such a test up.

 

In the meantime I've got another movie I wish to move across to the server, so I've downloaded an MD5 checker tool and run it before the transfer. I'll check it again after the transfer and see what the story is there.

Link to comment

If you can't trust your transfers, you can't trust this server.  You really don't have a choice here, you need to attach a monitor and determine what the problem is.  You don't need a CD drive, because unRAID includes a good memory tester in the boot screen.  Just select it instead of letting it default to booting unRAID.

 

Concerning issue #1, we need to see the syslog captured right after the problem has occurred, and before you reboot.  If you can Telnet in, you can capture it to your flash drive, otherwise it will need to be done at the physical console (another reason to hook up a monitor).  The Troubleshooting link in my sig has more instructions for capturing the syslog.

Link to comment

If you can't trust your transfers, you can't trust this server.

 

Very well said.

 

OK, next time the web interface stops responding I'll capture a log. I don't know how to telnet, but I'm sure there'll be some info in the wiki.

 

As far as the memtest goes, I'll have a quick scan in the wiki to see how to get to it from the boot screen. Then I'll report back with what I find.

 

Thanks for your help!

Link to comment

Well, good news and bad news!

 

The good news is - I ran Memtest for just over 24 hours - around 72 passes and not a single error!

 

....now the bad news!

 

When I hit esc to reboot I could see all sorts of errors on the attached monitor screen and when I accessed the console via Firefox from my office PC the main page is saying all my drives are missing!!!!  :'(

 

Thanks to RobJ's instructions I've captured the Syslog, but I don't know how to attach it to this post!

 

I'm concerned I might stuff something up if I try to reboot so, for the moment, I'm just going to leave the server as is until I hear from someone who knows how to help me!

Link to comment

When you hit reply there is an Advanced Options section. From there you can attach the syslog. You may need to zip it first, depending on how big it is.

Link to comment
Given the nature of my original problem has now changed, do I need to open a new thread?

Although the topic title is misleading, I prefer to keep everything together in one place.  Let's keep it all here.

 

When I hit esc to reboot I could see all sorts of errors on the attached monitor screen and when I accessed the console via Firefox from my office PC the main page is saying all my drives are missing!!!!

You have one drive causing all of the drives to appear to be missing, and it happens to be the very first drive it tried to process. It is the first drive attached to the first connector on the Promise card that has 4 drives attached. It was identified correctly, but briefly, as a Seagate 500GB, but from the very start, communications could not be established with it. Over a minute was wasted on this drive, fruitlessly. Later, all of the other drives were also correctly identified and set up, but too late for unRAID. unRAID appears to wait for the drive setup, but has what looks like a 30 second timeout, so after waiting about 32 seconds (a very long time in computer land!), the unRAID modules kicked in and began their own initialization. But not one drive had yet been registered in the Drive Inventory, so that is why they all look like they are missing. Fixing or disconnecting this drive should make the others reappear.
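To make the timing argument concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not how unRAID or the kernel actually implements drive setup. It assumes the drives are set up one after another, and the 65 second stall and 1 second per healthy drive are made-up illustrative figures (the post only says "over a minute" and "about 32 seconds").

```python
def drives_registered_by(deadline_s, probe_times_s):
    """Count drives whose setup finished before the deadline.

    Drives are handled one after another in this model, so a drive is
    ready only once every drive ahead of it has finished or given up.
    """
    elapsed = 0.0
    ready = 0
    for probe in probe_times_s:
        elapsed += probe
        if elapsed <= deadline_s:
            ready += 1
    return ready


# Assumed figures: the first drive stalls for 65 s, the other seven take 1 s each.
stalled_first = [65.0] + [1.0] * 7
all_healthy = [1.0] * 8

print(drives_registered_by(32.0, stalled_first))  # 0
print(drives_registered_by(32.0, all_healthy))    # 8
```

With the stalled drive at the head of the queue, nothing is registered by the time the roughly 32 second wait runs out, which matches every drive showing as missing. Remove the stall and all eight make it in time.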

 

I would normally say to get a SMART report, but since this problem drive could not be set up, you can't run SMART or hdparm queries on it, because it does not have a device ID. All you can do is check the cables and connections to it: try replacing the SATA cable, make sure all connections are tight, and check for power cable issues like a loose power cable splitter or loose wires within the splitter. If that does not restore access to the drive, and moving it to a different port does not help either, the drive has probably failed, and you will need a replacement drive to rebuild onto. Are you hearing any unusual noises?

 

I don't see anything in this syslog related to the drive corruption you reported earlier.  All I can say is, let's deal with one problem at a time.

Link to comment

RobJ,

 

Thanks for all your help so far. OK, I got inside the cabinet, unseated & reseated the no. 1 SATA cable on the SATA controller and on the Hotswap case for that drive. Carefully put it back together and the server is now up again and going through a parity check.

 

It was only a mild panic - honestly!

 

So, getting back to the original problem, I now know my memory (2 x 512MB sticks) is not the issue.

 

Any idea where to from here?

 

Thanks again!

Link to comment

Your unRAID system clock is set to November 30.

 

Y'know, it's funny what you get used to? This is another bug with my server, but I just got to the point where I didn't worry about it. I've gone into the BIOS a number of times to change the clock, but as soon as I boot into UnRAID the date/time is always wrong!

 

One more bug I have (the last one I think!  ::)) is that I've never been able to do a "clean" shutdown. It's not something I've had to do very often so I didn't worry about it.

 

Anyway, onto the original issues (maybe if I get some success with this I should start a new post with the remaining ones!)

Link to comment

Well, hopefully I haven't scared EVERYONE away.

 

I pulled up the web interface after the system had completed its parity check (and found no errors). I clicked the "Spin Up" button and it took a LONG time for the web console to respond. I ended up stopping the page refreshing and then hitting the page refresh button again and, again, it took a long time to respond. I ended up grabbing the syslog because I thought it was going to freeze on me again. However, after I'd recovered the syslog, I noticed the web console had recovered. The only thing I thought was interesting was that one of my 500GB drives has an * showing in the temp column - I know I had temps showing on all drives before this, so now I'm wondering whether this drive is on its way out, or whether there's something screwy with the connection.

 

I've included the syslog just in case it reveals some clues, but I'm grateful for any help or good advice.

 

Thanks.

Link to comment

A star there is just fine. If a drive is spun down you can't get the temp reading from it (at least on some drives). A lot of the WD drives will report a temp even when spun down, but the unRAID main page does not seem to support this. I know that my Main page in unMenu does, as 2 of my WD drives report temps even when spun down.

Link to comment

Your unRAID system clock is set to November 30.

 

Y'know, it's funny what you get used to? This is another bug with my server, but I just got to the point where I didn't worry about it. I've gone into the BIOS a number of times to change the clock, but as soon as I boot into UnRAID the date/time is always wrong!

 

This should not be a bug. If you change the date in the BIOS, reboot and go back to the BIOS, is it the same?

What are your settings in the SETTINGS tab DATE and TIME section in emhttp?

 

One more bug I have (the last one I think!  ::)) is that I've never been able to do a "clean" shutdown. It's not something I've had to do very often so I didn't worry about it.

 

Did you try installing my powerdown package? One of the benefits is that during a shutdown (graceful or not) it will make an attempt to kill processes that have the filesystems in use, and umount them. If not, it will at least capture the syslog for review.

 

You should not be having this issue unless you have addons that are not being cleaned up.

 

Link to comment

A star there is just fine. If a drive is spun down you can't get the temp reading from it (at least on some drives). A lot of the WD drives will report a temp even when spun down, but the unRAID main page does not seem to support this. I know that my Main page in unMenu does, as 2 of my WD drives report temps even when spun down.

 

OK - a couple of things here. The command I issued was Spin UP - so theoretically ALL my drives should have been spun up. Also, the drive in question is a Seagate, and I know for sure in the past that when the whole array has been spun up I've gotten temps from ALL my drives. This time, that drive isn't giving me a temp anymore.

 

Update - In fact I've just checked the web console now and, sure enough, UnRAID has an orange status and drive 6 (which was the drive in question) has a red light next to it and shows a 4 in the errors column.

 

I'm guessing I need to replace this drive?

Link to comment

Weebotech, thanks for your comments. It looks to me like this drive issue is more important for the moment. However, I will check out changing the date in the BIOS and report back (It's been a LONG time since I tried ensuring the date was correct). There was something I used that allowed me to set a custom date/time zone but I'll check it out and report back.

 

As far as the clean shutdown goes, I have no addons that I can think of.

Link to comment

OK.

 

NOW I think I've hit a MAJOR snag!

 

I replaced the faulty drive with a new one that arrived this afternoon. UnRAID successfully recognised it as a new drive and I selected to start it to allow a rebuild. I got it started and about a minute later, when I refreshed the screen, I noticed multiple errors on the new drive (drive 6) AND drive 7! I went out to look at the array and noticed the lights were off for drives 6 and 7! In a panic I stopped it. I then opened it up, and confirmed that drives 6 & 7 were connected to the same SATA controller. I pulled out the SATA controller, reseated it, reconnected it, and started the server. I was careful to make sure the lights were on for drives 6 & 7 again - they were.

 

However, now when I check via the web console, the array is stuck showing "STARTING".

 

The rebuilding of data based on the parity data was never going to work with the SATA controller causing a glitch with drives 6 & 7 (no idea what THAT was about!), but now it looks like the array is stuck down. Now I've only got 2 questions:

 

1) What SHOULD I have done in that situation (when I noticed the errors and the lights off on drives 6 & 7 during parity rebuild)?

2) Is there ANY hope of recovering the data that was originally on disk 6?

 

Update: The array is now STARTED, but disk 6 has a red light next to it and shows 1034GB free out of 1465GB (the original disk 6 was 500GB, and the new disk 6 is 1500GB).

 

Just in case it's of any use - here is the syslog....

Link to comment

Update:

 

I found the section of the wiki concerning re-enabling a drive. I followed the procedure and the array is currently rebuilding (I'm only now starting to calm down!) so hopefully I will be OK, but is there anything else I need to be aware of?

 

BTW....I only just figured out I could change the title, so I did (because the old one was so misleading!)

Link to comment

Archived

This topic is now archived and is closed to further replies.
