Trying to preclear 3tb disk...errors

icedragonslair · February 20, 2013

unRAID Server Plus

Version 5.0-rc5

5 x 2TB WD AV-GP (1 x parity, 4 x array disks), Kingston HyperX 3K 120GB (cache), Gigabyte GA-H57M-USB3 (v2.0), Intel i3-530, 8GB G.SKILL Ripjaws DDR3 1333, PC P&C Silencer 750 Black

I am trying to preclear a new 3TB av-gp drive for my unraid server, but I am getting nowhere with this.

I get it to run with no problem and it starts out at about 144 MB/s, but somewhere down the road it drops back to 2.6 MB/s and would take 10 times longer than any of my 2TB drives did.
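
For a rough sense of scale, here is a back-of-the-envelope sketch (not a measurement). It assumes a 3TB drive holds about 3,000,000 MB and that a single pass over the disk runs at a constant rate; a preclear makes more than one pass, so real times are longer. The helper name `pass_hours` is made up for illustration.

```shell
# Hours for one full sequential pass over a disk at a constant rate.
# Usage: pass_hours SIZE_MB RATE_MB_PER_S
pass_hours() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate / 3600 }'
}

pass_hours 3000000 144   # roughly 5.8 hours
pass_hours 3000000 2.6   # roughly 320.5 hours, i.e. over 13 days
```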

Am I missing something? I have taken the array offline and turned off the spindown timer, so preclear has plenty of RAM. I have included the log below.

Also, an unrelated question...is there any way to keep it from putting the monitor to sleep? It does that every time and makes me wake it back up by hitting Enter (unless it is no real issue)...lol

syslog-2013-02-20.txt

icedragonslair · February 20, 2013

I just had a thought after reading somewhere that someone actually precleared a disk on another box running unRAID...is that even possible, and would I be better off doing something like that since time is an issue?

Automatic · February 20, 2013

It would be possible; whether you want to do it is up to you.

However, make sure you don't write any data to the drive after the preclear, or else when you assign it to the array you'll have to format it again.

Joe L. · February 20, 2013

However, make sure you don't write any data to the drive after the preclear, or else when you assign it to the array you'll have to format it again.

However, make sure you don't write any data to the drive after the preclear, or else when you assign it to the array unRAID will clear it before it will bring the array online. (You'll be offline for many, many hours.)

icedragonslair · February 20, 2013

Thanks for the tips...I just wonder why it's doing what it is...it is a proven good drive and I have never had this issue before.

Also, would it be worth upgrading to the most current RC?

Automatic · February 21, 2013

However, make sure you don't write any data to the drive after the preclear, or else when you assign it to the array unRAID will clear it before it will bring the array online. (You'll be offline for many, many hours.)

My mistake, although I still feel that I got my point across.

RobJ · February 21, 2013

Feb 20 13:38:44 Tower kernel: irq 16: nobody cared (try booting with the "irqpoll" option)

Feb 20 13:38:44 Tower kernel: Pid: 0, comm: kworker/0:1 Not tainted 3.0.35-unRAID #2

Feb 20 13:38:44 Tower kernel: Call Trace:

Feb 20 13:38:44 Tower kernel: [<c104fbd8>] __report_bad_irq+0x1f/0x95

...

Feb 20 13:38:44 Tower kernel: Disabling IRQ #16

You lost IRQ16, which was used by an AHCI and 2 USB subsystems. This specific AHCI controller handled both sdg (the drive you wanted to Preclear) and sdh (120GB Kingston).

If you had accessed them afterward, they would both have been disabled. I would upgrade to RC11, perhaps the newer kernel will handle the IRQ's more safely? No guarantees though, as I don't know why it happened.
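
To check for this condition on another system, a minimal sketch along these lines may help. It assumes the syslog lives at /var/log/syslog and that the kernel message has the same wording as the excerpt above; the function names `disabled_irqs` and `irq_users` are made up for illustration.

```shell
# Print the IRQ numbers the kernel reported disabling in a syslog file.
# Usage: disabled_irqs /var/log/syslog   (path is an assumption)
disabled_irqs() {
    sed -n 's/.*kernel: Disabling IRQ #\([0-9][0-9]*\).*/\1/p' "$1" | sort -un
}

# Show which drivers currently share a given IRQ line.
# Usage: irq_users 16
irq_users() {
    grep -E "^ *$1:" /proc/interrupts
}
```

If `disabled_irqs` prints a number, `irq_users` with that number lists the devices that were sharing the line.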

Also, an unrelated question...is there any way to keep it from putting the monitor to sleep? It does that every time and makes me wake it back up by hitting Enter (unless it is no real issue)...lol

I would like to know this myself, is there a way to disable or modify the time of this screen saver feature? I'm pretty sure I saw a Linux (Bash?) setting for this somewhere.

icedragonslair · February 22, 2013

RobJ, Thanks for responding as quickly as you did, I am really getting discouraged and looking for alternatives at this point.

You lost IRQ16, which was used by an AHCI and 2 USB subsystems. This specific AHCI controller handled both sdg (the drive you wanted to Preclear) and sdh (120GB Kingston).
If you had accessed them afterward, they would both have been disabled. I would upgrade to RC11, perhaps the newer kernel will handle the IRQ's more safely? No guarantees though, as I don't know why it happened.

This IRQ16 bug has been around for quite some time, and the IRQ handling really should have been addressed before now.

I upgraded to rc11 and it did the same thing. I just ran complete testing (SMART, short, extended) on the drive and even wrote zeros to it on a different system...everything passed. However, this system is powerful enough to run unRAID and preclear drives of this size (very old hardware bench tester).

I will try removing the cache drive and try again...however, after upgrading to rc11 I suffered a crash, so I restored the previous rc5 and all is well now.

I would like to know this myself, is there a way to disable or modify the time of this screen saver feature? I'm pretty sure I saw a Linux (Bash?) setting for this somewhere.

Yes, I wish someone would address this.

Thanks

Freddie · February 22, 2013

Also, an unrelated question...is there any way to keep it from putting the monitor to sleep? It does that every time and makes me wake it back up by hitting Enter (unless it is no real issue)...lol

I have been annoyed with the blank screen also. Your question inspired me to figure it out.

This command will keep the screen from blanking during the current boot:

setterm -blank 0

You might also want to investigate the -powersave and -powerdown options. Looking at the /etc/rc.d/rc.M file, it might try to enter a deeper powersaving mode after 60 minutes. My monitor does not.

I haven't tried putting this in the go script for future boots, but it seems like it should work.

Sorry I can't help with your real problem (IRQ?).

RobJ · February 22, 2013

Joe L pointed me in the same direction, thanks Joe! I did some research also (man setterm, etc), and some experimenting, and hopefully the following is accurate, if not complete yet. Any correction is very welcome!

If you just want to disable the screen blanking, then the "setterm -blank 0" command should be all you need. I should point out though, that all of the advice I found indicated a longer version "setterm -blank 0 -powersave off" (that is powersave not powerdown). I think either will work, because the power saver depends on the screen being blanked first, so won't kick in if screen is never blanked.

If you want to adjust the screen blanking time, it's a little more complicated. There are 2 layers involved, what is on the screen itself (an image or a blank screen) and whether the monitor is full power (typically green power light) or in power down mode (2 to 5 watts, typically amber power light). Apparently the screen has to be blanked first, before the power down timer starts. The parameter for -blank is 0 to 60, 0 meaning no blanking, 1 to 60 meaning number of minutes after activity to blank the screen. The parameter for -powerdown (that is powerdown not powersave) is 0 to 60, 0 meaning disable powerdown, 1 to 60 meaning number of minutes after screen blanked to power down the monitor (to power down mode of 2 to 5 watts). So a command of "setterm -blank 15 -powerdown 2" will take 17 minutes after last activity before powering down.
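
The timing arithmetic described above can be sketched as a small helper. This only models the behaviour as described in this post (blank timer first, then the powerdown timer); it does not call setterm, and the name `powerdown_delay` is made up for illustration.

```shell
# Minutes of inactivity before the monitor powers down, per the description
# above: the blank timer runs first, then the powerdown timer.
# Prints "never" if either timer is disabled (0).
# Usage: powerdown_delay BLANK_MINUTES POWERDOWN_MINUTES
powerdown_delay() {
    blank="$1"
    powerdown="$2"
    if [ "$blank" -eq 0 ] || [ "$powerdown" -eq 0 ]; then
        echo "never"
    else
        echo $((blank + powerdown))
    fi
}

powerdown_delay 15 2   # prints 17
powerdown_delay 0 2    # prints never
```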

I believe (haven't tested yet) you can put the setterm command of your choice in your go file, and I also believe from what I read that it will apply to all consoles (Alt-F1 through Alt-F6), not just the current one (not tested either). With a little more testing and feedback, I'll add something about this to the FAQ console section.
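
As a sketch of the go-file idea (untested on a live server, like the suggestion itself), the helper below appends a setterm line only if the file does not already contain one. It assumes the go script is at /boot/config/go; the function name `add_setterm_to_go` is made up for illustration.

```shell
# Append a setterm line to the go script unless one is already there.
# The default path /boot/config/go is an assumption; pass another path to override.
add_setterm_to_go() {
    go_file="${1:-/boot/config/go}"
    line='setterm -blank 0 -powersave off'
    grep -q '^setterm ' "$go_file" 2>/dev/null || echo "$line" >> "$go_file"
}
```

Running it twice leaves a single setterm line, so it is safe to repeat. Whether the setting then reaches every console at boot is still something to confirm by testing.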

icedragonslair · February 22, 2013

Amazing, I think the "setterm -blank 0" command will work in my case, since the monitor is actually a 24" HDTV shared by three systems (3 HDMI ports), so the server's HDMI feed can stay on all the time.

Thanks loads, I'll post when I get this IRQ thing sorted out or the drive preclears...whichever comes first...lol

On the IRQ16 issue, it may be that I am using all 7 of the SATA ports on the MB (2 different controllers) instead of a good add-in card. I am going to rule out a few more possible causes first, then I may end up looking into a card; I suppose I will need one sooner or later. Of course, I do have an Asus H57 Evo kicking around (6 x SATA II, 2 x SATA III) as well, so I will be trying that first.

Ice

icedragonslair · February 24, 2013

I got the disk to preclear, but I used another system to do it...I am also wondering if the SATA on my mobo has anything to do with the IRQ problem. The array and existing parity are on the 5 main SATA II ports on this mobo...but the cache drive and the 3TB drive I was trying to preclear were on the 2 secondary SATA 6 Gb/s ports on the other controller...this seems to be where the error occurred.

I have another board that has 6 main SATA II ports and 2 additional SATA III ports.

I would then be able to keep my main array and parity on the same controller and just have my cache on the secondary controller. Hopefully this would alleviate the IRQ error I was getting.

My main question is will the hardware change make a big difference and how difficult will it be?

Joe L. · February 24, 2013

Ask that question in the hardware forum in another thread... Usually, it is only a matter of plugging in the new MB.

icedragonslair · February 24, 2013

Thanks Joe L.

I'll do that. I will also post back if that solves the IRQ problem, as that may actually be the cause of the errors happening to many users, especially considering the configurations of SATA controllers that are native to motherboards.

icedragonslair · February 26, 2013

Sad to say that performance issues are probably going to force me back to a Win7 64 / FlexRAID setup after more than 6 months with unRAID. I love the premise, but the transfer speeds are hovering around 1 MB/s and that just isn't acceptable at all. Looking at WHS 2011 as well.

RobJ · February 26, 2013

Your post came as somewhat of a surprise, since very recently you had said:

...however, after upgrading to rc11 I suffered a crash, so I restored the previous rc5 and all is well now.

I've probably missed a lot, so forgive me for asking, "What went wrong?" You hadn't mentioned transfer speeds before; I assume you are talking about networked transfers? Are read and write speeds on the server itself also slow?

I looked back at your syslog and noticed your Gigabyte board has a 2010 BIOS; you might check for a newer one. Also, have you tried the 'mem=4095M' parameter in your syslinux.cfg boot file?

icedragonslair · February 26, 2013

RobJ,

Everything was and is fine with rc5, but it seemed that rc11 just wouldn't run right (haven't had time to look into why yet). Upgrading the parity also seemed to slow things a bit. I have tinkered with the system a bit and speeds are back a little...read always seemed fine, but write was always around 15-20 MB/s; I have it back up to about 8 MB/s, but that is unbearably slow for the gear & network I have.

I haven't tried the 'mem=4095m' parameter yet; what exactly does it do or accomplish?

I have now tested the up/down speeds, and down from the server to a system on the local network is roughly 80 MB/s.

The upload speed to the server is still quite dismal at 8-12 MB/s. I assume part of this is because of the parity write times (real-time writes); is this correct?

As far as anything else that may be causing this, I will post another log if necessary so I can get some more help...probably should wait until I switch motherboards to see if that makes a difference?

I do have a question though. I have been reading a lot about the best parity type for servers like mine, where I manually do all the transfers & metadata and don't actually serve anything except through Plex...would a snapshot parity make more sense, and will unRAID offer that anytime soon, if ever?

Thanks for all,

Ice

icedragonslair · February 28, 2013

After suffering some reboots I am leaning more towards hardware issues of some sort. I am hoping it is that; if not, I honestly am at a loss.

Thanks for the help,

Ice

(still want to know what mem=4095m does...lol)

Joe L. · February 28, 2013

Reboots can be hard to diagnose... but another thing to check is the CPU heatsink and fan. If the CPU overheats and shuts down to protect itself, you'll get the symptoms you are experiencing. If you installed it yourself, make sure you did not put too much heatsink compound between them. (Too much is almost as bad as none at all.)

mem=4095m will limit the Linux kernel to use only 4 GB of RAM, regardless of how much is physically in your server.

Joe L.
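
For anyone looking for the mechanics, here is a hedged sketch of one way to add the parameter. It assumes the boot entries in syslinux.cfg use an "append" line (for example "append initrd=bzroot") and that the parameter can simply be placed in front of the existing options; the function name `add_mem_limit` is made up for illustration. Back up the file before trying it.

```shell
# Add mem=4095M to every "append" line of a syslinux.cfg, unless a mem=
# option is already present anywhere in the file.
# Usage: add_mem_limit /path/to/syslinux.cfg
add_mem_limit() {
    cfg="$1"
    grep -q 'mem=' "$cfg" && return 0
    sed 's/^\([[:space:]]*append[[:space:]]\)/\1mem=4095M /' "$cfg" > "$cfg.tmp" &&
        mv "$cfg.tmp" "$cfg"
}
```

After a reboot, `cat /proc/cmdline` should show whether the kernel actually received the option.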

icedragonslair · February 28, 2013

JoeL

Yeah, I checked that (I checked most of the hardware before I posted). The only thing I have left to check is the mobo itself, and that is being done now.

I will keep this updated, and implement 'mem=4095m' when I search and find instructions to do so.

Also, I think I am going to swap the board anyway as a stopgap measure.

Thanks,

Ice

SidebandSamurai · February 28, 2013

icedragonslair,

We need to troubleshoot this in a logical manner. Let's focus on your system.

I read through the thread. The memory workaround is for the Supermicro system boards. This slowdown is not known to happen on any other system boards that I know of.

Can you re-install RC-11 WITHOUT plugins, then let us know what happens? Including a syslog would be very helpful also. There are articles elsewhere on the forum about a script you can put on your flash drive that will let you switch between RC-5 and RC-11 easily.

In regard to the screen blanking: be aware that if you turn off the blanking feature you could burn in your screen. Spacebar simply turns your screen back on. The screen blank feature was put in Linux a long time ago to prevent burn-in on older CRT-type monitors. Even modern flat panel monitors WILL burn in. Keeping it on is not doing any harm.

Have you run memtest86+? And if you have, have you run it for more than one loop? 10 loops is preferable to make sure you don't have hardware problems. Yes, this takes a long time (sometimes over two days), but if memtest86+ passes after 10 loops with NO ERRORS then you know your RAM, CPU, and system board are all in good condition.

How about a BIOS update? Are you current on your BIOS updates? Older BIOS images can cause performance issues.

Sincerely,

--Sideband Samurai

Reboots can be hard to diagnose... but another thing to check is the CPU heatsink and fan. If the CPU overheats and shuts down to protect itself, you'll get the symptoms you are experiencing. If you installed it yourself, make sure you did not put too much heatsink compound between them. (Too much is almost as bad as none at all.)

mem=4095m will limit the Linux kernel to use only 4 GB of RAM, regardless of how much is physically in your server.

Joe L.

I don't necessarily agree with that. If it's an Intel system, the CPU will slow down first. Then it will lock up, not reboot. AMD systems will also simply lock up to save the CPU, not reboot. He has not told us his hardware configuration yet, and he is about to replace the system board without really troubleshooting it properly. It's like trying to shoot a duck with a howitzer. You get the duck, but not much of it after you are done.

In your posted syslog I see the following error:

Feb 20 12:08:12 Tower kernel:  sdg: unknown partition table

Also I see this at the beginning of your syslog:

Feb 20 13:54:55 Tower emhttp: WDC_WD30EURS-63SPKY0_WD-WMC1T1399862 (sdg) 2930266584

Is this the drive you are having problems with?

icedragonslair · March 1, 2013

A complete documentation of what has happened so far.

A little bit about my background first. I build & maintain systems for a major utility company in the Northeast (this is, however, my first foray into Linux, as I am not a software engineer); we work mainly with Windows Server systems. I build custom gaming systems and mod cases for many of my friends as a hobby, and even have a few dedicated business customers (DBA, looking at retirement...lol).

Now for the System hardware (mentioned previously in the first post of this thread):

Intel i3-530 (running stock specs),

Gigabyte GA-H57M-USB3 (most current bios, but it is still 2010, USB3 turned off & Sata controllers set to ahci, no other tweaks),

G.Skill F3-10666CL9D-8GBRL (running at stock timings), (Memtest86+ x10 twice)

1 x WD30EURS-63SPKY0 3TB (PARITY)

2 X WD20EURS-63S48Y0 2TB (ARRAY)

2 X WD20EVDS-63STB0 2TB (ARRAY)

1 X KINGSTON SH103S3120C 3K SSD (CACHE - is this too small? I only use it for running plex at this time)

1 X CRUZER FIT 8GB USB (OS)

PC POWER & COOLING -Silencer 750 EPS 12V QUAD/BLACK series (60A single rail)

All hardware above has been thoroughly tested and passed without issues

The first problem was the IRQ16 bug (many have written about this in this forum), which occurred when I was trying to pre-clear the (new) 3TB parity drive. At the time of this error, both the new parity drive and the cache drive were on the secondary controller (the Gigabyte 6 Gb/s 'secondary' controller). The syslog also reported a USB issue with IRQ16, so I did some digging and found out that the secondary controller & USB were both on IRQ16 (it just seemed like improper IRQ management). In the process I upgraded to RC11, but suffered a crash…so I restored RC5. I then removed the drive and pre-cleared it on another system booted to the free unRAID, and had no issues. I put it back in the tower on the same controller, but I left the old parity drive out of the system at this point, and all seemed well.

Once the 2TB (old parity) drive was tested and pre-cleared I added it to the array, it formatted, and then the troubles started again. At that time I started looking into the controller as a possible cause for the IRQ16 problem/bug, but I honestly think the only way I can sort this bug out is to change controllers and see if it still produces the same result.

File transfer speed became completely dismal and unacceptable while the other 2TB drive was still in the system, and some random crashes/reboots happened. It was also pointed out that the parity drive was not completing its check properly, so once again I removed the 2TB (old parity) drive and reset the parity.

Now the parity drive has synced and has completed the check properly with zero errors. The system is running fine once again but still won't handle the 2TB (old parity) drive being installed without causing the original IRQ16 bug.

I have included some data (current syslog and two pics to show array)

@SidebandSamurai

As far as the screen blanking goes, I mentioned above that the monitor is actually a TV and is only on the PC input when I actually key in commands; otherwise it is off the PC input, so no issues with burn-in.

&

WDC_WD30EURS-63SPKY0_WD-WMC1T1399862 (sdg) 2930266584

This drive is now the parity and seems to be fine; the drive I am having trouble adding back to the array is WDC_WD20EURS-63S48Y0_WD-WCAZAF816459 (the old parity drive).

Still following any suggestion I can get. Plex (the only add-on that I installed) is now disabled, so I will try the RC11 upgrade again.

Thanks for all,

Ice

syslog-2013-02-28.txt

icedragonslair · March 1, 2013

here is the pic I mentioned.

SidebandSamurai · March 1, 2013

I see you are still on RC-5. Any chance to upgrade to RC-11? or is that not working at all?

--Sideband Samurai

icedragonslair · March 1, 2013

No, I haven't tried it again yet... I don't think it had anything to do with the errors or even the mishandling of the IRQ16 error that cropped up earlier.
