Rebuild stuck at 99.9%


Ok so I had what looked like a drive failure on disk5.

I replaced the old 500 GB disk5 with a new 2 TB drive and ran a rebuild. It went to 99.9% and got stuck.

So I thought, OK, maybe the disk is bad.

I installed ANOTHER new 2 TB drive and repeated the rebuild: again it reached 99.9% and got STUCK!!

Both are brand-new Western Digital Green 2 TB drives.

The 2 TB rebuild took about 24 hours to get to 99%, then sat at 99.9% for hours.

See attached image.

The estimated speed was cranking along at 3000 or so, but after sitting at 99.9% it drops.

Any help is appreciated.

Stuck.jpg


You might have a better chance of getting help from the forum if you post a copy of your syslog, because no one here has X-ray eyes that can see through your web GUI to figure out what is wrong. :D

Whatever is going wrong, your syslog may (no guarantee) have some information about it.
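If it helps, here is a minimal shell sketch for grabbing a copy of the syslog to attach. It assumes the live log is /var/log/syslog and that the unRAID flash drive is mounted at /boot, so the copy survives a reboot and can be read from another machine. The function name save_syslog is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the current syslog to a timestamped file.
# Assumes the log lives at /var/log/syslog and the flash drive at /boot.
save_syslog() {
  src=${1:-/var/log/syslog}
  dest_dir=${2:-/boot}
  dest="$dest_dir/syslog-$(date +%Y%m%d-%H%M%S).txt"
  # Print the destination path only if the copy succeeded
  cp "$src" "$dest" && printf '%s\n' "$dest"
}
```

Run save_syslog with no arguments, then zip or attach the file whose path it prints.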

  • Author

Syslog attached.

 

sys.zip

Something is clearly wrong with disk6.  Your syslog reports read errors for that disk, and the web interface shows a temp of 0.  I don't know the solution to this, so you'll have to wait for Joe L or someone else to guide you.

  • Author

Oh man, don't tell me it's a two-drive failure!!!

 

Result of free:

             total       used       free     shared    buffers     cached
Mem:       3082520    2992828      89692          0     217220    2503056
-/+ buffers/cache:     272552    2809968
Swap:            0          0          0

Result of ps:

  PID TTY          TIME CMD
 2783 pts/0    00:00:00 bash
 2820 pts/0    00:00:00 ps

 

 

  • Author

The machine is an AMD X2 64
with 3 GB of RAM
on an ASUS A8N32-SLI (Socket 939).

5 drives are on the on-board SATA ports; the parity drive is on one of these ports.

4 IDE (PATA) drives are on a Promise PCI controller.

2 SATA drives are on a second Promise PCI controller.

PCI video card.


Your syslog has wrapped around, so a lot of information is missing from this copy. You might need to post the previous copy of the syslog as well for the whole picture. Regardless, a lot of things are going on in the background.

(a) unRAID has a problem spinning down disk6. Without the inventory table from boot time, I can't figure out which device is disk6.

(b) You have I/O errors on hdb, an IDE disk. In unRAID, IDE disk names start with "h" and SATA disk names start with "s".

(c) The system has remounted md5 read-only because of the I/O errors. Assuming you are rebuilding md5, that is why the rebuild never finishes.

(d) Because of the I/O errors, some data written to md5 has been declared MIA (Missing in Action); see the "lost page write" messages.

At this point, I suspect your Promise controller might have problems, if disk6 is also an IDE disk.

 

 

 

------------ from your syslog -------------------------------------------------------------------------------

 

Oct 29 13:34:48 theoracle kernel: md: disk6: ATA_OP_STANDBYNOW1 ioctl error: -5
Oct 29 13:34:58 theoracle kernel: mdcmd (7060): spindown 6
Oct 29 13:34:58 theoracle kernel: md: disk6: ATA_OP_STANDBYNOW1 ioctl error: -5
Oct 29 13:34:59 theoracle kernel: end_request: I/O error, dev hdb, sector 940049679
Oct 29 13:34:59 theoracle kernel: md: disk6: ATA_OP_SETIDLE1 ioctl error: -5
Oct 29 13:35:09 theoracle kernel: md: disk6 read error
Oct 29 13:35:09 theoracle kernel: handle_stripe read error: 940049616/6, count: 1
Oct 29 13:35:09 theoracle kernel: REISERFS error (device md5): vs-2100 add_save_link: search_by_key ([-1 8 0x1001 DIRECT]) returned -2
Oct 29 13:35:09 theoracle kernel: REISERFS (device md5): Remounting filesystem read-only
Oct 29 13:35:09 theoracle kernel: end_request: I/O error, dev hdb, sector 47655
Oct 29 13:35:22 theoracle kernel: md: disk6 read error
Oct 29 13:35:22 theoracle kernel: handle_stripe read error: 47592/6, count: 1
Oct 29 13:35:22 theoracle kernel: Buffer I/O error on device md5, logical block 5949
Oct 29 13:35:22 theoracle kernel: lost page write due to I/O error on md5
Oct 29 13:35:22 theoracle kernel: md: disk6 read error
Oct 29 13:35:22 theoracle kernel: handle_stripe read error: 47600/6, count: 1
Oct 29 13:35:22 theoracle kernel: Buffer I/O error on device md5, logical block 5950
Oct 29 13:35:22 theoracle kernel: lost page write due to I/O error on md5
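To see which array disk the read errors come from without scrolling through a long log, the kernel lines can be tallied with a short shell sketch. The function name is hypothetical, and it only counts lines in the exact "md: diskN read error" form seen in this excerpt; here all three such lines are on disk6.

```shell
#!/bin/sh
# Count "md: diskN read error" lines per array disk in a syslog file.
count_md_read_errors() {
  grep -o 'md: disk[0-9]* read error' "$1" |
    awk '{ count[$2]++ } END { for (d in count) print d, count[d] }' |
    sort
}
```

For example, count_md_read_errors /var/log/syslog prints one "diskN count" line per disk.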

 

  • Author

Thanks GK, much appreciated.

Disk6 is IDE (PATA) on one Promise controller.

Disk5 is SATA on the other Promise controller.

Some more info: I just switched the motherboard/CPU combo as well. Could this have anything to do with it?

How do I get previous versions of the syslog?

Why does it always complete up to 99.9% and stop at exactly the same place?

Some more info: I just switched the motherboard/CPU combo as well. Could this have anything to do with it?

Well, if you had mentioned this at the beginning, I would definitely have suspected it, especially if you haven't done any burn-in testing on the new CPU/MB/memory. Even a motherboard recommended by the unRAID Wiki doesn't give you a "get out of jail free" card; the quality of mass-produced parts varies from batch to batch.

If you have a new CPU/MB, you should run it for a while, maybe a couple of weeks, with some heavy-duty operations such as multiple non-correcting parity checks and copying large amounts of data in and out, to exercise the new hardware. Also run memtest for at least a day or two. After that, you can start swapping or replacing disks.

How do I get previous versions of the syslog?

I believe they are under /var/log.
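A minimal sketch for stitching the older logs together into one file. It assumes the rotated copies sit next to the live log as syslog.1, syslog.2, and so on (higher number = older) and are not compressed; check what ls /var/log actually shows on your server first. The function name is hypothetical.

```shell
#!/bin/sh
# Print rotated syslogs oldest-first, then the live log.
# Assumes uncompressed copies named syslog.1, syslog.2, ... (higher = older).
collect_syslogs() {
  dir=${1:-/var/log}
  n=1
  # Find the first missing rotation number
  while [ -e "$dir/syslog.$n" ]; do
    n=$((n + 1))
  done
  # Walk back down from the oldest existing copy to syslog.1
  while [ "$n" -gt 1 ]; do
    n=$((n - 1))
    cat "$dir/syslog.$n"
  done
  if [ -e "$dir/syslog" ]; then
    cat "$dir/syslog"
  fi
}
```

Usage: collect_syslogs > /boot/full-syslog.txt gives a single file to attach.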

 

 

Why does it always complete up to 99.9% and stop at exactly the same place?

I have no idea. Judging from the sector numbers in the I/O errors, they don't look like the last sector.
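A rough check of where the failing sectors sit on disk6. This assumes the Size column in the web GUI is in 1 KB blocks (so a disk has twice that many 512-byte sectors) and ignores the small partition offset. With the 488,386,552 KB shown for disk6 later in the thread, sector 940049679 works out to about 96% of the way into the disk, and sector 47655 is near the very start, so neither is the last sector. The function name is hypothetical.

```shell
#!/bin/sh
# Percentage position of a 512-byte sector on a disk whose size is given
# in 1 KB blocks (commas allowed, as copied from the web GUI).
sector_pct() {
  awk -v s="$1" -v kb="$2" 'BEGIN {
    gsub(/,/, "", s); gsub(/,/, "", kb)
    printf "%.1f\n", 100 * s / (kb * 2)
  }'
}
```

For example, sector_pct 940049679 488,386,552 prints 96.2.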

 

  • Author

It's not a new MB/CPU. I mean, I have been using this CPU/MB in my main desktop for about 3 years with no problems.

It's new to the unRAID server: I upgraded my desktop and moved the old MB/CPU into the unRAID server.

So what are my options? How do I move forward?

What do you mean by disk swapping/replacement?

  • Author

OK, running a memory check.

Also, I noticed that disk6 was on an IDE chain with the original disk5 (PATA). When I replaced disk5, I installed a SATA drive, which left the primary position on that IDE channel empty (i.e. the end of the cable had no device), so the only drive on that IDE chain was disk6. I wonder if this caused some kind of problem. I moved disk6 to the end of the chain.

When the memory check completes, I will try another rebuild of disk5.

Man, if nothing else I am learning some Linux the hard way :)

It's not a new MB/CPU. I mean, I have been using this CPU/MB in my main desktop for about 3 years with no problems.

It's new to the unRAID server: I upgraded my desktop and moved the old MB/CPU into the unRAID server.

If, when you used this CPU/MB in your desktop, you also every now and then had this many disks, read data from all of them at the same time, and did XOR calculations and compared data for 6 to 10 hours non-stop each time, then your assumption might hold. :D Parity checks and data rebuilds are heavy-duty jobs because they exercise almost every component of your system for many hours non-stop, unlike a desktop used only for web browsing, editing documents, etc.

So what are my options? How do I move forward?

 

Your option now is to eliminate all those I/O errors, because as long as they are there you have no chance of rebuilding md5.

But how to do that is a good question. You can:

(a) Go back to the "last known good hardware", i.e. put everything back to the previous hardware setup and try to rebuild the data on that platform.

(b) Keep using the "new" CPU/MB but make sure all the parts work together (maybe replace the controller card, or remove the SATA controller card and use the on-board SATA ports instead, etc.).

(c) Last but not least, make sure there are no loose cables (SATA cables as well as power cables).

While you are troubleshooting, always check the syslog for any warning/error messages.
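A small sketch for that check, assuming the log is at /var/log/syslog: it prints only lines mentioning errors or warnings, optionally narrowed to a second pattern such as a device name (hdb, md5). The function name and the match patterns are hypothetical choices, not an unRAID tool.

```shell
#!/bin/sh
# Print syslog lines mentioning errors or warnings, optionally narrowed
# to lines that also match a second pattern (e.g. a device name like hdb).
syslog_errors() {
  log=${1:-/var/log/syslog}
  grep -iE 'error|warn' "$log" | grep -E "${2:-.}"
}
```

For example, syslog_errors /var/log/syslog hdb shows only the error lines for hdb.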

 

What do you mean by disk swapping/replacement?

Originally I thought you were simply replacing md5, but now I know you did it because this disk is "broken". However, given all those I/O errors during the data rebuild, I have my doubts whether this old md5 disk is really broken.

Also, I noticed that disk6 was on an IDE chain with the original disk5 (PATA). When I replaced disk5, I installed a SATA drive, which left the primary position on that IDE channel empty (i.e. the end of the cable had no device), so the only drive on that IDE chain was disk6. I wonder if this caused some kind of problem. I moved disk6 to the end of the chain.

IDE has a master/slave (aka primary/secondary) concept. Every time you rearrange disks, make sure the IDE disks are jumpered for the correct mode. If you have only one disk on an IDE chain, make sure it is the master device.
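Under the legacy Linux IDE driver that produced names like hdb in the log above, each interface gets two letters in order (hda/hdb on ide0, hdc/hdd on ide1, and so on), the first being master and the second slave. A small sketch of that mapping (function name hypothetical): hdb comes out as the slave on the first interface the kernel probed, which would be consistent with disk6 sitting alone on its cable as a slave. Interface numbering depends on the order in which the on-board and Promise controllers were probed, so confirm against the boot messages.

```shell
#!/bin/sh
# Map a legacy IDE device name (hda, hdb, ...) to its interface and role.
ide_role() {
  letter=${1#hd}                                  # "hdb" -> "b"
  idx=$(( $(printf '%d' "'$letter") - $(printf '%d' "'a") ))
  chan=$(( idx / 2 ))                             # two devices per interface
  if [ $(( idx % 2 )) -eq 0 ]; then
    role=master
  else
    role=slave
  fi
  printf 'ide%d %s\n' "$chan" "$role"
}
```

For example, ide_role hdb prints "ide0 slave".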

  • Author

GK20, thanks for all the guidance.

The memory check ran 5 passes with 0 errors.

I will check all the cables and try one more rebuild with the existing hardware.

If it fails, I will bring the old MB/CPU back online, but I am suspicious of that old MB/CPU because I was having the disappearing-share problem I linked above. Basically, the server would work for a while and then die, need a reboot, and then be OK for a while. No one was able to help me, hence the new MB/CPU combo.

Again, thank you. I will keep posting until I get it all resolved.

  • Author

I tried to remount the old disk5.

Now unRAID is looking for a 2 TB drive or bigger :) so there is no going back.

AND the old disk5, a 500 GB PATA drive, makes some abnormal noises on boot-up, so I do believe it is failing.

OK, rebuild started again. Keeping my fingers crossed.

  • Author

FYI:

Total size: 1,953,514,552 KB
Current position: 754,020 (0.0%)
Estimated speed: 13,674 KB/sec
Estimated finish: 2379.8 minutes
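The estimated finish appears to be simply the remaining size divided by the current speed, which would explain why it jumps whenever the speed changes. A quick sketch to reproduce it from the progress numbers (function name hypothetical; sizes and positions in KB, speed in KB/sec). For this report it gives 2380.1 minutes, within a fraction of a minute of the 2379.8 shown.

```shell
#!/bin/sh
# Minutes remaining = (total - position) / speed / 60.
# Arguments may contain thousands separators, as copied from the web GUI.
rebuild_eta() {
  awk -v t="$1" -v p="$2" -v s="$3" 'BEGIN {
    gsub(/,/, "", t); gsub(/,/, "", p); gsub(/,/, "", s)
    printf "%.1f\n", (t - p) / s / 60
  }'
}
```

For example, rebuild_eta 1,953,514,552 754,020 13,674 prints 2380.1.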

  • Author

Is there a way to watch the syslog in real time?

No, I do not plan to watch it for 24 hours, but I wanted to spot-check it on and off.

  • Author

The rebuild seems to be running consistently:

Total size: 1,953,514,552 KB
Current position: 65,257,836 (3.3%)
Estimated speed: 19,129 KB/sec
Estimated finish: 1644.9 minutes

Is there a way to watch the syslog in real time?

No, I do not plan to watch it for 24 hours, but I wanted to spot-check it on and off.

 

The easiest way is from a telnet session or from the console. Type

tail -f /var/log/syslog

That makes the tail utility watch the file and print any new additions to the screen in real time. To stop monitoring, just press CTRL-C.
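If the full log scrolls by too fast, the tail -f output can be piped through a filter. A minimal sketch (function name hypothetical): --line-buffered is a GNU grep option that makes each match appear as soon as it arrives instead of waiting in grep's output buffer, and the default pattern is only a guess at what is worth watching here.

```shell
#!/bin/sh
# Filter a stream of syslog lines, flushing each match immediately.
# Default pattern is an assumption; pass your own as the first argument.
live_filter() {
  grep --line-buffered -iE "${1:-error|warn}"
}

# Example: tail -f /var/log/syslog | live_filter 'error|md5|hdb'
```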

  • Author

New log attached.

Speed seems to have dropped:

Total size: 1,953,514,552 KB
Current position: 381,260,264 (19.5%)
Estimated speed: 1,612 KB/sec
Estimated finish: 16250.6 minutes

AND I see errors on disk6:

 

 

Device  Model / Serial No.                Temp  Size (KB)      Free (KB)      Reads      Writes     Errors
parity  WDC_WD20EADS-00S_WD-WCAVY0864163  35°C  1,953,514,552  -              1,073,873  64         0
disk1   WDC_WD5000AAKS-2_WD-WCAS82451651  32°C  488,386,552    44,390,284     1,120,078  5          0
disk2   ST3500641AS_3PM1YPT7              35°C  488,386,552    73,378,604     1,123,186  7          0
disk3   WDC_WD20EADS-00S_WD-WCAVY0878118  36°C  1,953,514,552  8,128,452      1,083,450  5          0
disk4   ST3500641AS_3PM0LCW6              36°C  488,386,552    122,091,392    1,090,824  8          0
disk5   WDC_WD20EARS-00M_WD-WCAZA0967156  23°C  1,953,514,552  1,933,671,552  45         1,721,426  0
disk6   ST3500841A_3PM079GN               27°C  488,386,552    245,981,228    955,140    5          29,777
disk7   ST3200822A_3LJ3BJ39               *     195,360,952    2,922,704      394,103    5          0
disk8   ST3200822A_3LJ3A3Z8               *     195,360,952    4,284,900      388,041    5          0
disk9   Maxtor_6Y200P0_Y65GPWPE           *     199,148,512    56,608,624     435,458    5          0
disk10  ST3200822A_4LJ22G0B               *     195,360,952    56,558,920     393,822    5          0

msgs1.txt
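With this many rows it is easy to miss the one that matters. A sketch that reads rows pasted from the web GUI and prints only the disks whose last column (the error count) is non-zero; the function name is hypothetical. Against the table above it flags only disk6, with 29,777 errors.

```shell
#!/bin/sh
# Print "<disk> <errors>" for every parity/diskN row whose last field
# (the Errors column) is not 0. Reads the pasted table on stdin.
flag_disk_errors() {
  awk '$1 ~ /^(parity|disk[0-9]+)$/ && $NF != "0" { print $1, $NF }'
}
```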

 

 

Device  Model / Serial No.                Temp  Size (KB)      Free (KB)      Reads      Writes     Errors
disk5   WDC_WD20EARS-00M_WD-WCAZA0967156  23°C  1,953,514,552  1,933,671,552  45         1,721,426  0
disk6   ST3500841A_3PM079GN               27°C  488,386,552    245,981,228    955,140    5          29,777

The log you attached doesn't show any errors. Those errors look like read errors from disk6, and unless disk6 can take part in the data rebuild error-free, you cannot rebuild disk5. I think you can stop this rebuild now and try to resolve the disk6 issue first: check that the mode (primary/secondary) setting is correct, and maybe replace the IDE cable or use a different IDE port.

Meanwhile, do a SMART check on disk6 to find out whether the reallocated sector count has increased.
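A minimal sketch for pulling one attribute out of the smartmontools command smartctl -A, assuming the usual attribute table where RAW_VALUE is the tenth column, and assuming disk6 is /dev/hdb as the syslog suggests. The function name is hypothetical; running it before and after a rebuild attempt shows whether the count is climbing.

```shell
#!/bin/sh
# Print the RAW_VALUE of one SMART attribute from `smartctl -A` output on stdin.
# Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_raw() {
  awk -v name="${1:-Reallocated_Sector_Ct}" '$2 == name { print $10 }'
}

# Example: smartctl -A /dev/hdb | smart_raw Reallocated_Sector_Ct
```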

 

  • Author

I already changed the IDE cable, just in case.

I'll try a different IDE port.

What's the easiest way to run a SMART test on an unRAID machine?

OK, let's say disk6 is crap, so now I have a crapped-out disk5 and disk6.

At worst I lose disk5 (which had little data, nothing important) and disk6, which has some stuff I would really like to save.

What are my options?

  • Author

So this time it failed at 19%, and disk6 shows more errors than before. This looks like it's going from bad to worse.

Current log attached.

I will try changing the port and rebuilding again.

msg2.txt

  • Author

Error on console:

 

reiserfs abort (device md5): Journal write error in flush_commit_list

reiserfs abort (device md6): Journal write error in flush_commit_list

 

 

Archived

This topic is now archived and is closed to further replies.
