Getting errors on a second disk while the system rebuilds a drive... need help

November 11, 2010

Hi,

I need some help with my unRAID system. It is a 4.5.6 Pro system with 12 drives.

Six weeks ago I received an error when I tried to copy a file from my Windows machine to the unRAID volume.

I found out that one of my hard drives had a lot of pending sectors, a high raw read error rate and all that beautiful stuff.

I decided to replace this drive and sent it to my dealer for a warranty replacement.

From past experience I know that this takes a few weeks, and because I did not want to leave my data unprotected I bought a new drive, inserted it and rebuilt the volume. Everything was fine for the last 4 weeks.

Yesterday I received the replacement drive. Because I was running out of space (sigh….) I decided to replace one of my 1TB drives (DRIVE 3) with this new 2TB drive

(WD20EARS with jumper set).

I shut down the unRAID server, replaced the drive, booted and started the rebuild process, as I have done dozens of times before.

The expected time for this was 860 minutes, so I went to sleep.

This morning I checked the status. At first I saw only green lights, but then I noticed more than 23,000 errors on DRIVE 4 =:-( .

The SMART report for this drive told me:

  1 Raw_Read_Error_Rate     0x000f   099   099   051    Pre-fail  Always       -       2669
  3 Spin_Up_Time            0x0007   075   075   011    Pre-fail  Always       -       8280
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1127
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       12739
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       13238
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       278
 13 Read_Soft_Error_Rate    0x000e   099   099   000    Old_age   Always       -       2601
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       2608
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   072   058   000    Old_age   Always       -       28 (Lifetime Min/Max 20/32)
194 Temperature_Celsius     0x0022   077   058   000    Old_age   Always       -       23 (Lifetime Min/Max 17/34)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       228353600
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0012   087   087   000    Old_age   Always       -       531
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   006   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   072   072   000    Old_age   Always       -       2681
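
For anyone reading a table like this later, here is a minimal sketch of how the rows that usually point to media trouble could be picked out automatically. It assumes the standard `smartctl -A` column layout, where the raw value is the tenth field. The function name `flag_smart` and the watch-list of four attributes are assumptions made for illustration, not an official list.

```shell
#!/bin/sh
# Sketch: print SMART attributes that commonly indicate media trouble
# when their raw value (10th column of `smartctl -A` output) is non-zero.
# The watch-list below is an illustrative assumption, not an official list.
flag_smart() {
    awk '
        $2 == "Reallocated_Sector_Ct"  ||
        $2 == "Reported_Uncorrect"     ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            # "+ 0" forces a numeric comparison on the raw value
            if ($10 + 0 > 0) printf "%s raw=%s\n", $2, $10
        }
    '
}

# Usage (the device name is a placeholder):
#   smartctl -A /dev/sdX | flag_smart
```

Fed the table above, this would report Reallocated_Sector_Ct (3), Reported_Uncorrect (2608) and Current_Pending_Sector (531), which are the values that make this drive suspect.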

I then connected to my unRAID server via telnet and tried to copy files from DISK4 (the one with errors) to the new DISK3, and after a few seconds I received an error. The logs told me:

Nov 11 04:40:01 nas syslogd 1.4.1: restart.
Nov 11 06:08:06 nas kernel: md: sync done. time=43177sec rate=45244K/sec
Nov 11 06:08:07 nas kernel: md: recovery thread sync completion status: 0
Nov 11 07:08:07 nas kernel: mdcmd (4578): spindown 0
Nov 11 07:08:08 nas kernel: mdcmd (4579): spindown 3
Nov 11 07:08:09 nas kernel: mdcmd (4580): spindown 9
Nov 11 07:08:09 nas kernel: mdcmd (4581): spindown 10
Nov 11 07:08:10 nas kernel: mdcmd (4582): spindown 11
Nov 11 07:58:40 nas kernel: mdcmd (4887): clear 
Nov 11 07:59:28 nas kernel: mdcmd (4899): spinup 3
Nov 11 07:59:28 nas kernel: 
Nov 11 07:59:56 nas kernel: mdcmd (4905): spinup 4
Nov 11 07:59:56 nas kernel: 
Nov 11 08:01:00 nas in.telnetd[5309]: connect from 192.168.1.105 (192.168.1.105)
Nov 11 08:01:07 nas login[5310]: ROOT LOGIN on `pts/0' from `192.168.1.105'
Nov 11 08:03:31 nas kernel: REISERFS error (device md3): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 235438080 is corrupted: first bit must be 1
Nov 11 08:03:31 nas kernel: REISERFS (device md3): Remounting filesystem read-only
Nov 11 08:03:31 nas kernel: REISERFS warning (device md3): clm-6006 reiserfs_dirty_inode: writing inode 2003 on readonly FS
Nov 11 08:04:07 nas kernel: mdcmd (4942): clear 
Nov 11 08:04:16 nas kernel: mdcmd (4947): spinup 3

On the forum I found the hint to run a filesystem check:

cd

samba stop

umount /dev/md3

reiserfsck --check /dev/md3

and received a lot of

…
Trans replayed: mountid 99, transid 146490, desc 1051, len 6, commit 1058, next trans offset 1041
Trans replayed: mountid 99, transid 146491, desc 1059, len 24, commit 1084, next trans offset 1067
Trans replayed: mountid 99, transid 146492, desc 1085, len 21, commit 1107, next trans offset 1090
Trans replayed: mountid 99, transid 146493, desc 1108, len 23, commit 1132, next trans offset 1115
Trans replayed: mountid 99, transid 146494, desc 1133, len 23, commit 1157, next trans offset 1140
Trans replayed: mountid 99, transid 146495, desc 1158, len 22, commit 1181, next trans offset 1164
Trans replayed: mountid 99, transid 146496, desc 1182, len 24, commit 1207, next trans offset 1190
Trans replayed: mountid 99, transid 146497, desc 1208, len 10, commit 1219, next trans offset 1202
Replaying journal: Done.
Checking internal tree.. finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Checking Semantic tree:
finished
1 found corruptions can be fixed when running with --fix-fixable
###########
reiserfsck finished at Thu Nov 11 08:48:23 2010
###########

Sorry for the long description and my English, but now I’m sitting in front of my computer asking myself what the next step is.

Can I trust the parity disk?

Can I trust the newly rebuilt DISK3? (The old 1TB disk with all its data is lying next to me.)

Can I trust DISK4?

Any advice is welcome… unfortunately I'm a Linux noob.

Bye

syslog-2010-11-11.txt

November 11, 2010

Author

damned size limitation....

Logfiles part1

syslog.1.part1.zip

November 11, 2010

Author

logfiles part2

syslog.1.part2.zip

November 11, 2010

Your solution is to run reiserfsck with the --fix-fixable option, as specified in the output of the reiserfsck check you ran.

cd

samba stop

umount /dev/md3

reiserfsck --fix-fixable /dev/md3

November 11, 2010

Author

hmmmm...

I'm not sure that the rebuild of DRIVE 3 has "really" rebuilt my data.

Joe, if I got read errors on DRIVE 4 while I was rebuilding DRIVE 3, will this not have corrupted my data?

What is the best way to go back to the old HD?

Replace the new 2TB DRIVE 3 with the old 1TB drive

Rebuild the parity disk?

Replace DRIVE 4

Rebuild DRIVE 4

Thanks!!

November 11, 2010

hmmmm...

I'm not sure that the rebuild of DRIVE 3 has "really" rebuilt my data.

Joe, if I got read errors on DRIVE 4 while I was rebuilding DRIVE 3, will this not have corrupted my data?

Correct. It will corrupt your data.

What is the best way to go back to the old HD?

Replace the new 2TB DRIVE 3 with the old 1TB drive

Rebuild the parity disk?

Replace DRIVE 4

Rebuild DRIVE 4

Thanks!!

If disk4 is giving errors, then rebuilding parity with it will result in the same corruption of parity. There is no simple solution to your errors.
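
To illustrate why a bad read on one disk spoils the rebuild of another, here is a minimal sketch using made-up one-byte "disks". It assumes single parity computed as the XOR of the same byte position on every data disk. The byte values and the helper name `xor_all` are hypothetical and chosen only for the demonstration.

```shell
#!/bin/sh
# Sketch of single-parity reconstruction with made-up one-byte "disks".
# Assumption: parity is the XOR of the same byte position on all data disks.
xor_all() {
    acc=0
    for b in "$@"; do
        acc=$(( acc ^ b ))
    done
    echo "$acc"
}

d3=60 d4=165 d5=15                 # hypothetical data bytes on disks 3, 4, 5
parity=$(xor_all "$d3" "$d4" "$d5")

# Rebuild disk 3 from parity plus the other disks, all read correctly.
good=$(xor_all "$parity" "$d4" "$d5")

# Same rebuild, but disk 4 returns a wrong byte (one flipped bit).
d4_bad=$(( d4 ^ 1 ))
bad=$(xor_all "$parity" "$d4_bad" "$d5")

echo "original=$d3 rebuilt_ok=$good rebuilt_with_read_error=$bad"
```

The rebuilt byte only matches the original when every other disk returns correct data. Any position where disk4 could not be read correctly ends up wrong on the rebuilt disk, and the same holds for parity if it is recomputed from a disk that returns bad data.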

November 11, 2010

Author

Hi,

My question was not clear, sorry. I read a few things in the FAQ. Can I do this:

Shut down unRAID

Replace the new 2TB drive with the old 1TB one (DISK3)

Unplug DRIVE4

Boot the server

Remove DRIVE4 from the configuration and check that DRIVE3 is the old one

Log in as root and run

initconfig

-> Have a working system with 11 drives, and lose everything that I have not copied from the old DRIVE4

November 11, 2010

Yes, you can do that. When you next start the array it will begin a complete new parity calculation. You'll be without any parity protection until it is complete. When it is done, you should then do a full parity check by pressing the "Check" button.

Joe L.
