PANIC: Did I lose my array? Can someone please help me?

May 2, 2014

Author

IT WORKED!!!

OK, the reiserfsck process has completed and generated a huge amount of output.

The full output is linked here: https://www.dropbox.com/s/ol2pjw82vzv0hqy/reiserfsck-output.txt

When I browsed to /mnt/disk4, all I got was a folder called "lost+found" containing probably over a hundred folders with numbers for names (sectors?). The good news is that I browsed through each of those folders and managed to find all the missing movies and TV shows. I created /mnt/disk4/movies and copied the movies into it, created /mnt/disk4/tvshows and copied the TV shows into it, and now when I browse my /mnt/user/movies share, I see all of my movies.

I'm still not sure whether this is for real, but it certainly looks like it. Should I do a parity check now? If so, should it be a non-correcting check or a correcting one? I was thinking that a non-correcting check might tell me whether any of my movie files are corrupted. I spot-checked some of the movie files at random and they all seem to play fine straight from the share.

Now going back, I'd like to re-enable my cache drive, and restart unraid in normal mode (with plugins), but I'd like to disable all the dynamix plugins and only keep Sabnzbd, Sickbeard, Couchpotato and Crashplan. But I will wait for a parity sync (according to your instructions) before I do that.

Either way, I really want to thank you for all the help you gave me on this. I don't know what I would have done without your help. Thank you so much.

Below is the final screenshot.

May 2, 2014

When we rebuilt disk4, it used parity and all of the other disks to do the rebuild. So after the rebuild, parity would be perfect. (The rebuild is the technical equivalent of building parity, except we were building the disk instead of the parity disk. Think of all of the disks being a set - and if you pull out any one you can "build" the last one.)
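The "pull out any one and build it from the rest" idea can be sketched with plain XOR parity. This is only an illustration, assuming single-parity XOR across equal-sized blocks; the three tiny "disks" and the `xor_blocks` helper are made up for the example and are not taken from unRAID itself.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each shrunk to a 4-byte block.
disks = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]

# Building parity: XOR of all data disks.
parity = xor_blocks(disks)

# "Pull out" the disk at index 1 and rebuild it from parity plus the survivors.
survivors = [d for i, d in enumerate(disks) if i != 1]
rebuilt = xor_blocks(survivors + [parity])
```

Building parity and rebuilding a data disk are the same operation here: XOR everything that is left to get the one that is missing.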

When you ran reiserfsck, you ran it on the "md4" device and not the "sdd1" device. The mdX devices update parity. So parity would have been maintained.
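Why the device name matters can be shown with a small sketch. It assumes generic single-parity XOR with a read-modify-write update; the function names are hypothetical, and the actual unRAID md driver is not shown anywhere in this thread.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def write_via_md(old_data, new_data, old_parity):
    """Read-modify-write: return the parity that matches the new data."""
    return xor(xor(old_parity, old_data), new_data)

# One other data disk plus the disk being repaired (made-up contents).
other_disk = b"\x01\x02\x03\x04"
old_data = b"\x10\x20\x30\x40"
parity = xor(other_disk, old_data)

new_data = b"\x11\x22\x33\x44"  # what a filesystem repair might write

# Writing through the parity-aware device keeps parity in step.
parity_after_md_write = write_via_md(old_data, new_data, parity)

# Writing straight to the raw partition never touches parity, so it goes stale.
parity_after_raw_write = parity
```

In this sketch only the write that goes through the parity-aware path leaves parity matching the data, which is the reason for running the repair against the mdX device.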

So no parity inconsistency should exist. If you'd like to confirm, you could run a correcting or non-correcting check. Both should come back with no problems.

I've been considering why disk4 did not come up perfect. Clearly there were some writes to a disk that were not captured by parity. I am not 100% certain, but mounting disk3 in the cache slot might have done some updates in the housekeeping area. If I had it to do over, I think I'd put disk3 back in the array rather than in the cache slot. But at the time I didn't want to do any harm, as we had disk3 simulated there and I felt that some data could be saved from that if the real disk3 was toasted. So I wanted to leave the array alone while we tested it out.

But the result seems pretty good. I'm happy for you. Try to go back through the steps we did and understand the reasons why. It will help keep you out of trouble!

Hope you have a good weekend!

May 4, 2014

Author

So I ran into a problem. I finished a correcting parity sync, and it sync'ed a bunch of inconsistencies. That was not the problem. The problem is that at the end of the sync, the errors column displays some errors (12) under the PARITY disk (/dev/sdd).

I looked at the syslog, which was relatively quiet, but it has the following errors for the parity disk. Does this mean that the parity disk is bad and that I need to replace it? This is worrisome since we just used the parity disk to rebuild disk4.

Can someone please tell me what this means?

May 3 21:19:24 Tower kernel: sd 1:0:0:0: [sdd] Unhandled sense code
May 3 21:19:24 Tower kernel: sd 1:0:0:0: [sdd]
May 3 21:19:24 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
May 3 21:19:24 Tower kernel: sd 1:0:0:0: [sdd]
May 3 21:19:24 Tower kernel: Sense Key : 0x3 [current] [descriptor]
May 3 21:19:24 Tower kernel: Descriptor sense data with sense descriptors (in hex):
May 3 21:19:24 Tower kernel: 72 03 11 00 00 00 00 0c 00 0a 80 00 00 00 00 01
May 3 21:19:24 Tower kernel: 7e cb 86 80
May 3 21:19:24 Tower kernel: sd 1:0:0:0: [sdd]
May 3 21:19:24 Tower kernel: ASC=0x11 ASCQ=0x0
May 3 21:19:24 Tower kernel: sd 1:0:0:0: [sdd] CDB:
May 3 21:19:24 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 7e cb 86 70 00 00 00 18 00 00
May 3 21:19:24 Tower kernel: end_request: critical target error, dev sdd, sector 6422234736
May 3 21:19:24 Tower kernel: md: disk0 read error, sector=6422234672
May 3 21:19:24 Tower kernel: md: disk0 read error, sector=6422234680
May 3 21:19:24 Tower kernel: md: disk0 read error, sector=6422234688
May 3 21:26:24 Tower kernel: mdcmd (47): spindown 6
May 3 22:12:43 Tower kernel: sd 1:0:0:0: [sdd] Unhandled sense code
May 3 22:12:43 Tower kernel: sd 1:0:0:0: [sdd]
May 3 22:12:43 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
May 3 22:12:43 Tower kernel: sd 1:0:0:0: [sdd]
May 3 22:12:43 Tower kernel: Sense Key : 0x3 [current] [descriptor]
May 3 22:12:43 Tower kernel: Descriptor sense data with sense descriptors (in hex):
May 3 22:12:43 Tower kernel: 72 03 11 00 00 00 00 0c 00 0a 80 00 00 00 00 01
May 3 22:12:43 Tower kernel: 9d f8 4f b8
May 3 22:12:43 Tower kernel: sd 1:0:0:0: [sdd]
May 3 22:12:43 Tower kernel: ASC=0x11 ASCQ=0x0
May 3 22:12:43 Tower kernel: sd 1:0:0:0: [sdd] CDB:
May 3 22:12:43 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 9d f8 4f a8 00 00 00 20 00 00
May 3 22:12:43 Tower kernel: end_request: critical target error, dev sdd, sector 6945263528
May 3 22:12:43 Tower kernel: md: disk0 read error, sector=6945263464
May 3 22:12:43 Tower kernel: md: disk0 read error, sector=6945263472
May 3 22:12:43 Tower kernel: md: disk0 read error, sector=6945263480
May 3 22:12:43 Tower kernel: md: disk0 read error, sector=6945263488
May 3 22:12:46 Tower kernel: sd 1:0:0:0: [sdd] Unhandled sense code
May 3 22:12:46 Tower kernel: sd 1:0:0:0: [sdd]
May 3 22:12:46 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
May 3 22:12:46 Tower kernel: sd 1:0:0:0: [sdd]
May 3 22:12:46 Tower kernel: Sense Key : 0x3 [current] [descriptor]
May 3 22:12:46 Tower kernel: Descriptor sense data with sense descriptors (in hex):
May 3 22:12:46 Tower kernel: 72 03 11 00 00 00 00 0c 00 0a 80 00 00 00 00 01
May 3 22:12:46 Tower kernel: 9d f8 4f c8
May 3 22:12:46 Tower kernel: sd 1:0:0:0: [sdd]
May 3 22:12:46 Tower kernel: ASC=0x11 ASCQ=0x0
May 3 22:12:46 Tower kernel: sd 1:0:0:0: [sdd] CDB:
May 3 22:12:46 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 01 9d f8 4f c8 00 00 00 28 00 00
May 3 22:12:46 Tower kernel: end_request: critical target error, dev sdd, sector 6945263560
May 3 22:12:46 Tower kernel: md: disk0 read error, sector=6945263496
May 3 22:12:46 Tower kernel: md: disk0 read error, sector=6945263504
May 3 22:12:46 Tower kernel: md: disk0 read error, sector=6945263512
May 3 22:12:46 Tower kernel: md: disk0 read error, sector=6945263520
May 3 22:12:46 Tower kernel: md: disk0 read error, sector=6945263528
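For anyone trying to read the log above, here is a rough sketch of how the CDB lines decode. Opcode 0x88 is READ(16), which carries an 8-byte big-endian LBA and a 4-byte transfer length. In the SCSI tables, sense key 0x3 is MEDIUM ERROR and ASC 0x11 / ASCQ 0x00 is UNRECOVERED READ ERROR. The `decode_read16` helper is a made-up name for illustration, not part of any tool.

```python
# Names from the SCSI sense key and ASC/ASCQ tables for the values in this log.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}

def decode_read16(cdb_hex):
    """Return (starting LBA, sector count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, length

# First failing command from the syslog excerpt.
lba, length = decode_read16("88 00 00 00 00 01 7e cb 86 70 00 00 00 18 00 00")
summary = f"{SENSE_KEYS[0x3]} / {ASC_ASCQ[(0x11, 0x00)]} reading {length} sectors at LBA {lba}"
```

The decoded LBA matches the "end_request ... sector" value the kernel prints. In this log the first md sector reported for each failure is that LBA minus 64, which would be consistent with the data partition starting at sector 64; that is an inference from the numbers, not something stated in the thread.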

May 4, 2014

Get a smart report on the parity disk. My guess is the cabling to that disk is loose or the cable is bad.

Any parity errors it found were false errors. Parity should have been perfect.

Good news is parity doesn't have to be good if the data is good.

Don't do any writing.

May 4, 2014

Author

Here is the SMART report for the parity disk. Any thoughts?

I'm concerned that I have now gotten these "read" errors on more than one of my drives during the last few weeks. The only cause I can think of is a defective PSU. My PSU is a Corsair HX750, but a few weeks ago I had a major mishap when I accidentally plugged in a molex connector in the reverse orientation and fried a bunch of my hard drives. I recovered the hard drives, but the PSU in use today is the same one I had back then. Could a defective PSU power regulator be causing this problem?

root@Tower:~# smartctl -a /dev/sdd   
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST4000DM000-1F2168
Serial Number:    Z3006MBD
LU WWN Device Id: 5 000c50 04fbbb4f4
Firmware Version: CC51
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun May  4 00:42:17 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
				was never started.
				Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249)	Self-test routine in progress...
				90% of test remaining.
Total time to complete Offline 
data collection: 		(  612) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				No Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 532) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x1085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       97814225
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       255
  5 Reallocated_Sector_Ct   0x0033   098   098   010    Pre-fail  Always       -       2872
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       66650724
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9195
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       113
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   078   078   000    Old_age   Always       -       22
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   083   083   000    Old_age   Always       -       17
190 Airflow_Temperature_Cel 0x0022   073   051   045    Old_age   Always       -       27 (Min/Max 18/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       99
193 Load_Cycle_Count        0x0032   040   040   000    Old_age   Always       -       121557
194 Temperature_Celsius     0x0022   027   049   000    Old_age   Always       -       27 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       6463h+55m+08.836s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83740876584
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       267253903257

SMART Error Log Version: 1
ATA Error Count: 22 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 22 occurred at disk power-on lifetime: 9194 hours (383 days + 2 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00   4d+08:06:57.905  READ FPDMA QUEUED
  60 00 20 ff ff ff 4f 00   4d+08:06:35.849  READ FPDMA QUEUED
  60 00 28 ff ff ff 4f 00   4d+08:06:35.813  READ FPDMA QUEUED
  60 00 50 ff ff ff 4f 00   4d+08:06:35.809  READ FPDMA QUEUED
  60 00 90 ff ff ff 4f 00   4d+08:06:35.809  READ FPDMA QUEUED

Error 21 occurred at disk power-on lifetime: 9193 hours (383 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 28 ff ff ff 4f 00   4d+06:32:56.147  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   4d+06:32:56.073  READ LOG EXT
  60 00 88 ff ff ff 4f 00   4d+06:32:36.821  READ FPDMA QUEUED
  60 00 80 ff ff ff 4f 00   4d+06:32:36.820  READ FPDMA QUEUED
  60 00 78 ff ff ff 4f 00   4d+06:32:36.819  READ FPDMA QUEUED

Error 20 occurred at disk power-on lifetime: 9193 hours (383 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 88 ff ff ff 4f 00   4d+06:32:36.821  READ FPDMA QUEUED
  60 00 80 ff ff ff 4f 00   4d+06:32:36.820  READ FPDMA QUEUED
  60 00 78 ff ff ff 4f 00   4d+06:32:36.819  READ FPDMA QUEUED
  60 00 88 ff ff ff 4f 00   4d+06:32:36.819  READ FPDMA QUEUED
  60 00 80 ff ff ff 4f 00   4d+06:32:36.819  READ FPDMA QUEUED

Error 19 occurred at disk power-on lifetime: 9192 hours (383 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 20 ff ff ff 4f 00   4d+05:39:26.445  READ FPDMA QUEUED
  60 00 30 ff ff ff 4f 00   4d+05:39:26.435  READ FPDMA QUEUED
  60 00 58 ff ff ff 4f 00   4d+05:39:26.430  READ FPDMA QUEUED
  60 00 80 ff ff ff 4f 00   4d+05:39:26.430  READ FPDMA QUEUED
  60 00 70 ff ff ff 4f 00   4d+05:39:26.429  READ FPDMA QUEUED

Error 18 occurred at disk power-on lifetime: 9072 hours (378 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00   5d+05:35:29.470  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   5d+05:35:29.263  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   5d+05:35:29.236  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   5d+05:35:29.228  READ FPDMA QUEUED
  e5 00 00 00 00 00 00 00   5d+05:35:29.070  CHECK POWER MODE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%      9195         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

May 4, 2014

This disk is bad. 2872 reallocated sectors. RMA that drive.
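To pull the numbers that matter out of a long report like the one above, something along these lines could work. It is a rough sketch: the list of watched attributes and the "any nonzero raw value is worth a look" rule are assumptions for this example, not an official smartmontools threshold, and `flag_attributes` is a made-up helper.

```python
# Attributes whose raw value counts bad or unreadable sectors (assumed watch list).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def flag_attributes(smart_text):
    """Return {attribute name: raw value} for watched attributes with a nonzero raw value."""
    flagged = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10+ columns: ID, NAME, FLAG, VALUE, WORST, THRESH,
        # TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = int(fields[9])
            if raw > 0:
                flagged[fields[1]] = raw
    return flagged

# A few rows copied from the report in this thread.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   098   098   010    Pre-fail  Always       -       2872
187 Reported_Uncorrect      0x0032   078   078   000    Old_age   Always       -       22
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""
flagged = flag_attributes(sample)
```

On the rows above this picks out the 2872 reallocated sectors and the 22 reported uncorrectable errors, even though the overall self-assessment line in the report still says PASSED.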
