Red X Next To Disk


Compass


Hi All,

 

Well it's finally happened, after many years of trouble free service.....the dreaded red X.

 

I've been playing around with my server over the last few days, re-ordering disks etc., as I just upgraded to V6.2.2 and have bought 3 new 6TB drives to add to the array. Only 1 has been precleared so far, and it has not been added to the array. Now my Disk 2 has the X (device is disabled, contents emulated), and the dashboard page says it's faulty.

 

I am planning on adding 2 of the new 6TB as parity....my current parity is 4TB.

 

I can see the data (movies) on disk2 from my Windows PC as both disk and user shares. However, in one of disk 2's disk shares there is a folder called hsperfdata_abc. Not sure what this is, never seen it before. My user settings are secure and I'm the only user.

 

I've checked and re-checked all the connections...all are good. I have attached the syslog.

 

I don't have enough room on my other array drives to move the data over to them...so am wondering what my options might be?

 

1: Move the data over my network to my windows pc

 

2: Add the already precleared 6TB as my cache drive (not sure if I can do that as it's bigger than my parity, but technically it's not in the array)....move the data to that and hold it there until I can replace the failed drive.

 

3: Wait for you smart people to tell me what to do....

 

Thanks In Advance

tower-syslog-20161031-0656.zip


SMART for disk2 looks fine. The server was rebooted, so the syslog has no record of what happened. You have two choices:

Rebuild disk2 using the same disk (in this case it's probably a good idea to check/replace all cables and run an extended SMART test before rebuilding).

Do a parity swap: use a new 6TB for parity and the old parity to rebuild disk2.

 

I've tried running disk 2 off both the motherboard and the SASLP, so with different cables, and ended up with the same result....currently running the extended SMART test, will post what it says.

 

Does the Parity Swap procedure work on V6.2.2?

 

 


It does. You can try to rebuild using the same disk or do the parity swap, whichever you prefer.

 

I did the parity swap, but now disk 1 is down....see diagnostics...haven't shut the server down yet.....bugger

 

The disk in question is WDC_WD20EARS-00MVWB0_WD-WMAZA0523138-20161101-1916. During the parity swap/check it came back with millions of errors.

This is the error message on the Disk1 page: scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46

tower-diagnostics-20161101-1922.zip

cropped.jpg.b669bd97e1b935a2ef655eb8d699fcf8.jpg


Parity copy completed successfully, but disk1 dropped out a couple of hours after the rebuild of disk2 started, so disk2 needs to be rebuilt again.

Looks like disk1 timed out, eventually making the SASLP crash. I would start by powering down, reseating that controller and checking the cables to disk1, then power up and post new diags so we can see SMART for disk1 and decide the best way to proceed.

 

Keep old disk2 intact, it may be needed.


 

See attached....thanks again for your help

tower-diagnostics-20161101-2036.zip

cropped_2.jpg.4b97b9123a92d4abf948848228623dff.jpg


That file system corruption is expected, since disk2 wasn't completely rebuilt and disk1 is being emulated using a corrupt disk2. The main problem is that disk1 has pending sectors, so it may be impossible to rebuild disk2. To confirm, run an extended SMART test on disk1 and post the results.
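
For anyone wondering why a second bad disk matters so much here, this is a toy Python sketch of single-parity (XOR) reconstruction. It assumes one parity disk, as in the array described in this thread, and the byte values, disk names and the helper `xor_blocks` are made up for illustration. It is not Unraid's actual code, just the arithmetic behind "emulated" and "rebuild".

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks" holding one small block each (made-up values).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk3 = bytes([0x05, 0x06, 0x07, 0x08])

# Single parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# One disk missing: parity XOR the surviving disks gives it back exactly.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])

# Two problems at once: disk2 is missing AND one byte of disk1 cannot be read
# correctly (modelled here as a wrong value at offset 2).
bad_disk1 = bytes([0x11, 0x22, 0x00, 0x44])
damaged_disk2 = xor_blocks([parity, bad_disk1, disk3])

# The rebuilt disk2 is wrong exactly where disk1 could not be read.
bad_offsets = [i for i, (a, b) in enumerate(zip(disk2, damaged_disk2)) if a != b]
print(rebuilt_disk2 == disk2, bad_offsets)
```

The same arithmetic works in the other direction, which is why disk1 emulated from a half-rebuilt disk2 shows corruption.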

 

Ok...started the extended on disk 1


 

See attached...failed

WDC_WD20EARS-00MVWB0_WD-WMAZA0523138-20161101-2151.txt
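
If you want to pull the relevant counters out of a SMART report like the one attached, a rough Python sketch could look like the following. It assumes the usual `smartctl -A` attribute table layout (ID in the first column, name in the second, raw value in the tenth). The sample rows are invented for illustration and are NOT taken from the attached report, and `parse_smart_attributes` is a hypothetical helper name.

```python
def parse_smart_attributes(report_text):
    """Return {attribute_name: raw_value} from a `smartctl -A` style table."""
    attributes = {}
    for line in report_text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attributes[parts[1]] = int(parts[9])
            except ValueError:
                continue  # raw value is not a plain integer, skip this row
    return attributes

# Made-up sample rows, not the real report from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(SAMPLE)

# Attributes commonly checked when a disk is suspected of failing.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")
suspect = {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}
print(suspect)
```

A non-zero raw value for pending sectors is the kind of thing being discussed for disk1 above.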


That’s what I was afraid of: it’s not going to be possible to rebuild disk2. Fortunately the old disk2 seems to be OK, and as long as you didn’t write anything to it after it was disabled, all data from that disk should be OK. For disk1 you can try two things, depending on how important and/or easy it is to replace its data:

-If any missing data is easily replaceable: do a new config with all the old disks except disk1 (include old parity and old disk2) and use a spare in disk1's place. Let parity sync, then mount old disk1 in your cache slot (or with the Unassigned Devices plugin) and copy all its data to the array. With some luck you should be able to copy most of it.

-If the data on disk1 is very difficult to replace, you can try this option first, but only if there were no writes to disk2 after it was disabled: do a new config with all the old disks except parity, using the new 6TB parity. Before starting the array, check “parity is already valid”, then start the array, stop the array, unassign old disk1, assign a spare to its slot, and start the array to begin the rebuild. If the rebuild is not completely successful you can still use the first option.
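
For the "copy all data and hope for the best" part of the first option, here is a minimal Python sketch of a copy that carries on past unreadable files instead of stopping at the first error. It is only one way to do it, not an Unraid tool. The function name `salvage_copy` and any paths you pass in are hypothetical.

```python
import os
import shutil

def salvage_copy(src_root, dst_root):
    """Copy every file under src_root to dst_root, skipping unreadable ones.

    Returns (copied, failed) as lists of paths relative to src_root.
    """
    copied, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                shutil.copy2(os.path.join(dirpath, name),
                             os.path.join(dst_dir, name))
                copied.append(rel_path)
            except OSError as err:
                # A read error on a failing disk lands here: note it, move on.
                failed.append(rel_path)
                print(f"skipped {rel_path}: {err}")
    return copied, failed
```

You would call it with the mount point of the old disk as the source and a folder on the array as the destination, then go through the `failed` list to see what needs replacing from elsewhere.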

 


 

There have been no writes to any of the disks since these issues started (that's if you mean adding stuff to the array).

 

Is it possible to assign disk 1 to the cache slot first to see if the data is recoverable? Before doing any of the above?

 

I will have to preclear the other 2 6TB drives before I do anything else which will take a few days.

 


Is it possible to assign disk 1 to the cache slot first to see if the data is recoverable? Before doing any of the above?

 

You can, but without knowing which option you're going to use you could end up with a UUID collision. Also, your current array has filesystem corruption, so it can crash. I would suggest the following:

 

-take a screenshot of all assignments

-tools -> new config -> Retain array configuration: ->  select "all" -> check "Yes I want to do this" -> Apply

-on the main page change the assigned disk2 to the old disk WDC_WD20EARS-00S8B1_WD-WCAVY6506241, you'll probably need to reassign old disk1 also

-remaining assignments as they are, including new parity

-before starting the array check "parity is already valid"

-start array

 

All your data should come online, including disk1. You can check disk1's contents, but don't try to copy anything from it, and don't write anything to the array.

If all looks good and you want to do option 2, you just need to stop the array, unassign old disk1, assign a spare (e.g., the old parity disk) and start the array to rebuild.

This way you don't need to preclear any more disks for now, and it will leave old disk1 intact in case you need to do option 1.

 

 


 

Thanks...already 11hrs into preclearing one of the other 6TB drives....will let that finish and then proceed with the above.

Out of interest, I used the Unassigned Devices plugin and mounted the old disk2, and all the data looks OK. Again, I haven't written to or moved any files on the array whilst all this is happening.


 

Ok...I've done the following

 

-printed a screenshot of all assignments

-stopped the array

-tools -> new config -> Retain array configuration: ->  select "all" -> check "Yes I want to do this" -> Apply

-on the main page changed the assigned disk2 to the old disk WDC_WD20EARS-00S8B1_WD-WCAVY6506241

-left the remaining assignments as they are, including new parity

-before starting the array, checked "parity is already valid"

 

However, there is still a message next to the parity disk saying "all data on this disk will be erased when array is started"

Do I still go ahead and start the array? Will it start a parity sync straight away?

Erase_Data.jpg.5bb91b6b3abf236c2a4c86f0cd162f6e.jpg


however there is still a message next to the parity disk that "all data on this disk will be erased when array is started"

 

That is normal. If "parity is already valid" is checked, parity will not be rebuilt, so you can start the array.

 

Yep cool....50% into the rebuild. I did option 2: after checking the contents of Disk 1, which all looked good, I stopped the array, assigned my old parity disk to the Disk 1 slot to rebuild onto, and restarted the array. It started rebuilding, but Disk 1 has an orange triangle next to it (with 'device contents emulated' when hovered over). It seems to be being written to and there are no errors being recorded...fingers crossed the orange triangle disappears after it's finished?

The old Disk 2 seems to be fine now too....weird...time will tell.


...restarted the array and it started rebuilding but Disk 1 has an orange triangle next to it(with 'device contents emulated' when hovered over) but it seems to be being written to and there are no errors being recorded...fingers crossed the orange triangle disappears after it's finished?

 

That is normal during the rebuild. It will change to green once it finishes.


 

It turned green and seems to be working, however I'm getting this REISERFS error (see screenshot attachment).

I have attached diagnostics too.

Also, is there a 'global' security setting? I've got all my User Shares set to Secure and my Disk Shares set to Public, but I can't add anything to either type of share.

Reiserfs_Error.jpg.175c211c56e31c10dea9b2cdb5d1d52e.jpg

tower-diagnostics-20161105-1913.zip
