Drives dropping out of array into UD (split from Preclear Results)


TODDLT

Below is a lot of history on this issue.  A summary of the current status can be found here:  https://forums.unraid.net/topic/78318-drives-dropping-out-of-array-into-ud-split-from-preclear-results/?do=findComment&comment=725770

 

History Below:

 

This was my first ever preclear failure.  I started preclearing two drives at once last night, both Toshiba N300s.

 

This morning I have a failure email.  I can't get the full log to open from the main page in unRAID, but I have attached the preview snapshot and the 3 reports I could find on the server.  It looks like a "No space left on device" error.

 

I am assuming this is a dead drive but wanted to make sure it's not a cable issue or something deserving a re-try.  The cable seats look good.

 

Also what is the "no such file or directory" error below?

 

Thanks all/anyone :)

 

image.png.d697025752bd0822bbaa2977cf09d938.png

 

988YK07LFAXG.resume

3602201014017.sreport

988GK045FAXG.resume

Edited by TODDLT
Link to comment
9 minutes ago, TODDLT said:

It looks like No Space Left on Device error.

9 minutes ago, TODDLT said:

Also what is the "no such file or directory" error below?

I am guessing these are the same cause and you have filled up /tmp

 

Post diagnostics which might clear this up as well as let us take a look at SMART report for that disk if it is responding.
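
For anyone following along, here is a minimal sketch of how the "/tmp is full" guess could be checked from the console. It assumes a standard `df`, `du`, `awk`, `sort` and `head` are available; the `usage_percent` helper name is made up for this example and is not part of unRAID or any plugin.

```shell
#!/bin/bash
# Hypothetical helper: print how full the filesystem holding a path is,
# as a bare integer percentage (for example 37).
usage_percent() {
    # df -P forces one line per filesystem; column 5 of line 2 is "Capacity".
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

dir="${1:-/tmp}"
echo "$dir is $(usage_percent "$dir")% full"

# List the ten largest entries (sizes in KiB) so the culprit is easy to spot.
du -sk "$dir"/* 2>/dev/null | sort -rn | head -n 10
```

If the percentage is at or near 100, the largest entries in the listing are the first place to look.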

Link to comment
4 hours ago, trurl said:

I am guessing these are the same cause and you have filled up /tmp

 

Post diagnostics which might clear this up as well as let us take a look at SMART report for that disk if it is responding.

The pre-clear ran for 4 or 5 hours at least before I went to sleep and it failed sometime after that.  I also know the drive was actually spinning at the time.  If it lost communication with the drive it was while in flight. 

 

I'll post this evening if I can get the disk to respond.  I'll have to wait for the 1st drive to complete one cycle (of 2) before I kill everything to check connections if you think that might be the issue but I believe it will have finished by the time I get home.

 

thanks.

Link to comment
11 hours ago, trurl said:

I am guessing these are the same cause and you have filled up /tmp

 

Post diagnostics which might clear this up as well as let us take a look at SMART report for that disk if it is responding.

I think it's not able to read or see the drive now.  It looks like it did at one time, but when I click on "start short self test" it just blinks at me and does nothing.  No self tests have been logged, and capabilities says "Cannot read capabilities".  There is a green dot on the drive in Unassigned Devices.

 

My other drive is only 50% done with the post read so likely will be between midnight and 2 AM before it's done.  I'll either reboot tonight or in the AM and see if it finds the drive. 

Link to comment
23 hours ago, trurl said:

I am guessing these are the same cause and you have filled up /tmp

 

Post diagnostics which might clear this up as well as let us take a look at SMART report for that disk if it is responding.

OK Life got a little strange this morning, please let me know if you have any ideas.

 

Last night, after the good drive completed a successful preclear, I stopped the 2nd cycle, shut down the server, and checked cables.  I restarted and everything looked normal.  One drive showed a precleared status and the other did not.  However, I still could not go into the drives via Unassigned Devices and run a SMART test.  Same result: it blinks but does nothing.

 

So I started the 2nd cycle on one, and restarted a clean 1st cycle on the other.   Went to bed.

 

This morning, I again have an email message saying the pre-read failed, this time only 15 minutes into the pre-read.  Here is where it gets bizarre.

 

I go to pull up the main page on the server and it is very slow to respond.  Eventually after a couple tries I get the array to show up, BUT:

- The Unassigned Devices section doesn't want to load.  When it finally does respond, it shows both preclears still running.  The "good" drive is only running at half speed (93 MB/sec).  The supposedly bad drive is running at 190 MB/sec (despite the failure email).

- Some of my array drives are showing up in Unassigned Devices instead of in the array.

- My cache drives are showing up in Unassigned Devices and blinking in and out of the cache drive section right in front of my eyes.

- My boot device is showing up in Unassigned Devices, not in the boot device section.

- The server is not accessible from Windows, but the array shows as online in the window.

 

A few notes about my configuration that may play in here.

- All of my array spinners are connected via LSI controller cards.  

- The cache drives are both connected directly to the motherboard.  The two eSATA ports being used for the preclears are connected to motherboard SATA ports.  Two array SSDs are also connected to motherboard ports, and those show up steadily in the array, not in Unassigned Devices.

 

I stopped the array.  With it offline, all the devices show in their proper place.

 

I started the array - everything comes back to normal and only the correct devices are in unassigned devices.  However, the shares are still not accessible.  Both pre-clears still running.

 

Then the LSI-connected drives start appearing in Unassigned Devices again.

 

I stop both preclears and it looks like it ALL goes back to normal.  My shares are now visible again.  The drives are in their proper places again.  Nothing is "moving" around the window.

 

I will throw into the mix that I have pre-cleared two drives at once before with this exact hardware configuration when I bumped up to 6 TB parity 4-6 months ago.  There have been updates to both unassigned devices and preclear plugins since that time.

 

Then I get a red error from "Fix Common Problems".  I am used to the SSD warning, but new items now show up, one of them regarding /tmp.  Please see the attached image.  I have now removed the ca.cleanup plugin (and have never actually used it before).  What do you make of the error message?  It seems related to your original comment about /tmp being full.

 

1949008898_fixproblems.thumb.JPG.1b7bac4a4c6986d749c2ab377c7be9d6.JPG


Edited by TODDLT
Link to comment

I just tried to pull a diagnostics file prior to reboot and the array started blinking between "started" and "undefined".  The diagnostics window is not showing anything.  I closed it and now ALL my drives are in Unassigned Devices.  I'm just going to reboot.  I'm assuming this is /tmp being full?

Link to comment
5 minutes ago, trurl said:

Some of these problems might suggest a problem with flash. Put your flash in your PC and let it checkdisk. Are you using a USB2 port for flash?

 

After reboot post diagnostics.

Shut down, pulled the flash, and ran chkdsk in Windows.  No errors found.

This is my original flash drive and it has some age on it.  I have a replacement that I bought but never bothered to swap over.  If you think this is an issue I can make the swap.

Link to comment
45 minutes ago, trurl said:

You have 2 Toshiba disks that aren't giving a SMART report:

 

988GK045FAXG

988YK07LFAXG

 

You can try to enable SMART reporting on each with this command, substitute the correct letter for X:

 


smartctl -s on /dev/sdX 
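
As a follow-up sketch (not part of the original advice), the steps after enabling SMART might look like the script below. The `smart_steps` function and the `DRY_RUN` variable are hypothetical names for this example; only the `smartctl` options themselves (`-s on`, `-t short`, `-l selftest`, `-a`) come from smartmontools. By default it only prints the commands so nothing touches a disk until the device letter has been checked.

```shell
#!/bin/bash
# Hypothetical wrapper: print (or run) the SMART steps for one disk.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
smart_steps() {
    local dev="$1"
    local run="echo"
    [ "${DRY_RUN:-1}" = "0" ] && run=""
    $run smartctl -s on "$dev"        # enable SMART on the device
    $run smartctl -t short "$dev"     # start a short self-test in the background
    $run smartctl -l selftest "$dev"  # self-test log; re-check a few minutes later
    $run smartctl -a "$dev"           # full SMART report
}

smart_steps "${1:-/dev/sdX}"
```

Substitute the correct letter for X as above. The short self-test runs in the background on the drive, so its result will only appear in the log once it has finished.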

OK, attached are the two SMART reports.  Thanks, I didn't know how to get this working.  Those are the new drives.

 

The one that failed is K045FAXG

 

TOSHIBA_HDWN160_988YK07LFAXG-diagnostics-20190221 (sdg).txt

TOSHIBA_HDWN160_988GK045FAXG-diagnostics-20190221 (sdh).txt

Edited by TODDLT
Link to comment
2 minutes ago, trurl said:

Those look OK. You might try again after checking connections. Make sure you check power connections all the way back to the PSU.

Do you think /tmp filling up was due to a bad connection to the drive?

 

Power is via a split connector going to both drives.  It seems unlikely that only one of the two would fail, though I will look for loose connections.

I'll verify the SATA cables where they plug into the MB and try again.

Link to comment
6 minutes ago, mathomas3 said:

I havent looked into your reports... but I did have something like this at one point in my setup... 

 

It ended up being my power supply was too small... had to upgrade to a 1200watt...

hmmm.. I have 14 HDD's connected and 4 SSD's.  2 of the connected HDD's are "spares" and spun down.  My hardware configuration is current (see signature).  Yes, that's a lot but I have a 750W Corsair PSU.  I've had this many connected before and had no issues.  It starts up fine meaning 14 HDD's spin up together.  Do you think this is a PSU issue with one failing and one not?

Link to comment

I just started the preclear on the bad drive only.  I have to leave in about 10 minutes but will see how this goes and check back an hour or so later.  If it's still working, I'll start the 2nd preclear and see if that changes anything.  (power drain)

 

When I first started to engage preclear from unassigned devices, none of the drive info was populated at the startup window.  I did it a 2nd time and all the drive info appeared.   

Link to comment
2 hours ago, TODDLT said:

hmmm.. I have 14 HDD's connected and 4 SSD's.  2 of the connected HDD's are "spares" and spun down.  My hardware configuration is current (see signature).  Yes, that's a lot but I have a 750W Corsair PSU.  I've had this many connected before and had no issues.  It starts up fine meaning 14 HDD's spin up together.  Do you think this is a PSU issue with one failing and one not?

I was having some strange errors like the ones you are having, and I would expect you to keep seeing them until you upgrade the PSU.

Link to comment

OK, 2 hours of preclear and no issues on the drive that previously reported bad twice.

Spun up the whole array, and no change.

Started the preclear on the other drive that passed one cycle and it's running now.

 

The only oddity is that the "good" drive is running at half speed.  It's 30% through the pre-read (2nd cycle) and I think I'll restart it and see what happens.  On the 1st cycle it ran over 200 MB/sec at the front end and averaged 180 MB/sec overall.  Now, 30% into the preclear, it's running at 95 MB/sec.

Link to comment

OK, restarted the preclear of the original "good" drive and the speed is back up to normal.  I even did a full array spin up with both pre-clears running and things keep moving along.   I'm going to leave it alone for the afternoon and we'll see what happens.

 

Thanks all for the input, I'll drop a note when I get home this evening/night.

Link to comment
