Drives dropping out of array into UD (split from Preclear Results)


TODDLT


9 hours ago, johnnie.black said:

Flash is sda, it's mounted under /boot

4 hours ago, TODDLT said:

should have seen that, knew it was sda..... Thanks!

/boot is always flash, but it is not necessarily sda. The disk "letters" are assigned during boot and might even change between boots. Unraid associates the drive "numbers" (slots) with the disk serial number, and that is usually all that matters. I think maybe it looks at the volume name UNRAID on flash to decide to mount it as /boot.

 

On my server sda is an Unassigned 500GB SSD and flash (/boot) is sdb.

 

 

Link to comment

3 hours ago, trurl said:

/boot is always flash, but it is not necessarily sda. The disk "letters" are assigned during boot and might even change between boots. Unraid associates the drive "numbers" (slots) with the disk serial number, and that is usually all that matters. I think maybe it looks at the volume name UNRAID on flash to decide to mount it as /boot.

 

On my server sda is an Unassigned 500GB SSD and flash (/boot) is sdb.

 

Mine has always prioritized the USB drive as sda ahead of all the SATA ports. It has always shown up as sda, to the best of my memory anyway.

Link to comment

I actually have 2 Unraid flash drives plugged into my system. One is labelled ‘UNRAID’ for when I am running Unraid as the host, and the other ‘UNRAID-VM’ for running Unraid in a VM for testing purposes. They are different brands so that I can tell the BIOS which one to boot. Which one comes up as ‘/dev/sda’ and which as ‘/dev/sdb’ seems to be a bit random, but the Unraid host always selects the correct one to mount as /boot because it specifically looks for the one labelled ‘UNRAID’.
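Since the device letter can change between boots, the reliable way to see which device is the flash is to ask what is currently mounted at /boot. Here is a minimal sketch of that lookup, assuming Python 3 is available on the server (it is not part of a stock install). The function name and the sample text are made up for illustration; on a live system you would pass in the contents of /proc/mounts.

```python
from typing import Optional


def device_for_mount_point(mounts_text: str, mount_point: str = "/boot") -> Optional[str]:
    """Return the device mounted at mount_point, or None if nothing is mounted there."""
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device mount_point fstype options dump pass
        if len(fields) >= 2 and fields[1] == mount_point:
            device = fields[0]  # a later entry for the same mount point wins
    return device


# Hypothetical sample in /proc/mounts format, not taken from any server in this thread.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/sdb1 /boot vfat rw,noatime 0 0
"""

print(device_for_mount_point(SAMPLE_MOUNTS))
```

On a real server the call would look like `device_for_mount_point(open("/proc/mounts").read())`, and the answer may differ from one boot to the next.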

Link to comment

OK, chaos resumed this morning, so here is what we learned:

Scratch DVR off the list, nothing was recording. 

2 pre-clears running last night triggered the same set of trouble this morning.

It's definitely me making the GUI request that triggers the errors and screws up how everything is reported. The syslog shows the errors starting at that point, after 9 AM.

I ran df -h and nothing looks any different than it did yesterday. Both reports are attached.

I copy pasted the syslog and attached the full version and then cut out everything before 4 AM (for ease).

 

So where to now?

What is going on with my pre-clear that is clogging up the machine at 4:40?

I have to step out for a couple of hours but will avoid restarting Unraid until I return, in case someone wants me to pull other info (which is limited).

 

 

df -h 2-28.JPG

df -h 9 AM.JPG

syslog 4 AM .docx

syslog full.docx

syslog full.txt

syslog 4 AM .txt

Link to comment
21 hours ago, johnnie.black said:

Not really.

 

You could try for example uninstalling preclear plugin and running the extended SMART tests again to see if there's any difference.

I can try this tonight and see what happens if we think this is the next logical step.

 

thoughts?

 

Link to comment
1 hour ago, johnnie.black said:

That's what I would do, since I suspect the plugin is causing the errors, but it's just a guess and I could be completely wrong, easy to test though.

Will do.

 

Not sure why no one else would have reported this trouble if it's the pre-clear plug-in itself. I guess that's the next question.

 

I'll run it tonight.

 

It's also worth mentioning that, in this case, pre-clear hasn't failed out yet. It appears (though I can't be 100% sure) that it is still running. The drive light is on consistently, and that only reflects drives plugged into the MB (these two are).

 

Also for whatever reason I have now had some 90+ emails reporting the pre-read completed and/or the zeroing started.  Every few minutes they get sent.

Edited by TODDLT
Link to comment

How about throwing this into the mix.  I reported that I had been noticing a lot of consistent CPU activity even when idle.

 

I removed pre-clear, and the CPU is pretty quiet. I didn't happen to look just before removing it, so I can't say for certain the two are connected, but it's the first time I've seen the CPU this idle in a while.

Link to comment

OK, scratch pre-clear from the list. With preclear removed, the same trouble occurred. The syslog is much shorter this time without pre-clear repeating the diskinfo command constantly. The single 4:40 diskspace check error is the ONLY one this time. The next sign of trouble is when I initiated the GUI page request at 8:39. Nothing has been edited out of the excerpt below for the four hours in between. Above 4:40 in the log there is a bandwidth test at 3 AM, and above that you can see where I mounted the drives to start the extended SMART test. The full syslog is attached (copy/paste method) if you need it.

 

Mar 3 04:40:38 TODD-Svr vnstatd[4545]: Error: Free diskspace check failed, unable to write database, continuing with cached data.

Mar 3 04:55:25 TODD-Svr kernel: mdcmd (84): spindown 4

Mar 3 04:56:33 TODD-Svr kernel: mdcmd (85): spindown 1

Mar 3 04:56:54 TODD-Svr kernel: mdcmd (86): spindown 2

Mar 3 04:57:11 TODD-Svr kernel: mdcmd (87): spindown 5

Mar 3 06:00:01 TODD-Svr speedtest: Internet bandwidth test started

Mar 3 06:00:21 TODD-Svr speedtest: Host: Farmers Telephone Cooperative, Inc. (Sumter, SC) [128.50 km]

Mar 3 06:00:21 TODD-Svr speedtest: Ping (Lowest): 56.52 ms | Download (Max): 186.02 Mbit/s | Upload (Max): 44.09 Mbit/s

Mar 3 06:00:21 TODD-Svr speedtest: Internet bandwidth test completed

Mar 3 08:39:03 TODD-Svr root: error: /webGui/include/DeviceList.php: uninitialized csrf_token

Mar 3 08:39:03 TODD-Svr root: error: /webGui/include/DeviceList.php: uninitialized csrf_token

Mar 3 08:39:03 TODD-Svr root: error: /webGui/include/DeviceList.php: uninitialized csrf_token
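For anyone who wants to pull only the error entries out of a long syslog rather than scrolling through it, here is a rough sketch of a filter. It assumes Python 3 and the "month day time host source: message" line layout seen in the excerpt above. The regular expression and function name are illustrative only, and the sample lines are copied from that excerpt.

```python
import re

# Matches lines such as: "Mar 3 04:40:38 TODD-Svr vnstatd[4545]: Error: ..."
SYSLOG_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<source>[^:]+):\s+(?P<message>.*)$"
)


def error_lines(log_text):
    """Return (time, source, message) for each entry whose message mentions 'error'."""
    hits = []
    for line in log_text.splitlines():
        match = SYSLOG_LINE.match(line.strip())
        if match and "error" in match["message"].lower():
            hits.append((match["time"], match["source"], match["message"]))
    return hits


SAMPLE = """\
Mar 3 04:40:38 TODD-Svr vnstatd[4545]: Error: Free diskspace check failed, unable to write database, continuing with cached data.
Mar 3 04:55:25 TODD-Svr kernel: mdcmd (84): spindown 4
Mar 3 08:39:03 TODD-Svr root: error: /webGui/include/DeviceList.php: uninitialized csrf_token
"""

for when, source, message in error_lines(SAMPLE):
    print(when, source, message)
```

A case-insensitive match on the word "error" is a blunt instrument and will miss problems that are logged with other wording, so treat the output as a starting point for reading the log, not a replacement for it.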

 

So the only other thing I know to try is to take UD off, but I'm not sure how to run the SMART test without it. I could put pre-clear back on and run that through the night tonight without UD installed.

 

Thoughts?

syslog.txt

Link to comment
56 minutes ago, johnnie.black said:

No need to mount for SMART tests.

I think I had to mount it in UD to execute a SMART test. I couldn't access the drive page without it. I'm assuming that is a UD issue then and will remove UD to try the SMART test tonight.

Link to comment

OK, now we can scratch UD from the list too.  I ran the smart test from the terminal, but also didn't realize you could still have done it from the main screen with UD uninstalled. (I've had UD installed for a while now).  
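For reference, starting an extended SMART test from the terminal does not need the drive to be mounted. The sketch below wraps the standard smartctl calls (`-t long` to start the test, `-a` to read back the results). It assumes Python 3 and smartmontools are present. The helper names are made up, and the device path in the usage note is a placeholder, not a drive from this thread.

```python
import subprocess


def extended_test_command(device):
    """Build the smartctl invocation that starts an extended (long) self-test."""
    return ["smartctl", "-t", "long", device]


def results_command(device):
    """Build the smartctl invocation that prints all SMART data, self-test log included."""
    return ["smartctl", "-a", device]


def start_extended_test(device):
    """Start the test and return the completed process; needs root and smartctl on the PATH."""
    return subprocess.run(extended_test_command(device), capture_output=True, text=True)
```

Usage would be along the lines of `start_extended_test("/dev/sdX")`, with the real device letter checked first, since as discussed earlier the letters are not guaranteed to stay the same between boots.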

 

Anyway, same results this morning with UD uninstalled, and Pre-clear uninstalled.  Same single disk write error at 4:40, same failure after attempting a request for the GUI.  Unlike in the past where the page was very slow to respond and populate and then I would see drives bounce back and forth between assigned slots and UD, this time the page was immediate and totally un-populated.  No drives shown at all (screenshot attached).  No access to shares, however Plex still works and plays.  

 

Mar 4 03:00:24 TODD-Svr speedtest: Internet bandwidth test completed

Mar 4 04:41:31 TODD-Svr vnstatd[4551]: Error: Free diskspace check failed, unable to write database, continuing with cached data.

Mar 4 04:56:35 TODD-Svr kernel: mdcmd (60): spindown 1

Mar 4 04:56:53 TODD-Svr kernel: mdcmd (61): spindown 2

Mar 4 04:57:09 TODD-Svr kernel: mdcmd (62): spindown 5

Mar 4 06:00:01 TODD-Svr speedtest: Internet bandwidth test started

Mar 4 06:00:22 TODD-Svr speedtest: Host: Sprint (Fairfax, SC) [116.27 km]

Mar 4 06:00:22 TODD-Svr speedtest: Ping (Lowest): 53.417 ms | Download (Max): 278.95 Mbit/s | Upload (Max): 41.12 Mbit/s

Mar 4 06:00:22 TODD-Svr speedtest: Internet bandwidth test completed

Mar 4 07:37:33 TODD-Svr root: error: /webGui/include/DeviceList.php: wrong csrf_token

Mar 4 07:37:33 TODD-Svr root: error: /webGui/include/DeviceList.php: wrong csrf_token

Mar 4 07:37:33 TODD-Svr root: error: /webGui/include/DeviceList.php: wrong csrf_token

Mar 4 07:37:33 TODD-Svr root: error: /webGui/include/Notify.php: wrong csrf_token

 

I'm back to completely stumped and looking for ideas.  

 

Main.JPG

Edited by TODDLT
Link to comment

OK will do, Tuesday night.  Slow process only being able to truly test once a day at 4:40 in the morning.

 

I thought I was going to be able to pull diagnostics. It actually started the process but timed out and wouldn't even start on a 2nd attempt. The server did respond to a "reboot" command in the GUI this time. I didn't have to use the power button to reboot.

 

 

Link to comment

After reading this, I think your problems are hardware related. The rc.diskinfo invocation means a disk got added, changed or removed. After the diskinfo service updates the unassigned disk information, it writes a file to the rootfs, which is held in RAM. This means you have faulty or full memory.

Link to comment
1 hour ago, gfjardim said:

After reading this, I think your problems are hardware related. The rc.diskinfo invocation means a disk got added, changed or removed. After the diskinfo service updates the unassigned disk information, it writes a file to the rootfs, which is held in RAM. This means you have faulty or full memory.

Did you see the screenshot attached to this post?

https://forums.unraid.net/topic/78318-drives-dropping-out-of-array-into-ud-split-from-preclear-results/?do=findComment&comment=726573

 

I missed it before, but it does show rootfs going from 10% full to 100% full. The question is: what about running a pre-clear or an extended SMART scan is causing rootfs to fill up?

 

One thing I can try is to run a SMART scan and see how fast it actually fills up. All I can tell right now is that I initiate one of those processes before going to sleep, and sometime before 4:40 in the morning (which is when unRAID tries to write to it) rootfs gets full and the errors commence.
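To see how fast rootfs fills, one option is to log its usage at intervals overnight instead of comparing two df -h screenshots. Here is a minimal sketch, assuming Python 3 is available on the server and that rootfs is the filesystem mounted at `/`. The function names, the 5-minute interval and the sample count are arbitrary choices for illustration.

```python
import shutil
import time


def percent_used(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def watch(path="/", interval_seconds=300, samples=3, sleep=time.sleep):
    """Take `samples` readings `interval_seconds` apart, print each one, and return them all."""
    readings = []
    for i in range(samples):
        stamp = time.strftime("%H:%M:%S")
        percent = percent_used(path)
        readings.append((stamp, percent))
        print(f"{stamp} {path} {percent:.1f}% used")
        if i < samples - 1:
            sleep(interval_seconds)
    return readings
```

The percentage is computed as used/total, so it can differ by a point or two from the Use% column in df. Redirecting the output to a file on the flash or on an array disk, rather than somewhere on rootfs, keeps the log itself from adding to the problem.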

Link to comment
8 minutes ago, TODDLT said:

Did you see the screenshot attached to this post?

https://forums.unraid.net/topic/78318-drives-dropping-out-of-array-into-ud-split-from-preclear-results/?do=findComment&comment=726573

 

I missed it before, but it does show rootfs going from 10% full to 100% full. The question is: what about running a pre-clear or an extended SMART scan is causing rootfs to fill up?

 

One thing I can try is to run a SMART scan and see how fast it actually fills up. All I can tell right now is that I initiate one of those processes before going to sleep, and sometime before 4:40 in the morning (which is when unRAID tries to write to it) rootfs gets full and the errors commence.

Are you using any process to backup data around this time?

Link to comment
18 minutes ago, gfjardim said:

Are you using any process to backup data around this time?

ahhhh...  RClone runs once a day to back up a handful of folders to an external hard drive and is executed via User Scripts. Yes, it probably runs sometime at night, and that drive was disconnected during the pre-clears. I haven't seen any errors due to RClone, so it never hit my radar as a cause.

 

Do you think this would fill up the temp file location? Is it just because the drive is disconnected, or should I suspend the backup during any preclear/extended SMART test?

 

If you think so, I will suspend the backup script and try again tonight.

 

 

Link to comment
53 minutes ago, TODDLT said:

ahhhh...  RClone runs once a day to back up a handful of folders to an external hard drive and is executed via User Scripts. Yes, it probably runs sometime at night, and that drive was disconnected during the pre-clears. I haven't seen any errors due to RClone, so it never hit my radar as a cause.

 

Do you think this would fill up the temp file location? Is it just because the drive is disconnected, or should I suspend the backup during any preclear/extended SMART test?

 

If you think so, I will suspend the backup script and try again tonight.

 

 

If your backup drive connection is dropping during the backup, the rclone process can write files directly to your RAM. This would explain why the diskinfo service got invoked and why your RAM is filling up.

Link to comment
31 minutes ago, gfjardim said:

If your backup drive connection is dropping during the backup, the rclone process can write files directly to your RAM. This would explain why the diskinfo service got invoked and why your RAM is filling up.

Well, that sounds like a promising candidate for the source of the problem then. I have disabled RClone for now and will run a test of this soon (tonight or tomorrow night).

 

Is there any way to set up RClone so it would not attempt to write a backup to memory in the event of a failed drive?

Edited by TODDLT
Link to comment
Well, that sounds like a promising candidate for the source of the problem then. I have disabled RClone for now and will run a test of this soon (tonight or tomorrow night).
 
Is there any way to set up RClone so it would not attempt to write a backup to memory in the event of a failed drive?

If it's the source of the problem, we can figure a way to control that.
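One possible way to control it, offered only as a sketch and not as the fix settled on in this thread: have the backup script check that the destination is a real mounted filesystem before calling rclone, so a missing drive means the backup is skipped instead of landing in RAM. This assumes Python 3 is available and that the backup uses `rclone sync` to a local path. The function name, the `runner` parameter and the paths in the demo are all hypothetical.

```python
import os
import subprocess
import tempfile


def backup_if_mounted(source, mount_point, subdir="backup", runner=subprocess.run):
    """Run `rclone sync` only when `mount_point` is a real mounted filesystem."""
    if not os.path.ismount(mount_point):
        # An unmounted mount point is just a directory on the parent filesystem.
        print(f"{mount_point} is not mounted; skipping backup")
        return False
    destination = os.path.join(mount_point, subdir)
    runner(["rclone", "sync", source, destination], check=True)
    return True


# Demo with a stand-in runner so nothing is copied: a plain temp directory is not a mount point.
with tempfile.TemporaryDirectory() as plain_dir:
    ran = backup_if_mounted("/mnt/user/example", plain_dir, runner=lambda *a, **k: None)
    print("backup ran:", ran)
```

The same guard can be written as a one-line `mountpoint -q` test at the top of a shell script run from User Scripts. Either way, the check only runs before the backup starts, so it would not catch a drive that drops out partway through.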
Link to comment
41 minutes ago, gfjardim said:

If it's the source of the problem, we can figure a way to control that.

I think my plan for testing is this:

 

1. Tonight I'll leave RClone engaged, with no backup drive present, but without either pre-clear or the SMART test running. If RClone is the problem, I should still have an issue with the GUI page request tomorrow and should see rootfs full. The reason for this step is my partial belief that this condition has already occurred by happenstance during all the recent testing scenarios. It wasn't an official test and I don't recall getting errors; I just can't say so with 100% certainty.

 

2. Assuming I DO get an error tomorrow morning when accessing the GUI, and see the memory full, then tomorrow night I will run the SMART test with RClone turned off. If that produces no error, then I think we have a definitive answer.

 

Any thoughts?

Link to comment
