Drives dropping out of array into UD (split from Preclear Results)


TODDLT

Recommended Posts

  • Replies 141
  • Created
  • Last Reply

Top Posters In This Topic

Top Posters In This Topic

Posted Images

HERE is the syslog and where the trouble started.  I was able to view it in the syslog utility and copy paste it out.

 

I know what the trouble is now, but no idea how to fix it.

 

At 4:40 AM you will see warnings start that read:  Feb 28 04:40:41 TODD-Svr rc.diskinfo[8979]: PHP Warning: file_put_contents(): Only 4096 of 9058 bytes written, possibly out of free disk space in /etc/rc.d/rc.diskinfo on line 266

 

At 4:40 AM Plex started recording a DVR show.  

 

By 7:30:27, you start getting errors:

Feb 28 07:30:07 TODD-Svr rc.diskinfo[8979]: PHP Warning: file_put_contents(): Only 4096 of 17739 bytes written, possibly out of free disk space in /etc/rc.d/rc.diskinfo on line 266

Feb 28 07:30:23 TODD-Svr rc.diskinfo[8979]: PHP Warning: file_put_contents(): Only 0 of 17739 bytes written, possibly out of free disk space in /etc/rc.d/rc.diskinfo on line 266

Feb 28 07:30:27 TODD-Svr root: error: /plugins/unassigned.devices/UnassignedDevices.php: uninitialized csrf_token Feb 28 07:30:28 TODD-Svr root: error: /plugins/preclear.disk/Preclear.php: uninitialized csrf_token

Feb 28 07:30:28 TODD-Svr root: error: /plugins/unassigned.devices/UnassignedDevices.php: uninitialized csrf_token Feb 28 07:30:29 TODD-Svr root: error: /webGui/include/DeviceList.php: wrong csrf_token

Feb 28 07:30:29 TODD-Svr root: error: /webGui/include/DeviceList.php: wrong csrf_token

 

So it appears that the Plex DVR and preclear and a full smart test all are trying to use the same piece of real estate on a disk?  Can anyone confirm if the warnings that started at 4:40 are only due to the pre-clear operations and could also occur during an extended self test?  Is there some other unRAID operation interrupted?  I have never had an error during a DVR recording on Plex.  After Friday morning I have a long enough window with no DVR activity to try a pre-clear and see if it can finish.  However it seems there should be some setting that can be changed to fix this either in Plex or in unRAID.

 

Plex transcodes using the cache drive, but I'm not sure if that means the DVR uses the cache drive or not.  However my cache drive is 250 GB and I only use 25 so there is PLENTY of room to be shared.  However if this file location is on the flash drive, there is no way it would have enough space.  I don't think Plex writes anything to the flash drive, nothing was ever set up that way and Plex's system would seemingly not ever do it.

 

There was a 4 AM recording yesterday too and I woke up to the same bunch of chaos.  I'm feeling pretty confident this is the trouble, I just need to figure out how to fix it.

 

 

 

 

 

Syslog - from 3 AM.txt

Edited by TODDLT
Link to comment
14 minutes ago, TODDLT said:

So it appears that the Plex DVR and preclear and a full smart test all are trying to use the same piece of real estate on a disk?

preclear and smart test don't write anything to any of your assigned disks.

 

15 minutes ago, TODDLT said:

Plex transcodes using the cache drive, but I'm not sure if that means the DVR uses the cache drive or not.

Whether or not DVR does transcoding is in your plex settings.

Link to comment
10 minutes ago, trurl said:

preclear and smart test don't write anything to any of your assigned disks.

 

Whether or not DVR does transcoding is in your plex settings.

The "converting" is turned off for certain (I thought that setting had been removed and was experimental but apparently it came back). 

Perhaps it's where the DVR cache's a recording before it's finished?  I know shows don't show up in their permanent locations until the the recording is finished, so it's being "cached" somewhere.

Edited by TODDLT
Link to comment
38 minutes ago, TODDLT said:

I know shows don't show up in their permanent locations until the the recording is finished, so it's being "cached" somewhere.

Your incomplete plex recordings are stored in a hidden (.grab) subfolder of the same folder you have it configured to store the completed recording. It doesn't show up in your plex library until it is complete.

Link to comment
2 minutes ago, trurl said:

Your incomplete plex recordings are stored in a hidden (.grab) subfolder of the same folder you have it configured to store the completed recording. It doesn't show up in your plex library until it is complete.

Any thoughts on my observations from this morning then?  The syslog?  The out of disk space warnings that coincide with the start of the DVR?

 

My next recording is 9 PM tonight.  If I dont have time to run a test before then I can look at the syslog right after a recording starts and no pre-clear running to see if there are any warnings showing up.  I've never had any issues during a recording before so in my mind it appears to be the combination of the two events happening simultaneously.

 

Any thoughts on how to resolve?  Do you think I should put something in the PlexPass forum and point them to look at this?  That is not always the most responsive crowd (as you may know).

 

Link to comment

Since one of the symptoms you have been having is disks becoming "unassigned" it could be that is the cause of the "possibly out of free disk space".

 

I think I would try to isolate the problem before bothering plex with it. Don't run plex and preclear at the same time, for example. Nobody else is reporting your problems. I think plex is an especially unlikely cause for the particular symptom of disks becoming unassigned.

 

And of course, in case you may have missed the discussion about preclear on the numerous threads. Preclearing a disk isn't required.

 

Unraid only "needs" a clear (all zero) disk when you ADD it to a NEW data slot in an array that already has valid parity. This is so parity will remain valid, since all zeros has no impact on parity.

 

Older versions of Unraid would take the array offline when it "needed" (ONLY when adding to a new slot) to clear a disk.

 

To avoid this offline array, preclear was invented. This would clear a disk before ("pre") adding it. Additional features were put into preclear to allow further testing of the disk, and to "burn-in" a new disk to help get it past "infant mortality".

 

Recent versions of Unraid will clear a disk without taking the array offline, so the original purpose of preclear isn't needed. People still use it for testing and burn-in. There are other methods of testing a new disk, including disk diagnostics from the manufacturers available for free download.

Link to comment
1 hour ago, trurl said:

Since one of the symptoms you have been having is disks becoming "unassigned" it could be that is the cause of the "possibly out of free disk space".

 

I think I would try to isolate the problem before bothering plex with it. Don't run plex and preclear at the same time, for example. Nobody else is reporting your problems. I think plex is an especially unlikely cause for the particular symptom of disks becoming unassigned.

 

Recent versions of Unraid will clear a disk without taking the array offline, so the original purpose of preclear isn't needed. People still use it for testing and burn-in. There are other methods of testing a new disk, including disk diagnostics from the manufacturers available for free download.

I have continued to pre-clear for the burn in period / test before trusting in my array.  I would like to resolve the conflict (assuming it's with Plex or otherwise).  

 

I must have been going crosseyed with numbers this morning or not enough coffee yet... or who knows, but the DVR started at 4 AM.  The first warning showed up at 4:40 AM.

 

My thinking it might be an indirect cause/effect relationship.  IE The combination of Pre-clear (or the full smart test which is also scanning the full surface of the disk), AND the Plex DVR running at the same time maybe conflicting, maybe filling up some drive or temp drive location causing file write errors.  The temp drive / temp file being full, is what is causing the array to go haywire and not keep the drives mounted.  Obviously I don't know this.   As to no one else reporting this error, how many people do a pre-clear and a DVR recording at the same time.  I'm sure it's happened but possibly not a regular occurrence.

 

Removing what "I think" and just staying with what I know....  Consider this timeline.  (I did eventually find email notifications of pre-clear failures contrary to what I reported above)

 

11:34 AM - preclear started

9:20 PM - preread finished / zeroing started

4:00 AM - Plex DVR starts

Warnings of file not being completely written start showing up and continue through the syslog.

  • 04:40:41 TODD-Svr rc.diskinfo[8979]: PHP Warning: file_put_contents(): Only 4096 of 9058 bytes written, possibly out of free disk space in /etc/rc.d/rc.diskinfo on line 266

5:00 AM - Plex DVR finishes

7:09 AM - emails show up first drive shows zeroing completing and then a 2nd email with the same time stamp zeroing failing

7:10 AM - identical scenario occurs with the 2nd drive.  Two emails, first succeed, then fail.

7:30 AM the warnings turn to errors.

  • 07:30:23 TODD-Svr rc.diskinfo[8979]: PHP Warning: file_put_contents(): Only 0 of 17739 bytes written, possibly out of free disk space in /etc/rc.d/rc.diskinfo on line 266
  • 07:30:27 TODD-Svr root: error: /plugins/unassigned.devices/UnassignedDevices.php: uninitialized csrf_token
  • Feb 28 07:30:28 TODD-Svr root: error: /plugins/preclear.disk/Preclear.php: uninitialized csrf_token

I think coincidentally I first went to look at the status very near 7:30.  It is possible it was AT 7:30 and my attempt to open the main page may have been the trigger for the first error.  I can't say this for certain but the timing is close.

 

Please let me know if that timeline tells you anything or points you in a direction to look for trouble.  Also there are a large number of warnings / errors in the syslogs above.  They aren't all the same.  Can you scan and see if anything else is telling for this issue?

 

Tonight:

- I'll initiate a preclear, and then a few minutes later start a DVR recording at random.  I'll keep on eye on the syslog for a few hours.

This weekend:

- I can run a preclear with enough time to not conflict with a DVR recording and see if it finishes.

 

Edited by TODDLT
Link to comment
22 minutes ago, TODDLT said:

The first warning showed up at 4:40 AM

Unless you have changed it the default "Daily" cron runs at 4:40am. So anything you have set to run Daily will start at that time, and that is the default Mover schedule.

 

27 minutes ago, TODDLT said:

The combination of Pre-clear (or the full smart test which is also scanning the full surface of the disk), ...maybe filling up some drive or temp drive location causing file write errors. 

As I said before, 

2 hours ago, trurl said:

preclear and smart test don't write anything to any of your assigned disks.

In fact, they don't write anything to any disks, except possibly writing the results to flash. There isn't any "temp" location except /tmp, which is in RAM and doesn't have much capacity. If any of your applications, including Plex, have a temporary storage location, it would be set within the application. Unraid doesn't give "temp" disk space out to anything.

 

Link to comment
56 minutes ago, trurl said:

Unless you have changed it the default "Daily" cron runs at 4:40am. So anything you have set to run Daily will start at that time, and that is the default Mover schedule.

 

In fact, they don't write anything to any disks, except possibly writing the results to flash. There isn't any "temp" location except /tmp, which is in RAM and doesn't have much capacity. If any of your applications, including Plex, have a temporary storage location, it would be set within the application. Unraid doesn't give "temp" disk space out to anything.

 

I don't think I have any custom daily tasks.      

Scanning the syslog at the write errors starting at 4:40....

 

What is rc.diskinfo?  /etc/rc.d/rc.diskinfo     is there any way to tell where is it being written to?

What is nginx?  /var/lib/nginx/client_body/0000020717     That is the second failed write, same question?

Are these both unRAID functions and writing to the flash?   The flash is the "full location" correct?

 

I see the outgoing emails at 7:10 for zeroing finishing then failing.   Theory - it completed the zeroing task (1st email), then in trying to write progress to the preclear log (located on the flash), the disk was full and that triggered the fail (2nd email)?

 

The next "new" thing in the syslog are the errors right at 7:30.  This is when I first tried to check the system.  The 4th error says "Webgui" so this is likely when I made the page request, correct?   What causes the below token errors?  So my page request at the time of the full disk/file possibly is what sent everything into a tailspin? 

Feb 28 07:30:27 TODD-Svr root: error: /plugins/unassigned.devices/UnassignedDevices.php: uninitialized csrf_token
Feb 28 07:30:28 TODD-Svr root: error: /plugins/preclear.disk/Preclear.php: uninitialized csrf_token
Feb 28 07:30:28 TODD-Svr root: error: /plugins/unassigned.devices/UnassignedDevices.php: uninitialized csrf_token
Feb 28 07:30:29 TODD-Svr root: error: /webGui/include/DeviceList.php: wrong csrf_token

 

It still leaves the core question, what is causing the "disk full" problem (on the flash?) when rc.diskinfo and nginx is trying to write.

I've been using the DVR for a year or two, near positive this is not the first series of events when it was running during the 4:40 time slot. 

I can set a recording from 4:00 to 5:00 AM tomorrow and see if something is triggered without a preclear running.  I seriously doubt this is the case, but it will positively rule out the conflict

I'll start a pre-clear Friday at lunch and I have nothing else recording till Sunday at 10 AM so the preclear should be able to finish.  Will go through 2 - 4:40 AM cycles, one during preclear (early saturday morning) and likely one after it's all finished (early sunday morning).  

 

If both of those tests pass successfully then aren't we now at a logical conclusion that is something is causing the flash to clog up by both Pre-clear and the DVR running concurrently when the 4:40 activities try and write to the flash?

 

 

Thanks for your continued help.

 

Link to comment
10 minutes ago, trurl said:

Current diagnostics might give some idea to these questions.

This post from yesterday (link below) has a system diagnostics after reboot, after failure.  As I found this morning, once the disk is full, I can't pull a diagnostics.  If it's helpful I'll pull another this evening.

 

https://forums.unraid.net/topic/78318-drives-dropping-out-of-array-into-ud-split-from-preclear-results/?do=findComment&comment=725715

Edited by TODDLT
Link to comment

I was wondering if you aren't somehow filling up the OS filesystems, which are in RAM, and that is where things are getting broken. I don't see it in those diagnostics though, and maybe would only see it if you could capture them when things are broken.

 

The thing I am looking at in diagnostics you can get from the command line with this:

df -h

 

 

Link to comment
35 minutes ago, trurl said:

I was wondering if you aren't somehow filling up the OS filesystems, which are in RAM, and that is where things are getting broken. I don't see it in those diagnostics though, and maybe would only see it if you could capture them when things are broken.

 

The thing I am looking at in diagnostics you can get from the command line with this:


df -h

 

 

OK, thanks.  

 

I do have 16 GB RAM and its entirely underutilized.  I imagine the OS's use of RAM is capped?  

I'll see if there is some way to check the flash file system for size as well next time.  It might have to be done from the terminal as well.

 

Tonight I will have the DVR running with no preclear through the 4:40 hour and I'll see what if anything shows up in diagnostics.  Tomorrow it will be the other way around.  Preclear running no DVR.  If neither of these cause a failure.  I do both and force it to fail again so we can see what we can collect from that information.

 

Thanks again for all the attention to troubleshoot this one with me!

Link to comment

Actually, I don't see flash.  Am I missing something?  It's a 2 GB flash drive.

 

Also, I do see the /var locations referenced where files were not able to be written and they are very small folders.   I do not see /etc which was one of the two.

Link to comment

Last night's test was having the DVR running but no pre-clear when the 4:40 operations ran.

 

This morning there are no errors in the syslog, in fact nothing at all showing up at 4:40.  The only thing that showed in yesterday's log was the actual errors so assuming the 4:40 operations (rc.diskinfo and ngnix) normally don't show at all.

 

What I completely didn't think about is my parity check runs on the 1st and it is still going now.  Would a parity check stand down those normal 4:40 operations (rc.diskinfo and ngnix)?  If so then there actually wasn't a test at all.  I'm not sure what those two functions are other than they were the first "disk/full" errors in yesterday's syslog.

 

Edited by TODDLT
Link to comment
12 hours ago, TODDLT said:

Also, I do see the /var locations referenced where files were not able to be written and they are very small folders.   I do not see /etc which was one of the two.

/var/log is where syslog is. It will sometimes fill if you are logging a lot of errors.

/etc and lots of other linux OS folders are in rootfs, /

 

All of the things above /dev/sda1 in your screenshot is in RAM.

Link to comment
1 hour ago, johnnie.black said:

I believe this is used by preclear.

OK, that is interesting.   Did you happen to read post above, specifically the time line laid out? https://forums.unraid.net/topic/78318-drives-dropping-out-of-array-into-ud-split-from-preclear-results/?do=findComment&comment=726026 

 

The first "disk full" error shows up at 4:40:41, and it is a file write attempted by rc.diskinfo.  It was thought that "daily tasks" occur at 4:40.  Any thought as to why this would run right at that time as part of Pre-clear?  

Link to comment
1 minute ago, johnnie.black said:

Not really.

 

You could try for example uninstalling preclear plugin and running the extended SMART tests again to see if there's any difference.

this afternoon I'm starting a pre-clear that will not be encumbered by the DVR running in the middle of the night.  This will rule out (or in) the DVR service having anything to do with this.  

 

If everything goes crazy again, I'll have more sys log info and the drive status again shown above after the error but before a reboot (hopefully).

 

I'll try a smart test with pre-clear removed as the next step.

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.