[SOLVED] Weird Preclear behaviour



Hey all,

 

Yesterday I got my first 6TB drive. Previously, I never precleared. Bad thing, I know.

So I tried to start the preclear, and at first everything seemed fine.

The prompt gave me the right description, size, etc.

 

Afterwards, I went to sleep because it was late. Maybe I should have paid more attention, because the next morning I noticed this:

 

Now I was really worried, because apparently the drive that was being precleared was spun down while two array drives were spun up. I also noticed that their read counts were increasing.

Could anyone advise me on what my next steps should be? I'm afraid it will go on to write to those two disks after the read step.

 

Oh, another "detail": the new drive is connected via eSATA in an external enclosure.

 

EDIT: To summarize the whole topic: it wasn't preclear, but most probably cache dirs. Cache dirs seems to be misbehaving, and I really need to find out what is happening (and why), but I can't find the time right now, so I'll mark this as solved.

Link to comment

Not quite sure what you think the problem is?

 

The drive that is being pre-cleared is not part of the unRAID array (not yet, at least), so it will not affect normal unRAID operation. As long as the telnet/console session that is controlling the pre_clear is still updating, I would not worry. The pre_clear of a single 6TB drive can take a long time: over 2 days on my system for a single cycle. Looking at your screenshot of the current pre_clear progress, I would expect something of the same order of magnitude on your system.

Link to comment

The problem is that there apparently was NO activity on the drive that was supposed to be cleared, but instead on two array drives.

I think I should have made clear that all those reads (on the array) were NOT coming from me.

 

So either this is a display bug in dynamix, or preclear actually was using the wrong disk.

Link to comment

I would suspect a dynamix bug.

 

The pre_clear screen shows both the device id it is using and the details of the disk it is using so there is no real chance of it working on the wrong disk.  In fact the pre_clear script has a check built in that stops it being used on a disk currently allocated to the array.

Link to comment

It is odd the disk is getting the device ID hda and not sdX.

 

I made changes to preclear to create and update a text file in the /tmp folder that myMain uses to show preclear status. If preclear is stopped in the middle that file will stop getting updated and will be displayed over and over until a reboot, the preclear is started up again,  or the user clicks on the message in myMain which deletes it. Several GUIs are using the tmp file to show progress like myMain does, but don't implement the click to delete function.
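For a GUI that reads that file but has no click-to-delete, one way to avoid showing a dead preclear forever might be to look at the file's modification time. This is only a rough sketch: the 300-second threshold is an arbitrary assumption, and the path you pass in is whatever status file the GUI actually reads (not named here).

```python
import os
import time


def is_stale(path, max_age_seconds=300, now=None):
    """Return True if the file at `path` was last modified more than
    `max_age_seconds` ago.

    `max_age_seconds` is an assumed threshold, not something preclear defines.
    `now` can be passed in to make the check reproducible.
    """
    if now is None:
        now = time.time()
    age = now - os.path.getmtime(path)
    return age > max_age_seconds
```

A stale file would then suggest the preclear is no longer updating it, which is the situation described above.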

Link to comment

An hda device ID normally means that either the drive is an IDE drive or AHCI mode has not been enabled for SATA drives in the BIOS. One wants AHCI mode enabled for SATA drives both to maximise performance and to (potentially) enable hot-swap for drives.

Link to comment

I will check the BIOS settings regarding ACPI.

 

Regarding the temp file: The problem occurred during the preclear, so it isn't related to the file not being deleted. Also, the server was rebooted right before the preclear. Actually, for this to be a UI-related bug, the UI would have to somehow parse the "/dev/hda" label preclear is producing, map it to two different drives (sdd and sde), AND continuously update both drives' read counts (not at the same rate, but with a nearly even distribution).

 

So, is there any definite way for me to check which HDD is being used (as I don't trust this to just be a UI bug)?
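One thing that might help with this: on most Linux systems the symlinks under /dev/disk/by-id (named after model and serial number) point at the actual device nodes, so you can see which physical drive a name like sdd or sde really is. A minimal sketch, assuming that directory exists on the box; the function name is just for illustration.

```python
import os


def map_ids_to_devices(by_id_dir="/dev/disk/by-id"):
    """Map each symlink name in `by_id_dir` to the device node it resolves to."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath follows the relative link (e.g. ../../sdd) to /dev/sdd
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for disk_id, device in map_ids_to_devices().items():
        print(device, disk_id)
```

This only tells you which drive sits behind each device name. It does not by itself show which process is reading from it.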

Link to comment

Sorry, AHCI of course. A short trip to the BIOS and it's sdf instead of hda now.

 

However, I still think this is NOT a UI thing:

 

It didn't show up directly after starting the preclear...

 

Any hints on how to prevent data loss from preclear are much appreciated :)

 

Link to comment

I did some checking on my system and it appears that with the v6b6 GUI if you start a pre_clear then progress is shown in the GUI.  When it completes the result summary is also shown in the GUI.  As long as the drive concerned is not part of the array (which pre_clear should have checked) then there is nothing to worry about.

Link to comment

I'm sorry, I think I failed to explain things properly.

 

- I started the preclear on sdf. Preclear correctly displayed all the information and started.

 

- The next morning I checked the status and saw that not only was preclear running, but the read count on TWO ARRAY DRIVES had also increased. Those two drives weren't being read by me (or any other client). I'm sure of this because no machine besides the one I was using at that moment was running. Also, the active streams screen didn't show anything. Since I could still see the read count climbing even further, I am also sure that this wasn't just some access during the night that I forgot about.

 

- I was able to see this happening twice. The first time I saw it in dynamix; the second time I also had unmenu installed and it displayed the same information (that's what you see in the screenshots). Hence, I can say this isn't a display bug.

 

- Of course the read count could be caused by something other than preclear, but I don't want to risk waiting until the pre-read is done and then seeing the write count increase on my array drives. Hence, I was looking for a way to check what exactly is accessing the drives.

 

- Meanwhile, I did a little research myself and came up with lsof, but I don't quite understand how to read the result. I guess "lsof | grep /dev/sd" isn't enough, since it doesn't show what is accessed via /mnt/user. So, is it enough to grep for those two?

Link to comment

I think you missed my point about the tmp file. It appears that preclear died and that file stopped getting updated.  Is it possible you ran preclear in a telnet session without using screen that first time? If so, the telnet session would be closed after a period of inactivity and kill the preclear in the process.

 

Also, your overnight I/O activity could have been the mover script.

Link to comment

@ itimpi: Great, I'll try to check if it occurs again.

 

@ bjp999: I'm pretty sure I understood. It can't be the mover, because I still saw the read count rising the next morning. I know for sure that I didn't add anything to the cache the night before, I rebooted before starting the preclear (so the count should have been nearly zero), and of course I manually checked whether it was running (both via the status in dynamix and by checking the contents of the cache drive). The only way this could be related to the mover is if the mover was still running while I was watching, but this was NOT the case.

I also don't see how this could be caused by the temp file. Maybe I'm really missing something here. I checked with both unmenu and dynamix, and both display the same thing, so both would need to have the same bug. But even if this was the case, preclear was still running and updating the file. I stopped it manually AFTER checking everything else.

As I said, maybe I missed your point regarding the temp file, but the problem is not that the status isn't updating; it is that TWO drives that shouldn't even be spinning are somehow reading data.

 

By the way, thanks for the ongoing support. I'll try to solve this today and give feedback. I'm away from tomorrow, so I hope I can leave preclear running and have the drive ready by next weekend... :)

Link to comment

Sorry for double posting. I'm back with more information.

 

- It's definitely not preclear. I started preclear in the morning, watched my drives during the day and didn't see anything happen. Preclear went on and started to write without accessing the array.

 

- There definitely is something else going on, and from the read count it seems to be a lot (compared with the read count from me watching a movie). I tried to find out what is accessing the disks (via lsof) today and have two candidates:

a) "find" or b) smartctl.

Since the read count is increasing by more than 1,000,000, I'd like to know what is causing it (and maybe prevent it).

I'm not sure why find is running and why it seems to be touching every single file (at least as far as I could observe within the timespan I watched via lsof).

To check if folder caching is involved I deactivated it, but to no avail...

 

Could anybody elaborate on what could cause this behaviour, and why? I'll try to access the box remotely during the week to do further checks.
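For the remote checks, something like the following could be used to watch the counters without the GUI. It is a rough sketch that reads /proc/diskstats twice and prints which devices had reads or writes complete in between. It assumes the standard Linux field layout (reads completed in the 4th column, writes completed in the 8th); the 60-second interval is arbitrary, and the numbers will not necessarily match what dynamix or unmenu display.

```python
import time


def parse_diskstats(text):
    """Return {device_name: (reads_completed, writes_completed)}."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or unexpectedly short lines
        counts[fields[2]] = (int(fields[3]), int(fields[7]))
    return counts


def deltas(before, after):
    """Per-device change in (reads, writes) between two snapshots."""
    return {
        name: (after[name][0] - before[name][0],
               after[name][1] - before[name][1])
        for name in after
        if name in before
    }


if __name__ == "__main__":
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(60)  # assumed sampling interval
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    for name, (reads, writes) in sorted(deltas(first, second).items()):
        if reads or writes:
            print(name, "reads:", reads, "writes:", writes)
```

If sdd and sde show up with reads but no writes while the preclear target shows writes, that would at least confirm nothing is writing to the array drives.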

 

EDIT:

Forgot to add that find is accessing via /mnt/user/disk*. It shows up in lsof like this:

find      20159      root  cwd      DIR        9,1          984      7555 /mnt/disk1/<my files>
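To make that kind of lsof output easier to read, a small filter along these lines might help. It is a sketch under assumptions: it expects the default nine-column lsof layout shown in the line above (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME) and will skip or misread lines where a column is empty. The /mnt/disk and /mnt/user prefixes are the paths mentioned in this thread.

```python
def lsof_hits(lsof_output, prefixes=("/mnt/disk", "/mnt/user")):
    """Return (command, pid, path) for lsof lines whose path starts with a prefix."""
    prefixes = tuple(prefixes)
    hits = []
    for line in lsof_output.splitlines():
        # Split into at most 9 fields so spaces inside the path are preserved.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[1].isdigit():
            continue  # header line or a line that does not fit the layout
        command, pid, name = fields[0], int(fields[1]), fields[8]
        if name.startswith(prefixes):
            hits.append((command, pid, name))
    return hits
```

You would feed it the captured text of a plain `lsof` run, then look at which commands and PIDs keep turning up under the array mount points.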

Link to comment
