Disk rw errors after unclean shutdown


Max
Go to solution Solved by JorgeB,

Hey guys, for some unknown reason my server detected an unclean shutdown this morning, even though it is connected to an APC UPS, so this should not have happened and I don't know why it did. As usual after an unclean shutdown, the server started an automatic parity check. It was all running fine until just now, when disk 3 started throwing read and write errors. Disk 3 is now disabled and its contents are being emulated.
Please suggest how I should proceed.

unraid.local-diagnostics-20231129-2100.zip

Link to comment
40 minutes ago, JorgeB said:

It's not logged as a disk problem, and SMART looks OK, check/replace cables and if the emulated disk is mounting and contents look correct you can rebuild.

Okay, so after reseating the cables I have started rebuilding the data, but I'm not sure whether everything is going fine here. Firstly, the moment I clicked Sync, the webGUI became unresponsive for almost a minute, and when it came back the rebuild had paused on its own. Secondly, it's already at 73-ish percent.
I'm not sure whether this is how it works or whether I'm doing something wrong here.

unraid.local-diagnostics-20231129-2208.zip

Link to comment
29 minutes ago, JorgeB said:

I recommend always rebuilding in normal mode, or at least check that the emulated disk is mounting before rebuilding.

Okay, once again I just removed the drive from the pool and started the array normally. It now says disk3 is missing, but just like earlier, disk3's contents are being emulated and I can access all the content that is supposed to be there.
Can I try rebuilding again? I'm not too sure about the earlier rebuild process. I mean, how could it do 71-72 percent of the rebuild in a couple of minutes?

Link to comment
1 minute ago, Max said:

Okay, once again I just removed the drive from the pool and started the array normally. It now says disk3 is missing, but just like earlier, disk3's contents are being emulated and I can access all the content that is supposed to be there.

This seems odd. I didn't see any errors on the rebuild, can you post new diags?

 

2 minutes ago, Max said:

I mean, how could it do 71-72 percent of the rebuild in a couple of minutes?

If you're sure this happened, the rebuild cannot be good, but I don't see how it happened without any errors being logged. Try rebuilding again in normal mode now.

 

Link to comment

Wait, I only noticed this now:

 

Nov 29 21:57:43 Unraid kernel: md: recovery thread: recon D3 ...
Nov 29 21:58:05 Unraid Parity Check Tuning: Send notification: Array operation restarted: Automatic Read-Check (73.1% completed) (type=normal link=/Settings/Scheduler)

 

Uninstall the parity check plugin for now. @itimpi, any idea how the plugin could have resumed a rebuild that should have been starting, while logging that it's resuming a parity check?
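
As a rough sanity check on why 73.1% right after the rebuild started cannot be real progress, here is a minimal sketch that computes the average throughput those two log lines would imply. The timestamps and the percentage come from the log above; the year comes from the diagnostics file name; the 4 TB disk size is purely an assumption for illustration, since the actual size of disk3 is not stated in this thread.

```python
from datetime import datetime

def implied_rate_mb_s(start, end, percent_done, disk_bytes, year=2023):
    """Average MB/s needed to reach percent_done between two syslog timestamps."""
    fmt = "%Y %b %d %H:%M:%S"  # syslog lines carry no year, so one is prepended
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    elapsed = (t1 - t0).total_seconds()
    return disk_bytes * percent_done / 100 / elapsed / 1e6

# ASSUMPTION: 4 TB is a made-up disk size, not taken from the diagnostics.
ASSUMED_DISK_BYTES = 4 * 10**12

rate = implied_rate_mb_s("Nov 29 21:57:43", "Nov 29 21:58:05", 73.1, ASSUMED_DISK_BYTES)
print(f"Implied average rate: {rate:,.0f} MB/s")
```

Under that assumed size the implied rate is well over 100,000 MB/s, far beyond what any single hard drive can sustain, so the percentage must have been carried over from another operation rather than earned by this rebuild.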

 

 

Link to comment
7 minutes ago, JorgeB said:

Wait, I only noticed this now:

 

Nov 29 21:57:43 Unraid kernel: md: recovery thread: recon D3 ...
Nov 29 21:58:05 Unraid Parity Check Tuning: Send notification: Array operation restarted: Automatic Read-Check (73.1% completed) (type=normal link=/Settings/Scheduler)

 

Uninstall the parity check plugin for now. @itimpi, any idea how the plugin could have resumed a rebuild that should have been starting, while logging that it's resuming a parity check?

 

 


The plugin simply tells Unraid to resume the operation that was in progress when the array was shut down. It is up to Unraid to decide what type of operation that is. The plugin will also only attempt the restart if it thinks the shutdown was a tidy shutdown. Having said that, I will see if there is anything I can spot.

Link to comment
2 minutes ago, itimpi said:

The plugin simply tells Unraid to resume the operation that was in progress when the array was shut down.

That's odd, because before that there were only read checks, so the rebuild should have started from the beginning:


 

Nov 29 21:42:23 Unraid kernel: md: recovery thread: check ...
Nov 29 21:42:34 Unraid kernel: mdcmd (37): nocheck PAUSE
Nov 29 21:42:34 Unraid kernel: md: recovery thread: exit status: -4
Nov 29 21:42:39 Unraid Parity Check Tuning: Send notification: Paused: No array operation in progress (36.5% completed) (type=warning link=/Settings/Scheduler)

Nov 29 21:50:47 Unraid kernel: md: recovery thread: check ...
Nov 29 21:51:00 Unraid kernel: mdcmd (37): nocheck PAUSE
Nov 29 21:51:00 Unraid kernel: md: recovery thread: exit status: -4
Nov 29 21:51:05 Unraid Parity Check Tuning: Send notification: Paused: No array operation in progress (36.6% completed) (type=warning link=/Settings/Scheduler)

Nov 29 21:55:14 Unraid Parity Check Tuning: Send notification: Array stopping: Restart will be attempted on next array start: Automatic Read-Check (36.6% completed) (type=normal link=/Settings/Scheduler)

 

Next one is the rebuild:


 

Nov 29 21:56:53 Unraid Parity Check Tuning: disk3: Changed
Nov 29 21:56:53 Unraid Parity Check Tuning: restart to be attempted

Nov 29 21:57:43 Unraid kernel: md: recovery thread: recon D3 ...
Nov 29 21:58:05 Unraid Parity Check Tuning: Send notification: Array operation restarted: Automatic Read-Check (73.1% completed) (type=normal link=/Settings/Scheduler)
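
To spot this kind of mismatch in a longer syslog, a minimal sketch along these lines could help. It remembers the last operation the kernel's recovery thread announced and flags any plugin "Array operation restarted" notification that names a check while the kernel says `recon`. The regular expressions are written only against the line shapes quoted in this thread, and reading `recon` as "rebuild" is an assumption based on the discussion above; the function name `find_mismatches` is made up for illustration.

```python
import re

# Patterns modelled only on the log lines quoted in this thread.
KERNEL_RE = re.compile(r"kernel: md: recovery thread: (\w+)")
PLUGIN_RE = re.compile(
    r"Parity Check Tuning: Send notification: Array operation restarted: "
    r"(.+?) \((\d+(?:\.\d+)?)% completed\)"
)

def find_mismatches(lines):
    """Return (kernel_op, plugin_label, percent) where the two disagree."""
    last_op = None
    mismatches = []
    for line in lines:
        kernel = KERNEL_RE.search(line)
        if kernel:
            last_op = kernel.group(1)  # e.g. "check" or "recon"
            continue
        plugin = PLUGIN_RE.search(line)
        if plugin and last_op == "recon" and "Check" in plugin.group(1):
            mismatches.append((last_op, plugin.group(1), float(plugin.group(2))))
    return mismatches

SAMPLE = [
    "Nov 29 21:42:23 Unraid kernel: md: recovery thread: check ...",
    "Nov 29 21:57:43 Unraid kernel: md: recovery thread: recon D3 ...",
    "Nov 29 21:58:05 Unraid Parity Check Tuning: Send notification: Array operation restarted: "
    "Automatic Read-Check (73.1% completed) (type=normal link=/Settings/Scheduler)",
]

for op, label, pct in find_mismatches(SAMPLE):
    print(f"kernel started '{op}' but plugin reported '{label}' at {pct}%")
```

This only compares labels. It says nothing about why the mismatch occurred, and other log formats would need their own patterns.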

 

 

 

 

Link to comment
8 minutes ago, JorgeB said:

As you mentioned, the emulated disk is mounting, which is good news, and the previous rebuild certainly wasn't complete. Try again. I still recommend uninstalling the parity check tuning plugin for now, just in case it's related to the previous issue.

Okay, I have uninstalled the Parity Check Tuning plugin and started rebuilding again. So far it looks good, and it started from zero percent this time. It's gonna take much longer this time 😅

  • Like 1
Link to comment
20 hours ago, Max said:

Okay, I have uninstalled the Parity Check Tuning plugin and started rebuilding again. So far it looks good, and it started from zero percent this time. It's gonna take much longer this time 😅

Phew! The rebuild has finally finished, fortunately without any errors or weird notifications this time, and disk3 is back up and running in normal operation. I don't know what went wrong the first time.

BTW, any guesses on what could have caused the unclean shutdown in the first place?

Link to comment
18 minutes ago, itimpi said:

Difficult to say.

 

Have you read this section of the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page, to see if it might give a clue?

Just gave it a read, but the thing is, it mainly talks about ungraceful shutdowns after a power failure, and there was no power failure.

Link to comment
16 minutes ago, Max said:

Just gave it a read, but the thing is, it mainly talks about ungraceful shutdowns after a power failure, and there was no power failure.

 

It also spends considerable time pointing out checks that you should make to ensure you do not get an unclean shutdown from a regular shutdown. For instance, you can always get unclean shutdowns if the various timeouts are too short for your system.

Link to comment
21 hours ago, itimpi said:

 

It also spends considerable time pointing out checks that you should make to ensure you do not get an unclean shutdown from a regular shutdown. For instance, you can always get unclean shutdowns if the various timeouts are too short for your system.

Yeah, actually that's what I meant: it mainly talks about ungraceful shutdowns caused by timeouts that are too short. But honestly, as this wasn't my first unclean shutdown, I have manually tested it many times, for example by deliberately pulling out the cord to see whether the server stays on while on the UPS and whether it shuts down properly according to the set rules.

I have tried different outlets on the back of the UPS and changed the power cords as well, but I still haven't figured out what the issue is.

@JorgeB ahh, looks like we are back to square one. Disk3 is again throwing read errors, and it's disabled and currently being emulated.

 

unraid.local-diagnostics-20231202-1923.zip

Edited by Max
forgot to attach diagnostics
Link to comment
On 12/2/2023 at 7:51 PM, JorgeB said:

Looks more like a power/connection issue, check/replace cables and try again.

So far it looks like it was a bad SATA power splitter. I tried again after reseating the cables (I reseated them again because I realized that last time I had mistakenly reseated the cables of a completely different drive 😅), and now, while rebuilding the data, it started throwing errors on disk 2. Disks 2 and 3 are the ones connected through the SATA power splitter, so I finally decided to pull the plug on SATA power splitters and bought myself a Gigabyte P750GM, which comes with 8 SATA connectors.

I don't know if it's just my bad luck or what, but my history with SATA cables and power splitters has been quite troublesome. I found myself reseating or replacing these cables every 2-3 months. About 6 or 7 months ago I invested in an LSI 9207-8i, and since then I haven't encountered a single issue related to SATA link speed or any other connectivity problems. Here's hoping that this new power supply will have a similarly positive impact on my SATA power issues.

Link to comment

@JorgeB okay, something weird is happening again. First off, replacing the PSU somehow cleared the BIOS, which put it back on optimized defaults, meaning no more proper IOMMU group separation, but I thought I would figure that out later. Now the important stuff: last night when I checked, the rebuild was going fine; I think it was almost 80 percent done without any errors. But this morning when I logged on to the webGUI, it showed a notification saying the data rebuild finished with 310912 errors! The disk has returned to normal operation, though, the data is there and accessible, and the logs don't show any errors or warnings either.
Somehow all this also changed my Unraid server's name and messed up the time, so the timestamps in the logs are off.

tower-diagnostics-20231204-0829.zip

Link to comment
