Unclean shutdown issues



Hello again,

 

First, thank you, community, for helping me so quickly with my previous issues! It is much appreciated. Unfortunately, I need to rely on you all again, as I ran into this issue with my server last night. My Unraid server was unreachable, so I connected a monitor to it and took the picture you see. I had to reboot the server twice, and then drive sdf had about 700 errors on it. I ran a SMART test and it passed. I did an XFS repair on it and I am now re-adding it to the array. I have attached the photo and my diagnostics zip file. I am very curious what caused the issue on the server, and I am also concerned about the health of drive sdf. Any help would be much appreciated.

Thank you 

IMG_20190108_215104.thumb.jpg.a7e9989e3aa4c8baf96ac3f4197da027.jpg

unraid2-diagnostics-20190109-0744.zip

Link to comment
17 minutes ago, ssjucrono said:

drive sdf had about 700 errors on it. I ran a smart test and it passed.

This part is lacking some details. What errors did you see and where did you see them?

 

19 minutes ago, ssjucrono said:

I did an xfs repair on it and I am now re-adding it to the array.

And this part is very troubling. I don't understand the part about re-adding it to the array. You never should have removed it. You should always run xfs_repair on an array disk with the disk still in the array. How exactly did you remove it? What exactly did you do to repair it? How are you re-adding it? All of this is wrong and could result in loss of data.

 

I haven't looked at your diagnostics yet. I will wait on further explanation so I will have a better idea how to interpret them.

 

Link to comment

The errors I saw were on the Main tab of Unraid, where it lists the reads, writes, and errors.

(Current screenshot; it won't show the error count because I am re-enabling the disk.)

image.png.125706b100df4dea4e206a68224049ed.png

 

Also, I did do the XFS repair while the disk was still in the array.

This disk was disabled after rebooting the server. 

I ran a SMART test.

I did the XFS repair in the array.

Then, because the drive was disabled, I re-enabled it following this process:

https://wiki.unraid.net/Troubleshooting#Re-enable_the_drive

 

Please let me know if you have any other questions. 

Link to comment

OK. So instead of re-adding you really meant you were rebuilding it. The word Add has a specific meaning when working with Unraid disks. Sometimes it is necessary to get some clarity when we are trying to figure out what you already know about your situation but we don't.

 

The Reads, Writes, and Errors columns reset when you reboot, so that is why Errors are zero, not because you are rebuilding. Those columns are actually good indicators of the rebuild. Errors should always be zero, but when rebuilding, the disk being rebuilt will get a lot of writes, and all the other disks will get a lot of reads. Just as you see in your screenshot.
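To illustrate why a rebuild produces that Reads/Writes pattern, here is a minimal Python sketch of single parity. It assumes parity is a plain byte-wise XOR across the data disks (the usual single-parity scheme; a second parity disk uses a different calculation), and the disk names and contents are made up. It is a toy model, not Unraid's actual driver code.

```python
from functools import reduce


def xor_bytes(blocks):
    """Byte-wise XOR of equal-length byte strings (toy single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(disks, parity, failed):
    """Reconstruct disks[failed] from the other data disks plus parity.

    Returns the rebuilt bytes and a dict of [reads, writes] per device,
    mimicking the Main tab columns: every other device is only read,
    and only the disk being rebuilt is written.
    """
    counts = {name: [0, 0] for name in list(disks) + ["parity"]}
    sources = []
    for name, data in disks.items():
        if name == failed:
            continue  # the failed disk's contents are never read
        sources.append(data)
        counts[name][0] += len(data)
    sources.append(parity)
    counts["parity"][0] += len(parity)

    rebuilt = xor_bytes(sources)
    counts[failed][1] += len(rebuilt)
    return rebuilt, counts


# Hypothetical example: three data disks, disk4 needs rebuilding.
disks = {"disk1": b"\x01\x02", "disk2": b"\x10\x20", "disk4": b"\xaa\xbb"}
parity = xor_bytes(list(disks.values()))  # computed while all disks were good
rebuilt, counts = rebuild(disks, parity, "disk4")
```

Here `rebuilt` equals the original disk4 contents, the surviving disks and parity each show reads only, and disk4 shows writes only, which is the same shape as the screenshot.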

 

Since you referred to that disk as "sdf", I still have another question. "drive sdf" is not really a good way to identify a disk, since that letter assignment can change between boots or even disconnects. Unraid keeps track of disk assignments by the serial number of each disk. You should always refer to a disk in the array or cache by its assignment in the array or cache: parity, disk1, disk2, ..., cache, cache2... maybe even parity2.
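As a rough illustration of why the letter is unreliable and the serial is not, here is a small Python sketch that maps sdX letters to the persistent identifiers Linux exposes as symlinks under /dev/disk/by-id (Unraid runs on Linux). It assumes ATA disks show up as `ata-<model>_<serial>` links; the sample entries below are hypothetical, and `read_by_id` is a helper name made up for this sketch.

```python
import os


def letters_to_ids(entries):
    """Map device letters (e.g. "sdf") to their by-id names (model_serial).

    entries: iterable of (link_name, link_target) pairs as found in
    /dev/disk/by-id, e.g. ("ata-MODEL_SERIAL", "../../sdf").
    Partition links (sdf1) and non-ATA links (wwn-...) are skipped.
    """
    mapping = {}
    for link_name, target in entries:
        dev = os.path.basename(target)  # "../../sdf" -> "sdf"
        if link_name.startswith("ata-") and dev.startswith("sd") and dev.isalpha():
            mapping[dev] = link_name[len("ata-"):]
    return mapping


def read_by_id(path="/dev/disk/by-id"):
    """Collect the real symlinks on a live Linux system (hypothetical helper)."""
    return [(name, os.readlink(os.path.join(path, name)))
            for name in sorted(os.listdir(path))]


# Hypothetical entries; on a real server use letters_to_ids(read_by_id()).
sample = [
    ("ata-EXAMPLE_MODEL_SERIAL123", "../../sdf"),
    ("ata-EXAMPLE_MODEL_SERIAL123-part1", "../../sdf1"),
    ("wwn-0x5000000000000000", "../../sdf"),
]
mapping = letters_to_ids(sample)
```

After a reboot the same serial may point at a different letter, which is exactly why the serial (and the diskN assignment built on it) is the stable way to talk about a disk.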

 

Did you use the webUI to run the disk repair? Or did you do it from the command line? The reason I ask is because doing it from the command line is a way you could have specified that disk letter "sdf", and if you did it that way then you didn't really do the repair in the array. If you did it in the webUI then it would have used the correct designation, which in this case is "md4".

 

I took a brief glance at your syslog, but since it starts fresh after a reboot, there is no way to know what happened before.

 

The disk you are rebuilding doesn't have any SMART report in those diagnostics you posted so I can't tell anything about its health. I assume you decided to reuse it because it passed SMART, and that is probably OK.  If you click on that disk and go to the Attributes section, does it display the SMART attributes? If so, go to the Self-Test section and click the Download button and attach the result to your next post.

 

 

Link to comment
1 minute ago, ssjucrono said:

I am using the wrong terms I guess.

Completely understandable. I have opened a discussion with the developers regarding whether or not that letter designation should even appear on that page. It can occasionally be useful for troubleshooting but can also be confusing to have it displayed so prominently. We had another user panicked into action recently because those letters had changed.

 

That disk looks fine. And you are proceeding as necessary to get your array back to normal.

 

I wish I had more to give you on the real reason for your post. Does it happen often? If so, you might try using the Troubleshooting Mode of the Fix Common Problems plugin, which will periodically save the syslog to flash so we can get it later in the event you can't get it before restarting. But you don't want to keep Troubleshooting Mode running all the time, since it logs a lot more information and can cause your log space to fill up, thus causing another problem.

 

Link to comment

This is the first time it has happened. This is all hardware I got recently, used, from a co-worker. That is a good tip about Troubleshooting Mode.

I was un-RARing a large number of files while also downloading data to the array. I left it for some time and came back later to find the screen shown in the original photo I attached.

 

Yes, I am proceeding to get the array back to normal.

I also have another disk, disk1, with UDMA CRC errors that have crept up slowly. Could you look at that? I have attached the SMART report.

unraid2-smart-20190109-1127.zip

Link to comment
38 minutes ago, ssjucrono said:

I also have another disk, disk 1 which has UDMA errors that has crept up slowly. Could you look at that?  I have attached the smart report

CRC Errors are communication problems with the disk, not a problem with the disk itself. Typically this is caused by a bad connection, a bad cable, or a controller issue. You can acknowledge that warning on the dashboard and it will warn you again if it increases.
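The "acknowledge, then warn again only if it increases" idea can be sketched in a few lines of Python. This assumes the text output of `smartctl -A`, where attribute 199 (UDMA_CRC_Error_Count) has its raw count in the last (RAW_VALUE) column; the sample text is made up, and this is not how Unraid's dashboard implements the check.

```python
def udma_crc_count(smartctl_attributes_text):
    """Return the RAW_VALUE of attribute 199, or None if it is not listed.

    Expects the attribute table printed by `smartctl -A`, whose columns are:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_attributes_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_increased(acknowledged, current):
    """Warn only when the count has grown past the acknowledged value."""
    return acknowledged is not None and current is not None and current > acknowledged


# Made-up sample in smartctl's attribute-table layout.
sample = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7500
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""
acknowledged = udma_crc_count(sample)
```

A count that stays flat after reseating or replacing the cable suggests the link problem is fixed; a count that keeps climbing points back at the cable, backplane, or controller rather than the platters.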

 

But then this...

7 minutes ago, ssjucrono said:

@trurl

Ok so this just happened now, During the parity check/Disk rebuild of disk 4. It cancelled and gave errors on Disk 4 and disabled the disk. I downloaded the smart report again for disk 4.  I also attached my diagnostics zip file. 

Usually diagnostics will include SMART for all disks so you don't have to get them separately. However, in this case, your diagnostics don't include SMART for that disk, and the SMART you attached has no information. So disk4 isn't communicating for whatever reason.

 

Shut down, check ALL connections, both power and SATA, on ALL disks. Reboot and post another diagnostics zip.

Link to comment
11 minutes ago, ssjucrono said:

I don't have any personal experience with that but it looks like a good option if you don't have room in your case. A lot better than one with port multipliers or even worse USB. I'll have to keep that one in mind.

Link to comment

@trurl

Just to be clear, disk4 is disabled and is not spun up in the current array state. It will not let me run a SMART test in this state. Last time, when I rebooted Unraid, it remained disabled until I re-enabled it following the process I linked earlier. Just an FYI.

When I get home later tonight, I will power down, check all cables (most likely swap them), power on, and attach another diagnostics zip.

 

Thank you again for your help thus far.  

Link to comment
12 minutes ago, ssjucrono said:

Just to be clear the Disk 4 is disabled and is not spun up in the current array state. It will not let me run a smart test in this state.

The disk dropped offline. Rebooting should bring it back online, though it will remain disabled. Change the SATA cable when you reboot and check the power cable.

Link to comment
38 minutes ago, ssjucrono said:

@trurl

Just to be clear the Disk 4 is disabled and is not spun up in the current array state. It will not let me run a smart test in this state. Last time when I rebooted unraid it remained disabled until I re-enabled it following the process I attached earlier. Just an FYI.

When I get home later tonight I will power down, check all cables (most likely swap them). Power on and attach another diagnostic zip.

 

Thank you again for your help thus far.  

Not asking for a SMART test to be run. The SMART report shows the SMART attributes and results from previous tests. It should be available whether or not the disk is disabled unless it can't communicate with the disk for some reason. Diagnostics includes SMART reports for all disks it can get them from. We depend on the SMART reports to make recommendations when people get a disabled disk.

Link to comment

@trurl

I powered down the server.

I swapped all the SATA cables for different ones and checked the connections.

I powered on the server, I did not start the array yet.

I downloaded a diagnostics and attached it.

I have not started the array or done anything else. I am waiting for further instructions.

Please let me know what else you may need.  

 

unraid2-diagnostics-20190110-0956.zip

Edited by ssjucrono
Link to comment

@trurl Server has been working well since I rebuilt the array.

I shut down and added a video card.

When I started the array, I got a notification about a hardware error: "Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged"

 

I have also attached the diagnostics zip

Let me know if this is anything to be concerned about.

 

Thank you 

raid-diagnostics-20190116-0828.zip

Link to comment
