username34793

May 26

I replaced the SATA power cables for all drives with custom cables made for my power supply and case, which should provide proper power distribution without overloading the cables or PSU. I replaced both disabled drives one at a time. After both data rebuilds and a parity sync for a goof, there were and still are no errors. I'll consider this problem solved and likely due to bad power distribution to the drives.

Thanks for the advice and insight.

May 13

Thank y'all for the feedback regarding this conundrum. I think I have a few possible ways forward regarding proper power distribution to the hdds.

Just to clarify, typically, how many drives in total can be powered by each PSU connection for peripherals/SATA taking into consideration all potential splitters and connectors already being used on the PSU supplied power cables? In my case my SATA power cables already have 4 connectors on them. My PSU has 5 connections for SATA and/or molex cables.

As far as the disabled drives are concerned, the best way to sort this out would be to verify the emulated content is correct and simply rebuild on top or in this case probably swap the drive with SMART errors. Is this correct?

I expect it would be most wise to handle one rebuild on top/swap at a time. Any thoughts on that?

May 13

27 minutes ago, JorgeB said:

Yep.

I appreciate your candor. Any thoughts or suggestions to accomplish proper powering of 13 hard drives using reliable extensions and splitters within the constraints of 5 available PSU power connections?
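
To put rough numbers on the question, here is a minimal Python sketch that spreads 13 drives as evenly as possible across 5 PSU connections. The function name `spread_drives` is made up for illustration, and the sketch only counts drives per connection. It says nothing about how much current a given cable or connector can safely carry, which depends on the PSU and cables in use.

```python
def spread_drives(drive_count, connector_count):
    """Return the number of drives on each PSU connection, spread as evenly as possible."""
    if connector_count <= 0:
        raise ValueError("connector_count must be positive")
    # Every connection gets `base` drives; the first `extra` connections get one more.
    base, extra = divmod(drive_count, connector_count)
    return [base + 1 if i < extra else base for i in range(connector_count)]


layout = spread_drives(13, 5)
print(layout)  # [3, 3, 3, 2, 2]
```

With an even spread, no single connection feeds more than 3 drives, compared to 4 per connection with the 1-in-4-out splitters.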

May 13

A picture of the sata power splitter I am using for the array drives is attached. Seems to be exactly what I should not have used...

At minimum I should reevaluate how I am powering my drives. I was under the impression molex to sata was a bad idea or something like that... Not to mention the lack of molex cables for my power supply.

Any recommendations for proper extensions/splitters to accommodate 12+ total hdds in the array while taking into consideration ssds and fans and everything else?

PSU is a Seasonic VERTEX GX-1200 with 5 peripheral - SATA/molex connections. I can make use of the PCIe power connections if that is possible and not ill-advised.

May 13

All drives are powered individually.

Specifically for my array drives I made sure to run them from the PSU cable to a 1 in 4 out sata power splitter adapter. So in the case of drives 1-4, the power is coming from 1 peripheral/sata port on the PSU to a sata power splitter cable. Drives 5-8 are powered from a separate PSU port to a separate sata power splitter... and so on.

So from my point of view, a power issue or connection issue at the sas expander should be affecting more than 2 drives.

A power/connection issue at the PSU level should affect 4 drives, and the same goes for the SATA splitter. That leaves the obvious culprit: the individual SATA power connectors for drive 3 and drive 6, which of course could very well be the issue. Unfortunately I can only rely on visual and tactile inspection for these cables and the individual ports on the drives themselves. I don't have or know of a SATA power continuity testing device, and I am not well versed in using a multimeter or similar tools... perhaps I should be...

The best I have is my eyes and hands for this one, but this system has been deployed for quite a while now, undisturbed and in regular use. Of course now I will methodically go through each connection as I prepare to swap out drive 6 and maybe drive 3.
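
The reasoning above can be written down as a small fault-domain check. This is only a sketch: the component names (`psu_port_A`, `splitter_A`, `connector_3`, and so on) are made up for illustration, and it assumes a faulty component takes out every drive downstream of it. Marginal power, such as voltage sag under load, may not behave that cleanly, so an empty or narrow result does not rule out the shared splitter or PSU port.

```python
from collections import defaultdict


def build_power_path():
    """Map each array drive to the (hypothetical) components that feed it power."""
    path = {}
    for drive in range(1, 9):
        group = "A" if drive <= 4 else "B"  # drives 1-4 share one splitter, 5-8 another
        path[drive] = [f"psu_port_{group}", f"splitter_{group}", f"connector_{drive}"]
    return path


def fully_failed_components(power_path, failed):
    """Return components whose downstream drives have ALL failed."""
    downstream = defaultdict(set)
    for drive, components in power_path.items():
        for component in components:
            downstream[component].add(drive)
    return sorted(c for c, drives in downstream.items() if drives <= failed)


print(fully_failed_components(build_power_path(), {3, 6}))
# ['connector_3', 'connector_6']
```

Under that all-or-nothing assumption, only the two individual connectors are consistent with drives 3 and 6 failing alone.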

May 13

I will re-verify the power and data connections but previous checks have confirmed everything is/was fine as far as my eyes can see.

I have an LSI 9206-16e with 1 connection feeding 2 SAS parity drives and 2 connections feeding the inputs on an IBM ServeRAID 16-Port SAS-2 expander (46M0997 Firmware 634A). Disks 1-4 are on the first SAS expander output and disks 5-8 are on the second SAS expander output. How likely is it for 2 drives on 2 different ports of the SAS expander to exhibit read and write errors due to a power issue related to the SAS expander, while the 6 other drives on those expander ports did not have any issues?

I'm definitely not trying to come off as rude. I apologize if it reads that way. To the best of my capabilities all connections are and were properly seated and all cabling perfectly intact as far as my eyes are capable of seeing. It seems like at least one other drive should have had some kind of error if there was a bad connection.

I appreciate the assistance and input thus far. I will check everything again and go from there. I do have reservations when it comes to rebuilding onto a drive with uncorrected errors so I will probably at minimum replace one drive once I can re-confirm all connections are good to go.

Thank you so far!

May 13

Apologies,

Array started, new diagnostics attached.

Thank you.

unraid-diagnostics-20240513-0641.zip

syslog excerpt.txt

May 13

Diagnostics attached.

Further information:

During the network transfer, reconstruct write was enabled. Data was only being written to either disk 7 or disk 8, so the other disks were being read to calculate parity, and read errors occurred on disk 3 and/or 6. Either way, disks 3 and 6 are now disabled, but only disk 6 has SMART errors (that are concerning).

Disk 4 and cache 2 show a nonzero UDMA CRC error count due to a loose connection.

unraid-diagnostics-20240513-0536.zip

May 13

What is the best practice on how to proceed?

During a network transfer two drives reported errors. There were enough errors that the array seemed to crash entirely, and along with it most of the Unraid GUI/web interface. I tried to stop the array and reboot from the GUI but it was mostly unresponsive. I used the terminal to try to reboot but it didn't work. Shutdown did work; however, it was instant and the shutdown was unclean. Upon boot up the two degraded disks are being emulated, but once I start the array a parity check/sync will start automatically.

I have two parity drives. I have replacement drives I can swap in. Should I allow the automatic parity sync operation to proceed with the emulated drives, or should I swap in one new drive at a time, allowing parity to rebuild my drives?

My concerns are the integrity of my parity and what multiple parity sync / drive rebuild operations could do to other drives. And of course, the proper order of operations to restore my array to working order.

Cabling has been checked and verified to be working without issue. The two drives in question are reporting SMART errors.

I have my system powered off for now while I wait for input. I can grab diagnostics if it's absolutely necessary; however, I'm pretty sure I just lucked out and suffered two hard drive failures at the same time.

After reading through some documentation and forum posts I haven't found guidance for this specific scenario.

Please advise

Thank you.

December 10, 2023

On 12/8/2023 at 4:54 AM, iJacks said:

With regards to MKV profiles, as far as I am aware you do indeed only need to create said file and then add it to the MKV_ARGS option in the arm.yaml config file. The path will be relative to ARM's Docker setup and the sample they include (MKV_ARGS: "--profile=/opt/arm/default.mmcp.xml") would not be visible externally by default.

Anyone have any further input regarding MakeMKV profiles/conversion profiles?

It seems the MakeMKV conversion profiles have to be placed in the .MakeMKV directory, and then MKV_ARGS in the ripper settings/arm.yaml file needs to be updated to point to /home/arm/.MakeMKV/ followed by the profile name.

So in my case it looks like this "--profile=/home/arm/.MakeMKV/pcmflac.mmcp.xml"
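
For anyone else chasing a profile that seems to be ignored, a quick sanity check is to pull the profile path out of the MKV_ARGS value and confirm the file exists at that path as seen from inside the container. Below is a minimal Python sketch of that check. The helper name `profile_from_mkv_args` is made up for illustration and is not part of ARM, and the script would need to be run inside the container for the path check to mean anything.

```python
import shlex
from pathlib import Path
from typing import Optional


def profile_from_mkv_args(mkv_args: str) -> Optional[Path]:
    """Extract the path given to --profile= from an MKV_ARGS string, if any."""
    for token in shlex.split(mkv_args):
        if token.startswith("--profile="):
            return Path(token.split("=", 1)[1])
    return None


# The MKV_ARGS value from this post.
profile = profile_from_mkv_args("--profile=/home/arm/.MakeMKV/pcmflac.mmcp.xml")
if profile is not None:
    print(profile, "exists:", profile.exists())
```

If this reports that the file does not exist, the path in MKV_ARGS is probably a host path rather than a container path.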

After correcting a series of errors in the conversion profile, I can confirm ARM will use a proper MakeMKV conversion profile. The only issue I noticed is that the app_DefaultSelectionString= parameter within the profile was ignored. I had to add it back to the settings.conf file.

By doing this, the logs indicate this profile is being used; however, nothing I want this profile to do is working. This is either a profile issue, or ARM/ARM's installation of MakeMKV doesn't process these profiles correctly. I will test this profile and other profiles further to attempt to determine what is or isn't working and where/why.

I have the profile located in the same directory as the abcde.conf. I have pointed to it with the MKV_ARGS option and nothing seems to work. I have used --profile=/etc/arm/config/pcmflac.mmcp.xml and "--profile=/etc/arm/config/pcmflac.mmcp.xml" as the MKV_ARGS: option.

~~There's a separate settings.conf file located in the .MakeMKV directory which absolutely respects anything added to app_DefaultSelectionString=~~

~~However, I do not know if or how to populate the settings.conf with MakeMKV profile/conversion profile info as the profiles use XML and a specific structure... apparently.~~

~~From what I can tell, the profile needs to be in MakeMKV's data directory, which is the .MakeMKV folder/directory within ARM, but I have no idea how to point to it with the MKV_ARGS yaml option.~~

December 8, 2023

Now that I have a working ARM container, can anyone tell me how to go about adding custom MakeMKV profiles to it? I can't figure out what directory the custom XML file should be in. I was expecting to find the default.mmcp.xml file somewhere within the ARM container files but I cannot. I'm not sure if I just need to create the profile and reference it via the MKV_ARGS: option in the webui ripper settings / arm.yaml file.

Thanks in advance

December 7, 2023

Well, actually this turned out to be a pretty easy fix. It only took somewhere around 16 hours of trial and error to find...

It seems the Music / music folder thing can cause problems during the container creation. If there's a music folder already in the directory, the scripts will halt because they cannot delete the folder... I don't know exactly how to explain it, but I was able to replicate it a whole lot with a virtual machine. It was actually in the log I posted below from the beginning, but it took me far too long to realize that was the problem.
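
The "Is a directory" message in the log is what a plain file removal reports when its target turns out to be a directory. Here is a small Python sketch of the same failure and a removal that copes with either case. It only illustrates the behaviour and is not what ARM's setup script actually does. The helper name `remove_any` is made up.

```python
import os
import shutil
import tempfile


def remove_any(path):
    """Remove a file, symlink, or whole directory tree; do nothing if it is absent."""
    if os.path.islink(path) or os.path.isfile(path):
        os.remove(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)


with tempfile.TemporaryDirectory() as home:
    music = os.path.join(home, "music")
    os.mkdir(music)
    try:
        os.remove(music)  # file-style removal of a directory fails
    except OSError as exc:
        print("plain remove failed:", type(exc).__name__)
    remove_any(music)
    print("exists after remove_any:", os.path.exists(music))
```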

I'm not too good at this here Linux stuff.

I cannot get this to work at all. I left the container config at default except for removing the nvidia runtime and adjusting the cd/dvd/bd drive parameters. Everything appears to be successful and the container will start but I cannot get to the webui no matter what ports I try and inserting a dvd doesn't do anything.

ARM log below:


*** Running /etc/my_init.d/10_syslog-ng.init...
*** Running /etc/my_init.d/arm_user_files_setup.sh...
rm: cannot remove '/home/arm/music': Is a directory
*** /etc/my_init.d/arm_user_files_setup.sh failed with status 1

*** Killing all processes...
*** Running /etc/my_init.d/10_syslog-ng.init...
*** Running /etc/my_init.d/arm_user_files_setup.sh...
rm: cannot remove '/home/arm/music': Is a directory
*** /etc/my_init.d/arm_user_files_setup.sh failed with status 1

*** Killing all processes...
Dec 7 01:39:29 b82e733c525e syslog-ng[13]: syslog-ng starting up; version='3.25.1'
Updating arm user id from 1000 to 99...
Updating arm group id from 1000 to 100...
Adding arm user to 'render' group
Dec 7 01:39:30 b82e733c525e syslog-ng[13]: syslog-ng shutting down; version='3.25.1'
Dec 7 01:39:35 b82e733c525e syslog-ng[12]: syslog-ng starting up; version='3.25.1'
Updating arm user id from 1000 to 99...
usermod: no changes
Updating arm group id from 1000 to 100...
Adding arm user to 'render' group
Dec 7 01:39:36 b82e733c525e syslog-ng[12]: syslog-ng shutting down; version='3.25.1'


December 1, 2023

@JorgeB

Thank you for the reply and insight. I am barely a novice when it comes to understanding Linux. Do you have any further information regarding this general protection fault or perhaps you can point me to some resources to better understand Linux/UNRAID general protection faults?

My assumption is this has something to do with the SAS drives I added and the way SAS drives handle and report SMART info but I am pretty much just guessing. I do not have the required understanding to do anything other than guess.

I'll take your advice and not worry for now but I would love to understand more about these faults.

December 1, 2023

Recently I've been seeing general protection faults regarding smartctl. The two most recent examples are:

Nov 30 18:56:01 UNRAID kernel: traps: smartctl[12156] general protection fault ip:1467c84358e4 sp:7ffe9bf86558 error:0 in libc-2.37.so[1467c8373000+169000]

Dec 1 00:45:05 UNRAID kernel: traps: smartctl[18759] general protection fault ip:14b9dd2d28e4 sp:7ffe6616ed48 error:0 in libc-2.37.so[14b9dd210000+169000]
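
One thing that can be read straight out of these two lines is where inside libc the fault happened. The Python sketch below parses each trap line and subtracts the mapping base from the instruction pointer. It assumes the bracketed pair after the library name is the mapping's base address and size, and the regex is written only against the two lines quoted here.

```python
import re

TRAP_RE = re.compile(
    r"traps: (?P<prog>\S+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library, offset of ip within the library mapping), or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    return m["lib"], int(m["ip"], 16) - int(m["base"], 16)


lines = [
    "Nov 30 18:56:01 UNRAID kernel: traps: smartctl[12156] general protection fault ip:1467c84358e4 sp:7ffe9bf86558 error:0 in libc-2.37.so[1467c8373000+169000]",
    "Dec 1 00:45:05 UNRAID kernel: traps: smartctl[18759] general protection fault ip:14b9dd2d28e4 sp:7ffe6616ed48 error:0 in libc-2.37.so[14b9dd210000+169000]",
]
for line in lines:
    lib, offset = fault_offset(line)
    print(lib, hex(offset))
# libc-2.37.so 0xc28e4
# libc-2.37.so 0xc28e4
```

Both faults land at the same offset, which at least hints that smartctl is hitting the same code path each time rather than failing at random.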

I added two 20TB SAS drives as my parity drives and I think these general protection faults started around that time.

I have checked all cables for proper connection and function and cannot identify any issues with cables.

As far as I can tell, everything is working fine but seeing these errors in the log is a bit concerning.

Diagnostics attached.

Thanks in advance.

unraid-diagnostics-20231201-0448.zip


Posts posted by username34793

(SOLVED) How should I proceed? - Two drives failed during a network transfer, crashed unraid, caused unclean shutdown.


[SUPPORT] automatic-ripping-machine/automatic-ripping-machine


(SOLVED) UNRAID kernel: traps: smartctl[12156] general protection fault (SOLVED)
