MacDaddy
Posts posted by MacDaddy
-
-
Thanks for the suggestion. I will add attributes 1 and 200 to be monitored as you suggest.
The parity check completed with no errors. The main screen shows 56 errors on the parity disk alone. All other disks show 0. I'm assuming these are all the correctable read errors.
I believe I will use this opportunity to replace the parity drive with an 8TB option. This will give me the latitude to begin upgrading the data drives to 8TB as the storage is consumed. Unless I'm just being hyper-paranoid, I'll not add the former parity drive back to the disk pool. If the risk is low to the point of non-existent, then I'll be willing to add it back in.
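For anyone wanting to watch those two attributes by hand, here is a minimal sketch. It assumes the usual `smartctl -A` table layout (attribute ID in the first column, raw value in the tenth); the function name `smart_watch` and the device name in the usage comment are placeholders of mine, not anything from unRaid itself.

```shell
#!/bin/bash
# Print ID, name and raw value for SMART attributes 1 and 200,
# reading `smartctl -A` output from stdin.
smart_watch() {
    # Attribute rows start with the numeric ID; the raw value is column 10.
    awk '$1 == 1 || $1 == 200 { print $1, $2, $10 }'
}

# Hypothetical usage on a live system (the device name is an assumption):
#   smartctl -A /dev/sdb | smart_watch
```

Running it before and after a parity check and comparing the raw values is one way to see whether the counts are still climbing.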
-
I have had an MCE occurring since mid-December. I've ordered some replacement memory, which has been delivered, and I plan to install it tomorrow. While awaiting the replacement memory, the parity drive alerted for read errors. Before I reboot following the new memory installation, I wanted to add the current diagnostic information. It appears that the HDD read errors were corrected, but I wanted to ask for help in determining whether the HDD read errors might be a false positive influenced by the memory errors. If the drive is truly failing, I have no problem replacing it. In that case, would adding the old parity drive back to the data pool be an unreasonable risk? Thanks in advance for your advice.
tower-diagnostics-20220110-1653.zip tower-smart-20220110-1648.zip tower-syslog-20220110-2300.zip
-
Fix Common Problems alerted me to a previous MCE. I'm seeing repeating entries for about 5 secs of:
Oct 18 22:34:33 Tower kernel: mce: [Hardware Error]: Machine check events logged
Oct 18 22:34:33 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Oct 18 22:34:33 Tower kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x5ff304 offset:0xec0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0091 socket:1 ha:0 channel_mask:8 rank:0)
It corrected and I have not seen a repeat event. Is this a one-time event that bears attention if it should happen again, or do I need to start looking for some new memory? Thanks in advance for any advice you can offer.
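To see whether the corrected errors keep landing on the same DIMM, a rough sketch like the one below could total them per DIMM label from a saved syslog. It assumes the entries look like the `CE memory read error on ...` line quoted above; the function name `edac_ce_by_dimm` is just something I made up for illustration.

```shell
#!/bin/bash
# Total corrected (CE) memory errors per DIMM label from syslog text on stdin.
edac_ce_by_dimm() {
    awk '
        / CE memory read error on / {
            for (i = 1; i < NF; i++) {
                if ($i == "CE") events = $(i - 1)   # error count reported by this entry
                if ($i == "on") { label = $(i + 1); break }
            }
            total[label] += events
        }
        END { for (label in total) print total[label], label }
    ' | sort -rn
}

# Hypothetical usage (the syslog path is an assumption):
#   edac_ce_by_dimm < /var/log/syslog
```

If one label dominates the output over time, that points at a single module or slot rather than a random one-off.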
-
Any possibility to add sshpass?
In conjunction with user.scripts I'm hoping to implement something like :
#!/bin/bash
#argumentDescription=Enter password and box name (mypass pihole)
# $1 = SSH password, $2 = box name (used for the host, the backup folder and the file name)
sshpass -p "$1" ssh "pi@$2.rmac" "sudo dd bs=4M if=/dev/mmcblk0 status=progress | gzip -1 -" | dd of="/mnt/user/Backups/$2/$(date +%Y%m%d_%H%M%S)_$2.gz"
I run 4 different Raspberry Pi boxes. They run for a good long time, but I've just had the third SD card fail. I would like to keep an image where I can recover quickly with minimal pain.
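Since the box name ends up in both the folder and the file name, a small helper could build the target path in one place and refuse odd names. This is only a sketch: `backup_target` is a hypothetical function name, and the default root simply mirrors the `/mnt/user/Backups` path used in the script above.

```shell
#!/bin/bash
# Build the image path for a given box name, e.g.
#   /mnt/user/Backups/pihole/20220110_165300_pihole.gz
backup_target() {
    local box=$1 root=${2:-/mnt/user/Backups}
    # Reject empty names and anything that could escape the backup folder.
    case $box in
        ''|*/*|.*) echo "invalid box name: '$box'" >&2; return 1 ;;
    esac
    printf '%s/%s/%s_%s.gz\n' "$root" "$box" "$(date +%Y%m%d_%H%M%S)" "$box"
}

# Hypothetical usage inside the backup script:
#   target=$(backup_target "$2") || exit 1
```

Keeping the path logic in one function also makes it easier to add pruning of old images later.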
-
Thanks for the info. The 5400 RPM drives should in theory be quieter than their 7200 RPM counterparts. Good points on the airflow. Noise and airflow go hand in hand. I’ll look into active cooling on the CPUs.
Sent from my iPhone using Tapatalk -
I have a Supermicro X9DRi-LN4+/X9DR3-LN4+ based server with dual Xeon® E5-2630L v2 CPUs for my unRaid build. It is a surplus server in a Supermicro CSE-835TQ-R920B case. In my prior residence, I had the luxury of converting one of the closets to house all my equipment. It was designed for power, ventilation, and noise. I'm now in a place where I can't modify any rooms, and the only location to house the equipment is a closet in the master bedroom. Needless to say, the server sounds like a Hoover vacuum with asthma on steroids. It has served me well, and I am thinking of transferring the M/B and 5xWD40EFRX to a silenced case. I'm thinking something like the be quiet! Dark Base 900 https://www.bequiet.com/en/case/697 might work.
The CPUs show a 60W TDP. Currently they have passive coolers with the custom air shroud from Supermicro. I intend to keep them configured as such. I would have to change from the redundant power supplies currently in the server chassis. I would appreciate any suggestions in that area.
What is your advice on the noise footprint after the conversion? Any experience with this case or any other that might accommodate the M/B? Any thoughts on potential roadblocks I might find?
Thanks in advance for any input.
-
Thanks for your response. I had a feeling it would go that way. This is my first encounter with corruption.
When I complete the XFS repair it will prune data (according to the dry run output). Is that data lost for good or will unRaid recognize it and let parity reconstruct?
Sent from my iPhone using Tapatalk -
I'm currently using a MakeMKV docker to write cloned DVD structures into an MKV container. I've noticed that a share that I'm using for the output keeps dropping. I can reboot and the array will start with all drives green and the share is restored. A snippet from the log is attached. I can start in maintenance mode and dry run xfs_repair on all the hard drives. All are clean except md2.
Is it better to xfs_repair the md2 drive or to replace it with a new drive and let it rebuild? Note: while parity shows valid, it has been more than 700 days since the last check.
Oct 13 18:36:15 Tower kernel: XFS (md2): Metadata CRC error detected at xfs_inobt_read_verify+0xd/0x3a [xfs], xfs_inobt block 0x19f754db8
Oct 13 18:36:15 Tower kernel: XFS (md2): Unmount and run xfs_repair
Oct 13 18:36:15 Tower kernel: XFS (md2): First 128 bytes of corrupted metadata buffer:
Oct 13 18:36:15 Tower kernel: 0000000095cfb836: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: 000000001de8c0f3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: 00000000c8d99f19: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: 00000000a9a413e7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: 000000003c326670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: 000000005abd08ab: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: 000000003867ab1f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: 0000000085cdd1ba: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Oct 13 18:36:15 Tower kernel: XFS (md2): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x19f754db8 len 8 error 74
Oct 13 18:36:15 Tower kernel: XFS (md2): xfs_do_force_shutdown(0x1) called from line 300 of file fs/xfs/xfs_trans_buf.c. Return address = 000000007c1ff77b
Oct 13 18:36:15 Tower kernel: XFS (md2): I/O Error Detected. Shutting down filesystem
Oct 13 18:36:15 Tower kernel: XFS (md2): Please umount the filesystem and rectify the problem(s)
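As a quick way to confirm which devices the kernel is complaining about, something like this sketch could pull the device names out of a saved syslog. It assumes the "Unmount and run xfs_repair" wording shown in the snippet above; `xfs_devices_needing_repair` is a made-up name.

```shell
#!/bin/bash
# List each device for which the kernel logged "Unmount and run xfs_repair".
# Reads syslog text from stdin and prints one device name per line.
xfs_devices_needing_repair() {
    sed -n 's/.*XFS (\([^)]*\)): Unmount and run xfs_repair.*/\1/p' | sort -u
}

# Hypothetical usage (the syslog path is an assumption):
#   xfs_devices_needing_repair < /var/log/syslog
# The dry run described above would then be, in maintenance mode:
#   xfs_repair -n /dev/md2
```

On the log above this should report only md2, which matches what the dry runs showed.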
-
Some people should learn to search the forum before posting redundant issues. My apologies.
Sent from my iPhone using Tapatalk-
-
I’m resurrecting my unRaid box. It shows the latest update v6.5.3 is available. When I initiate the upgrade it throws an invalid URL/ server error message. Sorry for the pic, I’m on direct terminal.
Is it possible that Amazon is down? Or maybe I need an intermediate upgrade step first?
Sent from my iPhone using Tapatalk
-
That's probably it. My 57-year-old eyes need a little extra size in fonts these days:-)
I'll experiment with another size.
Sent from my iPhone using Tapatalk -
-
Logs are back. Many thanks.
Sent from my iPhone using Tapatalk -
Heads up. I just updated the app on iPhone. Major features work as expected and alignment of tab scrolling is much improved.
However it seems that the logs no longer display. I did a quick check for a plugin update on the server side. It appears I'm on the latest.
Is there any diagnostic info I can send that would be helpful?
Sent from my iPhone using Tapatalk -
Thanks so much for all the work on this. I wanted to send some beer money. On the Apps tab, there are links for Statistics and Credits in the upper right. Selecting them brings up the expected response with the familiar PayPal donate button. It took four attempts to get the donation sent - it terminated many times with a cryptic "fatal error" message. I think donations are an essential function:-). Thanks for keeping the unRaid apps world safe.
-
Good point. This is the first drive on the SAS card and the first time the slot in this particular cage has been used. I'm not sure if it's really impacting the preclear right at the moment. I'm getting about 98.3 MB/s and 21% through the first step at 8:13 hours in. I think I should let it finish and then try to address the cable/card/backplane items when it completes.
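For a rough idea of how long the current step still has to run, the figures above can be extrapolated linearly. This is only back-of-the-envelope arithmetic: it assumes the rate stays constant (drives usually slow down toward the inner tracks) and it covers only the step in progress. The function name `preclear_eta_minutes` is invented for the example.

```shell
#!/bin/bash
# Estimate the remaining minutes of a step from elapsed minutes and percent done,
# assuming a constant rate. Uses integer arithmetic, so the result is truncated.
preclear_eta_minutes() {
    local elapsed=$1 percent=$2
    if [ "$percent" -le 0 ] || [ "$percent" -ge 100 ]; then
        echo "percent must be between 1 and 99" >&2
        return 1
    fi
    echo $(( elapsed * (100 - percent) / percent ))
}

# 8:13 elapsed is 493 minutes; at 21% done that leaves 1854 minutes, about 31 hours:
#   preclear_eta_minutes 493 21
```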
-
I've used this script to reclaim two of my older 1TB disks from other equipment into my unRaid server. It was great to still be able to use the existing array, with all of my normal shares still in a protected state, while the new drives were clearing. I am now in the process of preclearing a 2TB WD EARS drive and got some interesting syslog entries. I'm a Linux newb, so I would appreciate any insight on this. By the way, this is on one of the Limetech MD1510 machines and this particular disk is the first (and only) on the second SAS.
Right after the login at 23:21:36 I kicked off the preclear and went to bed. The hard resetting of the link worries me.
Jun 8 23:19:47 Tower unmenu-status: Starting unmenu web-server
Jun 8 23:21:36 Tower login[3702]: ROOT LOGIN on `tty1'
Jun 8 23:22:24 Tower kernel: sda: unknown partition table
Jun 8 23:47:56 Tower sSMTP[7314]: Creating SSL connection to host
Jun 8 23:47:57 Tower sSMTP[7314]: SSL connection using DHE-RSA-AES256-SHA
Jun 8 23:47:58 Tower sSMTP[7314]: Sent mail for root@localhost (221 2.0.0 omta04.emeryville.ca.mail.comcast.net comcast closing connection)
Jun 8 23:51:52 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 8 23:51:52 Tower kernel: ata4.00: failed command: IDENTIFY DEVICE
Jun 8 23:51:52 Tower kernel: ata4.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Jun 8 23:51:52 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jun 8 23:51:52 Tower kernel: ata4.00: status: { DRDY }
Jun 8 23:51:52 Tower kernel: ata4: hard resetting link
Jun 8 23:51:58 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Jun 8 23:52:02 Tower kernel: ata4: SRST failed (errno=-16)
Jun 8 23:52:02 Tower kernel: ata4: hard resetting link
Jun 8 23:52:06 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 8 23:52:06 Tower kernel: ata4.00: configured for UDMA/133
Jun 8 23:52:06 Tower kernel: ata4: EH complete
Jun 9 00:00:55 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Jun 9 00:00:55 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Jun 9 00:00:55 Tower kernel: ata4.00: cmd 60/00:00:00:5b:10/02:00:00:00:00/40 tag 0 ncq 262144 in
Jun 9 00:00:55 Tower kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 9 00:00:55 Tower kernel: ata4.00: status: { DRDY }
Jun 9 00:00:55 Tower kernel: ata4: hard resetting link
Jun 9 00:01:01 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Jun 9 00:01:05 Tower kernel: ata4: SRST failed (errno=-16)
Jun 9 00:01:05 Tower kernel: ata4: hard resetting link
Jun 9 00:01:11 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Jun 9 00:01:15 Tower kernel: ata4: SRST failed (errno=-16)
Jun 9 00:01:15 Tower kernel: ata4: hard resetting link
Jun 9 00:01:21 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 9 00:01:21 Tower kernel: ata4.00: configured for UDMA/133
Jun 9 00:01:21 Tower kernel: ata4.00: device reported invalid CHS sector 0
Jun 9 00:01:21 Tower kernel: ata4: EH complete
Jun 9 00:19:20 Tower kernel: mdcmd (394): spindown 9
Jun 9 00:45:02 Tower kernel: mdcmd (548): spindown 0
Jun 9 00:45:03 Tower kernel: mdcmd (549): spindown 2
Jun 9 00:45:03 Tower kernel: mdcmd (550): spindown 5
Jun 9 00:45:03 Tower kernel: mdcmd (551): spindown 8
Jun 9 00:45:13 Tower kernel: mdcmd (553): spindown 2
Jun 9 00:47:40 Tower sSMTP[12676]: Creating SSL connection to host
Jun 9 00:47:41 Tower sSMTP[12676]: SSL connection using DHE-RSA-AES256-SHA
Jun 9 00:47:42 Tower sSMTP[12676]: Sent mail for root@localhost (221 2.0.0 omta17.westchester.pa.mail.comcast.net comcast closing connection)
Jun 9 04:00:17 Tower kernel: mdcmd (1719): spindown 0
Jun 9 04:00:17 Tower kernel: mdcmd (1720): spindown 8
Jun 9 05:02:12 Tower kernel: mdcmd (2090): spindown 0
Jun 9 07:40:07 Tower kernel: usb 4-1: USB disconnect, address 2
Jun 9 07:40:07 Tower kernel: usb 4-1.1: USB disconnect, address 3
Jun 9 07:40:07 Tower kernel: usb 4-1.3: USB disconnect, address 4
Jun 9 07:41:47 Tower unmenu[3699]: Disk /dev/sda doesn't contain a valid partition table
Jun 9 07:44:03 Tower kernel: mdcmd (3068): spindown 8
Jun 9 07:44:13 Tower kernel: mdcmd (3070): spindown 8
Jun 9 07:44:13 Tower kernel: mdcmd (3071): spindown 9
Jun 9 07:44:24 Tower kernel: mdcmd (3073): spindown 0
Jun 9 07:44:35 Tower kernel: mdcmd (3075): spindown 0
Jun 9 07:44:35 Tower kernel: mdcmd (3076): spindown 2
Jun 9 07:44:45 Tower kernel: mdcmd (3078): spindown 5
I'll attach the full syslog in case there is any further information that might help explain this. Thanks for your help.
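To make the pattern in entries like these easier to see, a sketch such as the following could count the link resets per ATA port and the commands that failed. It assumes the standard syslog layout shown above, where the port is the sixth whitespace-separated field; `ata_link_summary` is a hypothetical name.

```shell
#!/bin/bash
# Summarise ATA trouble from syslog text on stdin:
# hard link resets per port, and how often each command failed.
ata_link_summary() {
    awk '
        / hard resetting link/ { port = $6; sub(/:$/, "", port); resets[port]++ }
        / failed command: /    { sub(/.* failed command: /, ""); failed[$0]++ }
        END {
            for (p in resets) print "resets", p, resets[p]
            for (c in failed) print "failed", c, failed[c]
        }
    ' | sort
}

# Hypothetical usage (the syslog path is an assumption):
#   ata_link_summary < /var/log/syslog
```

Resets that stay on one port point toward that cable, slot or card channel.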
[solved] Does a Memory Check Error contribute to a HDD read error?
in General Support
Posted
Is this the appropriate place to initiate monitoring 1 and 200 attributes?