How to reset healthy drive override?



Sorry for the convoluted subject line.  I'll explain.  I have a 1TB SSD with a single reallocated sector.  I've set it up as my cache drive, but the Dashboard tab in Unraid was showing a thumbs-down 👎 next to "healthy" for that drive.  So I clicked on it to override it to healthy, thinking that the status would turn unhealthy again and alert me to any new sector reallocations.

 

But I had second thoughts and wondered how I could reset it so that it would go back to the default behavior of a thumbs-down for that SSD.  At first, I hoped it was a browser cookie, but I proved that wrong.

 

Does anyone know how to return that indicator to its normal function (thumbs down on that drive because it has a reallocated sector)?


Thanks, but that didn't affect the thumbs-up I am seeing on the Dashboard.  The cache drive started at a thumbs-up "healthy" and stayed at a thumbs-up "healthy", even though there is one reallocated sector. 

 

I want it to go back to showing thumbs down for any drive with more than 0 reallocated sectors -- like it did before I acknowledged the thumbs down on that cache drive.

 

Screen Shot 2020-12-10 at 3.56.55 AM.png

Screen Shot 2020-12-10 at 3.58.41 AM.png


 

1 hour ago, ChatNoir said:

Most SMART attributes are never reset.

Once the drive has a reallocated sector, I don't think the value will go back to 0.

You are correct and that aligns with what I know about SMART attributes.  (Don't get me started about the stupidity of not being able to clear UDMA CRC errors -- which are almost always caused by bad cables or connectors.  I've got a perfectly healthy drive with a bunch of UDMA CRC errors left over from some long-ago-corrected cabling issue.)

 

1 hour ago, ChatNoir said:

If this is not an issue, I think acknowledging it is the correct thing to do, so that you are alerted if the number increases again.

 

I did acknowledge it, turning the thumbs down into a thumbs up, but then had second thoughts.  

 

Now I want to "undo" my acknowledgement so that the drive health goes back to a thumbs-down on the Dashboard.  As I wrote previously, "I want it to go back to showing thumbs down for any drive with more than 0 reallocated sectors -- like it did before I acknowledged the thumbs down on that cache drive."

 

There must be a way to return the Dashboard GUI to its default behavior.  But I've been poring over menus and config files and I'm not seeing where the reallocation count limit has been changed for that SSD.  So any help is appreciated.

 

Edited by Sissy
Clarification

Thanks Vr2Io!  Here's the output of the ls command on /boot/config:

 

root@Unraid-N5550:/boot/config# ls -lrt
total 88
drwx------ 2 root root 4096 Nov 30 06:15 wireguard/
drwx------ 2 root root 4096 Nov 30 06:15 ssh/
-rw------- 1 root root   33 Nov 30 06:15 machine-id
-rw------- 1 root root  169 Nov 30 06:16 docker.cfg
drwx------ 3 root root 4096 Nov 30 06:16 ssl/
-rw------- 1 root root  256 Nov 30 06:16 Trial.key
-rw------- 1 root root  119 Nov 30 06:18 flash.cfg
-rw------- 1 root root  106 Nov 30 07:13 network.cfg
-rw------- 1 root root   71 Nov 30 07:13 go
drwx------ 2 root root 4096 Nov 30 18:04 shares/
-rw------- 1 root root  378 Dec  7 15:39 network-rules.cfg
-rw------- 1 root root    7 Dec  9 16:13 drift
-rw------- 1 root root  512 Dec  9 17:49 random-seed
drwx------ 2 root root 4096 Dec  9 18:42 plugins-removed/
drwx------ 8 root root 4096 Dec  9 18:42 plugins/
-rw------- 1 root root   33 Dec  9 18:56 smart-all.cfg
-rw------- 1 root root  714 Dec  9 18:57 ident.cfg
-rw------- 1 root root  577 Dec  9 18:57 share.cfg
-rw------- 1 root root 2411 Dec  9 20:48 disk.cfg
-rw------- 1 root root  199 Dec 10 06:46 domain.cfg
-rw------- 1 root root 4096 Dec 10 11:53 super.dat
-rw------- 1 root root  796 Dec 10 11:53 parity-checks.log
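The listing above only covers the top level of /boot/config, and a directory's timestamp changes only when entries directly inside it are added, removed, or renamed. A file created or rewritten two levels down (for example under plugins/dynamix/) therefore does not bump the date shown for plugins/ here. A minimal sketch of a recursive check for recently changed files, assuming GNU find; the function name recent_config_changes is hypothetical:

```shell
# Sketch: list files under a config tree that changed in the last N minutes.
# Recurses into subdirectories, unlike "ls -lrt" on the top level.
# Assumes GNU find; the function name is hypothetical.
recent_config_changes() {
  local dir=${1:-/boot/config}   # tree to search
  local minutes=${2:-15}         # look-back window in minutes
  # -mmin -N matches files modified less than N minutes ago
  find "$dir" -type f -mmin "-$minutes" -exec ls -l {} +
}

# Example: click the Dashboard icon, then run
#   recent_config_changes /boot/config 5
```

Running it right after clicking something in the GUI narrows the search to whatever file that click actually wrote.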

 

 


I've got an 8-bay QNAP and it's working fine.  The Thecus, on the other hand, is pretty much abandoned with no software updates in years.  I took out the DOM, built a power cable, and attached my 1TB Samsung 840 EVO in its place.  Thecus was kind enough to pre-punch holes in the top to screw a 2.5" drive up there.  (see attached photo)

 

Looking at disk.cfg, I do wonder if deleting it would force me to reconfigure a boatload of stuff.  Also, I'm not seeing any line that looks related to the acknowledgement of the thumbs-down health status.  But I'll have to try it later.

 

# Generated settings:
startArray="no"
spindownDelay="1"
queueDepth="auto"
spinupGroups="yes"
defaultFormat="2"
defaultFsType="xfs"
shutdownTimeout="90"
luksKeyfile="/root/keyfile"
poll_attributes="1800"
nr_requests="Auto"
md_scheduler="auto"
md_num_stripes="1280"
md_queue_limit="80"
md_sync_limit="5"
md_write_method="auto"
diskComment.0=""
diskFsType.0="auto"
diskSpindownDelay.0="-1"
diskSpinupGroup.0=""
diskComment.1=""
diskFsType.1="xfs"
diskSpindownDelay.1="-1"
diskSpinupGroup.1=""
diskExport.1="e"
diskFruit.1="no"
diskCaseSensitive.1="auto"
diskSecurity.1="public"
diskReadList.1=""
diskWriteList.1=""
diskVolsizelimit.1=""
diskExportNFS.1="-"
diskSecurityNFS.1="public"
diskHostListNFS.1=""
diskExportAFP.1="-"
diskSecurityAFP.1="public"
diskReadListAFP.1=""
diskWriteListAFP.1=""
diskVolsizelimitAFP.1=""
diskVoldbpathAFP.1=""
diskComment.2=""
diskFsType.2="xfs"
diskSpindownDelay.2="-1"
diskSpinupGroup.2=""
diskExport.2="e"
diskFruit.2="no"
diskCaseSensitive.2="auto"
diskSecurity.2="public"
diskReadList.2=""
diskWriteList.2=""
diskVolsizelimit.2=""
diskExportNFS.2="-"
diskSecurityNFS.2="public"
diskHostListNFS.2=""
diskExportAFP.2="-"
diskSecurityAFP.2="public"
diskReadListAFP.2=""
diskWriteListAFP.2=""
diskVolsizelimitAFP.2=""
diskVoldbpathAFP.2=""
diskComment.3=""
diskFsType.3="xfs"
diskSpindownDelay.3="-1"
diskSpinupGroup.3=""
diskExport.3="e"
diskFruit.3="no"
diskCaseSensitive.3="auto"
diskSecurity.3="public"
diskReadList.3=""
diskWriteList.3=""
diskVolsizelimit.3=""
diskExportNFS.3="-"
diskSecurityNFS.3="public"
diskHostListNFS.3=""
diskExportAFP.3="-"
diskSecurityAFP.3="public"
diskReadListAFP.3=""
diskWriteListAFP.3=""
diskVolsizelimitAFP.3=""
diskVoldbpathAFP.3=""
diskComment.29=""
diskFsType.29="auto"
diskSpindownDelay.29="-1"
diskSpinupGroup.29=""
cacheId="Samsung_SSD_840_EVO_1TB_S1D9NEAD932978J"
cacheFsType="btrfs"
cacheComment=""
cacheSpindownDelay="-1"
cacheSpinupGroup="host3"
cacheUUID="b708b8bc-d59a-499e-8daa-3e3a8b6282f2"
cacheExport="e"
cacheFruit="no"
cacheCaseSensitive="auto"
cacheSecurity="public"
cacheReadList=""
cacheWriteList=""
cacheVolsizelimit=""
cacheExportNFS="-"
cacheSecurityNFS="public"
cacheHostListNFS=""
cacheExportAFP="-"
cacheSecurityAFP="public"
cacheReadListAFP=""
cacheWriteListAFP=""
cacheVolsizelimitAFP=""
cacheVoldbpathAFP=""

 

Thecus_Unraid.jpeg


You might want to take a time out at this point.  Go to Wikipedia and look up the article on S.M.A.R.T.

 

The really critical attributes that you want to be flagging for action (particularly if you are a bit paranoid!) are 196, 197 and 198.  Not all manufacturers report all three, as there is some overlap in what they report.  Attribute #5 reports a condition which has already been addressed and fixed.  (All HD manufacturers provide a number of spare sectors which can be 'mapped' in to replace a defective one.)  Attribute #5 should be monitored only to see that there is not rapid growth in the number of sectors remapped in a short period of time.  You really want to be tipped off if it jumps from 1 sector to 20 sectors, and you should probably replace the disk ASAP if it were to jump from 1 to 200.  Sitting there watching it at a count of 1 for the next 20,000 hours is not really a good idea, as it will condition you to ignore it if it were to take a BIG jump!
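The idea of watching the growth of attribute 5 rather than its absolute value can be expressed as a comparison against a baseline you have already accepted. A minimal sketch; the function name realloc_status and both delta thresholds are illustrative assumptions, not Unraid settings or vendor guidance:

```shell
# Sketch: classify growth in a SMART counter such as attribute 5
# (reallocated sectors) relative to an accepted baseline.
# Thresholds are illustrative placeholders, not vendor or Unraid values.
realloc_status() {
  local baseline=$1 current=$2
  local warn_delta=${3:-1}     # report any growth at all
  local crit_delta=${4:-100}   # arbitrary "big jump" cutoff
  local delta=$(( current - baseline ))
  if [ "$delta" -ge "$crit_delta" ]; then
    echo "large jump (+$delta): plan to replace the drive"
  elif [ "$delta" -ge "$warn_delta" ]; then
    echo "increased (+$delta): keep an eye on it"
  else
    echo "unchanged"
  fi
}
```

With these defaults, "realloc_status 1 1" prints "unchanged", "realloc_status 1 20" reports an increase of 19, and "realloc_status 1 200" falls into the large-jump case. Unraid's own notification logic may behave differently.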

 

The next thing to be very aware of is that SMART is not a tightly defined specification.  Each HD manufacturer uses the basic idea behind SMART, but each one has implemented it slightly differently.  There are a couple of people active on the board who have a much better understanding of what the raw numbers in the report really signify (in some cases, the decimal number has to be converted back to hexadecimal and then further broken down to get the actual data), and many of the attributes are meaningful only to the HD manufacturers' engineers.

Edited by Frank1940

The correct one is /boot/config/plugins/dynamix/monitor.ini

 

You need to stop the array first, then delete or modify the file at the command prompt, then type "reboot" as soon as possible.

 

The content looks like the example below; "22" (or whatever number appears there) is the SMART attribute ID in my case.

[smart]
parity.22="100"
disk6.22="100"
disk7.22="100"
disk8.22="100"
disk9.22="100"
disk10.22="100"
disk11.22="100"
disk12.22="100"
disk13.22="100"
disk14.22="100"
parity2.22="100"
parity.ack="true"
parity2.ack="true"

 

Once all are acknowledged, the content will be:

[smart]
parity.22="100"
disk6.22="100"
disk7.22="100"
disk8.22="100"
disk9.22="100"
disk10.22="100"
disk11.22="100"
disk12.22="100"
disk13.22="100"
disk14.22="100"
parity2.22="100"
parity.ack="true"
parity2.ack="true"
disk6.ack="true"
disk7.ack="true"
disk8.ack="true"
disk9.ack="true"
disk10.ack="true"
disk11.ack="true"
disk12.ack="true"
disk13.ack="true"
disk14.ack="true"
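Going by the examples above, acknowledging a disk adds one '<disk>.ack="true"' line under [smart]. Instead of deleting the whole file, a single disk's acknowledgement could be removed. A minimal sketch, assuming that key layout and GNU sed; the function name unack_disk is hypothetical, and the stop-array and reboot steps described above still apply:

```shell
# Sketch: remove one disk's acknowledgement from Dynamix's monitor.ini,
# keeping a backup. Assumes the "<disk>.ack=..." key layout shown above.
# Function name is hypothetical; stop the array first and reboot afterwards.
unack_disk() {
  local disk=$1
  local ini=${2:-/boot/config/plugins/dynamix/monitor.ini}
  if [ ! -f "$ini" ]; then
    echo "no such file: $ini" >&2
    return 1
  fi
  cp -p "$ini" "$ini.bak"            # backup copy next to the original
  # Anchored match on the exact key, so "disk1" leaves "disk10.ack" alone.
  sed -i "/^${disk}\.ack=/d" "$ini"  # GNU sed in-place edit
}

# Example: unack_disk cache
```

Removing only the ack line keeps the stored attribute values (such as cache.5) in place.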

 

22 hours ago, Sissy said:

1TB Samsung 840 EVO in its place

Nice that the Thecus has an extra SATA port for this.  My QNAP has one too, but no SATA socket is soldered on the PCB.

Edited by Vr2Io
On 12/10/2020 at 5:06 PM, Frank1940 said:

You might want to take a time out at this point.  Go to Wikipedia and look up the article on S.M.A.R.T.

You might want to back off on the condescension.  I started working with hard drives when they had physical labels that listed the bad sectors; you would type in the list when initializing the drives.  You can just see the corner of a metal computer case in the background of that photo.  It's a CP/M-80 Z80 system that I built, initially with floppies and later upgraded with a 5MB (yes, megabyte) Tandon TM501 5.25" full-height drive, interfaced to the computer with an MFM-to-SCSI adapter.  I built that computer before IBM released a PC.  I was already an embedded systems engineer at that time.  

 

Quote

The really critical attributes that you want to be flagging for action (particularly if you are a bit paranoid!) are 196, 197 and 198

The drive in question, a Samsung 840 EVO 1TB SSD (mentioned above), doesn't even report those attributes! Samsung's white paper entitled Using SMART Attributes to Estimate Drive Lifetime: Increase ROI by Measuring the SSD Lifespan in Your Workload says that "the most important indicators of drive health [on Samsung SSDs] are attributes 179, 181, 182, and 183."  I think that I'll follow Samsung's guidance -- at least the attributes they identify are actually reported by the drive.
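To check the Samsung-recommended attributes alongside attribute 5, the raw values can be pulled out of smartctl's attribute table. A minimal sketch, assuming the usual "smartctl -A" table layout for ATA drives, where the raw value starts in the tenth whitespace-separated column; smart_raw is a hypothetical helper name and /dev/sdX is a placeholder for the SSD's device node:

```shell
# Sketch: print "ID NAME RAW" for selected attribute IDs from `smartctl -A`
# output read on stdin. Assumes the standard ATA attribute table, where the
# raw value starts in the 10th column; only its first token is printed.
# smart_raw is a hypothetical helper name.
smart_raw() {
  awk -v ids="$*" '
    BEGIN { n = split(ids, want, " "); for (i = 1; i <= n; i++) keep[want[i]] = 1 }
    ($1 in keep) && NF >= 10 { print $1, $2, $10 }
  '
}

# Example (replace /dev/sdX with the SSD's device node):
#   smartctl -A /dev/sdX | smart_raw 5 179 181 182 183
```

Attributes the drive does not report are simply absent from the output, which makes it easy to see which IDs a given model actually exposes.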

 

Going forward, I will avail myself of this forum's "poll" feature should I seek community input on best-practices for running my Unraid server. 

 

Thank you for your input. 

Edited by Sissy
Proofreading - minor punctuation error
34 minutes ago, Vr2Io said:

The correct one is /boot/config/plugins/dynamix/monitor.ini

 

You need to stop the array first, then delete or modify the file at the command prompt, then type "reboot" as soon as possible.

 

The content looks like the example below; "22" (or whatever number appears there) is the SMART attribute ID in my case.

{snip}

Thank you, that's exactly what I wanted!  I really appreciate all of your effort.  I'll let you know how it goes.

 

37 minutes ago, Vr2Io said:

Nice that the Thecus has an extra SATA port for this.  My QNAP has one too, but no SATA socket is soldered on the PCB.

Which QNAP model?


I did as you directed and it worked perfectly! 

Unraid rebooted and immediately flagged the drive's health after providing a notification of the reallocated sector.

 

Here's the file before and after acknowledging the condition:

root@Unraid-N5550:/boot/config/plugins/dynamix# cat monitor.ini
[smart]
cache.5="1"
root@Unraid-N5550:/boot/config/plugins/dynamix# cat monitor.ini
[smart]
cache.5="1"
cache.ack="true"
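The before/after listing shows that the acknowledgement is just one extra line. A minimal sketch for listing every disk that currently carries such an override, assuming the same one-line-per-disk '<disk>.ack="true"' layout under [smart]; the function name list_acked_disks is hypothetical:

```shell
# Sketch: list every disk that has an acknowledgement recorded in
# Dynamix's monitor.ini, assuming one '<disk>.ack="true"' line per disk
# as in the listing above. The function name is hypothetical.
list_acked_disks() {
  local ini=${1:-/boot/config/plugins/dynamix/monitor.ini}
  # Print only the disk name from matching lines.
  sed -n 's/^\([A-Za-z0-9_]*\)\.ack="true"$/\1/p' "$ini"
}
```

Against the "after" file above, this would print just "cache".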

Thanks again.  It would have taken me a long time to find that without your help.

34 minutes ago, Vr2Io said:

TS-851, ATOM J1800, now kept as a spare.

FYR, performance dropped a lot with dual parity across 13 disks (2 parity + 11 data disks, some of them USB disks).

That's a supercomputer compared to my Thecus N5550 Unraid box! 

Atom D2550 vs. Intel Celeron J1800

 

My Thecus box is only dealing with two parity drives, three data drives, and a single cache SSD.  It pretty much saturates gigabit, but it can't do any virtualization (I just want it to be a simple NAS, though).

 

It looks like your DOM is a USB device that plugs into what seems to be a standard, two-row, dual-port USB header on the motherboard.  I guess you could put your Unraid boot drive into that header using an appropriate, USB-A-terminated cable.

 

I feel better with my Thecus DOM in an anti-static bag -- rather than plugged into the box, awaiting something stepping on it or the BIOS deciding to boot from it (think dying button cell battery).  That it gives me a SATA port for the SSD is just a plus.

 

My QNAP is a TS-853A, BTW.

 

Thanks again.

