October 16, 2016

I woke up this morning to an angry red warning popup on my unRAID GUI saying "Array health report [FAIL]". It was followed, however, by green notifications that my disk rebuild had completed, and by another report 8 hours later stating "Array health report [PASS]". (see image)

I've had a number of issues over the last 2 weeks or so, including:
- a drive failure http://lime-technology.com/forum/index.php?topic=52573.msg505213#msg505213
- the server not shutting down properly http://lime-technology.com/forum/index.php?topic=52710.msg506597#msg506597
- preclear failure http://lime-technology.com/forum/index.php?topic=39985.msg506020#msg506020

It appears, from looking at the Dashboard and Main tabs of the webGUI, that everything is running OK now. However, I'm also attaching Diagnostics.zip. Does anyone see anything in there that I need to be concerned about?

Notes:
- there are a number of dockers listed in my .sig; however, only CrashPlan, SAB & SB are running.
- yes, I'm aware the email is not being sent via Gmail. I've got a separate issue opened on that. I'll probably have to change my password to get it working again.
- yes, I'm rather low on disk space, with most drives under 1GB free space.

nas-diagnostics-20161016-0849.zip

October 17, 2016

Go to Tools -> Archived Notifications and open the health report; it will show you the reason for the failure in more detail.

October 17, 2016 (Author)

Thank you sir! Here's what I found:

2016-10-16 00:20 unRAID Status Notice [NAS] - array health report [FAIL]
Array has 13 disks (including parity & cache)
alert
Parity - ST4000DM000-1F2168_Z301F9LZ (sdc) - active 34 C [OK]
Disk 1 - WDC_WD10EARS-00Y5B1_WD-WCAV5U070002 (sdn) - active 33 C [OK]
Disk 2 - ST32000542AS_5XW27ZVJ (sdf) - active 33 C [OK]
Disk 3 - Hitachi_HDT721010SLA360_STF604MH0EN8XB (sdg) - active 37 C [OK]
Disk 4 - ST32000542AS_5XW2EWWX (sdd) - active 34 C [OK]
Disk 5 - HGST_HDN724040ALE640_PK1338P4GT2D9B (sdb) - active 41 C [OK]
Disk 6 - WDC_WD20EARX-00PASB0_WD-WCAZAJ031431 (sdi) - active 33 C [OK]
Disk 7 - ST2000DM001-1CH164_W2418WLL (sdj) - active 36 C [OK]
Disk 8 - WDC_WD20EFRX-68AX9N0_WD-WMC1T0990614 (sdk) - active 32 C [OK]
Disk 9 - ST1000DM003-1CH162_S1DE5H8K (sdl) - active 34 C [OK]
Disk 10 - WDC_WD30EZRX-00D8PB0_WD-WMC4N0533589 (sde) - active 32 C [OK]
Disk 11 - HGST_HMS5C4040ALE640_PL2331LAG6W5WJ (sdh) - active 35 C [DISK INVALID]
Cache - SAMSUNG_HM251HI_S237JDRZ912002 (sdm) - active 28 C [OK]
Parity sync / Data rebuild in progress.
Total size: 4 TB
Elapsed time: 11 hours, 19 minutes
Current position: 2.40 TB (60.0 %)
Estimated speed: 59.2 MB/sec
Estimated finish: 7 hours, 30 minutes
Sync errors corrected: 0

Then there was an earlier error which I don't remember seeing at all:

2016-10-15 16:20 unRAID Status Notice [NAS] - array health report [FAIL]
Array has 13 disks (including parity & cache)
alert
Parity - ST4000DM000-1F2168_Z301F9LZ (sdc) - active 35 C [OK]
Disk 1 - WDC_WD10EARS-00Y5B1_WD-WCAV5U070002 (sdn) - active 35 C [OK]
Disk 2 - ST32000542AS_5XW27ZVJ (sdf) - active 34 C [OK]
Disk 3 - Hitachi_HDT721010SLA360_STF604MH0EN8XB (sdg) - active 39 C [OK]
Disk 4 - ST32000542AS_5XW2EWWX (sdd) - active 36 C [OK]
Disk 5 - HGST_HDN724040ALE640_PK1338P4GT2D9B (sdb) - active 41 C [OK]
Disk 6 - WDC_WD20EARX-00PASB0_WD-WCAZAJ031431 (sdi) - active 35 C [OK]
Disk 7 - ST2000DM001-1CH164_W2418WLL (sdj) - active 38 C [OK]
Disk 8 - WDC_WD20EFRX-68AX9N0_WD-WMC1T0990614 (sdk) - active 34 C [OK]
Disk 9 - ST1000DM003-1CH162_S1DE5H8K (sdl) - active 35 C [OK]
Disk 10 - WDC_WD30EZRX-00D8PB0_WD-WMC4N0533589 (sde) - active 32 C [OK]
Disk 11 - HGST_HMS5C4040ALE640_PL2331LAG6W5WJ (sdh) - active 35 C [DISK INVALID]
Cache - SAMSUNG_HM251HI_S237JDRZ912002 (sdm) - active 28 C [OK]
Parity sync / Data rebuild in progress.
Total size: 4 TB
Elapsed time: 3 hours, 19 minutes
Current position: 605 GB (15.1 %)
Estimated speed: 76.5 MB/sec
Estimated finish: 12 hours, 20 minutes
Sync errors corrected: 0

The key line from both of them seems to be:

Disk 11 - HGST_HMS5C4040ALE640_PL2331LAG6W5WJ (sdh) - active 35 C [DISK INVALID]

That seems to me to raise unnecessary concern by popping up a red [FAIL] message when it obviously knows there's a data rebuild in progress. Might it be more reasonable to make that a yellow/orange [WARNING] instead?

Also, it does appear that this is the only thing of concern in the message, so at this point, since the data rebuild is complete, I don't need to worry about anything else. Is that correct?
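
For anyone who wants to pick the problem line out of a long report like the ones above without reading it by eye, here is a minimal sketch in Python. It assumes the archived notification text has one disk per line in the format quoted in this post; the regular expression and the `non_ok_disks` helper are illustrative names, not part of unRAID.

```python
import re

# Assumed line format, taken from the reports quoted in this thread:
#   "<slot> - <disk id> (<device>) - <state> <temp> C [<status>]"
# An optional note such as "(disk is hot)" may precede the status.
LINE_RE = re.compile(
    r"^(?P<name>.+?) - (?P<ident>\S+) \((?P<dev>\w+)\) - "
    r"(?P<state>\w+) (?P<temp>\d+) C "
    r"(?:\((?P<note>[^)]*)\) )?\[(?P<status>[^\]]+)\]$"
)


def non_ok_disks(report: str) -> list[dict]:
    """Return one dict per disk line whose status is anything other than OK."""
    problems = []
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("status") != "OK":
            problems.append(match.groupdict())
    return problems


if __name__ == "__main__":
    sample = "\n".join([
        "Parity - ST4000DM000-1F2168_Z301F9LZ (sdc) - active 34 C [OK]",
        "Disk 11 - HGST_HMS5C4040ALE640_PL2331LAG6W5WJ (sdh) - active 35 C [DISK INVALID]",
    ])
    for disk in non_ok_disks(sample):
        print(disk["name"], disk["dev"], disk["status"])
```

Lines that do not look like disk status lines (the header, the rebuild progress figures) simply fail to match and are skipped.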

October 17, 2016

Yeah, nothing to worry about. When the rebuild is complete all should be fine (this is confirmed by the later green notifications). The implementation right now is OK or FAIL; in the latter case no further investigation is done into what caused the failure, and everything is classified as an alert.
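
To illustrate the three-level idea suggested earlier in the thread, here is a rough Python sketch. It is purely hypothetical and not how unRAID implements the health report: `Severity`, `classify_report`, and the rule that an invalid disk during a rebuild only counts as a warning are all assumptions made for the example.

```python
from enum import Enum


class Severity(Enum):
    PASS = "PASS"
    WARNING = "WARNING"
    FAIL = "FAIL"


def classify_report(statuses: list[str], rebuild_in_progress: bool) -> Severity:
    """Classify a report from per-disk status strings such as "OK" or "DISK INVALID".

    Assumed rule: if a rebuild is running and the only non-OK status is
    "DISK INVALID", downgrade the result from FAIL to WARNING.
    """
    bad = [status for status in statuses if status != "OK"]
    if not bad:
        return Severity.PASS
    if rebuild_in_progress and all(status == "DISK INVALID" for status in bad):
        return Severity.WARNING
    return Severity.FAIL


if __name__ == "__main__":
    # Twelve healthy disks plus the disk being rebuilt, as in the reports above.
    statuses = ["OK"] * 12 + ["DISK INVALID"]
    print(classify_report(statuses, rebuild_in_progress=True).value)
```

With this rule, the same invalid disk outside a rebuild would still be reported as FAIL.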

October 26, 2016

I had the Array health report [FAIL] notice too, but it's in orange. I looked in the Archived Notifications and found the cause:

Cache - TOSHIBA_THNSNH256GMCT_X3ES100STOMY (sdf) - active 46 C (disk is hot) [NOK]
Cache 2 - TOSHIBA_THNSNH256GMCT_X3ES1013TOMY (sdg) - active 47 C (disk is hot) [NOK]

However, these are mSATA SSDs and I know they run hot, so I've set their individual warning and critical temperatures to 55 and 60 degrees, so I shouldn't get this warning.

October 27, 2016

Right, the array health report doesn't take individual disk temperature thresholds into account. Need to change that.
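
As a sketch of what honouring per-disk thresholds could look like, here is a small Python example. It is only an illustration under stated assumptions, not the actual unRAID code: the `Disk` class, `temperature_status`, and the global defaults of 45 and 55 degrees are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed global defaults, used only when a disk has no thresholds of its own.
GLOBAL_WARNING_C = 45
GLOBAL_CRITICAL_C = 55


@dataclass
class Disk:
    name: str
    temp_c: int
    warning_c: Optional[int] = None   # per-disk override, if set
    critical_c: Optional[int] = None  # per-disk override, if set


def temperature_status(disk: Disk) -> str:
    """Return "ok", "hot" or "critical", preferring the disk's own thresholds."""
    warning = disk.warning_c if disk.warning_c is not None else GLOBAL_WARNING_C
    critical = disk.critical_c if disk.critical_c is not None else GLOBAL_CRITICAL_C
    if disk.temp_c >= critical:
        return "critical"
    if disk.temp_c >= warning:
        return "hot"
    return "ok"


if __name__ == "__main__":
    # The two cache SSDs from the post above, with the 55/60 overrides applied.
    for disk in (Disk("Cache", 46, 55, 60), Disk("Cache 2", 47, 55, 60)):
        print(disk.name, temperature_status(disk))
```

The point of the design is the fallback order: a per-disk value wins whenever it is set, and the global value applies only otherwise.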