Minor Bug Since Moving to XFS

reggierat · April 5, 2015

Last week I finished converting all of my ReiserFS drives over to XFS. Since then, one of my drives, Disk 3, will sometimes continue to report its temperature to the system long after it has been spun down. This same drive is also preventing my server from sleeping due to 'disk activity'.

[Edit]

It definitely affects other drives too, but it seems to happen more often with this one.

syslog.zip

garycase · April 5, 2015

Not sure what's going on here, but it is clearly NOT associated with changing the file system.

Post a S.M.A.R.T. report for the drive.

I would also shut the system down; unplug the drive and reseat it if it's in a caddy; or unplug the data cable and replug it if it's directly connected to a SATA port (or, better yet, use a new cable); and then boot back up and see if that resolves the issue.

Squid · April 5, 2015

With the default poll_attributes setting, beta 14b can take up to 30 minutes to properly display the temperature of a drive: http://lime-technology.com/forum/index.php?topic=38409.msg356665#msg356665

reggierat · April 5, 2015

Disk 3 attached to port: sde
Model Family:	Hitachi Deskstar 5K3000
Device Model:	Hitachi HDS5C3020ALA632
Serial Number:	ML0220F30DYRWD
LU WWN Device Id:	5 000cca 369c5e416
Firmware Version:	ML6OA580
User Capacity:	2,000,398,934,016 bytes [2.00 TB]
Sector Size:	512 bytes logical/physical
Rotation Rate:	5940 rpm
Device is:	In smartctl database [for details use: -P show]
ATA Version is:	ATA8-ACS T13/1699-D revision 4
SATA Version is:	SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:	Mon Apr 6 05:24:44 2015 AEST
SMART support is:	Available - device has SMART capability.
SMART support is:	Enabled
SMART overall-health :	PASSED

ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x000b	100	100	016	Pre-fail	Always	Never	0
2	Throughput Performance	0x0005	136	136	054	Pre-fail	Offline	Never	92
3	Spin Up Time	0x0007	135	135	024	Pre-fail	Always	Never	405 (Average 406)
4	Start Stop Count	0x0012	099	099	000	Old age	Always	Never	7915
5	Reallocated Sector Ct	0x0033	100	100	005	Pre-fail	Always	Never	0
7	Seek Error Rate	0x000b	100	100	067	Pre-fail	Always	Never	0
8	Seek Time Performance	0x0005	144	144	020	Pre-fail	Offline	Never	30
9	Power On Hours	0x0012	097	097	000	Old age	Always	Never	24314
10	Spin Retry Count	0x0013	100	100	060	Pre-fail	Always	Never	0
12	Power Cycle Count	0x0032	100	100	000	Old age	Always	Never	1350
192	Power-Off Retract Count	0x0032	094	094	000	Old age	Always	Never	7916
193	Load Cycle Count	0x0012	094	094	000	Old age	Always	Never	7916
194	Temperature Celsius	0x0002	222	222	000	Old age	Always	Never	27 (Min/Max 8/46)
196	Reallocated Event Count	0x0032	100	100	000	Old age	Always	Never	0
197	Current Pending Sector	0x0022	100	100	000	Old age	Always	Never	0
198	Offline Uncorrectable	0x0008	100	100	000	Old age	Offline	Never	0
199	UDMA CRC Error Count	0x000a	200	200	000	Old age	Always	Never	378

Even though the default polling time is 30 minutes, this drive is still reporting its temperature several hours after everything has been spun down. I will reseat the cable next.
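For anyone reading a report like the one above, here is a minimal Python sketch of how the attribute rows can be checked. It assumes the tab-separated layout pasted in this post (ID, name, flag, value, worst, threshold, type, updated, failed, raw) and the usual SMART convention that an attribute counts as failing when its normalized VALUE is at or below THRESH. The names `SmartAttr`, `parse_attr_row` and `is_failing` are made up for illustration and are not part of smartctl or unRAID.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized value
    worst: int   # worst normalized value seen
    thresh: int  # failure threshold
    raw: str     # raw value, kept as text (may contain notes)


def parse_attr_row(row: str) -> SmartAttr:
    """Parse one tab-separated attribute row in the layout pasted above."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    attr_id, name, _flag, value, worst, thresh, _type, _updated, _failed, raw = fields
    return SmartAttr(int(attr_id), name, int(value), int(worst), int(thresh), raw)


def is_failing(attr: SmartAttr) -> bool:
    """Assumed convention: failing when normalized VALUE <= THRESH."""
    return attr.value <= attr.thresh


# Three rows copied from the report in this post.
rows = [
    "5\tReallocated Sector Ct\t0x0033\t100\t100\t005\tPre-fail\tAlways\tNever\t0",
    "194\tTemperature Celsius\t0x0002\t222\t222\t000\tOld age\tAlways\tNever\t27 (Min/Max 8/46)",
    "199\tUDMA CRC Error Count\t0x000a\t200\t200\t000\tOld age\tAlways\tNever\t378",
]
attrs = [parse_attr_row(r) for r in rows]
failing = [a.name for a in attrs if is_failing(a)]
print(failing)
```

With these three rows the list of failing attributes is empty, which matches the PASSED overall-health line in the report.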

BRiT · April 5, 2015

I sort of recall having a discussion in the past how some drives are able to report temps even when spun down. This was possibly back in the version 5 beta series, so that could be anywhere from 27 to 52 months ago, since version 6 has been in beta for 15 months now and I think version 5.0 was final for a year before v6 kicked off, but I could be mistaken.

reggierat · April 5, 2015

I find it strange that this has only popped up now though. I recently replaced my two oldest drives in the array, then decided it was a good time to migrate the file systems. Other than these two events, things have been working flawlessly.

I did have some issues with the server waking from sleep but that was fixed by purchasing an Intel NIC and not using my MB's onboard Realtek adaptor.

garycase · April 5, 2015

The SMART report looks okay. I initially thought your load cycle count was high, since you indicated you'd replaced a couple drives ... but looking more closely, this is not a new drive at all (over 24,000 hours on it), so the count is fine.

Like BRiT, I have a vague memory of a discussion about this issue a year or so ago, but don't recall how (or even if) it was finally resolved. I THINK the drive with the issue was simply replaced ... but a few minutes of searching didn't help find that specific thread, so I can't be sure.

SSD · April 5, 2015

BRiT wrote: "I sort of recall having a discussion in the past how some drives are able to report temps even when spun down."

Western Digital drives (not HGST but WD branded drives) are able to report their SMART statistics without spinning up. I don't understand why other drives need to be spun up, but they do. Anyway, myMain "knows" that WD drives are special and reports temps of these drives when spun down. But I am quite certain that the unRAID GUI has never had this logic employed.

I have had issues with newly added drives to arrays not spinning down properly, which is fixed with a reboot. I would try that and expect it will resolve the issue. But if it continues, try booting in safe mode with no plugins installed or Dockers running.
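One way to check whether a drive is really spun down, independent of what the GUI shows, is `hdparm -C`, which reports the drive's current power mode (unknown, active/idle, standby or sleeping) and normally does so without waking the drive. Below is a minimal Python sketch that interprets that output. The sample text is illustrative of typical `hdparm -C` output and is not taken from the server in this thread, and the helper names `parse_drive_state` and `query_drive_state` are hypothetical.

```python
import re
import subprocess

# Matches the status line printed by `hdparm -C`, e.g. " drive state is:  standby"
STATE_RE = re.compile(r"drive state is:\s*(\S+)")


def parse_drive_state(hdparm_output: str) -> str:
    """Return the power mode string found in `hdparm -C` output."""
    match = STATE_RE.search(hdparm_output)
    if match is None:
        raise ValueError("no 'drive state is:' line found")
    return match.group(1)


def query_drive_state(device: str) -> str:
    """Run `hdparm -C` on a device such as /dev/sde (needs root)."""
    result = subprocess.run(
        ["hdparm", "-C", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_drive_state(result.stdout)


# Illustrative sample output, not captured from the poster's system.
sample = "\n/dev/sde:\n drive state is:  standby\n"
print(parse_drive_state(sample))
```

If the drive reports `active/idle` while the GUI shows it as spun down, something is keeping it awake or waking it back up; if it reports `standby` while a temperature is still displayed, the displayed value may simply be stale.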

reggierat · April 8, 2015

Hmm, I'm a bit at a loss with this. I have rebooted several times; the issue usually occurs with disk 3, but a couple of times it affected disk 2. The disk appears spun down in the GUI, but the temperature is still being reported and the s3_sleep debug log shows disk activity on the drive. Disk 3 has only one file on it, so I doubt it is actual file access.

garycase · April 8, 2015

Just had another thought -- are you running the most current Beta? There was a drive spindown issue that was, I believe, corrected in a later Beta ... don't recall exactly which versions had the bug and when it was corrected, but if you upgrade to 14b you'll at least know that's not your issue.

reggierat · April 10, 2015

I am running the latest Beta.

Apr 11 07:43:44 Tower s3_sleep: Disk activity on going: sde

Apr 11 07:43:44 Tower s3_sleep: Disk activity detected. Reset timers.

Apr 11 07:44:44 Tower s3_sleep: Disk activity on going: sde

Apr 11 07:44:44 Tower s3_sleep: Disk activity detected. Reset timers.

Is there any way to tell what this disk activity is? The drive shows as spun down while this message is being logged, but the temperature is still being reported.
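One independent way to see whether a device is actually servicing reads or writes is to compare two snapshots of `/proc/diskstats`, the standard Linux per-device I/O counter file. The sketch below is Python and only looks at the "reads completed" and "writes completed" columns. How s3_sleep itself decides there is activity is not stated in this thread, so this is a separate check and not a reproduction of its logic. The function names and the two sample snapshots are made up for illustration.

```python
from typing import Dict, Tuple


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (reads completed, writes completed)."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # major, minor, name, then at least 11 counters
        if len(fields) < 14:
            continue
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def io_delta(
    before: Dict[str, Tuple[int, int]],
    after: Dict[str, Tuple[int, int]],
    device: str,
) -> Tuple[int, int]:
    """Reads and writes completed on `device` between two snapshots."""
    reads_before, writes_before = before[device]
    reads_after, writes_after = after[device]
    return (reads_after - reads_before, writes_after - writes_before)


def read_diskstats() -> Dict[str, Tuple[int, int]]:
    """Take a live snapshot (Linux only)."""
    with open("/proc/diskstats") as handle:
        return parse_diskstats(handle.read())


# Made-up snapshots for illustration: three reads and no writes in between.
before = parse_diskstats("   8      64 sde 1200 10 96000 5000 300 5 2400 900 0 4000 5900\n")
after = parse_diskstats("   8      64 sde 1203 10 96024 5012 300 5 2400 900 0 4010 5912\n")
print(io_delta(before, after, "sde"))  # (3, 0)
```

On a live system, calling `read_diskstats()` twice about a minute apart (the interval between the s3_sleep messages above) and passing both results to `io_delta` would show whether the counters for `sde` are moving at all.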

Whether it's coincidence or not, this drive is almost totally empty, except for the one file I copied to it to make sure I could write to it OK. At what point does the mover decide to start using the drive? All shares are set to high-water, I believe.

[Edit] And for the sake of thoroughness, I have swapped my SATA cable over.

reggierat · April 11, 2015

Just had success spinning all drives up and back down again, seems to have reset something.

On a whim I have also disabled cAdvisor, as it's a recent install.
