limetech

unRAID Server Release 6.0-beta15-x86_64 Available



Sounds good, thanks for the update. I will wait for the new release.

John, I have removed all plugins (moved them to Docker) except snap, and my unRAID b15 still crashes after about two days. I think this is Docker- or KVM-related. I am eager to move back to b12 for the sake of stability. If I can expect b16 within the next week or so, I would like to give that a try before going back.

jonp - any updates on b16 with the docker fixes?

 

Soon.

 

^^^^^

TRUTH.

 

Putting it through its paces along with a number of other fixes before we release.

 

I wouldn't expect it to be more than a week or so, but crazy things can and do happen.

 


I noticed a few posts back that someone else is getting disk reports similar to mine, which I'll post below. I too have just filled a data disk, though the reports below are for the SSD cache. These failures cause long freezes during most file operations. I'm also curious why the SATA link to the SSD is limited to 1.5 Gbps when it should run at 3 or even 6 Gbps. It is a Samsung 840.

 

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:58:a8:59:22/00:00:00:00:00/40 tag 11 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

May 10 00:00:06 Tower kernel: ata4.00: status: { DRDY ERR }

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:60:80:31:10/00:00:00:00:00/40 tag 12 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

May 10 00:00:06 Tower kernel: ata4.00: status: { DRDY ERR }

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:68:90:c3:18/00:00:00:00:00/40 tag 13 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

May 10 00:00:06 Tower kernel: ata4.00: status: { DRDY ERR }

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:70:90:32:10/00:00:00:00:00/40 tag 14 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

May 10 00:00:06 Tower kernel: ata4.00: status: { DRDY ERR }

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:78:80:24:10/00:00:00:00:00/40 tag 15 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

May 10 00:00:06 Tower kernel: ata4.00: status: { DRDY ERR }

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:80:50:03:10/00:00:00:00:00/40 tag 16 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

May 10 00:00:06 Tower kernel: ata4.00: status: { DRDY ERR }

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:88:d0:1a:10/00:00:00:00:00/40 tag 17 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

May 10 00:00:06 Tower kernel: ata4.00: status: { DRDY ERR }

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

May 10 00:00:06 Tower kernel: ata4: hard resetting link

May 10 00:00:06 Tower kernel: ata4: nv: skipping hardreset on occupied port

May 10 00:00:07 Tower kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

May 10 00:00:07 Tower kernel: ata4.00: configured for UDMA/133

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 10 19 b0 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1055152

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 10 3d 28 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1064232

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 10 25 28 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1058088

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 10 24 d8 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1058008

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 04 2d e8 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 273896

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 18 f4 40 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1635392

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 22 59 a8 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 2251176

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 10 31 80 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1061248

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 18 c3 90 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1622928

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] UNKNOWN Result: hostbyte=0x00 driverbyte=0x08

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] Sense Key : 0xb [current] [descriptor]

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] ASC=0x47 ASCQ=0x0

May 10 00:00:07 Tower kernel: sd 4:0:0:0: [sde] CDB:

May 10 00:00:07 Tower kernel: cdb[0]=0x28: 28 00 00 10 32 90 00 00 08 00

May 10 00:00:07 Tower kernel: blk_update_request: I/O error, dev sde, sector 1061520

May 10 00:00:07 Tower kernel: ata4: EH complete

May 10 00:00:07 Tower kernel: ata4.00: Enabling discard_zeroes_data
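For the link-speed question: the `SStatus 113 SControl 300` values in the link-up line are hex register dumps. A minimal Python sketch, assuming the standard SATA field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8), decodes them. SStatus SPD=1 confirms the link really negotiated Gen1 (1.5 Gbps), while SControl SPD=0 means that register is not imposing a cap. The regex and function names are illustrative, not part of unRAID or libata.

```python
import re

# SATA SStatus / SControl field layout (low 12 bits):
#   bits 3:0  DET - device detection / PHY state
#   bits 7:4  SPD - negotiated speed (SStatus) or speed limit (SControl)
#   bits 11:8 IPM - interface power management
SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

# Matches libata's link-up message; the two register values are printed in hex.
LINK_UP = re.compile(
    r"(ata\d+): SATA link up ([\d.]+ Gbps) \(SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)\)"
)


def decode_sstatus(hex_text):
    """Split an SStatus value such as '113' into its DET/SPD/IPM fields."""
    value = int(hex_text, 16)
    det, spd, ipm = value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF
    return {
        "link_established": det == 3,  # DET=3: device present, PHY communication established
        "speed": SPEED.get(spd, "not negotiated"),
        "ipm": ipm,  # 1 = active, 2 = partial, 6 = slumber
    }


def scontrol_speed_limit(hex_text):
    """SControl SPD=0 means the register imposes no speed limit."""
    spd = (int(hex_text, 16) >> 4) & 0xF
    return "no limit" if spd == 0 else "limited to " + SPEED.get(spd, "unknown")


line = "May 10 00:00:07 Tower kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)"
port, reported, sstatus, scontrol = LINK_UP.search(line).groups()
print(port, reported, decode_sstatus(sstatus), scontrol_speed_limit(scontrol))
```

This only reads the values; it does not say why the link came up at Gen1.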

May 10 00:00:06 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED

May 10 00:00:06 Tower kernel: ata4.00: cmd 60/08:88:d0:1a:10/00:00:00:00:00/40 tag 17 ncq 4096 in

May 10 00:00:06 Tower kernel:        res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

These errors are consistent with a bad or loose SATA cable, or with power problems.


@rjstott

 

You might want to have a look at:

http://www.techspot.com/article/997-samsung-ssd-read-performance-degradation/

 

Maybe not relevant to the problems you are having, and maybe you won't be affected (cache disk usage is transient), but if you are holding VM/container files or any other static files on the cache, you would be affected.

Interesting article. Sounds like I need to pull out the Samsung SSD or face performance problems at some point. I do not think this is the cause of our current problem (rjstott's and mine), as I reformatted my SSD and re-copied all the data to it, and the errors continued immediately.

 

I'm also not certain it's the SATA cable or power (I'm not ruling it out, however). I did replace the SATA cable and even changed the SATA port it was connected to. Power seems stable (it's in a Supermicro 24-drive server with dual power supplies) and I have no other power problems. In addition, this problem seems to have started with beta 15. It IS making the entire server pause while it tries to reset the SATA port.

 

If this problem is limited to Samsung SSDs used as cache drives, then I'll just pull mine, but since the problem seems to have started with a release, is it possible this is a driver or kernel problem?

 

 

May 10 00:00:06 Tower kernel: ata4.00: error: { ICRC ABRT }

 

It's the BadCRC and ICRC error flags that specifically indicate corrupted packets, usually from a bad SATA cable.  Since you have repeated ICRC error flags, which cause the pauses and resets, and cause the SATA link speed to be slowed down to hopefully improve communications integrity, I suspect you also have an increased UDMA_CRC_Error_Count on the SMART report for that drive.  I know you said you replaced the SATA cable, but it doesn't look like a good cable from here.  There's still a small chance that it may be a bad power situation instead.
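To make the flag reading concrete: in the `res 41/84:...` field libata prints, the byte before the slash is the drive's status register and the byte after it is the error register, and the `status:`/`error:` lines are those bits spelled out. A minimal Python sketch, using the ATA bit values as libata names them, shows that 0x41 is DRDY+ERR and 0x84 is ICRC+ABRT. It covers only the common bits; the names `decode_res` and `set_flags` are hypothetical.

```python
import re

# ATA status-register bits as libata names them in "status: { ... }".
STATUS_BITS = [(0x80, "BUSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
# ATA error-register bits as libata names them in "error: { ... }".
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT"), (0x01, "AMNF")]

# In "res SS/EE:...", SS is the status register and EE the error register.
RES = re.compile(r"\bres ([0-9a-f]{2})/([0-9a-f]{2}):")


def set_flags(value, table):
    """Return the names of the bits in `table` that are set in `value`."""
    return [name for bit, name in table if value & bit]


def decode_res(line):
    """Decode the status/error bytes of a libata 'res' line, or return None."""
    m = RES.search(line)
    if m is None:
        return None
    status, error = int(m.group(1), 16), int(m.group(2), 16)
    return set_flags(status, STATUS_BITS), set_flags(error, ERROR_BITS)


print(decode_res("res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)"))
print(decode_res("res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)"))
```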


One more interesting observation:

 

I have "force NCQ disabled=yes" on the disk configuration screen.  Yet it appears (maybe I'm looking the wrong way?) that NCQ is still enabled for all my drives, including this cache drive that's having the problems.

 

If I

cat /sys/block/sdc/device/queue_depth

it reports a value of 31, which indicates NCQ is in play if I understand this correctly (I believe it should report 0 or 1 if NCQ is disabled?)

 

Now, if I change the queue_depth to 1 with

echo 1 >/sys/block/sdc/device/queue_depth

it appears that my errors with this SSD no longer occur (based on a quick test: set to 1, copy a large file to the SSD, no errors; set back to 31, recopy the same file, errors occur).

 

Am I understanding this right?

 

 

I thought about looking at this because of this: https://bugzilla.kernel.org/show_bug.cgi?id=89261

and this document which explains how to dynamically change ncq settings: https://exemen.wordpress.com/2011/05/16/enabling-disabling-and-checking-ncq/
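The check-and-change steps above, as a minimal Python sketch. It assumes the same `/sys/block/<dev>/device/queue_depth` attribute used in the post and treats a depth of 1 as NCQ effectively off (libata stops issuing queued commands at that depth). The demo writes to a temporary directory that mimics sysfs; pointing `sysfs_root` at `/sys` would change the real setting and requires root.

```python
import tempfile
from pathlib import Path


def queue_depth_file(dev, sysfs_root="/sys"):
    """Path of the queue_depth attribute, e.g. /sys/block/sdc/device/queue_depth."""
    return Path(sysfs_root) / "block" / dev / "device" / "queue_depth"


def read_queue_depth(dev, sysfs_root="/sys"):
    return int(queue_depth_file(dev, sysfs_root).read_text().strip())


def ncq_in_use(depth):
    """With a depth of 1 only one command is outstanding, so NCQ is effectively off."""
    return depth > 1


def set_queue_depth(dev, depth, sysfs_root="/sys"):
    """Same effect as `echo 1 >/sys/block/sdc/device/queue_depth` (root needed on a real box)."""
    queue_depth_file(dev, sysfs_root).write_text(f"{depth}\n")


# Demo against a throwaway directory that mimics sysfs, so nothing real is touched.
fake_root = tempfile.mkdtemp()
fake_file = queue_depth_file("sdc", fake_root)
fake_file.parent.mkdir(parents=True)
fake_file.write_text("31\n")

before = read_queue_depth("sdc", fake_root)
set_queue_depth("sdc", 1, fake_root)
after = read_queue_depth("sdc", fake_root)
print(before, ncq_in_use(before), after, ncq_in_use(after))  # 31 True 1 False
```

Like the `echo` command, the change does not survive a reboot.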

 

 


Rob: 

My UDMA_CRC_Error_Count is 2, so it does not seem to be CRC errors from the drive's perspective.

I'm still happy to try another SATA cable if that still makes sense now that the errors seem to have stopped (I'm going to monitor for a couple days before confirming) with the NCQ setting change.
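SMART attribute 199 (UDMA_CRC_Error_Count) is a cumulative lifetime counter, so a low number such as 2 matters less than whether it climbs while the errors are being logged. A minimal Python sketch that pulls its raw value out of `smartctl -A` output. It matches on ID 199 rather than the name because smartctl may label that attribute differently for some drives. The sample table row is illustrative, and `read_crc_error_count` assumes smartmontools is installed.

```python
import subprocess


def crc_error_count(smartctl_attributes):
    """Raw value of SMART attribute 199 from `smartctl -A` output, or None if absent."""
    for line in smartctl_attributes.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the tenth column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def read_crc_error_count(device):
    """Run smartctl (needs smartmontools and root) and extract attribute 199."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return crc_error_count(result.stdout)


# Illustrative row only, shaped like smartctl's attribute table.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2
"""
print(crc_error_count(sample))  # 2
```

Recording the value before and after a cable swap shows whether the new cable stopped the count from rising.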

 

Also, it's possible that rjstott's problem *is* a cable or power issue, since when I look back at my log, I see a different message before the failed command. I see:

May  9 23:14:43 Tower kernel: ata16.00: cmd 61/00:50:e0:4f:b1/38:00:03:00:00/40 tag 10 ncq 7340032 out
May  9 23:14:43 Tower kernel:         res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)
May  9 23:14:43 Tower kernel: ata16.00: status: { DRDY }
May  9 23:14:43 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED

I do not see the CRC error message you pointed out. I don't know what DRDY means, but could it mean that we're overflowing the buffers sending data to the drive? (That would also explain the out-of-IOMMU-space error I reported earlier, I assume.)

 

 

 


That IS interesting!  By default NCQ should be off for all drives.  I remember long ago when we were testing this, we found that in almost every case, performance was better with NCQ off!  So Tom added a loop to turn it off for all drives whenever the array is started, but left it in as an option for anyone to change if they desired.  If yours is on, there may be a bug involved.
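The thread does not show the loop Tom added, so this is only a hypothetical Python sketch of what an "NCQ off for every drive at array start" step could look like, using the same sysfs `queue_depth` attribute. It returns the drives it changed, which is one way to spot a drive that still reports 31. The demo runs against a fake sysfs tree in a temporary directory.

```python
import tempfile
from pathlib import Path


def disable_ncq_all(sysfs_root="/sys"):
    """Set queue_depth to 1 on every sd* disk that exposes it; return {dev: old_depth}."""
    changed = {}
    for qd_file in sorted(Path(sysfs_root, "block").glob("sd*/device/queue_depth")):
        dev = qd_file.parent.parent.name  # .../block/<dev>/device/queue_depth
        old = int(qd_file.read_text().strip())
        if old != 1:
            qd_file.write_text("1\n")
            changed[dev] = old
    return changed


# Demo on a fake sysfs tree: sda and sdc have NCQ on, sdb is already at 1.
fake_root = tempfile.mkdtemp()
for dev, depth in [("sda", 31), ("sdb", 1), ("sdc", 31)]:
    f = Path(fake_root, "block", dev, "device", "queue_depth")
    f.parent.mkdir(parents=True)
    f.write_text(f"{depth}\n")

result = disable_ncq_all(fake_root)
print(result)  # {'sda': 31, 'sdc': 31}
```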


Also, it's possible that rjstott's problem *is* a cable or power issue

Yes, I was referring to the ICRC in rjstott's syslog extract.

 

when I look back at my log, I see a different message before the failed command...I see:

May  9 23:14:43 Tower kernel: ata16.00: cmd 61/00:50:e0:4f:b1/38:00:03:00:00/40 tag 10 ncq 7340032 out
May  9 23:14:43 Tower kernel:         res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)
May  9 23:14:43 Tower kernel: ata16.00: status: { DRDY }
May  9 23:14:43 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED

I do not see the CRC error message you pointed out. I don't know what DRDY means, but could it mean that we're overflowing the buffers sending data to the drive? (That would also explain the out-of-IOMMU-space error I reported earlier, I assume.)

DRDY just means 'Drive ReaDY', a good flag.  The important part of your exception is the 'internal error' message unfortunately.  When a programmer maps something odd or unexpected to 'internal error', it usually means they either don't expect it to happen, or don't want to deal with it, or don't know how to deal with it.  There's no further information available, so you are kind of stuck!  Any firmware updates available, for that disk controller?
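The `Emask` value only gives the class of failure, which is why there is nothing more to go on. A minimal Python sketch, modeled on how libata's `ata_err_string()` picks the description printed in parentheses, shows that 0x10 is AC_ERR_ATA_BUS and 0x40 is AC_ERR_SYSTEM, the "internal error" class. It covers a subset of the AC_ERR_* bits, and the check order is believed to match libata; verify against your kernel source. Function names are illustrative.

```python
import re

# libata AC_ERR_* bits, in the order libata checks them when it picks the
# single description printed after "Emask" (modeled on ata_err_string()).
EMASK_DESCRIPTIONS = [
    (0x20, "host bus error"),    # AC_ERR_HOST_BUS
    (0x10, "ATA bus error"),     # AC_ERR_ATA_BUS
    (0x04, "timeout"),           # AC_ERR_TIMEOUT
    (0x02, "HSM violation"),     # AC_ERR_HSM
    (0x40, "internal error"),    # AC_ERR_SYSTEM
    (0x08, "media error"),       # AC_ERR_MEDIA
    (0x80, "invalid argument"),  # AC_ERR_INVALID
    (0x01, "device error"),      # AC_ERR_DEV
]

EMASK = re.compile(r"Emask (0x[0-9a-f]+)")


def describe_emask(emask):
    """Return the description for the highest-priority bit set in `emask`."""
    for bit, text in EMASK_DESCRIPTIONS:
        if emask & bit:
            return text
    return "unknown error"


def emask_of(line):
    """Extract the Emask value from a libata 'res' line, or None."""
    m = EMASK.search(line)
    return int(m.group(1), 16) if m else None


for line in ["res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)",
             "res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)"]:
    print(hex(emask_of(line)), describe_emask(emask_of(line)))
```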


Thanks robj:

It makes more sense to me now.

 

I'll look into controller firmware. I'm using the SATA controller on the motherboard (it's a Supermicro) for this drive. The drive itself has the latest firmware.

 

I checked the NCQ settings on several of my drives and they're all set to 31, so it looks like turning NCQ off for all drives isn't working (at least in my setup). So it's either a bug or something strange with my setup. I'll look closer tomorrow.

 

I'll also keep watching to see whether the error messages recur now that the SSD is set to a queue depth of 1. I suspect this is the bug alluded to in the bugzilla report (it doesn't look like it's been fixed by the Linux kernel developers yet).

 

Thanks again!

David

 

 


I didn't mention that I also have continuous background disk read activity averaging 256K/s, with associated CPU use, that I don't understand. I don't run Docker or VMs on this box, but it does have Plex. I stopped Plex, but that didn't stop the background activity. I ran SMART tests on all drives and checked the error logs, and all passed OK. Strangely, the SSD cache drive doesn't show any logged errors. Overnight there was an error report about every 30 minutes, though the interval varies between 10 and 40 minutes. Apart from the disk operation timeouts, reading and writing to the array seems OK. I've attached a snapshot of the activity; Plex is stopped!

unRaid_Stats.jpg


Does the disk activity include the main root system? If so, that's just your system running, as it uses /var/log, /etc, /usr/local, /proc, /tmp and all the other typical Unix directories of a running system.

 

Also do you have cache dirs enabled?
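To find which device is producing a steady read rate like the 256K/s above, the per-device counters in `/proc/diskstats` can be sampled twice. A minimal Python sketch, assuming the standard diskstats layout where the sixth column is sectors read, always counted in 512-byte units. The counter values in the worked example are made up.

```python
import time


def sectors_read(diskstats_text):
    """Map device name -> cumulative sectors read, from /proc/diskstats text."""
    counts = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, reads completed, reads merged, sectors read, ...
        if len(fields) >= 6:
            counts[fields[2]] = int(fields[5])
    return counts


def read_rates_kib(before, after, seconds):
    """KiB/s read per device; diskstats always counts 512-byte sectors."""
    return {dev: (after[dev] - before[dev]) * 512 / 1024 / seconds
            for dev in after if dev in before}


def sample_read_rates(interval=5.0, path="/proc/diskstats"):
    """Read the counters twice, `interval` seconds apart, and return KiB/s per device."""
    with open(path) as fh:
        first = sectors_read(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = sectors_read(fh.read())
    return read_rates_kib(first, second, interval)


# Worked example with made-up counters: 2560 sectors in 5 s is 256 KiB/s.
t0 = sectors_read("   8      16 sdb 100 0 1000 0 0 0 0 0 0 0 0")
t1 = sectors_read("   8      16 sdb 420 0 3560 0 0 0 0 0 0 0 0")
print(read_rates_kib(t0, t1, 5.0))  # {'sdb': 256.0}
```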


Yes, I was running directory caching, and when I disabled it the disk activity stopped. Perhaps I need to read up on the level of caching to use? As for the disk error saga, I swapped a pair of SATA cables and the error moved. So I replaced what seemed to be the faulty cable, and now I've got similar errors on a drive I didn't touch? I plan to leave directory caching off for a while and see if things settle down. At least the drive reporting errors isn't the cache drive, and changes to other drives are much less frequent.

 

Thanks to everyone for their input.


Yes I was running directory caching and when I disabled it the disk activity stopped. Perhaps I need to read up about the level of caching to use?

What are your CacheDirs settings?

 

So on the disk error saga I swapped around a pair of Sata cables and the error moved. OK so I changed out what seemed to be the faulty cable and now I've got similar errors on a drive I didn't touch?

Users often think exception handler messaging looks very similar, but most of it is just the 'envelope'.  The important part is the error message in parentheses and the error flags, which may only be a twentieth of the whole verbiage, yet can indicate completely different errors and which subsystem is involved.  How similar were the errors?
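Following the point that only the parenthesized reason and the error flags really distinguish one exception from another, a minimal Python sketch can strip the envelope and count just those fields per port. Because the `res ... Emask` line carries no port number, it remembers the last `ataN` seen. The regexes are assumptions fitted to the log lines quoted in this thread, not a complete libata log parser.

```python
import re
from collections import Counter

PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")            # "ata4:" or "ata4.00:"
REASON = re.compile(r"Emask 0x[0-9a-f]+ \(([^)]*)\)")  # text in parentheses
FLAGS = re.compile(r"\berror: \{ ([^}]*)\}")           # "error: { ICRC ABRT }"


def summarize(lines):
    """Count (port, kind, value) for the parts of libata exceptions that differ."""
    counts = Counter()
    port = None
    for line in lines:
        m = PORT.search(line)
        if m:
            port = m.group(1)  # "res" lines carry no port, so remember the last one seen
        if port is None:
            continue
        m = REASON.search(line)
        if m:
            counts[(port, "reason", m.group(1))] += 1
        m = FLAGS.search(line)
        if m:
            counts[(port, "flags", m.group(1).strip())] += 1
    return counts


sample_lines = [
    "ata4.00: failed command: READ FPDMA QUEUED",
    "       res 41/84:38:28:25:10/84:00:00:00:00/40 Emask 0x10 (ATA bus error)",
    "ata4.00: status: { DRDY ERR }",
    "ata4.00: error: { ICRC ABRT }",
    "ata16.00: cmd 61/00:50:e0:4f:b1/38:00:03:00:00/40 tag 10 ncq 7340032 out",
    "        res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)",
    "ata16.00: status: { DRDY }",
]
counts = summarize(sample_lines)
for key, n in sorted(counts.items()):
    print(key, n)
```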


Here's the latest bit of log:

 

May 10 19:57:31 Tower kernel: res 41/84:00:c8:f2:2c/84:00:05:00:00/40 Emask 0x10 (ATA bus error)

May 10 19:57:31 Tower kernel: ata5.00: status: { DRDY ERR }

May 10 19:57:31 Tower kernel: ata5.00: error: { ICRC ABRT }

May 10 19:57:31 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

May 10 19:57:31 Tower kernel: ata5.00: cmd 60/20:10:70:f5:2c/00:00:05:00:00/40 tag 2 ncq 16384 in

May 10 19:57:31 Tower kernel: res 41/84:00:c8:f2:2c/84:00:05:00:00/40 Emask 0x10 (ATA bus error)

May 10 19:57:31 Tower kernel: ata5.00: status: { DRDY ERR }

May 10 19:57:31 Tower kernel: ata5.00: error: { ICRC ABRT }

May 10 19:57:31 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

May 10 19:57:31 Tower kernel: ata5.00: cmd 60/10:18:f8:f9:2c/00:00:05:00:00/40 tag 3 ncq 8192 in

May 10 19:57:31 Tower kernel: res 41/84:00:c8:f2:2c/84:00:05:00:00/40 Emask 0x10 (ATA bus error)

May 10 19:57:31 Tower kernel: ata5.00: status: { DRDY ERR }

May 10 19:57:31 Tower kernel: ata5.00: error: { ICRC ABRT }

May 10 19:57:31 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

May 10 19:57:31 Tower kernel: ata5.00: cmd 60/a8:e8:c0:96:2c/00:00:05:00:00/40 tag 29 ncq 86016 in

May 10 19:57:31 Tower kernel: res 41/84:00:c8:f2:2c/84:00:05:00:00/40 Emask 0x10 (ATA bus error)

May 10 19:57:31 Tower kernel: ata5.00: status: { DRDY ERR }

May 10 19:57:31 Tower kernel: ata5.00: error: { ICRC ABRT }

May 10 19:57:31 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED

May 10 19:57:31 Tower kernel: ata5.00: cmd 60/18:f0:90:98:2c/00:00:05:00:00/40 tag 30 ncq 12288 in

May 10 19:57:31 Tower kernel: res 41/84:00:c8:f2:2c/84:00:05:00:00/40 Emask 0x10 (ATA bus error)

May 10 19:57:31 Tower kernel: ata5.00: status: { DRDY ERR }

May 10 19:57:31 Tower kernel: ata5.00: error: { ICRC ABRT }

May 10 19:57:31 Tower kernel: ata5: hard resetting link

May 10 19:57:31 Tower kernel: ata5: nv: skipping hardreset on occupied port

May 10 19:57:31 Tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

May 10 19:57:31 Tower kernel: ata5.00: configured for UDMA/133

May 10 19:57:31 Tower kernel: ata5: EH complete

May 10 19:57:31 Tower kernel: ata5.00: Enabling discard_zeroes_data

 

Looks very similar to me, except we're now faulting ata5, not ata4! Could this be a motherboard problem? The board is an ASUS M2NPV-VM with a Phenom X4 CPU. It has just 4 SATA ports, plus legacy PATA (not used). When I first swapped the cable the problem moved to ata6, and if you look back, the cache was ata4.

 

 

The Directory Cache settings are all default as when I read this up before there seemed to be no need to constrain anything?

 

 


Looks very similar to me. Except we're now faulting ata5 not ata4! Could this be a MB problem? The board is an Asus m2npv-vm with a Phenom X4 cpu. There are just 4 SATA ports and it has legacy PATA (not used). When I first swapped the cable the problem became ata6 and if you looked back the cache was ata4.

That is the same error, a bad cable.  It is not necessarily a different drive, because linux initialization is dynamic and multi-threaded, and different controllers and devices are often setup at different times on each boot.  That could certainly be the same drive, you would have to check the drive setup part of the syslog.  Put another way, the same drive could be associated with ata4 on one boot, and ata15 on another.  This is why unRAID identifies drives by their serial numbers, because all of their device ID's and symbols can be different on each boot.
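Since ataN and sdX can change between boots, one way to tie the ataN in an error back to a specific physical drive is through the serial-bearing names in `/dev/disk/by-id`. A minimal Python sketch, assuming a typical libata system where the resolved `/sys/block/<dev>` path contains an `ataN` component. The demo builds a fake tree, and the by-id name and PCI path in it are invented. The mapping is only valid for the boot it was taken on.

```python
import os
import re
import tempfile
from pathlib import Path

ATA_PORT = re.compile(r"^ata\d+$")


def ata_port_of(dev, sysfs_root="/sys"):
    """ataN component of /sys/block/<dev>'s resolved path (libata disks), or None."""
    real = os.path.realpath(os.path.join(sysfs_root, "block", dev))
    return next((p for p in Path(real).parts if ATA_PORT.match(p)), None)


def by_id_names(by_id_dir="/dev/disk/by-id"):
    """Map sdX -> by-id link name (model_serial) for whole ATA disks."""
    names = {}
    for entry in sorted(os.listdir(by_id_dir)):
        if entry.startswith("ata-") and "-part" not in entry:
            target = os.path.realpath(os.path.join(by_id_dir, entry))
            names[os.path.basename(target)] = entry
    return names


def port_map(sysfs_root="/sys", by_id_dir="/dev/disk/by-id"):
    """by-id name -> (sdX, ataN) for the current boot only."""
    return {name: (dev, ata_port_of(dev, sysfs_root))
            for dev, name in by_id_names(by_id_dir).items()}


# Demo on a fake tree; the by-id name and PCI path below are invented.
root = tempfile.mkdtemp()
real_dev = os.path.join(root, "sys/devices/pci0000:00/0000:00:05.0/ata5/host4/target4:0:0/4:0:0:0/block/sde")
os.makedirs(real_dev)
os.makedirs(os.path.join(root, "sys/block"))
os.symlink(real_dev, os.path.join(root, "sys/block/sde"))
os.makedirs(os.path.join(root, "dev/disk/by-id"))
Path(root, "dev/sde").touch()
os.symlink(os.path.join(root, "dev/sde"), os.path.join(root, "dev/disk/by-id/ata-EXAMPLE_SSD_S0000000"))
demo = port_map(os.path.join(root, "sys"), os.path.join(root, "dev/disk/by-id"))
print(demo)  # {'ata-EXAMPLE_SSD_S0000000': ('sde', 'ata5')}
```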

 

The Directory Cache settings are all default as when I read this up before there seemed to be no need to constrain anything?

I don't recommend the default settings for what folders to cache. I believe the default is to cache everything, but you should only cache what needs caching. If you do have something that is trying to check every folder and file on your entire server, then yes, you may want to try caching it all, but that's not normally true. Usually you only want media folders cached, if you have media programs that constantly poll the media folders. But everyone's needs are different. One way to do it is to cache nothing, then determine which drives stay spun up, then figure out what is being accessed repeatedly on those drives, and set up caching for only those folders.
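The "determine which drives stay spun up" step could be automated by polling each drive's power state. A minimal Python sketch, assuming `hdparm` is installed and that its `-C` output contains a `drive state is:` line. The device list you pass to the hypothetical `spun_up` helper is up to you.

```python
import subprocess


def parse_power_state(hdparm_output):
    """Pull the state out of `hdparm -C` output, e.g. 'active/idle' or 'standby'."""
    for line in hdparm_output.splitlines():
        if "drive state is:" in line:
            return line.split("drive state is:", 1)[1].strip()
    return None


def spun_up(devices):
    """Devices not in standby right now (needs hdparm and root)."""
    awake = []
    for dev in devices:
        out = subprocess.run(["hdparm", "-C", dev], capture_output=True, text=True).stdout
        state = parse_power_state(out)
        if state is not None and state != "standby":
            awake.append(dev)
    return awake


print(parse_power_state("\n/dev/sdb:\n drive state is:  standby\n"))  # standby
```

Running `spun_up` periodically with caching off shows which drives keep waking, and so where caching may actually help.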


Thanks for the explanation of drive allocation and I'll try another cable but that would make two bad cables, maybe these nice round flexible cables aren't so clever after all!

 

Given that the server is mainly for PLEX there probably isn't any need to cache anyway?

 

Regards to all


Well, if Plex is transcoding, the cache drive helps as a tmp directory for that.

 


 

Disclaimer: This is beta software.  While every effort has been made to ensure no data loss, use at your own risk!

 

I just upgraded my NAS from unRAID 5.0.5 to 6.0-beta15 following the upgrade guide.  The process went very smoothly.  I ran a parity check over night, and re-enabled and tested services this morning.  All seems to be working well.  The new webGUI is really nice!

 

FYI - my unRAID environment is itself a VM on VMware ESXi 5.5 with a passed-through disk controller.  I'm not running any dockers or VM from within unRAID.

 

John


I have upgraded to v6.

 

Can someone check if my syslog is ok.

 

I have noticed that "Unraid emhttp: read_line: client closed the connection" is coming up a lot, and it's continuing!

syslog.zip


With Beta15-x86_64 I see the following on my console:

 

Tower login: stat: cannot stat '/usr/local/emhttp/mnt/cache/*' : No such file or directory

 

I recently added a cache drive to my system if that helps. Not sure it is working properly. There are still large files on the cache drive even though I have run mover. I have enabled cache on all shares.

 

Thanks for any help.

 

Spencer

See v6 help link in my sig


Just upgraded from 5.0.5 and have one question regarding the cache drive. When I boot my server, the array starts without my cache drive. If I stop the array, it automatically finds the cache drive; I don't need to do anything more than start the array again, and the cache drive is added. Is this how it's supposed to work in unRAID 6, or is there something I can do about it? To make sure files on the drive weren't the problem, I reformatted the drive, with the same result.

 

I've added my syslog! sdg is the cache drive.

 

Thank you in advance!

syslog.zip

No, that is not the way it's supposed to work. Have you tried it on a different port or with a different cable?

This topic is now closed to further replies.