limetech

unRAID Server Release 6.0-beta15-x86_64 Available

...

  4. Reboot - System should come up EXACTLY like V5 except only accessing 250GB of each share (disk?)

The 250GB DISK limit would not take effect until RC1, and it is still up for debate whether that would happen.

  9. Insert NEW 6TB Drive - Format - Allocate - should become XFS Parity Drive. (XFS? or RFS?)

Preclear any new drives before trusting them in the array. The parity disk does not have a file system, so it is not formatted, neither XFS nor RFS. Parity would be rebuilt at this point.

10. Rerun Parity Check. XFS Parity and RFS Share Disks.

See above, but a parity check after a parity build is always a good idea.

11. If Successful - Insert new 4TB Disk - Format - Allocate - Should be Share Disk 3 and XFS.

I am going to substitute Disk everywhere you have put Share Disk for the remainder of your quoted text. While a disk can be shared, calling it a Share Disk in this context is confusing and adds nothing to the discussion. The only time I might refer to something as a Share Disk is in the context of talking about a User Share. Then I might talk about the disks included in a user share as a share disk.

12. Initiate copy process to copy/move ... Disk 1 to ... Disk 3.

What copy process do you refer to here?

13. Initiate copy process to copy/move ... Disk 2 to ... Disk 3.

14. Verify all Shares.

Not clear if you mean disks or user shares here. Maybe disks would be appropriate.
... Disks 1 and 2 should empty if data was MOVED. Verify everything works as expected and stable before proceeding.

15. Format Old Parity 2TB Disk for additional storage as XFS.

You will have to preclear this disk before adding it to the array.

16. Format Old ... Disks 1 & 2 or reallocate them for other purposes.

Preclear if adding to array.

17. Purchase and install 240 or 480 GB SSD - setup as XFS or BTRFS for use as Docker Application/Mover Disk

By Mover Disk I assume you mean Cache disk.

18. Proceed with Dockerization of existing apps (mostly just want to run Universal Media Server)

Not sure I've seen a UMS docker, but there might be
, I am still thinking about upgrading Motherboard/CPU/Memory)

 

Now I am not so sure...Gary says to upgrade Parity BEFORE you move, but wouldn't that keep you on RFS on Parity volume? Is the rest of this correct?

Hope this helps.

 

If you decide to do this, please feel free to start another thread so we can help guide you through this.

Personally, I would:

 

1) Buy the License, it works on both unRAID 5 and 6.

2) Install the License

3) Upgrade Parity, procedure here.  As trurl points out, parity doesn't have a file system.

4) Upgrade to unRAID 6 so you have XFS support.

5) Add the 4TB disk and migrate data.

6) Reformat other disks and move onto other things like cache disks and dockers.

7) Open up a thread in the unRAID 6 General Support section if anything happens along the way.

 

When I went through a similar process it was very helpful to carefully back up my USB stick each step of the way.  That way if anything went wrong, my exposure was usually limited to a restore of the USB stick and a parity sync.  Make sure your base unRAID 6.0 install is stable before you start experimenting, IMO.
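For anyone who wants to script the "back up the USB stick at each step" habit, here is a rough sketch. It assumes the flash drive is mounted at /boot (as it is on a running unRAID box) and that Python is available on the machine doing the copy. The destination path and the function name backup_flash are made up for illustration, so point them wherever suits you.

```python
import shutil
import time
from pathlib import Path


def backup_flash(src="/boot", dest_root="/mnt/disk1/flash-backups", label="step"):
    """Copy the flash contents into a new timestamped folder and return its path.

    dest_root and label are hypothetical defaults; adjust them for your system.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"{stamp}-{label}"
    # copytree refuses to overwrite an existing folder, so each step
    # of the upgrade gets its own untouched snapshot.
    shutil.copytree(src, dest)
    return dest
```

Calling something like backup_flash(label="before-parity-upgrade") before each step leaves one folder per step to restore from.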

 

FWIW, I'm a little annoyed with the way WD Red 6TB disks have worked in my setup.  I've had to leave them spinning as they seem to be slow to spin up and that has caused some issues.  I'm not sure whether this issue is related to unRAID 6 or not - my upgrade sequence was somewhat like yours, I was doing a bunch of things at the same time.  From that comes my recommendation above - go slow, test, and take it one step at a time.  It's a minor issue now that I've sorted things out, though.

Hello:

After upgrading to beta16, (possibly before, but I didn't notice these errors--I think they're new with beta16)...

I'm getting a bunch of errors talking to my cache drive (it's an SSD).

 

The errors in my log look like this:

May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

May  4 13:56:31 Tower kernel: sata_nv 0000:00:05.2: PCI-DMA: Out of IOMMU space for 65536 bytes

May  4 13:56:31 Tower kernel: ata16: EH in SWNCQ mode,QC:qc_active 0x3C00 sactive 0x3C00

May  4 13:56:31 Tower kernel: ata16: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x9

May  4 13:56:31 Tower kernel:  dhfis 0x0 dmafis 0x0 sdbfis 0x0

May  4 13:56:31 Tower kernel: ata16: ATA_REG 0x40 ERR_REG 0x0

May  4 13:56:31 Tower kernel: ata16: tag : dhfis dmafis sdbfis sactive

May  4 13:56:31 Tower kernel: ata16.00: exception Emask 0x0 SAct 0x3c00 SErr 0x0 action 0x6

May  4 13:56:31 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED

May  4 13:56:31 Tower kernel: ata16.00: cmd 61/00:50:68:79:05/20:00:00:00:00/40 tag 10 ncq 4194304 out

May  4 13:56:31 Tower kernel:        res 40/00:70:58:d3:02/00:00:00:00:00/40 Emask 0x40 (internal error)

May  4 13:56:31 Tower kernel: ata16.00: status: { DRDY }

May  4 13:56:31 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED

May  4 13:56:31 Tower kernel: ata16.00: cmd 61/00:58:68:99:05/1c:00:00:00:00/40 tag 11 ncq 3670016 out

 

At first I thought maybe firmware on the SSD, so I upgraded that, and while I was in there I replaced the SATA cable to the SSD.

 

I also saw some strange storage behavior, with error messages on some folders when viewed via share0 indicating "wrong exec format"...I suspect those are a side effect of this problem.  Since this was my cache drive and it was formatted with btrfs, I moved everything off, reformatted with xfs, and put files back...I saw a TON of these errors when putting the files back...so it wasn't related to the file system.

 

I have NCQ turned off in the unRAID settings, and smartctl shows the cache drive is clean...

 

Where should I look next?...or is this a bug in the sata_nv driver again?

 

 

Personally, I would:

 

1) Buy the License, it works on both unRAID 5 and 6.

 

v5 does not support the Basic license.    I agree it's best to have the license BEFORE migrating to v6 => just put the key file on your v6 flash drive.

 

[Note:  I don't think it would hurt anything if you put the Basic key file on a v5 flash drive -- I suspect it would simply run as a 3-drive free (unlicensed) setup.]

 

 

Hello:

After upgrading to beta16, (possibly before, but I didn't notice these errors--I think they're new with beta16)...

I'm getting a bunch of errors talking to my cache drive (it's an SSD).

 

At first I thought maybe firmware on the SSD, so I upgraded that, and while I was in there I replaced the SATA cable to the SSD.

 

I also saw some strange storage behavior, with error messages on some folders when viewed via share0 indicating "wrong exec format"...I suspect those are a side effect of this problem.  Since this was my cache drive and it was formatted with btrfs, I moved everything off, reformatted with xfs, and put files back...I saw a TON of these errors when putting the files back...so it wasn't related to the file system.

 

I have NCQ turned off in the unRAID settings, and smartctl shows the cache drive is clean...

 

Where should I look next?...or is this a bug in the sata_nv driver again?

 

First and foremost, what errors are occurring that need fixing?  I see that you found some events in the log, but what functionality has stopped working for you, or is broken that wasn't broken before?  Error messages aren't always an indication of something broken.  IOMMU has to do with virtualization (Intel VT-d specifically).

Hello:

 

If you want to start a new topic, then do so - don't change the Subject of an existing topic!

 

Update by bjp999 - expletive deleted. Subjects repaired.

Guys, since you are discussing something I am imminently contemplating doing...

 

Be wise and capture md5 or other hashes of all files you plan to move or convert to XFS.

If something goes wrong, at the very least you can verify the files to ensure they are intact.
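As a rough illustration of the idea (not a replacement for the tools mentioned in this thread), a hash manifest can be built before the move and compared afterwards. This is only a sketch: the function names are made up, and md5 is used simply because it was the hash suggested above. Any algorithm name that hashlib accepts would work the same way.

```python
import hashlib
from pathlib import Path


def hash_file(path, algo="md5", chunk_size=1 << 20):
    """Hash one file in chunks so large media files do not fill memory."""
    digest = hashlib.new(algo)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algo="md5"):
    """Map each file's path (relative to root) to its hash."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hash_file(path, algo)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(before, after):
    """Return (missing, changed) relative paths after a copy or conversion."""
    missing = sorted(set(before) - set(after))
    changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return missing, changed
```

Run build_manifest on the source disk before the copy and on the destination afterwards. Both lists returned by compare_manifests should come back empty.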

 

There are some tools to assist in this such as bitrot and bunker.

 

I helped loady with some procedures a while back, including some example commands.  You can start here:

http://lime-technology.com/forum/index.php?topic=38507.msg360594#msg360594

 

I would also suggest you run SMART long tests on each of the data drives and verify there are no pending sectors before doing any upgrades.
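A long test can be started with smartctl -t long /dev/sdX, and the attribute table read back with smartctl -A /dev/sdX once it finishes. Below is a small sketch for pulling the pending-sector count out of that table. It assumes the usual ten-column attribute layout and an attribute named Current_Pending_Sector. Not every drive reports that attribute, in which case the function returns None. The function names are made up for illustration.

```python
import subprocess


def read_attributes(device):
    """Return the text printed by 'smartctl -A <device>'.

    smartctl uses its exit status as a bit mask, so a non-zero status is not
    treated as a failure here.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout


def pending_sectors(attr_output):
    """Return the raw Current_Pending_Sector value, or None if it is not listed."""
    for line in attr_output.splitlines():
        parts = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value.
        if len(parts) >= 10 and parts[1] == "Current_Pending_Sector":
            return int(parts[9])
    return None
```

A result of 0 is what you want to see on every data drive before starting the upgrade.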

I'll start by apologizing for changing the subject...I didn't realize I was changing the entire thread.  Other boards create a sub-subject within a thread if the subject is edited.  Strange that I can even edit it.  Thanks Mods for fixing it...I meant no harm...

 

I was hoping I'd distilled the problem down to these errors.  This is even more confusing now...if IOMMU is an Intel VT-d error then I must have some real problems, since I have AMD CPUs, and wasn't running any virtualization at the time these occurred.  I also only see these errors when writing to my SSD (one out of 19 drives).

 

Perhaps the IOMMU messages are just a side effect; the real problem seems to be the "exception Emask" and "failed command" messages.  There are hundreds of these.

 

I'll start at the beginning, since perhaps the original snippet isn't enough to distill the problem.

 

Here's the history:

The night after I upgraded to beta 15 (from beta 14), the mover ran and tried to move all my files that were on cache onto a drive that was full...it generated a ton of the messages I quoted above, along with "out of space" errors..and blew away the files it wasn't able to move (I lost data).  It wasn't anything critical I'd lost, and it appeared that the problem was likely with the ssd or my share configuration, so I updated those as I indicated (reformatted, flashed firmware, added full drives to the "excluded" disks in the share configuration, etc) but the errors writing to the SSD didn't go away. 

 

Here are some extracts from that first night's log:

 

These errors occurred while I was copying one of my disk image files (to and from my cache drive) so I could test KVM vs XEN:

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: sata_nv 0000:00:05.1: PCI-DMA: Out of IOMMU space for 65536 bytes

Apr 30 22:17:52 Tower kernel: ata30: EH in SWNCQ mode,QC:qc_active 0xFFFF0 sactive 0xFFFF0

Apr 30 22:17:52 Tower kernel: ata30: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x3

Apr 30 22:17:52 Tower kernel:  dhfis 0x0 dmafis 0x0 sdbfis 0x0

Apr 30 22:17:52 Tower kernel: ata30: ATA_REG 0x40 ERR_REG 0x0

Apr 30 22:17:52 Tower kernel: ata30: tag : dhfis dmafis sdbfis sactive

Apr 30 22:17:52 Tower kernel: ata30.00: exception Emask 0x0 SAct 0xffff0 SErr 0x0 action 0x6

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:20:a0:58:6f/24:00:28:00:00/40 tag 4 ncq 4718592 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:28:a0:7c:6f/20:00:28:00:00/40 tag 5 ncq 4194304 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:30:a0:9c:6f/24:00:28:00:00/40 tag 6 ncq 4718592 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:38:a0:c0:6f/28:00:28:00:00/40 tag 7 ncq 5242880 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:40:a0:e8:6f/20:00:28:00:00/40 tag 8 ncq 4194304 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:48:a0:08:70/20:00:28:00:00/40 tag 9 ncq 4194304 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:50:a0:28:70/28:00:28:00:00/40 tag 10 ncq 5242880 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:58:a0:50:70/34:00:28:00:00/40 tag 11 ncq 6815744 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:60:a0:84:70/28:00:28:00:00/40 tag 12 ncq 5242880 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:68:a0:ac:70/24:00:28:00:00/40 tag 13 ncq 4718592 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:70:a0:d0:70/28:00:28:00:00/40 tag 14 ncq 5242880 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:78:a0:f8:70/24:00:28:00:00/40 tag 15 ncq 4718592 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:80:a0:1c:71/0c:00:28:00:00/40 tag 16 ncq 1572864 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:88:a0:28:71/20:00:28:00:00/40 tag 17 ncq 4194304 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:90:a0:48:71/1c:00:28:00:00/40 tag 18 ncq 3670016 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30.00: failed command: WRITE FPDMA QUEUED

Apr 30 22:17:52 Tower kernel: ata30.00: cmd 61/00:98:a0:64:71/20:00:28:00:00/40 tag 19 ncq 4194304 out

Apr 30 22:17:52 Tower kernel:        res 40/00:a0:90:b8:6e/00:00:28:00:00/40 Emask 0x40 (internal error)

Apr 30 22:17:52 Tower kernel: ata30.00: status: { DRDY }

Apr 30 22:17:52 Tower kernel: ata30: hard resetting link

Apr 30 22:17:52 Tower kernel: ata30: nv: skipping hardreset on occupied port

Apr 30 22:17:52 Tower kernel: ata30: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Apr 30 22:17:52 Tower kernel: ata30.00: configured for UDMA/133

Apr 30 22:17:52 Tower kernel: ata30: EH complete

This exact sequence appears approx 197 times in a row (sometimes with different number of WRITE QUEUED messages) from timestamp 22:17:31 to 23:51:11 (the time I was copying a 10GB file)  Also seems like a long time to copy 10 GB to and from an SSD...
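Counting these by hand is tedious, so here is a rough sketch of how the syslog could be tallied per ATA port. It only assumes the line shapes seen in the excerpts above ("ataNN.00: failed command: ..." and "ataNN: hard resetting link"). The function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns based only on the kernel messages quoted in this thread.
FAILED = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
RESET = re.compile(r"kernel: (ata\d+): hard resetting link")


def summarize_ata_errors(lines):
    """Count failed commands per (port, command) and link resets per port."""
    failed = Counter()
    resets = Counter()
    for raw in lines:
        line = raw.rstrip()
        match = FAILED.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
            continue
        match = RESET.search(line)
        if match:
            resets[match.group(1)] += 1
    return failed, resets
```

Feeding it the lines of the syslog file (for example via open(...) on a saved copy) gives one count per port, which makes it easy to confirm that only the SSD's port is affected.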

 

I didn't notice these messages until the next day, when I went looking to find out why a bunch of files were missing that should have been moved over from the cache drive.  Here's part of the log from the Mover script:

 

May  1 03:40:01 Tower logger: moving "Movies"

May  1 03:40:01 Tower logger: ./Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/imdbinfo.nfo

May  1 03:40:01 Tower logger: .d..t...... ./

May  1 03:40:16 Tower shfs/user0: shfs_setxattr: lsetxattr: system.posix_acl_access /mnt/disk9/Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002 (28) No space left on device

May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002, ACL_TYPE_ACCESS): Exec format error (8)

May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002, ACL_TYPE_ACCESS): Exec format error (8)

May  1 03:40:16 Tower logger: rsync: set_acl: sys_acl_set_file(Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002, ACL_TYPE_ACCESS): No space left on device (28)

May  1 03:40:16 Tower logger: rsync: recv_generator: failed to stat "/mnt/user0/Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/imdbinfo.nfo": Exec format error (8)

May  1 03:40:16 Tower logger: .d..t...... Movies/CLEAN/

May  1 03:40:16 Tower logger: .d.......a. Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/

May  1 03:40:16 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

May  1 03:40:16 Tower logger: rm: cannot remove '/mnt/user0/./Movies/CLEAN/godzilla/Godzilla against Mechagodzilla 2002/imdbinfo.nfo': Exec format error

May  1 03:40:16 Tower logger: ./Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/imdbinfo.nfo

May  1 03:40:16 Tower shfs/user0: shfs_setxattr: lsetxattr: system.posix_acl_access /mnt/disk9/Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001 (28) No space left on device

May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001, ACL_TYPE_ACCESS): Exec format error (8)

May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001, ACL_TYPE_ACCESS): Exec format error (8)

May  1 03:40:16 Tower logger: rsync: set_acl: sys_acl_set_file(Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001, ACL_TYPE_ACCESS): No space left on device (28)

May  1 03:40:16 Tower logger: rsync: recv_generator: failed to stat "/mnt/user0/Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/imdbinfo.nfo": Exec format error (8)

May  1 03:40:16 Tower logger: .d..t...... Movies/CLEAN/

May  1 03:40:16 Tower logger: .d.......a. Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/

May  1 03:40:16 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

May  1 03:40:16 Tower logger: rm: cannot remove '/mnt/user0/./Movies/CLEAN/godzilla/Godzilla Mothra and King Ghidorah Giant Monsters All-Out Attack 2001/imdbinfo.nfo': Exec format error

May  1 03:40:16 Tower logger: ./Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/imdbinfo.nfo

May  1 03:40:16 Tower shfs/user0: shfs_setxattr: lsetxattr: system.posix_acl_access /mnt/disk9/Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992 (28) No space left on device

May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992, ACL_TYPE_ACCESS): Exec format error (8)

May  1 03:40:16 Tower logger: rsync: get_acl: sys_acl_get_file(Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992, ACL_TYPE_ACCESS): Exec format error (8)

May  1 03:40:16 Tower logger: rsync: set_acl: sys_acl_set_file(Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992, ACL_TYPE_ACCESS): No space left on device (28)

May  1 03:40:16 Tower logger: rsync: recv_generator: failed to stat "/mnt/user0/Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/imdbinfo.nfo": Exec format error (8)

May  1 03:40:16 Tower logger: .d..t...... Movies/CLEAN/

May  1 03:40:16 Tower logger: .d.......a. Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/

May  1 03:40:16 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

May  1 03:40:16 Tower logger: rm: cannot remove '/mnt/user0/./Movies/CLEAN/godzilla/Godzilla vs Destroyah 1992/imdbinfo.nfo': Exec format error

May  1 03:40:16 Tower logger: ./Movies/CLEAN/godzilla/Godzilla vs Gigan 1972/imdbinfo.nfo

There is one set of these error messages for each movie that mover tried to move.  Here's the really strange thing:  each folder had a movie file (a file with extension MKV if it makes any difference) and an NFO file.  I don't see reference to any of the MKV files in the log, however they were completely deleted from the cache drive--not moved.  The NFO files were "moved", but after the move (and during, according to the log) if you try to access them you get the "Exec format error".
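To see how the two kinds of failure are distributed, the rsync lines from the mover log can be tallied by the error number in parentheses. This is a sketch that assumes the line shape shown in the excerpt above ("rsync: ...: message (NN)"). The numbers 28 and 8 are the standard Linux errno values ENOSPC and ENOEXEC. The function name is made up for illustration.

```python
import errno
import re
from collections import Counter

# Matches e.g. 'logger: rsync: set_acl: ...: No space left on device (28)'.
RSYNC_ERR = re.compile(r"logger: rsync: .*: (.+) \((\d+)\)$")


def tally_rsync_errors(lines):
    """Count rsync failures keyed by (errno number, symbolic name)."""
    counts = Counter()
    for raw in lines:
        match = RSYNC_ERR.search(raw.rstrip())
        if match:
            code = int(match.group(2))
            counts[(code, errno.errorcode.get(code, "?"))] += 1
    return counts
```

The "rsync error: ... (code 23)" summary lines are deliberately not matched, since 23 there is an rsync exit code rather than an errno.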

 

I had attributed the Exec format error to corruption in the reiserfs file system--rather than try to repair it, I reformatted the drive as xfs (I've been systematically doing that anyway) and restored the files that were there.

 

The other strange thing with this is that these files had not been moved from cache previously (I don't have the older log files, so I don't know what/if there were previous errors).  They'd all been there at least a week or two (perhaps since beta 14?).  Also, disk9, where mover was trying to move them to, was *almost* full (just a few K free), and was NOT part of the share configuration (not included or excluded) and there were drives that are included that had plenty of free space.  I've updated all my share configs to explicitly indicate which drives to include and exclude, so perhaps that was my mistake in configuration.

 

Finally, the mover didn't nuke all my files; some of them failed like these (the share was different):

May  1 04:05:41 Tower logger: .d..t...... Usenet/download/move/

May  1 04:05:41 Tower logger: >f+++++++++ Usenet/download/move/1973 interesting file.720p.ac3.CG.avi

May  1 04:05:41 Tower shfs/user0: shfs_write: write: (28) No space left on device

May  1 04:05:41 Tower shfs/user0: shfs_write: write: (28) No space left on device

May  1 04:06:01 Tower logger: rsync: write failed on "/mnt/user0/Usenet/download/move/1973 interesting file.720p.ac3.CG.avi": No space left on device (28)

May  1 04:06:01 Tower logger: rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]

 

I had originally thought these mover errors stemmed from the IOMMU errors earlier...perhaps I was wrong and it was simply configuration and reiserfs corruption.  If that's the case then I think I'm fixed regarding lost files...but the IOMMU errors seem troubling since I don't have an Intel CPU and they do seem to be impacting performance (all those ATA bus resets can't be good for performance).

 

Lastly, here's a snip from the start of the log where the ata30 device starts up:

Apr 30 22:13:20 Tower kernel: ata30: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Apr 30 22:13:20 Tower kernel: ata30.00: ATA-9: Samsung SSD 840 EVO 500GB, S1DHNSADA10381K, EXT0BB0Q, max UDMA/133

Apr 30 22:13:20 Tower kernel: ata30.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32)

Apr 30 22:13:20 Tower kernel: ata30.00: configured for UDMA/133

 

 

Is there anything else I should be looking at?  or are the remaining IOMMU messages and failed command messages just informational?

 

Thanks!

 

post a smart report for your ssd, it might be end of life?

... Also, disk9, where mover was trying to move them to, was *almost* full (just a few K free), and was NOT part of the share configuration (not included or excluded) and there were drives that are included that had plenty of free space.  I've updated all my share configs to explicitly indicate which drives to include and exclude, so perhaps that was my mistake in configuration...
In shares configuration, Included disk(s) means ONLY those disks, Excluded disk(s) means EXCEPT those disks. It is recommended to set one or the other but not both of these fields. If neither is set then no disks are excluded. So, if you previously had nothing set for the share, then disk9 would be included by default.
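To make the rule concrete, here is a tiny sketch of the Include/Exclude logic as described above. It models only what this post states, not unRAID's actual code, and the function name is made up for illustration.

```python
def allowed_disks(all_disks, included=(), excluded=()):
    """Disks a share may use, following the Include/Exclude rule described above.

    included: if non-empty, ONLY these disks are candidates.
    excluded: these disks are always removed from the candidates.
    With neither set, every disk is a candidate.
    """
    candidates = [d for d in all_disks if d in included] if included else list(all_disks)
    return [d for d in candidates if d not in excluded]
```

With nothing set, allowed_disks(["disk1", "disk9"]) keeps disk9 in the list, which matches what happened here.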

... Also, disk9, where mover was trying to move them to, was *almost* full (just a few K free), and was NOT part of the share configuration (not included or excluded) and there were drives that are included that had plenty of free space.  I've updated all my share configs to explicitly indicate which drives to include and exclude, so perhaps that was my mistake in configuration...
In shares configuration, Included disk(s) means ONLY those disks, Excluded disk(s) means EXCEPT those disks. It is recommended to set one or the other but not both of these fields. If neither is set then no disks are excluded. So, if you previously had nothing set for the share, then disk9 would be included by default.

 

The way user shares work is mysterious and wily.

 

The share's include and exclude settings, as well as free space requirements, are ONLY consulted if you are creating a NEW folder at a splittable level (that's a whole different topic which I won't go into here).

 

So if you have a directory called /MYSHARE/PEOPLE/FRED on disk2, and FRED cannot be split, and a new file is written to the /PEOPLE/FRED directory inside the share MYSHARE, it will go to disk2 - even if disk2 is explicitly excluded or even if that file would not fit on disk2, regardless of free space on other disks. And the files in the MYSHARE folder on disk2 are going to show up in that share (for reading) regardless. (Only by renaming MYSHARE on disk2 to something else would the files and folders inside no longer be seen inside the share.)

 

But if you were creating a new folder, say /PEOPLE/ALICE in MYSHARE, the ALICE folder would be created on an included (or not excluded) disk containing sufficient free space according to the share settings.
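The FRED/ALICE behavior can be sketched roughly like this. It is a toy model of the description above, not unRAID's real allocator. It assumes the directory passed in sits at a non-splittable level, and it picks the allowed disk with the most free space for new folders (unRAID has other allocation methods too). All names here are made up for illustration.

```python
def place_write(target_dir, dirs_on_disk, free_bytes, allowed_disks, min_free=0):
    """Pick the disk a write lands on, per the behavior described above.

    target_dir:    non-splittable directory being written into (or created).
    dirs_on_disk:  dict mapping disk name -> set of directories already on it.
    free_bytes:    dict mapping disk name -> free space.
    allowed_disks: disks permitted by the share's include/exclude settings.
    """
    # An existing non-splittable directory pins the write to its disk,
    # even if that disk is excluded or full.
    for disk in sorted(dirs_on_disk):
        if target_dir in dirs_on_disk[disk]:
            return disk
    # Only a brand-new directory consults the share settings and free space.
    candidates = [d for d in allowed_disks if free_bytes.get(d, 0) >= min_free]
    if not candidates:
        return None
    return max(candidates, key=lambda d: free_bytes.get(d, 0))
```

In this model a write into /MYSHARE/PEOPLE/FRED always lands on disk2 once FRED exists there, while creating /MYSHARE/PEOPLE/ALICE goes to whichever allowed disk has room.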

 

The only way to truly exclude a disk from a share would be to do so from the GLOBAL share settings, and that would render that disk excluded from ALL shares. I have never tried that and am not 100% sure it works, but that's my understanding of the intended functionality.

 

Hope this helps.

... The only way to truly exclude a disk from a share would be to do so from the GLOBAL share settings

 

Actually that's not quite accurate.  It's true IF the share has previously been allowed to use other disks and then an Exclude is added (or a disk previously Included is removed from the Include list).

 

... but if the share is originally set to only be on a specific set of disks (whether via Includes or Excludes => I do NOT, by the way, recommend using both settings), then it will ONLY be on those disks.

 

The key thing to remember is that Includes/Excludes only apply to WRITES to a share ... the actual bonding of top level folders into a share includes all top level folders with that name, regardless of whether the disk they're on is included in or excluded from the share.

 

But you can definitely get some unintended results if you start changing the disks that are part of the share.  In general it's best to never remove a disk from a share unless you've moved ALL of the contents for that share from that disk and then delete the top-level folder for the share from that disk.

 


Not to get too far off topic, but I have a share that's set to only include Disk 4. I have only written to it via SMB.  I have periodically found folders for this share on other disks and had to tidy.  I've assumed that this is because disk 4 was full, but didn't really delve into it.  The share has no exclude disks.  It's not a particular issue for me though.


Post a SMART report for your SSD; it might be at end of life?

 

OK, here's the smart report.  Looks OK to me, but maybe I'm missing something?

 

root@Tower:~# smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.4-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 840 EVO 500GB
Serial Number:    S1DHNSADA10381K
LU WWN Device Id: 5 002538 8a008a13e
Firmware Version: EXT0BB0Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue May  5 10:25:55 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0)  The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 6600) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2)  minutes.
Extended self-test routine
recommended polling time:        ( 110) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       12830
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       27
177 Wear_Leveling_Count     0x0013   084   084   000    Pre-fail  Always       -       190
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   073   060   000    Old_age   Always       -       27
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       2
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       24
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       29197432370

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
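One way to double-check an attribute table like the one in this report is to compare each normalized VALUE against its THRESH. The sketch below parses rows in the ten-column layout shown in the report; `parse_attributes` and `failing` are hypothetical helper names, and the sample rows are taken from the report. It only applies the threshold rule and does not try to interpret vendor-specific raw values.

```python
def parse_attributes(lines):
    """Parse smartctl attribute rows into dicts.

    Assumes the ten-column layout shown in the report:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    rows = []
    for line in lines:
        parts = line.split(None, 9)  # at most 10 fields; RAW_VALUE keeps any spaces
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "raw": parts[9].strip(),
        })
    return rows


def failing(rows):
    """Return attributes whose normalized value has reached the threshold."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


sample = """
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0013   084   084   000    Pre-fail  Always       -       190
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       2
""".splitlines()

rows = parse_attributes(sample)
print([r["name"] for r in rows])
print(failing(rows))
```

For these sample rows no attribute is at or below its threshold, which agrees with the PASSED self-assessment in the report.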

 

 


Not to get too far off topic, but I have a share that's set to only include Disk 4. I have only written to it via SMB.  I have periodically found folders for this share on other disks and had to tidy.  I've assumed that this is because disk 4 was full, but didn't really delve into it.  The share has no exclude disks.  It's not a particular issue for me though.

The way I think of it is like this:

 

1) Any top-level folder on any disk is a share, named after the folder.

 

2) Any share can write to any disk that has a top level folder with the same name as the share name. Allocation method, min free space, split level will be considered when deciding which disk to write to.

 

3) The only thing Included or Excluded does is prevent unRAID from creating a top-level folder for the share on specific disks. If you change these settings after a top-level folder has already been created, then 2) applies.
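Rule 1) can be illustrated with a short script that groups top-level folders by name across disks. It builds a throwaway directory tree in a temporary folder as a stand-in for the per-disk mount points; the `disk*` naming and the helper `share_members` are assumptions for this sketch, not something taken from unRAID itself.

```python
import tempfile
from pathlib import Path


def share_members(mount_root):
    """Map each top-level folder name to the list of disks that hold it."""
    members = {}
    for disk in sorted(Path(mount_root).glob("disk*")):
        if not disk.is_dir():
            continue
        for entry in disk.iterdir():
            if entry.is_dir():
                members.setdefault(entry.name, []).append(disk.name)
    return members


with tempfile.TemporaryDirectory() as root:
    # Stand-in for the array: "stuff" exists on two of the three disks.
    (Path(root) / "disk1" / "Movies").mkdir(parents=True)
    (Path(root) / "disk3" / "stuff").mkdir(parents=True)
    (Path(root) / "disk4" / "stuff").mkdir(parents=True)
    members = share_members(root)

print(members)
```

Note that nothing in this grouping looks at include or exclude settings: a folder named "stuff" on any disk counts toward the "stuff" share, which is the point of rules 1) and 2).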


The way I think of it is like this:

But if I didn't create top level folders, how did they get there?  If I understand you, if a top level folder called - say - "stuff" existed on all drives, then a share called "stuff" would happily write to any disk, regardless of inclusion/exclusion.  That's fair enough.  But when I create a share with one include disk, I *think* a top level folder would only ever be automatically created on that disk.  Unless there are exceptions further down the line that I'm not familiar with (entirely likely).

 

Or have I misunderstood?

I think you understand perfectly.

 

It's never happened to me, but I have never had any of my disks above 80% full. Maybe there is some rule about full disks, or at least disks that are above minimum free space for the share, but I have seen other reports of unRAID failing to write because disks were too full.

 

Perhaps some plugin or docker was misconfigured to write directly to the disk instead of the share. I have seen examples where someone accidentally created a user share named "cache" or "disk4" with a misconfigured app.

 

Agree -- your understanding is correct.  I've never seen UnRAID create a new top-level folder for a share on a disk that wasn't "Included" in that share ... and don't believe it will.  If you've had that happen, it was almost certainly either because an add-in or docker was misconfigured, or perhaps because you inadvertently wrote to it with a disk reference and accidentally used the wrong disk number.  [e.g. if you meant to write to \\Tower\disk4\MyShare and instead wrote to \\Tower\disk3\MyShare, it would create the "MyShare" folder on disk3, and it would then be seen as part of your share, even if disk3 wasn't included in the share.]
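To spot that kind of stray folder, one can compare where a share's top-level folder actually exists against the share's Include list. The sketch below works on plain in-memory data; `find_strays` and both input dictionaries are hypothetical, and how the folder locations and include lists are gathered is left out.

```python
def find_strays(folder_locations, include_lists):
    """Return {share: [disks]} where a share folder sits on a non-included disk.

    folder_locations: {share_name: [disks that have a top-level folder for it]}
    include_lists:    {share_name: set of disks the share is set to include}
    Shares with no include list are treated as allowed everywhere (assumption).
    """
    strays = {}
    for share, disks in folder_locations.items():
        allowed = include_lists.get(share)
        if not allowed:
            continue
        extra = sorted(d for d in disks if d not in allowed)
        if extra:
            strays[share] = extra
    return strays


# Hypothetical example: MyShare is set to include only disk4, but a folder
# was created on disk3 by writing to the disk3 disk share by mistake.
folder_locations = {"MyShare": ["disk3", "disk4"], "Movies": ["disk1"]}
include_lists = {"MyShare": {"disk4"}}

strays = find_strays(folder_locations, include_lists)
print(strays)
```

Any disk reported here still participates in the share for reads and for writes into existing folders, so the tidy-up is to move the contents and delete the stray top-level folder.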

 

 


... The only way to truly exclude a disk from a share would be to do so from the GLOBAL share settings

 

Actually that's not quite accurate.  It's true IF the share has previously been allowed to use other disks and then an Exclude is added (or a disk previously Included is removed from the Include list).

 

The only way to have a root level folder on a disk with the same name as a user share is by excluding the entire disk from participating in any user shares. This is accurate.

 

... but if the share is originally set to only be on a specific set of disks (whether via Includes or Excludes => I do NOT, by the way, recommend using both settings), then it will ONLY be on those disks.

 

That is true if you use user shares exclusively and don't use the underlying disk shares. But adding a directory to the server on a disk not participating in the user share can cause that disk to suddenly become part of it. This is really important to understand: shares are not constrained by their settings.

 

The key thing to remember is that Includes/Excludes only apply to WRITES to a share ... the actual bonding of top level folders into a share includes all top level folders with that name, regardless of whether the disk they're on is included in or excluded from the share.

 

Includes/Excludes do NOT apply to all writes. They only apply to creating new folders and writing NEW files at splittable levels. If you are overwriting a file OR creating a directory/file inside a non-splittable directory, INCLUDE and EXCLUDE are ignored.

 

But you can definitely get some unintended results if you start changing the disks that are part of the share.  In general it's best to never remove a disk from a share unless you've moved ALL of the contents for that share from that disk and then delete the top-level folder for the share from that disk.

 

If a user understands how user shares work, the features can be used to do useful things, and the results would be "INTENDED." :) For example, if a disk is in a TV user share, but you would like to stop new TV shows being added to that disk (maybe your plan is to gradually empty the disk and ultimately remove it), you could exclude that disk from the user share. It would still participate in TV shows already on that disk, but it would not have any new TV shows added even if it had a ton of free space.


There's a BIG difference between  "... The only way to truly exclude a disk from a share would be to do so from the GLOBAL share settings ..."  and  "... The only way to have a root level folder on a disk with the same name as a user share is by excluding the entire disk from participating in any user shares ..."

 

Clearly if you're creating folders with the same names as shares, but don't want those to participate in the share, then it's a different ballgame.

 

Agree that files will be written to already existing share folders regardless of whether or not the disks are still included in the share (whether by Include or Exclude settings) ... that's why I don't recommend putting yourself in that situation.    That's why I said "In general it's best to never remove a disk from a share unless you've moved ALL of the contents for that share from that disk and then delete the top-level folder for the share from that disk."

 

Clearly if a user is fully aware of how shares function, including the impact of split levels, then they can indeed change their settings and the behavior may be what they want.  But for typical users, I think it's best to not remove disks from a share unless they also remove all the contents of that share from the disk AND delete the associated top-level folder.

 

 


jonp - any updates on b16 with the docker fixes?

 

Soon.

^^^^^

TRUTH.

 

Putting it through its paces along with a number of other fixes before we release.


John, I have removed all plugins (moved them to docker) except snap, and my unRAID b15 still crashes in about two days. I think this is docker or kvm related. I am eager to move back to b12 for the sake of stability. If I can expect b16 within the next week or so, I would like to give that a try before going back.

I wouldn't expect it to be more than a week or so, but crazy things can and do happen.
