Problems with unicode characters in filenames when using zfs


Solved by JorgeB


Posted (edited)

I ran into an interesting problem. I have a set of files I can successfully copy to the cache drive, but they cannot then be moved to the array.

 

The issue seems to arise when filenames contain Unicode characters. I don't know if it affects all Unicode characters or just some.

The difference is that the cache is using btrfs, and the array is using zfs. I would normally expect filenames to be stored as UTF-8, since that is the de facto Linux standard, but this seems to be true only for btrfs, not zfs.


The impact is that if I have a share set to copy to the cache drive, the folder and all the files upload to the cache drive but then get stuck. If I log in and try manually moving the files, I then see the problem is with the character encoding. If I set the files to copy directly to the array, I simply receive an I/O error when attempting to copy the files to the NAS.

Normally I would avoid this type of issue in Ubuntu, Fedora, etc., by making sure I had a default locale set via LC_ALL, such as en_CA.UTF-8. When I check in the console, I see the environment variable LANG=en_US.UTF-8 is set, which should have the same effect, so I don't know why it is not working.
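For what it's worth, this is how I check what the shell actually resolves (a sketch; en_US.UTF-8 is assumed to be installed, check `locale -a` first):

```shell
# Print the locale categories in effect; LC_ALL, when set, overrides
# both LANG and the individual LC_* variables.
locale

# Confirm a UTF-8 locale is actually installed before using it:
locale -a | grep -i utf || true

# Force UTF-8 handling for the current shell session:
export LC_ALL=en_US.UTF-8
```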

 

Edited by docbillnet
adding more details

Sample output going directly to the array, where /mnt/higgs/backups is an NFS mount of my NAS running Unraid:
 

# sudo rsync -aPs --links /share/Backups/ /mnt/higgs/backups/

root/var/lib/cronsrev/usrpftp.servicesource.com:20/From RedHat/NA/BAW Weekly Bookings Reconciliation Report \#226 NALA - Schedule.2015-03-28-09-05-03.xls.gpg": Input/output error (5)

rsync: [generator] recv_generator: failed to stat "/mnt/higgs/backups/briemers2-root/var/lib/cronsrev/usrpftp.servicesource.com:20/From RedHat/NA/BAW Weekly Bookings Reconciliation Report \#226 NALA - Schedule.2015-04-07-10-45-55.xls.gpg": Input/output error (5)

rsync: [generator] recv_generator: failed to stat "/mnt/higgs/backups/briemers2-root/var/lib/cronsrev/usrpftp.servicesource.com:20/From RedHat/NA/BAW Weekly Bookings Reconciliation Report \#226 NALA - Schedule.2015-04-11-09-05-28.xls.gpg": Input/output error (5)

rsync: [generator] recv_generator: failed to stat "/mnt/higgs/backups/briemers2-root/var/lib/cronsrev/usrpftp.servicesource.com:20/From RedHat/NA/Downloads/BAW Weekly Bookings Reconciliation Report \#226 NALA - Schedule.2014-07-26-09-05-53.xls": Input/output error (5)

rsync: [generator] recv_generator: failed to stat "/mnt/higgs/backups/briemers2-root/var/lib/cronsrev/usrpftp.servicesource.com:20/From RedHat/NA/Downloads/BAW Weekly Bookings Reconciliation Report \#226 NALA - Schedule.2014-07-26-09-05-53.xls.gpg": Input/output error (5)

rsync: [generator] recv_generator: failed to stat "/mnt/higgs/backups/briemers2-root/var/lib/cronsrev/usrpftp.servicesource.com:20/From RedHat/NA/QA/BAW Weekly Bookings Reconciliation Report \#226 NALA - Schedule.2014-11-15-09-05-30.xls.gpg": Input/output error (5)



I will see if I can encode the names into a small script, since the translation involved in posting the output changes the character encoding...
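In the meantime, here is one way to list the offending names, relying on GNU grep's behaviour that, in a UTF-8 locale, `.` does not match invalid byte sequences (a sketch; adjust the path and locale to taste):

```shell
# Print every path under the share whose bytes are not valid UTF-8.
# -a treats input as text, -x matches whole lines, -v inverts the
# match, so only lines that fail to decode as UTF-8 are printed.
find /share/Backups | LC_ALL=en_US.UTF-8 grep -axv '.*'
```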
 

Posted (edited)

Here is a short way to reproduce it:

 

root@Higgs:/mnt/cache# base64 --decode <<+ |tar xfj -
QlpoOTFBWSZTWUvfjaYAAGp/pMiAAGRAAf+AOARYwW7t3iABAAAAgAgwALm2IiNqGjQAAAADTIRM
p6g0AAAAAABJEp6mQ0HqaAaA0A0Mll1eKiUYggomSSLzAkUyhhnammM2MvjrAwkDgycsHLbf3yrn
n1Xea6jJwzCDf2kKAzDbYkIDuCkwESFJRO0E0oWE1aAI0ilgCMjNGKcLSwzEgyEhhOSkPJ7eycJ9
wMZTzaXC/IUdcAmBgpWp+gQIJOsJVCKAcn9k3i4lsxCQfxdyRThQkEvfjaY=
+
root@Higgs:/mnt/cache# cd /mnt/disk1
root@Higgs:/mnt/disk1# base64 --decode <<+ |tar xfj -
QlpoOTFBWSZTWUvfjaYAAGp/pMiAAGRAAf+AOARYwW7t3iABAAAAgAgwALm2IiNqGjQAAAADTIRM
p6g0AAAAAABJEp6mQ0HqaAaA0A0Mll1eKiUYggomSSLzAkUyhhnammM2MvjrAwkDgycsHLbf3yrn
n1Xea6jJwzCDf2kKAzDbYkIDuCkwESFJRO0E0oWE1aAI0ilgCMjNGKcLSwzEgyEhhOSkPJ7eycJ9
wMZTzaXC/IUdcAmBgpWp+gQIJOsJVCKAcn9k3i4lsxCQfxdyRThQkEvfjaY=
+
tar: ./BAW Weekly Bookings Reconciliation Report \226 APAC Scheduled.XLS.gpg: Cannot open: Invalid or incomplete multibyte or wide character
tar: Exiting with failure status due to previous errors


This simply extracts the same base64-encoded tar file at each of the two mount points. Assuming /mnt/cache is a btrfs filesystem and /mnt/disk1 is zfs, you should see similar results.

The tar file contains just one of the files that was causing me issues, truncated to zero size, since only the filename matters.
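As an aside, the stray byte in these names, \226 (octal for 0x96), is not valid UTF-8 by itself, but it decodes cleanly if the names originated on a legacy Windows system. The Windows-1252 guess below is an assumption (ISO-8859-1 would also accept the byte, just as a control character):

```shell
# In Windows-1252 the byte 0x96 is the en dash (U+2013), so iconv can
# map the old name to valid UTF-8:
printf 'Report \226 APAC' | iconv -f WINDOWS-1252 -t UTF-8
```

If that prints the name with a proper en dash, the same conversion can be applied to the filenames themselves.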
 

Edited by docbillnet
removed extra file from tar.
  • Solution

Unraid creates all zfs pools with utf8only=on. I see two options you could use to solve this:

 

- have rsync convert the filenames to UTF-8 during the transfer; you would need to know the old charset in use. If it was, for example, ISO-8859-1, you just need to add --iconv to the rsync command:

 

rsync -av --iconv=ISO-8859-1,utf-8 /source /dest

 

- the other option would be to reformat the pool/disk without utf8only=on enabled
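For the second option, a hedged sketch of checking and avoiding the property (`pool` and `/dev/sdX` are placeholders; note that utf8only can only be set when the dataset is created, not changed afterwards):

```shell
# Check the current setting on an existing pool:
zfs get utf8only pool

# When creating a pool manually, leave the property off (the OpenZFS
# default) so non-UTF-8 names are accepted:
zpool create -O utf8only=off pool /dev/sdX
```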

Posted (edited)

If I am reformatting a drive, is there any reason not to convert the cache drive to zfs instead? It is not utf8only itself that is the problem; it is the inconsistency in options between the pools that causes files to get stuck. Out of 20 TB worth of files, I think only 6 or so appear not to be UTF-8, so it makes sense to convert those files. But it would save a lot of hassle if the uploads to the NAS failed outright, rather than having files fail to move once they are already on the NAS. It is an odd default to make the cache drive btrfs without compression.

Edited by docbillnet
55 minutes ago, docbillnet said:

If I am reformatting a drive, is there any reason not to convert the cache drive to zfs instead?

You can, but you will have the same issue there unless you manually create the pool without utf8only=on.


Hey, 

 

sorry, I have been struggling for a while now with my old Plex folder.

 

After I used SpaceInvaderOne's Auto Dataset script, I noticed that the old Plex folder could not be deleted.

 

/mnt/cache/appdata/plex_broken/l/l/p/Cache/1/test# ls -l

/bin/ls: cannot access 'f913f438c95d6b8ef55eba5bf40b'$'\202\202\201\210\377\377''9e049e.jpg': Invalid or incomplete multibyte or wide character

total 0

-????????? ? ? ? ?            ? f913f438c95d6b8ef55eba5bf40b\202\202\201\210\377\3779e049e.jpg

 

I just want to get rid of the folder /mnt/cache/appdata/plex_broken/.

 

Thanks in advance guys 

Posted (edited)
4 hours ago, JorgeB said:

You can, but you will have the same issue there unless you manually create the pool without utf8only=on.

How so? It seems that if the cache drive had the same settings as the drives in the array, then I would never be able to upload a file to the NAS only to have it get stuck in the cache...

My concern is not whether I can upload non-UTF-8 files. My concern is that when I just copy a folder, it uploads and then gets stuck... meaning I have to fix the issue manually on the NAS instead of on the device I'm uploading from.

It is certainly far less work to reformat a cache that is routinely close to empty anyway than to reformat mostly full hard drives. But my concern is that there might be a really good reason why the cache defaults to btrfs. Maybe zfs and xfs are unsuitable for the cache drive? Otherwise, I would expect Unraid to have defaulted to the same filesystem for the cache drive to avoid this type of issue.

 

Edited by docbillnet
3 minutes ago, docbillnet said:

How so? It seems that if the cache drive had the same settings as the drives in the array, then I would never be able to upload a file to the NAS only to have it get stuck in the cache...

I understood the cache is not zfs, or is it?

 

 

4 hours ago, feraay said:

/bin/ls: cannot access 'f913f438c95d6b8ef55eba5bf40b'$'\202\202\201\210\377\377''9e049e.jpg': Invalid or incomplete multibyte or wide character

I just want to get rid of the folder /mnt/cache/appdata/plex_broken/.


This might be exactly the type of issue I was concerned about if I change the filesystem for my cache drive. Do you mind my asking what filesystem you are using? What error do you get when you do:

rm -rf /mnt/cache/appdata/plex_broken/*

?
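If that rm fails with the same multibyte error, one workaround worth trying is to address the file by inode instead of by name. A sketch (123456 is a placeholder for the inode number shown by ls -li, and I don't know for sure that zfs will permit the unlink in this state):

```shell
# Get the inode number (first column) of the stubborn file:
ls -li /mnt/cache/appdata/plex_broken/l/l/p/Cache/1/test

# Delete by inode; -xdev keeps find from crossing filesystems.
# Replace 123456 with the inode number printed above.
find /mnt/cache/appdata/plex_broken/l/l/p/Cache/1/test \
    -xdev -inum 123456 -delete
```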

4 minutes ago, JorgeB said:

I understood the cache is not zfs, or is it?

 

 

It is not. Initially I created all the filesystems with the default settings, which is xfs for disk1, disk2, and disk3, and btrfs for the cache. But then I remembered I wanted to try out zfs, so I changed disk1, disk2, and disk3 to zfs. Since zfs is sort of like the next generation of xfs, I didn't expect any issues leaving the cache on the default filesystem.


Only zfs filesystems have the utf8only=on property set, so xfs or btrfs won't be affected. That's why I mentioned that if you change the cache to zfs using the current default settings, you will see the same issue on the cache.

 

@feraay is also using zfs, though in his case it could be more complicated, because the file already exists on zfs like that, so I'm not sure how to delete it now; I would have expected the error to occur when creating the file.

Posted (edited)

I guess in my case it's some cached stuff from the Plex PhotoTranscoder... not a file I created.

/mnt/user/appdata/Plex-Media-Server/Library/Application Support/Plex Media Server/Cache/PhotoTranscoder

 

Is it a corrupted zfs filesystem? Or really just a file whose name is not in the UTF-8 charset? Everything works as expected, no problems except that folder.

Maybe I transferred the file over from my old non-zfs cache... I really don't know.


I tried a lot of tips, like sed and iconv or whatever it's called. My last resort is to move everything "working" off the cache pool, wipe it, and move everything back. But I was hoping there would be a "smaller" solution.

Edited by feraay
Link to comment
10 hours ago, feraay said:

Is it a corrupted zfs filesystem?

Don't think so.

 

10 hours ago, feraay said:

My last resort is to move everything "working" off the cache pool, wipe it, and move everything back. But I was hoping there would be a "smaller" solution.

Possibly that is what you'll have to do.

