
Problem with file name encoding?



I need to copy files from an unassigned HD (it was previously part of an array, but I have since created a new configuration with larger disks and now use it as a "backup") to my UnRAID array. For a lot of directories and files, rsync reports the error "failed to stat "/mnt/disk1/:somefilename Invalid or incomplete multibyte or wide character (84)", and when the problem is a directory it also reports "*** Skipping any contents from this failed directory ***".

 

I assume these faulty files and directories (there are many hundreds of them) were created over SMB from data on a Windows machine. I live in Sweden, where we have åäöÅÄÖ, and I can see that the failing names are the ones that include these international characters.


From the error messages it SEEMS, based on a few examples, that Ä in directory names results in \#303\#204 and ä in \#303\#244, while in ordinary file names Ä is \#216 and ä is \#204. This seems strange to me: do the files use more than one encoding?!
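The escapes quoted above are octal byte values, so they can be checked against a few candidate encodings. The following is a minimal sketch, assuming Python 3 is available on some machine; the function name `try_decode` and the list of candidate encodings are illustrative choices, not something from the thread. The two-byte directory sequences decode cleanly as UTF-8, while the single bytes \#216 and \#204 are not valid UTF-8 but do map to Ä and ä in the DOS code pages CP850 and CP437. That is consistent with, though not proof of, a mix of UTF-8 and a legacy DOS code page on the disk.

```python
# Byte values copied from the rsync escapes in the post (octal notation).
dir_upper = bytes([0o303, 0o204])   # \#303\#204, reported for Ä in a directory name
dir_lower = bytes([0o303, 0o244])   # \#303\#244, reported for ä in a directory name
file_upper = bytes([0o216])         # \#216, reported for Ä in a file name
file_lower = bytes([0o204])         # \#204, reported for ä in a file name


def try_decode(raw, encodings=("utf-8", "cp850", "cp437", "cp1252")):
    """Return {encoding: text} for every candidate encoding that decodes raw without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # this encoding cannot represent the byte sequence
    return results


if __name__ == "__main__":
    for label, raw in [("dir Ä", dir_upper), ("dir ä", dir_lower),
                       ("file Ä", file_upper), ("file ä", file_lower)]:
        print(label, raw, try_decode(raw))
```

A decoding that succeeds is not automatically the right one (CP1252 also accepts these bytes but yields unrelated characters), so the useful signal is which encoding produces the letter that was actually expected.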

 

As there are so many affected files and directories spread out over the whole disk, it is not practical to rename them manually one by one. Instead I need to find a way to tackle the encoding issue with a script and some utility that works on character encodings.

 

I have googled for these types of problems and it seems one can tell rsync to convert file names between encodings but then I need to know for sure what encodings I have and want and sadly I don't and have no idea about how to find out 😞

 

There is also a Linux utility, "convmv", but once again I need to somehow figure out what encoding to convert from and to, and this utility is also not available in UnRAID (and I can't find it in NerdTools either)...
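Where convmv is not available, the same kind of rename can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Python 3 interpreter is installed, the legacy names are in CP850 (an assumption that has to be verified first), and the helper names `convert_name`, `plan_renames` and `apply_renames` are made up for this example. It defaults to a dry run that only prints what it would do.

```python
import os


def convert_name(name: bytes, source_encoding: str = "cp850"):
    """Return the UTF-8 form of a legacy-encoded name, or None if it is already valid UTF-8."""
    try:
        name.decode("utf-8")
        return None  # already valid UTF-8 (this includes plain ASCII names)
    except UnicodeDecodeError:
        return name.decode(source_encoding).encode("utf-8")


def plan_renames(root, source_encoding="cp850"):
    """Collect (old_path, new_path) byte pairs, deepest entries first."""
    plan = []
    # Bytes paths keep undecodable names intact; bottom-up order means a
    # directory is only renamed after everything inside it has been handled.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in filenames + dirnames:
            new_name = convert_name(name, source_encoding)
            if new_name is not None:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, new_name)))
    return plan


def apply_renames(plan, dry_run=True):
    """Print the planned renames, or perform them when dry_run is False."""
    for old, new in plan:
        if dry_run:
            print(f"would rename {old!r} -> {new!r}")
        elif os.path.lexists(new):
            print(f"skipping {old!r}: target already exists")
        else:
            os.rename(old, new)
```

A typical session would call `apply_renames(plan_renames(path))` on a copy or a small test directory first, inspect the printed plan, and only then repeat with `dry_run=False`.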

 

Sadly I know next to nothing about character encoding in general, or in Linux/UnRAID in particular, so I am not sure what I need to do to fix this.


Anybody with expertise on character encoding that can give me some tips on how I can figure out what my current and desired encodings are and how to best try solving the problem?
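One way to approach the "which encoding do I have" question is to scan the disk and test every non-ASCII name against a short list of candidates. The sketch below is a heuristic, not a definitive detector: it assumes Python 3, assumes the only non-ASCII letters expected are åäöÅÄÖ, and uses a hypothetical candidate list. CP850 and CP437 place these six letters at the same byte values, so the sketch cannot tell those two apart and simply reports the first match.

```python
import os
from collections import Counter

EXPECTED = set("åäöÅÄÖ")                              # letters the names should contain
CANDIDATES = ("cp850", "cp437", "cp1252", "latin-1")   # assumed legacy encodings to try


def guess_encoding(name: bytes):
    """Return 'utf-8' for valid UTF-8, else the first candidate whose
    non-ASCII characters are all expected letters, else None."""
    try:
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    for enc in CANDIDATES:
        try:
            text = name.decode(enc)
        except UnicodeDecodeError:
            continue
        non_ascii = {ch for ch in text if ord(ch) > 127}
        if non_ascii and non_ascii <= EXPECTED:
            return enc
    return None


def tally_encodings(root):
    """Count the guessed encoding of every name under root that contains non-ASCII bytes."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if any(b > 127 for b in name):
                counts[guess_encoding(name)] += 1
    return counts
```

If the tally shows mostly "utf-8" plus one legacy code page, that legacy code page is a reasonable first guess for the "from" encoding, and UTF-8 for the "to" encoding.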

 

I have not changed anything related to character encoding in UnRAID, whether in SMB, when mounting the disk, or in general, so everything is "default". I run UnRAID 6.12.2.

Edited by NAS-newbie
  • 7 months later...

I don't have a simple answer to this. As a matter of fact, I am struggling with the same issue.

 

I have a bunch of files with accent characters and other characters from way back then ( mp3 files are plastered with these issues ) and never had any issues copying those files between Windows, Synology, macOS, my Linux installs, etc.

 

Since getting my unRaid server a few weeks back I have tried a myriad of things to get my files copied over. I first noticed the problem when I set up Resilio between my Synology NAS and my new unRaid server to get all my data over to unRaid. I compared folder sizes on both sides and the sizes were off by a lot. It turned out Resilio had simply skipped every file unRaid saw an issue with, so all those files were missing. I tried rsync, other third-party tools and unRaid plugins, to no avail. I even tried copying files from my Synology NAS to an external SSD first and then mounting it on the unRaid server to copy those files from the external SSD to the destination share on unRaid, unfortunately with the same result: skipped files!

 

NOTHING worked and it became a PITA and a huge time hog.

 

After some more research I found a script that renames accented characters, turning 'ä' into 'ae' and so on. For me that is not ideal at all: some installers ended up failing because they could no longer find files the script had renamed, and my playlists could not find the affected files either.
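The script itself is not shown in the thread, so the following is only a guess at what such a transliteration does, written as a small Python sketch with a hypothetical mapping table. It also shows why installers and playlists break afterwards: the name on disk is no longer the name they reference.

```python
import unicodedata

# Hypothetical mapping; the script mentioned in the post may use a different table.
TRANSLIT = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate(name: str) -> str:
    """Replace accented letters with ASCII digraphs; other characters pass through unchanged."""
    # Compose first so that 'a' + combining diaeresis is treated like a single 'ä'.
    composed = unicodedata.normalize("NFC", name)
    return "".join(TRANSLIT.get(ch, ch) for ch in composed)


original = "Mötley.mp3"
renamed = transliterate(original)
# A playlist entry that still says "Mötley.mp3" no longer matches the renamed file.
playlist_entry_still_valid = (renamed == original)
```

The transformation is lossy: once 'ä' has become 'ae' there is no reliable way to tell it apart from a name that contained 'ae' to begin with, so it cannot be undone automatically.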

 

I came across a solution where you instructed rsync to change the character encoding using '--iconv=utf8,iso88591' (whichever applies). Unfortunately that didn't work for me, I probably did something wrong.

 

The only thing that I was able to do, which is far from ideal and ends up renaming files, was to compress the entire folder on my Synology NAS (Windows or wherever you have them before you try to get them over to your unRaid server), copy the archive over to the unRaid server and extract the archive using 7z within mc ( Midnight Commander ). At least it extracted all files and folders but as mentioned in some cases the names changed. I am still searching for a solution that just works.

 

I am still too new to unRaid, but I don't want to give up on it either, especially after investing a lot in the Dell server, and I really like unRaid thus far. This filename/foldername issue is quite a bummer.

 

I am not sure whether this has something to do with the file system (most of my setup uses ZFS and a couple of disks use the good old EXT) or with the character mapping. I am not sure how exactly to tackle this issue, and I was unable to find a 100% working solution.

Edited by DoggByte
  • 3 months later...

Looks like this is a variation of the same issue I just reported for:
 

Only in my case it was slightly worse, because I could copy the files to the network share, but then they are stuck on the cache drive.

I know some filesystems strictly check the character encoding, while others don't. The files were originally generated by Windows users and copied to an ext4 filesystem via sftp. Then I copied them onto my hard drive, which uses btrfs. The cache drive also uses btrfs, but I have the HDD formatted in zfs. So my guess is that all these other filesystems allow the Windows-encoded filenames without error, but zfs does not. But that is only a guess.

In my case I'll resolve the issue by putting the folder with the offending filenames into a tar file.  But that is hardly a good general solution.

 

Edited by docbillnet

First, install the NerdTools plugin and use it to install wget.

Using wget, you can download the convmv-<version>.tar.gz file from https://www.j3e.de/linux/convmv/

When you extract the tar.gz file with tar, there is a convmv executable inside the convmv directory.

Then run './convmv -f utf-8 -t utf-8 -r --notest --nfc <nextcloud-data-folder>'
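With both the source and target encoding set to utf-8, the effect of that command comes from --nfc: names that are already UTF-8 but stored in decomposed form are rewritten in composed (NFC) form. The Python sketch below illustrates what that normalization means for a single name; the helper names are made up for the example. Names that are not valid UTF-8 at all, such as the single-byte \#216 case from the first post, would presumably need a different source encoding.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Return the name in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", name)


def needs_nfc(name: str) -> bool:
    """True when the name would change under NFC normalization."""
    return to_nfc(name) != name


decomposed = "a\u0308"          # 'a' followed by COMBINING DIAERESIS (two code points)
composed = to_nfc(decomposed)   # single code point U+00E4

# Both forms render as the same letter but are different byte sequences on disk,
# which is why two tools can disagree about whether a file name matches.
decomposed_bytes = decomposed.encode("utf-8")   # b'a\xcc\x88'
composed_bytes = composed.encode("utf-8")       # b'\xc3\xa4'
```

Checking a few problem names with `needs_nfc` before running the rename can show whether normalization is actually the issue on a given share.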

Edited by Nyosy