
WinSCP and UTF-8 filenames



Hello,

 

WinSCP auto-detects the filename encoding by querying the LANG environment variable

printenv LANG

and enables UTF-8 filenames if the result contains UTF-8, for example:

en_US.UTF-8

Unfortunately, this isn't the case for non-interactive sessions in Unraid: there, LANG is not set.
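
The check described above can be approximated in a few lines of shell. This is only a sketch of the idea, not WinSCP's actual code; the function name `is_utf8_locale` and the case-insensitive match on `UTF-8` or `UTF8` are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a LANG value names a UTF-8 locale.
# Assumption: the match is case-insensitive and accepts "UTF-8" and "utf8".
is_utf8_locale() {
    case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
        *UTF-8*|*UTF8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report what the current session would yield.
if is_utf8_locale "${LANG:-}"; then
    echo "LANG='${LANG:-}' -> UTF-8 filenames"
else
    echo "LANG='${LANG:-}' -> no UTF-8 detected"
fi
```

With an empty or unset LANG, as in the non-interactive session, the second branch is taken.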

 

How can I set the LANG variable to en_US.UTF-8 for non-interactive sessions?

 

Ref: https://winscp.net/forum/viewtopic.php?p=96851#96851

Link to comment

My preferred language is German, which additionally uses the characters ÄäÖöÜüß.

I did an initial import of my files to a user share with WinSCP (SCP protocol).

When I accessed the same share with Windows Explorer (Windows 10, Client for Microsoft Networks), some files were missing or had bogus characters in their filenames. Samba hid those files, but they were present in the filesystem (ls -l showed them). It turned out that WinSCP had not detected UTF-8 and had used another code page for the filenames.
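
To list such files from a shell, one option is to test each name for valid UTF-8. The sketch below assumes `iconv` is installed on the server; `is_valid_utf8` and `list_non_utf8_names` are hypothetical helper names, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument is a valid UTF-8 byte sequence.
# iconv exits non-zero when its input cannot be decoded as UTF-8.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print every entry at or below a directory whose name is not valid UTF-8.
# Assumption: no file name contains a newline.
list_non_utf8_names() {
    find "$1" | while IFS= read -r path; do
        # ${path##*/} is the last path component (the file name itself).
        is_valid_utf8 "${path##*/}" || printf '%s\n' "$path"
    done
}
```

A call such as `list_non_utf8_names /mnt/user/myshare` (the share name is only an example) would print the affected paths without changing anything.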

 

I was able to rename those files to proper UTF-8 filenames with an adapted version of the iconvmv script (https://github.com/YeLee/code/blob/master/shell/iconvmv), using iconv -f ISO-8859-1 in the mv command.
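
The core of such a rename can be sketched as follows. This is not the iconvmv script itself, only an illustration of the `iconv -f ISO-8859-1` step; `latin1_to_utf8` and `rename_to_utf8` are hypothetical names, and the sketch assumes that every name passed in really is ISO-8859-1, so it is best tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: re-encode a string from ISO-8859-1 to UTF-8.
latin1_to_utf8() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}

# Rename one file in place if its converted name differs from the current one.
# Assumption: the current name is ISO-8859-1. Names that are already UTF-8
# must be skipped by the caller, otherwise they would be double-encoded.
rename_to_utf8() {
    case "$1" in
        */*) dir=${1%/*}; old=${1##*/} ;;
        *)   dir=.; old=$1 ;;
    esac
    new=$(latin1_to_utf8 "$old") || return 1
    # Pure ASCII names convert to themselves; nothing to do.
    if [ "$new" = "$old" ]; then
        return 0
    fi
    # Refuse to overwrite an existing entry.
    if [ -e "$dir/$new" ]; then
        return 1
    fi
    mv -- "$dir/$old" "$dir/$new"
}
```

The function handles a single path; walking a directory tree and deciding which names need conversion is left to the caller.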

 

To spare other Unraid users this issue: how can the LANG environment variable be set to en_US.UTF-8 for non-interactive sessions?

In /etc/profile.d/lang.sh the LANG variable is exported, but it is not set when WinSCP queries it. I don't see this behaviour on Ubuntu Server.
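
A likely explanation, stated here as an assumption rather than a confirmed diagnosis: scripts under /etc/profile.d are sourced via /etc/profile, which only login shells read, while a command run over SSH without a terminal is normally started in a non-login shell. The sketch below reproduces the effect locally with a stand-in file; `demo_profile` is a hypothetical temporary file, not the real lang.sh.

```shell
#!/bin/sh
# Stand-in for /etc/profile.d/lang.sh (hypothetical temporary file).
demo_profile=$(mktemp)
printf 'export LANG=en_US.UTF-8\n' > "$demo_profile"

# Non-login style: a clean environment in which nothing sources the profile.
# printenv exits non-zero for an unset variable, hence the "|| true".
plain=$(env -i PATH="$PATH" sh -c 'printenv LANG' || true)

# Login style: the profile is sourced before the command runs.
sourced=$(env -i PATH="$PATH" sh -c '. "$1"; printenv LANG' sh "$demo_profile")

echo "without profile: '${plain}'"
echo "with profile:    '${sourced}'"
rm -f "$demo_profile"
```

If this is indeed the cause, running `printenv LANG` over SSH once directly and once through a login shell (for example `bash -lc 'printenv LANG'`) should show the same difference on the server.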

 

Afterwards, I additionally set dos charset to cp1252 in /boot/config/smb-extra.conf:

dos charset = cp1252


 

Edited by bjmi
Link to comment

Hmm. I tried uploading files with Japanese characters in their names, but I don't see the issue.

Logging in via terminal shows the filenames correctly in UTF-8. Running ls without UTF-8 in LANG shows question marks instead of the proper filename.

I do see that WinSCP can be forced to assume UTF-8 in the advanced settings, before you connect to the server.

 

Maybe your issue is that the filenames of your original files are not in UTF-8 but in a native Latin encoding such as Windows-1252, and WinSCP copied the filenames as-is. That would leave Latin-encoded filenames on the server, which Samba treats as invalid UTF-8 and hides to avoid weird things happening on the client side.
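
The byte-level difference behind this can be made visible with `od`. The sketch is illustrative only; the helper name `hex_bytes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string as space-separated hex.
hex_bytes() {
    # Unquoted on purpose: word splitting strips od's padding and newlines.
    set -- $(printf '%s' "$1" | od -An -tx1)
    echo "$*"
}

# "ä" as a single ISO-8859-1 / Windows-1252 byte (0xE4).
latin1=$(printf '\344')
# "ä" as the two-byte UTF-8 sequence (0xC3 0xA4).
utf8=$(printf '\303\244')

echo "latin1: $(hex_bytes "$latin1")"
echo "utf-8:  $(hex_bytes "$utf8")"
```

In UTF-8, 0xE4 is the lead byte of a three-byte sequence, so a Latin-encoded "ä" followed by ordinary ASCII letters does not decode as valid UTF-8.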

 

Other than this, I have no idea what else can be done besides renaming the files to UTF-8 and making sure future SCP uploads have the UTF-8 setting forced.

Link to comment
On 3/9/2020 at 3:14 PM, ken-ji said:

Maybe your issue is that the filenames of your original files are not in UTF-8 but in a native Latin encoding such as Windows-1252, and WinSCP copied the filenames as-is. That would leave Latin-encoded filenames on the server, which Samba treats as invalid UTF-8 and hides to avoid weird things happening on the client side.

That is probably what happened.

 

On 3/9/2020 at 3:14 PM, ken-ji said:

Other than this, I have no idea what else can be done besides renaming the files to UTF-8 and making sure future SCP uploads have the UTF-8 setting forced.

 

That's why I want the LANG variable to be exported, so that WinSCP's encoding detection works again.

Link to comment

I guess it's technically a limitation nobody has run into before, as most users either upload files via Samba, copy disk to disk at the OS level, or download the files from the internet.

So I guess you should submit a Feature/Enhancement request in the correct board and, in the meantime, enable the UTF-8 setting manually.

Link to comment
