Remository Forum

 


alteiisfr

Karma: 0  
Major problem with J!1.5 and accented filenames - 2009/02/25 22:16
Hi,
For all versions of Remository 3.46 (same for the latest SVN), filenames with accents get re-encoded (like UTF-8 -> ANSI).
For old versions, or version 3.46 for Joomla 1.X, there is no problem.

For instance :
I upload a file named "accenté espàace.txt"

With J!1.X + Remository 3.46 file is stored as "accenté espàace.232.txt"
(file id is 232)
With J!1.5.6 + Remository 3.46 file is stored as "accentÃ© espÃ ace.232.txt"
(file id is 232)

When you download, the correct names are restored, so that is OK.

But I have to migrate a J!1.X Remository with more than 200 files. The upgrade itself is OK, but none of the French, German, or Spanish files with accents can be downloaded: Remository under Joomla 1.5 tries to link to an encoded name...

Another question: is there a way to import all files from file storage into DB storage?

thanks
alteiisfr

Karma: 0  
Re:Major problem with J!1.5 and accented filenames - 2009/02/26 02:22
Partial solution:

The problem is a bug in PHP's move_uploaded_file function => see http://bugs.php.net/bug.php?id=47096

The filename is submitted as UTF-8 and move_uploaded_file scrambles the characters (there's no problem in Joomla 1.X because the filename is not submitted as UTF-8).

A temporary solution:

When Remository saves the uploaded file, in remositoryPhysicalFile.php
around line 200:

/*if (move_uploaded_file($this->file_path, utf8_decode($destination))) {*/

/* ALTEIIS */
/* Transform UTF-8 to ISO-8859-1. We can use the utf8_decode function or iconv. iconv is supported in all PHP versions. */

if (move_uploaded_file($this->file_path, iconv("UTF-8","ISO-8859-1//TRANSLIT", $destination))) {


When Remository transmits a file, in remository_download_Controller.php
around line 133 (the filename):
$downpath = $fileinfo->filePath();
$downpath = html_entity_decode($downpath,ENT_QUOTES);
$downpath = iconv("UTF-8","ISO-8859-1", $downpath);


around line 169 (the name of the download)
$displayname = html_entity_decode($displayname,ENT_QUOTES);
/*ALTEIIS*/
$displayname = iconv("UTF-8","ISO-8859-1", $displayname);
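
The two conversions in the fix above could be gathered into a pair of helpers. This is only a sketch under assumptions: `toDiskName` and `fromDiskName` are hypothetical names (not Remository functions), and ISO-8859-1 is assumed as the charset used on disk. Without `//TRANSLIT`, iconv normally returns false when a character cannot be expressed in the target charset, so the sketch falls back to the unchanged name in that case.

```php
<?php
// Hypothetical helpers, not part of Remository.
// Assumption: names arrive as UTF-8 and the disk holds ISO-8859-1 names.

function toDiskName(string $utf8Name, string $diskCharset = 'ISO-8859-1'): string
{
    // @ hides the notice iconv raises for characters missing from the target charset
    $converted = @iconv('UTF-8', $diskCharset, $utf8Name);
    // Fall back to the original bytes rather than losing the name
    return ($converted === false) ? $utf8Name : $converted;
}

function fromDiskName(string $diskName, string $diskCharset = 'ISO-8859-1'): string
{
    $converted = @iconv($diskCharset, 'UTF-8', $diskName);
    return ($converted === false) ? $diskName : $converted;
}

// "accenté espàace.232.txt", written with escapes so the script's own encoding does not matter
$name = "accent\u{E9} esp\u{E0}ace.232.txt";
$disk = toDiskName($name);

echo strlen($name), "\n";                  // byte length as UTF-8
echo strlen($disk), "\n";                  // byte length as ISO-8859-1
var_dump(fromDiskName($disk) === $name);   // the round trip restores the UTF-8 name
```

The fallback matters for names that ISO-8859-1 cannot represent: they stay as UTF-8 instead of being cut or replaced.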

Post edited by: alteiisfr, at: 2009/02/26 02:22
admin

Karma: 101  
Re:Major problem with J!1.5 and accented filenames - 2009/02/26 11:02
This is a nasty problem. I don't see any real resolution prior to the release of PHP6, which has been a long time coming.

If the fix works for you, that is good. I wouldn't want to put it into standard Remository because there are many users who use characters that cannot be expressed in ISO 8859-1.

Are you running on Windows? The bug reports appear to be specific to Windows. I know it may be little comfort to you, but I would strongly recommend against hosting open source (and especially PHP) systems on Windows, since the development effort is always stronger on Linux.

On Linux hosting, I don't experience any discrepancy between the operation of Remository in Joomla 1.0.x or 1.5.x. Uploading a file with an umlaut gives exactly the same result with both, according to FireFTP.

Also, it is easy to be misled by tools, not all of which will correctly render file system names, which on Linux are usually stored as UCS-2.

Remository does its best to move files around if you edit a container and change the absolute path. It will move files from one directory to another, or if you remove the absolute path, it will move the files into the database. Or vice versa. It is possible to run out of time doing this, as PHP is typically very restricted. But if you save the container again, Remository will again try to deal with any misplaced files, putting them into the correct location. Obviously if an absolute path does not point to a valid directory, writeable by Remository, things will not happen!
Martin Brampton aka Counterpoint
http://aliro.org
http://black-sheep-research.com
alteiisfr

Karma: 0  
Re:Major problem with J!1.5 and accented file names - 2009/02/26 13:17
Big thanks,
Yes, my fix is just for Europeans. I traced all the source code with Eclipse/PDT to understand that the problem was not Remository but PHP.
It's not a problem of rendering the filename, lol: the fix works. When the filename has no accents, there is no problem with the J!1.5 upgrade.

1) Windows/Linux/BSD?
No, it's not a Windows or NTFS problem; there is no correlation with the file system (most people think there is).

I work on a LAMP platform (Debian Linux) and I discovered this problem on an OVH server (the biggest French host).
I'm a serious guy (^)): I don't use FireFTP but WinSCP (SSH).

Try to upload a file with a name like éàéè.txt under Remository + Joomla 1.X and under Joomla 1.5. Take a look at the file storage to see what the filename has become...

2) There's no upgrade possible
There's no big problem with a fresh installation: the filename in storage is "scrambled", but when a user downloads the file, the filename is restored (if the name contains only accented characters, there is no filename left). It could be annoying in specific cases, but not terrible...

But when you upgrade from Joomla 1.X, Remository refuses to deliver the file (file not found) because it is looking for the encoded filename :/

3) Perhaps some simple solutions to investigate:

- Make the filename encoding selectable through a parameter and use iconv("utf-8", chosen encoding//TRANSLIT,...). The user has the choice. There must be a matching decode for download too. Something like my fix, but with a choice of encoding.
(Or automatic, by detecting the charset. Perhaps something can be done with the form's encoding charset...)

=> The user has the choice: do nothing, or use iconv with a custom charset.

- Try file_exists with different encodings when sending the file to the user (download controller): try the "scrambled" name first, then try a specific charset when the file is not found under the "scrambled" version.

=> Again using the charset parameter, but only for the download action; new files are still stored "scrambled".

- Offer the possibility to rename files when upgrading from J!1.X to J!1.5.
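
The second suggestion (trying several encodings at download time) could look roughly like the sketch below. `findStoredFile` is a hypothetical name, not an existing Remository function, and the default charset list is only an assumption; a real version would take it from a configuration parameter.

```php
<?php
// Hypothetical helper, not Remository code.
// Returns the first existing path among several encodings of the same UTF-8 name, or null.
function findStoredFile(string $dir, string $utf8Name, array $charsets = ['UTF-8', 'ISO-8859-1']): ?string
{
    foreach ($charsets as $charset) {
        // The name is assumed to arrive as UTF-8, so only other charsets need converting
        $candidate = ($charset === 'UTF-8') ? $utf8Name : @iconv('UTF-8', $charset, $utf8Name);
        if ($candidate === false) {
            continue; // the name cannot be expressed in this charset
        }
        $path = $dir . DIRECTORY_SEPARATOR . $candidate;
        if (file_exists($path)) {
            return $path;
        }
    }
    return null; // not found under any of the listed encodings
}
```

The UTF-8 name is tried first, so files uploaded under J!1.5 are found without any conversion, and only files left over from an older installation pay for the extra lookup.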
File Attachment:
File name: ____________.txt
File size:9 bytes


Post edited by: alteiisfr, at: 2009/02/26 13:26
alteiisfr

Karma: 0  
Re:Major problem with J!1.5 and accented file names - 2009/02/26 13:17
Sorry for my bad English...

The Remository/Aliro ACL class is just awesome!!!
J!1.5 should open their eyes.
Post edited by: alteiisfr, at: 2009/02/26 13:17

Post edited by: alteiisfr, at: 2009/02/26 13:18

Post edited by: alteiisfr, at: 2009/02/26 13:28
admin

Karma: 101  
Re:Major problem with J!1.5 and accented file names - 2009/03/01 21:23
This is not an easy problem to deal with. The character set issues are difficult, and it is rarely clear what tools are doing with different character sets. My own hosting shows file names byte by byte in an ls command, but they show apparently correctly when accessed from a Linux machine using a remote folder (effectively SFTP).

Maybe the main issue is to deal with the file system, and that can be converted to UTF-8 using convmv. That could be the best approach for anyone who is moving from J1.0 to J1.5 and has a lot of files with non-ASCII file names.

The database should usually be a slightly simpler issue, especially if it is MySQL 4.1 or greater, where data is by default stored as UTF-8.

I am glad you like the role based access control system from Aliro. There are a lot of good things in Aliro! You can find the latest code in the SVN repository - see http://aliro.org.
alteiisfr

Karma: 0  
Re:Major problem with J!1.5 and accented file names - 2009/03/02 09:45
In reality the problem affects both containers and files.
Some FTP/SSH tools render UTF-8 in human-readable form, so you could believe that UTF-8 "été" is the same as UCS "été". But in fact, for the file system, UTF-8 "été" is "Ã©tÃ©", so when you compare names or do a file_exists check for a download, they are different.
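
The mismatch can be reproduced in a few lines. This is just an illustration: it compares the bytes of "été" in UTF-8 and in ISO-8859-1, then shows what the UTF-8 bytes look like when a tool reads them as ISO-8859-1.

```php
<?php
// "été" written with explicit escapes so the example does not depend on the editor's encoding
$utf8   = "\u{E9}t\u{E9}";                       // UTF-8 bytes: C3 A9 74 C3 A9
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);   // ISO-8859-1 bytes: E9 74 E9

echo bin2hex($utf8), "\n";     // c3a974c3a9
echo bin2hex($latin1), "\n";   // e974e9
var_dump($utf8 === $latin1);   // bool(false): different byte strings, so different file names

// Reading the UTF-8 bytes as if they were ISO-8859-1 gives the "scrambled" form
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8);
echo $misread, "\n";           // Ã©tÃ©
```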

When you migrate French, German, or Spanish Remository files and containers, it fails for 90% of the files.
It affects most file systems.

The best approach is to give the remositoryPhysicalFile.php functions the possibility (just an optional possibility...) of converting to a specified charset when they create, compare, delete, move... I think (humbly) it's the only solution for handling migration or advanced use (where DB storage cannot be used and the directory must be maintained for other kinds of access). For the moment, I don't see any drawback to this approach. The only requirement is to route all physical file operations through this class's methods.

In most PHP projects like Joomla, file names are filtered to [A-Za-z0-9] (that probably covers 90% of cases). Other projects convert from UTF-8 but limit this to American and western European use.

If you want, I can show you the different problems directly on the hosting (by online screen sharing).

Most users have no access to convmv or a console... but it's a good tool. I confess I didn't know it (after all this time with BSD and Linux, it's a shame).
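
For users without console access, a similar rename could in principle be done from PHP. The sketch below is not convmv and not Remository code: `planUtf8Renames` and `applyRenames` are hypothetical names, and it assumes that any name which is not valid UTF-8 was written as ISO-8859-1. It only prints the plan unless asked to apply it.

```php
<?php
// Hypothetical migration helpers, not part of Remository or convmv.
// Assumption: any name that is not valid UTF-8 was written in $legacyCharset.
function planUtf8Renames(string $dir, string $legacyCharset = 'ISO-8859-1'): array
{
    $names = scandir($dir);
    if ($names === false) {
        return []; // directory missing or unreadable
    }
    $plan = [];
    foreach ($names as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        // preg_match with the /u modifier returns false on invalid UTF-8
        if (@preg_match('//u', $name) === 1) {
            continue; // already valid UTF-8 (this includes plain ASCII)
        }
        $converted = @iconv($legacyCharset, 'UTF-8', $name);
        if ($converted !== false) {
            $plan[$name] = $converted; // old byte name => proposed UTF-8 name
        }
    }
    return $plan;
}

// Dry run by default: print the plan, rename only when $apply is true.
function applyRenames(string $dir, array $plan, bool $apply = false): void
{
    foreach ($plan as $old => $new) {
        echo bin2hex($old), ' -> ', $new, "\n";
        if ($apply) {
            rename($dir . DIRECTORY_SEPARATOR . $old, $dir . DIRECTORY_SEPARATOR . $new);
        }
    }
}
```

The validity check is a heuristic, since a legacy name can occasionally happen to be valid UTF-8. Any file or container records in the database that hold the old names would also need to match the renamed files.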
admin

Karma: 101  
Re:Major problem with J!1.5 and accented file names - 2009/03/02 20:09
Hmm, I'm afraid that I really don't understand this. In particular, it seems extremely difficult to find any reliable information. Searching on "CentOS file system character set" comes up with nothing useful, for example.

I note your example, but don't understand why, if the file system is using UCS-2 it would put UTF-8 multi-byte characters into UCS-2 double bytes using only a single byte at a time.

So there are questions about what character set is in use in the file system, and how would it be possible to know what character set was used to create a file? Users may themselves not know - that is certainly the case with language files. But how can the software find out?

Then there are issues with the database, where some people are still running pre-4.1 versions of MySQL and many others have their databases configured with ISO-8859 character sets, or others such as Far Eastern ones.

And Remository runs on a variety of versions of several different CMS platforms, not all of which are committed to using UTF-8 to communicate with the browser.

I can see that there could be issues with file names and path names (which occur in container records). Whether the issues can be confined to remositoryPhysicalFile class or not I'm not certain.

If there were solid information to go on, it might be possible to do something, but right now that seems to be lacking. Also, I'm not clear of the extent of the problem - that is, how many people have been using non-ASCII characters for file names in pre UTF-8 CMS versions that they are now attempting to convert to other CMS versions that use UTF-8?