Calling translators in UTF-encoded languages like Hindi etc.


vpahuja
May 20th, 2005, 10:06
All:

I have been working on a Mambo translation to Hindi, an official language of the Indian subcontinent. I would like to call upon all the translators of other UTF-encoded languages to join forces and work together.

This is where I am right now. I am trying to find out why some characters show up as boxes:

http://www.pahuja.net/projects/mambo_Hindi/


--Vish

neoxeon
May 21st, 2005, 22:07
I am trying to translate Mambo into Nepalese, which is also a UTF-encoded language. I am having some problems similar to yours (which I saw on your site): I am getting garbled text in the title and in a few other places. Can you suggest how I should proceed to get rid of it? Also, could you please give me a roadmap for the translation so that I do it the correct way?

infograf768
May 21st, 2005, 22:20
If you are interested in forming a team for your languages (Hindi and Nepalese), please send me your e-mail address by PM.

I'll ask someone to look into this UTF problem and answer you as far as possible.

Thank you for posting.

vpahuja
May 22nd, 2005, 10:22
When I try to send you a PM I get this message:

infograf768 has exceeded their stored private messages quota and can not accept further messages until they clear some space.


I sure would be interested in setting up a Hindi/Devanagari team. What do I need to do to set it up? Do I need to get formal approval from the core Mambo team, or how does it work? I am also thinking of doing something like a Mambo India website. Can I go ahead and register the domain name, or do I need to go through the core Mambo team?

infograf768
May 22nd, 2005, 10:28
When I try to send you a PM I get this message:

infograf768 has exceeded their stored private messages quota and can not accept further messages until they clear some space.

Sorry,
I just erased enough for you to try again. :)

infograf768
May 22nd, 2005, 10:30
When I try to send you a PM I get this message:

infograf768 has exceeded their stored private messages quota and can not accept further messages until they clear some space.


I sure would be interested in setting up a Hindi/Devanagari team. What do I need to do to set it up? Do I need to get formal approval from the core Mambo team, or how does it work? I am also thinking of doing something like a Mambo India website. Can I go ahead and register the domain name, or do I need to go through the core Mambo team?
Please give me your e-mail address by PM.
(I know, you just tried :) )

MasterChief
May 23rd, 2005, 23:54
Hi vpahuja

Can you post a link to a screenshot with the problem characters highlighted?

Thanks.

MasterChief
May 23rd, 2005, 23:58
Never mind. I see the characters now.

MasterChief
May 24th, 2005, 00:05
OK, we need a bit of background.

Are you running a CVS copy of Mambo?

How are you editing the language files? By hand in a text editor, or using the language manager? Can you try both ways?

Can you look at which characters are a problem? For example, is there a particular ASCII value or similar (I may be showing my ignorance of character sets here) that is causing it?

Oh, one final thing. Have you searched this forum for any clues?

infograf768
May 24th, 2005, 00:18
I have seen two threads that appear to deal with this problem:

http://forum.mamboserver.com/showthread.php?t=19213&highlight=utf8
and
http://forum.mamboserver.com/showthread.php?t=30925&highlight=utf8

brianteeman
May 24th, 2005, 00:28
Also, what browser are you using, and on what operating system?

zhous
May 24th, 2005, 06:59
OK, my website is based on a UTF-8 version: I did the Mambo encoding conversion myself, and my friend did the UTF-8 Simple Machines Forum. I also started a UTF-8 version of Simplified Chinese Mambo 4.5.2.1, but didn't finish it because I didn't have enough time to do it myself. I'd like to know your whole plan so that I can balance my time.
My website: http://www.mambo.cn

vpahuja
May 25th, 2005, 20:38
Let me first answer the questions raised.

-I am using Notepad to create this language file. I save the file in UTF-8 encoding.

-I am using Mambo 4.5.2 that I downloaded from MamboForge. I believe it should be the CVS version (??).

-The server details are:
Apache version: 1.3.33 (Unix)
MySQL version: 4.0.24-standard
PHP version: 4.3.11
PERL version: 5.8.6
Operating system: Linux
Kernel version: 2.6.11.7.dn2.64
Machine type: i686

-The client I tried this on is Windows XP, IE 6.0.2800

-How can I use the "Language Manager" to create a language file for me? I thought it just lets me change the current language on the backend and frontend.

Let me now describe the issue in more detail. When I save the language file (hindi.php) in Notepad and test it in Mambo, I get session_start() errors. More information can be found at:

http://forum.mamboserver.com/showthread.php?p=213810#post213810

It was pointed out in the above-mentioned thread that Notepad is apparently adding 3 extra characters before <?php, and that this is what causes the error. Please see that thread for a screenshot of these 3 characters. They do not show up in Notepad, but they do show up in exceed and FAR. Another interesting thing is that they show up only if I save the file as PHP. If I save it as, say, .txt and open it in exceed, I do not see those characters.
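Three invisible characters in front of <?php are most likely the UTF-8 byte order mark (bytes EF BB BF), which Notepad writes when saving as UTF-8. PHP sends anything before <?php to the browser as output, which would explain why session_start() then fails. The following is a minimal sketch in Python (not part of Mambo) for checking a file and stripping the mark; the function names are made up for illustration.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM; everything else is untouched."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if a BOM was removed."""
    p = Path(path)
    raw = p.read_bytes()
    if not has_utf8_bom(raw):
        return False
    p.write_bytes(strip_utf8_bom(raw))
    return True
```

Calling strip_bom_in_file("hindi.php") on a copy of the language file would remove the mark while leaving the UTF-8 text itself unchanged. Whether that alone fixes the boxes is a separate question, since those may come from fonts or from the page charset.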

OK, so there are these characters. If I delete them in exceed, save the file and then use it, I no longer see the session start errors, but now some characters show up as boxes. Pretty weird. It does not end there: if I add those 3 characters back in exceed, save the file and open it in Notepad, the same characters are replaced by question marks.

I guess the challenge right now is to create a PHP file that does not throw session start errors/warnings and shows all the characters. I am not sure how zhous is doing this.

I don't know if it is important, but to be able to do this in Notepad, I am using:

http://bhashaindia.com/Downloads/downloads.aspx



zhous:
Can you share your language file with me? How did you create this language file?

MasterChief
May 25th, 2005, 21:08
I see the latest edition of PHP Architect has an article on Unicode support, so I'll do some reading on the subject. Hopefully I will find some information that will help.

vpahuja
May 25th, 2005, 21:14
One question that you can probably help me with: why does the language file need to be a PHP file? If the language file could have, say, a .mambo extension and be saved in Notepad, my problem would probably be solved, provided Mambo can read that file. Just thinking out loud here.

MasterChief
May 25th, 2005, 22:20
Mambo language files are being changed to an .ini format...so maybe that solves your problem?

You should also note that Mambo 4.5.2 (which uses php files) is not the same as the CVS version (which uses .ini files).

vpahuja
May 25th, 2005, 22:52
That is very interesting. So, should I be working with the CVS version? How can I get access to the CVS version? I just tried to get it through Tortoise and it failed. I think I will probably need a special user name and password for that. Is that right?

zhous
May 26th, 2005, 20:19
I'm lost here.
OK, first of all, what I can say is that getting a UTF-8 version of Mambo is not easy. Remember, you can never be successful if you don't know how to use the tools. Generally speaking, MS FrontPage is the best one for .xml files, and UltraEdit is good for language files, but "BOM" must be off: you must uncheck something like "Add a Unicode Signature (BOM)". (You can find "BOM" in the "Configuration" list under "Advanced". Sorry, I only have a Chinese version of UltraEdit, so these names are translated back.) You must also check "automatically detect UTF-8 files" and "save file in its original format". And of course, don't forget to convert the file from ASCII to UTF-8 before you start editing the language file.
I have not used Notepad since I found out how badly it handles UTF-8 in .sql files.
As far as I know, if you only finish the UTF-8 language files, you can't make Mambo work.

vpahuja
May 27th, 2005, 00:10
Remember, you can never be successful if you don't know how to use the tools.

I don't know where that is coming from. Are you suggesting that I am not using the tools right? Please tell me how to use them. I will be thankful to you.



As far as I know, if you only finish the UTF-8 language files, you can't make Mambo work.

Point noted. Please write up and share your experience so that everyone can benefit from it.

infograf768
May 27th, 2005, 01:13
That is very interesting. So, should I be working with the CVS version? How can I get access to the CVS version? I just tried to get it through Tortoise and it failed. I think I will probably need a special user name and password for that. Is that right?

You may get it with these parameters:
http://help.mamboserver.com/index.php?option=com_content&task=view&id=42&Itemid=0

Good luck

zhous
May 28th, 2005, 10:32
OK, vpahuja, different languages need different approaches to get a UTF-8 version.
Frankly, I don't know what Hindi is facing. If it is covered by one of the ISO encodings, you'll find it's not very hard to convert Mambo into a UTF-8 version. If it is like Chinese, the work will be tough.
OK, let me just show what I did.
I used FrontPage to deal with .xml. This is the only tool I find both effective and very easy.
I used UltraEdit to deal with language files and all .sql files.
Because Mambo 4.5.2 and Mambo 4.5.1a don't integrate Mambelfish themselves, I had to change many PHP files (because I had mixed Chinese into these files). So I used Converz.exe to do this job. The advantage of this tool is that you can batch-change the encoding of these files. I didn't worry about what I should change and what I could ignore; I just batch-converted all .php files.
Then the tough part awaits. PHP and MySQL support UTF-8 badly! I don't know exactly how bad they are, but the deeper I searched the web, the deeper my heart sank. So I just stopped, and I hope to work on it with some other people.
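The batch step described above (re-encoding every .php file in one go, done there with Converz.exe) could be sketched in Python roughly like this. It is not the Converz.exe tool itself. The function name is made up, and the source encoding is an assumption that has to be supplied by whoever knows what the files were originally saved in.

```python
from pathlib import Path
from typing import List


def convert_tree_to_utf8(root: str, source_encoding: str,
                         pattern: str = "*.php") -> List[str]:
    """Re-encode every matching file under root from source_encoding to UTF-8.

    The plain "utf-8" codec never writes a byte order mark, so the output
    files do not get the three extra leading bytes.
    """
    converted = []
    for path in sorted(Path(root).rglob(pattern)):
        raw = path.read_bytes()
        # Raises UnicodeDecodeError if the file is not valid in source_encoding.
        text = raw.decode(source_encoding)
        path.write_bytes(text.encode("utf-8"))
        converted.append(str(path))
    return converted
```

For example, convert_tree_to_utf8("mambo_copy", "gb2312") would rewrite all .php files below a hypothetical mambo_copy directory. Run it on a backup copy and only once: a second pass would treat the already converted UTF-8 bytes as the old encoding and corrupt the text.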

infograf768
May 28th, 2005, 10:45
That is very interesting. So, should I be working with the CVS version? How can I get access to the CVS version? I just tried to get it through Tortoise and it failed. I think I will probably need a special user name and password for that. Is that right?

If you still can't get the CVS version with Tortoise, try this (it is a different program, but the method is similar).
It is a good Flash tutorial (in Spanish):
http://www.beza.com.ar/innocvs

You should definitely work on the 4.5.3 CVS version and not on 4.5.2.

vpahuja
May 29th, 2005, 01:16
OK, let me just show what I did.
I used FrontPage to deal with .xml. This is the only tool I find both effective and very easy.

Now we are talking. Which .xml file did you have to change?

I used UltraEdit to deal with language files and all .sql files.

Interesting....I cannot comment on this as I have not yet tried converting .sql files.

Because Mambo 4.5.2 and Mambo 4.5.1a don't integrate Mambelfish themselves, I had to change many PHP files (because I had mixed Chinese into these files). So I used Converz.exe to do this job.

Tell me more about this Converz.exe. I could not find anything about this exe on Google.


The advantage of this tool is that you can batch-change the encoding of these files. I didn't worry about what I should change and what I could ignore; I just batch-converted all .php files.

What do you need to change in the PHP files?


Thanks for taking the time to write this. I am sure it will help many people.


Vish

vpahuja
June 1st, 2005, 00:16
Just want to give an update on my first day of playing with the translation.

The approach I am taking right now, which may change, is to start by translating the installation. I found that you need two files for this: <language>.ini and <language>.xml. The format can be seen in english.ini and english.xml, which I am following as my standard template.

So far the results have been encouraging. I have some screenshots attached. The purpose of this is to *share*:

http://www.pahuja.net/projects/mambo_Hindi/images/p2.gif
http://www.pahuja.net/projects/mambo_Hindi/images/p3.gif

infograf768
June 1st, 2005, 10:05
Very encouraging indeed.
And a good idea, your Google group.
To devise a complete method would help all.
A manual maybe?
(Don't forget to let me know about your project for the team).

MasterChief
June 1st, 2005, 22:21
OK, I'm reading up on character sets (http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html). What else do I need to know from a core perspective?

I currently work in Eclipse. Do I need to change the character set with which I save the raw files themselves?

Is there an advantage in supplying functions like htmlspecialchars with their charset argument?

hedgie
June 2nd, 2005, 02:28
I see the latest edition of PHP Architect has an article on Unicode support, so I'll do some reading on the subject. Hopefully I will find some information that will help.

Could you please post the URL of that article?

Hedgie

MasterChief
June 2nd, 2005, 14:51
Re: PHP Architect. You have to purchase the magazine to get it. It's very good and well worth the very cheap subscription. See http://www.phparch.com

vpahuja
June 2nd, 2005, 23:48
Awrightty !!

I have successfully finished the translation of the installation part of Mambo into Hindi. I have attached screenshots of my final version. Please comment.

I was not able to add the attachment, so here is the link to download it:

Download the screen shots (http://www.pahuja.net/projects/screenshots.zip)

You can also try out part of the install at

http://www.pahuja.net/projects/4.5.x

Please be nice and do not try to complete the install.

I would now like to know how I can have this included in the final version of Mambo.

I also have some thoughts and comments for the record:

-Translating Mambo 4.5.3 seems to be far easier than Mambo 4.5.2. I think it is because of the effort put in by the developers. Kudos to them.

-The fact that in 4.5.3 the language files are *.ini instead of *.php files seems to do the trick for UTF-8 encoded languages. I was having a hard time getting that to work in 4.5.2, where the language files are *.php files.

-I would also like to emphasise to the documentation team and developers that when writing a language file, it is very important to know the *context* in which a particular string is being used. In some languages the *context* changes and governs the flow of the sentence. I wish I could explain that with an example, but I am sure you get the point. If that context is documented, it really helps and speeds up the translation process.

-I had seen in 4.5.2 that some strings had variables in them, like:

"Please enter a valid %s. No spaces, more than %d characters and contain 0-9,a-z,A-Z"

%s, I believe, is a variable that will be filled in at run time. I really like this, as it gives translators flexibility in forming the sentence. I am not sure if this can be done with an ini file; I just want to point it out, as I did not see it in 4.5.3, and it might well be a limitation of using ini files.
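On whether placeholders can live in an ini file: the format itself does not rule it out, since a value is just text and the substitution happens after it is read. The sketch below shows the idea in Python, not with Mambo's own loader (which is not shown in this thread). The section name, keys and named placeholders are invented for illustration. Named placeholders let a translator reorder the variables freely; PHP's sprintf offers something similar with numbered arguments such as %1$s.

```python
import configparser

# Hypothetical language file content; keys and placeholder names are made up.
INI_TEXT = """
[strings]
valid_field = Please enter a valid %(field)s. No spaces, more than %(min)d characters and contain 0-9,a-z,A-Z
reordered = More than %(min)d characters are needed for a valid %(field)s.
"""


def load_strings(ini_text: str) -> dict:
    """Read the [strings] section as raw text.

    interpolation=None stops configparser from treating '%' specially,
    so the placeholders survive until translate() fills them in.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return dict(parser["strings"])


def translate(strings: dict, key: str, **values) -> str:
    """Fill the named placeholders of one string at run time."""
    return strings[key] % values
```

With this, translate(load_strings(INI_TEXT), "reordered", field="username", min=2) fills the same two variables as the first string, but in the opposite order.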

That's all I can think of right now. I will add more if something comes to mind.

--Vish
www.mamboindia.org

infograf768
June 3rd, 2005, 00:07
By using the debugging option in Global Config, you can pin down some of the context precisely (the •text• and ??text?? markers).
See the draft Manual:
http://help.mamboserver.com/media/TranslationManual.pdf

You are right, though: in any language, some partial sentences or even single words translate differently depending on the context.
The problem I foresee is that it would be different for each language.
It would therefore multiply the instances of a specific LANG code, increasing the size and number of ini files: almost one line per instance instead of grouping, as is possible today in most Latin languages.

Also, there is still one language.php file remaining.

MasterChief
June 3rd, 2005, 00:09
Well done vpahuja. If there are any things you had to do to the core let me know what they are and I'll take a look.

On the 'strings %s' format, yes, we need to hunt these down because I realise that words are in different orders in other languages.

vpahuja
June 3rd, 2005, 00:24
infograph:

Did you get a chance to see the screenshots?
What do you think?
How can I get this, and more that is coming, into the final Mambo version?

infograf768
June 3rd, 2005, 00:35
As far as I know, since Andrew is working on it (specifically on the context matter), it has as much chance as any language. :)

That is, if you can solve the language.php challenge.

cobrabyte
June 8th, 2005, 22:29
I know that it was mentioned, but PHP Architect is running an interesting article on Unicode this month. It is their sample article of the month, which means that you can download it (free) from their website for printing/viewing/eating.

¿Que suerte, eh?

Heh, seriously ... it's at http://www.phparch.com/issuedata/articles/article_179.pdf

Thanks,
Chris