climateprediction.net home page

The world's largest climate forecasting experiment for the 21st century.

teamname - encoding problem


uipo

Joined: Nov 15 06
Posts: 3
ID: 423803
Credit: 52,258
RAC: 0
Message 27691 - Posted 2 Apr 2007 16:30:01 UTC

Hello,

my team has a problem encoding the name "Universität der Bundeswehr München"

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=5624.

Within BOINCstats the name is "UniversitÃ¤t der Bundeswehr MÃ¼nchen"

http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

But it should belong to the right team:
http://www.boincstats.com/stats/boinc_team_graph.php?pr=bo&id=27493

With QMC my team had the same problem. Only an administrator of QMC was able to solve this.

Can anybody help?

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 27692 - Posted 2 Apr 2007 17:08:10 UTC
Last modified: 2 Apr 2007 17:08:26 UTC

For ä have you tried both ALT + 132 on the number keypad, and ALT + 0228?

For ü try both ALT + 129 and ALT + 0252.

Let us know whether one of those combinations works. (You also have to press the Number Lock key.)
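Some background on why there are two codes for each letter: as far as I know, Windows looks up ALT + a number without a leading zero in the old OEM (DOS) code page, and ALT + a number with a leading zero in the Windows "ANSI" code page. Here is a minimal Python sketch, assuming CP437 as the OEM page and CP1252 as the ANSI page. Both are assumptions about the member's PC (a German system may well use CP850, which has the same values for these two letters), and `alt_code_char` is a made-up helper name.

```python
# Hypothetical helper: which character a Windows ALT code would produce,
# ASSUMING CP437 for codes without a leading zero and CP1252 for codes with one.
def alt_code_char(code: str) -> str:
    codepage = "cp1252" if code.startswith("0") else "cp437"
    # The typed number is a single byte value in the chosen code page.
    return bytes([int(code)]).decode(codepage)


for code in ("132", "0228", "129", "0252"):
    print(f"ALT + {code} -> {alt_code_char(code)}")
```

Both codes for a letter give the same character on screen. Which bytes end up stored depends on the encoding the receiving program uses, not on the key combination.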
____________
Cpdn news
5 CPDN READMEs

uipo

Joined: Nov 15 06
Posts: 3
ID: 423803
Credit: 52,258
RAC: 0
Message 27831 - Posted 12 Apr 2007 5:41:41 UTC - in response to Message ID 27692.

I tried these combinations, with the result that the team name was written in another way, but still not the correct one.

Can you help?

Thank you

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 27858 - Posted 13 Apr 2007 1:23:58 UTC
Last modified: 13 Apr 2007 1:25:18 UTC

Sorry about that. On this page I see it correctly:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=5624

Try this instead.

*Windows Start menu
*Programs
*Accessories
*System Tools
*Character map

(or you may need to click Start, Run, then type Charmap)

In the character map, you then need to

*choose a font
*double-click the character you want
*click Copy
*return to your document
*click Paste


If that doesn't work, I'll ask JohnofWem or Richard Rodway to come and make suggestions. I think they'll know what to do.

My team description doesn't display correctly either. (I didn't write it.) The word should be món, but ó shows as a square.
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=201



uipo

Joined: Nov 15 06
Posts: 3
ID: 423803
Credit: 52,258
RAC: 0
Message 27931 - Posted 16 Apr 2007 19:52:20 UTC - in response to Message ID 27858.

I tried this, but the problem is still the same. Within Climate Prediction the name is still correct, but within the Boincstats it is not correct.

It is still written in the wrong way.
You can see it here: http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

But it should belong to the right team:
http://www.boincstats.com/stats/boinc_team_graph.php?pr=bo&id=27493

For us it does not matter in which way the team-name is written within Climate-Prediction. For us it is important that the results of Climate Prediction belong to the right team within the boincstats.

Thank you for your help.

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 27940 - Posted 17 Apr 2007 10:39:42 UTC

Here's a live link where the problem is obvious

http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

I'll see if I can get John and/or Richard to advise.

Profile JohnofWem

Joined: Feb 15 06
Posts: 10
ID: 252341
Credit: 1,058,188
RAC: 2,411
Message 27943 - Posted 17 Apr 2007 13:13:08 UTC - in response to Message ID 27940.
Last modified: 17 Apr 2007 13:13:55 UTC

Here's a live link where the problem is obvious

http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

I'll see if I can get John and/or Richard to advise.


Hmm. Looks like a misinterpretation of the Unicode characters between the Windows fonts and international universal fonts. There is often more than one way of getting the same symbol, even in the same font set. They look the same to the viewer, but the underlying string of bytes representing the characters is different. Often, as in this case, there are two bytes for a single non-ASCII character, the first byte indicating where to start counting in the table of more than 256 characters. The printable standard ASCII characters are numbered between 32 and 126, with extended ones up to 255. Some of these extended ones are accented letters, but these could be repeated later in the table, often more than once if they are used in different language sets; this is probably where the problem arises.

I have had this problem, especially with German characters, but I get round it by always using the same method for any string input, output or matching, and by ensuring that Windows is always using the same international settings and fonts for a particular country-based database. Obviously you can't do this here, as you don't know what international setting is used for the string matching or the string input/output. The real problem is that Microsoft is American English based at its core. For British users this only means spelling some words like colour and programme differently, but for languages with accented or even different characters this can be much worse.


Sorry if this isn't much help, but it may at least explain the problem. This might help, or try googling Unicode for more information.

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 27945 - Posted 17 Apr 2007 18:46:08 UTC

Following what John said, I wonder whether using this would help?

http://www.atm.ox.ac.uk/user/iwi/charmap.html

rrodway

Joined: Jan 19 07
Posts: 9
ID: 221094
Credit: 2,010,193
RAC: 468
Message 27947 - Posted 17 Apr 2007 19:07:57 UTC
Last modified: 17 Apr 2007 19:14:36 UTC

You're sending UTF-8 (a Unicode encoding) to the Boincstats site, but that site is expecting (probably) CP1252 (possibly ISO 8859-1). Solution: EITHER whatever is sending the data to Boincstats needs to send what that site expects (try CP1252), OR (better) Boincstats should support Unicode (UTF-8).

There's a Japanese term for this... mojibake :)

--Richard

<edit> Just verified. Boincstats is serving the page up encoded as ISO 8859-1, and stuffing UTF-8 into it. A bit naughty! They should change the content="text/html; charset=iso-8859-1"> at the top of their served pages to content="text/html; charset=utf-8">.
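The mismatch and the two possible fixes can be sketched in a few lines of Python. This only illustrates the idea and assumes a browser decodes the bytes exactly as the declared charset says; `as_displayed` is a made-up helper, not anything from the BOINC or Boincstats code.

```python
TEAM_NAME = "Universität der Bundeswehr München"


def as_displayed(name: str, sent_as: str, page_charset: str) -> str:
    """Encode the name the way the sender does, then decode it the way a
    browser would for a page that declares page_charset (hypothetical model)."""
    return name.encode(sent_as).decode(page_charset, errors="replace")


# What seems to be happening now: UTF-8 bytes inside an ISO 8859-1 page.
broken = as_displayed(TEAM_NAME, "utf-8", "iso-8859-1")
# Option 1: the sender supplies what the page expects.
fix_sender = as_displayed(TEAM_NAME, "cp1252", "iso-8859-1")
# Option 2: the page declares UTF-8 instead.
fix_page = as_displayed(TEAM_NAME, "utf-8", "utf-8")

print(broken)
print(fix_sender)
print(fix_page)
```

Option 1 only works for names whose characters exist in CP1252. Option 2 works for any name.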

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 27952 - Posted 17 Apr 2007 19:41:33 UTC
Last modified: 17 Apr 2007 19:42:46 UTC

Hi Richard

Does that mean there's no real list of code numbers that members can use to make their team names display properly on these boincstats pages?

If you confirm that this is the case, I'll post about the problem on the boinc_dev forum.

rrodway

Joined: Jan 19 07
Posts: 9
ID: 221094
Credit: 2,010,193
RAC: 468
Message 27958 - Posted 17 Apr 2007 21:54:31 UTC

Sorry about the delay in reply, had to get my daughter to bed.

I wouldn't think so. It's definitely UTF-8 that's appearing on the boincstats pages and it looks like the correct (2 byte) UTF-8 sequences are being used. Unfortunately the page is being served as an ISO8859-1 page and as a result the 2 byte sequence is not being interpreted as one character, but as two.

I notice that the climateprediction page for that team is also an 8859-1 encoded page, but in this case the correct code values are being used. 'ä' is encoded as the single byte 0xE4 in 8859-1, and this is what is being used on the cpdn pages.

I don't know how the team name is getting propagated to the boincstats servers, but something along the way has translated it to UTF-8. The encoding for 'ä' in UTF-8 is the 2-byte sequence 0xC3 0xA4. However, if you read that as 8859-1, then instead of translating that sequence into the one character U+00E4 (ä), it gets viewed as the two 8859-1 characters 0xC3 and 0xA4. 0xC3 is Ã, 0xA4 is ¤.
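The byte values above can be checked with a short Python sketch. `undo_misread` is a hypothetical repair helper added for illustration; it only works while no bytes have been lost along the way.

```python
a_umlaut = "\u00e4"  # the character ä, U+00E4

latin1_bytes = a_umlaut.encode("iso-8859-1")  # one byte: 0xE4
utf8_bytes = a_umlaut.encode("utf-8")         # two bytes: 0xC3 0xA4

# Reading the two UTF-8 bytes as ISO 8859-1 yields two separate characters.
misread = utf8_bytes.decode("iso-8859-1")


def undo_misread(text: str) -> str:
    """Hypothetical repair: reverse a 'UTF-8 read as ISO 8859-1' mix-up."""
    return text.encode("iso-8859-1").decode("utf-8")


print(latin1_bytes.hex(), utf8_bytes.hex(), misread, undo_misread(misread))
```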

To fix the problem you need to make sure that whatever is sending the team names to boincstats is doing so in an encoding that boincstats understands. There's nothing at all wrong with UTF-8, and my preferred solution is for boincstats to use UTF-8 in its webpages. Not only would this fix this problem, it'd also allow teams (and names) to use any character, such as Japanese or Korean characters. That is quite impossible in 8859-1: there's room for only 256 characters in that character set, as opposed to about 1.1 million code points in Unicode (although I think only about 150,000 are currently in use).
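A small Python sketch of the size difference. The sample Japanese text and the helper name `fits_in_latin1` are just illustrations.

```python
import sys


def fits_in_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Number of Unicode code points Python can represent: 0x110000.
code_points = sys.maxunicode + 1

german = "M\u00fcnchen"      # München
japanese = "\u65e5\u672c"    # two kanji, chosen only as sample text

print(code_points)
print(fits_in_latin1(german), fits_in_latin1(japanese))
print(len(japanese.encode("utf-8")), "bytes in UTF-8")
```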

--Richard

Les Bayliss
Forum moderator

Joined: Sep 5 04
Posts: 3623
ID: 12875
Credit: 3,467,707
RAC: 213
Message 27959 - Posted 17 Apr 2007 22:17:37 UTC

There are other stats sites, so I guess that a check on how they're handling this is also needed. It may just be BOINCstats.

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 27981 - Posted 18 Apr 2007 20:53:56 UTC

With a lot of help from Richard, I've now posted about this problem on the boinc_dev forum

http://boinc.berkeley.edu/dev/forum_thread.php?id=1734

[BOINCstats] Willy

Joined: Aug 12 04
Posts: 35
ID: 824
Credit: 322,875
RAC: 0
Message 28078 - Posted 23 Apr 2007 20:50:12 UTC
Last modified: 23 Apr 2007 20:50:31 UTC

I think the problem is in the way CPDN is exporting stats.

In the XML file are these lines:


<team>
<id>5624</id>
<type>6</type>
<name>Universit&#239;&#191;&#189;t der Bundeswehr M&#239;&#191;&#189;nchen</name>
....


Notice the HTML codes. When you put the team name in a html file and view it in a browser it translates to the wrong characters seen on BOINCstats.
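To see what a browser makes of those numeric codes, here is a minimal Python sketch using the standard `html` module. The string is copied from the XML above; the variable names are only for illustration.

```python
import html

exported = "Universit&#239;&#191;&#189;t der Bundeswehr M&#239;&#191;&#189;nchen"

# A browser turns each &#NNN; reference into the character with that code point.
shown = html.unescape(exported)

# The three code points that stand where the umlaut should be.
codes = [ord(ch) for ch in html.unescape("&#239;&#191;&#189;")]

print(shown)
print(codes)
```

Each umlaut has become three separate characters (code points 239, 191 and 189), so the original letter can no longer be recovered from the exported file.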

The wrong characters are also seen on other stats sites.
____________

Join team BOINCstats

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 28079 - Posted 23 Apr 2007 23:40:01 UTC
Last modified: 23 Apr 2007 23:48:27 UTC

Hi Willy

What I don't understand is why the combinations that the member types for ä and ü, which must be two different combinations, both translate to the same string &#239;&#191;&#189;. This looks like a list of 3 items.

Richard Rodway and I submitted this problem as boinc Trac ticket #57

http://boinc.ssl.berkeley.edu/trac/query

We thought this was a boinc problem rather than a defect in the cpdn (and other project) software. I think I'd better ask Milo in Oxford to have a look at this thread.



rrodway

Joined: Jan 19 07
Posts: 9
ID: 221094
Credit: 2,010,193
RAC: 468
Message 28712 - Posted 15 May 2007 15:14:57 UTC

Fossilised reply, but just for interest. That encoded sequence in the XML is the UTF-8 encoding of the Unicode character U+FFFD, which is the 'replacement' character. It's used when you are trying to convert something to Unicode and that conversion fails. So, in other words, whatever is generating that XML is trying to translate the a umlaut and the u umlaut to UTF-8 and failing (maybe because it's assuming an ASCII source or something?). That is also why both letters end up as the same string.
However, this doesn't explain what actually ended up in BOINCstats. Somehow the 'real' data got through to it, otherwise we'd have seen � in the team name on the pages, not Ã¤ (for the a umlaut).
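One way the XML could have ended up like that, sketched in Python. This is a guess at the mechanism, not the actual CPDN export code: it assumes the name is stored as ISO 8859-1 bytes and that the exporter reads those bytes as UTF-8. `to_entities` is a made-up helper.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the Unicode replacement character


def to_entities(text: str) -> str:
    """Hypothetical exporter step: write every non-ASCII UTF-8 byte
    as a numeric HTML reference."""
    return "".join(
        chr(b) if b < 0x80 else f"&#{b};" for b in text.encode("utf-8")
    )


# ASSUMPTION: the team name is stored as ISO 8859-1 bytes (0xE4 for ä, 0xFC for ü).
stored = "Universität der Bundeswehr München".encode("iso-8859-1")

# Neither 0xE4 nor 0xFC is valid UTF-8 on its own, so a decoder that
# substitutes on error turns both into U+FFFD.
decoded = stored.decode("utf-8", errors="replace")

exported = to_entities(decoded)
print(exported)
```

Under these assumptions the output matches the name in the exported XML, with ä and ü both reduced to the same three numbers.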

As a matter of interest, I had a look through some Japanese team names. Most just use English names (probably because they worked out that Japanese names didn't work :)). I didn't find any with correctly displaying Japanese names, but I did find some with names displaying the same symptoms as we see here (UTF-8 displayed as 8859-1).

All of this is too late to be of any interest I suspect, I've been way way too busy recently.

--Richard

Profile mo.v
Forum moderator

Joined: Sep 29 04
Posts: 1642
ID: 21936
Credit: 1,288,809
RAC: 144
Message 28713 - Posted 15 May 2007 19:29:56 UTC
Last modified: 15 May 2007 19:38:13 UTC

Well, it looks as if we wasted our time submitting the problem to the wrong people/place. And Milo didn't get an answer to the query he added either. Here's the fate of our ticket - wontfix.

https://boinc.berkeley.edu/trac/ticket/57

Does anyone know who might be willing and able to fix this defect? I've reopened the ticket to ask.





Copyright © 2002-2009 climateprediction.net