Eric Fausett
|
Re: can not get the utf8?
|
04/26/2010 7:10 AM
post52739
|
Wojtek Lerch
|
Re: can not get the utf8?
|
04/26/2010 11:03 AM
post52783
Xiaolong,
Perhaps I misunderstood what you meant by your examples. My assumption was that each line (such as "ffffff94") in your
original example represented a single byte of the UTF-8 string. In other words, I assumed that the original example was
produced by C code similar to this:
char *utf8string = ...;
/* if plain char is signed, bytes >= 0x80 are sign-extended and print as ffffffXX */
for ( i = 0; utf8string[i] != '\0'; ++i )
printf( " %x\n", utf8string[i] );
And now I'll assume that the new example represents each byte with a three-character sequence, such as the output of C
code similar to this:
for ( i = 0; utf8string[i] != '\0'; ++i )
printf( "%%%02X\n", (unsigned char) utf8string[i] );
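To make the two notations easier to compare, here is a minimal sketch that formats a single byte both ways. The function names format_sign_extended and format_percent are made up for this illustration, and the first one assumes a 32-bit int and the usual two's-complement conversion to signed char; it is not taken from anyone's actual code.

```c
#include <stdio.h>

/* Hypothetical helper: mimic printf("%x") applied to a signed char.
   The cast to signed char forces the sign extension even on platforms
   where plain char is unsigned. Assumes 32-bit int. */
void format_sign_extended(char *out, size_t n, unsigned char byte)
{
    int promoted = (signed char) byte;          /* 0xE5 becomes -27 */
    snprintf(out, n, "%x", (unsigned int) promoted);
}

/* Hypothetical helper: percent notation, one byte as '%' plus two hex digits. */
void format_percent(char *out, size_t n, unsigned char byte)
{
    snprintf(out, n, "%%%02X", (unsigned int) byte);
}
```

Under those assumptions, calling both helpers on the byte 0xE5 should give "ffffffe5" and "%E5": two spellings of the same byte.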
If my assumption were correct, the only difference between %E5%86%8D and
ffffffe5
ffffff86
ffffff8d
would be how the same UTF-8 string is converted to text for the purpose of posting it here. In other words, both would
refer to the same three-byte UTF-8 sequence that represents the Unicode character U+518D. Other ways of representing
the same byte sequence would be as snippets of C code such as
char utf8string[] = "\xE5\x86\x8D";
or
char utf8string[] = { 0xffffffe5, 0xffffff86, 0xffffff8d };
Obviously, I am misunderstanding something, and you will have to explain what *you* mean by those examples before I can
answer your question.
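As a rough check of the claim that the bytes E5 86 8D stand for U+518D, the following sketch decodes one three-byte UTF-8 sequence by hand. The name decode_utf8_3 is hypothetical, and the function only handles the three-byte form (1110xxxx 10xxxxxx 10xxxxxx); it does not reject overlong encodings or surrogates, so it is an illustration rather than a complete decoder.

```c
#include <stdint.h>

/* Hypothetical helper: decode exactly one three-byte UTF-8 sequence.
   Returns the code point, or 0xFFFFFFFF if the bytes do not have the
   shape 1110xxxx 10xxxxxx 10xxxxxx. */
uint32_t decode_utf8_3(const unsigned char *s)
{
    if ((s[0] & 0xF0) != 0xE0 ||
        (s[1] & 0xC0) != 0x80 ||
        (s[2] & 0xC0) != 0x80)
        return 0xFFFFFFFFu;

    /* 4 payload bits from the lead byte, 6 from each continuation byte */
    return ((uint32_t)(s[0] & 0x0F) << 12) |
           ((uint32_t)(s[1] & 0x3F) << 6)  |
            (uint32_t)(s[2] & 0x3F);
}
```

Passing the bytes through a pointer to unsigned char sidesteps the sign-extension issue entirely: decode_utf8_3((const unsigned char *) "\xE5\x86\x8D") yields 0x518D.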
|
|
|