Raj Bhumagani(deleted)
06/01/2016 11:20 AM
post116376
|
Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters
Hello,
I am using regcomp() and regexec() to match Chinese UTF-8 strings against regular expressions that contain Chinese UTF-8 characters.
The following regular expression matches:
String to match : "火车a"
Regular Expression : ^(火车.)$
The "." in the regular expression correctly matches the "a" in the string.
However, the following regular expression does NOT match, although it should:
String to match : "火车车"
Regular Expression : ^(火车.)$
The "." in the regular expression should match the third Chinese character "车" in the string, but it does not.
I did observe that the following returns a match:
String to match : "火车车"
Regular Expression : ^(火车...)$
In other words, I had to use three dots ("...") instead of a single "." for the regular expression to match.
A single "." should match any one character, whether it is Chinese or ASCII.
Can someone please tell me what needs to be done so that a single "." matches one Chinese character?
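The three cases above can be reproduced with a small harness along the following lines. This is only a sketch: the helper name regex_matches() and the macros HUO_CHE and CHE are made up for illustration, and it assumes the program runs in the default "C" locale (no setlocale() call), where the regex functions treat each byte as one character.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* UTF-8 bytes written as escapes so the source file encoding does not matter. */
#define HUO_CHE "\xE7\x81\xAB\xE8\xBD\xA6"   /* 火车: two characters, six bytes */
#define CHE     "\xE8\xBD\xA6"               /* 车: one character, three bytes  */

/* Hypothetical helper: returns 1 on match, 0 on no match,
 * -1 if the pattern fails to compile. */
int regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_EXTENDED so that ( ) group without backslashes;
     * REG_NOSUB because only the yes/no answer is needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With a byte-oriented implementation, regex_matches("^(" HUO_CHE ".)$", HUO_CHE "a") should return 1, regex_matches("^(" HUO_CHE ".)$", HUO_CHE CHE) should return 0, and regex_matches("^(" HUO_CHE "...)$", HUO_CHE CHE) should return 1, which is the behavior described in this post.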
Thanks,
Raj
|
Oleg Bolshakov
06/03/2016 8:33 AM
post116378
|
Re: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters
Hi Raj,
Both regcomp() and regexec() work with 8-bit characters, so one "." (dot) matches exactly one 8-bit character, i.e. one byte. The character 车
is encoded in UTF-8 as three bytes, 0xE8 0xBD 0xA6. So you have to match 车 with three dots (...) or use a different regular expression.
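One possible form of "another regular expression" is to replace each "." with a byte-level alternation that matches one whole UTF-8 sequence. The sketch below is an assumption-laden illustration, not a QNX API: UTF8_CHAR, expand_dots() and utf8_dot_matches() are hypothetical names, the program is assumed to run in the default "C" locale with byte ranges in bracket expressions ordered by byte value, and the alternation is deliberately loose (it does not reject overlong or surrogate encodings). Each expanded dot also adds a capture group, which shifts the numbering of any groups that follow it.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* One UTF-8 character as a byte-level ERE: 1-, 2-, 3- or 4-byte sequence. */
#define UTF8_CHAR \
    "([\x01-\x7F]" \
    "|[\xC2-\xDF][\x80-\xBF]" \
    "|[\xE0-\xEF][\x80-\xBF][\x80-\xBF]" \
    "|[\xF0-\xF4][\x80-\xBF][\x80-\xBF][\x80-\xBF])"

/* Returns a malloc'd copy of pattern in which every unescaped "." outside
 * a bracket expression is replaced by UTF8_CHAR. The caller frees it. */
char *expand_dots(const char *pattern)
{
    const char *unit = UTF8_CHAR;
    size_t unit_len = strlen(unit);
    size_t n, i = 0, o = 0;
    char *out;

    assert(pattern != NULL);
    n = strlen(pattern);
    out = malloc(n * unit_len + 1);        /* worst case: every byte is a dot */
    if (out == NULL)
        return NULL;

    while (i < n) {
        if (pattern[i] == '\\' && i + 1 < n) {
            /* Escaped character (including "\."): copy both bytes as-is. */
            out[o++] = pattern[i++];
            out[o++] = pattern[i++];
        } else if (pattern[i] == '[') {
            /* Bracket expression: copy it unchanged through its closing ']'. */
            size_t j = i + 1;
            if (j < n && pattern[j] == '^') j++;
            if (j < n && pattern[j] == ']') j++;   /* a leading ']' is literal */
            while (j < n && pattern[j] != ']') {
                if (pattern[j] == '[' && j + 1 < n &&
                    strchr(":.=", pattern[j + 1]) != NULL) {
                    /* Skip [:class:], [.sym.] or [=equiv=] as one unit. */
                    char d = pattern[j + 1];
                    j += 2;
                    while (j + 1 < n &&
                           !(pattern[j] == d && pattern[j + 1] == ']'))
                        j++;
                    j += 2;
                } else {
                    j++;
                }
            }
            if (j < n) j++;                        /* include the closing ']' */
            if (j > n) j = n;                      /* unterminated: copy the rest */
            memcpy(out + o, pattern + i, j - i);
            o += j - i;
            i = j;
        } else if (pattern[i] == '.') {
            memcpy(out + o, unit, unit_len);
            o += unit_len;
            i++;
        } else {
            out[o++] = pattern[i++];
        }
    }
    out[o] = '\0';
    return out;
}

/* Returns 1 on match, 0 on no match, -1 on allocation or compile failure. */
int utf8_dot_matches(const char *pattern, const char *text)
{
    regex_t re;
    char *expanded = expand_dots(pattern);
    int rc;

    if (expanded == NULL)
        return -1;
    rc = regcomp(&re, expanded, REG_EXTENDED | REG_NOSUB);
    free(expanded);
    if (rc != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Scanning the pattern byte by byte is safe here because no byte of a multi-byte UTF-8 sequence can equal the ASCII characters ".", "[" or "\". With this helper, the original pattern ^(火车.)$ can stay as written and the single "." is intended to consume either "a" or the three bytes of 车.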
Respectfully,
Oleg
|
Raj Bhumagani(deleted)
06/03/2016 8:42 AM
post116379
|
RE: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters
Hello Oleg,
Can you please clarify which other regular expressions are available for the QNX 6.3.2 platform?
Are they provided by existing libraries that we can download from the foundry? If so, please let us know which ones
they are or share the link.
Thanks,
Raj
|
Oleg Bolshakov
06/03/2016 9:01 AM
post116380
|
Re: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters
This is the discussion board for QNX 4. Please post your question about QNX 6.3.2 on the appropriate discussion board.
Respectfully,
Oleg
|
Raj Bhumagani(deleted)
06/03/2016 9:10 AM
post116381
|
RE: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters
Hello Oleg,
I just realized you were responding to the question I posted on the discussion board.
I had in fact already created a case (00151453) in the Technical Support Portal under our Silver Support Plan. The case
has all the right fields populated.
I hope that will work for now.
Thanks,
Raj