Forum Topic - Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters: (5 Items)
   
Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters  
Hello, 

I am using regcomp and regexec for matching Chinese UTF-8 strings with Chinese UTF-8 regular expressions. 

The following regular expression matches:
String to match : "火车a"
Regular Expression : ^(火车.)$

The "." in the regular expression correctly matches the "a" in the string.

However, the following regular expression does NOT match, although it should:

String to match : "火车车"
Regular Expression : ^(火车.)$

The "." in the regular expression should have matched the third Chinese character "车" in the string, but it does not.

However, I observed that the following does match:
String to match : "火车车"
Regular Expression : ^(火车...)$

In other words, I had to use three dots ("...") instead of a single "." for the regular expression to match.

A single "." should match any one character, whether Chinese or ASCII.

Can someone please tell me what needs to be done so that a single "." matches one Chinese character?

thanks,
Raj

Re: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters  
Hi Raj,

Both regcomp() and regexec() work with 8-bit characters, so one . (dot) matches only one 8-bit character, i.e. 1 byte. The character 车 is encoded in UTF-8 as the 3 bytes 0xE8 0xBD 0xA6, so you have to match 车 with three dots (...) or use a different regular expression.

Respectfully,
Oleg

On 1 June 2016, at 18:20:00, Raj Bhumagani <community-noreply@qnx.com> wrote:

> _______________________________________________
> 
> QNX4 Community Support
> http://community.qnx.com/sf/go/post116376
> To cancel your subscription to this discussion, please e-mail qnx4-community-unsubscribe@community.qnx.com

RE: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters  
Hello Oleg, 

Can you please clarify what other regular expression facilities are available for the QNX 6.3.2 platform?

Are they in existing libraries that we can download from the foundry? If so, please let us know which ones they are, or share the link.

Thanks,
Raj

-----Original Message-----
From: Олег Большаков [mailto:community-noreply@qnx.com] 
Sent: Friday, June 03, 2016 6:02 PM
To: qnx4-community@community.qnx.com
Cc: Mudiam, Veni (GE Transportation); Paritosh, Prakarsh (GE Transportation); Chauhan, Shivali (GE Transportation)
Subject: EXT: Re: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters

Re: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters  
This is the discussion board for QNX 4. You need to post your question about QNX 6.3.2 on the appropriate discussion board.

Respectfully,
Oleg

On 3 June 2016, at 15:42:14, Bhumagani, Rajani Kanth (GE Transportation) <community-noreply@qnx.com> wrote:

RE: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters  
Hello Oleg, 

I just realized you were responding to the question I posted on the Discussion board. 

I had in fact already created a case (00151453) in the Technical Support Portal using our Silver Support Plan. The case has all the right fields populated.

I hope that will work for now.

Thanks,
Raj


-----Original Message-----
From: Олег Большаков [mailto:community-noreply@qnx.com] 
Sent: Friday, June 03, 2016 6:30 PM
To: qnx4-community@community.qnx.com
Cc: Mudiam, Veni (GE Transportation); Paritosh, Prakarsh (GE Transportation); Chauhan, Shivali (GE Transportation)
Subject: EXT: Re: Regular Expressions issue with meta character "." usage for UTF-8 Chinese Characters
