
Question

Status: Closed Points: 250 Time: 14:47 - Apr 19, 2007  

jgivoni

Regular expression with foreign characters.

I am working on a search engine that handles UTF-8 encoded text in any language.
Everything works so far: the search term is received from the user, passed on to the database, and matching rows are returned to the browser, all in UTF-8.

Typing certain foreign characters such as ñ, é and ô also matches n, e and o, and vice versa (using MySQL's LIKE operator).

The problem appears when I try to highlight the search terms in the resulting page.
This is done with PHP's preg_replace function, and here ñ only matches ñ, not n; likewise é matches é but not e, and so on. As a result, some of the rows that were found have nothing highlighted.

Is there a way to make the regex insensitive to these differences, in the same way that the i modifier makes it case-insensitive (so that n also matches N)?
I have tried the u modifier (for UTF-8), but it did not seem to have any effect.
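A quick check illustrates why the u modifier makes no difference here. A minimal sketch, assuming PHP's bundled PCRE (built with Unicode support) and a UTF-8 encoded source file: /u makes PCRE read the pattern and the subject as UTF-8 characters rather than bytes, so /i can fold case for accented letters, but neither modifier treats é as a variant of e.

```php
<?php
// With /u, PCRE works on UTF-8 characters, and /i then folds case for
// accented letters too: "léon" matches "LÉON".
$caseFolded = preg_match('/léon/iu', 'LÉON');      // 1

// Neither /i nor /u makes an unaccented letter match an accented one:
// "leon" does not match "léon", and "n" does not match "ñ".
$plainVsAccented = preg_match('/leon/iu', 'léon'); // 0
$nVsEnye = preg_match('/n/iu', 'ñ');               // 0

var_dump($caseFolded, $plainVsAccented, $nVsEnye);
```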

Please help me here!

Jakob

Answer

jgivoni

Date: May 19, 2007, 07:21

From other forums and experts I've learned that there is no way to do what I wanted with a simple regex modifier.
However, I've found a solution that is not very complicated.

Here is a detailed description of my PHP approach:

// First I make a string of characters grouped together, which should be treated as equivalent
$equiv = "aàáâãäå,eéèêë,iìíîï,oòóôõö,uùúûü,yýÿ,nñ,cç";

// Escape regex metacharacters (and the / delimiter) in the user's search term,
// so that input such as "c++" or "a/b" cannot break the patterns below
$term = preg_quote($term, "/");

// The groups are split into an array and each group is processed
$equiv = explode(",", $equiv);
foreach ($equiv as $e)
{
    // If any character of a group is found in my search term, it is replaced by the
    // entire group (in [] brackets) before matching the search term against the search result text.
    // I use the /u modifier because my document is utf-8 encoded
    $term = preg_replace("/[$e]/iu", "[$e]", $term);
}

// The modified search term will now match similar terms in the search result text $str
// and wrap them in a 'highlighting' tag ($0 is the whole matched text)
$str = preg_replace("/$term/iu", "<span class='highlight'>$0</span>", $str);
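The same idea can be packaged as a reusable function. The sketch below is a variation on the loop above, not the original code: it walks the term one UTF-8 character at a time and looks each character up once, so a character class inserted for one group can never be rewritten by a later group (which would matter only if the group list were extended with overlapping groups). Everything outside the groups is escaped with preg_quote(). The function names accent_insensitive_pattern() and highlight_equivalent() are hypothetical, and it assumes PHP 7.0 or later with UTF-8 input.

```php
<?php
// Hypothetical helper: builds a regex fragment in which each character that belongs
// to an equivalence group is replaced by the whole group as a character class.
function accent_insensitive_pattern(string $term, string $equiv = 'aàáâãäå,eéèêë,iìíîï,oòóôõö,uùúûü,yýÿ,nñ,cç'): string
{
    $groups = explode(',', $equiv);
    $pattern = '';

    // preg_split('//u', ...) splits the UTF-8 term into single characters
    foreach (preg_split('//u', $term, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $class = null;
        foreach ($groups as $group) {
            // /iu so that upper-case input such as 'É' is recognised as well
            if (preg_match('/^[' . $group . ']$/iu', $char)) {
                $class = '[' . $group . ']';
                break;
            }
        }
        // Characters outside every group are escaped and matched literally
        $pattern .= $class ?? preg_quote($char, '/');
    }
    return $pattern;
}

// Hypothetical helper: wraps every accent-insensitive match of $term in $text
function highlight_equivalent(string $term, string $text): string
{
    $pattern = accent_insensitive_pattern($term);
    if ($pattern === '') {
        return $text; // nothing to highlight for an empty term
    }
    // Fall back to the unmodified text if preg_replace fails (e.g. invalid UTF-8)
    return preg_replace('/' . $pattern . '/iu', '<span class="highlight">$0</span>', $text) ?? $text;
}

echo accent_insensitive_pattern('leon'), "\n";
echo highlight_equivalent('leon', 'Léon met leon'), "\n";
```

For the term "leon" the helper produces l[eéèêë][oòóôõö][nñ], and both "Léon" and "leon" in the text get wrapped in the highlight span.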

Example:
- term = "leon"
- "leon" will not match "léon"
- therefore "leon" will be rewritten as "l[eéèêë][oòóôõö][nñ]" (o and n belong to groups too)
- "l[eéèêë][oòóôõö][nñ]" will now match "léon"
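Running the loop from the post on the example term confirms the expansion; note that o and n belong to groups as well, so they are expanded too. A minimal check, assuming the group string with the garbled characters restored and a UTF-8 source file, with preg_quote() added as a safeguard for user input:

```php
<?php
// The equivalence groups from the post, with the garbled characters restored
$equiv = explode(',', 'aàáâãäå,eéèêë,iìíîï,oòóôõö,uùúûü,yýÿ,nñ,cç');

$term = preg_quote('leon', '/'); // escape regex metacharacters in the user's term
foreach ($equiv as $e) {
    // Replace any member of the group with the whole group as a character class
    $term = preg_replace("/[$e]/iu", "[$e]", $term);
}

echo $term, "\n";                          // l[eéèêë][oòóôõö][nñ]
var_dump(preg_match("/$term/iu", 'Léon')); // int(1)
```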

Hope it's useful :-)
Jakob

jgivoni

Date: May 19, 2007, 07:22

I see that some of my characters were not displayed correctly by Quomon. The group string should read:
$equiv = "aàáâãäå,eéèêë,iìíîï,oòóôõö,uùúûü,yýÿ,nñ,cç";
The numbers in the garbled version should have been shown as the accented characters í and ý.
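The numbers in the garbled string are HTML numeric character references (&#237; is í, and the "53;" fragment is presumably what is left of &#253;, which is ý) that were shown literally instead of being decoded. PHP's html_entity_decode() turns such references back into UTF-8 text; a minimal check:

```php
<?php
// &#237; and &#253; are HTML numeric character references for í and ý
$garbled = 'iì&#237;îï,y&#253;ÿ';
$fixed = html_entity_decode($garbled, ENT_QUOTES, 'UTF-8');

echo $fixed, "\n"; // iìíîï,yýÿ
```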

Question Answered

This question has been closed, and points have been awarded to the following experts:


jgivoni: 250

