substr function for Unicode Cyrillic content

PHP's substr() function works on bytes, so it can't correctly split text that includes Unicode Cyrillic letters.
But here's a substring solution for UTF-8 Russian, Mongolian or other similar content.

Why is this function necessary?

Because in UTF-8 each Cyrillic letter is stored as two bytes. So if you try to get the length of the word "бөы" with strlen(), it returns 6 instead of 3, because 3 Cyrillic letters take 3 x 2 = 6 bytes.
If you try to get its first letter with substr("бөы",0,1), it returns chr(208), which displays as an unreadable character. So you need to call substr("бөы",0,2) to get the first letter of the word.
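
To see the byte counting in action, here is a minimal sketch. It assumes the script file itself is saved as UTF-8, so each Cyrillic letter in the literal is two bytes.

```php
<?php
// Assumption: this file is saved as UTF-8.
$word = "бөы";

echo strlen($word), "\n";             // 6: strlen() counts bytes, not letters
echo ord(substr($word, 0, 1)), "\n";  // 208: only the first byte of the first letter
echo substr($word, 0, 2), "\n";       // б: two bytes make one complete letter
```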

But some content also includes numbers and Latin letters, which take one byte each, so you need to check every character. The function handles this. In short, you need a separate function that works like substr(), or one that returns the length of Cyrillic Unicode words.
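
For the length case, a letter counter could look roughly like the sketch below. The name cyr_strlen is hypothetical (it is not a PHP built-in), and it assumes the text holds only one-byte ASCII characters and two-byte Cyrillic letters whose first byte is between 208 and 211.

```php
<?php
// Hypothetical helper: counts letters in UTF-8 text made of
// one-byte ASCII characters and two-byte Cyrillic letters only.
function cyr_strlen($txt) {
    $count = 0;
    $len = strlen($txt);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($txt[$i]);
        // Bytes 208..211 start a two-byte Cyrillic letter,
        // so skip the second byte of that letter.
        if ($byte > 207 && $byte < 212) {
            $i++;
        }
        $count++;
    }
    return $count;
}

echo cyr_strlen("бөы"), "\n";       // 3
echo cyr_strlen("abc123"), "\n";    // 6
echo cyr_strlen("PHP5 бөы"), "\n";  // 8
```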

Code:
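function mbm_substr($txt = '', $limit = 20) {
    $buf = '';
    $k = 0; // used to define letter position

    for ($i = 0; $i < strlen($txt); $i++) {
        if (ord($txt[$i]) > 207 && ord($txt[$i]) < 212) {
            // first byte of a two-byte Cyrillic letter: copy both bytes
            $buf .= substr($txt, $i, 2);
            $i++;
        } else {
            $buf .= $txt[$i]; // one-byte character (number, Latin letter, etc.)
        }
        $k++;
        if ($k >= $limit) { // stops if it reaches limit
            return $buf;
        }
    }
    return $buf;
}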


Usage:
echo mbm_substr('бөыангрхар',5);

Result:
бөыан
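
For comparison, if the mbstring extension is available on your server (an assumption, since it is not enabled everywhere), PHP's built-in mb_substr() and mb_strlen() count characters instead of bytes. A short sketch:

```php
<?php
// Assumption: the mbstring extension is loaded.
echo mb_substr('бөыангрхар', 0, 5, 'UTF-8'), "\n";  // бөыан
echo mb_substr('PHP5 бөы', 0, 6, 'UTF-8'), "\n";    // PHP5 б
echo mb_strlen('бөы', 'UTF-8'), "\n";               // 3
```

Note that mb_substr() takes a start position as its second argument, while mbm_substr() above always starts from the beginning of the text.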
