~ overflow ~

Tag: charset

How to detect if a string is UTF-8 in PHP?

by z3n on Apr.24, 2010, under Coding, Tips & Hints

Problem:
While debugging UTF-8 strings I came across a string that might or might not be UTF-8, thanks to IE. PHP has no function named is_utf8, nor an obvious dedicated way to detect whether a string is actually UTF-8.

Solution:

define('_is_utf8_split',5000);

function is_utf8($string) { // v1.01
	$len = strlen($string);
	if ($len > _is_utf8_split) {
		// Based on: http://mobile-website.mobi/php-utf8-vs-iso-8859-1-59
		// Test every chunk, starting at offset 0; one bad chunk fails the string.
		for ($s = 0; $s < $len; $s += $n) {
			$n = min(_is_utf8_split, $len - $s);
			// Don't cut a multi-byte char: shrink the chunk while the
			// byte right after it is a continuation byte (10xxxxxx).
			while ($n > 1 && $s + $n < $len && (ord($string[$s + $n]) & 0xC0) == 0x80)
				$n--;
			if (!is_utf8(substr($string, $s, $n)))
				return false;
		}
		return true;
	} else {
		// From http://w3.org/International/questions/qa-forms-utf-8.html
		return preg_match('%^(?:
				[\x09\x0A\x0D\x20-\x7E]            # ASCII
			| [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
			|  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
			| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
			|  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
			|  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
			| [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
			|  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
		)*$%xs', $string) === 1;
	}
}

Notes:

According to some posts about PHP, and one posting in particular, there's a bug that shows up on strings longer than 5000 chars, so this function splits those strings and tests their parts.
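As a cross-check on the regex approach, here is a minimal sketch that leans on PHP's own validators instead. It assumes the mbstring extension is loaded, and falls back to PCRE's u modifier (which refuses subjects that are not valid UTF-8) when it is not. The name is_utf8_builtin is made up for this example. It is not an exact replacement: the regex above rejects most ASCII control characters, while these checks only test whether the bytes are well-formed UTF-8.

```php
<?php
// Hypothetical helper, not part of the original post.
function is_utf8_builtin($string) {
	if (function_exists('mb_check_encoding')) {
		// mbstring: true when the bytes are valid in the given encoding.
		return mb_check_encoding($string, 'UTF-8');
	}
	// PCRE fallback: with the u modifier, preg_match() returns false
	// (not 0) when the subject is not valid UTF-8.
	return preg_match('//u', $string) === 1;
}

var_dump(is_utf8_builtin("S\xC3\xA3o Paulo")); // UTF-8 bytes for "São Paulo"
var_dump(is_utf8_builtin("S\xE3o Paulo"));     // same text as ISO-8859-1 bytes
```

The first call should report true and the second false, since a lone \xE3 byte followed by "o" is not a complete UTF-8 sequence.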


MySQL importing .sql with accents causing issues

by z3n on Dec.30, 2009, under Uncategorized

Problem:

Importing a .sql file whose entries contain accented (non-English) characters may lead to issues such as:

‘São Paulo’ instead of ‘São Paulo’

Solution:

Even though mysqld's default charset is latin1, it sometimes doesn't handle accents correctly, depending on the import you're doing.

So you may need to force it to fall back to utf8. In my case I just added this to the beginning of the .sql file I was importing:

charset utf8 \c

and it worked just fine.

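Note: If you are using Asian characters (Japanese/Chinese specific), then utf8 might not be enough to cover all of them. MySQL's utf8 stores at most three bytes per character, so characters outside the Basic Multilingual Plane do not fit; later MySQL versions provide utf8mb4 for those.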


Keep it simple, stupid jQuery experience

by z3n on Aug.14, 2009, under Coding, Notes

As much as I like jQuery, I must admit that it's far from simplifying things. Although it might be a great idea to use it in 100% jQuery scripts, it's a really bad idea to use it to refurbish an old script.

Today I spent over 2 hours implementing jQuery in an old script I have, and I ran into so many issues that it wasn't worth it at all.

My script was simple: I had a huge list of variables that could be edited through a form, and the script loops through the variables, building a form with an input field for each one. I won't get into specific details because it's boring, but I needed to let the user add a new variable inside an array, so I thought jQuery would help a lot, since I would only need to dynamically add a new input field as needed and then post everything back to the script to save the file.

First I spent an hour figuring out that jQuery was ruining the text by converting the whole thing into UTF-8, losing all the accents. Eventually I found out about the contentType setting of the ajax call:

contentType:"application/json; charset=utf-8"
which could be changed to the charset I wanted.

It was useless: jQuery kept posting in the wrong charset. There are some other tweaks for this, but they are also useless.

I was able to fix the accent issue with this PHP statement:

mb_convert_encoding(urldecode($variable),"ISO-8859-1","auto");

This is much more obscure, but I was familiar with it because I have coded with Japanese charsets, which are a pain to convert.
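The "auto" source encoding makes mbstring guess, and the guess depends on the configured detection order. When the posted data is known to be UTF-8 (an assumption here, based on jQuery converting everything to UTF-8 as described above), a sketch with the source encoding named explicitly might look like this. post_value_to_latin1 is a hypothetical helper name, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: decode a URL-encoded value and convert it
// from UTF-8 (assumed) to ISO-8859-1. Requires the mbstring extension.
function post_value_to_latin1($variable) {
	return mb_convert_encoding(urldecode($variable), 'ISO-8859-1', 'UTF-8');
}

$latin1 = post_value_to_latin1('S%C3%A3o+Paulo');
echo bin2hex($latin1), "\n"; // 53e36f205061756c6f
```

The hex dump shows the two UTF-8 bytes c3 a3 collapsed into the single ISO-8859-1 byte e3, which is what the accent fix needs.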

With that cleared up, and after searching a lot of useless blogs and postings, it turned out that jQuery was using the hard-coded form names to post the data, and those could clash with a dynamically added field. I wrote a script to change the names of the hard-coded inputs, something like this:

$("#field_id").attr('name','new_name');

In theory it worked, but when I did:

$("#form").serialize();

jQuery serialized the dynamic fields together with the original hard-coded names, ignoring the attr changes.

Now I had to add a handler to dynamically convert and read all the inputs and do my own serialize in order to TRY to make it work… and that was before even testing it on IE.

So that’s when I quit using jQuery for this script and did something plain and simple, which took me about 20 minutes and 0 searches.

It looks like if I had built the whole form from DOM elements, all generated by jQuery itself rather than hard-coded, I would have had less trouble with the form, although the charset issues would remain.

Super Fun Sources:

Stack overflow posting

Stack overflow posting 2

jQuery Ajax Documentation (completely useless, since the contentType explanation is 2 lines long)

