<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>~ overflow ~ &#187; charset</title>
	<atom:link href="http://www.overflow.biz/blog/lang/en-us/tag/charset/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.overflow.biz/blog</link>
	<description>Coding and Internet Randomness</description>
	<lastBuildDate>Sun, 08 Jan 2012 23:34:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en-us</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How to detect if a string is UTF-8 in PHP?</title>
		<link>http://www.overflow.biz/blog/lang/en-us/2010/04/24/how-to-detect-if-a-string-is-utf8-on-php?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=how-to-detect-if-a-string-is-utf8-on-php</link>
		<comments>http://www.overflow.biz/blog/lang/en-us/2010/04/24/how-to-detect-if-a-string-is-utf8-on-php#comments</comments>
		<pubDate>Sat, 24 Apr 2010 17:32:35 +0000</pubDate>
		<dc:creator>z3n</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Tips & Hints]]></category>
		<category><![CDATA[charset]]></category>
		<category><![CDATA[detecting charset encoding]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[is_utf8]]></category>
		<category><![CDATA[utf8]]></category>

		<guid isPermaLink="false">http://www.overflow.biz/blog/?p=399</guid>
		<description><![CDATA[Problem:
While debugging UTF-8 strings I came across a string that might or might not be UTF-8, thanks to IE. PHP core has no is_utf8 function to tell whether a string is actually UTF-8 (mb_check_encoding can do it, but only when the mbstring extension is loaded).
Solution:

define('_is_utf8_split',5000);

function is_utf8($string) { // v1.01
	if (strlen($string) &#62; _is_utf8_split) {
		// Based on: http://mobile-website.mobi/php-utf8-vs-iso-8859-1-59
		for ($i=0,$s=_is_utf8_split,$j=ceil(strlen($string)/_is_utf8_split);$i &#60; $j;$i++,$s+=_is_utf8_split) {
			if [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Problem:</strong><br />
While debugging UTF-8 strings I came across a string that might or might not be UTF-8, thanks to IE. PHP core has no is_utf8 function to tell whether a string is actually UTF-8 (mb_check_encoding can do it, but only when the mbstring extension is loaded).</p>
<p><strong>Solution:</strong></p>
<pre class="brush: php;">
define('_is_utf8_split',5000);

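function is_utf8($string) { // v1.01
	$len = strlen($string);
	if ($len &#62; _is_utf8_split) {
		// Based on: http://mobile-website.mobi/php-utf8-vs-iso-8859-1-59
		// Test the string chunk by chunk; every chunk must be valid UTF-8.
		for ($s = 0; $s &#60; $len; $s = $e) {
			$e = min($s + _is_utf8_split, $len);
			// Never split inside a multi-byte sequence: step back while the
			// next chunk would start on a continuation byte (10xxxxxx).
			while ($e &#62; $s + 1 &#38;&#38; $e &#60; $len &#38;&#38; (ord($string[$e]) &#38; 0xC0) == 0x80)
				$e--;
			if (!is_utf8(substr($string, $s, $e - $s)))
				return false;
		}
		return true;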
	} else {
		// From http://w3.org/International/questions/qa-forms-utf-8.html
		return preg_match('%^(?:
				[\x09\x0A\x0D\x20-\x7E]            # ASCII
			&#124; [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
			&#124;  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
			&#124; [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
			&#124;  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
			&#124;  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
			&#124; [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
			&#124;  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
		)*$%xs', $string);
	}
}  </pre>
<p><strong>Notes:</strong></p>
<p>According to some PHP-related posts, and <a href="http://mobile-website.mobi/php-utf8-vs-iso-8859-1-59" target="_blank">this</a> one in particular, a bug shows up on strings longer than 5000 characters, so this function splits such strings and tests each part.</p>
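<p>If the mbstring extension is available, a similar check can be sketched with mb_check_encoding. This is a minimal alternative under that assumption, not part of the function above, and the name is_utf8_mb is made up for illustration:</p>
<pre class="brush: php;">
// Assumes the mbstring extension is loaded; is_utf8_mb is a hypothetical name.
function is_utf8_mb($string) {
	// Validates the byte sequence as UTF-8.
	return mb_check_encoding($string, 'UTF-8');
}

var_dump(is_utf8_mb("S\xC3\xA3o Paulo")); // bool(true): valid UTF-8 bytes
var_dump(is_utf8_mb("S\xE3o Paulo"));     // bool(false): ISO-8859-1 bytes
</pre>
<p>The two checks are not identical: the regular expression above also rejects ASCII control characters other than tab, LF and CR, while mb_check_encoding treats them as valid UTF-8, so they can disagree on input containing such bytes.</p>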
]]></content:encoded>
			<wfw:commentRss>http://www.overflow.biz/blog/lang/en-us/2010/04/24/how-to-detect-if-a-string-is-utf8-on-php/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MySQL importing .sql with accents causing issues</title>
		<link>http://www.overflow.biz/blog/lang/en-us/2009/12/30/mysql-importing-sql-with-accents-causing-issues?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=mysql-importing-sql-with-accents-causing-issues</link>
		<comments>http://www.overflow.biz/blog/lang/en-us/2009/12/30/mysql-importing-sql-with-accents-causing-issues#comments</comments>
		<pubDate>Wed, 30 Dec 2009 21:33:15 +0000</pubDate>
		<dc:creator>z3n</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[charset]]></category>
		<category><![CDATA[impot]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[mysqld]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[utf8]]></category>

		<guid isPermaLink="false">http://www.overflow.biz/blog/?p=295</guid>
		<description><![CDATA[Problem:
Importing a .sql file whose entries contain accented characters (anything beyond plain English) can lead to garbled text, like:
&#8216;SÃ£o Paulo&#8217; instead of &#8216;São Paulo&#8217;
Solution:
mysqld&#8217;s default charset is latin1, which can hold accented characters, but the import still breaks when the file uses a different encoding, such as UTF-8.
So you may need to force it to fall back to utf8; in my case I just [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Problem:</strong></p>
<p>Importing a .sql file whose entries contain accented characters (anything beyond plain English) can lead to garbled text, like:</p>
<p>&#8216;<em>SÃ£o Paulo</em>&#8217; instead of &#8216;<em>São Paulo</em>&#8217;</p>
<p><strong>Solution:</strong></p>
<p>mysqld&#8217;s default charset is latin1, which can hold accented characters, but the import still breaks when the file uses a different encoding, such as UTF-8.</p>
<p>So you may need to force it to fall back to utf8; in my case I just added this to the beginning of the .sql file I was importing:</p>
<p><strong>charset utf8 \c</strong></p>
<p>and it worked just fine.</p>
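<p>Note: MySQL&#8217;s utf8 charset stores at most three bytes per character, so if you are using Asian characters (Japanese/Chinese specific) it might not cover all of them; MySQL 5.5.3 and later provide utf8mb4 for the characters that need four bytes.</p>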
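<p>The garbled form shown in the problem can be reproduced in PHP, which makes the cause easier to see: the UTF-8 bytes of the text get read as if they were latin1. A minimal sketch, assuming the mbstring extension is loaded (the variable names are only for illustration):</p>
<pre class="brush: php;">
// Assumes the mbstring extension is loaded; variable names are illustrative.
$utf8 = "S\xC3\xA3o Paulo"; // "São Paulo" encoded as UTF-8

// Treat each UTF-8 byte as one latin1 character and re-encode it as UTF-8,
// roughly what happens when a UTF-8 file is read as latin1.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $garbled, "\n"; // SÃ£o Paulo
</pre>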
]]></content:encoded>
			<wfw:commentRss>http://www.overflow.biz/blog/lang/en-us/2009/12/30/mysql-importing-sql-with-accents-causing-issues/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keep it simple, stupid jQuery experience</title>
		<link>http://www.overflow.biz/blog/lang/en-us/2009/08/14/keep-it-simple-stupid-jquery-experience?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=keep-it-simple-stupid-jquery-experience</link>
		<comments>http://www.overflow.biz/blog/lang/en-us/2009/08/14/keep-it-simple-stupid-jquery-experience#comments</comments>
		<pubDate>Fri, 14 Aug 2009 03:07:15 +0000</pubDate>
		<dc:creator>z3n</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Notes]]></category>
		<category><![CDATA[charset]]></category>
		<category><![CDATA[dom]]></category>
		<category><![CDATA[issues]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jQuery]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://www.overflow.biz/blog/?p=185&amp;lang=en-us</guid>
		<description><![CDATA[As much as I like jQuery, I must admit that it’s far from simplifying things. Although using it in scripts written entirely with jQuery might be a great idea, using it to refurbish an old script is a really bad one.
Today I spent over 2 hours implementing jQuery in an old script of mine, and [...]]]></description>
			<content:encoded><![CDATA[<p><span>As much as I like jQuery, I must admit that it’s far from simplifying things. Although using it in scripts written entirely with jQuery might be a great idea, using it to refurbish an old script is a really bad one.</span></p>
<p><span>Today I spent over 2 hours implementing jQuery in an old script of mine, and I ran into so many issues that it wasn’t worth it at all.</span></p>
<p><span>My script was simple: I had a huge list of variables that could be edited through a form, and the script looped through the variables, building a form with an input field for each one. I will not get into specific details because it’s boring, but I needed to let the user add a new variable inside an array, so I thought jQuery would help a lot, since I would only need to add a new input field dynamically when needed and then post everything back to the script to save the file.</span></p>
<p><span>First I spent an hour figuring out that jQuery was ruining the text by converting the whole thing into UTF-8, losing all the accents. Eventually I found out about the contentType ajax option:</span></p>
<pre class="prettyprint"><code><span><strong><span class="pln">contentType</span><span class="pun">:</span><span class="str">"application/json; charset=utf-8"</span></strong></span></code></pre>
<p><span>which could be changed to the charset I wanted.</span></p>
<p><span>It was useless: jQuery kept posting in the wrong charset. There are some other tweaks for this, but they are useless too.</span></p>
<p><span>I was able to fix the accent issue with this PHP statement:</span></p>
<p><span><strong>mb_convert_encoding(urldecode($variable),"ISO-8859-1","auto");</strong></span></p>
<p><span>This is much more obscure, but I was familiar with it because I have coded with Japanese charsets, which are a pain to convert.</span></p>
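<p><span>Because "auto" relies on the detection order configured for mbstring, the result can vary between servers. A minimal sketch that names the source charset explicitly, assuming the posted data arrives URL-encoded as UTF-8 (the sample value and the variable names are made up for illustration):</span></p>
<pre class="prettyprint"><code>// Assumes mbstring is loaded and the posted value is URL-encoded UTF-8.
$variable = 'S%C3%A3o+Paulo'; // hypothetical raw value

// Decode the percent-escapes, then convert UTF-8 to ISO-8859-1 explicitly.
$latin1 = mb_convert_encoding(urldecode($variable), 'ISO-8859-1', 'UTF-8');

var_dump($latin1 === "S\xE3o Paulo"); // bool(true)
</code></pre>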
<p><span>With that cleared up, and after searching a lot of useless blogs and postings, it turned out that jQuery was using the hard-coded form names to post the data, and those could clash with a dynamically added field. I wrote a script to change the names of the hard-coded inputs, something like this:</span></p>
<p><span><strong>$("#field_id").attr('name','new_name');</strong></span></p>
<p><span>In theory, it worked, but when I did:</span></p>
<p><span><strong>$("#form").serialize();</strong></span></p>
<p><span>jQuery serialized the dynamic fields together with the original hard-coded names, ignoring the attr changes.</span></p>
<p><span>Now I had to add a handler to dynamically convert and read all the inputs and do my own serialization just to TRY to make it work&#8230; and that’s before even testing it on IE.</span></p>
<p><span>So that’s when I quit using jQuery for this script and did something plain and simple, which took me about 20 minutes and 0 searches.</span></p>
<p><span>It looks like if I had built the whole form from DOM elements, all generated by jQuery itself rather than hard-coded, I would have had less trouble with the form, although the charset issues would remain.</span></p>
<p><span><strong>Super Fun Sources:</strong></span></p>
<p><span><a href="http://stackoverflow.com/questions/26620/how-to-set-encoding-in-getjson-jquery">Stack Overflow posting</a></span></p>
<p><span><a href="http://stackoverflow.com/questions/965150/jquery-ajax-post-and-encoding">Stack Overflow posting 2</a></span></p>
<p><span><a href="http://docs.jquery.com/Ajax/jQuery.ajax#options">jQuery Ajax Documentation</a> (completely useless, since the contentType explanation is only 2 lines long)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.overflow.biz/blog/lang/en-us/2009/08/14/keep-it-simple-stupid-jquery-experience/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

