We recently upgraded the MySQL installation for our emanilapoetry website, which runs on WordPress 2.8+. The upgrade somehow caused a mismatch in the character sets between WordPress and the database tables: the former uses a default character set of “utf8” and the latter uses “latin1.”
When our technical people tried using WordPress’ default character set in the database script, other, more serious problems emerged. It appears that there is a known issue between the version of WordPress we are using and the newly installed database script. See this page: http://codex.wordpress.org/Converting_Database_Character_Sets
Tests and comparisons
To confirm our assessment that the problem did result from the mismatch, we ran a quick test, changing our website’s WordPress configuration to match the database script. The “funny looking characters” displayed in some posts were immediately eliminated. Upon further investigation, we noted that the “funny looking characters” are actually errors in converting “unrecognized characters”, such as incorrect spacing between words, or between an apostrophe or quotation mark and the adjacent character.
Whilst we were able to eliminate those “funny looking characters”, a new set of “funny looking characters” is now being displayed on posts made after the upgrade date.
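To illustrate how a character set mismatch can produce such characters, here is a minimal Python sketch. It is not taken from our site's code. It assumes that Python's "cp1252" codec is a fair stand-in for MySQL's latin1 (which is based on Windows-1252), and the function names and sample strings are hypothetical.

```python
# Sketch: how UTF-8 text turns into "funny looking characters" when its
# bytes are read back as latin1. Assumption: Python's "cp1252" codec stands
# in for MySQL's latin1. Function names and samples are hypothetical.

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as cp1252 (the mismatch)."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Reverse the misreading: back to bytes via cp1252, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


samples = [
    "plain ASCII text",   # only ASCII characters, so it is unaffected
    "poet\u2019s",        # curly apostrophe (U+2019)
    "two\u00a0words",     # non-breaking space (U+00A0)
]

for s in samples:
    print(repr(s), "->", repr(misread_as_latin1(s)))
```

Text made up only of plain ASCII characters comes out identical under both character sets, while a curly apostrophe or a non-breaking space comes out as several unrelated characters.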
Here are screenshots of a sample entry, Bago Na, which was posted after the upgrade.
Screenshot 1 shows the post with mismatched character sets. Screenshot 2 shows the post with matched character sets.
Comparison between the two posts disclosed that the new set of “funny looking characters” was the result of errors in spacing.
Here are screenshots of another sample entry, Be One Nation, also posted after the upgrade.
Screenshot 3 shows the post with mismatched character sets. Screenshot 4 shows the post with matched character sets.
As you will note, Screenshots 3 and 4 do not display any differences at all. Unlike Bago Na, Be One Nation does not display this new set of “funny looking characters.” Upon closer examination, we noted that Be One Nation has correctly spaced words and sentences and complies with rigid typesetting rules.
Conclusion
Based on these tests, there is no argument that the character sets of the database and WordPress should be matched. But that is only half of the story. The other half is that, in order for posts to display properly, there should be no extra or unnecessary spaces between words or characters, so that the two character sets yield the same results. “Unrecognised” characters should also be avoided so that no losses or conversion errors result during database upgrades and changeovers.
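One way to spot characters that may not survive a conversion is to list every character in a post that falls outside plain ASCII. The Python sketch below is only an illustration under that assumption. The function name and the sample line are hypothetical, and treating "non-ASCII" as "at risk" is a deliberately cautious simplification.

```python
import unicodedata

# Sketch: report every non-ASCII character in a piece of text, with its
# position and Unicode name. The function name and sample are hypothetical.

def find_non_ascii(text: str) -> list:
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


sample = "A poet\u2019s\u00a0line"  # curly apostrophe and a non-breaking space
for pos, ch, name in find_non_ascii(sample):
    print(pos, repr(ch), name)
```

A post for which this list is empty should display the same way whichever of the two character sets is in use.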
Whilst the “latin1 char set” may be rigid and may restrict writers’ “creativity”, we believe this is something that we have to live with. But that should not be a problem at all. After all, aside from the message and form, the poet should also be concerned with correct syntax, spelling, and typesetting anyway.
In summary, the options we considered are:
1. Amend the WordPress “utf8” char set to align with the “latin1” char set used by the database tables to eliminate the “funny looking characters” in the posts entered prior to the upgrade.
2. Do not amend the WordPress “utf8” char set which is more “generous” with errors, and leave the “funny looking characters” for old posts as they are.
We will be implementing Option 1 shortly.
This option requires emanilapoetry members to observe the correct rules of syntax and typesetting in their posts if they want their posts to display properly.
Note: As of this writing, there is still a mismatch between the database and WP character sets. Thus, those funny looking characters may still be displaying in some pre-upgrade posts not compliant with typesetting rules. This may change once we implement the character set matching. In the process, new posts not compliant with typesetting rules may not display properly.

