Mysql Does Not Load Accents Despite Utf8
April 3, 2014
Have you ever worked with a database that had data with accents? You’ve exhausted every option from the file encoding to the database collation to anything else yet you still get that pesky black diamond with the blasted question mark. What if I told you there was one more way?
That way is the default character_set_server
value. It’s most likely not utf8mb4
and most likely latin1
.
Run this query in the database:
mysql> show variables like "%character%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
Notice something fishy? Yep. It’s that latin1
. If the data coming from the database is utf8
, how will mysqld
handle it? latin
style but since it’s not the same – error. Let’s change that to utf8mb4
.
Wait a minute, utf8mb4
?
Since MySQL 5.5.3 you should use utf8mb4
rather than utf8
. While they both refer to the UTF-8 encoding, the older utf8
had a MySQL-specific limitation preventing use of characters numbered above 0xFFFD whereas utf8mb4
was the refactored version.
##Fixing it
1. Set default character set
Run this SQL query to set utf8mb4
as your default database character set:
ALTER DATABASE db_name CHARACTER SET utf8mb4;
ALTER DATABASE db_name COLLATE 'utf8mb4_general_ci';
If that didn’t work – keep reading.
On a Windows machine?
When you run these queries on a database hosted in a Windows machine, you may run into trouble:
ALTER DATABASE db_name CHARACTER SET utf8mb4;
ALTER DATABASE db_name COLLATE 'utf8mb4_general_ci';
This is because of the way Windows locks their files. Nonetheless, we have one more way here:
- Dump that database (structure and content, no
CREATE
info) - Drop that database
- Create a database with the
utf8mb4
character set and collation - Import the dump into it
Tiresome, I know. Did that work? No? Let’s move on.
2. Alter the my.cnf
config file
You can append this to the configuration file that mysql
uses. It should be in the /etc/mysql
directory or somewhere there:
[mysqld]
init_connect='SET collation_connection = utf8mb4_general_ci'
init_connect='SET NAMES utf8mb4'
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci
skip-character-set-client-handshake
The most important line is skip-character-set-client-handshake
. If the client requests a connection in utf8
, it would still use utf8mb4
. This is important because a lot of people still use utf8
. If you want that fire emoji in your database, you need to have utf8mb4
.
This configuration tells mysqld
what queries to run and what values are to be set. Simple stuff.
Restart MySQL:
sudo /etc/init.d/mysql restart
and there we have it:
mysql> show variables like "%character%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.03 sec)
Your database should should showing that fire emoji and those accented characters!
Diagnostics
To see the individual character set of a database, run this:
mysql> show create database db_name;
+----------+---------------------------------------------------------------+
| Database | Create Database |
+----------+---------------------------------------------------------------+
| fbdb | CREATE DATABASE `fbdb` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+---------------------------------------------------------------+
1 row in set (0.00 sec)
This database would have been created with the default character set of utf8 due to your settings.
You can also see which character sets are available to you based on the version of MySQL you carry:
mysql> show collation where charset='utf8mb4';
+------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 |
| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 |
+------------------------+---------+-----+---------+----------+---------+
26 rows in set (0.00 sec)