Wednesday, March 18, 2009

Django + SQLite + Collations

There is rather well-known "SQLite case-insensitive problem", which is that SQLite can do case-insensitive SELECT only for ASCII character set. This problem was discussed at Django users list, posted as Djungo feature request and documented in the documentation:

SQLite doesn't support case-insensitive matching for non-ASCII strings. Some
possible workarounds for this are `documented at sqlite.org`_, but they are
not utilised by the default SQLite backend in Django. Therefore, if you are
using the ``iexact`` lookup type in your queryset filters, be aware that it
will not work as expected for non-ASCII strings.


I also met this problem and I had to find a solution since I need to deploy one small application (ebooks catalogue) at the department machine where I didn't have root account and was unable to install another database engine.

The solution is not really hard.

Support of collations for SQLite is provided by loadable extension through ICU library, which is installed on almost every Linux box. This is documented here.

So first step is to get ICU extension for SQLite here and compile it with

gcc -shared icu.c icu-config --ldflags -o icu.so

You need to have development files of libicu installed, for Debian/Ubuntu do

sudo apt-get install libicu-dev


Now it is possible to test the extension (and also check if sqlite3 in your system is built with extension support):

nazarov@heps1:~/lib> sqlite3
SQLite version 3.6.11
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .load ./icu.so
sqlite> SELECT icu_load_collation('ru_RU', 'RUSSIAN');

sqlite> SELECT * FROM core_author WHERE name LIKE 'вайн%';
1420|Вайнер А.Л.
4557|Вайнберг М.М.
4558|Вайнберг М.М.
6374|Вайнберг Б.
7326|Вайнштейн Б.К.
7327|Вайнштейн Л.А.
7458|Вайнштейн С.И.
8636|Вайнберг С.
8958|Вайнштейн С.И.

If your SQLite is built without extension support, get appropriate version at SQLite homepage.

Then we need extension support in pysqlite. Since version 2.5.2 pysqlite can load SQLite extensions, but this feature is disabled by default. So download pysqlite, extract, remove the line

define=SQLITE_OMIT_LOAD_EXTENSION

from the file setup.cfg and build it:

wget http://oss.itsystementwicklung.de/download/pysqlite/2.5/2.5.5/pysqlite-2.5.5.tar.gz
tar xzf pysqlite-2.5.5.tar.gz
cd pysqlite-2.5.5
vi setup.cfg
python setup.py build


Now you can check, that pysqlite can load icu and make case insensitive selects:

export PYTHONPATH=`pwd`:$PYTHONPATH;export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH;python
Python 2.5 (r25:51908, Nov 27 2006, 19:14:46)
[GCC 4.1.2 20061115 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pysqlite2 import dbapi2 as sqlite3
>>> con = sqlite3.connect("lib.db")
>>> con.enable_load_extension(True)
>>> con.load_extension("./icu.so")
>>> con.execute("SELECT icu_load_collation('ru_RU', 'RUSSIAN');")
<pysqlite2.dbapi2.Cursor object at 0xb7cab660>
>>> con.execute("SELECT * FROM core_author WHERE name LIKE 'вайн%';")
<pysqlite2.dbapi2.Cursor object at 0xb7cab820>


Also I had to build static version of pysqlite which included SQLite, because SQLite on target system didn't support loadable extenstion. I did it with

python setup.py build_static


So now we need to tell Django to use collations. Correct way would be to write custom db backend, but it is far simplier just to add 3 following lines to the file django/db/backends/sqlite3/base.py:

*** 143,148 ****
--- 143,153 ----
}
kwargs.update(self.options)
self.connection = Database.connect(**kwargs)
+ # Add Russian collation
+ self.connection.enable_load_extension(True)
+ self.connection.load_extension("/home/nazarov/lib/icu.so")
+ self.connection.execute("SELECT icu_load_collation('ru_RU', 'RUSSIAN');")
+
# Register extract, date_trunc, and regexp functions.
self.connection.create_function("django_extract", 2, _sqlite_extract)
self.connection.create_function("django_date_trunc", 2, _sqlite_date_trunc)


Now Django applications can do case-insensitive SELECTs (iexact, icontains etc) for Russian language.

On my target machine I had to start Django development server with

export PYTHONPATH=`pwd`/Django-1.0.2-final/:`pwd`:$PYTHONPATH
export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH;
python library/manage.py runserver 10.0.0.15:8000

since I could not do system-wide install.