PySqlite2: Unicode Bug while Processing Non-unicode Text

Published March 5th, 2008, updated September 11th, 2008.

Pysqlite-2.3.2 accepts binary data on inserts, but selects always return unicode strings. When the stored bytes are not valid UTF-8, the select fails with a decoding error.

Since sqlite3 itself accepts binary data in text fields, this seems to be a bug in pysqlite. It could be fixed in one of two ways: i) restrict inserts to unicode strings, or ii) return byte strings instead of unicode strings from selects.

However, the first would break compatibility with sqlite and the latter would break compatibility with existing code. Thus, this should be discussed with the authors.

import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('''CREATE TABLE test (t TEXT)''')
# In Python 2, chr(128) is the single byte '\x80', which is not valid UTF-8.
cursor.execute('''INSERT INTO test (t) VALUES (?)''', (chr(128),))
# The insert succeeds, but the select fails while decoding the column:
cursor.execute('''SELECT t FROM test''')
# Traceback (most recent call last):
#   File "pysqlite_utf8.py", line 10, in <module>
#     cursor.execute('''SELECT t FROM test''')
# sqlite3.OperationalError: Could not decode to UTF-8 column 't' with text '?'

print cursor.fetchone()
connection.close()
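
Until the issue is resolved upstream, option ii) can be approximated per connection with the text_factory attribute of the sqlite3 module. The following is only a sketch, and it rests on several assumptions that are not in the script above. It targets Python 3, where a bytes parameter would be bound as a BLOB, so the invalid byte is stored as TEXT through a CAST. The names decode_failed and raw_value are illustrative. Whether the error surfaces in execute() or in fetchone() may depend on the Python version, so both calls sit inside the try block.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE test (t TEXT)')
# Store the single byte 0x80 as TEXT; it is not valid UTF-8.
# (Assumption: Python 3, which binds bytes as BLOB, hence the CAST.)
cursor.execute('INSERT INTO test (t) VALUES (CAST(? AS TEXT))', (b'\x80',))

# Default behaviour: TEXT columns are decoded as UTF-8, which fails here.
try:
    cursor.execute('SELECT t FROM test')
    cursor.fetchone()
    decode_failed = False
except sqlite3.OperationalError:
    decode_failed = True

# Workaround: ask the connection to return TEXT values as raw bytes.
connection.text_factory = bytes
cursor = connection.cursor()
cursor.execute('SELECT t FROM test')
raw_value = cursor.fetchone()[0]

connection.close()
```

With text_factory set to bytes, the application has to decode every text value itself, which is the compatibility cost of option ii) noted above.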

pysqlite ticket
Debian ticket
download source code