Tuesday, July 1, 2008

Installing Tsearch2 on Postgres 8.2

Before we start I assume you know how to create new PostgresSQL databases and have access to the root and postgres system accounts.

Among other new shiny features, PostgreSQL 8.3 has Tsearch2 as a contrib package; that is, full text search is available in 8.3 out of the box. That’s really awesome news because enabling it manually on 8.2 databases can be tricky. If you have any chance to upgrade to 8.3, do it. This also makes sense because the latest version runs much, much faster.

Tsearch2 is a part of the postgresql-contrib package. Assuming your operating system is Ubuntu, simply run sudo apt-get install postgresql-contrib and that’s it.

We need a database to start with. It’s really a good idea to try it first on a separate test database. If something goes wrong, you always can drop it and create a new test database without losing any valuable data.

So, let’s create our test database:

$ createdb -E utf8 test

I always use utf8 as the database encoding unless I have a really good reason not to do that. Using utf8 will save you lots of time and probably some gray hair. Trust me, utf8 is your friend; I’ve learned that the hard way.

Now, if you are a happy 8.3 user then all is ready to go. If for some reason the upgrade is not possible, you will have to configure your databases manually to enable Tsearch2.

This is how. First, enable plpgsql, one of the stored procedure languages:

$ sudo -u postgres psql test -c "create language plpgsql"

Now, create service Tsearch2 tables and all necessary supporting functions, data types, etc.

$ sudo -u postgres psql test < /usr/share/postgresql/8.2/contrib/Tsearch2.sql

You have to do both things as a postgres user because otherwise you will get a bunch of permission errors. Unfortunately, all the new objects will be owned by the postrgres user, which makes them essentially unusable by anyone else. To solve this problem, use Eugene Morozov’s grantall.sh script.

#!/bin/sh
# Usage: grantall database role
OBJECTS=$(psql -t -c '\dt' $1 | cut -f 4 -d ' ')
OBJECTS="$OBJECTS $(psql -t -c '\ds' $1 | cut -f 4 -d ' ')"
OBJECTS=$(echo $OBJECTS | sed 's/ /, /g')
SQL="GRANT ALL ON $OBJECTS TO $2"
psql -c "$SQL" $1
psql -c "GRANT ALL ON SCHEMA public TO $2" $1

This script is so useful, so I’d suggest saving it as ~/bin/grantall.sh. Now make it executable:

$ chmod a+x ~/bin/grantall.sh

Then run it.

$ sudo -u postgres ~/bin/grantall.sh test user

User is your database user name, which almost certainly matches your system user name.

Doing this every time you want to enable full text search in a database can be annoying. There is a neat trick that can help. When PostgeSQL creates a new database it actually makes a copy of a special template database, template1 by default. If you install Tsearch2 there, every time you create a new database, it will automatically copy Tsearch2 objects from there.

Unfortunately, the copied objects will retain their original permissions, which means even though you installed Tsearch2 into the template1 database, you still have to run the grantall.sh script.

Now it’s time to start a PostgreSQL command line. Let’s say ‘test’ is the database’s name.

$ psql test
Welcome to psql 8.2.7, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

test=>

Let’s check that it is working properly.

test=> select to_tsvector('simple', 'full text search') @@ to_tsquery('simple', 'text');
?column?
----------
t
(1 row)

We’re there, Hurray!

0 comments:

Post a Comment