Monday, March 31, 2008

libdbi..TEXT Datatype..MySQL VS PostgreSQL

I am currently working with porting a very much sophisticated & advanced NLP software/system written in C++ which is actually targeted to run using a MySQL Database.

My current or short term target is to port it to PostgreSQL.

After created necessary & equivalent tables in PostgreSQL database, I wrote a script to transfer data from MySQL database to PostgreSQL database.
After some fixing it worked fine.

Then I ran the software using PostgreSQL database. Alas...data loading failed :(

I found that if I just changed the database driver from "pgsql" to "mysql" it worked fine.
The problem occurred if database driver is changed from "mysql" to "pgsql"

I thoroughly investigated the table definitions and data in various tables and found that the problem was not originated from the database end.

So, why I faced the problem?? Why???? :(

After loosing some valuable hairs:(, I found this:

In case of MySQL database, libdbi-driver for MySQL wants to fetch TEXT as binary and thats why in the original source code of the software, dbi_result_get_binary was used to fetch TEXT type data.

But, in case of PostgreSQL database, this is different. i.e. libdbi-driver for PostgreSQL wants to fetch TEXT as string not binary.
So, the fix is to replace dbi_result_get_binary with dbi_result_get_string

Ahh...The world is beautiful again :D


Saturday, March 29, 2008

Adding custom DISTINCT ON behavior

In a typical query we sometimes use Distinct keyword to eliminate duplicate rows.

SELECT DISTINCT column1, column2, column3.... FROM table1 WHERE ....

This elimination of duplicate rows occurs after the result table has been generated & the selected columns has been processed (if needed)

Two rows are considered distinct if they differ in at least one column value.
Null values are considered equal in this case.

But this default & obvious consideration can be customized. How? Ok..let me explain with an example.

Say we have a table named Student

Name Bengali English Math
Alex 70 85 96
Barnabus 96 70 85
Alfred 96 70 84

Say, you want to consider those students as distinct who has different total marks of Bengali, English & Math.
So the first two students are not distinct, the first & third are distinct and the 2nd & third are distinct.

SELECT DISTINCT ON (Bengali+English+Math) Name
FROM Student;

Now think yourself about a case where you can apply this custom behavior of Distinct and if you find any then don't hesitate to let me know ;P

SQL..Problems with Distinct ON

The DISTINCT ON clause is not part of the SQL standard &
its result is not always determinate.
So although it is often the most obvious alternative & convenient also, think twice before using it.
Ask yourself:
1. Do I want sql written by me to be portable to sometoher DBMS?
2. Do I want to avoid indeterminate nature of result?

If any of the above question is NO, then try to use GROUP BY and subqueries in FROM to implement the same thing and avoid Distinct ON.

SQL...LIMIT OFFSET without ORDER BY

When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows.

Don't forget to use an ORDER BY clause while using LIMIT. Why?

It's because it will prevent you to get an unpredictable subset of the whole result set.
Say, you have 30 rows whole result set (i.e. the result set without LIMIT and OFFSET) First you provide LIMIT 2 OFFSET 15. Next you provide LIMIT 5 OFFSET 15. Now you cant say it with full confidence that there will be common rows in these two cases. Your natural intuition says it should be. But it may not be.

Why this happend?

It is because, the query optimizer takes LIMIT into account when making a query plan, so you may get different plans (yielding different row orders) depending on your provided LIMIT and OFFSET.

Monday, March 10, 2008

Get list of databases and tables in a database..Postgresql

To get list of databases in MySQL you can simply run :
SHOW DATABASES
SO what is the equivalent sql statement in Postgresql:
select datname from pg_database;

To get list of tables in a postgres database:

SELECT * FROM pg_tables;

Offcourse you can get list of tables using \dt from the command
prompt for postgres.I am just telling about doing it using sql
query which you can also use outside of command prompt for
postgres.

very simple indeed.. Isn't it;P