
A look at materialized views in PostgreSQL 9.3

I needed to use materialized views for the first time to handle performance problems that our team encountered while developing a reporting tool for one of our clients.

The database, a MySQL 5.1 instance, already contained several views, which caused a lot of performance bottlenecks, especially as the data continued to grow. Because MySQL doesn't have an out-of-the-box materialized view feature, it took me some time to create and maintain materialized views to replace the normal views and restore good performance.

When I started learning PostgreSQL a couple of weeks ago, one of the things that attracted me most was its support for creating and maintaining materialized views.

Database Views

Most people are familiar with the concept of a database view, which creates a virtual table from the result of a select query. A view is usually used for:

  • Security purposes: when you want to give a certain user permission on a subset of table data rather than the whole table.
  • Hiding the complexity of a query: by encapsulating all the complex query parts in the view definition.
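
For illustration, here is a minimal ordinary view (the orders table is hypothetical):

-- a plain view: no data is stored, the underlying query runs on every access
create view big_orders as
    select id, customer_id, total
    from orders
    where total > 1000;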

But views can be a major performance bottleneck in databases such as MySQL, especially when built on top of other views, so the concept of materialized views emerged to keep the advantages of views while eliminating the performance problems they can cause.

What are materialized views and their benefits?

The definition of a materialized view (abbreviated as MV) in Wikipedia is:

A materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary based on aggregations of a table’s data

So in short, materialized views are objects stored in the database (like an actual table) that hold the precomputed result of a select query as their data. Being actually stored on disk allows queries against a materialized view to benefit from the cached results without re-executing the complex underlying query each time, and, more importantly, allows you to build indexes on it.

In theory the concept is simple, and an MV can initially be built (as a plain table) using a simple query like this:

create table test_mv as
    select *
    from test
    where
    col1 = 1
    -- ... any more complicated conditions
    group by col_name;

But the most important issue is how to get updated data into the materialized view (refreshing the MV). Materialized views can be refreshed either completely (by executing the underlying query again) or incrementally (by executing the query only over the subset of records that changed recently).
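
For example, a complete refresh of the hand-rolled test_mv table above could be sketched like this, wrapped in a transaction so readers never see an empty table (note that truncate is transactional in Postgres, unlike MySQL):

begin;
truncate test_mv;
insert into test_mv
    select *
    from test
    where col1 = 1
    group by col_name;
commit;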

Unfortunately, MySQL still doesn't have built-in materialized view functionality, although there are some interesting open source projects like Flexviews.

Example database structure

To try out materialized views in PostgreSQL, I migrated a subset of my test data to Postgres.

I have a users table with the following structure:

mt4data=> \d+ users;
                                                    Table "public.users"
       Column       |            Type             |              Modifiers               | Storage  | Stats target | Description
--------------------+-----------------------------+--------------------------------------+----------+--------------+-------------
 login              | numeric                     | not null                             | main     |              |
 GROUP              | character(16)               | not null                             | extended |              |
 zipcode            | character(16)               | not null                             | extended |              |
 address            | character(128)              | not null                             | extended |              |
 phone              | character(32)               | not null                             | extended |              |
 email              | character(48)               | not null                             | extended |              |
 comment            | character(64)               | not null                             | extended |              |
 id                 | character(32)               | not null                             | extended |              |
 agent_account      | numeric                     | not null                             | main     |              | 
 modify_time        | timestamp without time zone | not null                             | plain    |              |
-- Other columns

And here is the structure of the trades table, which holds all trades made by users in the trading system. The sample data I tested on for this article contained about 700,000 rows:
mt4data=> \d+ trades;
                                              Table "public.trades"
      Column      |            Type             |          Modifiers          | Storage  | Stats target | Description
------------------+-----------------------------+-----------------------------+----------+--------------+-------------
 ticket           | numeric                     | not null                    | main     |              |
 login            | numeric                     | not null                    | main     |              |
 symbol           | character(16)               | not null                    | extended |              |
 cmd              | numeric                     | not null                    | main     |              |
 volume           | numeric                     | not null                    | main     |              |
 open_time        | timestamp without time zone | not null                    | plain    |              |
 open_price       | double precision            | not null                    | plain    |              |
 close_time       | timestamp without time zone | not null                    | plain    |              |
 profit           | double precision            | not null                    | plain    |              |
 comment          | character(32)               | not null                    | extended |              |
 modify_time      | timestamp without time zone | not null                    | plain    |              |

Now I want to generate some reports about a special type of trade, "deposits". Here is the query:

select trades.TICKET AS ticket,
    trades.LOGIN AS login,
    users."GROUP" AS group_name,
    trades.VOLUME AS volume,
    users.AGENT_ACCOUNT AS agent_account,
    users.ZIPCODE AS zipcode,
    users.STATUS AS status,
    trades.CLOSE_TIME AS close_time,
    trades.PROFIT AS amount,
    trades.COMMENT AS comment,
    trades.MODIFY_TIME
    from
    trades, users
where
    (users.LOGIN = trades.LOGIN)
        and (trades.CMD = 6)
        and (trades.PROFIT > 0)
and (
 trades.comment like 'DPST%' or
 trades.comment like 'Bonus%' or
 trades.comment like 'Credit%' or
 trades.comment like 'Deposit%'
)

As you can see, the query used to retrieve this data contains many conditions and can be expensive to execute. Here is the query plan, as shown by the explain command:
 Nested Loop  (cost=4874.69..67486.46 rows=42 width=127)
   ->  Bitmap Heap Scan on trades  (cost=4874.41..67189.44 rows=42 width=72)
         Recheck Cond: (cmd = 6::numeric)
         Filter: ((profit > 0::double precision) AND ((comment ~~ 'DPST%'::text) OR (comment ~~ 'Bonus%'::text) OR (comment ~~ 'Credit%'::text) OR (comment ~~ 'Deposit%'::text)))
         ->  Bitmap Index Scan on i_cmd  (cost=0.00..4874.40 rows=70186 width=0)
               Index Cond: (cmd = 6::numeric)
   ->  Index Scan using login_index on users  (cost=0.28..7.06 rows=1 width=60)
         Index Cond: (login = trades.login)
(8 rows)

The query took about 25 seconds to execute on my data sample, so on the full data set it would take much longer.
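
As a side note, explain shows the planner's cost estimate; to see actual wall-clock time in psql, you can toggle the \timing meta-command before running the query:

mt4data=> \timing
Timing is on.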

In Postgres 9.3, I can create the materialized view by issuing:

create materialized view deposits as 
select trades.TICKET AS ticket,
    trades.LOGIN AS login,
    users."GROUP" AS group_name,
    trades.VOLUME AS volume,
    users.AGENT_ACCOUNT AS agent_account,
    users.ZIPCODE AS zipcode,
    users.STATUS AS status,
    trades.CLOSE_TIME AS close_time,
    trades.PROFIT AS amount,
    trades.COMMENT AS comment,
    trades.MODIFY_TIME
    from
    trades, users
where
    (users.LOGIN = trades.LOGIN)
        and (trades.CMD = 6)
        and (trades.PROFIT > 0)
and (
 trades.comment like 'DPST%' or
 trades.comment like 'Bonus%' or
 trades.comment like 'Credit%' or
 trades.comment like 'Deposit%'
)

The initial build of my materialized view took almost 20 seconds on my sample data (roughly the same as the initial build on MySQL).

Here I added my required indexes on the new MV:

mt4data=> create index login_deposit_index on deposits  using btree(login);
CREATE INDEX
mt4data=> create unique index ticket_deposit_index on deposits  using btree(ticket);
CREATE INDEX
mt4data=> create index close_time_index on deposits  using btree(close_time);
CREATE INDEX

The number of records in the deposits materialized view is:
mt4data=> select count(*) from deposits;
 count
-------
   176
(1 row)

Now I can issue low-cost queries against the materialized view, like this query that gets the number and total amount of deposits per group:
mt4data=> explain select group_name, count(*), sum(amount) from deposits group by group_name;
                           QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=7.08..7.60 rows=52 width=25)
   ->  Seq Scan on deposits  (cost=0.00..5.76 rows=176 width=25)
(2 rows)

As the explain result shows, the estimated cost indicates fast execution of the query.

The biggest advantage of materialized views in PostgreSQL 9.3 is the built-in refresh of the MV data. Prior to this version, the refresh process was done by monitoring changes to the underlying tables using triggers (which can add performance overhead in environments with a high write rate) or by writing a custom script or stored procedure that checks for modifications based on timestamps. This can be troublesome and a little complicated, depending on your table and query structure.
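
For instance, a trigger-based variant might look roughly like this sketch, where deposits_tbl is a hypothetical ordinary table playing the role of the MV (a complete implementation would also have to handle updates and deletes):

-- hypothetical sketch: append matching rows to a table-based MV on insert
create or replace function sync_deposits() returns trigger as $$
begin
    if new.cmd = 6 and new.profit > 0
       and (new.comment like 'DPST%' or new.comment like 'Bonus%'
            or new.comment like 'Credit%' or new.comment like 'Deposit%') then
        insert into deposits_tbl
        select new.ticket, new.login, u."GROUP", new.volume, u.agent_account,
               u.zipcode, u.status, new.close_time, new.profit, new.comment,
               new.modify_time
        from users u
        where u.login = new.login;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger trades_sync_deposits
    after insert on trades
    for each row execute procedure sync_deposits();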

After modifying the underlying trades table by inserting another chunk of data, I can refresh the materialized view using the "refresh materialized view" command, as follows:

mt4data=> refresh materialized view deposits;
REFRESH MATERIALIZED VIEW

Now the materialized view is updated, and I can see the new number of records:
mt4data=> select count(*) from deposits;
 count
-------
   731
(1 row)

The rebuild of the materialized view took about 3-5 seconds on my machine, which is fast compared to a custom procedure I wrote in MySQL to achieve similar functionality.

According to the PostgreSQL documentation, the refresh performed on an MV is currently a complete refresh, meaning that the query used in the MV definition is re-executed and the data is re-filled; incremental refresh is expected in a future release (or as a patch), which should make the refresh process much more efficient.
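
Related to refreshing, 9.3 also lets you define a materialized view without populating it, using with no data, and fill it later with the same refresh command. A small sketch (deposits_v2 is just an illustrative name):

-- define now, populate later; the view cannot be queried until refreshed
create materialized view deposits_v2 as
    select * from deposits
with no data;

refresh materialized view deposits_v2;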

Finally,

As you can see in this post, PostgreSQL 9.3 offers a simple and efficient way to handle materialized views, although more performance improvement can be attained once the incremental refresh feature is implemented. I hope MySQL offers a similar feature in the near future, since it is very useful and frequently needed by database developers and administrators.

PostgreSQL quick start with PHP

Last week, I stumbled upon a blog post describing the features of the new version of PostgreSQL, and I found it pretty interesting: a couple of useful features that don't exist in MySQL are now implemented in PostgreSQL 9.3 (I am especially interested in materialized views), so I wanted to learn more about this database.

Unfortunately, I hadn't used Postgres before (although I have several years of experience as a MySQL developer and administrator), so I had to learn the basics of Postgres, and I want to share this experience:

Installing PostgreSQL

To get the latest version on my CentOS machine, I compiled Postgres from source, as follows:

First, I got the source files for the desired version from the Postgres site (I used v9.3.2):

wget http://ftp.postgresql.org/pub/source/v9.3.2/postgresql-9.3.2.tar.bz2

Then I uncompressed the file:
tar xvjf postgresql-9.3.2.tar.bz2

Then I compiled the source files using this simple command inside the extracted folder:
./configure && make && make install

Now the Postgres files should be placed at /usr/local/pgsql.
Postgres runs by default under a user named postgres, so we should create that user, create the data directory, and assign ownership of the folder to the new user:
adduser postgres 
mkdir /usr/local/pgsql/data 
chown postgres:postgres /usr/local/pgsql/data

Then we should initialize the data storage (the "database cluster") for the server by calling initdb; but first I switched to the postgres user, because you cannot run this command as root:
[root@sub bin]# su - postgres
-bash-4.1$ /usr/local/pgsql/bin/initdb -D  /usr/local/pgsql/data/

A database cluster is the collection of databases that Postgres manages. Creating the database cluster fills the data directory with database files and creates the system databases, such as postgres and template1.

Now I can start the Postgres server by typing:

/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &
The -D parameter specifies the data directory location, which also contains the Postgres configuration file, named postgresql.conf by default (analogous to my.cnf in MySQL).
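
Alternatively, the distribution ships the pg_ctl wrapper, which handles starting, stopping, and restarting the server; the following should be equivalent to the command above (paths assume the layout used here):

/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start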

Now the Postgres server is running and we can begin working with SQL commands.

PostgreSQL Client

Now let's enter the Postgres client by executing psql, the interactive terminal for Postgres:

/usr/local/pgsql/bin/psql -hlocalhost -U postgres -w
Here I am connecting as the superuser "postgres".
I will issue the \list command to see the installed databases:
postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)
As shown in the snippet, there are three databases here:

  • postgres: the default database for Postgres (similar to the mysql database in MySQL).
  • template0 and template1: two template databases.

Template databases are a very useful feature in Postgres: they enable an administrator to create a database by copying all the content from another (template) database. By default, any newly created database is copied from template1.

I created a new database:

postgres=# create database test_pg;
CREATE DATABASE

If you want to create a database using a template other than the default template1, you can use the template keyword at the end of the create command, like this:
create database test_pg2 template template_database;

Now if you run the \list command, you will see the new database there:

postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 test_pg   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | test_usr=CTc/postgres
(4 rows)

Then I created a new user and granted it privileges on that database:

postgres=# create user test_usr with password 'test_pass';
CREATE ROLE
postgres=# grant all privileges on database test_pg to test_usr;
GRANT
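
One caveat worth knowing: in Postgres, privileges granted on a database do not cascade to the tables inside it. The steps below work because test_usr creates the table itself (and therefore owns it); if the tables were owned by another role, we would also need table-level grants, for example:

grant select, insert, update, delete on all tables in schema public to test_usr;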

Now I will exit (by typing \q), then connect to the new database using the user I created in the previous step:
-bash-4.1$ /usr/local/pgsql/bin/psql test_pg -Utest_usr -W
Password for user test_usr:
psql (9.3.2)
Type "help" for help.

test_pg=>

Next, I created the table I will use for testing:
test_pg=# create table test_tbl( id serial primary key, name varchar(255) );
CREATE TABLE

The serial keyword is similar to the auto-increment attribute in other databases; it is used to create a unique identifier for the table's records.
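
Under the hood, serial is shorthand for an integer column backed by a sequence; the table above is roughly equivalent to the following (the sequence name follows Postgres's default <table>_<column>_seq pattern):

create sequence test_tbl_id_seq;
create table test_tbl (
    id integer not null default nextval('test_tbl_id_seq') primary key,
    name varchar(255)
);
-- serial also ties the sequence's lifetime to the column:
alter sequence test_tbl_id_seq owned by test_tbl.id;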

Unlike MySQL, Postgres doesn't have multiple storage engine types (like MyISAM or InnoDB); it has one unified storage engine.

Then I inserted some sample data into the table:

test_pg=# insert into test_tbl(name) values('test1'), ('test2'), ('test3');
INSERT 0 3

PHP Script

I will use PDO to test PHP connectivity to Postgres, but first the php-pgsql package must be installed:

yum install php-pgsql.x86_64

Then I wrote this simple script:
<?php
try {
    // connect to the test_pg database through the pgsql PDO driver
    $dbh = new \PDO('pgsql:host=localhost;dbname=test_pg', 'test_usr', 'test_pass');
} catch (Exception $ex) {
    die('Error in connecting: '.$ex->getMessage());
}

// fetch all rows from the test table created earlier
$stmt = $dbh->prepare("select * from test_tbl");
$stmt->execute();

echo $stmt->rowCount()." records fetched.".chr(10);
while ($row = $stmt->fetch()) {
    echo "id:".$row['id'].', name:'.$row['name'].chr(10);
}

Now you can run the script to see the results:
[root@sub pg_test]# php -f test.php
3 records fetched.
id:1, name:test1
id:2, name:test2
id:3, name:test3

Now you can use Postgres as your data store in a very similar way to MySQL.

Conclusion

In this post, I gave a quick introduction to the Postgres database, from installation and creating databases and roles, to writing a simple PHP script that retrieves data from Postgres. PostgreSQL has great features that I intend to learn more about in order to get the most value out of it.