Tag Archives: database

Using Sonata admin bundle with multiple connections

Last month I was participating in developing a solution that provides reporting features using Symfony, dealing with data partitioned over multiple MySQL servers. The data was sharded between two reporting databases with identical structure; each database stores the trading history for a separate company branch/trading system.
So basically, there is a bundle that deals with multiple connections – thus different entity managers – dynamically. Handling those connections in Symfony is straightforward, but for Sonata admin bundle it needed some work to get it working.

Project Structure

The project contains a bundle called WebitReportingBundle which holds the entities related to the reporting database; I mainly use an entity called MT4Trades in this post (it holds the trades performed by clients).
This bundle uses two entity managers, depending on the application context. Here is the portion of config.yml containing the entity manager/connection mappings:

    orm:
        default_entity_manager: default
        entity_managers:
            default:
                connection: default
                mappings:
                    FOSUserBundle: ~
                    ApplicationSonataUserBundle: ~
                    SonataUserBundle: ~
                    AppBundle: ~                    
            reporting_real:
                connection: reporting_real
                mappings:
                    WebitReportingBundle: ~
            reporting_real2:
                connection: reporting_real2
                mappings:
                    WebitReportingBundle: ~

In the config file above, there are two extra entity managers (reporting_real & reporting_real2), each handling the connection to a separate database.
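For completeness, the connections themselves are defined in the dbal section of config.yml. Here is a minimal sketch of what that looks like; the hosts and credentials are placeholder parameters to adjust for your setup:

    dbal:
        default_connection: default
        connections:
            default:
                driver: pdo_mysql
                host: '%database_host%'
                dbname: '%database_name%'
                user: '%database_user%'
                password: '%database_password%'
            reporting_real:
                driver: pdo_mysql
                host: '%reporting_real_host%'
                dbname: '%reporting_real_name%'
                user: '%reporting_real_user%'
                password: '%reporting_real_password%'
            reporting_real2:
                driver: pdo_mysql
                host: '%reporting_real2_host%'
                dbname: '%reporting_real2_name%'
                user: '%reporting_real2_user%'
                password: '%reporting_real2_password%'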

In Symfony, the entity manager can be specified when getting a doctrine repository, simply as follows:

<?php

namespace Webit\ReportingBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;

class DefaultController extends Controller {
    
    public function indexAction(Request $request) {
        //retrieve trades from server 1
        $obj_server1 = $this->getDoctrine()
            ->getRepository('WebitReportingBundle:MT4Trades', 'reporting_real')
            ->findOneBy(['ticket'=>42342]);
                
        //retrieve trades from server 2
        $obj_server2 = $this->getDoctrine()
            ->getRepository('WebitReportingBundle:MT4Trades', 'reporting_real2')
            ->findOneBy(['ticket'=>123123]);
    }
}

but when I wanted to do the same in the Sonata admin bundle, there was no easy way to handle it.

How Sonata Admin Bundle handles multiple connections

When you create a Sonata admin class for a certain entity with “SonataDoctrineORMAdminBundle”, the “Admin” class will determine the entity manager to use by calling the “getManagerForClass()” method on the associated “ModelManager” object.

As shown in the following code, taken from the ModelManager class inside SonataDoctrineORMAdminBundle:

<?php 
namespace Sonata\DoctrineORMAdminBundle\Model;

class ModelManager implements ModelManagerInterface, LockInterface
{
    public function getEntityManager($class)
    {
        if (is_object($class)) {
            $class = get_class($class);
        }

        if (!isset($this->cache[$class])) {
            //detect entity manager based on class name automatically
            $em = $this->registry->getManagerForClass($class);

            if (!$em) {
                throw new \RuntimeException(sprintf('No entity manager defined for class %s', $class));
            }

            $this->cache[$class] = $em;
        }

        return $this->cache[$class];
    }
}

However, this method is an issue when dealing with sharded databases: in my case, the first matching entity manager will always be selected – according to the mappings defined in config.yml – and there is no direct way to specify the preferred entity manager for the admin class.
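To see the problem in isolation, here is a hypothetical debug snippet (not part of the project) showing that the registry always resolves MT4Trades to the same entity manager, no matter which connection you actually need:

<?php
//hypothetical snippet, e.g. inside a controller action
$registry = $this->getDoctrine();

//getManagerForClass() picks the first entity manager whose mappings
//cover the class - "reporting_real" in the config.yml above
$em = $registry->getManagerForClass('Webit\ReportingBundle\Entity\MT4Trades');

//$em is always the "reporting_real" manager; the registry offers no
//class-based way to ever return "reporting_real2" for MT4Trades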

Solution

In order to work around this problem, I performed the following:

  1. Define a custom model manager extending the ModelManager class, and override the “getEntityManager” and “createQuery” methods to use a specific entity manager whenever one is defined in the class, as follows:
    <?php 
    namespace Webit\ReportingBundle\Model;
    
    use Sonata\DoctrineORMAdminBundle\Model\ModelManager as BaseModelManager;
    use Sonata\DoctrineORMAdminBundle\Datagrid\ProxyQuery;
    
    class CustomModelManager extends BaseModelManager
    {
        /** @var string $emName the preferred entity manager name */
        protected $emName;
        
        /**
         * set the preferred entity manager name to be used by this ModelManager
         * @param string $name
         */
        public function setEntityManagerName($name){        
            $this->emName = $name;
        }
    
        /**
         * {@inheritdoc}
         */    
        public function createQuery($class, $alias = 'o') {
            //getEntityManager() below already resolves to the preferred
            //entity manager, so the repository queries the right connection
            $repository = $this->getEntityManager($class)
                               ->getRepository($class);
    
            return new ProxyQuery($repository->createQueryBuilder($alias));
        }
        
        /**
         * {@inheritdoc}
         */
        public function getEntityManager($class) {        
            if (is_object($class)) {
                $class = get_class($class);
            }                
            if (isset($this->cache[$class]) === false) {
                //return a fixed entity manager if a preferred name is set
                if (isset($this->emName) === true) {
                    $this->cache[$class] = $this->registry->getManager($this->emName);
                } else {
                    $this->cache[$class] = parent::getEntityManager($class);
                }
            }
    
            return $this->cache[$class];
        }
    }
    
    
  2. Add the new model manager as a service and associate it with the admin service entries of the entity:
    services:        
        reporting.model_manager:
             class: Webit\ReportingBundle\Model\CustomModelManager
             arguments:
                 - '@doctrine'
    
        reporting.admin.trades:
            class: Webit\ReportingBundle\Admin\MT4TradesAdmin
            tags:
               - { name: sonata.admin, manager_type: orm, group: webit.admin.group.reporting, label: "Trades (server 1)", pager_type: "simple" }
            arguments: [null,Webit\ReportingBundle\Entity\MT4Trades, WebitReportingBundle:Admin\MT4Trades]        
            calls:
               - [setModelManager, ['@reporting.model_manager'] ]  
               
    
        reporting.admin.trades2:
            class: Webit\ReportingBundle\Admin\MT4Trades2Admin
            tags:
               - { name: sonata.admin, manager_type: orm, group: webit.admin.group.reporting, label: "Trades (server 2)", pager_type: "simple" }
            arguments: [null,Webit\ReportingBundle\Entity\MT4Trades, WebitReportingBundle:Admin\MT4Trades]        
            calls:
               - [setModelManager, ['@reporting.model_manager'] ]
               
                       
    Here I have two admins for the same entity; each admin shall use a different connection.
  3. Define the entity manager name inside the admin classes by overriding the setModelManager() method, as follows:
    <?php
    
    namespace Webit\ReportingBundle\Admin;
    
    use Sonata\AdminBundle\Admin\Admin;
    
    /**
     * admin for MT4Trades (Server 1)
     */
    class MT4TradesAdmin extends Admin{
    
        protected $emName = 'reporting_real';
        protected $baseRouteName = 'admin_webit_reporting_mt4trades';    
        protected $baseRoutePattern = 'admin-reporting-mt4trades-server';
         
        public function setModelManager(\Sonata\AdminBundle\Model\ModelManagerInterface $modelManager) {
            parent::setModelManager($modelManager);
            //override setModelManager to specify the preferred entity manager for this admin class
            $modelManager->setEntityManagerName($this->emName); 
        } 
        
        //remaining admin class methods...
    }
       
        
       

    and here is the other admin class, which connects to the second database:
    <?php
    
    namespace Webit\ReportingBundle\Admin;
    
    /**
     * admin for MT4Trades (Server 2)
     */
    class MT4Trades2Admin extends MT4TradesAdmin{
    
        protected $emName = 'reporting_real2'; //second entity manager will be used
        
        //specify route name and pattern to get two admin classes work with same entity
        protected $baseRouteName = 'admin_webit_reporting_mt4trades2';    
        protected $baseRoutePattern = 'admin-reporting-mt4trades-server2';
         
        //...
    }

    I extended the second admin class from the first, so they have identical configuration/logic, just with a different connection used.

Conclusion

Now I have two identical admin classes for the same entity, each correctly using a different entity manager. If an additional server is deployed in the future, I just have to define a new admin class – along with a new connection/entity manager – and set the correct $emName value.


Applying version stamp to Symfony & SonataAdmin

I spend a lot of time in my job developing and maintaining back-office systems that process workflow logic for Forex portals. Our systems are developed using Symfony2 and depend highly on SonataAdminBundle, with deep customization to impose the business rules of Forex companies.

Recently, some data inconsistency appeared in one of our clients' systems, and after digging into the logs I found the cause of the problem: two users were processing the same client application at almost the same time, which caused one edit operation to override the other, with several other undesired consequences following.

So in order to fix this issue, and prevent it from happening again, I worked on adding “version stamps” to my entity to maintain offline data consistency within the application, and I would like to share here what I learned.

 

Version Stamps and offline consistency

A version stamp is a field that changes every time a write operation is performed on the data; it is used to ensure that no one else has changed that row before your modification is applied.
There are several ways to implement version stamps. The simplest is an integer value that is noted on read; before the write, its value is compared to the one submitted with the data, and once validated, the write operation takes place and the version stamp value is incremented.
Say there is a form bound to an entity in a Symfony application: the version stamp column is present in the entity and added as a hidden field in the form. On submit, the submitted version stamp value is compared to the one currently in the database, to ensure that no other edit was performed on that entity in the time between displaying the form and submitting it. If the check fails, the update operation is rejected, e.g. by adding an error via a constraint.
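The core idea can be sketched with plain PDO before looking at the Doctrine/Sonata implementation (a minimal sketch only; the “profile” table and its columns are hypothetical):

<?php
//minimal sketch of a version stamp check with plain PDO
//(hypothetical "profile" table with a version_stamp column)
$pdo = new \PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

//1. on read: note the current version stamp along with the data
$row = $pdo->query('SELECT id, name, version_stamp FROM profile WHERE id = 1')->fetch();

//... the user edits the form, then submits ...

//2. on write: compare and increment in a single atomic statement
$stmt = $pdo->prepare('UPDATE profile SET name = :name, version_stamp = version_stamp + 1'
        . ' WHERE id = :id AND version_stamp = :stamp');
$stmt->execute(array('name' => 'new value', 'id' => $row['id'], 'stamp' => $row['version_stamp']));

if ($stmt->rowCount() === 0) {
    //someone else modified the row in between: reject the update
}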

Implementation in SonataAdmin

The implementation of this concept is quite simple, as follows:
In the desired entity, I applied these modifications:

  1. Define new field to hold stamp value.
  2. Mark the field as a version stamp, using the Doctrine @Version annotation; this will cause Doctrine to update the versionStamp field every time the object is persisted to the database:
<?php

namespace Webit\ForexCoreBundle\Entity;

use Doctrine\ORM\Mapping as ORM;

/**
 * RealProfile
 *
 * @ORM\Table(name="forex_real_profile")
 * @ORM\Entity(repositoryClass="Webit\ForexCoreBundle\Repository\RealProfileRepository")
 * @ORM\HasLifecycleCallbacks()
 */
class RealProfile
{
    
    /**
     * @var integer
     *
     * @ORM\Column(name="id", type="integer", nullable=false)
     * @ORM\Id
     * @ORM\GeneratedValue(strategy="IDENTITY")
     */
    private $id;
    
    /**
     * @var integer
     *
     * @ORM\Version 
     * @ORM\Column(name="version_stamp", type="integer")
     */
    private $versionStamp;
    
    /* Other columns, getters, setters here */

    
}

In the Admin class, the following is added:

  1. In the configureFormFields() method, the version stamp is added as a hidden field, so that the value read when the form was displayed is submitted back with it. The field is never edited by the user; the version stamp value is only modified by Doctrine itself (via the @Version annotation) whenever the entity is updated.
    <?php
    
    namespace Webit\ForexCoreBundle\Admin;
    
    use Sonata\AdminBundle\Admin\Admin;
    use Sonata\AdminBundle\Datagrid\ListMapper;
    use Sonata\AdminBundle\Datagrid\DatagridMapper;
    use Webit\ForexCoreBundle\Entity\RealProfile;
    
    class RealAccountsAdmin extends Admin
    {
        protected function configureFormFields(\Sonata\AdminBundle\Form\FormMapper $formMapper)
        {
            $formMapper->add('versionStamp', 'hidden', array('attr' => array('hidden' => true)));
            //other fields and groups...
        }
    }
  2. Here is the important point: validating the versionStamp posted from the form against the one already saved in the database. There are two ways to do that; one is to use the Doctrine locking mechanism, and the other is to use Sonata inline validation, implementing the validate() method to add an extra validation layer.

    option 1:

    class RealAccountsAdmin extends Admin
    {
        /**
         * {@inheritdoc}
         */    
        public function getObject($id)
        {
            $uniqid = $this->getRequest()->query->get('uniqid');
            $formData = $this->getRequest()->request->get($uniqid);        
            
            $object = $this->getModelManager()->find($this->getClass(), 
                    $id, 
                    \Doctrine\DBAL\LockMode::PESSIMISTIC_WRITE, 
                    $formData['versionStamp']);
            
            foreach ($this->getExtensions() as $extension) {
                $extension->alterObject($this, $object);
            }
    
            return $object;
        }
    
        /**
         * {@inheritdoc}
         */      
        public function update($object) {
            try{
                parent::update($object);
            }catch(\Doctrine\ORM\OptimisticLockException $e) {
                $this->getConfigurationPool()->getContainer()->get('session')
                        ->getFlashBag()->add('sonata_flash_error', 'someone modified the object in between');
            }
        }
    }

    This approach takes advantage of Doctrine's locking support. Here is a brief explanation:

    • I overrode the getObject() method in the admin class to pass two extra parameters to the getModelManager()->find() method;
      the third parameter indicates the locking type (I used LockMode::PESSIMISTIC_WRITE),
      the fourth parameter represents the expected version stamp value, to compare with the database value before flushing.
    • I overrode the update($object) method to catch the OptimisticLockException and handle it by adding an error flash message.

    option 2:
    In this approach, I used Sonata inline validation to mark the form as invalid before even trying to persist and flush the object to the database:

        /**
         * {@inheritdoc}
         */
        public function validate(\Sonata\AdminBundle\Validator\ErrorElement $errorElement, $object) { 
            //get all submitted data (with non-mapped fields)
            $uniqid = $this->getRequest()->query->get('uniqid');
            $formData = $this->getRequest()->request->get($uniqid);
            $submitted_version_stamp = $formData['versionStamp'];        
            
            $em = $this->getConfigurationPool()
                  ->getContainer()->get('doctrine')->getManager();
            
            //get up-to-date version stamp value from the database
            $class_name = get_class($object);        
            $q = $em->createQuery("select partial o.{id,versionStamp} from $class_name o"
                                    . " where o.id=".$object->getId());        
            $saved_data = $q->getArrayResult();        
            $saved_version_stamp = $saved_data[0]['versionStamp'];
            
            //compare version stamps and add violation in case it didn't match
            if($saved_version_stamp != $submitted_version_stamp){
                $errorElement->addViolation('Record data seems outdated, probably someone else modified it, please refresh and try again.')->end();
            }
        }
        
    Here are more details about the operations performed inside this method:

    • To get the versionStamp value submitted via the form inside that method, I used:
      $uniqid = $this->getRequest()->query->get('uniqid');
      $formData = $this->getRequest()->request->get($uniqid);
      $submitted_version_stamp = $formData['versionStamp'];
    • To get the up-to-date value of versionStamp stored in the database, I used a Doctrine query that retrieves a partial object:
      $em = $this->getConfigurationPool()
                 ->getContainer()->get('doctrine')->getManager();
              
      $class_name = get_class($object);        
      $q = $em->createQuery("select partial o.{id,versionStamp} from $class_name o"
              . " where o.id=".$object->getId());        
      $saved_data = $q->getArrayResult();        
      $saved_version_stamp = $saved_data[0]['versionStamp'];

      *If you retrieve the whole object from the database again, it will cause many issues, especially if the Doctrine result cache is enabled.
    • Then the two values are compared with each other; if they are not equal, an error is shown to the user by calling the $errorElement->addViolation() method:
      if($saved_version_stamp != $submitted_version_stamp){
          $errorElement->addViolation('Record data seems outdated, probably someone else modified it, please refresh and try again.')->end();
      }

That’s all; now I can perform some basic tests.

Test Solution

In order to verify that this mechanism solves the issue, I emulated the inconsistent behavior by opening the Sonata admin edit page in two browsers, then modifying the data and submitting it in each browser in turn.
The browser that submitted last does not save its data, and this error appears:
“Record data seems outdated, probably someone else modified it, please refresh and try again.”
In this way, the inconsistency is prevented by stopping the second user from overriding the information modified by the first user.

At last

The “version stamp” approach helped me prevent data inconsistency in my Symfony/SonataAdmin application; I hope it will help others who face a similar scenario. I would like to hear from anyone who has another idea or a better way to handle this issue.

postgreSQL quick start with php

Last week, I accidentally stumbled upon a blog post describing the features of the new version of PostgreSQL, and I found it pretty interesting: a couple of useful features that do not exist in MySQL are now implemented in PostgreSQL 9.3 (I am especially interested in materialized views). So I wanted to learn more about this database.

Unfortunately, I had never used Postgres before (although I have several years of experience as a MySQL developer and administrator), so I had to learn the basics of Postgres, and I want to share that experience:

Installing PostgreSQL

In order to get the latest version on my CentOS machine, I compiled Postgres from source, as follows:

First, I got the source files of the desired version from the Postgres site (I used v9.3):

wget http://ftp.postgresql.org/pub/source/v9.3.2/postgresql-9.3.2.tar.bz2

Then uncompress the file:
tar xvjf postgresql-9.3.2.tar.bz2

Then compile the source files using this simple command inside the extracted folder:
./configure && make && make install

Now the Postgres files should be placed at /usr/local/pgsql.
Postgres operates by default under a user named postgres, so we should create that user, create the data directory, and assign ownership of the folder to the created user:
adduser postgres 
mkdir /usr/local/pgsql/data 
chown postgres:postgres /usr/local/pgsql/data

Then we should initialize the data storage – the “database cluster” – for the server by calling initdb; but first I switched to the postgres user, because you cannot run this command as root:
[root@sub bin]# su - postgres
-bash-4.1$ /usr/local/pgsql/bin/initdb -D  /usr/local/pgsql/data/

A database cluster is the collection of databases that Postgres manages. By creating the database cluster, the data directory is filled with database files, and the default databases (postgres, template0 and template1) are created.

Now I can start the Postgres server by typing:

/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &
The -D parameter specifies the data directory location, which also contains the Postgres configuration file, named postgresql.conf by default (analogous to my.cnf in MySQL).

Now the Postgres server is running, and we can begin working with SQL commands.

PostgreSQL Client

Now let us enter the Postgres client by executing the psql program, which is the interactive terminal for Postgres:

/usr/local/pgsql/bin/psql -hlocalhost -U postgres -w
Here I am connecting to the database as its superuser, “postgres”.
I will issue the \list command to see the installed databases:
postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)
As shown in the snippet, there are three databases here:

  • postgres database: the default database for Postgres (like the mysql database in MySQL)
  • template0 and template1: two template databases.
Template databases are a very useful feature of Postgres; they enable the administrator to create a new database by copying all the content from another (template) database. By default, any newly created database uses template1 as its template.

I created a new database:

postgres=# create database test_pg;
CREATE DATABASE

If you want to create a database using a template other than the default, you can use the template keyword at the end of the create command, like this:
create database test_pg2 template template_database;

Now if you run the \list command, you will see the new database there:

postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 test_pg   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | test_usr=CTc/postgres
(4 rows)

Then I created a new user and granted it privileges on that database:

postgres=# create user test_usr with password 'test_pass';
CREATE ROLE
postgres=# grant all privileges on database test_pg to test_usr;
GRANT

Now I will exit (by typing \q), then connect to the new database using the user created in the previous step:
-bash-4.1$ /usr/local/pgsql/bin/psql test_pg -Utest_usr -W
Password for user test_usr:
psql (9.3.2)
Type "help" for help.

test_pg=>

Create a table to use for testing:
test_pg=# create table test_tbl( id serial primary key, name varchar(255) );
CREATE TABLE

The serial keyword is similar to the auto-increment attribute in other databases; it is used to create unique identifiers for the table records.

Unlike MySQL, Postgres doesn’t have different types of storage engines (like MyISAM or InnoDB); it has a unified database with one storage engine.

And I will insert sample data into the table:

test_pg=# insert into test_tbl(name) values('test1'), ('test2'), ('test3');
INSERT 0 3

PHP Script

I will use PDO in order to test PHP connectivity to Postgres, but first the php-pgsql package must be installed:

yum install php-pgsql.x86_64

Then I wrote this simple script:
<?php
try{
   $dbh = new \PDO('pgsql:host=localhost;dbname=test_pg', 'test_usr', 'test_pass');
}catch(Exception $ex){
   die('Error in connecting: '.$ex->getMessage());
}
$stmt = $dbh->prepare("select * from test_tbl");
$stmt->execute();

echo $stmt->rowCount(). " records fetched.".chr(10);
while($row = $stmt->fetch()){
    echo "id:".$row['id'].', name:'.$row['name'].chr(10);
}

Now you can run the script to see the results:
[root@sub pg_test]# php -f test.php
3 records fetched.
id:1, name:test1
id:2, name:test2
id:3, name:test3
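
One thing worth noting for inserts: with the pgsql PDO driver, lastInsertId() expects the name of the sequence behind the serial column, which Postgres names <table>_<column>_seq by default. Here is a minimal sketch, reusing the connection from the script above:

<?php
//sketch: fetching the generated id of an insert with the pgsql driver
$dbh = new \PDO('pgsql:host=localhost;dbname=test_pg', 'test_usr', 'test_pass');

$stmt = $dbh->prepare("insert into test_tbl(name) values(:name)");
$stmt->execute(array('name' => 'test4'));

//the sequence created by the serial column is named test_tbl_id_seq
echo $dbh->lastInsertId('test_tbl_id_seq');

Alternatively, Postgres can return the id directly from the insert statement itself, using the RETURNING clause (insert into test_tbl(name) values('test4') returning id).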

Now you can use Postgres as your data store in a very similar way to MySQL.

Conclusion

In this post, I gave a quick introduction to the Postgres database: from installation, to creating databases and roles, to writing a simple PHP script that retrieves data from Postgres. PostgreSQL has great features that I intend to learn more about, in order to get the most value out of it.

Introduction to sphinx with PHP – Part 1

I was using MySQL full-text indexing in one of my projects, but I noticed that once the data size grew beyond several gigabytes, MySQL didn't scale well in using the index, and the queries became too slow for production environments, especially for high-traffic websites. So I read about the Sphinx search engine, and found it quite powerful for textual search functionality.

What is Sphinx?

As the official Sphinx site defines it:

Sphinx is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It’s written in C++ and works on Linux (RedHat, Ubuntu, etc), Windows, MacOS, Solaris, FreeBSD, and a few other systems.

 

Some sphinx advantages

Sphinx has many features that make it an excellent option for textual search over large data sizes. Some of these advantages (specifically compared to MySQL):

  1. Scalable over large data sizes (both horizontally and vertically, using features like distributed indexes)
  2. Advanced ranking algorithm, calculating the relevancy of data by analyzing the keywords (with the ability to set a customized weight – relevancy importance – for each field)
  3. Indexing data from multiple sources, including different database types (or different storage engines)
  4. Other important enhancements, like parallel result sets, batch queries, etc.

Install Sphinx

Sphinx is available on many platforms (including Linux, MacOS and Windows), and it's relatively easy to install.
I will cover compiling Sphinx from source code here (since I use Fedora and didn't find a ready-made RPM package for my OS).

1- Go to sphinx download page:
http://sphinxsearch.com/downloads/release/

and download the package suitable for your OS. As I mentioned earlier, I will go with the first option, which is downloading the Sphinx source files.

If you find a package for your OS, install it; you can then skip the remaining installation steps and jump to the next section.

2- Compile Sphinx from source code:
After downloading, extract the source files and run:

./configure

then:
make && make install

In case you want Sphinx to work with MySQL, you should have the MySQL development package installed on your system; to install it, run:
yum install mysql-devel

After completing these steps successfully, Sphinx should be installed on your machine. The next step is to configure the Sphinx sources and indexes to read from MySQL.

Sphinx Configuration

Having installed Sphinx, we now have to configure the data sources and indexes for it.

First, let me introduce a sample MySQL table that I will run the Sphinx indexer against. It's a table called “content” which stores news portal articles, containing standard news fields (like title, body, author, etc.):

CREATE TABLE `content` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `title` varchar(500) DEFAULT NULL,
  `author` varchar(255) DEFAULT NULL,
  `category_id` int(11) NOT NULL,  
  `status` int(11) NOT NULL, -- used to indicate if content is published, pending...etc  
  `body` longtext,
  `keywords` varchar(255) DEFAULT NULL,
  `is_deleted` int(11) DEFAULT NULL,
  `slug` varchar(255) DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,   
   FULLTEXT KEY `INDEX_TEXT` (`title`,`author`),
   FULLTEXT KEY `title` (`title`),
   FULLTEXT KEY `body` (`body`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

This table was a MyISAM table in order to benefit from the MySQL full-text index before moving to Sphinx. We will use these indexes to compare Sphinx and MySQL performance later.
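For reference, here is a minimal sketch of the MySQL full-text counterpart of the Sphinx searches performed later in this post (a PDO example using the same placeholder credentials as the Sphinx config below; note that the MATCH() columns must exactly match one of the FULLTEXT keys declared above):

<?php
//sketch: querying the MySQL full-text index, for comparison with Sphinx
$pdo = new \PDO('mysql:host=localhost;dbname=your_db', 'root', 'root');

//uses the INDEX_TEXT fulltext key declared on (title, author)
$stmt = $pdo->prepare('SELECT id, title FROM content WHERE MATCH(title, author) AGAINST(:q)');
$stmt->execute(array('q' => 'jordan'));

foreach ($stmt->fetchAll() as $row) {
    echo $row['id'] . ' - ' . $row['title'] . chr(10);
}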

Now let’s look at the Sphinx configuration file.
The Sphinx configuration file is named sphinx.conf; usually you will find it in /usr/local/etc (if you compiled Sphinx from source) or in /etc/sphinx (if you installed an RPM package). In the same directory you will also find sphinx.conf.dist, a standard configuration file that contains all the available configuration options.

The following is the content of the sphinx.conf file I used with my database:

source content
{
        #data source type
        type                    = mysql
        sql_host                = localhost
        sql_user                = root
        sql_pass                = root
        sql_db                  = your_db
        sql_port                = 3306

        #main document fetch query
        sql_query_pre           =   SET NAMES utf8
        sql_query               = \
                                SELECT id, title, author, status, body, keywords, category_id, unix_timestamp(created_at) as created_time  \
                                from content \
                                where is_deleted=0 and status=1 \
                                ;
        #attribute declaration        
        sql_attr_uint           = status
        sql_attr_uint           = category_id
        sql_attr_timestamp      = created_time

}

#defining delta source
source delta: content {
       sql_query                = \
                                SELECT id, title, author, status, body, keywords, category_id, unix_timestamp(created_at) as created_time  \
                                from content \
                                where is_deleted=0 and status=1 and created_at >= CURRENT_TIMESTAMP() - INTERVAL 15 MINUTE\
                                ;
}

#defining indexes
index content_index
{
       source           = content
       path             = /home/zaid/sphinx/data/mdn_content
       docinfo          = extern
}
index delta: content_index
{
      source            = delta
      path              = /home/zaid/sphinx/data/mdn_content_delta
      docinfo           = extern
}

#searchd daemon configurations
searchd
{
     listen   = localhost:3321
     listen   = 9312
     log      = /var/log/searchd.log
     pid_file = /var/log/searchd.pid
}

Here is the explanation of the configuration file parts:

1- Main data source block (mandatory):

source content
{
....
}

This block defines the data source whose contents Sphinx will index. In this block you mainly define three groups of information:

  • data source connection parameters: the information needed to connect to your database, including the database type, username, password, port, database name, etc.
    source content
    {
            #data source type
            type                    = mysql
            sql_host                = localhost
            sql_user                = root
            sql_pass                = root
            sql_db                  = your_db
            sql_port                = 3306
    
            ....
    }
  • Query fetch configurations: containing the main query that fetches the data from your source so it can be indexed by Sphinx. I used
    sql_query_pre to set the UTF-8 encoding of the incoming data, and
    sql_query to fetch the main data to be indexed, which in my case are the non-deleted, approved news articles.
    source content
    {
            ....
            #main document fetch query
            sql_query_pre           =   SET NAMES utf8
            sql_query               = \
                                    SELECT id, title, author, status, body, keywords, category_id, unix_timestamp(created_at) as created_time  \
                                    from content \
                                    where is_deleted=0 and status=1 \
            ....
    }
  • Attribute declaration: the data fetched into Sphinx is mainly full-text indexed; however, you can declare other attributes to be used as non-text fields for filtering, sorting and grouping. Here I used
    sql_attr_uint to declare the status and category_id columns as unsigned integer attributes, and
    sql_attr_timestamp to declare created_time as a timestamp.
    source content
    {
            ....
            #attribute declaration        
            sql_attr_uint           = status
            sql_attr_uint           = category_id
            sql_attr_timestamp      = created_time
    }

    You don't have to define any attributes if there is no need for them.

 

2- Delta Source Block:
This block defines the data that was recently updated, so we don't have to run the Sphinx indexer over all the data; we run the indexer periodically only on the recently added content (the delta) in order to add it to the index.

#defining delta source
source delta: content {
       sql_query                = \
                                SELECT id, title, author, status, body, keywords, category_id, unix_timestamp(created_at) as created_time  \
                                from content \
                                where is_deleted=0 and status=1 and created_at >= CURRENT_TIMESTAMP() - INTERVAL 15 MINUTE\
                                ;
}

3- Index Block(s):
This defines the indexes associated with the data sources. I defined one index for the main source, and another for the delta source; each block contains the path where the index will be stored.

index content_index
{
       source           = content
       path             = /home/zaid/sphinx/data/mdn_content
       docinfo          = extern
}
index delta: content_index
{
      source            = delta
      path              = /home/zaid/sphinx/data/mdn_content_delta
      docinfo           = extern
}

4- Searchd Daemon Block:
searchd is the daemon that serves the search queries issued by clients and retrieves the results. Here you define the ports to listen on, the log file path, and the PID (process ID) file path.

#searchd daemon configurations
searchd
{
     listen   = localhost:3321
     listen   = 9312
     log      = /var/log/searchd.log
     pid_file = /var/log/searchd.pid
}

Running Sphinx

Once you have your Sphinx config file in place, you are ready to start indexing your data and serving search queries from Sphinx.

To index your data source, run the indexer:

indexer --all

I found the indexer pretty fast: it indexed my data source (which is about 1.5 GB) in about 90 seconds!

After the data has been indexed, start the searchd daemon by simply typing:

searchd

To make sure that the searchd daemon is running, you can type:
netstat -nlp | grep searchd
netstat -nlp | grep 9312

Now we need the delta index to pick up new data automatically and merge it into the main index; place these lines in your crontab:

#I added this cronjob to run each 15minutes according to my delta query
*/15 * * * * /usr/bin/indexer --rotate --config /usr/local/etc/sphinx.conf delta
*/15 * * * * /usr/bin/indexer --config /usr/local/etc/sphinx.conf --merge content_index delta --rotate

Now, you are ready to go!

Running Sphinx from the Command Line

Now you can run search queries against your Sphinx indexes using the search command, in the following format:

 search -i NAME_OF_INDEX -l LIMIT -o OFFSET SEARCH_PHRASE

Here is an example: this command searches for “jordan” in the content_index we have just defined.

[zaid@localhost tmp]$ search -i content_index -l 10 -o 20 jordan
Sphinx 2.2.1-id64-dev (r4310)
Copyright (c) 2001-2013, Andrew Aksyonoff
Copyright (c) 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/usr/local/etc/sphinx.conf'...
index 'content_index': query 'jordan ': returned 62 matches of 62 total in 0.000 sec

displaying matches:
21. document=136385, category_id=11, created_time=Sun Feb 26 14:27:32 2012
22. document=138933, category_id=11, created_time=Mon Mar 12 15:39:15 2012
23. document=142949, category_id=11, created_time=Wed Apr  4 13:23:04 2012
24. document=152446, category_id=19, created_time=Sun May 27 14:41:34 2012
25. document=156444, category_id=11, created_time=Sun Jun 17 00:40:47 2012
26. document=180436, category_id=11, created_time=Mon Oct 22 11:03:01 2012
27. document=57574, category_id=1, created_time=Sun Oct  3 18:05:58 2010
28. document=62989, category_id=53, created_time=Tue Nov 30 19:11:22 2010
29. document=76606, category_id=11, created_time=Sat Mar 12 11:29:13 2011
30. document=80203, category_id=17, created_time=Wed Apr  6 23:59:56 2011

words:
1. 'jordan': 62 documents, 164 hits

Note that the results return the document ID (which is analogous to the id column in our SQL table), along with the other attributes we defined in the config file, including category_id and created_time. The search was pretty fast: 0.000 sec for this query.

I am aware that I didn’t write any PHP code in this article; I will leave that to part 2 🙂

In the next article, I will write a simple PHP script that queries results from the indexes we created, and talk a little bit about filtering, grouping and ranking results in Sphinx.
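
To give a small taste of part 2, here is a minimal sketch using the sphinxapi.php client bundled with the Sphinx source distribution (the index name and port come from the configuration above; treat the rest as assumptions to adjust for your setup):

<?php
//sketch: querying the content_index defined above via the PHP API
require 'sphinxapi.php'; //bundled with the Sphinx sources

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312); //second listen port in the searchd block
$cl->SetLimits(20, 10);            //offset 20, limit 10 - like the CLI example

$result = $cl->Query('jordan', 'content_index');
if ($result === false) {
    die('Query failed: ' . $cl->GetLastError());
}

//matches are keyed by document id, with the attributes we declared
foreach ($result['matches'] as $docId => $match) {
    echo $docId . ' => category_id: ' . $match['attrs']['category_id'] . chr(10);
}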