Introduction to Sphinx with PHP – Part 2

In Part 1, I explained how to install Sphinx, configure it to index data from a MySQL source, and use the searchd daemon from the command line to retrieve data from the defined indexes.

In this post, I will walk through a PHP example of how to use the Sphinx API.

The following script is based on the database structure and Sphinx config file I used in Part 1 of this introduction.

Example PHP Script

<?php

header('Content-type: text/html; charset=utf-8');
include ( "sphinxapi.php" );

mysql_connect('localhost', 'root', 'root'); // mysql_* was removed in PHP 7; use mysqli or PDO on current PHP
mysql_select_db('your_database_here');
mysql_query('set names utf8');

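$phrase = isset($_GET['phrase']) ? $_GET['phrase'] : '';
$page = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$date_start = isset($_GET['date_start']) ? $_GET['date_start'] : null;
$date_end = isset($_GET['date_end']) ? $_GET['date_end'] : null;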

$client = new SphinxClient();
$client->SetLimits(($page - 1) * 10, 10);
$client->SetSortMode(SPH_SORT_EXTENDED, '@weight desc, created_time desc');
$client->SetMatchMode(SPH_MATCH_ANY);
$client->SetFieldWeights(array('title'=>4, 'keywords'=>2, 'body'=>1 ));

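if (isset($date_start) || isset($date_end)) {
    // SetFilterRange() needs numeric bounds, so a missing date falls back to 0 or the current time
    $start_time = isset($date_start) ? (int) strtotime($date_start) : 0;
    $end_time = isset($date_end) ? (int) strtotime($date_end) : time();
    $client->SetFilterRange('created_time', $start_time, $end_time);
}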

$res = $client->Query($phrase, 'content_index');


if (!$res) {
    echo 'error: ' . $client->GetLastError();
} else {

    if ($res['total'] == 0 || !isset($res['matches'])) {
        echo 'No results retrieved from Search engine';
    } else {
        echo "Displaying " . (($page - 1) * 10+1).'-'.(min($res['total'],$page * 10)) . " out of " . $res['total_found'] . ' total results';
                
        //var_dump($res);
        $ids_str = implode(', ', array_keys($res['matches']));
        $res_db = mysql_query('select id, title, created_at from content where id in  (' . $ids_str . ') order by field(id,'.$ids_str.')');
        if ($res_db === false) {
            echo "Error in mysql query #" . mysql_errno() . ' - ' . mysql_error();
        } else {
            echo '<ul>';
            while ($row = mysql_fetch_assoc($res_db)) {
                echo '<li>'
                . '<a href="show.php?id=' . (int) $row['id'] . '&amp;phrase=' . urlencode($phrase) . '">' . htmlspecialchars($row['title']) . '</a>'
                . '<br/> [relevance: ' . $res['matches'][$row['id']]['weight'] . ']'
                . '<br/> [created_at: ' . htmlspecialchars($row['created_at']) . ']'
                . '</li>';
            }
            echo '</ul>';
        }

        echo '<br/><br/>Total Time: ' . $res['time'] . 's';
    }
}

This simple script takes its parameters from the web page, then sends a search request containing the specified phrase and conditions to the searchd daemon.

In the first lines (1-13), I set up the database connection and read the parameters that I will use in the search. After that, I initialized the Sphinx client and configured it, as explained in the next section.
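
The parameter handling and paging arithmetic at the top of the script can be isolated into a small helper. The following is only a minimal sketch: the function name search_request and the page size of 10 are my assumptions for illustration, not part of the Sphinx API.

```php
<?php
// Hypothetical helper: turns raw request parameters into the values
// passed to SetLimits() and Query() in the script above.
function search_request(array $get, $per_page = 10)
{
    $phrase = isset($get['phrase']) ? trim((string) $get['phrase']) : '';
    // Clamp the page number so the offset can never become negative.
    $page = isset($get['page']) ? max(1, (int) $get['page']) : 1;

    return array(
        'phrase' => $phrase,
        'page'   => $page,
        'offset' => ($page - 1) * $per_page, // first argument of SetLimits()
        'limit'  => $per_page,               // second argument of SetLimits()
    );
}

$req = search_request(array('phrase' => ' syria ', 'page' => '3'));
echo $req['offset'], ' ', $req['limit'], "\n";
```

With a SphinxClient instance at hand, the result would be used as $client->SetLimits($req['offset'], $req['limit']).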

Main SphinxClient Methods

Here is a list of the main methods used to configure SphinxClient:

1- SetSortMode:
Sphinx supports several flexible sort modes that control how the retrieved results are ordered.
I will briefly describe each sort mode, since I consider them one of the most important features in Sphinx:

a- SPH_SORT_RELEVANCE: the default sort mode, which sorts the results by their relevance to the search query.

$client->SetSortMode(SPH_SORT_RELEVANCE);

By default, Sphinx ranks results using phrase proximity, which takes the order of the phrase words into account, combined with word frequency. We can control how Sphinx computes relevance by changing the ranking mode (using the SetRankingMode function).

b- SPH_SORT_ATTR_ASC / SPH_SORT_ATTR_DESC: sort the results in ascending or descending order by a given attribute. For example, you can change the SetSortMode call in the script above to:

$client->SetSortMode(SPH_SORT_ATTR_DESC, 'created_time');
This way, the newest articles come first on the page.

c- SPH_SORT_TIME_SEGMENTS: sorts by time segment (last hour, day, week, month) in descending order, then by relevance within each segment:

$client->SetSortMode(SPH_SORT_TIME_SEGMENTS, 'created_time');

d- SPH_SORT_EXTENDED: sorts by a combination of attributes, each ascending or descending, in an SQL-like format, as I used in the script above:

$client->SetSortMode(SPH_SORT_EXTENDED, '@weight desc, created_time desc');
Here I sorted by relevance (represented by the computed attribute @weight), then by creation time in descending order (in case two results have the same weight).

e- SPH_SORT_EXPR: sorts by an arithmetic expression. For example, you can combine relevance with popularity, represented by page_views:

$client->SetSortMode(SPH_SORT_EXPR, '@weight * page_views/100');

Unlike in MySQL, putting an expression in the sort mode (analogous to an ORDER BY clause) won’t affect performance negatively.
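
To make the extended sort clause '@weight desc, created_time desc' from the script concrete, here is a plain-PHP sketch of the ordering it describes. It does not call Sphinx at all; the sample matches and the function name compare_extended are made up for illustration.

```php
<?php
// Emulates the ordering "@weight desc, created_time desc" on sample data.
function compare_extended(array $a, array $b)
{
    if ($a['weight'] !== $b['weight']) {
        return $b['weight'] - $a['weight'];         // higher weight first
    }
    return $b['created_time'] - $a['created_time']; // newer first on ties
}

// Made-up matches: documents 1 and 3 share the same weight.
$matches = array(
    array('id' => 1, 'weight' => 4, 'created_time' => 1386000000),
    array('id' => 2, 'weight' => 7, 'created_time' => 1385000000),
    array('id' => 3, 'weight' => 4, 'created_time' => 1386900000),
);

usort($matches, 'compare_extended');
$ordered_ids = array_column($matches, 'id');
echo implode(', ', $ordered_ids), "\n";
```

Document 2 comes first because of its higher weight; documents 3 and 1 tie on weight, so the newer one (3) is listed before the older one (1).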

2- SetMatchMode():
Controls how Sphinx matches the query phrase. Here are the most important options:
a- SPH_MATCH_ALL: matches documents that contain all keywords in the search query.
b- SPH_MATCH_ANY: matches documents that contain any of the keywords.
c- SPH_MATCH_PHRASE: matches the query as a whole phrase, which requires an exact match.
All matching modes are listed in the Sphinx documentation.
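
The difference between the three modes can be sketched in plain PHP. This is a simplified emulation on lowercase word tokens, not how Sphinx is implemented: real matching also applies Sphinx's own tokenizer, charset tables, and morphology. The function names tokenize and matches_mode are hypothetical.

```php
<?php
// Splits text into lowercase alphanumeric word tokens.
function tokenize($text)
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Simplified emulation of the "all", "any", and "phrase" match modes.
function matches_mode($mode, $query, $text)
{
    $q = tokenize($query);
    $t = tokenize($text);
    if (count($q) === 0) {
        return false; // this sketch treats an empty query as "no match"
    }
    switch ($mode) {
        case 'all':    // every query word must occur somewhere
            return count(array_diff($q, $t)) === 0;
        case 'any':    // at least one query word must occur
            return count(array_intersect($q, $t)) > 0;
        case 'phrase': // query words must occur consecutively, in order
            $haystack = ' ' . implode(' ', $t) . ' ';
            $needle = ' ' . implode(' ', $q) . ' ';
            return strpos($haystack, $needle) !== false;
    }
    return false;
}

$text = 'Peace talks on Syria resume in Geneva';
var_dump(matches_mode('all', 'syria geneva', $text));
var_dump(matches_mode('phrase', 'syria geneva', $text));
var_dump(matches_mode('any', 'syria damascus', $text));
```

Here "syria geneva" matches in "all" mode because both words occur, but not in "phrase" mode because they are not adjacent; "syria damascus" still matches in "any" mode because one of the two words occurs.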

3- SetFieldWeights():
Using this function, you can distribute the relevance weight among the fields. In the script above, I used this line:

$client->SetFieldWeights(array('title'=>4, 'keywords'=>2, 'body'=>1 ));

to indicate that the “title” field is more important than the “keywords” and “body” fields, so results that match the query phrase in the title tend to appear before those that match it many times in the body. This option is very useful for controlling the relevance of results.
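
As a rough intuition for how field weights shift the ordering, consider the toy score below: it multiplies the number of hits in each field by that field's weight and sums the results. This is deliberately not Sphinx's ranking formula; the function toy_score and the sample documents are hypothetical.

```php
<?php
// Toy scoring: sum over fields of (hits in field) * (field weight).
function toy_score(array $doc, $word, array $weights)
{
    $score = 0;
    foreach ($weights as $field => $weight) {
        $text = isset($doc[$field]) ? strtolower($doc[$field]) : '';
        $score += substr_count($text, strtolower($word)) * $weight;
    }
    return $score;
}

// Same weights as in the script above.
$weights = array('title' => 4, 'keywords' => 2, 'body' => 1);

// Made-up documents: one hit in the title versus three hits in the body.
$title_hit = array('title' => 'Syria talks resume', 'keywords' => '', 'body' => 'Delegates met today.');
$body_hits = array('title' => 'Talks resume', 'keywords' => '', 'body' => 'Syria, Syria and Syria again.');

$score_title = toy_score($title_hit, 'syria', $weights);
$score_body = toy_score($body_hits, 'syria', $weights);
echo $score_title, ' ', $score_body, "\n";
```

Under this toy model, a single title hit (4) still outranks three body hits (3), which is the kind of effect the weights are meant to produce.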

4- SetFilterRange():
Adds a filter based on one of the attributes defined in the Sphinx index (analogous to adding a WHERE condition to an SQL statement). I used it to filter by creation time:

$client->SetFilterRange('created_time', $start_time, $end_time);
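
Since the range filter works on numeric bounds, the optional date strings from the request have to be turned into a pair of timestamps first. Here is one way to sketch that step; the helper name filter_range_bounds, the open-ended fallbacks (0 and "now"), and the fixed UTC timezone are my assumptions.

```php
<?php
date_default_timezone_set('UTC'); // fixed timezone so the example is reproducible

// Hypothetical helper: converts optional date strings into the integer
// [min, max] pair used for a range filter on created_time.
function filter_range_bounds($date_start, $date_end, $now)
{
    $min = ($date_start !== null && $date_start !== '') ? strtotime($date_start) : false;
    $max = ($date_end !== null && $date_end !== '') ? strtotime($date_end) : false;

    // Missing or unparsable dates fall back to an open-ended bound.
    $min = ($min === false) ? 0 : $min;
    $max = ($max === false) ? $now : $max;

    // Swap reversed bounds so that min <= max always holds.
    return ($min <= $max) ? array($min, $max) : array($max, $min);
}

list($min, $max) = filter_range_bounds('2013-12-01', null, 1387000000);
echo $min, ' ', $max, "\n";
// With a SphinxClient instance: $client->SetFilterRange('created_time', $min, $max);
```

In a real request, the third argument would simply be time().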

5- Query():
After configuring the search, this method sends the request to the searchd daemon and gets the results back from Sphinx:

$res = $client->Query($phrase, 'content_index');

The Query() method takes the search phrase as its first parameter and the name of the index(es) to match against as its second parameter.

After calling Query() on the SphinxClient, a result array is returned containing information about the matching records. If we dump the “matches” element of the result array, we get data similar to this:

var_dump($res['matches']);
/*********/

  array(2) {
    [181916]=>
    array(2) {
      ["weight"]=>
      string(1) "1"
      ["attrs"]=>
      array(3) {
        ["status"]=>
        string(1) "1"
        ["category_id"]=>
        string(2) "11"
        ["created_time"]=>
        string(10) "1386946964"
      }
    }
    [181915]=>
    array(2) {
      ["weight"]=>
      string(1) "7"
      ["attrs"]=>
      array(3) {
        ["status"]=>
        string(1) "1"
        ["category_id"]=>
        string(2) "12"
        ["created_time"]=>
        string(10) "1386368157"
      }
    }
  }

The data returned for each matched element are:
– document ID (as the key of the array element)
– weight (calculated dynamically by the ranker, taking into account the field weights we set earlier with SetFieldWeights())
– attribute values, in the “attrs” element (e.g. created_time, status, etc.), containing the Sphinx attributes defined in the config file.
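
As a small illustration of reading that structure, the sketch below loops over a hand-written array shaped like the dump above (values copied from it) and pulls out the ID, weight, and creation date of each match. The $summary layout is my own choice, not something the API returns.

```php
<?php
// Sample result shaped like the "matches" dump above.
$res = array(
    'matches' => array(
        181916 => array('weight' => '1', 'attrs' => array('status' => '1', 'category_id' => '11', 'created_time' => '1386946964')),
        181915 => array('weight' => '7', 'attrs' => array('status' => '1', 'category_id' => '12', 'created_time' => '1386368157')),
    ),
);

$summary = array();
foreach ($res['matches'] as $doc_id => $match) {
    $summary[] = array(
        'id'      => (int) $doc_id,          // the document ID is the array key
        'weight'  => (int) $match['weight'], // relevance weight
        // created_time is a Unix timestamp; format it as a UTC date
        'created' => gmdate('Y-m-d', (int) $match['attrs']['created_time']),
    );
}

foreach ($summary as $row) {
    echo $row['id'], ' weight=', $row['weight'], ' created=', $row['created'], "\n";
}
```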

Note that Sphinx does not return the textual data itself, because it only indexes the text and does not store it, so we have to fetch it from our MySQL database:

$ids_str = implode(', ', array_keys($res['matches']));
$res_db = mysql_query('select id, title, created_at from content where id in  (' . $ids_str . ') order by field(id,'.$ids_str.')');

In these lines, I fetched the records from MySQL using the document IDs, and kept the same ordering as Sphinx by using “FIELD(id, val1, val2, …)” in the ORDER BY clause.
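
On current PHP versions, the same fetch can be written with placeholders instead of string concatenation. The sketch below only builds the SQL text and the parameter list, so it runs without a database; the helper name build_fetch_query is hypothetical, and the table and column names are the ones used in the script.

```php
<?php
// Hypothetical helper: builds a parameterised query that fetches the rows
// for the given document IDs and keeps them in the given order.
function build_fetch_query(array $ids)
{
    $ids = array_values(array_map('intval', $ids));
    if (count($ids) === 0) {
        return null; // nothing to fetch; "IN ()" would be a syntax error
    }
    $marks = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT id, title, created_at FROM content"
         . " WHERE id IN ($marks) ORDER BY FIELD(id, $marks)";

    // The ID list is bound twice: once for IN (...), once for FIELD(...).
    return array('sql' => $sql, 'params' => array_merge($ids, $ids));
}

$query = build_fetch_query(array(181916, 181915));
echo $query['sql'], "\n";
// With PDO, assuming a connected $pdo object:
//   $stmt = $pdo->prepare($query['sql']);
//   $stmt->execute($query['params']);
```

Returning null for an empty ID list lets the caller skip the database round trip when Sphinx found no matches.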

At this point, I have the result IDs from Sphinx, fetched the associated textual data from MySQL, and displayed them on the web page.

Running the code

Now, I would like to query all records containing the word “syria” published in the last two weeks. Here are the results:
Screenshot from 2013-12-14 00:02:11

You can see that articles with the word “syria” in the title rank higher than those where “syria” appears only in the body, because of the field weights I used in the script above. Also, Sphinx took about 0.015 seconds to get those results out of 150,000 records, which is extremely fast.

Here is another execution, searching for “syria” without any additional filters:
Screenshot from 2013-12-14 00:20:34
That took about 0.109 seconds to execute!

Quick MySQL comparison

I wanted to compare Sphinx with MySQL in terms of performance.
I executed a MySQL query with a condition similar to the one I ran on Sphinx in the previous section, and here is the result:

mysql> select id from content where match(body) against('*syria*' in boolean mode) and status=1;
+--------+
| id     |
+--------+
| 145805 |
| 142579 |
| 133329 |
|  59778 |
|  95318 |
|  94979 |
|  83539 |
|  56858 |
| 181915 |
| 181916 |
| 181917 |
| 181918 |
+--------+
12 rows in set (10.74 sec)

MySQL took about 10 seconds to execute the same query, compared to about 0.1 seconds using Sphinx.

Conclusion

Now the simple PHP script is running with Sphinx and MySQL, and I have explained the main functions for controlling Sphinx through the PHP API, including sorting, matching, and filtering.
There are many other powerful features in Sphinx, such as MultiQuery, MVA (multi-valued attributes), and grouping, that I may write about in the future.

2 thoughts on “Introduction to sphinx with PHP – part2”

  1. Very good and effective tutorial. Best Sphinx tutorial available online. Thanks for your contribution.
    I have a question: whenever I run the program, I get the same result as yours, but it says “Deprecated:Do not use or better use,sphinxQL” at the top of the browser.
    How can I get rid of this?
    Thanks

    1. In the latest version of Sphinx, the SetMatchMode method is deprecated (to be dropped in upcoming versions), so you can remove the call.
      I will update the post shortly to replace it.