Search engine: Zend_Lucene

Zend Lucene

 

1. General

Zend_Search_Lucene is a general-purpose text search engine written
entirely in PHP 5. It stores its index on the filesystem and does not
require a database server.

2. How to install Zend Lucene

- Download website: http://www.zend.com/community/downloads

- Zend Framework version: Zend Framework 1.9 minimal

Download Zend Framework 1.9 minimal from the download website.

Remove everything from the Zend folder except the following files and
directories:

- Exception.php

- Loader/

- Loader.php

- Search/

 

3. How to create an index

The following example creates an index:

<?php
// File name: createindex.php

require_once 'Zend/Search/Lucene.php';

$productsData = array(
    0 => array("PID" => 1, "url" => "http://www.cybozu.jp", "productName" => "garoon", "Description" => "garoon Description", "lan" => "en"),
    1 => array("PID" => 2, "url" => "http://www.cybozu.jp", "productName" => "share360", "Description" => "share360 Description", "lan" => "en"),
    2 => array("PID" => 3, "url" => "http://www.cybozu.jp", "productName" => "日本語の製品名前", "Description" => "扶桑語の製品", "lan" => "jp"),
    3 => array("PID" => 4, "url" => "http://www.cybozu.jp", "productName" => "粤语产品名", "Description" => "中文产品描述", "lan" => "zh"),
);

// Passing true as the second argument creates a new index in the "index" directory.
$index = new Zend_Search_Lucene('index', true);

foreach ($productsData as $productData) {
    // Create a fresh document for every product so that fields do not accumulate.
    $doc = new Zend_Search_Lucene_Document();

    $doc->addField(Zend_Search_Lucene_Field::keyword('PID', $productData['PID'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::text('url', $productData['url'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::text('productName', $productData['productName'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::text('Description', $productData['Description'], 'UTF-8'));
    // The language code is stored with the document but is not searchable.
    $doc->addField(Zend_Search_Lucene_Field::unIndexed('lan', $productData['lan'], 'UTF-8'));

    $index->addDocument($doc);
}

// Commit and optimize once, after all documents have been added.
$index->commit();
$index->optimize();

echo 'index has been created!';

In the KB project, the index data comes from a database. Using the
method above, we can index all the text stored in the database.

 

4. Searching the index

After creating an index, we can search it as follows:

<?php
// File name: search.php

require_once 'Zend/Search/Lucene.php';

// Open the existing index in the "index" directory.
$index = new Zend_Search_Lucene('index');

$keywords = 'garoon';

echo "Index contains {$index->count()} documents.\n";

$query = Zend_Search_Lucene_Search_QueryParser::parse($keywords, 'utf-8');

$hits = $index->find($query);

foreach ($hits as $hit) {
    echo 'PID: ' . $hit->PID . '<br>';
    echo 'Score: ' . $hit->score . '<br>';
    echo 'url: ' . $hit->url . '<br>';
    echo 'productName: ' . $hit->productName . '<br>';
    echo 'lan: ' . $hit->lan . '<br>';
}

If we want to search text in multiple languages, we can read the lan
value of each hit and display different results depending on it.
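One way to act on the lan value is to group the hits by language before displaying them. The sketch below is only an illustration: groupHitsByLanguage() is a hypothetical helper, and the stdClass objects are stand-ins for the hit objects returned by $index->find(), so that the snippet runs without an index.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene):
// groups search hits by the value of their stored "lan" field.
function groupHitsByLanguage($hits)
{
    $grouped = array();
    foreach ($hits as $hit) {
        // Hits expose stored fields as properties, as in search.php.
        $grouped[$hit->lan][] = $hit;
    }
    return $grouped;
}

// Stand-in objects that mimic the properties read in search.php.
$hits = array(
    (object) array('PID' => 1, 'productName' => 'garoon', 'lan' => 'en'),
    (object) array('PID' => 2, 'productName' => 'share360', 'lan' => 'en'),
    (object) array('PID' => 3, 'productName' => '日本語の製品名前', 'lan' => 'jp'),
);

foreach (groupHitsByLanguage($hits) as $lan => $langHits) {
    echo $lan . ': ' . count($langHits) . " hit(s)\n";
}
```

In search.php the same function could be called with the real $hits array, assuming every indexed document stores a lan field.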

 

5. Deleting and updating the index

To update the index, we must first find the document by keyword and
delete it. After the old document has been deleted, we can add a new
one. The following example updates the index: it deletes the product
with PID 1 and adds it again with an updated description.

<?php

require_once 'Zend/Search/Lucene.php';

$index = new Zend_Search_Lucene('index');

// New product data to update
$productNewData = array("PID" => 1, "url" => "http://www.cybozu.jp", "productName" => "garoon", "Description" => "update garoon Description", "lan" => "en");

$keywords = "PID:1";

$hits = $index->find($keywords);

// Delete PID:1
foreach ($hits as $hit) {
    echo 'PID: ' . $hit->PID . ' has been deleted <br>';
    // $hit->id is the internal document number within the index.
    $index->delete($hit->id);
}

$index->commit();

// Add the new product data to the index
$doc = new Zend_Search_Lucene_Document();

$doc->addField(Zend_Search_Lucene_Field::keyword('PID', $productNewData['PID'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::text('url', $productNewData['url'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::text('productName', $productNewData['productName'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::text('Description', $productNewData['Description'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::unIndexed('lan', $productNewData['lan'], 'UTF-8'));

$index->addDocument($doc);

$index->commit();

$index->optimize();

 

6. How to search Japanese or Chinese text with Lucene

By default, Lucene's analyzer is only suited to English text. In this
project, however, we must search text in English, Japanese, and
Chinese, so we have to replace Lucene's default analyzer.

The following class extends Lucene's common analyzer:

<?php
// File name: chinese.php

require_once 'Zend/Search/Lucene/Analysis/Analyzer.php';
require_once 'Zend/Search/Lucene/Analysis/Analyzer/Common.php';

class CN_Lucene_Analyzer extends Zend_Search_Lucene_Analysis_Analyzer_Common
{
    private $_position;

    private $_cnStopWords = array();

    public function setCnStopWords($cnStopWords)
    {
        $this->_cnStopWords = $cnStopWords;
    }

    /**
     * Reset token stream
     */
    public function reset()
    {
        $this->_position = 0;

        // ASCII punctuation, followed by full-width (CJK) punctuation
        $search = array(",", "/", "\\", ".", ";", ":", "\"", "!", "~", "`", "^", "(", ")", "?", "-", "'", "<", ">", "$", "&", "%", "#", "@", "+", "=", "{", "}", "[", "]",
            ":", ")", "(", ".", "。", ",", "!", ";", "“", "”", "‘", "’", "[", "]", "、", "—", " ", "《", "》", "-", "…", "【", "】", "?", "¥");

        // Strip punctuation, then replace stop words with a space
        $this->_input = str_replace($search, '', $this->_input);
        $this->_input = str_replace($this->_cnStopWords, ' ', $this->_input);
    }

    /**
     * Tokenization stream API
     * Get next token
     * Returns null at the end of stream
     *
     * @return Zend_Search_Lucene_Analysis_Token|null
     */
    public function nextToken()
    {
        if ($this->_input === null) {
            return null;
        }

        $len = strlen($this->_input);
        //print "Old string:" . $this->_input . "<br />";
        while ($this->_position < $len) {
            // Skip spaces at the beginning
            while ($this->_position < $len && $this->_input[$this->_position] == ' ') {
                $this->_position++;
            }
            // Stop if only trailing spaces were left
            if ($this->_position >= $len) {
                break;
            }

            $termStartPosition = $this->_position;
            $temp_char = $this->_input[$this->_position];
            $isCnWord = false;
            if (ord($temp_char) > 127) {
                // Multi-byte text: take up to two characters,
                // assuming 3 bytes per character in UTF-8
                $i = 0;
                while ($this->_position < $len && ord($this->_input[$this->_position]) > 127) {
                    $this->_position = $this->_position + 3;
                    $i++;
                    if ($i == 2) {
                        $isCnWord = true;
                        break;
                    }
                }

                // A single remaining character does not form a token
                if ($i == 1) continue;
            } else {
                // ASCII text: a token is a run of letters and digits
                while ($this->_position < $len && ctype_alnum($this->_input[$this->_position])) {
                    $this->_position++;
                }
                //echo $this->_position . ":" . $this->_input[$this->_position - 1] . "\n";
            }

            // Skip any character that is neither alphanumeric nor multi-byte
            if ($this->_position == $termStartPosition) {
                $this->_position++;
                continue;
            }

            $tmp_str = substr($this->_input, $termStartPosition, $this->_position - $termStartPosition);

            $token = new Zend_Search_Lucene_Analysis_Token($tmp_str, $termStartPosition, $this->_position);

            $token = $this->normalize($token);

            if ($isCnWord) {
                // Step back one character so that consecutive two-character tokens overlap
                $this->_position = $this->_position - 3;
            }

            if ($token !== null) {
                return $token;
            }
        }

        return null;
    }
}

 

With the help of chinese.php, we can search Japanese and Chinese text
in the KB project. The following code must also be added before
creating an index and before searching:

 

require_once 'chinese.php';

Zend_Search_Lucene_Analysis_Analyzer::setDefault(new CN_Lucene_Analyzer());
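To see which tokens the nextToken() rule produces, the standalone sketch below applies the same idea outside of Zend: a run of ASCII letters and digits becomes one token, and a run of multi-byte characters is split into overlapping two-character tokens. It is a simplified illustration, not the analyzer itself. The function name cjkBigrams() is hypothetical, the input is assumed to be valid UTF-8 with punctuation already removed, and characters are split with PCRE's u modifier rather than by assuming 3 bytes per character.

```php
<?php
// Hypothetical helper that mirrors the tokenization idea of CN_Lucene_Analyzer.
function cjkBigrams($text)
{
    $tokens = array();
    // Split the text into runs of ASCII alphanumerics or runs of non-ASCII characters.
    preg_match_all('/[A-Za-z0-9]+|[^\x00-\x7F]+/u', $text, $matches);
    foreach ($matches[0] as $run) {
        if (ord($run[0]) <= 127) {
            // ASCII run: the whole word is one token.
            $tokens[] = $run;
            continue;
        }
        // Non-ASCII run: emit every pair of adjacent characters.
        $chars = preg_split('//u', $run, -1, PREG_SPLIT_NO_EMPTY);
        for ($i = 0; $i + 1 < count($chars); $i++) {
            $tokens[] = $chars[$i] . $chars[$i + 1];
        }
    }
    return $tokens;
}

// Tokens: garoon, 日本, 本語, 語の, の製, 製品
print_r(cjkBigrams('garoon 日本語の製品'));
```

As in the analyzer, a multi-byte character that stands alone yields no token, so a query consisting of a single Chinese or Japanese character will not match anything.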

 

7. Does Zend Lucene need downtime?

Using Zend Lucene does not require any downtime. When a new article is
added, we can add it to the index at the same time. When an article is
edited, we need to delete the old document and add the updated one to
the index.

 

 

 
