Apache Solr Cell 3.3.0 with PHP on Windows


Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

 

Installation on Windows 7 (Single Solr App)

http://wiki.apache.org/solr/SolrTomcat#Single_Solr_app

Additional Notes:

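Configure Tomcat to recognize the Solr home directory: go to All Programs -> Apache Tomcat 7 and start "Configure Tomcat". The Solr home is typically passed to Tomcat there as a Java option on the Java tab, in the form -Dsolr.solr.home=<path to your Solr home directory>.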

 

 

Installing Solr Cell (Tika) on Windows 7

http://wiki.apache.org/solr/ExtractingRequestHandler#Configuration

Additional Notes:

For Solr Cell to function correctly, copy the directories contrib and dist from the extracted Solr distribution zip file into the lib subfolder of your Solr home directory.

In the conf subfolder of your Solr home directory, adjust the paths in the solrconfig.xml file. The paths are relative to the Solr home directory. With the directory structure described above you need to set:

 

Remote Streaming of files

http://wiki.apache.org/solr/ExtractingRequestHandler

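Submit the following HTTP GET request (e.g. from your browser). For the stream.file parameter to be accepted, remote streaming has to be enabled in solrconfig.xml (enableRemoteStreaming="true" on the requestParsers element):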

http://localhost:8080/solr/update/extract?stream.file=c:/tutorial.html&stream.contentType=text/html&literal.id=tutorial.html&commit=true

If you want Solr not only to index the file but also to store its contents for retrieval, map the extracted content to a stored field by appending the parameter &fmap.content=name to the URL above. This assumes that you are using the default schema, in which a field "name" is configured. Otherwise, change it to any stored field of your schema.
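Typing such URLs by hand gets error-prone once file names contain spaces or special characters. The following is a minimal sketch of how the request URL could be assembled in PHP; the function name buildExtractUrl and its parameters are hypothetical, while host, port and handler path are simply taken from the example URL above.

```php
<?php
// Hypothetical helper: builds the URL for Solr's extracting request handler.
// Host, port and handler path are assumptions copied from the example URL.
function buildExtractUrl(string $file, string $id, string $contentType, ?string $contentField = null): string
{
    $params = array(
        'stream.file'        => $file,
        'stream.contentType' => $contentType,
        'literal.id'         => $id,
        'commit'             => 'true',
    );
    if ($contentField !== null) {
        // Map the extracted body text to the given schema field.
        $params['fmap.content'] = $contentField;
    }
    // http_build_query percent-encodes the values (RFC 3986 mode here).
    return 'http://localhost:8080/solr/update/extract?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Usage: index c:/tutorial.html and map its content to the field "name".
echo buildExtractUrl('c:/tutorial.html', 'tutorial.html', 'text/html', 'name'), "\n";
```

The resulting string can be pasted into a browser or passed to cURL as CURLOPT_URL. Leaving out the last argument omits the fmap.content parameter.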

 

Examples for issuing cURL HTTP POST/GET requests with PHP

Adding a document via XML

// XML documents are posted to the update handler; commit=true makes them searchable right away.
$url = 'http://localhost:8080/solr/update?commit=true';

// A complete <add> message. Further <doc> elements can be appended inside <add>.
$postData = '<add allowDups="false" overwritePending="true" overwriteCommitted="true"><doc><field name="id">F8V7067-APL-KIT12</field><field name="name">Belkin 1AMobile Power Cord for iPod w/ Dock</field><field name="manu">Belkin</field><field name="cat">connector</field><field name="features">car power adapter, white</field><field name="weight">4</field><field name="price">19.95</field><field name="popularity">1</field><field name="inStock">false</field></doc></add>';

$contentType = 'text/xml; charset=UTF-8';

$timeout = 60;

$ch = curl_init();
curl_setopt_array($ch, array(
 CURLOPT_RETURNTRANSFER => true,   // return the response body instead of printing it
 CURLOPT_HEADER => false,          // do not include response headers in $ret
 CURLOPT_NOBODY => false,
 CURLOPT_POST => true,
 CURLOPT_URL => $url,
 CURLOPT_POSTFIELDS => $postData,
 CURLOPT_HTTPHEADER => array("Content-Type: {$contentType}"),
 CURLOPT_TIMEOUT => $timeout
));
$ret = curl_exec($ch);
if ($ret === false) {
 // Transport-level failure (connection refused, timeout, ...)
 echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
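The string held in $ret after curl_exec is Solr's XML response, and a status of 0 in its responseHeader indicates success. The sketch below shows one way to read that status; the helper name solrStatus is hypothetical, and the sample body is an assumed example of the response shape rather than output captured from a real server. Error pages produced by the servlet container may not be XML at all, which is why the helper returns null in that case.

```php
<?php
// Hypothetical helper: extracts the status code from a Solr XML response.
// Returns null when the body is not parseable XML or has no status entry.
function solrStatus(string $responseXml): ?int
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $xml = simplexml_load_string($responseXml);
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return null;
    }
    $nodes = $xml->xpath('/response/lst[@name="responseHeader"]/int[@name="status"]');
    if (empty($nodes)) {
        return null;
    }
    return (int) $nodes[0];
}

// Assumed sample body in the shape of a successful update response.
$sample = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<response><lst name="responseHeader">'
    . '<int name="status">0</int><int name="QTime">12</int>'
    . '</lst></response>';

echo solrStatus($sample) === 0 ? "update accepted\n" : "update failed\n";
```

In the cURL example, the call would be solrStatus($ret), made only after checking that $ret is not false.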

 

PHP Solr Clients

 

Further Links

 

 

 
