Jul 13, 2012
One of the best open source data indexing tools available, with support for popular file formats such as PDF and HTML.
What is most valuable?
- Very good for indexing huge volumes of data, with very fast response times for search queries.
- Apart from the original Java platform, it can be easily integrated with other platforms, including Delphi, Perl, C#, C++, Python, Ruby, and PHP.
- For developers, there is very good community support, such as forums and mailing lists, available on the web.
- Because it is open source, developers can extend or customize the underlying codebase to suit their needs.
- It can index different types of files, such as PDFs, HTML, and Microsoft Word documents.
- Supports indexing Unicode (UTF) encoded data. This means it can index any text that Unicode can encode, regardless of the language.
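The fast search and language-independent indexing described above come largely from the inverted index that Lucene is built around. This is a map from each term to the documents containing it, so a query only reads the postings for its own terms instead of scanning every document. The sketch below is a toy Python model of that idea, not Lucene's API. The names `tokenize` and `InvertedIndex` are hypothetical. The Unicode-aware regex tokenizer is a deliberate simplification of Lucene's analyzers; for example, it will not segment unspaced Chinese or Japanese text the way a real CJK analyzer would.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: in Python 3, \w matches letters and digits from any
# Unicode script, so non-Latin text is tokenized without special handling.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    # casefold() is a Unicode-aware, more aggressive form of lower()
    # (e.g. "Straße" and "STRASSE" both become "strasse").
    return [tok.casefold() for tok in TOKEN_RE.findall(text)]


class InvertedIndex:
    """Toy inverted index (hypothetical; not Lucene's API)."""

    def __init__(self, tokenizer=tokenize):
        self._tokenizer = tokenizer          # pluggable analysis step
        self._postings = defaultdict(set)    # term -> set of document ids
        self._docs = []                      # stored original texts

    def add(self, text):
        """Index a document and return its integer id."""
        doc_id = len(self._docs)
        self._docs.append(text)
        for term in self._tokenizer(text):
            self._postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return ids of documents containing every query term (AND semantics)."""
        terms = self._tokenizer(query)
        if not terms:
            return set()
        # Copy the first posting set so the index itself is never mutated.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("Lucene indexes PDF and HTML text")          # id 0
    idx.add("Größe und Straße: lucene handles Unicode")  # id 1
    idx.add("日本語 テキスト")                              # id 2
    print(idx.search("LUCENE"))          # {0, 1}
    print(idx.search("straße"))          # {1}
    print(idx.search("日本語"))           # {2}
```

Each lookup is a dictionary access plus set intersections, so query cost depends on the size of the matching postings rather than on the total number of documents. The tokenizer is passed in as a parameter to mirror, loosely, how Lucene lets you swap in a different analyzer.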
What needs improvement?
- For Java users, there is a performance penalty because the JVM (Java Virtual Machine) is widely known to be memory-hungry. Scalability is an issue as well.
- If you need to add custom algorithms for indexing data, you may face some difficulty, as there is little information available in either the Lucene forums or the mailing lists.
- Although community support is excellent for Java users, it is harder to get solutions to problems on other platforms such as Perl and Delphi, as the tool is not yet as stable on these platforms and is still in the incubation phase.
*Disclosure: My company does not have a business relationship with this vendor other than being a customer.