This benchmark compares thinking_sphinx with acts_as_xapian. We need a search engine that returns the IDs of documents matching a fulltext query; basic text search is all we need.

Data

  • one table with 200k entries with 5k of text (avg) in one column
  • one table with 500k entries with 7k of text (avg) in 6 columns
  • one table with 500k entries with 7k of text (avg) in 4 columns

Indexing

Initial indexing took 10 mins with thinking_sphinx and 75 mins (!!) with acts_as_xapian.

Search performance

The search performance on queries that return only a few items is nearly identical.

The search performance on queries that return many items (~10000) is also nearly identical; 90% of the time is spent in ActiveRecord.

In our case we only need IDs, not the entire documents. For one particular query (10000 results), thinking_sphinx takes 0.6 secs while acts_as_xapian needs 4.5 secs. This is because thinking_sphinx lets you fetch only the IDs, whereas acts_as_xapian insists on pulling the models from the database. After patching acts_as_xapian to return IDs only, we land at 0.6 secs (sphinx) vs 0.4 secs (xapian).
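The difference between the two code paths in the paragraph above can be sketched in a few lines. This is not thinking_sphinx or acts_as_xapian code. `TinyIndex`, `FakeTable` and `search_records` are hypothetical names: a toy inverted index stands in for the search engine, and a dict stands in for the database table. The sketch only shows the shape of the cost. An ID-only search never touches the table, while pulling the models costs one load per hit, and that per-hit work is what dominates once a query returns thousands of results.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens; a deliberately naive analyzer (assumption)."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Hypothetical in-memory inverted index: term -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search_ids(self, query):
        """Sorted IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


class FakeTable:
    """Stand-in for a database table; counts row loads to make the cost visible."""

    def __init__(self, rows):
        self.rows = rows
        self.loads = 0

    def find(self, row_id):
        self.loads += 1  # in a real app each hit becomes a full model object
        return self.rows[row_id]


def search_records(index, table, query):
    """The 'pull the models' path: every matching ID is turned into a full record."""
    return [table.find(i) for i in index.search_ids(query)]


# Usage: the ID-only path leaves table.loads at 0; the model path loads every hit.
rows = {
    1: "fast fulltext search engine",
    2: "slow initial indexing run",
    3: "search returns only document ids",
}
index = TinyIndex()
for doc_id, text in rows.items():
    index.add(doc_id, text)
table = FakeTable(rows)

print(index.search_ids("search"), table.loads)  # [1, 3] 0
search_records(index, table, "search")
print(table.loads)  # 2
```

With 10000 hits the second path does 10000 loads, while the first returns the same IDs without touching the database at all.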

Results

We will choose sphinx because

  • it is similarly fast to xapian
  • it runs over the network by default
  • indexing is way faster (I guess because acts_as_xapian pulls all the data to be indexed out of the database to hand it over to Xapian, while sphinx can read it from the database itself)
  • acts_as_xapian would need to be patched for performance reasons

Here are the related tools: