This benchmark compares thinking_sphinx with acts_as_xapian. We need a search engine that returns the IDs of matching documents from a fulltext index; basic text search is all we require. The test data:
- one table with 200k entries with 5k of text (avg) in one column
- one table with 500k entries with 7k of text (avg) in 6 columns
- one table with 500k entries with 7k of text (avg) in 4 columns
Initial indexing took 10 mins with thinking_sphinx and 75 mins (!!) with acts_as_xapian.
The search performance on queries that return only a few items is nearly identical.
The search performance on queries that return many items (~10000) is nearly identical; 90% of the time is spent in ActiveRecord.
In our case, where we only need IDs and not the entire documents, sphinx runs at 0.6 secs for a particular query (with 10000 results), whereas acts_as_xapian needs 4.5 secs. This is because thinking_sphinx allows you to fetch only the IDs, while acts_as_xapian insists on pulling the models from the database. After patching acts_as_xapian to allow pulling IDs only, we land at 0.6 secs (sphinx) vs 0.4 secs (acts_as_xapian).
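To make the difference between the two paths concrete, here is a minimal sketch. FakeIndex and FakeTable are hypothetical stand-ins invented for this illustration; they are not the API of thinking_sphinx or acts_as_xapian. The sketch only shows that the ID-only path never touches the table, while the model path pays one row load per hit.

```ruby
# Hypothetical stand-ins: neither class belongs to thinking_sphinx
# or acts_as_xapian. They only illustrate the two search paths.

# Stand-in for the fulltext index: maps a term to matching document IDs.
class FakeIndex
  def initialize(postings)
    @postings = postings
  end

  # ID-only search: answers from the index, no database access.
  def search_ids(term)
    @postings.fetch(term, [])
  end
end

# Stand-in for the database table; counts how many rows get loaded.
class FakeTable
  attr_reader :rows_loaded

  def initialize(rows)
    @rows = rows
    @rows_loaded = 0
  end

  # Loads one row per ID, the way a model-returning search would.
  def find_all(ids)
    ids.map do |id|
      @rows_loaded += 1
      @rows.fetch(id)
    end
  end
end

index = FakeIndex.new("sphinx" => [1, 3], "xapian" => [2])
table = FakeTable.new(1 => "doc one", 2 => "doc two", 3 => "doc three")

# Path 1: IDs only. The table is never touched.
ids = index.search_ids("sphinx")

# Path 2: full models. Same index lookup, plus one row load per hit.
models = table.find_all(index.search_ids("sphinx"))

puts "ids: #{ids.inspect}, rows loaded for models: #{table.rows_loaded}"
```

With 10000 hits, the second path does 10000 row loads that we would throw away immediately, which is where the extra seconds go.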
We will choose sphinx because:
- it is about as fast as xapian
- it runs over the network by default
- indexing is way faster (I guess because acts_as_xapian pulls all data to be indexed from the database and hands it over, while sphinx can read from the database itself)
- acts_as_xapian would need to be patched for performance reasons.
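
For reference, the ratios behind these points can be worked out from the timings reported above. This is just arithmetic on our measured numbers, with two assumptions: that the 0.4 secs figure belongs to the patched acts_as_xapian, and that the gap between the stock and patched acts_as_xapian timings is entirely model loading.

```ruby
# Timings copied from the measurements above.
indexing_minutes = { sphinx: 10.0, xapian: 75.0 }

# Assumption: 0.4 secs is the patched acts_as_xapian, 0.6 secs is sphinx.
query_seconds = { sphinx: 0.6, xapian_stock: 4.5, xapian_patched: 0.4 }

# How much slower acts_as_xapian is at initial indexing.
indexing_ratio = indexing_minutes[:xapian] / indexing_minutes[:sphinx]

# How much slower stock acts_as_xapian is on the 10000-result query.
stock_query_ratio = query_seconds[:xapian_stock] / query_seconds[:sphinx]

# Assumption: the stock-vs-patched gap is all spent loading models.
model_loading_share =
  (query_seconds[:xapian_stock] - query_seconds[:xapian_patched]) /
  query_seconds[:xapian_stock]

puts format("indexing: %.1fx slower on acts_as_xapian", indexing_ratio)
puts format("ID query (stock): %.1fx slower on acts_as_xapian", stock_query_ratio)
puts format("share of stock query time spent loading models: %.0f%%",
            model_loading_share * 100)
```

The last figure comes out at roughly 91%, which lines up with the 90% of time we saw being spent in ActiveRecord.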