Monday, November 18, 2013

Search API Notes

Scenario: You've got a great idea that requires indexing and/or search capabilities well beyond your budget.  Where do you go from here?

Thankfully, you have a few options to choose from when deciding how to power your new app.  Sadly, you have ONLY a few options to choose from.  Indexing and searching the Internet is a monstrous task, which is why this industry is a natural fit for the oligopoly we see today.  There are three players in this market that all offer Search APIs, but as of this writing, their products differ considerably.

Yahoo BOSS - http://developer.yahoo.com/boss/search/
If you are looking for something inexpensive, then this is it.  They offer a 'limitedweb' search whose index is slightly smaller and not as fresh, but it's only $0.40/1000 queries, which is half the price of their 'web' offering.  Other than the cost savings, this service stinks.  Do not use this unless your application allows for a large margin of error and cost is the most important requirement.  I've found three common types of problems:
- False positives: results are returned that do not contain the query.  It doesn't matter whether you are using an exact phrase search, boolean operators, etc.; you will get false positives from time to time.
- False negatives: matching pages that are in Yahoo's index sometimes fail to be returned.
- Sporadic errors: the errors mentioned above, as well as outright outages, occur frequently and randomly.  Developing with this API was very frustrating because it does not return consistent results.  The same query will return no results one minute, then many results a minute later.
Bottom line: DO NOT USE ON IMPORTANT WORK
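
If you are stuck with BOSS anyway, the first and third problems can be partly papered over on the client side.  Below is a rough Python sketch, not anything Yahoo provides: it drops results whose returned text does not contain the phrase, and retries a query that comes back empty.  The result shape (a list of dicts with 'title' and 'abstract' keys) and the search_fn callable are assumptions made for illustration; adapt them to whatever your client actually returns.  Keep in mind that the filter only sees the returned snippet, so it can throw away pages that match only in the body, and that every retry is presumably another billed query.

```python
import time


def contains_phrase(result, phrase, fields=("title", "abstract")):
    """Return True if the phrase appears in any of the given text fields.

    The field names are hypothetical; use the keys your client returns.
    """
    needle = phrase.casefold()
    return any(needle in str(result.get(field, "")).casefold() for field in fields)


def filter_false_positives(results, phrase):
    """Keep only results whose returned text actually contains the phrase."""
    return [r for r in results if contains_phrase(r, phrase)]


def search_with_retry(search_fn, query, attempts=3, delay_seconds=1.0):
    """Call search_fn(query) until it returns a non-empty list or attempts run out.

    search_fn is a placeholder for your own function that performs the API call.
    """
    results = []
    for attempt in range(attempts):
        results = search_fn(query)
        if results:
            break
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # brief pause before asking again
    return results
```

Nothing here helps with false negatives that persist across retries; if the index simply does not return a page, the client cannot know about it.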

Google - https://developers.google.com/custom-search/

On the other end of the spectrum is the dominant search giant.  Their API is high-quality and VERY expensive ($5/1000 queries).  Note that this is more than 10X the cost of Yahoo's limitedweb queries (12.5X, to be exact).  Nevertheless, the Google results are consistent and of the quality you would expect.
Disadvantages: Besides price, Google's API results often do not match their public search results.  If you have a high-volume app, the rate limits may be a deal-breaker for you (they were for us).

Microsoft Bing - http://datamarket.azure.com/dataset/8818F55E-2FE5-4CE3-A617-0B8BA8419F65
I'm rarely a fan of anything Microsoft produces, but they are the winner in my evaluation of web search APIs.  They have just the right mix of consistency, price, and performance, without the restrictions of Google.  They offer unlimited searches at a price that is roughly $1.25/1000 queries.  This is 1/4 of Google's price, but still about 3X the price of Yahoo's limitedweb.  For mission-critical apps that can't afford the problems of BOSS, Bing is probably the best choice.  Be sure to use the "Web Only" API if you are only using their web search, as it is cheaper than their composite search offering.
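
To see what these per-1000 prices mean at your own volume, here is a small Python sketch that turns them into a monthly bill.  The prices are the ones quoted in this post as of this writing; the Yahoo 'web' figure is derived from the statement that limitedweb is half its price, and the Bing figure is the rough $1.25 mentioned above.  The dictionary keys and function names are just labels I made up, and the calculation ignores any tiers, minimums, or free quotas the providers may apply.

```python
# Price per 1000 queries in USD, as quoted in this post.
PRICE_PER_1000 = {
    "yahoo_limitedweb": 0.40,
    "yahoo_web": 0.80,   # derived: limitedweb is described as half of 'web'
    "bing": 1.25,        # "roughly", per the post
    "google": 5.00,
}


def monthly_cost(queries_per_month, price_per_1000):
    """Return the flat-rate cost in USD for a given monthly query volume."""
    return queries_per_month / 1000 * price_per_1000


def cost_table(queries_per_month):
    """Return a dict mapping each provider label to its monthly cost."""
    return {
        name: monthly_cost(queries_per_month, price)
        for name, price in PRICE_PER_1000.items()
    }


if __name__ == "__main__":
    for name, cost in cost_table(100_000).items():
        print(f"{name:18s} ${cost:,.2f}")
```

At 100,000 queries a month, that works out to about $40 for Yahoo limitedweb, $125 for Bing, and $500 for Google.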