From 12be8b34a53d2f93eefe9a3431bc4d6788ac5c05 Mon Sep 17 00:00:00 2001
From: Chris Li
Date: Sun, 26 Jun 2016 07:28:59 -0400
Subject: [PATCH] Updated How Search in 1.6 works (markdown)

---
 How-Search-in-1.6-works.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/How-Search-in-1.6-works.md b/How-Search-in-1.6-works.md
index f1011e1..192924f 100644
--- a/How-Search-in-1.6-works.md
+++ b/How-Search-in-1.6-works.md
@@ -30,5 +30,11 @@ You may have noticed a big flaw in our system till this step. Our system gives x
 
 ## Step 4: Utilize xapian probability
 If xapian give a search result 100% probability value, what does it mean? Well, it means xapian is quite sure this is exactly the article you want. It is very confident. So, we need to give such an article a boost, despite they may have (very) long Levenshtein distance to our search term. On the other hand, if xapian gives an article very low probability, then this article may have nothing to do with out search term. Wen may need to give such articles a penalty, even though they may share great similarity with the search term.
-So how do we
-In 1.6, the default value
\ No newline at end of file
+Xapian results often cluster at the high end of the probability range. In other words, you get many articles at 100%, 99%, etc., but few at 75%, 66%, 45% and so on. To better differentiate them, we need a non-linear map from probability to a boost/penalty factor.
+
+In 1.6, I use `ln(n - m * prob)` as this boost factor. The default values of n and m are derived from two anchor points:
+- `ln(n - m * 1.00) = 0.1`
+- `ln(n - m * 0.75) = 1.0`
+
+Solve these two equations for m and n.
+
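The page leaves "solve for m and n" as a step for the reader. Exponentiating both anchor equations gives two linear equations, n - m = e^0.1 and n - 0.75m = e^1.0; subtracting the first from the second gives 0.25m = e - e^0.1, so m = 4(e - e^0.1) ≈ 6.452 and n = e^0.1 + m ≈ 7.558. The sketch below does the same arithmetic and evaluates the resulting map. The helper names `solve_m_n` and `boost_factor` are hypothetical and are not taken from the project's code; only the formula `ln(n - m * prob)` and the two anchor points come from the page.

```python
import math

# Anchor points stated on the page: ln(n - m*1.00) = 0.1 and ln(n - m*0.75) = 1.0.
P_HIGH, F_HIGH = 1.00, 0.1
P_LOW, F_LOW = 0.75, 1.0


def solve_m_n(p_high=P_HIGH, f_high=F_HIGH, p_low=P_LOW, f_low=F_LOW):
    """Solve ln(n - m*p) = f at two points for (m, n).

    Exponentiating turns each equation into a linear one:
        n - m*p_high = e**f_high
        n - m*p_low  = e**f_low
    Subtracting the first from the second eliminates n:
        m * (p_high - p_low) = e**f_low - e**f_high
    """
    m = (math.exp(f_low) - math.exp(f_high)) / (p_high - p_low)
    n = math.exp(f_high) + m * p_high
    return m, n


def boost_factor(prob, m, n):
    """Hypothetical helper: map a xapian probability in [0, 1] to ln(n - m*prob)."""
    arg = n - m * prob
    if arg <= 0:
        # The log is only defined while n - m*prob > 0, i.e. prob < n/m (about 1.17 here).
        raise ValueError("n - m*prob must be positive for the log to be defined")
    return math.log(arg)


M, N = solve_m_n()

if __name__ == "__main__":
    print(f"m = {M:.4f}, n = {N:.4f}")
    for p in (1.00, 0.99, 0.95, 0.75, 0.50, 0.0):
        print(f"prob={p:.2f} -> factor={boost_factor(p, M, N):.3f}")
```

The slope of ln(n - m * prob) is -m / (n - m * prob), whose magnitude grows as prob approaches 1, so the map is steepest exactly where the page says results congregate: 99% and 100% end up about 0.057 apart, while 45% and 46% differ by only about 0.014. Since 100% maps to the smallest value (0.1), one plausible reading, which is an assumption and is not stated on the page, is that the factor multiplies a lower-is-better score such as the Levenshtein distance.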