Created the start of a knowledge-base article on Searching and Search Results. To be continued...

Julian Harty 2017-12-16 18:14:02 +00:00
parent 9eff407f61
commit e468bcca0e

@ -0,0 +1,31 @@
# Heuristics for testing search
Testing Search effectively is challenging, especially given the depth and breadth of content available for Kiwix. This is compounded by the difficulty of making Search performant, reliable, and relevant across the vast range of Android devices and the range of languages in use. It is therefore hard to specify precisely how Search should behave and perform. These heuristics are intended as a starting point: they should generally hold true, yet you're welcome to adapt, reject or ignore those that don't seem useful.
Note: this wiki page is a work in progress. Contributions are welcome.
## Expectations
* Kiwix will be able to search text-based content in ZIM files available to the app. Some storage locations don't seem to be available in practice e.g. OTG connected storage. We don't expect Kiwix to be able to search content it cannot access directly.
* Searches will be based on the ZIM files currently available on the device at runtime. Users may delete files, add files, replace memory cards, etc. while Kiwix is running and between times when Kiwix is used. When users change what's available while Kiwix is running we expect Kiwix to adapt without needing to be restarted.
* Searches will be possible in the language of the content; users will be able to input characters in that language e.g. in Japanese for Japanese content regardless of what language the device is configured to use.
* Users will not be left with a blank results page. If Search doesn't find any results it will tell the users so.
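
Two of these expectations (searching only what is available right now, and never showing a blank results page) can be sketched in a few lines. This is an illustrative Python model only, not how Kiwix-Android is implemented: `available_zim_files` and `results_message` are hypothetical names, and the wording of the "no results" message is an assumption.

```python
import os
from pathlib import Path


def available_zim_files(directories):
    """Return the readable .zim files found right now in the given directories.

    The directories are rescanned on every call, so files that were added or
    removed since the previous search are picked up without a restart.
    """
    found = []
    for directory in directories:
        path = Path(directory)
        if not path.is_dir():
            continue  # e.g. a memory card that was removed or never mounted
        for candidate in sorted(path.glob("*.zim")):
            if candidate.is_file() and os.access(candidate, os.R_OK):
                found.append(candidate)
    return found


def results_message(query, results):
    """Never return an empty page: say so explicitly when nothing matched."""
    if not results:
        return f"No results found for '{query}'."
    return "\n".join(results)
```

A test could call `available_zim_files` before and after deleting a file and confirm that the second call no longer lists it.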
## Search heuristics
* Whitespace is allowed and the first character of whitespace between words is significant. Additional whitespace is silently ignored when producing search results, e.g. `white space` (one space) and `white   space` (several spaces) are considered equivalent.
* The first character of whitespace at the end of a word in the search box is significant, e.g. `go ` may return different results from `go`: `go` would match `good`, whereas `go ` would not.
* Top online search terms for Wikimedia sources will be found (and matched) when searching the equivalent ZIM file in Kiwix-Android. There may be exceptions for highly topical searches e.g. in response to breaking news.
* As more characters are entered there will be fewer results; as characters are removed from the end of the search term, more results will be returned. The numbers will be broadly symmetric, e.g. for the sequence of queries `fun` -> `fund` -> `fun`, similar results would be returned for both `fun` queries and fewer for `fund`, because `fun` matches `function` whereas `fund` does not.
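
The whitespace and narrowing heuristics above can be expressed as executable checks. The sketch below is a toy Python model, not Kiwix's search code: `normalise_query`, `matches`, `search` and `is_narrowing` are hypothetical names, `TITLES` is made-up sample data, and word-prefix matching is only one reading of the `go`/`good` example. Case handling is deliberately left out because the expected behaviour is not yet known.

```python
import re

# Made-up sample titles; a real test would query a ZIM file through the app.
TITLES = ["function", "fund", "fun", "good", "go", "white space"]


def normalise_query(query):
    """Collapse every run of whitespace to a single space.

    Only the first whitespace character of a run is kept, including a
    trailing one, because the heuristics treat it as significant.
    """
    return re.sub(r"\s+", " ", query)


def matches(query, title):
    """Toy matcher: earlier words must match whole words in the title.

    The last word matches by prefix, unless the query ends with whitespace,
    in which case it must match a whole word too.
    """
    normalised = normalise_query(query)
    complete_last_word = normalised.endswith(" ")
    words = normalised.split()
    if not words:
        return False
    title_words = title.split()
    *whole_words, last_word = words
    if any(word not in title_words for word in whole_words):
        return False
    if complete_last_word:
        return last_word in title_words
    return any(t.startswith(last_word) for t in title_words)


def search(query, titles):
    """Return the titles that match the query, in their original order."""
    return [title for title in titles if matches(query, title)]


def is_narrowing(shorter, longer, titles):
    """True when every result for the longer query is also a result for the shorter one."""
    return set(search(longer, titles)) <= set(search(shorter, titles))
```

In this model `search("go", TITLES)` returns `["good", "go"]` while `search("go ", TITLES)` returns only `["go"]`, and `is_narrowing("fun", "fund", TITLES)` holds. The same assertions could be pointed at the real search to see where it departs from the heuristics.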
Possible sources of top online search terms include:
* Top 5000 Searches for 'last week': https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_pages
* Top 25 Searches for 'last week' (likely to have more topical searches): https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report
* Lots of sources, including both the above: https://en.wikipedia.org/wiki/Wikipedia:Statistics
# Unknown behaviours (yet)
The following are unknown, at least from my perspective. Hopefully we will be able to clarify the expected/desired/actual behaviours for these soon.
* Whether accents are significant in either the term entered or the content matched.
* Whether commonly paired words such as `white space`, `white-space` and `whitespace` are considered to be equivalent in either the term entered or the content matched.
* Whether common abbreviations will be supported and matched with the unabbreviated form e.g. `WW2` and `World War Two`.
* Whether users can enter special characters or otherwise control the behaviours of the search e.g. in terms of case sensitivity, boolean operations, wildcards, etc.
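
One way to investigate these open questions is to run pairs of closely related queries and compare the results. The sketch below only generates the query variants; it does not assume anything about how Kiwix answers them. The function names are hypothetical, and the accent stripping relies on Python's standard `unicodedata` module, which handles characters that decompose into a base letter plus combining marks (so `é` becomes `e`, but a letter such as `ø` is left unchanged).

```python
import unicodedata


def strip_accents(term):
    """Return the term with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def spacing_variants(first, second):
    """Return the spaced, hyphenated and joined forms of a word pair."""
    return [f"{first} {second}", f"{first}-{second}", f"{first}{second}"]


def case_variants(term):
    """Return lower-case, upper-case and title-case forms of a term."""
    return [term.lower(), term.upper(), term.title()]
```

For example, searching for each entry of `spacing_variants("white", "space")` and comparing the result lists would show whether `white space`, `white-space` and `whitespace` are treated as equivalent.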