meilisearch#
Warning
This page is not ready.
It still needs restructuring and better wording.
TODO list#
There are a lot of files to read and understand. My current priority is to understand MatcherBuilder
and how it is called.
- meilisearch/src/search/mod.rs
- milli/src/search/new/matches/mod.rs: definition of MatcherBuilder.
- MatchingWords?
I also need to document the notion of a transaction. I think it is related to heed,
and thus to LMDB.
Currently documenting (WIP)#
TODO
Document:
- execute_search
- MatcherBuilder
- fn execute_search#
Executes a query.
Parameters:
- ctx: &mut SearchContext
- query: Option<&str>
- terms_matching_strategy: TermsMatchingStrategy
- scoring_strategy: ScoringStrategy
- exhaustive_number_hits: bool
- mut universe: RoaringBitmap
- sort_criteria: &Option<Vec<AscDesc>>
- distinct: &Option<String>
- geo_strategy: geo_sort::Strategy
- from: usize
- length: usize
- words_limit: Option<usize>
- placeholder_search_logger: &mut dyn SearchLogger<PlaceholderQuery>
- query_graph_logger: &mut dyn SearchLogger<QueryGraph>
- time_budget: TimeBudget
- ranking_score_threshold: Option<f64>
- locales: Option<&Vec<Language>>
Hints:
- To set universe to all documents, use index.documents_ids(rtxn). If filtering is needed, use filtered_universe.
- logging:
  - DefaultSearchLogger
  - placeholder_search_logger?
  - query_graph_logger?
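As a rough illustration of how universe, from and length interact, here is a minimal sketch that uses only the standard library. It assumes a BTreeSet<u32> as a stand-in for RoaringBitmap, and select_page is a hypothetical helper, not a milli function. It skips ranking entirely and only shows the filtering and paging steps.

```rust
use std::collections::BTreeSet;

/// Stand-in for `RoaringBitmap`: an ordered set of document ids.
type Universe = BTreeSet<u32>;

/// Hypothetical helper (not part of milli): restrict the universe with an
/// optional filter, then return the ids in the page `[from, from + length)`.
fn select_page(
    all_docs: &Universe,
    filter: Option<&Universe>,
    from: usize,
    length: usize,
) -> Vec<u32> {
    // Without a filter, the universe is every document of the index.
    let universe: Universe = match filter {
        Some(allowed) => all_docs.intersection(allowed).copied().collect(),
        None => all_docs.clone(),
    };
    // `from` skips the first results, `length` bounds the page size.
    universe.into_iter().skip(from).take(length).collect()
}
```

In the real function the order of the results comes from the ranking rules; here the ids are simply returned in ascending order.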
- struct SearchContext#
- index: &Index
- txn: &RoTxn
- db_cache: DatabaseCache
- word_interner: DedupInterner<String>
- phrase_interner: DedupInterner<Phrase>
- term_interner: Interner<QueryTerm>
- phrase_docids: PhraseDocIdsCache
- restricted_fids: Option<RestrictedFids>
- trait SearchLogger#
type generics:
- Q: RankingRuleQueryTrait
methods:
- initial_query: Logs the initial query
- initial_universe: Logs the value of the initial set of all candidates
- query_for_initial_universe: Logs the query that was used to compute the set of all candidates
- ranking_rules: Logs the ranking rules used to perform the search query
- start_iteration_ranking_rule: Logs the start of a ranking rule’s iteration
- next_bucket_ranking_rule: Logs the end of the computation of a ranking rule bucket
- skip_bucket_ranking_rule: Logs the skipping of a ranking rule bucket
- end_iteration_ranking_rule: Logs the end of a ranking rule’s iteration
- add_to_results: Logs the addition of document ids to the final results
- log_internal_state: Logs an internal state in the search algorithm
The empty struct DefaultSearchLogger implements this trait with empty implementations.
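The logger pattern described above can be sketched with a reduced trait. This is a simplified model, not the milli definition: QueryKind is a hypothetical stand-in for RankingRuleQueryTrait, only two of the hooks are kept, their signatures are assumptions, and CountingLogger is invented for the example.

```rust
/// Hypothetical stand-in for `RankingRuleQueryTrait`.
trait QueryKind {}

struct PlaceholderQuery;
struct QueryGraph;
impl QueryKind for PlaceholderQuery {}
impl QueryKind for QueryGraph {}

/// Reduced logger trait: two hooks only, with assumed signatures.
trait SearchLogger<Q: QueryKind> {
    fn initial_query(&mut self, query: &Q);
    fn add_to_results(&mut self, docids: &[u32]);
}

/// Empty struct whose hooks all do nothing.
struct DefaultSearchLogger;
impl<Q: QueryKind> SearchLogger<Q> for DefaultSearchLogger {
    fn initial_query(&mut self, _query: &Q) {}
    fn add_to_results(&mut self, _docids: &[u32]) {}
}

/// Hypothetical logger that counts the documents added to the results.
#[derive(Default)]
struct CountingLogger {
    added: usize,
}
impl<Q: QueryKind> SearchLogger<Q> for CountingLogger {
    fn initial_query(&mut self, _query: &Q) {}
    fn add_to_results(&mut self, docids: &[u32]) {
        self.added += docids.len();
    }
}

/// The instrumented code only sees a `&mut dyn SearchLogger<Q>`.
fn run(logger: &mut dyn SearchLogger<QueryGraph>) {
    logger.initial_query(&QueryGraph);
    logger.add_to_results(&[1, 2, 3]);
    logger.add_to_results(&[7]);
}
```

Passing a DefaultSearchLogger to run costs nothing beyond the dynamic calls, while a CountingLogger ends up with added equal to 4.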
RankingRuleQueryTrait is a discrete trait implemented only by specific structs:
- PlaceholderQuery
- QueryGraph
QueryNodeData enum
- Term(LocatedQueryTermSubset): a regular node representing a word or combination of words.
- Deleted: represents a node that was deleted.
- Start: unique, represents the start of the query.
- End: unique, represents the end of the query.
QueryNode
- data: QueryNodeData
- predecessors: SmallBitmap<QueryNode>
- successors: SmallBitmap<QueryNode>
QueryGraph
- root_node: Interned<QueryNode>: The index of the start node within self.nodes.
- end_node: Interned<QueryNode>: The index of the end node within self.nodes.
- nodes: FixedSizeInterner<QueryNode>: The list of all query nodes.
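To make the shape of these three types concrete, here is a minimal standard-library sketch. It is an approximation under stated assumptions: Term holds a plain String instead of a LocatedQueryTermSubset, BTreeSet<usize> replaces SmallBitmap<QueryNode>, a Vec and plain indices replace FixedSizeInterner and Interned, and the linear constructor is hypothetical (the real graph is built by from_query and is presumably richer than a simple chain).

```rust
use std::collections::BTreeSet;

/// Simplified node payload; the real `Term` holds a `LocatedQueryTermSubset`.
#[derive(Debug, PartialEq)]
enum QueryNodeData {
    Term(String),
    Deleted,
    Start,
    End,
}

/// `BTreeSet<usize>` stands in for `SmallBitmap<QueryNode>`.
struct QueryNode {
    data: QueryNodeData,
    predecessors: BTreeSet<usize>,
    successors: BTreeSet<usize>,
}

/// `Vec<QueryNode>` stands in for `FixedSizeInterner<QueryNode>`,
/// and plain indices stand in for `Interned<QueryNode>`.
struct QueryGraph {
    root_node: usize,
    end_node: usize,
    nodes: Vec<QueryNode>,
}

impl QueryGraph {
    /// Hypothetical constructor: chains Start -> term 1 -> ... -> term n -> End.
    fn linear(terms: &[&str]) -> Self {
        let mut data = vec![QueryNodeData::Start];
        data.extend(terms.iter().map(|t| QueryNodeData::Term(t.to_string())));
        data.push(QueryNodeData::End);
        let last = data.len() - 1;
        let nodes = data
            .into_iter()
            .enumerate()
            .map(|(i, data)| QueryNode {
                data,
                // Every node except Start has the previous node as predecessor.
                predecessors: if i == 0 { BTreeSet::new() } else { BTreeSet::from([i - 1]) },
                // Every node except End has the next node as successor.
                successors: if i == last { BTreeSet::new() } else { BTreeSet::from([i + 1]) },
            })
            .collect();
        QueryGraph { root_node: 0, end_node: last, nodes }
    }
}
```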
QueryGraph::from_query(ctx: &SearchContext, terms: &[LocatedQueryTerm])
To create terms, you first have to:
- create the builder: tokbuilder = TokenizerBuilder::new()
- configure tokbuilder:
  - add the stop words
  - add separators
  - set up the words dict
  - add locales with allow_list
- create the tokenizer tokenizer
- invoke tokenizer.tokenize(query)
- use located_query_terms_from_tokens(ctx, tokens, words_limit)
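The steps above can be mimicked with a toy tokenizer to show the data flow from a query string to located terms. Everything here is an assumption made for illustration: ToyTokenizer and LocatedTerm are hypothetical types, not the real tokenizer API or LocatedQueryTerm, and the real pipeline does much more (normalization, dictionaries, locales).

```rust
use std::collections::HashSet;

/// Simplified stand-in for `LocatedQueryTerm`: a word and its position.
#[derive(Debug, PartialEq)]
struct LocatedTerm {
    word: String,
    position: usize,
}

/// Hypothetical tokenizer configuration (not the real tokenizer API).
struct ToyTokenizer {
    stop_words: HashSet<String>,
    separators: Vec<char>,
}

impl ToyTokenizer {
    /// Splits on the configured separators and lowercases each token.
    fn tokenize(&self, query: &str) -> Vec<String> {
        query
            .split(|c: char| self.separators.contains(&c))
            .filter(|token| !token.is_empty())
            .map(|token| token.to_lowercase())
            .collect()
    }

    /// Drops stop words, keeps the original token positions,
    /// and stops after `words_limit` terms when a limit is given.
    fn located_terms(&self, tokens: &[String], words_limit: Option<usize>) -> Vec<LocatedTerm> {
        tokens
            .iter()
            .enumerate()
            .filter(|(_, token)| !self.stop_words.contains(*token))
            .map(|(position, token)| LocatedTerm { word: token.clone(), position })
            .take(words_limit.unwrap_or(usize::MAX))
            .collect()
    }
}
```

With "the" as a stop word and a space and a hyphen as separators, the query "The Quick-brown fox" yields four tokens, and a words_limit of 2 keeps only "quick" (position 1) and "brown" (position 2).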
What is a ranking rule?
What is Criterion:
- Words
- Typo
- Attribute
- Proximity
- Exactness
- Sort
- Asc
- Desc
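One way to picture Criterion is as the parsed form of the ranking rule strings found in the index settings, such as "typo" or "price:asc". The sketch below is a simplified model under that assumption: the enum mirrors the list above, with Asc and Desc carrying a field name, and parse_criterion is a hypothetical helper, not the parser milli uses.

```rust
/// Simplified model of the criteria listed above.
#[derive(Debug, Clone, PartialEq)]
enum Criterion {
    Words,
    Typo,
    Attribute,
    Proximity,
    Exactness,
    Sort,
    Asc(String),
    Desc(String),
}

/// Hypothetical parser: "typo" -> Typo, "price:asc" -> Asc("price").
fn parse_criterion(text: &str) -> Option<Criterion> {
    match text.rsplit_once(':') {
        // A trailing ":asc" or ":desc" marks a custom ordering on a field.
        Some((field, "asc")) => Some(Criterion::Asc(field.to_string())),
        Some((field, "desc")) => Some(Criterion::Desc(field.to_string())),
        Some(_) => None,
        // Without a colon, only the built-in names are accepted.
        None => match text {
            "words" => Some(Criterion::Words),
            "typo" => Some(Criterion::Typo),
            "attribute" => Some(Criterion::Attribute),
            "proximity" => Some(Criterion::Proximity),
            "exactness" => Some(Criterion::Exactness),
            "sort" => Some(Criterion::Sort),
            _ => None,
        },
    }
}
```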
get_ranking_rules_for_query_graph_search: Returns the list of
initialized ranking rules to be used for a query graph search.
- ctx: &SearchContext
- sort_criteria: Option<Vec<AscDesc>>
- geo_strategy: geo_sort::Strategy
- terms_matching_strategy: TermsMatchingStrategy
It returns a list of BoxRankingRule<QueryGraph>.
From Query API to Milli#
The endpoint to search a specific index is: POST /indexes/<index_uid>/search (meilisearch doc).
The files that define the routes:
- meilisearch/src/routes/mod.rs: All the routes
- meilisearch/src/routes/indexes/mod.rs: All the index routes
- meilisearch/src/routes/indexes/search.rs
  - the configure function does the route configuration.
  - two functions, one for each of POST and GET:
    - search_with_url_query
    - search_with_post
The POST endpoint refers to the SearchQuery type. See below for that.
What is an index scheduler? It is in a separate crate.
What is a SearchAggregator? It seems related to analytics.
The important function is
perform_search(&index, query, search_kind, retrieve_vectors, index_scheduler.features()).
Both SearchQuery and perform_search are in the src/search/mod.rs module.
In perform_search:
- prepare_search: this function returns an interesting type: milli::Search.
- search_from_kind(search_kind, search): at this level it dispatches entirely to search.execute or search.execute_hybrid depending on search_kind.
- make_hits: I suspect this function takes the document_ids and hits and tries to highlight the important bits that match in the document fields.
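The dispatch step can be sketched as a plain match on the search kind. This is only a model of the control flow: the SearchKind variants, the Search struct and the string return values are assumptions made for the example, and the real types carry more data and return search results.

```rust
/// Hypothetical, reduced search kinds; the real enum carries more data.
enum SearchKind {
    KeywordOnly,
    Hybrid { semantic_ratio: f32 },
}

/// Hypothetical stand-in for `milli::Search`.
struct Search {
    query: String,
}

impl Search {
    /// Stand-in for the keyword search path.
    fn execute(&self) -> String {
        format!("execute({})", self.query)
    }

    /// Stand-in for the hybrid search path (assumed to take a ratio).
    fn execute_hybrid(&self, semantic_ratio: f32) -> String {
        format!("execute_hybrid({}, {})", self.query, semantic_ratio)
    }
}

/// Picks the execution path from the search kind and does nothing else.
fn search_from_kind(kind: SearchKind, search: &Search) -> String {
    match kind {
        SearchKind::KeywordOnly => search.execute(),
        SearchKind::Hybrid { semantic_ratio } => search.execute_hybrid(semantic_ratio),
    }
}
```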
So the important point above seems to be milli::Search. Both the
struct and its impl are defined in milli/src/search/mod.rs.
Let’s focus on execute(...):
- create a search context.
- filter the universe.
- important: execute_search(...)
- located_query_terms -> matching_words
For execute_search, see above.
Transactions#
The Index impl has the following functions related to transactions:
- read_txn(&self) and static_read_txn(&self)
- write_txn(&self)
Those functions are simple wrappers around env: heed::Env.
We can thus say that everything related to transactions, sessions and concurrency is dealt with at the LMDB level.
A good starting point is the LMDB documentation.
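To get an intuition for what read and write transactions provide, here is a toy model written with the standard library only. It is not the heed or LMDB API: ToyEnv, ReadTxn and WriteTxn are hypothetical types. It only illustrates two properties of LMDB transactions: a reader sees a stable snapshot, and at most one write transaction is open at a time, with its changes becoming visible on commit.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex, MutexGuard};

type Data = BTreeMap<String, String>;

/// Toy environment: the committed state is an immutable, shared snapshot.
struct ToyEnv {
    committed: Mutex<Arc<Data>>,
    writer: Mutex<()>, // held for the whole life of a write transaction
}

/// A read transaction is a frozen snapshot of the committed state.
struct ReadTxn {
    snapshot: Arc<Data>,
}

/// A write transaction works on a private copy until `commit`.
struct WriteTxn<'e> {
    env: &'e ToyEnv,
    pending: Data,
    _guard: MutexGuard<'e, ()>,
}

impl ToyEnv {
    fn new() -> Self {
        ToyEnv { committed: Mutex::new(Arc::new(Data::new())), writer: Mutex::new(()) }
    }

    fn read_txn(&self) -> ReadTxn {
        let committed = self.committed.lock().unwrap();
        ReadTxn { snapshot: Arc::clone(&*committed) }
    }

    fn write_txn(&self) -> WriteTxn<'_> {
        // Blocks while another write transaction is open.
        let guard = self.writer.lock().unwrap();
        let pending: Data = {
            let committed = self.committed.lock().unwrap();
            (**committed).clone()
        };
        WriteTxn { env: self, pending, _guard: guard }
    }
}

impl ReadTxn {
    fn get(&self, key: &str) -> Option<&str> {
        self.snapshot.get(key).map(String::as_str)
    }
}

impl WriteTxn<'_> {
    fn put(&mut self, key: &str, value: &str) {
        self.pending.insert(key.to_string(), value.to_string());
    }

    /// Publishes the private copy as the new committed snapshot.
    fn commit(self) {
        let WriteTxn { env, pending, _guard } = self;
        *env.committed.lock().unwrap() = Arc::new(pending);
    }
}
```

A read transaction opened before commit keeps seeing the old state, while one opened afterwards sees the new value. Dropping a WriteTxn without calling commit discards its changes.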