meilisearch#
Warning
This page is not ready.
It’s waiting for structuration and better wording.
TODO list#
There is a lot of files to read and understand. My current priority is to understand MatcherBuilder
and how it is called.
meilisearch/src/search/mod.rs
milli/src/search/new/matches/mod.rs
definition ofMatcherBuilder
.MatchingWords
?
I would also need to document the notion of transaction. I think it is related to heed
,
and thus to LMDB
.
Currently documenting (WIP)#
TODO
documents:execute_search
MatcherBuilder
- fn execute_search#
execute a query
ctx &mut SearchContext
query: Option<&str>,
terms_matching_strategy: TermsMatchingStrategy
scoring_strategy: ScoringStrategy
exhaustive_number_hits: bool
mut universe: RoaringBitmap,
sort_criteria: &Option<Vec<AscDesc>>
distinct: &Option<String>
geo_strategy: geo_sort::Strategy
from: usize
length: usize
words_limit: Option<usize>
placeholder_search_logger: &mut dyn SearchLogger<PlaceholderQuery>
query_graph_logger: &mut dyn SearchLogger<QueryGraph>
time_budget: TimeBudget
ranking_score_threshold: Option<f64>
locales: Option<&Vec<Language>>
Hints:
To set
universe
to all documents,index.documents_ids(rtxn)
. If filtering is needed, usefiltered_universe
.-logging:
DefaultSearchLogger
placeholder_search_logger
?query_graph_logger
?
- struct SearchContext#
index: &Index
txn:: &RoTxn
db_cache: DatabaseCache
word_interner: DedupInterner<String>
phrase_interner: DedpupInterner<Phrase>
term_interner: Interner<QueryTerm>
phrase_docids: PhraseDocIdsCache
restricted_fids: Option<RestrictedFids>
- trait SearchLogger#
type generics:
Q: RankingRuleQueryTrait
initial_query
: Logs the initial queryinitial_universe
: logs the value of the initial set of all candidatesquery_for_initial_universe
: Logs the query that was used to compute the set of all candidatesranking_rules
: Logs the ranking rules used to perform the search querystart_iteration_ranking_rule
: Logs the start of a ranking rule’s iterationnext_bucket_ranking_rule
: Logs the end of the computation of a ranking rule bucketskip_bucket_ranking_rule
: Logs the skipping of a ranking rule bucketend_iteration_ranking_rule
: Logs the end of a ranking rule’s iterationadd_to_results
: Logs the addition of document ids to the final results.log_internal_state
: Logs an internal state in the search algorithm
The empty struct
DefaultSearchLogger
implements this trait with empty empty implementations.
RankingRuleQueryTrait
is a discrete trait implemented only by specific struct:
PlaceHolderQuery
QueryGraph
QueryNodeData
enumTerm(LocatedQueryTermSubset)
is a regular node representing a word or combination of words.Deleted
represents a node that was deleted.Start
unique, represents the start of the query.End
unique represents the end of a query.
QueryNode
data: QueryNodeData
predecessors: SmallBitmap<QueryNode>
successors: SmallBitmap<QueryNode>
QueryGraph
root_node: Interned<QueryNode>
: The index of the start withinself.nodes
.end_node:: Interned<QueryNode>
: The index of the end node withinself.nodes
.nodes: FixedSizeInterner<QueryNode>
: The list of all query nodes.
QueryGraph::from_query(ctx: &SearchContext, terms: &[LocatedQueryTerm])
To create
terms
you have to do before:tokbuilder = TokenizerBuilder::new()
configure
tokbuilder
:add the stop words
add separators
set up the words dict
add locales with
allow_list
create the tokenizer
tokenizer
invoke
tokenizer.tokenize(query)
use
located_query_terms_from_tokens(ctx, tokens, words_limit)
What is a ranking rule ?
What is Criterion
:
Words
Typo
Attribute
Proximity
Exactness
Sort
Asc
Desc
get_ranking_rules_for_query_graph_search
: Return the list of
initialized ranking rules to be used for a query graph search.
ctx: &SearchContext
sort_criteria: Option<Vec<AscDesc>>
geo_strategy: geo_strategy::Strategy
terms_matching_strategy: TermsMatchingStrategy
it returns a BoxRankingRule<QueryGraph>
.
From Query API to Milli#
the endpoint to search a specific index is: POST /indexes/<index_uuid>/search
(meilisearch doc).
The different files that defines the routes itself:
meilisearch/src/routes/mod.rs
: All the routesmeilisearch/src/routes/indexes/mod.rs
: All the index routesmeilisearch/src/routes/indexes/search.rs
theconfigure
function does the route configuration.two functions for both
POST
andGET
.search_with_url_query
search_with_post
The POST
endpoint refers to SearchQuery
type. See later for that.
What is an index scheduler ? it is in a separate crate.
What is a SearchAggregator
? it seems related to analytics.
The important function is
perform_search(&index, query, search_kind, retrieve_vectors, index_scheduler.features())
.
Both SearchQuery
and perform_search
are in the src/search/mod.rs
module.
In the perform_search
:
prepare_search
this function returns an interesting type:milli::Search
.search_from_kind(search_kind, search)
at this level it entierely dispatch tosearch.execute
orsearch.execute_hybrid
depeding onsearch_kind
.make_hits
I suspect this function will takes the document_ids and hits and tries to highlight the important bits that match in the document fields.
So the important point above seems to be milli::Search
. Both
struct and impl is defined in milli/src/search/mod.rs
.
Let’s focus on execute(...)
:
create a search context.
filter the universe.
important
execute_search(...)
located_query_terms
->matching_words
for execute_search
see above.
Transactions#
Index impl
has the following functions related to transactions:
read_txn(&self)
andstatic_read_txn(&self)
write_txn(&self)
Those functions are simple wrappers around env: heed::Env
.
We thus can say everything related to transactions/session/concurrency is dealt at LMDB level.
A good starting point is the LMDB documentation.