meilisearch#

Warning

This page is not ready.

It still needs structuring and better wording.

TODO list#

There are a lot of files to read and understand. My current priority is to understand MatcherBuilder and how it is called.

  • meilisearch/src/search/mod.rs

  • milli/src/search/new/matches/mod.rs: definition of MatcherBuilder.

  • MatchingWords?

I would also need to document the notion of transaction. I think it is related to heed, and thus to LMDB.

Currently documenting (WIP)#

  • TODO documents:

    • execute_search

    • MatcherBuilder

execute_search: executes a query. Its parameters (a signature sketch follows this list):

  • ctx: &mut SearchContext

  • query: Option<&str>

  • terms_matching_strategy: TermsMatchingStrategy

  • scoring_strategy: ScoringStrategy

  • exhaustive_number_hits: bool

  • mut universe: RoaringBitmap

  • sort_criteria: &Option<Vec<AscDesc>>

  • distinct: &Option<String>

  • geo_strategy: geo_sort::Strategy

  • from: usize

  • length: usize

  • words_limit: Option<usize>

  • placeholder_search_logger: &mut dyn SearchLogger<PlaceholderQuery>

  • query_graph_logger: &mut dyn SearchLogger<QueryGraph>

  • time_budget: TimeBudget

  • ranking_score_threshold: Option<f64>

  • locales: Option<&Vec<Language>>
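
Putting the parameters above together, the signature looks roughly like this. This is a sketch reconstructed from the list above; the return type (PartialSearchResult) and the body are assumptions, not taken from these notes.

```rust
// Sketch only: parameter names/types come from the list above;
// the return type and the body are assumptions.
pub fn execute_search(
    ctx: &mut SearchContext,
    query: Option<&str>,
    terms_matching_strategy: TermsMatchingStrategy,
    scoring_strategy: ScoringStrategy,
    exhaustive_number_hits: bool,
    mut universe: RoaringBitmap,
    sort_criteria: &Option<Vec<AscDesc>>,
    distinct: &Option<String>,
    geo_strategy: geo_sort::Strategy,
    from: usize,
    length: usize,
    words_limit: Option<usize>,
    placeholder_search_logger: &mut dyn SearchLogger<PlaceholderQuery>,
    query_graph_logger: &mut dyn SearchLogger<QueryGraph>,
    time_budget: TimeBudget,
    ranking_score_threshold: Option<f64>,
    locales: Option<&Vec<Language>>,
) -> Result<PartialSearchResult> /* return type assumed */ {
    todo!()
}
```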

Hints:

  • To set the universe to all documents, use index.documents_ids(rtxn). If filtering is needed, use filtered_universe instead (see the sketch after this list).

  • logging:

    • DefaultSearchLogger

    • placeholder_search_logger?

    • query_graph_logger?
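
A minimal sketch of the universe hint above, assuming an open read transaction rtxn and a hypothetical filters local; filtered_universe's exact signature is not in these notes, so its arguments are elided.

```rust
// Sketch: build the universe of candidate document ids.
let universe: RoaringBitmap = if filters.is_some() {
    // restrict the candidates to documents matching the filter
    filtered_universe(/* index or ctx, &rtxn, &filters, ... */)? // arguments assumed
} else {
    // all internal document ids of the index
    index.documents_ids(&rtxn)?
};
```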

struct SearchContext#
  • index: &Index

  • txn: &RoTxn

  • db_cache: DatabaseCache

  • word_interner: DedupInterner<String>

  • phrase_interner: DedupInterner<Phrase>

  • term_interner: Interner<QueryTerm>

  • phrase_docids: PhraseDocIdsCache

  • restricted_fids: Option<RestrictedFids>
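
Putting the fields above together, the struct presumably looks roughly like this. Lifetimes and visibility are assumptions; any lifetime parameters on the field types are omitted.

```rust
// Sketch reconstructed from the field list above; lifetimes are assumptions.
pub struct SearchContext<'ctx> {
    pub index: &'ctx Index,
    pub txn: &'ctx RoTxn<'ctx>,
    pub db_cache: DatabaseCache,
    pub word_interner: DedupInterner<String>,
    pub phrase_interner: DedupInterner<Phrase>,
    pub term_interner: Interner<QueryTerm>,
    pub phrase_docids: PhraseDocIdsCache,
    pub restricted_fids: Option<RestrictedFids>,
}
```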

trait SearchLogger#

Generic over Q: RankingRuleQueryTrait.

  • initial_query: Logs the initial query

  • initial_universe: Logs the value of the initial set of all candidates

  • query_for_initial_universe: Logs the query that was used to compute the set of all candidates

  • ranking_rules: Logs the ranking rules used to perform the search query

  • start_iteration_ranking_rule: Logs the start of a ranking rule’s iteration

  • next_bucket_ranking_rule: Logs the end of the computation of a ranking rule bucket

  • skip_bucket_ranking_rule: Logs the skipping of a ranking rule bucket

  • end_iteration_ranking_rule: Logs the end of a ranking rule’s iteration

  • add_to_results: Logs the addition of document ids to the final results.

  • log_internal_state: Logs an internal state in the search algorithm

The empty struct DefaultSearchLogger implements this trait with empty (no-op) implementations.
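
A minimal sketch of that no-op logger; only the two method names shown come from the list above, and their parameter types are assumptions.

```rust
// Sketch: the no-op logger simply ignores everything it is given.
pub struct DefaultSearchLogger;

impl<Q: RankingRuleQueryTrait> SearchLogger<Q> for DefaultSearchLogger {
    fn initial_query(&mut self, _query: &Q) {}                   // parameter type assumed
    fn initial_universe(&mut self, _universe: &RoaringBitmap) {} // parameter type assumed
    // ... every other method of the trait gets an empty body as well
}
```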

RankingRuleQueryTrait is a closed trait, implemented only by two specific structs (sketches of the QueryGraph types and the tokenization steps follow this list):

  • PlaceholderQuery

  • QueryGraph

    • QueryNodeData enum

      • Term(LocatedQueryTermSubset) is a regular node representing a word or combination of words.

      • Deleted represents a node that was deleted.

      • Start: unique node representing the start of the query.

      • End: unique node representing the end of the query.

    • QueryNode

      • data: QueryNodeData

      • predecessors: SmallBitmap<QueryNode>

      • successors: SmallBitmap<QueryNode>

    • QueryGraph

      • root_node: Interned<QueryNode>: The index of the start node within self.nodes.

      • end_node: Interned<QueryNode>: The index of the end node within self.nodes.

      • nodes: FixedSizeInterner<QueryNode>: The list of all query nodes.

  • QueryGraph::from_query(ctx: &SearchContext, terms: &[LocatedQueryTerm])

    • To create terms, you first have to (see the sketch after these steps):

      1. tokbuilder = TokenizerBuilder::new()

      2. configure tokbuilder:

        • add the stop words

        • add separators

        • set up the words dict

      3. add locales with allow_list

      4. create the tokenizer

      5. invoke tokenizer.tokenize(query)

      6. use located_query_terms_from_tokens(ctx, tokens, words_limit)
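
The QueryGraph-related types above, gathered into one sketch. Field names and types come from the bullets; visibility, derives, and the doc comments are assumptions.

```rust
// Sketch reconstructed from the bullets above.
pub enum QueryNodeData {
    /// Regular node: a word or a combination of words.
    Term(LocatedQueryTermSubset),
    /// A node that was deleted from the graph.
    Deleted,
    /// Unique node representing the start of the query.
    Start,
    /// Unique node representing the end of the query.
    End,
}

pub struct QueryNode {
    pub data: QueryNodeData,
    pub predecessors: SmallBitmap<QueryNode>,
    pub successors: SmallBitmap<QueryNode>,
}

pub struct QueryGraph {
    /// Index of the start node within `nodes`.
    pub root_node: Interned<QueryNode>,
    /// Index of the end node within `nodes`.
    pub end_node: Interned<QueryNode>,
    /// All query nodes.
    pub nodes: FixedSizeInterner<QueryNode>,
}
```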
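
And a sketch of the tokenization steps themselves. Here stop_words, separators, dictionary, query, and words_limit are hypothetical locals (assumed to be fetched from the index settings and the search request beforehand); the exact charabia builder signatures and the return shape of QueryGraph::from_query are also assumptions.

```rust
// Sketch of steps 1-6 above plus the final QueryGraph::from_query call.
let mut tokbuilder = TokenizerBuilder::new(); // 1.
tokbuilder.stop_words(&stop_words);           // 2. stop words
tokbuilder.separators(&separators);           //    separators
tokbuilder.words_dict(&dictionary);           //    words dictionary
if let Some(locales) = locales {
    tokbuilder.allow_list(locales);           // 3. restrict language detection
}
let tokenizer = tokbuilder.build();           // 4.

let tokens = tokenizer.tokenize(query);       // 5.
let terms = located_query_terms_from_tokens(ctx, tokens, words_limit)?; // 6.

// Finally, build the query graph from the located terms.
let (graph, _) = QueryGraph::from_query(ctx, &terms)?; // return shape assumed
```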

What is a ranking rule?

What is Criterion? Its variants (a sketch follows this list):

  • Words

  • Typo

  • Attribute

  • Proximity

  • Exactness

  • Sort

  • Asc

  • Desc
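
A sketch of the Criterion enum from the variants listed above; whether Asc and Desc carry a payload (a field name, here a String) is an assumption.

```rust
// Sketch: the set of criteria listed above.
pub enum Criterion {
    Words,
    Typo,
    Attribute,
    Proximity,
    Exactness,
    Sort,
    Asc(String),  // ascending sort on a field (payload assumed)
    Desc(String), // descending sort on a field (payload assumed)
}
```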

get_ranking_rules_for_query_graph_search: Returns the list of initialized ranking rules to be used for a query graph search. Its parameters:

  • ctx: &SearchContext

  • sort_criteria: Option<Vec<AscDesc>>

  • geo_strategy: geo_sort::Strategy

  • terms_matching_strategy: TermsMatchingStrategy

It returns a list (Vec) of BoxRankingRule<QueryGraph>.
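
A signature sketch reconstructed from the parameters and return value above; the Result wrapper and the missing lifetimes are assumptions.

```rust
// Sketch only: parameters mirror the list above.
pub fn get_ranking_rules_for_query_graph_search(
    ctx: &SearchContext,
    sort_criteria: Option<Vec<AscDesc>>,
    geo_strategy: geo_sort::Strategy,
    terms_matching_strategy: TermsMatchingStrategy,
) -> Result<Vec<BoxRankingRule<QueryGraph>>> /* Result wrapper assumed */ {
    todo!()
}
```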

impl Index#
fn documents_ids#

Returns the set of internal document ids as a RoaringBitmap.

From Query API to Milli#

The endpoint to search a specific index is POST /indexes/<index_uid>/search (see the Meilisearch docs).

The files that define the routes:

  • meilisearch/src/routes/mod.rs: All the routes

  • meilisearch/src/routes/indexes/mod.rs: All the index routes

  • meilisearch/src/routes/indexes/search.rs: the configure function does the route configuration.

  • Two handler functions, one for GET and one for POST (see the sketch after this list):

    • search_with_url_query

    • search_with_post
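
A sketch of what that configure function presumably looks like with actix-web; only the two handler names come from these notes, and the resource path, guards, and extractors are assumptions.

```rust
use actix_web::web;

// Sketch: register the GET and POST handlers on the search resource.
pub fn configure(cfg: &mut web::ServiceConfig) {
    cfg.service(
        web::resource("") // actual path/prefix assumed to come from the parent scope
            .route(web::get().to(search_with_url_query))
            .route(web::post().to(search_with_post)),
    );
}
```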

The POST endpoint deserializes into the SearchQuery type; see below.

What is an index scheduler? It lives in a separate crate.

What is a SearchAggregator? It seems related to analytics.

The important function is perform_search(&index, query, search_kind, retrieve_vectors, index_scheduler.features()).

Both SearchQuery and perform_search live in the meilisearch/src/search/mod.rs module.

Inside perform_search (a flow sketch follows this list):

  • prepare_search: this function returns an interesting type, milli::Search.

  • search_from_kind(search_kind, search): at this level it simply dispatches to search.execute or search.execute_hybrid depending on search_kind.

  • make_hits: I suspect this function takes the document ids, builds the hits, and highlights the matching parts of the document fields.
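
A very rough flow sketch of perform_search based on the three steps above; only the function names and their roles come from these notes, so all argument lists are elided or hypothetical.

```rust
// Flow sketch only; not the real signatures.
let rtxn = index.read_txn()?;

// 1. Build a milli::Search from the HTTP-level SearchQuery.
let search = prepare_search(/* &index, &rtxn, &query, &search_kind, ... */);

// 2. Dispatch to search.execute() or search.execute_hybrid() depending on search_kind.
let search_result = search_from_kind(search_kind, search);

// 3. Turn the returned document ids into hits, highlighting the matching
//    parts of the document fields.
let hits = make_hits(/* &index, &rtxn, &query, search_result, ... */);
```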

So the important piece above seems to be milli::Search. Both the struct and its impl are defined in milli/src/search/mod.rs.

Let’s focus on execute(...):

  • create a search context.

  • filter the universe.

  • the important call: execute_search(...)

  • located_query_terms -> matching_words

For execute_search, see above.

Transactions#

The Index impl has the following functions related to transactions:

  • read_txn(&self) and static_read_txn(&self)

  • write_txn(&self)

These functions are thin wrappers around the env: heed::Env field.
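
A small usage sketch of those wrappers, following standard heed/LMDB semantics:

```rust
// Read transaction: a consistent snapshot of the index.
let rtxn = index.read_txn()?;
let ids = index.documents_ids(&rtxn)?; // read under the snapshot
drop(rtxn);

// Write transaction: LMDB allows a single writer at a time;
// changes only become visible to readers once committed.
let mut wtxn = index.write_txn()?;
// ... perform writes through the index using &mut wtxn ...
wtxn.commit()?;
```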

Everything related to transactions, sessions, and concurrency is thus handled at the LMDB level.

A good starting point is the LMDB documentation.