Log for tique/src/topterms.rs
-
Update tantivy to 0.14 by Caio 5 years ago
-
Ensure we don't seek back when reading postings 💬 by Caio 5 years ago
This works around the debug-only crash when trying to seek to a doc_id that would have appeared before the current SegmentPostings cursor. Since DocSets now come already initialized, `.seek()`ing without checking if we're already at the desired position is likely a bug (hence the `debug_assert!` biting me, I suspect). On tantivy.git @ 730cceff one can see the subtle bug that the assertion can catch.

Code that once looked like:

    let mut scorer = create_scorer();
    if scorer.seek(doc) != doc { ... }

Should now look like:

    let mut scorer = create_scorer();
    if scorer.doc() > doc || scorer.seek(doc) != doc { ... }
-
Use new DocSet/Scorer API 💬 by Caio 5 years ago
Introduced on tantivy.git @ e25284ba. This changeset is sufficient; however, upstream's f71b04acb introduced a `debug_assert!(self.doc() <= target)` for `SegmentPostings::seek` that looks overzealous to me. In release mode all tests pass, but given that a lot has changed since I last looked, I'll be double-checking the affected functionality prior to letting this go wild.
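To make the assertion concrete, here is a minimal sketch of a forward-only cursor that carries the same kind of `debug_assert!`. `Cursor`, `TERMINATED` and `contains` are hypothetical names for illustration only; this is a toy over a sorted `Vec<u32>`, not tantivy's `SegmentPostings`. The `contains` helper checks the current position before seeking, so the assertion can never fire.

```rust
/// Sentinel returned once the cursor is exhausted (an assumption of this toy).
const TERMINATED: u32 = u32::MAX;

/// Hypothetical stand-in for a forward-only postings cursor.
/// Like the DocSets described above, it starts positioned on its first doc.
struct Cursor {
    docs: Vec<u32>, // sorted, ascending doc ids
    pos: usize,
}

impl Cursor {
    fn new(docs: Vec<u32>) -> Self {
        Cursor { docs, pos: 0 }
    }

    /// Current doc id, or `TERMINATED` when there is nothing left.
    fn doc(&self) -> u32 {
        self.docs.get(self.pos).copied().unwrap_or(TERMINATED)
    }

    /// Advances to the first doc id >= `target` and returns it.
    /// Seeking to a doc before the current one trips the assertion
    /// in debug builds only.
    fn seek(&mut self, target: u32) -> u32 {
        debug_assert!(self.doc() <= target, "seeking backwards");
        while self.doc() < target {
            self.pos += 1;
        }
        self.doc()
    }
}

/// True when `doc` is present, without ever seeking backwards:
/// if the cursor is already past `doc`, `seek` is not called at all.
fn contains(cursor: &mut Cursor, doc: u32) -> bool {
    !(cursor.doc() > doc || cursor.seek(doc) != doc)
}
```

With a cursor over `[3, 7, 10]`, asking for doc 1 returns `false` without calling `seek`, because the cursor already sits on 3.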
-
No need to consume the keyword acceptor by Caio 5 years ago
-
Use a more concise iteration style by Caio 5 years ago
-
Replace `map -> unwrap_or` with `map_or` by Caio 5 years ago
-
Document known error conditions by Caio 5 years ago
-
Add missing docs to exported things 💬 by Caio 5 years ago
And enable `missing_docs` and `missing_doc_code_examples`
-
Do not execute assertion-less doc code examples by Caio 5 years ago
-
Make code examples slightly easier to manage by Caio 6 years ago
-
Expose Keywords::{clone,len,is_empty}() by Caio 6 years ago
-
Support for conversion into weighted queries by Caio 6 years ago
-
Upgrade to tantivy 0.12 by Caio 6 years ago
-
Allow iterating over sorted (by relevance) Terms 💬 by Caio 6 years ago
Knowing the ordered sequence of most relevant terms is very useful and `limit` is unlikely to be a number which makes the `into_sorted_vec` step prohibitive, so this patch simply makes Keywords hold a sorted Vec instead of a BinaryHeap.
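A rough sketch of the idea, assuming integer scores for simplicity: a bounded min-heap selects the top `limit` items, and the sort happens once at construction so every later read is already in relevance order. The struct layout and the `new`/`terms` methods here are illustrative guesses, not the actual `Keywords` definition in `tique`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical container holding (score, term) pairs, most relevant first.
struct Keywords(Vec<(u32, String)>);

impl Keywords {
    /// Keeps the `limit` highest-scored terms from `scored`.
    fn new(scored: impl IntoIterator<Item = (u32, String)>, limit: usize) -> Self {
        // Min-heap via `Reverse`: the weakest kept term sits at the top.
        let mut heap = BinaryHeap::new();
        for item in scored {
            heap.push(Reverse(item));
            if heap.len() > limit {
                heap.pop(); // evict the lowest-scored term
            }
        }
        // Sort once here rather than on every read. Ascending order of
        // `Reverse<T>` is descending order of `T`.
        let sorted: Vec<(u32, String)> = heap
            .into_sorted_vec()
            .into_iter()
            .map(|Reverse(item)| item)
            .collect();
        Keywords(sorted)
    }

    /// Iterates over the terms, most relevant first.
    fn terms(&self) -> impl Iterator<Item = &str> + '_ {
        self.0.iter().map(|(_, term)| term.as_str())
    }
}
```

Since `limit` is small, the one-off `into_sorted_vec` call is cheap, and callers get ordered iteration without consuming the container.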
-
Document `tique::topterms` by Caio 6 years ago
-
Ensure fields are `text` with frequencies by Caio 6 years ago
-
Swap `visit(score, doc)` with `visit(doc, score)` 💬 by Caio 6 years ago
Aha! I made it backwards to make it easier to output consistently. The consistency part makes sense, but driving a container with score before the item being contained was too confusing.
-
Initial TopTerms implementation 💬 by Caio 6 years ago
TopTerms reads the index and extracts the most relevant terms in a given document or any arbitrary text input. You can use it to build keywords for your documents or, more interestingly, use the result as a query to find similar documents.

It's pretty much a reimplementation of Lucene's MoreLikeThis. I don't particularly like this approach in prod (too many knobs, dependency on the index to formulate a query), but it yields pretty good results with little effort.

Ref: https://lucene.apache.org/core/8_4_1/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html
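For a feel of what "most relevant terms" means here, the sketch below ranks the terms of an input text by term frequency times a smoothed inverse document frequency. It is only an illustration of the general tf-idf approach: `top_terms`, the `doc_freq` map and `num_docs` are hypothetical stand-ins for statistics that the real code reads from the index, the tokenizer is a naive whitespace split, and the exact formula used by `tique` or Lucene may differ.

```rust
use std::collections::HashMap;

/// Scores each distinct term of `text` by tf * idf and returns the `limit`
/// best ones, most relevant first.
fn top_terms(
    text: &str,
    doc_freq: &HashMap<&str, u64>, // term -> number of docs containing it
    num_docs: u64,                 // total number of docs in the "index"
    limit: usize,
) -> Vec<(String, f32)> {
    // Term frequency within the input text.
    let mut tf: HashMap<String, u32> = HashMap::new();
    for token in text.split_whitespace() {
        *tf.entry(token.to_lowercase()).or_insert(0) += 1;
    }

    let mut scored: Vec<(String, f32)> = tf
        .into_iter()
        .filter_map(|(term, freq)| {
            // Terms the index has never seen cannot match anything; skip them.
            let df = *doc_freq.get(term.as_str())?;
            // Smoothed idf: rare terms get a high weight, common ones a low one.
            let idf = (num_docs as f32 / (df as f32 + 1.0)).ln() + 1.0;
            Some((term, freq as f32 * idf))
        })
        .collect();

    // Highest score first; ties broken alphabetically for determinism.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored.truncate(limit);
    scored
}
```

The dependency on index statistics is visible in the signature: without `doc_freq` and `num_docs` there is no way to tell a rare, informative term from a common one.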