Log
-
[maven-release-plugin] prepare release 0.7.0 by Caio 7 years ago
-
Bump to gula-bom 0.1.0 by Caio 7 years ago
-
Merge branch 'sdb' by Caio 7 years ago
-
Remove Util's tempDir on exit too by Caio 7 years ago
-
Cleanup temporary directories after tests 💬 by Caio 7 years ago
This patch uses JUnit's `@TempDir` to replace usages of `Files.createTempDirectory(Path)` where possible. There's still one unmanaged directory in the Util class, for which I'll have to add exit handlers manually if I care enough.
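For illustration only, an exit handler for an unmanaged temporary directory could look roughly like the sketch below. It uses only the JDK; `TempDirCleanupSketch`, `deleteRecursively` and `createTempDirDeletedOnExit` are hypothetical names and not code from this repository.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

public class TempDirCleanupSketch {

    // Deletes a directory tree. Reverse path order visits children before
    // their parents, so every directory is empty by the time it is deleted.
    static void deleteRecursively(Path root) throws IOException {
        if (Files.notExists(root)) {
            return;
        }
        try (var paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Creates a temporary directory and registers a JVM shutdown hook that
    // removes it (best effort) when the process exits normally.
    static Path createTempDirDeletedOnExit(String prefix) throws IOException {
        Path dir = Files.createTempDirectory(prefix);
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                deleteRecursively(dir);
            } catch (IOException | UncheckedIOException e) {
                // Nothing useful can be done this late; leave the directory behind.
            }
        }));
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path dir = createTempDirDeletedOnExit("cleanup-sketch");
        Files.writeString(dir.resolve("scratch.txt"), "temporary data");
        System.out.println("Created " + dir + ", it will be removed on exit");
    }
}
```

Shutdown hooks do not run if the JVM is killed forcibly, so this only covers normal exits.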
-
Drop RecipeMetadataDatabase.findAllById(List<Long>) 💬 by Caio 7 years ago
No point in supporting this at the moment.
-
SDB: Add some basic tests by Caio 7 years ago
-
Organize the new code a little bit 💬 by Caio 7 years ago
Wrapping up for today, I'll pick the tests up next round.
-
Remove ChronicleMap mention from README by Caio 7 years ago
-
Drop ChronicleMap 💬 by Caio 7 years ago
o/
-
WIP: Simple database to replace ChronicleMap 💬 by Caio 7 years ago
The main reason I'm using ChronicleMap is that I wanted an easy Map interface with persistence and off-heap memory. Getting rid of ChronicleMap reduces complexity, increases performance (though the gain is negligible: db get()s are far from being the bottleneck) and allows me to push JDK 12 forward.

This patch implements a working alternative which can be simplified to a flat file with one recipe serialized (as a flatbuffer) after another. In order to speed up loading I also write to an extra file which contains the total number of recipes and the recipe_id-to-offset associations. The offset lookup table is kept on the heap, backed by HPPC's primitive collections.

Very little validation is done and the code is totally susceptible to bad input attacks, but I have full control of it, so :shrug:

A trivial benchmark such as:

```
import java.nio.file.Path;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

public class MyBenchmark {

  @State(Scope.Benchmark)
  public static class MyState {
    RecipeMetadataDatabase chronicle;
    RecipeMetadataDatabase sdb;
    long[] ids =
        new long[] {289492, 707192, 1061982, 1708006, 1659287, 1653257, 901573, 1557621, 1639379};

    public MyState() {
      // Database locations are passed in as system properties
      var cerberusPath = System.getProperty("cerberus");
      var sdbPath = System.getProperty("sdb");
      this.chronicle = ChronicleRecipeMetadataDatabase.open(Path.of(cerberusPath));
      this.sdb = new SimpleRecipeMetadataDatabase(Path.of(sdbPath));
    }
  }

  public MyBenchmark() {}

  // Looks up every id and verifies that the right recipe came back
  private void check(RecipeMetadataDatabase db, long[] ids) {
    for (long id : ids) {
      var recipe = db.findById(id);
      assert recipe.isPresent();
      if (recipe.get().getRecipeId() != id) {
        throw new RuntimeException("oof!");
      }
    }
  }

  @Benchmark
  public void getChronicle(MyState state) {
    check(state.chronicle, state.ids);
  }

  @Benchmark
  public void getSdb(MyState state) {
    check(state.sdb, state.ids);
  }
}
```

Shows that at least things aren't broken. Yet.

> Benchmark                  Mode  Cnt        Score       Error  Units
> MyBenchmark.getChronicle  thrpt    5    62327.352 ±   214.437  ops/s
> MyBenchmark.getSdb        thrpt    5  2697423.234 ± 28008.573  ops/s
-
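For illustration, the layout described in the "WIP: Simple database to replace ChronicleMap" commit above (a data file with records written one after another, plus an index file holding the record count and the id-to-offset pairs) could be sketched roughly as follows. This is not the actual SimpleRecipeMetadataDatabase. `FlatFileStoreSketch` and its methods are hypothetical names, a plain `HashMap` stands in for HPPC, raw byte arrays stand in for flatbuffers, and the length prefix and memory-mapped read are assumptions of the sketch.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FlatFileStoreSketch {

    // Writes each record as [int length][payload] into dataFile. The index
    // file is [int count] followed by count x ([long id][long offset]).
    static void write(Path dataFile, Path indexFile, Map<Long, byte[]> records) throws IOException {
        try (var data = new DataOutputStream(Files.newOutputStream(dataFile));
             var index = new DataOutputStream(Files.newOutputStream(indexFile))) {
            index.writeInt(records.size());
            long offset = 0;
            for (var entry : records.entrySet()) {
                byte[] payload = entry.getValue();
                index.writeLong(entry.getKey());
                index.writeLong(offset);
                data.writeInt(payload.length);
                data.write(payload);
                offset += Integer.BYTES + payload.length;
            }
        }
    }

    private final Map<Long, Long> offsets = new HashMap<>();
    private final ByteBuffer data;

    // Loads the whole offset table onto the heap and memory-maps the data file.
    FlatFileStoreSketch(Path dataFile, Path indexFile) throws IOException {
        try (var index = new DataInputStream(Files.newInputStream(indexFile))) {
            int count = index.readInt();
            for (int i = 0; i < count; i++) {
                long id = index.readLong();
                long offset = index.readLong();
                offsets.put(id, offset);
            }
        }
        try (var channel = FileChannel.open(dataFile, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            data = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // Returns a copy of the payload stored for the id, if the id is known.
    Optional<byte[]> findById(long id) {
        Long offset = offsets.get(id);
        if (offset == null) {
            return Optional.empty();
        }
        int position = Math.toIntExact(offset);
        byte[] payload = new byte[data.getInt(position)];
        // duplicate() has its own position, so the shared buffer is never moved.
        ByteBuffer view = data.duplicate();
        view.position(position + Integer.BYTES);
        view.get(payload);
        return Optional.of(payload);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sdb-sketch");
        Path dataFile = dir.resolve("data.sdb");
        Path indexFile = dir.resolve("index.sdb");
        Map<Long, byte[]> records = new HashMap<>();
        records.put(289492L, "some recipe".getBytes(StandardCharsets.UTF_8));
        write(dataFile, indexFile, records);
        var store = new FlatFileStoreSketch(dataFile, indexFile);
        System.out.println(store.findById(289492L).isPresent());
    }
}
```

Like the code described in the commit, the sketch does next to no validation: a corrupt index or a truncated data file will simply blow up on read.
-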
[maven-release-plugin] prepare release 0.6.3 by Caio 7 years ago
-
Update to gula-bom 0.0.5 by Caio 7 years ago
-
[maven-release-plugin] prepare release 0.6.2 by Caio 7 years ago
-
Bump lucene to 8.1.0 by Caio 7 years ago
-
Bump chronicle-map to 3.17.2 by Caio 7 years ago
-
Mention metadata storage format by Caio 7 years ago
-
[maven-release-plugin] prepare release 0.6.1 by Caio 7 years ago
-
Search: Rewrite the query before using it 💬 by Caio 7 years ago
The Lucene query derived from SearchQuery is used for both count() and search(). Both calls trigger an `IndexSearcher.rewrite(Query)` call, so this patch reduces the duplicated work by rewriting the query before use. Note that this does not mean `rewrite(Query)` won't be called again, just that its subsequent calls will be cheaper.
-
[maven-release-plugin] prepare release 0.6.0 by Caio 7 years ago
-
SearchQuery: Expose a few helpful derived methods by Caio 7 years ago
-
Start allowing empty SearchQuery as an input 💬 by Caio 7 years ago
This patch doesn't do much by itself, but now, when using a searcher with a policy, one can inspect the parsed query, see that it was empty, and do whatever one wants (in my current case: rewrite the query as a MatchAllDocsQuery).
-
Introduce SearchPolicy.rewriteParsedSimilarityQuery 💬 by Caio 7 years ago
Works similarly to rewriteParsedFulltextQuery, but for similarity queries.
-
Remove maxDocFreq restriction from moreLikeThis 💬 by Caio 7 years ago
I'm forgoing these performance-related things in favor of search policy logic, so that direct cerberus usage doesn't need to be influenced by production performance tunings.
-
Remove moreLikeThis StopWords setup 💬 by Caio 7 years ago
This set is the same one used by the analyzer, so the tokens will never exist in the stream.