Log
-
Get rid of updateCentroid and addCentroid proxies π¬ by Caio 10 years ago
Less code is almost always better
-
Skip a binary search when updating centroid π¬ by Caio 10 years ago
Since updateCentroid() is called by picking from a byproduct of the current digest state, we can reuse the information we have to know which index to update without doing another search. Pure micro optimization, but it does improve things a bit at the cost of growing the centroid struct a bit: > $ go test -run XXX -bench . > PASS > BenchmarkAdd1-4 5000000 373 ns/op > BenchmarkAdd10-4 2000000 626 ns/op > BenchmarkAdd100-4 1000000 1579 ns/op > ok github.com/caio/go-tdigest 6.265s
-
Make summary.Add() handle duplicates by updating the centroid π¬ by Caio 10 years ago
No point spreading this logic around.
-
Update centroids in place π¬ by Caio 10 years ago
Unsurprisingly, avoid a bunch of copies is beneficial :-) > $ go test -run XXX -bench . > PASS > BenchmarkAdd1-4 5000000 353 ns/op > BenchmarkAdd10-4 2000000 674 ns/op > BenchmarkAdd100-4 1000000 1788 ns/op
-
Don't add a centroid twice by Caio 10 years ago
-
Rename the last `Percentile` reference to `Quantile` by Caio 10 years ago
-
Rename TDigest.Percentile to Quantile π¬ by Caio 10 years ago
Quantile is the correct name given that the range is [0,1]
-
Test extreme quantiles in TestTInternals by Caio 10 years ago
-
Benchmarks: Only record timings for tdigest.Add() by Caio 10 years ago
-
Remove yet another useless String() method π¬ by Caio 10 years ago
This time I didn't forget to run `go vet`...
-
Add gocover.io flair π¬ by Caio 10 years ago
Feels like a pokΓ©mon trainer going after all these badges
-
summary.String() was useful for a test error message... π¬ by Caio 10 years ago
Yeah, right. I need to get used to this `go vet` thing...
-
Get rid of summary.String() π¬ by Caio 10 years ago
Golang's default toString is more than enough now.
-
Get rid of centroid.Equals()/TestCentroid π¬ by Caio 10 years ago
Useless abstraction is useless
-
Rephrase the disclaimer a bit π¬ by Caio 10 years ago
I did, after all, spend some time making this thing work with acceptable performance.
-
Merge sortedSlice and Summary (i.e.: move away from interface{}) π¬ by Caio 10 years ago
A lot of tedious work, but this allowed me to drastically speed up the computeCentroidQuantile() which was the most expensive function by far. Current timings are looking beter than ever: > $ go test -run XXX -bench . > PASS > BenchmarkAdd1-4 3000000 399 ns/op > BenchmarkAdd10-4 2000000 1048 ns/op > BenchmarkAdd100-4 1000000 2588 ns/op > ok github.com/caio/go-tdigest 7.334s
-
Make serialization compatible with the java version π¬ by Caio 10 years ago
This patch adjusts how deltas are encoded so it behaves exactly the same as the reference implementation. A test is added to make sure this doesn't break in the future.
-
Extract estimateCapacity function π¬ by Caio 10 years ago
The ideal situation is that we _never_ have to grow the slice, 20 is pretty much a very high upper bound which should only happen for very low compression numbers or pathological insertions. I've used a very scienfic approach to come up with the number 10 as the capacity estimator multiplier (read: I put a panic() when the slice had to grow and replayed latency numbers from random places. Started with 5 and went up until it stopped panic()ing then added one. A.K.A: not scientific at all. ymmv).
-
Stabilize the uniform distribution π¬ by Caio 10 years ago
I'm not really confident this is a good idea, but for now I'm really tired of the random failures.
-
Fix complaints from `go vet` π¬ by Caio 10 years ago
Sigh.
-
Exploit the slice for ceilingAndFloorItems and successorAndPredecessorItems π¬ by Caio 10 years ago
Now we're talking: > go test -v ./... -run XXX -bench . > PASS > BenchmarkAdd1-4 3000000 431 ns/op > BenchmarkAdd10-4 1000000 1860 ns/op > BenchmarkAdd100-4 300000 8047 ns/op > ok github.com/caio/go-tdigest 6.151s
-
Use a sorted slice instead of a tree for centroids π¬ by Caio 10 years ago
Got massive improvements already simply by glueing the sortedSlice struct without exploiting its advantages (ceil/floor and surroundings are cheaper operations, relatively speaking) > $ go test -v ./... -run XXX -bench . > PASS > BenchmarkAdd1-4 3000000 511 ns/op > BenchmarkAdd10-4 1000000 4304 ns/op > BenchmarkAdd100-4 100000 22212 ns/op > ok github.com/caio/go-tdigest 8.875s To be done: * Adapt the code to properly use the data structure * Estimate properly the initial capacity * Maybe get rid of the `interface{}` abstraction if I care enough (so instead of TDigest.summary.tree use TDigest.tree directly, for example). This is probably not worth the effort, performance-wise. -
Adjust test naming/phrasing to the new api by Caio 10 years ago
-
Rename `Update()` to `Add()` π¬ by Caio 10 years ago
Trying to keep the interface closer to the reference implementation. Arguably it's a better name anyway since we're adding a new datapoint to the summary.
-
Use the result from Delete() to validate updateCentroid calls π¬ by Caio 10 years ago
Tiny optimization: skip one Find() call per update.
-
Brainfart: centroids can have value of zero by Caio 10 years ago
-
Make sure numCentroids value is sane π¬ by Damian Gryski 10 years ago
Fixes a crash if numCentroids was negative, and potential out-of-memory condition if we try to allocate to huge a digest. And 4 million centroids ought to be enough for anybody... Found by fuzzing.
-
Add a few more internals tests π¬ by Caio 10 years ago
Golfing test coverage is not really fun.
-
Add godoc.org flair to the README by Caio 10 years ago
-
Make the usage example copy and paste-able by Caio 10 years ago