Log for summary.go
-
Add TDigest.Reset by Nova 9 months ago
-
lint fixes by Caio 2 years ago
-
Merge remote-tracking branch 'upstream/master' into ian.uint64 by Ian Wilkes 6 years ago
-
Add TDigest.FromBytes and cleanup RNG interface by Vladimir Mihailenco 7 years ago
-
Add MergeDestructive by Vladimir Mihailenco 7 years ago
-
Use rand.Perm to shuffle data by Vladimir Mihailenco 7 years ago
-
Remove fenwick tree by Vladimir Mihailenco 7 years ago
-
Enable fenwick tree after some threshold by Vladimir Mihailenco 7 years ago
-
Change fenwick tree to accept uint32 to avoid extra copy by Vladimir Mihailenco 7 years ago
-
Use bigger fenwick tree so we don't have to rebuild it on every add by Vladimir Mihailenco 7 years ago
-
Use uint64 instead of uint32 to avoid overflow issues by Eben Freeman 7 years ago
-
Use a fenwick tree to optimize prefix sums 💬 by Caio 8 years ago
This patch makes summary use the excellent fenwick package from https://github.com/yourbasic/fenwick to dramatically speed up Add() calls while adding a bounded cost to memory. benchmark old ns/op new ns/op delta BenchmarkAdd1-4 187 206 +10.16% BenchmarkAdd10-4 332 274 -17.47% BenchmarkAdd100-4 1092 325 -70.24% The tree creation could be faster, but profiling has shown that it doesn't seem to be that much of a problem (that's because `summary.setAt()` happens a *lot* more than `summary.Add()`) -
Get rid of the centroid abstraction 💬 by Caio 8 years ago
This was only being used to pack {float64,uint32}, all the other functionality was skipped or became unused over time for performance reasons. Away it goes. -
Get rid of TDigest._{count,mean} aliases 💬 by Caio 8 years ago
This patch gets rid of the _count() and _mean() helpers and move them into the summary package. What I really wanted were macros, but hey ¯\_(ツ)_/¯ Summary is still leaky, but at least TDigest only uses its public interfaces now.
-
Rename `{}summary.keys` to `means` 💬 by Caio 8 years ago
$ gorename -from '"github.com/caio/go-tdigest".summary.keys' -to means
-
Completely rework the quantile estimation codepath 💬 by Caio 8 years ago
This patch is too big, but there isn't much getting away from it in smaller steps because summary{} and TDigest{} are actually tightly coupled (i.e.: the abstraction is mostly useful for code organization but fails at isolation). The major changes are: - Summaries now hold repeated items instead of just unique means and their respective counts (which led to changes in how the digest adds new centroids too) - Quantile estimation is now a straight port from the reference implementation (issue-84 branch) The digest now doesn't potentially report completely wrong values on distributions with multiple steep hills nor biases in favour of big centroids with few occurrences. This patch closes #12. Some historical details for the motivation for this work can be found on PR #11. -
Get rid of the unused Remove() method by Caio 10 years ago
-
Skip a binary search when updating centroid 💬 by Caio 10 years ago
Since updateCentroid() is called by picking from a byproduct of the current digest state, we can reuse the information we have to know which index to update without doing another search. Pure micro optimization, but it does improve things a bit at the cost of growing the centroid struct a bit: > $ go test -run XXX -bench . > PASS > BenchmarkAdd1-4 5000000 373 ns/op > BenchmarkAdd10-4 2000000 626 ns/op > BenchmarkAdd100-4 1000000 1579 ns/op > ok github.com/caio/go-tdigest 6.265s
-
Make summary.Add() handle duplicates by updating the centroid 💬 by Caio 10 years ago
No point spreading this logic around.
-
Update centroids in place 💬 by Caio 10 years ago
Unsurprisingly, avoid a bunch of copies is beneficial :-) > $ go test -run XXX -bench . > PASS > BenchmarkAdd1-4 5000000 353 ns/op > BenchmarkAdd10-4 2000000 674 ns/op > BenchmarkAdd100-4 1000000 1788 ns/op
-
Get rid of summary.String() 💬 by Caio 10 years ago
Golang's default toString is more than enough now.
-
Get rid of centroid.Equals()/TestCentroid 💬 by Caio 10 years ago
Useless abstraction is useless
-
Merge sortedSlice and Summary (i.e.: move away from interface{}) 💬 by Caio 10 years ago
A lot of tedious work, but this allowed me to drastically speed up the computeCentroidQuantile() which was the most expensive function by far. Current timings are looking beter than ever: > $ go test -run XXX -bench . > PASS > BenchmarkAdd1-4 3000000 399 ns/op > BenchmarkAdd10-4 2000000 1048 ns/op > BenchmarkAdd100-4 1000000 2588 ns/op > ok github.com/caio/go-tdigest 7.334s
-
Exploit the slice for ceilingAndFloorItems and successorAndPredecessorItems 💬 by Caio 10 years ago
Now we're talking: > go test -v ./... -run XXX -bench . > PASS > BenchmarkAdd1-4 3000000 431 ns/op > BenchmarkAdd10-4 1000000 1860 ns/op > BenchmarkAdd100-4 300000 8047 ns/op > ok github.com/caio/go-tdigest 6.151s
-
Use a sorted slice instead of a tree for centroids 💬 by Caio 10 years ago
Got massive improvements already simply by glueing the sortedSlice struct without exploiting its advantages (ceil/floor and surroundings are cheaper operations, relatively speaking) > $ go test -v ./... -run XXX -bench . > PASS > BenchmarkAdd1-4 3000000 511 ns/op > BenchmarkAdd10-4 1000000 4304 ns/op > BenchmarkAdd100-4 100000 22212 ns/op > ok github.com/caio/go-tdigest 8.875s To be done: * Adapt the code to properly use the data structure * Estimate properly the initial capacity * Maybe get rid of the `interface{}` abstraction if I care enough (so instead of TDigest.summary.tree use TDigest.tree directly, for example). This is probably not worth the effort, performance-wise. -
Stop copying/creating centroids all the time 💬 by Caio 10 years ago
This patch makes all internal calls handle *centroid instead of the struct directy. With some crappy profiling in osx, a lot of time was spent allocating and moving memory around so this seemed like a very easy improvement to make. > $ go test -v ./... -bench . -run XXX > PASS > BenchmarkUpdate1-4 1000000 1364 ns/op > BenchmarkUpdate10-4 200000 11540 ns/op > BenchmarkUpdate100-4 30000 58091 ns/op > ok github.com/caio/go-tdigest 6.026s This is roughly an unscientific 300% improvement in the compression=100 case (which is what should generally be used, according to the original paper) - Ref: c6047295a555a07c80a203c9b64e20a423cb6fed.
-
Move everything over to a closure-based iteration model 💬 by Damian Gryski 10 years ago
Cherry-pick dgryski/no-goroutines and resolve the conflicts
-
Unexpose the Summary struct 💬 by Caio 10 years ago
gorename <3
-
Unexpose the Centroid struct (rename it to centroid) by Caio 10 years ago