Log for tdigest_test.go
-
Add TDigest.Clone and TDigest.Compression by Vladimir Mihailenco 7 years ago
-
Add benchmark by Vladimir Mihailenco 7 years ago
-
Add TrimmedMean by Vladimir Mihailenco 7 years ago
-
fix cdf for values near the last centroid by Jeff Wendling 7 years ago
-
Properly compute extreme CDFs 💬 by Caio 7 years ago
When we reach the last centroid in the summary, what we want to do is assume that the last two centroids are of equal width then estimate the CDF via a simple interpolation. Before this patch we would wrongly bound (and compute) this estimation with the last item instead of the one before the last. Fixes #17
-
Report allocations on benchmarks by Caio 8 years ago
-
Tidy things up [gometalinter] by Caio 8 years ago
-
Add tests for a heavily skewed gamma distribution 💬 by Caio 8 years ago
The code is mostly borrowed from TDigestTest.java and could easily be modified to allow testing with multiple distributions/ranges. Closes #13.
Note that this patch adds a new test-only dependency but we don't use any form of dependency management - this will come in a subsequent patch.
-
Add new CDF(float64) public method by Caio 8 years ago
-
Remove TDigest.Len() from the public interface 💬 by Caio 8 years ago
I can't think of a scenario where one would really care about how many distinct centroids are in the digest, so away this goes on a major release. Adding it back in case it's needed won't require a major release.
-
Introduce TDigest.Count() 💬 by Caio 8 years ago
Expose the count of samples publicly so that users can more easily decide what to do when the digest has too many samples.
-
Get rid of the centroid abstraction 💬 by Caio 8 years ago
This was only being used to pack {float64,uint32}, all the other functionality was skipped or became unused over time for performance reasons. Away it goes.
-
Make Add take only one parameter, introduce AddWeighted 💬 by Caio 8 years ago
This patch renames the previous Add(float64,uint32) to AddWeighted and introduces a method Add(float64) which is simply an alias to AddWeighted(float64,1).
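
The Add/AddWeighted split described above can be sketched as follows. This is a minimal illustration, not the library's code: the `digest` type and its fields are hypothetical stand-ins that only track totals, and the zero-weight check is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// digest is a hypothetical stand-in for TDigest; it only tracks totals.
type digest struct {
	count uint64
	sum   float64
}

// AddWeighted registers value as having occurred weight times.
// Rejecting a zero weight is an assumption made for this sketch.
func (d *digest) AddWeighted(value float64, weight uint32) error {
	if weight == 0 {
		return errors.New("weight must be positive")
	}
	d.count += uint64(weight)
	d.sum += value * float64(weight)
	return nil
}

// Add is shorthand for AddWeighted(value, 1).
func (d *digest) Add(value float64) error {
	return d.AddWeighted(value, 1)
}

func main() {
	d := &digest{}
	_ = d.Add(10)            // one sample with value 10
	_ = d.AddWeighted(20, 3) // three samples with value 20
	fmt.Println(d.count, d.sum)
}
```

Keeping the weighted form as the single implementation means the one-argument method cannot drift from it.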
-
Make New() return an error instead of panic()ing 💬 by Caio 8 years ago
This patch now makes New() return a (*TDigest,error) tuple, which makes deserialization safe without having to trap for panic()s. The only remaining panic() is for bad input in a public function (`Quantile(float64)`). I'm keen on keeping it.
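
A constructor that reports bad configuration through an error instead of panicking might look like the sketch below. Everything here is hypothetical: `digest`, `option`, `compression` and `newDigest` are illustrative names, and the default of 100 and the `>= 1` bound are assumptions rather than the library's actual values.

```go
package main

import (
	"errors"
	"fmt"
)

// digest is a hypothetical stand-in for TDigest.
type digest struct{ compression float64 }

// option is a self-referential configuration function that may fail.
type option func(*digest) error

// compression returns an option that sets the compression factor.
// The lower bound of 1 is an assumption made for this sketch.
func compression(c float64) option {
	return func(d *digest) error {
		if c < 1 {
			return errors.New("compression must be >= 1")
		}
		d.compression = c
		return nil
	}
}

// newDigest applies each option in order and returns the first error.
// The default compression of 100 is an assumption.
func newDigest(opts ...option) (*digest, error) {
	d := &digest{compression: 100}
	for _, opt := range opts {
		if err := opt(d); err != nil {
			return nil, err
		}
	}
	return d, nil
}

func main() {
	if _, err := newDigest(compression(-5)); err != nil {
		fmt.Println("rejected:", err)
	}
	if d, err := newDigest(compression(200)); err == nil {
		fmt.Println("compression:", d.compression)
	}
}
```

A caller that deserializes untrusted data can then handle the returned error directly rather than wrapping the call in `recover()`.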
-
Introduce a parameter-less New() 💬 by Caio 8 years ago
Now `tdigest.New()` gives a sane ready-to-use-in-most-cases digest. Configuration should be done via self referential functions. Ex:
    // create a digest with compression of 200
    tdigest.New(tdigest.Compression(200))
Notice that New() can still panic, which means that deserialization is still more dangerous than it should be.
-
Completely rework the quantile estimation codepath 💬 by Caio 8 years ago
This patch is too big, but there isn't much getting away from it in smaller steps because summary{} and TDigest{} are actually tightly coupled (i.e.: the abstraction is mostly useful for code organization but fails at isolation).
The major changes are:
- Summaries now hold repeated items instead of just unique means and their respective counts (which led to changes in how the digest adds new centroids too)
- Quantile estimation is now a straight port from the reference implementation (issue-84 branch)
The digest now doesn't potentially report completely wrong values on distributions with multiple steep hills nor biases in favour of big centroids with few occurrences.
This patch closes #12. Some historical details for the motivation for this work can be found on PR #11.
-
Make TestMerge more thorough 💬 by Caio 8 years ago
This patch makes the test use multiple partitioning configurations and a lot more items (so that we can partition and merge more). This in turn means that the whole test is significantly slower (still very tolerable though) but helps uncover subtle drifts where precision is lower.
-
Output more details when TestRespectBounds fails by Caio 8 years ago
-
Add TestSingletonInACrowd 💬 by Caio 8 years ago
Notice that this test has its extreme quantile (0.999) skipped because the java reference implementation behaves the same. Reasoning and more details on issue #12
-
Drop TestNonUniformDistribution 💬 by Caio 8 years ago
I know it's weird to drop failing tests, but bear with me: This test was added on 011e706e and has worked nicely since then, however I'm having a hard time accepting the error thresholds for this manually crafted distribution. Given that it fails with the java implementation as well (AVL- and Merging-based versions), I'm considering it bugged.
I'll revisit this one day, but it looks to me like it would be more productive to actually test distributions and their relative errors in a standard manner as in TDigestTest.java
    --- FAIL: TestNonUniformDistribution (0.00s)
        tdigest_test.go:85: T-Digest.Quantile(0.2500) = 420.1363 vs actual 499.9531. Diff (79.8168) >= 11.0000
-
Ensure returned values stay within bounds by Christine Yen 8 years ago
-
gometalinter: Error checking on tests 💬 by Caio 9 years ago
With this patch we now explicitly ignore most errors coming from Add() (as there are specialized tests for it and the consistency checks cover the error scenario) and introduce a check for potential Merge() and Compress() errors. Not really happy with the mass ignore, but I do like the `errcheck` linter, so there's that.
-
gometalinter: Simplify if-return clause by Caio 9 years ago
-
No more `t.Parallel()` anywhere by Caio 9 years ago
-
Stabilize the rng to the Sequential Insertion test 💬 by Caio 9 years ago
Before this patch I would see the occasional failure when running:
    $ go test -run Sequential -count 30
The failures are consistently _not_ on the extreme quantiles, as expected given the data-structure characteristics.
I guess this is because `Add()` and `shuffle()` (which is used by `Compress()`, that gets triggered several times for this test given the small summary size) use the same rng source, but I'm not even remotely close to someone who properly understands all this math.
Shouldn't be a real problem, but I'll consider adding support for providing a random source instead of using the default one so that at least we get no contention from multiple things asking for a random number to the global source.
-
Assert we don't change counts during Compress() 💬 by Caio 9 years ago
So that we avoid causing a regression (Ref: PR #10)
-
ForEachCentroid should not return value, since this makes assumptions about the behavior of the supplied function. by Andrew Gillis 9 years ago
-
Provide ForEachCentroid() and Len() functions to access internal data. by Andrew Gillis 9 years ago
-
Add tests for the panic() codepaths 💬 by Caio 10 years ago
Pretty much useless, but I was curious how to test for that. A bit awkward at first though it kind of makes sense.
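
One common way to exercise a panic() codepath in Go is a deferred `recover()`. The sketch below is an illustration only: `quantile` is a hypothetical stand-in for a function that panics on out-of-range input, and `panics` is a hypothetical helper, not part of the test file.

```go
package main

import "fmt"

// quantile is a hypothetical stand-in that panics on out-of-range input.
func quantile(q float64) float64 {
	if q < 0 || q > 1 {
		panic("q must be between 0 and 1 (inclusive)")
	}
	return q
}

// panics reports whether calling f panics. The deferred function runs
// even when f panics, and recover() stops the panic from propagating.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(panics(func() { quantile(1.5) }))
	fmt.Println(panics(func() { quantile(0.5) }))
}
```

In a real test the helper's result would be checked with `t.Errorf` rather than printed.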