And this is where the sheer number of ToRs, players, and possible trigger events is relevant. When the sample is a "staggeringly low" percentage of the total occurrences, the possibility of it being an anomaly rather than an indicator of anything grows. Over the 28 weeks of the data under discussion, we're talking about 3684 possible trigger events (encounters cleared). During that time, let's assume an average of 8 encounters cleared per ToR per week. Some will do none, some will do 64, but I'll take a low estimate of 8. With 255,000 total ToRs, that's over 57,000,000 possible trigger events. So the data set you're basing your viewpoint on is about 6/1000 of 1% of the possible trigger events.
There are a few problems with your logic here.
1 - Binomial distributions describe processes without regard to any theoretical cap on the number of trials. The math that describes the distribution of outcomes of rolling a die is the same whether the die could be rolled 10,000 times, 1,000,000 times, or infinitely. The only thing that changes with additional rolls is that the distribution tightens as n grows. Your 57 million is trivial in this regard.
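To make that concrete, here is a minimal sketch (assuming the 30% advertised rate discussed below) of how the standard error of a sample proportion shrinks with n alone; the total number of possible events never enters the formula:

```python
import math

# Standard error of a sample proportion: sqrt(p*(1-p)/n).
# It depends only on n, never on how many total events exist.
p = 0.30  # assumed advertised trigger rate (the 30% figure used below)

for n in (100, 1_000, 3_684, 1_000_000):
    se = math.sqrt(p * (1 - p) / n)
    print(f"n={n:>9,}  standard error={se:.4f}")
```

At n = 3684 the standard error is already under 0.008, whether the "population" of possible events is 57 million or infinite.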
2 - 3684 trigger events against an engine are 3684 events regardless of whether they are gathered over a day, week, month, or year (assuming constant fixed probabilities). As such, to put them in context of “how many other people triggered the engine in that time” is useless.
3 - You seem to believe a certain percentage of the total population has to be sampled before the sample is relevant. This isn't an assembly line with complicating factors that impact performance; it is a simple fixed-probability machine that either hits or does not hit. The distributions for these types of probabilities are predictable and normalize incredibly quickly.
4 - By your logic, if the data had been 0 for 3684, it still would not be relevant, just “bad luck”.
5 - Your notion of validation via statistics would not even be practical in the real world. Do you think a casino is going to wait until it reaches 1 million samples before it investigates a game? Do you think pollsters are tracking down 10 million people for every survey? That every medical study has 500,000 people in it?
I'll tell you what, though: let's, for the sake of argument, pretend the total number of triggers matters. Jump on your favorite browser and search for "sample size calculator". Find one that lets you set the "population proportion" or "response distribution" so that you can test against 30% and not the 50% default. Enter the data and parameters we know or want (99% confidence, 3% margin of error, 30% proportion/distribution, and whatever population size you want; maybe 1 billion?). Then simply share which one you used, your parameters if they differ from the above, and the recommended sample size it reports.
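For reference, those calculators implement Cochran's sample-size formula with a finite population correction. A sketch using the parameters above (z = 2.576 for 99% confidence):

```python
import math

# Cochran's sample-size formula with finite population correction,
# the same math behind online "sample size calculators".
def sample_size(z, margin, proportion, population):
    n0 = z**2 * proportion * (1 - proportion) / margin**2
    # Finite population correction: barely matters for large populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# 99% confidence (z = 2.576), 3% margin, 30% proportion, 1B population.
print(sample_size(2.576, 0.03, 0.30, 1_000_000_000))
```

With these inputs the recommended sample comes out around 1,549, well under the 3,684 events already in hand, and the 1-billion population figure barely moves the number.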
Effectively, you are taking 3684 events picked at random out of 57,000,000 events and saying that they indicate a disturbing pattern, when they actually do nothing of the sort.
That is literally what validating probabilities is about. Taking a random sample and seeing how it aligns to the advertised rate. See above for why the 57 million number is irrelevant.
Is there a flaw in the ToR algorithm? I don't know and neither do you. And this data doesn't provide any rational reason to believe one way or the other. So call for an investigation to your heart's content; it won't happen based on this.
You are right, neither of us knows. But the data says we should be 99+% confident there is something wrong.
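As a sketch of where a "99+% confident" figure comes from: a normal-approximation test of an observed hit rate against the advertised one. The thread's actual hit count isn't quoted in this exchange, so `observed_hits` below is a hypothetical placeholder:

```python
import math

# Normal-approximation z-test for a proportion: how far is the
# observed rate from the advertised rate, in standard errors?
# NOTE: observed_hits is a hypothetical placeholder; the thread's
# actual count is not quoted here.
def z_score(observed_hits, n, advertised_p):
    p_hat = observed_hits / n
    se = math.sqrt(advertised_p * (1 - advertised_p) / n)
    return (p_hat - advertised_p) / se

n = 3684
z = z_score(1000, n, 0.30)  # hypothetical: 1000 hits in 3684 trials
# |z| > 2.576 corresponds to 99% two-sided confidence
print(f"z = {z:.2f}, exceeds 2.576: {abs(z) > 2.576}")
```

With a deviation of roughly three percentage points over 3684 trials, the z-score clears the 99% threshold comfortably; that is the shape of the calculation behind the confidence claim.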