
Temple of Relics Rewards Nerfed?

Status
Not open for further replies.

Agent327

FOE Team
Forum Moderator
I agree with Daniel. Every time a subject like this comes up, there are two sides discussing the maths and how you are supposed to interpret and use the stats. Those two sides will never agree, so basically the discussion is always the same, and while your motives might be right, I seriously doubt anyone other than the two sides cares anymore once the discussion has started.

To top it off, every time these discussions start they become personal, with snide remarks that are borderline insults, which in the end will lead to another locked thread.
 
Sure it's relevant. You're saying that you think there's a flaw in all the ToRs based on these two. If they were the only two, you might be right. But the more there are, the higher the probability that some will perform outside your narrow expectations. So the fact that there are over 255,000 ToRs is relevant. Two ToRs is about one thousandth of one percent (.00001) of all ToRs. You can't draw any conclusions based on that.
I don’t know what to tell you other than that this is patently untrue and a clear misunderstanding of how probabilities and distribution models work. We are not testing specific ToRs; we are testing an RNG engine that produces events against a published table of probabilities. The distribution of outcomes for any static probability will always look like a cone tipped over on its side with its tip pointed right. Outcomes at very low sample sizes are chaotic and create the wide base of the cone. As the sample size grows, the outcomes normalize (quite rapidly, actually) towards the mean, and thus the cone shrinks in width until we reach an insanely high number of samples, at which point the tip just runs very near the mean indefinitely. The whole point of probability-based statistics is to describe data/outcomes as they relate to these well-established models. For any fixed-probability game/study/experiment that has reached the 1500-sample mark with a 27% success rate, that cone has narrowed to about 6% wide (+/- 3%), and every sample beyond 1500 continues to narrow the cone. The farther you get from 1500 while remaining at 27%, the less likely it is that the true mean is 30% (and mind you, 30% was already at the outermost reaches of confidence). That is not to say it is impossible for data to be outside of this expected shape, but when it is we can say the probability of that occurring. And when those odds are staggeringly low, it may indeed be an indicator that something is not working as expected.
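For anyone who wants to check the "cone" width themselves, here is a minimal sketch using the usual normal approximation for a sample proportion. It assumes the advertised rate is 30% and a 99% confidence level (z = 2.576); the function name and the list of sample sizes are just illustrative choices, not anything taken from the game.

```python
import math

def margin_of_error(p, n, z=2.576):
    """Half-width of the normal-approximation interval for a sample proportion.

    p: assumed true success rate (0.30 is the advertised rate discussed here)
    n: number of trials
    z: z-score for the confidence level (2.576 ~ 99%, an assumption)
    """
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes; 1500 and 3684 are the figures mentioned in the thread.
for n in (100, 500, 1500, 3684):
    print(f"n={n:5d}  +/- {margin_of_error(0.30, n):.2%}")
```

Under these assumptions the half-width at n = 1500 comes out at roughly 3%, and it keeps shrinking in proportion to 1/sqrt(n) as more trials are added.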

Anyone is free to use the binomial probability calculator of their choice to see the statistical chance of the data provided. Set A has a cumulative probability P(X<=x) of .002765. Set B has a cumulative probability P(X<=x) of .001654. The combined probability of this data is roughly 1 in 220,000. Not impossible, but it sure seems like the kind of long odds that would warrant additional scrutiny.
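If you would rather not trust an online calculator, the same cumulative probability can be computed in a few lines. This is a rough sketch, not the calculator used above: the trial and success counts in the example call are hypothetical placeholders (the actual Set A and Set B counts are not repeated in this post), and the "combined" figure simply multiplies the two quoted probabilities, which treats the two sets as independent.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total

# HYPOTHETICAL counts for illustration only: 405 hits in 1500 tries at a 30% rate.
print(binom_cdf(405, 1500, 0.30))

# Combining the two probabilities quoted in the post (assumes independent sets).
combined = 0.002765 * 0.001654
print(f"combined: {combined:.3e}  (about 1 in {round(1 / combined):,})")
```

Plug in your own counts for k and n to reproduce the per-set figures with whatever data you have.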

For any issues you have with "my" narrow definition, you would need to go have a chat with the long-dead Bernoulli. All probability-based statistics derive from or continue his work. The theorems and models dictate the definition, not the people using them.
 

Johnny B. Goode

Well-Known Member
That is not to say it is impossible for data to be outside of this expected shape, but when it is we can say the probability of that occurring. And when those odds are staggeringly low,
And this is where the sheer number of ToRs, players, and possible trigger events is relevant. When the sample is a "staggeringly low" percentage of the total occurrences, the possibility of it being an anomaly rather than an indicator of anything grows. Over the 28 weeks of the data under discussion, we're talking about 3684 possible trigger events (encounters cleared). During that time, let's assume an average of 8 encounters cleared per ToR per week. Some will do none, some will do 64, but I'll take a low estimate of 8. With 255,000 total ToRs, that's over 57,000,000 possible trigger events. So the data set you're basing your viewpoint on is about 6/1000 of 1% of the possible trigger events.
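The arithmetic behind that estimate can be checked in a few lines. This sketch assumes the 8 encounters are per ToR per week, which is the reading that yields the 57 million figure; the variable names are only illustrative.

```python
TOTAL_TORS = 255_000              # total ToRs quoted in the thread
ENCOUNTERS_PER_TOR_PER_WEEK = 8   # assumed average from the post, read as "per week"
WEEKS = 28                        # length of the data set under discussion
SAMPLE_EVENTS = 3684              # trigger events in the player's data

total_events = TOTAL_TORS * ENCOUNTERS_PER_TOR_PER_WEEK * WEEKS
share = SAMPLE_EVENTS / total_events

print(f"estimated total trigger events: {total_events:,}")   # 57,120,000
print(f"sample share of all events:     {share:.4%}")
```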

Now, let's go back to something else you said.
The probabilities experienced by any single player do not impact any other player.
The exact same statement is true of each single possible trigger event. Just like a roll of the die, each one is individual and not affected by previous or subsequent rolls. Or rolls with other dice or rolls by other people. Same here. You have to take these results in relation to the total possible trigger events. That is not what you are doing. Effectively, you are taking 3684 events picked at random out of 57,000,000 events and saying that they indicate a disturbing pattern, when they actually do nothing of the sort. All it indicates is that this player had slightly less than average luck over 28 weeks.

Is there a flaw in the ToR algorithm? I don't know and neither do you. And this data doesn't provide any rational reason to believe one way or the other. So call for an investigation to your heart's content; it won't happen based on this.
 
And this is where the sheer number of ToRs, players, and possible trigger events is relevant. When the sample is a "staggeringly low" percentage of the total occurrences, the possibility of it being an anomaly rather than an indicator of anything grows. Over the 28 weeks of the data under discussion, we're talking about 3684 possible trigger events (encounters cleared). During that time, let's assume an average of 8 encounters cleared per ToR per week. Some will do none, some will do 64, but I'll take a low estimate of 8. With 255,000 total ToRs, that's over 57,000,000 possible trigger events. So the data set you're basing your viewpoint on is about 6/1000 of 1% of the possible trigger events.
There are a few problems with your logic here.
1 - Binomial distributions describe data that are theoretically infinite in scale. The math that describes the distribution of outcomes of rolling a die does so without regard to whether the die could be rolled 10,000 times, 1,000,000 times, or infinitely. The only thing that changes with additional rolls is that the distribution tightens as n approaches infinity. Your 57 million is trivial in this regard.
2 - 3684 trigger events against an engine are 3684 events regardless of whether they are gathered over a day, week, month, or year (assuming constant fixed probabilities). As such, to put them in context of “how many other people triggered the engine in that time” is useless.
3 - You seem to believe there is a certain percentage of the total population that has to be tested in order for the sample to be relevant. This isn’t an assembly line with complicating factors that impact performance; it is a simple fixed-probability machine that either hits or does not hit. The distributions for these types of probabilities are predictable and normalize incredibly quickly.
4 - By your logic, if the data had been 0 for 3684, it still would not be relevant, just “bad luck”.
5 - Your idea of validation via statistics would not even be practical in the real world. Do you think a casino is just going to wait until it reaches 1 million samples before it investigates a game? Do you think pollsters are tracking down 10 million people for every survey? That every medical study has 500,000 people in it?

I’ll tell you what though: let’s, for the sake of argument, pretend the total number of triggers matters. Open your favorite browser and search for “sample size calculator”. Find one that lets you set the “population proportion” or “response distribution” so that you can test against 30% and not the 50% default. Enter the data and parameters we know or want (99% confidence, 3% margin of error, 30% proportion/distribution, and whatever population size you want). Maybe 1 billion? Then simply share which one you used, your parameters if they differ from the above, and the recommended sample size you got.
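For reference, most of those calculators appear to use some form of Cochran's formula with a finite population correction. The sketch below is my assumption about what they compute, not the code of any particular site; z = 2.576 for 99% confidence is likewise an assumption, and individual calculators may round slightly differently.

```python
import math

def sample_size(population, p=0.30, margin=0.03, z=2.576):
    """Recommended sample size via Cochran's formula with finite population correction.

    population: total number of possible events (N)
    p:          expected proportion (30% here, not the 50% default)
    margin:     desired margin of error (3%)
    z:          z-score for the confidence level (2.576 ~ 99%, an assumption)
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)          # finite population correction
    return math.ceil(n)

for population in (10_000, 1_000_000, 57_000_000, 1_000_000_000):
    print(f"N={population:>13,}  recommended sample: {sample_size(population):,}")
```

Under these assumptions the recommended sample barely moves once the population is in the millions, which is why the population size field hardly matters for numbers this large.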

Effectively, you are taking 3684 events picked at random out of 57,000,000 events and saying that they indicate a disturbing pattern, when they actually do nothing of the sort.
That is literally what validating probabilities is about. Taking a random sample and seeing how it aligns to the advertised rate. See above for why the 57 million number is irrelevant.

Is there a flaw in the ToR algorithm? I don't know and neither do you. And this data doesn't provide any rational reason to believe one way or the other. So call for an investigation to your heart's content; it won't happen based on this.
You are right, neither of us knows. But the data says we should be 99+% confident there is something wrong.
 