3blue1brown

Which rating is better, mathematically speaking?

Added 2020-03-13 19:53:39 +0000 UTC

Hey everyone,

Here is the first of three installments for the next probability topic. Imagine you're shopping online, and see three different sellers of the same product with the following ratings:

- 100%, with 10 reviews
- 96%, with 50 reviews
- 93%, with 200 reviews.

Who should you buy from?

There are many different ways you could go about answering this, but for just about any of them, it's helpful to know about the beta distribution, which is the target of these videos. It's a nice topic because it hits many different important points along the way, binomials, pdfs, Bayesian updating in a continuous setting, etc.
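As one of the "many different ways" to compare the three sellers, here is a minimal sketch of Laplace's rule of succession, which comes up in the comments below: pretend each seller received two more reviews, one positive and one negative, and read off the adjusted fraction. The positive-review counts (10, 48, 186) are inferred from the percentages above, and the function name is only illustrative. This assumes a uniform prior on each seller's true success rate, so it is not the only reasonable answer.

```python
from fractions import Fraction


def rule_of_succession(positive: int, total: int) -> Fraction:
    """Adjusted rating after imagining one extra positive and one extra negative review."""
    return Fraction(positive + 1, total + 2)


# (positive reviews, total reviews), inferred from the percentages in the post
sellers = {
    "100% of 10": (10, 10),
    "96% of 50": (48, 50),
    "93% of 200": (186, 200),
}

estimates = {name: rule_of_succession(k, n) for name, (k, n) in sellers.items()}

for name, estimate in estimates.items():
    print(f"{name}: {float(estimate):.3f}")

best = max(estimates, key=estimates.get)
print("Highest adjusted rating:", best)
```

Under this particular rule the 96% seller comes out ahead (49/52, about 0.942), with the 93% seller at 187/202 and the 100% seller at 11/12.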

Let me know what you think, and if you catch any errors.

---

In other news, the "Exponential growth and Epidemics" video has really set a lot of new records for the channel. Views aren't what matters most, but on almost any metric of engagement, it seems it really struck a chord with people. One interesting metric here is the rate at which translated subtitles come in, which gives some sense of how much people not only want to share it but want to invest their own time to help it become sharable. On that front, it was truly through the roof.

Currently, I'm thinking that once this probability project is wrapped up, another video covering the more accurate models used for epidemics (like SIR) might be in order.

This was also the first time I've ever experienced seeing a video get demonetized. Luckily this doesn't really matter for 3b1b. Thanks to this Patreon support, those sorts of vicissitudes don't have much bearing on the choices for what content to cover, but it does make me a little more grateful for that fact.

-Grant


Comments

Aside from the fact that I wouldn't start with a non-informative prior (like the uniform distribution) but with a prior distribution somehow estimated from the large number of resellers already rated on Amazon (since sellers usually at least try to satisfy customers), have you considered starting with the Jeffreys prior (density proportional to the square root of the Fisher information), beta(0.5, 0.5), instead? I've often thought about which prior beta(u, u) for 0 <= u <= 1 is the "right" prior for the binomial distribution and couldn't come to a final conclusion. But usually, when I have to pick one, I go with the Jeffreys prior, though I couldn't say exactly why.

2020-04-30 21:56:09 +0000 UTC
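To make the prior question in the comment above concrete, here is a small sketch comparing the posterior mean under a symmetric beta(u, u) prior for a few values of u. It relies on the standard conjugacy fact that a beta(a, b) prior with k successes in n trials gives a beta(a + k, b + n - k) posterior, whose mean is (a + k) / (a + b + n). The function name and the choice of the 10-out-of-10 seller are illustrative assumptions, and u is required to be positive so the prior is a valid distribution.

```python
def posterior_mean(k: int, n: int, u: float) -> float:
    """Posterior mean of the success rate under a beta(u, u) prior, after k successes in n trials."""
    if u <= 0:
        raise ValueError("u must be positive for beta(u, u) to be a proper prior")
    return (k + u) / (n + 2 * u)


k, n = 10, 10  # the "100%, with 10 reviews" seller from the post

uniform = posterior_mean(k, n, 1.0)   # beta(1, 1): the uniform prior
jeffreys = posterior_mean(k, n, 0.5)  # beta(0.5, 0.5): the Jeffreys prior

print(f"uniform prior:  {uniform:.4f}")
print(f"Jeffreys prior: {jeffreys:.4f}")
```

Smaller u pulls the estimate less strongly toward 1/2, so the Jeffreys prior leaves the 10-for-10 seller at 10.5/11 rather than 11/12. With many reviews the choice of u matters much less.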

Hey Grant! I was thinking about the percent of positive over total covid tests the other day, and wondering if that data could somehow be used to estimate the true number of positives in a given region. Just now I realized that question actually bears a surprising resemblance to the problem in this video! Now, I see some differences that complicate the analysis, such as the fact that the number of available tests and the measured proportion of positives are not independent (since more probable cases are tested when you have fewer testing supplies). But I thought it was still interesting!

Stephen Woo

2020-03-29 12:08:17 +0000 UTC

Hi Grant. I usually don't need to pause your videos to search for additional info, but this time (probably because I'm tired) I had to go look up more on Laplace's "Rule of Succession". If there is still time, would you consider giving a further explanation of why exactly we need to add two more reviews for the correct way of reading a rating? It was a bit confusing for me; even with all the examples given afterward, I couldn't visualize why we add 2 more as opposed to any other number. Maybe a few seconds to demonstrate the "sunrise problem" would be helpful for those in doubt like me? Thank you again for all of your talent and dedication for a better education! You're amazing! <3

2020-03-15 23:42:43 +0000 UTC
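On the "why two?" question above, here is a short derivation sketch. It assumes a uniform prior on the true success rate s (that assumption is where the 2 comes from) and uses the standard integral \int_0^1 s^a (1-s)^b \, ds = a! \, b! / (a+b+1)!. Here k is the number of positive reviews out of n.

```latex
% Posterior after k positives in n reviews, uniform prior on s:
%   p(s \mid \text{data}) \propto s^{k} (1-s)^{n-k}
%
% Probability that the next review is positive = posterior mean of s:
\begin{align*}
P(\text{next positive} \mid \text{data})
  &= \frac{\int_0^1 s \cdot s^{k}(1-s)^{n-k}\, ds}{\int_0^1 s^{k}(1-s)^{n-k}\, ds} \\
  &= \frac{(k+1)!\,(n-k)! \,/\, (n+2)!}{k!\,(n-k)! \,/\, (n+1)!} \\
  &= \frac{k+1}{n+2}.
\end{align*}
```

The result looks exactly like the raw fraction with one extra positive and one extra negative review added, which is the "imagine two more reviews" shortcut. A different prior would give a different number of imagined reviews.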

I love how your videos are always so relevant!

Programmable Spacecraft

2020-03-15 21:11:05 +0000 UTC

I haven't finished watching this video - but I have always had a question about how to deal with stranger cases - for example some products have reviews that look like two binomials - a group that love it - a group that hate it... other products have almost flat distributions... How to include these other non binomial like cases?

2020-03-15 19:50:12 +0000 UTC

Love the idea of doing power analysis with Bayes rule. Wait why did the coronavirus video get demonetized..?

jpchen

2020-03-15 18:32:05 +0000 UTC

Grant - something on SIR/SIS would be good - I found the first part of this post really good (I felt the second part on Knowledge was good but a separate thing). https://www.meltingasphalt.com/interactive/going-critical/

2020-03-15 13:32:32 +0000 UTC

If you can get data about the time of the reviews, then you could run a logistic regression where the parameter s changes over time. You would be able to detect a sharp shift. That would tell you when the shift happened and that there was a shift, but it wouldn't tell you why per se.

2020-03-15 12:30:28 +0000 UTC

The term "proper prior" technically just means a valid probability distribution.

2020-03-15 12:14:46 +0000 UTC

I think just giving factorial formula of the Choose function at least in passing would be beneficial, because that is the question you would get a lot I would think.

Timur Sultanov

2020-03-15 01:48:34 +0000 UTC

Love it. Looking forward to the next two. Very closely related to this article I wrote on estimating the bias of a coin... https://heliosphan.org/estimating-biased-coin.html I also used numerical estimation to get an early peek at the correct results, which helps sanity check the maths as well I feel, and give more confidence in the maths.

2020-03-14 22:08:06 +0000 UTC

@Vasilis +1. As schools are (temporarily) closing, it would be nice to have recommendations for no-nonsense math/science textbooks for homeschooling.

Edith Dubiner

2020-03-14 18:01:31 +0000 UTC

not sure, but it could be the exponential&logistic video was demonetized because of the Corona crisis. Google might have decided not to put ads on content related to the pandemic? It could become a breadwinner again once the crisis is over.

Edith Dubiner

2020-03-14 17:51:37 +0000 UTC

Sure, I’ll make that change.

3blue1brown

2020-03-14 17:09:15 +0000 UTC

Also I would love to see a list of your favourite mathematical textbooks!

2020-03-14 13:10:08 +0000 UTC

Really Nice! I don't like probability and statistics so much but here we are, you are changing that!

2020-03-14 13:09:01 +0000 UTC

A million cars... Yeah, what would the proper prior be there?

2020-03-14 11:29:52 +0000 UTC

I am currently learning survival analysis for my day job - may also be an interesting topic in the context of pandemic videos, if you want to make this an ongoing theme (plus, I would very much enjoy having a concise explanatory video to point people to).

Max Maass

2020-03-14 10:45:18 +0000 UTC

I would love to see a SIR model of disease. I think your animations would be so well suited for that

2020-03-14 07:36:04 +0000 UTC

Maybe extend that review topic on YouTube videos with their "thumb up / thumb down" rating. And derive the satisfaction value (s) based on the number of "thumb downs". Those appear on even the best videos for some reason.

2020-03-14 06:45:51 +0000 UTC

Fear not: I predict you'll have it very much soon enough

Don Sanderson

2020-03-14 00:36:06 +0000 UTC

At 8 minutes, it might be nice to see (just as one of the things it’s cycling between) how the formula works out to 0 (or near-0) for the histogram values below 42.

Jacob Ford

2020-03-13 23:48:46 +0000 UTC

Oh no, part 2 seems to cover things I would really need for my exam in a few weeks :o

Supreme

2020-03-13 22:56:05 +0000 UTC

Brilliant, Grant! I don't recall having nearly as many "aha moments" 45 years ago attending my probability and statistics classes as I have watching your videos.

Dragi Raos

2020-03-13 22:29:12 +0000 UTC

Well, there was also a Nikola car manufacturer...

Dragi Raos

2020-03-13 22:15:07 +0000 UTC

There are quite a few more in there :)

3blue1brown

2020-03-13 21:13:46 +0000 UTC

As a matter of fact, I contributed a little supplement to that documentary (currently only an extra on their dvd), and fully intend to make a video about her work this year.

3blue1brown

2020-03-13 21:13:06 +0000 UTC

Thanks for the coinflipping tip; I suppose I have it going on every frame, which is 60 per second, so I'm sure slowing it would be friendlier. It feels like a bit of a weird thing for websites to show this "Laplace's rule of succession" review since everyone would read it as the real data... but it's easy enough that we can all just do the translation when we see it.

3blue1brown

2020-03-13 21:12:14 +0000 UTC

This is not actually a pdf, though, it's just a plot of p(data | s) as a function, so there is no normalization requirement. In Bayesian terms, this is the likelihood function. Where this is going, of course, is a lesson on how to get a posterior pdf from this likelihood function, which will, of course, need to be normalized.

3blue1brown

2020-03-13 21:10:33 +0000 UTC
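The distinction in the reply above (likelihood function versus normalized posterior pdf) can be checked numerically. The following is a rough sketch, assuming the 48-out-of-50 example and a uniform prior. It integrates the binomial likelihood over s with a simple midpoint rule, shows that the area is 1/(n+1) rather than 1, and then rescales to obtain something that does integrate to 1. The helper names are illustrative and are not taken from the video's code.

```python
from math import comb


def likelihood(s: float, k: int, n: int) -> float:
    """P(k positives out of n | success rate s), viewed as a function of s."""
    return comb(n, k) * s**k * (1 - s) ** (n - k)


def integrate(f, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


k, n = 48, 50  # assumed example: 48 positive reviews out of 50

# Area under the likelihood curve: not 1, so it is not a pdf in s.
area = integrate(lambda s: likelihood(s, k, n))


def posterior_pdf(s: float) -> float:
    """Likelihood rescaled to integrate to 1 (the posterior under a uniform prior)."""
    return likelihood(s, k, n) / area


posterior_area = integrate(posterior_pdf)

print(f"area under likelihood: {area:.6f} (1/(n+1) = {1 / (n + 1):.6f})")
print(f"area under posterior:  {posterior_area:.6f}")
```

Because the likelihood's area shrinks as n grows, curves for larger data sets look shorter until they are normalized, which is the effect discussed in the comments around the 10:30 mark.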

Thanks, subtle catch! That was me being lazy and not animating the transition. Appreciate the pull request thought, probably faster for me since I know right where it is. Code is here, if you're curious: https://github.com/3b1b/manim/blob/shaders/from_3b1b/active/bayes/beta.py It's worth understanding that sometimes animation code itself is done in a flurry of just trying to make things, so isn't always the cleanest.

3blue1brown

2020-03-13 21:09:14 +0000 UTC

Not sure if you can touch on the probability of review bombing (DingTalk went from 4.9 to 1.0 when it was required for online classrooms), where reviewers intentionally crash or inflate ratings because of a conflict of interest. How would you ferret that out from the data?

2020-03-13 21:07:03 +0000 UTC

And I assume because it is satire, using the Amazon logos is allowed.

2020-03-13 21:00:47 +0000 UTC

Was I the first to notice the Easter Eggs you left in the video? I was going to ask if that was a real review that you screen grabbed, then I read them. Yes. I do want Delivery before Feb 31st.

2020-03-13 20:56:34 +0000 UTC

Plot twist. There is actually a car company named ni(c)kola. they are even public for a week 🤪

Martin Embeh

2020-03-13 20:32:07 +0000 UTC

Great video! My friend and I were talking about almost this same topic a couple weeks ago in the context of my app having high, but few reviews and comparing them to other apps with more reviews on a list.

2020-03-13 20:21:03 +0000 UTC

cant wait to see part two <3

Youssef Mohamed

2020-03-13 20:19:01 +0000 UTC

As a biostatistician who's worked in epidemiology, I find both topics fascinating. I'm looking forward to seeing more on probability. It's International Maths Day today. Yesterday in Auckland we watched the new documentary about Maryam Mirzakhani. I couldn't help thinking that her work would benefit hugely from your talents.

Steve Chantry-Taylor

2020-03-13 20:14:04 +0000 UTC

Great video! Definitely helped me understand why it's called a binomial distribution, and leaves me curious for the next part. A couple of unimportant thoughts, definitely not errors: - is it just me, or are the speech bubbles less of an ellipse than usual? - the coin-flipping animations seem pretty fast-paced; would that be an issue for people sensitive to flashing lights? - the "imagine two more reviews" shortcut is fascinating! my immediate thought was "if it's that easy, I wonder why websites don't just show that percentage to begin with".

wye

2020-03-13 20:14:02 +0000 UTC

At 10:30, the integral of the probability distribution for the data given the success rate should still be 1, and the narrower curve should be taller. (The underlying parameter is continuous, not discretized according to the cardinality of the data.)

john kraemer

2020-03-13 20:12:21 +0000 UTC

if it matters, there is a small glitch at the point of the highest value, in the probability distribution @10:29 btw, if you give python video source (or github link) i would have tried to edit it and open a pull request, Thanks

Youssef Mohamed

2020-03-13 20:11:31 +0000 UTC

I'm glad to hear you didn't suffer from the video being demonetized and that it reached a lot of people. This is what pretty much everyone on the planet is talking about, and it is a perfect illustration of math being useful in real life. Nice.

Gabe

2020-03-13 20:09:36 +0000 UTC

Coronavirus is considered a "sensitive topic". I can see how as a blanket policy, advertisers wouldn't want to be seen with their cheery commercials preceding stories involving lots of deaths, and at YouTube's scale, the only way to pragmatically operate as an intermediary is to have blanket policies like this. I will admit it feels a _little_ silly on this video, given that it's really just a math lesson, but I understand where they're coming from. As I said, though, it doesn't really matter too much for this channel, it's more of an interesting sidenote.

3blue1brown

2020-03-13 20:01:19 +0000 UTC

These guys did something along these lines for rating videogames: https://steamdb.info/blog/steamdb-rating/

Mandelbrot

2020-03-13 19:58:56 +0000 UTC

The video was demonetized?? Can you share why?

2020-03-13 19:56:27 +0000 UTC
