The Coupon Collector's Problem (with Geoff Marshall)

Check out Geoff’s channel. Here’s a video I’m in about Platforms Zero: https://www.youtube.com/watch?v=TTHOyTypNs8

Find your nearest Park Run: https://www.parkrun.com/

Thanks to all of Geoff’s running buddies for being involved. This is Matt’s Runderground channel: https://www.youtube.com/c/runderground

Cheers to my Patreon supporters who keep this whole channel running. But not literally. You can also help support and shape the videos I make: https://www.patreon.com/standupmaths

CORRECTIONS
– 10:49 Yes, I said “converges” by accident when filming and I dropped in a “diverges” in the edit. I don’t think anyone will notice.
– I think my big divergent observation may not hold! Clarence Lam was the first to spot that the lead n out the front of the series can explain the increasing times without the series itself needing to diverge. I suspect there is still an argument to be made around the rate at which times go up outpacing n, but I’m not sure it’ll be super intuitive.
– Let me know if you spot any other mistakes!

Early morning filming and editing by Alex Genn-Bash
Props by Matt Parker
Music by Howard Carter
Design by Simon Wright and Adam Robinson

English subtitles by Max, Rob Macdonald, Eric Rodríguez and Matt Parker

MATT PARKER: Stand-up Mathematician
Website: http://standupmaths.com/
US book: https://www.penguinrandomhouse.com/books/610964/humble-pi-by-matt-parker/
UK book: https://mathsgear.co.uk/collections/books/products/humble-pi-signed-paperback

Comments

Stand-up Maths says:

Ok, many are suggesting I should have stood up to reveal an even bigger table next to me. Great concept, but ideas like that require some serious resources. cough http://patreon.com/standupmaths

Jonathan Leister says:

I think random time is a bad assumption. A good runner is likely to have a time of 16:45 for a 5k. A time of 16:01 will not happen by chance. I think you need to do a normal distribution around an expected time.
Also, for sandbaggers you should do a normal distribution around the center of a clump of numbers. So say a runner has 3 numbers left: 45, 46 and 47. They have a probability given by a normal distribution centered on their target, with a standard deviation of 5 seconds or so. Finally, you can use this model to figure out how accurate and precise a sandbagger they are. Is there an offset between the stopwatch time and the official time? This could be fun. Think of the Monte Carlo simulations with Python!
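
The non-sandbagging half of this suggestion can be sketched as a small Monte Carlo simulation. This is only an illustration under stated assumptions: finish times are drawn from a normal distribution centered on 16:45 (1005 seconds, the figure from the comment), the standard deviations of 90 and 20 seconds are made-up values, and the function names are hypothetical. Sandbagging is not modeled.

```python
import math
import random

def runs_to_bingo(mean_s, sd_s, rng, max_runs=100_000):
    """Count runs until all 60 seconds values have appeared (None if the cap is hit)."""
    seen = set()
    for run in range(1, max_runs + 1):
        finish = rng.gauss(mean_s, sd_s)    # finish time in seconds (assumed normal)
        seen.add(math.floor(finish) % 60)   # keep only the seconds part of the time
        if len(seen) == 60:
            return run
    return None

def average_runs(mean_s, sd_s, trials=200, seed=0):
    """Average runs-to-bingo over several simulated runners, plus how many finished."""
    rng = random.Random(seed)
    results = [runs_to_bingo(mean_s, sd_s, rng) for _ in range(trials)]
    finished = [r for r in results if r is not None]
    if not finished:
        return None, 0
    return sum(finished) / len(finished), len(finished)

if __name__ == "__main__":
    mean_s = 16 * 60 + 45  # 16:45 expressed in seconds
    for sd_s in (90, 20):  # assumed spreads, purely illustrative
        avg, done = average_runs(mean_s, sd_s)
        print(f"sd={sd_s}s: average {avg:.0f} runs over {done} simulated runners")
```

With a wide spread the seconds value is close to uniform, so the average should land near the roughly 281 runs from the video; a narrower spread makes some seconds values rarer and pushes the average up.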

Amy says:

I’m way ahead… I’m on 75 parkruns and have 42 “coupons” and I’m not sandbagging it at all!

Josh Arnold says:

This was fantastic!

Charles Gregory says:

I'm trying to work out the average number of runs needed until your first duplicate. I found a PDF by Philippe Duchon and Cyril Nicaud which suggested sqrt((pi * n)/2). So for 60 items, it's ~9.708 (so 9th or 10th run). Am I applying the correct formula?
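
One way to check the figure is to compute the expectation directly and compare it with the square-root formula. The sketch below assumes all n values are equally likely and counts the run that produces the duplicate as part of the total; the function name is hypothetical. It uses the identity E[T] = sum over k >= 0 of P(the first k runs are all distinct).

```python
import math

def expected_runs_until_duplicate(n):
    """Exact expected number of uniform draws from n values until the first repeat."""
    total = 0.0
    prob_distinct = 1.0  # P(first k draws are all distinct), starting at k = 0
    for k in range(n + 1):
        total += prob_distinct
        prob_distinct *= (n - k) / n  # the next draw must avoid the k values already seen
    return total

n = 60
exact = expected_runs_until_duplicate(n)
approx = math.sqrt(math.pi * n / 2)
print(f"exact: {exact:.3f}, sqrt(pi*n/2): {approx:.3f}")
```

Under this counting convention the exact value for n = 60 comes out a little above the sqrt(pi * n / 2) estimate, so the formula reads best as a leading-order approximation rather than an exact answer.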

Azide_zx says:

15:03 I'd imagine the other peak would be well above 281, because people are naturally going to be able to consistently do a park run in roughly the same amount of time. They're not going to have one that's super fast followed by an extremely slow one, so they might get more repeat times than expected if they're not trying to get the bingo.

SanniRay says:

I will be very disappointed if this video doesn't have a "Parker-run" joke in it

Lleanlleawrg says:

I'm starting to question whether coupon is even a real word at about 8 minutes. Semantic satiation, isn't it?

Jasper Janssen says:

So this is essentially the Magic: The Gathering collector problem.

MinekEzQM says:

It is spelled "parkrun" all lower case. 🙂
Also, some lucky parkrun enthusiasts can run 55 parkruns a year (52 weekends + Christmas double special + x2 New Year's special)

Bell the Bunny Witch says:

Who else was waiting for that tiny desk?

Carlos Frostygreen says:

Where's the link for this video?

Topi Linkala says:

13:20 I was expecting Matt to explain why the graph looks like the plot of a log function, but it never came.

MWSin1 says:

Following a mathematical rabbit hole, I accidentally proved that 1 = 1 for all values n > 0.

I hate it when I do that.

Anthony Dillon says:

Possible correction:
Wouldn't the last number take 41.23 tries, not 60?
Chance of wrong number is 59/60 = 0.9833
0.9833^41.241286 (tries) = 50% chance of happening.
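
The two numbers answer different questions: 60 is the mean number of tries for one specific value with probability 1/60, while roughly 41 is the point at which there is a 50% chance of having hit it (the median). A minimal sketch, assuming each try independently succeeds with probability 1/60:

```python
import math

p = 1 / 60  # assumed chance that a given run lands on the one missing value

mean_tries = 1 / p  # expected number of tries for a geometric distribution
# Real-valued t solving (1 - p)**t == 0.5, i.e. a 50% chance of still missing it
median_tries = math.log(0.5) / math.log(1 - p)

print(f"mean: {mean_tries:.2f} tries, 50% point: {median_tries:.2f} tries")
```

The mean sits above the 50% point because the distribution has a long tail: a minority of unlucky collectors need far more than 60 tries.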

Ghiaccio says:

My intuition is that the median (and other quantiles) would be more informative than the mean in this case. Average amount of runs to get bingo doesn’t matter if most people only go for it once; what you want is how many runs it should be until you have a 50% chance of having done it by then.
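
The "runs until a 50% chance of bingo" figure can be computed without simulation by tracking the probability of holding exactly k distinct values after each run. This is a sketch assuming all 60 seconds values are equally likely on every run; the function name is hypothetical.

```python
def runs_for_completion_probability(n, target):
    """Smallest number of runs after which P(all n values collected) >= target."""
    dist = [0.0] * (n + 1)  # dist[k] = P(exactly k distinct values collected so far)
    dist[0] = 1.0
    runs = 0
    while dist[n] < target:
        new = [0.0] * (n + 1)
        for k, pr in enumerate(dist):
            if pr == 0.0:
                continue
            new[k] += pr * k / n                # this run repeats a value already held
            if k < n:
                new[k + 1] += pr * (n - k) / n  # this run produces a new value
        dist = new
        runs += 1
    return runs

median_runs = runs_for_completion_probability(60, 0.5)
print(f"50% chance of a full bingo after {median_runs} runs")
```

The same function gives other quantiles by changing `target`, for example 0.9 for the point by which nine in ten collectors would have finished.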

JackCarregan says:

6:15 It's not 1/52 times, it's more like 1/31, because you can get the desired card at any of those times. By the 31st try you have a 50% chance of having gotten the card already.

Ian Hill says:

Let's extend this to something that costs money. Panini stickers. Apparently there are 670 stickers in the 2022 World Cup pack. 670 * (1 + 1/2 + … + 1/670) = 4747. There are five stickers to a pack, so you're looking at just over 949 packs. At 70p per pack, that works out at £665 to complete your collection.

Short version – save yourself money and buy them a football instead. Or even a PS5 with FIFA22 AND a football. It's still cheaper.
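
The sticker arithmetic above can be reproduced in a few lines. The sketch assumes every sticker is equally likely and drawn independently (so it ignores swapping and any guarantee that a pack contains no duplicates); the sticker count, pack size and price are the figures from the comment, and the function name is hypothetical.

```python
def expected_draws(n):
    """Expected number of uniform random draws needed to see all n items: n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))

stickers = 670        # stickers in the collection (from the comment)
per_pack = 5          # stickers per pack (from the comment)
price_gbp = 0.70      # price per pack in pounds (from the comment)

draws = expected_draws(stickers)
packs = draws / per_pack
cost = packs * price_gbp
print(f"{draws:.0f} stickers, about {packs:.0f} packs, about £{cost:.0f}")
```

Calling `expected_draws(60)` gives the corresponding figure for the 60 seconds values in parkrun bingo.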

Chris England says:

The fact that the seconds in a person's park run time is more or less random reminds me of one way to get truly random numbers in a computer. You simply time the intervals between successive user key strokes in microseconds, then throw away everything except the right-most digit. You can do that as often as necessary to obtain a sequence of truly random numbers.

Christopher Mayfield says:

How did I miss this Video… Amazing Job.. I guess we will never see the parker sandbag metric

Chris Amies says:

Maybe I shouldn't have looked it up … I have 59 out of 60 in parkrun bingo. Lacking only '22'. I've done 165 runs but don't know when I got my 59th 'coupon' or what it was.

Ulkomaalainen says:

I have had quite a lot of these over the years and am very aware of the average times. However, at one point a related question came up (assuming no sandbagging of course): what is the expected value of the highest stack once you finish? I.e., once you get your 60th seconds value, you will most likely have some times you have run twice, thrice, n times. One of those values you will have encountered the most times (or a couple tied at the same count), but what should this highest number be expected to be?

柯智懷 says:

4:13

Jose V says:

Wouldn't trials be a better term than times in this case?

maxp3141 says:

I’d like to see a proof that expected time is 1/p – it’s intuitive, but that’s not enough in math.
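
For reference, one standard derivation, assuming independent trials that each succeed with the same probability p (0 < p ≤ 1). Here T is the number of trials up to and including the first success, and q = 1 - p is introduced only as shorthand:

```latex
\begin{align*}
P(T = k) &= (1-p)^{k-1}\,p, \qquad k = 1, 2, 3, \dots \\
E[T] &= \sum_{k=1}^{\infty} k\,p\,(1-p)^{k-1}
      = p \sum_{k=1}^{\infty} k\,q^{k-1} \\
     &= p \cdot \frac{d}{dq}\left(\sum_{k=0}^{\infty} q^{k}\right)
      = p \cdot \frac{d}{dq}\left(\frac{1}{1-q}\right)
      = \frac{p}{(1-q)^{2}}
      = \frac{p}{p^{2}}
      = \frac{1}{p}.
\end{align*}
```

Differentiating the geometric series term by term is valid because |q| < 1. With p = 1/60 this gives the 60 runs quoted for the last missing value.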

gregoryg72 says:

At around 13:30 Matt says that on average it should take 60 runs to get the last needed number which is obviously incorrect. It would take 30 runs on average to get any particular number.
