Head to to get a 30-day free trial. The first 200 people will get 20% off their annual subscription. Watch this video ad free on Nebula: ------------------------------------------------------- Today we're looking at HyperLogLog, an algorithm that leverages random chance to count the number of distinct items are in a dataset. It does this by tracking the longest run of zeros in a binary sequence, and uses that as an estimate of cardinality. HLL is a probabilistic algorithm, meaning it's a guess rather than true answer. But due to some clever tricks it is usually within 2% of the correct value, and can do it both quickly and in a memory-efficient manner. A 512kb datastructure can accurately process trillions of items and terrabytes of data, which is pretty impressive! When I made this video, I didn't realize that another #SoME3 was in progress. But a bunch of viewers suggested I enter the video, so I guess this is will be part of the event! ---------------------- 🔬Patreon if that's your jam: 📢Twitter: 📷Instagram: 💻Discord: ---------------------- Journal papers: Flajolet, Philippe, et al. "Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm." _Discrete Mathematics and Theoretical Computer Science_. Discrete Mathematics and Theoretical Computer Science, 2007. Heule, Stefan, Marc Nunkesser, and Alexander Hall. "Hyperloglog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm." _Proceedings of the 16th International Conference on Extending Database Technology_. 2013. Earlier work: Durand, Marianne, and Philippe Flajolet. "Loglog counting of large cardinalities." _Algorithms-ESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003. Proceedings 11_. Springer Berlin Heidelberg, 2003. Flajolet, Philippe, and G. Nigel Martin. "Probabilistic counting algorithms for data base applications." _Journal of computer and system sciences_ 31.2 (1985): 182-209. Articles: - -











