• More Trustworthy A/B Analysis: Less Data Sampling and More Data Reducing

    We are all familiar with terabytes and petabytes. But have you heard of zettabytes? One zettabyte is a million petabytes, and worldwide data volume is expected to hit 163 zettabytes by 2025, ten times the 2017 volume. Your product will contribute to that surge, especially if it is growing rapidly, so you should be prepared to manage the increase in data.

    The cost of storage and computation will spike as data volume keeps increasing, and your data pipeline could even fail if the required computation exceeds its capacity. To avoid these issues, you can reduce the volume by collecting only a portion of the data generated. But you need to answer several questions to ensure the data are collected in a trustworthy way: Are you mindful of the impact on A/B test analysis? Do you still have valid and sensitive metrics? Are you confident the A/B analysis is still trustworthy, so that you can make correct ship decisions?
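    As a rough illustration of why sampling matters for sensitivity, here is a minimal Python sketch using simulated, hypothetical data (not from the article): the standard error of a metric mean shrinks as 1/sqrt(n), so keeping only 10% of the data widens confidence intervals by roughly sqrt(10).

```python
import random
import statistics

random.seed(0)

# Simulated per-user metric values (hypothetical data).
population = [random.gauss(10.0, 2.0) for _ in range(100_000)]

def standard_error(values):
    """Standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5

full_se = standard_error(population)

# Keep only 10% of the data, as a uniform random sample.
sampled = random.sample(population, k=len(population) // 10)
sampled_se = standard_error(sampled)

# With 10x less data, the standard error grows by roughly sqrt(10) ≈ 3.16x,
# so the A/B test needs a larger true effect to reach significance.
print(full_se, sampled_se, sampled_se / full_se)
```

    This is why naive uniform sampling trades away statistical power; the article's point is that smarter data reduction can keep metrics valid and sensitive while still cutting volume.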

    Read more: https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/more-trustworthy-a-b-analysis-less-data-sampling-and-more-data-reducing/

  • How to Evaluate LLMs: A Complete Metric Framework

    With ChatGPT and BingChat, we saw LLMs approach human-level performance in everything from standardized exams to generative art. However, many of these LLM-based features are new and carry a lot of unknowns, and hence require careful release processes to preserve privacy and uphold social responsibility. While offline evaluation is suitable for the early development of features, it cannot assess how model changes benefit or degrade the user experience in production.

    Measuring LLM performance on user traffic in real product scenarios is essential to evaluating these human-like abilities and guaranteeing a safe and valuable experience for the end user.

    Read more: https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/how-to-evaluate-llms-a-complete-metric-framework/

  • If You Know It Will Work, Then It Is Not an Experiment!

    “Experiments are key to our success. If you know it’s going to work, it’s not an experiment. We must be willing to take bold bets, even if they might fail. We learn from failure, and it paves the way for greater innovation and success.” (Jeff Bezos)

    How frequently do we engage in conversations like this with our teams: “Why not give this approach a shot?” or “Let’s experiment with this.” But do we genuinely comprehend its essence? We make alterations, we embrace fresh approaches. What more is there to consider? In this article, we explore the art of conducting formal experiments to drive improvement across products, processes, and teams. With a systematic template, we’ll uncover the path to capturing invaluable insights that fuel innovation and success.

    Read more: https://www.linkedin.com/pulse/you-know-work-experiment-ritesh-poudel/

  • Accelerated Experimentation: Testing 15 elements in one test

    Do interactions exist in marketing experiments? Though mathematically possible, interactions tend to be small and often nonexistent (though inconsistent creative execution can lead to unwanted interactions). However, if an interaction effect is statistically significant, it can change your predicted lift or even change the optimal combination. For example, if the 1×2 interaction is large, the hero and headline interact: both main effects may be positive, but if the interaction is negative, the new hero plus new headline and copy may not give you as much of a lift as the sum of both main effects predicts.
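    To make the arithmetic concrete, here is a small sketch with made-up effect sizes (illustrative numbers, not from the article): a negative interaction shrinks the lift you would predict by simply adding the two main effects.

```python
# Hypothetical main effects and interaction from a two-element
# marketing test (invented numbers for illustration).
hero_effect = 0.04        # +4% lift from the new hero image alone
headline_effect = 0.03    # +3% lift from the new headline & copy alone
interaction = -0.02       # negative 1x2 interaction: the elements clash

# Naive prediction: just sum the main effects.
additive_prediction = hero_effect + headline_effect              # 0.07
# Prediction accounting for the interaction term.
prediction_with_interaction = additive_prediction + interaction  # 0.05

print(additive_prediction, prediction_with_interaction)
```

    The combined treatment delivers 5% rather than the 7% the additive model promised, which can flip which combination is actually optimal.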

    Read more: https://www.linkedin.com/pulse/accelerated-experimentation-testing-15-elements-one-test-gordon-bell-sinkf/

  • Increasing the sensitivity of A/B tests by utilizing the variance estimates of experimental units

    Companies routinely turn to A/B testing when evaluating the effectiveness of their product changes. Also known as a randomized field experiment, A/B testing has been used extensively over the past decade to measure the causal impact of product changes or variants of services, and has proved to be an important success factor for businesses making decisions.

    With increased adoption of A/B testing, proper analysis of experimental data is crucial to decision quality. Successful A/B tests must exhibit sensitivity — they must be capable of detecting effects that product changes generate. From a hypothesis-testing perspective, experimenters aim to have high statistical power, or the likelihood that the experiment will detect a nonzero effect when such an effect exists.
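    As a sketch of the power idea (a standard two-sample z-test approximation with illustrative numbers, not Facebook's actual method): lowering the per-unit variance, which is what exploiting unit-level variance estimates enables, raises power at the same sample size.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    delta: true difference in means; sd: per-unit standard deviation
    (assumed equal in both arms); n_per_group: units per arm.
    """
    se = sd * sqrt(2.0 / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability the z-statistic exceeds the critical value; the
    # opposite tail is negligible for reasonable effect sizes.
    return 1 - NormalDist().cdf(z_crit - abs(delta) / se)

# Same effect and sample size, but 20% lower standard deviation
# (e.g., from variance-reduction techniques) gives noticeably more power:
print(power_two_sample(delta=0.1, sd=1.0, n_per_group=1000))  # ~0.61
print(power_two_sample(delta=0.1, sd=0.8, n_per_group=1000))  # ~0.80
```

    The jump from roughly 61% to roughly 80% power, with no extra traffic, is the practical payoff of variance-aware analysis.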

    Read more: https://research.facebook.com/blog/2020/10/increasing-the-sensitivity-of-a-b-tests-by-utilizing-the-variance-estimates-of-experimental-units/

  • Testing product changes with network effects

    Experimentation is ubiquitous in online services such as Facebook, where the effects of product changes are explicitly tested and analyzed in randomized trials. Interference, sometimes referred to as network effects in the context of online social networks, is a threat to the validity of these randomized trials as the presence of interference violates the stable unit treatment value assumption (SUTVA) important to the analysis of these experiments.

    Colloquially, interference means that an experimental unit’s response to an intervention depends not just on its own treatment, but also on other units’ treatments. For example, consider a food delivery marketplace that tests a treatment that causes users to order deliveries faster. This could reduce the supply of delivery drivers to users in the control group, leading the experimenter to overstate the effects of the treatment.
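    The marketplace example can be caricatured in a few lines of Python. This is a toy model with invented numbers, not the actual analysis: users compete for a fixed pool of drivers, treated users claim drivers first, and the naive treated-minus-control comparison wildly overstates the true (zero) global effect.

```python
def simulate(treated_fraction, n_users=10_000, drivers=6_000):
    """Toy marketplace: users compete for a shared pool of drivers.

    Treated users order faster, so they claim drivers first and drain
    the supply available to control users in the same experiment.
    """
    treated = int(n_users * treated_fraction)
    control = n_users - treated
    treated_served = min(treated, drivers)
    control_served = min(control, drivers - treated_served)

    def served_rate(served, n):
        return served / n if n else float("nan")

    return served_rate(treated_served, treated), served_rate(control_served, control)

# 50/50 experiment: treatment looks like a huge win over control...
t_rate, c_rate = simulate(0.5)
naive_effect = t_rate - c_rate   # 1.0 - 0.2 = 0.8
# ...but at 100% rollout the global served rate is unchanged (0.6, the
# same as with no treatment), because total driver supply, not ordering
# speed, is the binding constraint.
global_rate, _ = simulate(1.0)
print(naive_effect, global_rate)
```

    The experimenter sees an 80-point gap between arms even though shipping the treatment to everyone changes nothing: the control arm's losses, not the treatment's gains, created the measured effect.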

    Read more: https://research.facebook.com/blog/2021/8/testing-product-changes-with-network-effects/

  • Overlapping Experiment Infrastructure: More, Better, Faster Experimentation

    In this paper, we describe Google’s overlapping experiment infrastructure, a key component in running more experiments, running them better, and running them faster. Because an experiment infrastructure alone is insufficient, we also discuss the associated tools and educational processes required to use it effectively. We conclude by describing trends that show the success of this overall experimental environment. While the paper specifically describes the experiment system and experimental processes in place at Google, we believe they can be generalized and applied by any entity interested in using experimentation to improve search engines and other web applications.

    Read more: https://static.googleusercontent.com/media/research.google.com/en//archive/papers/Overlapping_Experiment_Infrastructure_More_Be.pdf

  • Product Experimentation Best Practices

    Investing in a proper experiment design upfront is the first step in running an experiment that follows best practices. A good design document eliminates much of the ambiguity and uncertainty often encountered in the analysis and decision-making stages.

    The design document should include the following:

    Read more: https://www.statsig.com/blog/product-experimentation-best-practices

  • How to Size For Online Experiments With Ratio Metrics

    Expedia Group™, a global online travel solution provider, relies heavily on A/B testing to continuously bring the best travel experience to our consumers and business partners. We at the Experimentation Science & Statistics team keep exploring new methodologies and techniques to extend this testing capability, enabling more robust and flexible experimentation across all Expedia Group™ brands.

    One recent example is the capability to conduct experiments with ratio metrics, and in particular how to determine the sample size for them.
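    One common approach to sizing ratio-metric experiments is the delta method; the sketch below uses it with hypothetical numbers (function names and inputs are illustrative assumptions, not Expedia's actual implementation). It approximates the per-unit variance of the ratio, then feeds that into a standard sample-size formula.

```python
from statistics import NormalDist

def ratio_variance_delta(mean_x, mean_y, var_x, var_y, cov_xy):
    """Delta-method variance of the ratio metric R = mean_y / mean_x.

    Returns an 'effective' per-unit variance that can be plugged into
    a standard mean-difference sample-size formula.
    """
    r = mean_y / mean_x
    return (var_y - 2 * r * cov_xy + r * r * var_x) / (mean_x ** 2)

def sample_size_per_group(effect, variance, alpha=0.05, power=0.8):
    """Units per arm for a two-sided z-test detecting `effect`."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 2 * variance * (z_a + z_b) ** 2 / effect ** 2

# Hypothetical per-user data: sessions (x) and clicks (y), so the
# ratio metric is clicks per session measured at the user level.
var_r = ratio_variance_delta(mean_x=4.0, mean_y=1.0, var_x=2.0,
                             var_y=0.5, cov_xy=0.6)
n = sample_size_per_group(effect=0.01, variance=var_r)
print(var_r, n)
```

    Because numerator and denominator are correlated within a user, the covariance term can substantially shrink (or inflate) the required sample size relative to treating the ratio as a simple mean.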

    Read more: https://medium.com/expedia-group-tech/how-to-size-for-online-experiments-with-ratio-metrics-3d57362f1967

  • Correlation vs Causation: Understand the Difference for Your Product

    Correlation and causation can seem deceptively similar, but recognizing their differences is crucial to understanding relationships between variables. In this article, we’ll give you a clear definition of the difference between causation and correlation.

    And even if you’re not in the product world, we think you’ll benefit from understanding how to tell the difference between correlation and causation.

    Read more: https://amplitude.com/blog/causation-correlation