Evaluate - A/B Testing Platform

Evaluate is Yahoo’s primary experimentation platform that allows users to run A/B tests on both mobile and web properties to make informed decisions on design and implementation options.


Team
I was the lead designer on Evaluate for the last three quarters.

Goals
- Create a new experiment report to monitor and review experiments.
- Revamp the entire product to update workflows and the design language.

Background Research
As with other projects, I started with background research to understand experimentation at a broader level and how it shaped decision making at Yahoo and elsewhere.
Competitor evaluation - I reviewed the experimentation systems that were most popular in the market, such as Optimizely, VWO, and Google Analytics.

User Research
User research on this project was a clear demonstration of why it is so critical to solving the right problem. I scheduled interviews with multiple stakeholders across the Evaluate ecosystem, from members of the core team to the people who used Evaluate to run experiments every day.
Each interview was a loosely structured conversation for the first half, followed by a set of tasks for the participant to complete in the second half.

Learnings & Outcomes
Through the interview process, I gained real insight into the different pieces that must fit together to enable experimentation at such a large scale: splitting traffic, defining buckets, and monitoring and reviewing experiments.
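Evaluate's internals were not part of this project, but a minimal sketch of deterministic hash-based bucketing makes the traffic-splitting idea concrete. Everything below - the function, the experiment name, the weights - is hypothetical, not Evaluate's actual API:

    import hashlib

    def assign_bucket(user_id, experiment, weights):
        """Deterministically map a user to a variant bucket.

        Hashing user_id together with the experiment name keeps the
        assignment stable across sessions and independent across
        experiments. weights is an ordered mapping of bucket name
        to traffic fraction.
        """
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        point = int(digest[:8], 16) / 0xFFFFFFFF  # uniform draw in [0, 1]
        cumulative = 0.0
        for bucket, weight in weights.items():
            cumulative += weight
            if point <= cumulative:
                return bucket
        return bucket  # guard against floating-point rounding at the edge

    # A 90/10 split between control and treatment (hypothetical experiment).
    print(assign_bucket("user-42", "homepage_redesign",
                        {"control": 0.9, "treatment": 0.1}))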
It turned out that the interface was not the primary issue for most users, even though the initial mandate was to update it. The more pressing issues were around communication between different teams, which impacted their ability to effectively instrument properties, set up experiments, and monitor them on a daily basis.
Another key issue was that the interface did not surface the right set of metrics to support confident decision making; as a result, most users relied on the core team to dig up specific reports on an ad hoc basis rather than using the interface.
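For a sense of what "confident decision making" looks like in numbers, a standard two-proportion z-test is the kind of summary statistic such a report needs to surface. The code and the counts below are invented for illustration only:

    from math import erf, sqrt

    def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
        """Two-sided z-test for a difference in conversion rates."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        # Two-sided p-value from the standard normal CDF.
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
        return p_b - p_a, z, p_value

    # Invented counts: 480/10000 conversions in control vs 560/10000 in treatment.
    lift, z, p = two_proportion_z_test(480, 10_000, 560, 10_000)
    print(f"lift={lift:.4f}  z={z:.2f}  p={p:.4f}")  # lift=0.0080, z ~ 2.55, p ~ 0.011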
I documented these concerns and requests in a Research Insights doc for the team to review, to help re-align their focus for the upcoming quarters. I also recommended more effective methods of communication within the context of an experiment, rather than moving across multiple systems.
Finally, I delivered an updated version of the experiment report that the team can use as a framework once the right metrics are available.

Conclusion
I have since moved to another team, but the Evaluate research effort was highly successful at identifying issues and course-correcting before too much was invested in the project.