A statistical argument about effectiveness is an argument that the observations were not random events. If they are not random then the observations have some predictive power.
This is fundamentally incorrect. To make any conclusion you need all other factors (linemates, D pair, opposition players, referees, goalies, location, pre-game meals, how they slept the night before, family life, etc.) to be identical under both the null and alternative hypotheses. With enough observations of both states (together or not together) those other factors will usually average out.
Consider baseball: a .240 hitter is barely MLB quality and generates ~148 hits per season. A .300 hitter is an all-star and generates ~184 hits per season. Over a 162-game season that is about 1 extra hit every 4 to 5 games. Basically it's perfectly normal to watch 3 entire games and see no difference at all in production between a superstar and a fringe big-league player.
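To put rough numbers on the baseball comparison, here is a minimal sketch of the arithmetic. The at-bat counts are my assumptions, not anything from an official source: about 615 at-bats per season (which is what the hit totals above imply), 162 games, 4 at-bats per game, and every at-bat treated as an independent coin flip. The function names are made up for illustration.

```python
from math import comb

# Assumptions (illustrative, not official figures):
AT_BATS_PER_SEASON = 615   # implied by ~148 hits at .240
GAMES_PER_SEASON = 162
AT_BATS_PER_GAME = 4       # rough guess for an everyday player


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k hits in n independent at-bats."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_weaker_keeps_pace(n: int, p_weak: float, p_strong: float) -> float:
    """Chance the weaker hitter gets at least as many hits as the
    stronger one when each gets n at-bats."""
    return sum(
        binom_pmf(weak_hits, n, p_weak) * binom_pmf(strong_hits, n, p_strong)
        for weak_hits in range(n + 1)
        for strong_hits in range(weak_hits + 1)
    )


# Season totals and how many games it takes to see one extra hit.
hits_240 = 0.240 * AT_BATS_PER_SEASON
hits_300 = 0.300 * AT_BATS_PER_SEASON
games_per_extra_hit = GAMES_PER_SEASON / (hits_300 - hits_240)

# Three games' worth of at-bats for each hitter.
three_game_at_bats = 3 * AT_BATS_PER_GAME
p_no_visible_gap = prob_weaker_keeps_pace(three_game_at_bats, 0.240, 0.300)

print(f".240 hitter: {hits_240:.0f} hits, .300 hitter: {hits_300:.0f} hits")
print(f"one extra hit every {games_per_extra_hit:.1f} games")
print(f"P(.240 hitter matches or beats .300 hitter over 3 games): {p_no_visible_gap:.2f}")
```

The last number is the point: under these assumptions, the fringe player matching or out-hitting the all-star over a 3-game sample is not a rare event, so a handful of games tells you very little.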