Really good question. For better or for worse, I'm using a z-test that treats saves as binomial. That assumes every shot is an independent trial with the same chance of being stopped, which presents its own problems, but it's the best that I've thought of so far.
I look at a goaltender's overall playoff performance, and my null hypothesis is that, in a given type of elimination game, his save percentage will match his overall playoff save percentage.
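For anyone who wants to try this at home, here is a rough sketch of that kind of test in Python. It is only my guess at a reasonable setup, not necessarily the exact calculation behind the numbers in this post: the function name `save_pct_z_test`, the one-sided (upper-tail) p-value, and the shot totals in the example are all assumptions made up for illustration.

```python
from math import erfc, sqrt


def save_pct_z_test(shots, saves, baseline_sv_pct):
    """One-sided z-test for saves under a binomial model.

    Null hypothesis: each shot is stopped with probability
    baseline_sv_pct (the goaltender's overall playoff save percentage).
    Returns (expected saves, z-score, upper-tail p-value).
    """
    expected = shots * baseline_sv_pct
    # Standard deviation of a binomial count: sqrt(n * p * (1 - p))
    sd = sqrt(shots * baseline_sv_pct * (1 - baseline_sv_pct))
    z = (saves - expected) / sd
    # Upper-tail normal probability P(Z > z), via the complementary error function
    p_value = 0.5 * erfc(z / sqrt(2))
    return expected, z, p_value


# Hypothetical goaltender (numbers invented for illustration, not real data):
# 600 shots faced in elimination games, 555 saves, .900 overall playoff save percentage.
expected, z, p = save_pct_z_test(shots=600, saves=555, baseline_sv_pct=0.900)
print(f"expected saves: {expected:.0f}, z = {z:.2f}, one-sided p = {p:.3f}")
```

The normal approximation is only trustworthy when both the expected saves and the expected goals against are reasonably large, so small samples deserve extra caution.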
I haven't tried putting the two types of elimination games together yet (I think that they'd be different kinds of "clutch" situations), but it's certainly something that would be straightforward to try.
There are some results that could be considered statistically significant. For instance, when a goaltender can eliminate an opponent:
Andy Moog (556 expected saves, 574 actual saves, p value 0.010)
Jose Theodore (219 expected saves, 228 actual saves, p value 0.018)
Ed Belfour (661 expected saves, 675 actual saves, p value 0.030)
When a goaltender can be eliminated:
Henrik Lundqvist (589 expected saves, 609 actual saves, p value 0.0011)
Tim Cheveldae (188 expected saves, 197 actual saves, p value 0.023)
Frank Pietrangelo (143 expected saves, 150 actual saves, p value 0.028)
In mutual elimination games:
Henrik Lundqvist (159 expected saves, 166 actual saves, p value 0.024)
Dwayne Roloson (92 expected saves, 97 actual saves, p value 0.030)
Neat to see Chevy on that middle list.