The concept is a good one. I prefer Wins Above Replacement Level, since the ratio of goals to wins varies from season to season (on average, it took more goals to "buy" one win in the 1984-85 NHL season than it does today). With that said, for goaltenders I publish Goals Above Replacement on my site (as opposed to WAR).
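To make the goals-to-wins point concrete, here is a minimal sketch of the conversion. The function name and both exchange rates are assumptions made up for illustration, not measured values for any real season; the only thing taken from the argument above is the direction of the effect (a higher-scoring season needs more goals per win).

```python
def goals_to_wins(goals_above_replacement: float, goals_per_win: float) -> float:
    """Convert a goal-denominated value into wins for one scoring environment."""
    if goals_per_win <= 0:
        raise ValueError("goals_per_win must be positive")
    return goals_above_replacement / goals_per_win


# Hypothetical exchange rates, chosen only to show the direction of the effect.
HIGH_SCORING_GOALS_PER_WIN = 7.5  # assumed: a high-scoring season
LOW_SCORING_GOALS_PER_WIN = 6.0   # assumed: a lower-scoring season

gar = 15.0  # the same goal value in both environments
war_high_scoring = goals_to_wins(gar, HIGH_SCORING_GOALS_PER_WIN)
war_low_scoring = goals_to_wins(gar, LOW_SCORING_GOALS_PER_WIN)

print(f"High-scoring season: {war_high_scoring:.2f} wins")
print(f"Low-scoring season:  {war_low_scoring:.2f} wins")
```

Under these made-up rates, the same 15 goals are worth fewer wins in the high-scoring environment, which is why a wins-based number travels better across seasons.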
Replacement level is the important metric - if you're doing Goals Above Average (or Wins Above Average), then an average player ends up with a value of zero, and as anyone who's been saddled with below-average goaltending in the playoffs can tell you, an "average" level of play has positive value. Alternatively, the fact that GMs consistently pay large sums of money to "average" players is market proof that "average" players have positive value. With Goals Above Replacement, average players get value consistent with their ability.
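A quick sketch of the difference for a goaltender, assuming a simple save-percentage baseline with no shot-quality adjustment. This is one common simplification, not necessarily how any published GAR figure is computed, and the two baseline save percentages and the goalie's stat line are hypothetical.

```python
def goals_above_baseline(shots_against: int, goals_against: int,
                         baseline_sv_pct: float) -> float:
    """Goals prevented relative to a goalie who stops baseline_sv_pct of the same shots."""
    expected_goals_against = shots_against * (1.0 - baseline_sv_pct)
    return expected_goals_against - goals_against


# Hypothetical baselines, for illustration only.
LEAGUE_AVERAGE_SV_PCT = 0.910
REPLACEMENT_SV_PCT = 0.900

# A hypothetical exactly-average goalie: 1500 shots faced, 135 goals allowed (.910).
shots, goals = 1500, 135

goals_above_average = goals_above_baseline(shots, goals, LEAGUE_AVERAGE_SV_PCT)
goals_above_replacement = goals_above_baseline(shots, goals, REPLACEMENT_SV_PCT)

print(f"Goals Above Average:     {goals_above_average:.1f}")
print(f"Goals Above Replacement: {goals_above_replacement:.1f}")
```

With these assumed numbers the average goalie comes out at roughly zero against the average baseline but roughly 15 goals against the replacement baseline, which is the positive value the "above average" version hides.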
There are two distinct categories of problem with the Goals Above Replacement calculation. The first is that, however brilliant the framework is, you still have to actually allocate the values to individual players. For the specific questions you raise above, a "yes" or "no" answer isn't sufficient in any case - some methods adjust for the things you describe and others do not (and the ones that do adjust do so with varying levels of ability). If a defense allows a high-quality chance and the goaltender whiffs on it, how much of the blame do you allocate to the skaters and how much do you allocate to the goaltender? (And do you even have the data available to discern that?) Conversely, if you're lined up against Wayne Gretzky all game and hold him off the scoresheet with two shots on goal, how many "Goals Above Replacement" is that worth?
By nature of the calculation, some players will have negative GAR for a game (and therefore, since those negatives offset the positives, the positive GAR credited to players on a team in a game can add up to more than the goals the team actually scored).
The other categorical challenge with GAR is the estimation of replacement level. "Replacement level" is typically defined as the best player that a team can insert into the lineup with no meaningful asset expenditure (either players in trade, or dollars in signings). In practice, the "replacement level" will vary from season to season - and specifically, each team has its own replacement level, different from other teams in the league. Most calculations assume a uniform replacement level league-wide, since (for example) it wouldn't be fair to penalize Sidney Crosby solely because the Penguins have a star-in-waiting available to call up from Wilkes-Barre (and hence have a higher replacement level). This has more of an in-game influence in evaluating goaltenders, since it's always "either/or" when comparing the goaltenders who could be playing, whereas if Sidney Crosby is injured in a game, the AHL call-up doesn't assume his exact role in its entirety.
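Here is a small sketch of why most calculations use a uniform replacement level. Every name and number is hypothetical: the three teams, their call-up values, the star's value, and the choice of a plain league mean as the uniform baseline are all assumptions for illustration.

```python
from statistics import mean

# Hypothetical: goals of value a freely available call-up would provide on each team.
team_replacement_value = {"Team A": 4.0, "Team B": 6.0, "Team C": 11.0}

# Assumed uniform baseline: the plain mean across teams.
league_replacement_value = mean(list(team_replacement_value.values()))


def goals_above_replacement(player_value: float, replacement_value: float) -> float:
    """Value of a player over whatever baseline he is being compared against."""
    return player_value - replacement_value


star_value = 30.0  # the same hypothetical star, wherever he plays

uniform_gar = goals_above_replacement(star_value, league_replacement_value)
team_specific_gar = {
    team: goals_above_replacement(star_value, baseline)
    for team, baseline in team_replacement_value.items()
}

print(f"Uniform baseline: {uniform_gar:.1f}")
for team, gar in team_specific_gar.items():
    print(f"{team} baseline: {gar:.1f}")
```

With a team-specific baseline, the identical player rates lowest on Team C purely because Team C's call-up is better, which is the penalty the uniform baseline avoids.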
If replacement level reflects some sub-NHL level of play, then why do some players have complete seasons where they finish with below-zero GAR? A few reasons (which may be obvious). First, a player's past performance is just that, past performance, and teams will often gamble on a player's established history instead of their current-season performance. Players also represent an investment, and although sunk costs may be sunk costs, the salary cap environment often means that poor players can't just be replaced. Next, a player may be playing at a below-replacement level, but their specific team doesn't have anyone of replacement-level value to replace them with. Last but not least, team management are not perfect evaluators and can't always discern below-replacement-level play (especially since they're making their decisions in real time, while we have the luxury of reviewing retrospectively).
TL;DR: The concept of Goals Above Replacement is a solid framework. However, the devil's in the details (or "The Devils Are In The Details" if you're an old school Lou Lam fan) and the individual choices modelers make when developing their own GAR estimates are what separate a good model from something less.