The modern scientific process often involves the development of a predictive computational model. To improve its accuracy, a computational model can be calibrated to a set of experimental data. A variety of validation metrics can be used to quantify this process. Some of these metrics have direct physical interpretations and a history of use, while others, especially those for probabilistic data, are more difficult to interpret. In this work, a variety of validation metrics are used to quantify the accuracy of different calibration methods. Frequentist and Bayesian perspectives are used with both fixed effects and mixed-effects statistical models. Through a quantitative comparison of the resulting distributions, the most accurate calibration method can be selected. Two examples are included which compare the results of various validation metrics for different calibration methods. It is quantitatively shown that, in the presence of significant laboratory biases, a fixed effects calibration is significantly less accurate than a mixed-effects calibration. This is because the mixed-effects statistical model better characterizes the underlying parameter distributions than the fixed effects model. The results suggest that validation metrics can be used to select the most accurate calibration model for a particular empirical model with corresponding experimental data.