The statistics in this report represent estimates based on samples of students, rather than values that could be calculated if every student in every country had answered every question. Consequently, it is important to measure the degree of uncertainty of the estimates. In PISA, each estimate has an associated degree of uncertainty, which is expressed through a standard error. Confidence intervals provide a way to make inferences about population parameters (e.g. means and proportions) in a manner that reflects the uncertainty associated with sample estimates. If numerous different samples were drawn from the same population, according to the same procedures as the original sample, then in about 95 out of 100 samples the calculated 95% confidence interval would encompass the true population parameter. For many parameters, sample estimators approximately follow a normal distribution, and the 95% confidence interval can be constructed as the estimated parameter plus or minus 1.96 times the associated standard error.
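As a minimal illustration of the normal-approximation rule above, the sketch below builds a confidence interval from an estimate and its standard error. The function name `confidence_interval` and the example values (a mean of 500 score points with a standard error of 2.5) are hypothetical, and the standard error is taken as given: the sketch does not show how PISA computes standard errors.

```python
from statistics import NormalDist
from typing import Tuple


def confidence_interval(estimate: float, standard_error: float,
                        level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval: estimate +/- z * SE.

    Assumes the estimator is approximately normal. For level=0.95 the
    critical value z is about 1.96.
    """
    if standard_error < 0:
        raise ValueError("standard error must be non-negative")
    # Two-sided critical value, e.g. inv_cdf(0.975) ~= 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical example: a country mean of 500 score points with SE 2.5.
low, high = confidence_interval(500.0, 2.5)
print(f"95% CI: [{low:.1f}, {high:.1f}]")  # prints "95% CI: [495.1, 504.9]"
```

Computing the critical value with `NormalDist().inv_cdf` rather than hard-coding 1.96 lets the same function produce, for example, 90% or 99% intervals.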
In many cases, readers are primarily interested in whether a given value in a particular country is different from a second value in the same or another country, e.g. whether girls in a country perform better than boys in the same country. In the tables and figures used in this report, differences are labelled as statistically significant when a difference of that size or larger, in either direction, would be observed less than 5% of the time in samples if there were actually no difference in the corresponding population values. Throughout the report, significance tests were undertaken to assess the statistical significance of the comparisons made.
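The decision rule described above can be sketched as a two-sided z-test at the 5% level. This minimal example assumes the two estimates come from independent samples (for instance, two different countries), so that the standard error of the difference is the square root of the sum of the squared standard errors. Comparisons within one country, such as girls versus boys, draw on the same sample; the two estimates are then correlated and this independence shortcut does not apply. The function name and all input values are hypothetical.

```python
import math


def difference_is_significant(est_a: float, se_a: float,
                              est_b: float, se_b: float,
                              alpha: float = 0.05):
    """Two-sided z-test for the difference between two independent estimates.

    Returns the difference, its standard error, the two-sided p-value, and
    whether the difference is significant at level alpha.
    """
    diff = est_a - est_b
    # Assumption: independent samples, so the covariance term is zero.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, p_value, p_value < alpha


# Hypothetical country means and standard errors.
diff, se_diff, p, significant = difference_is_significant(505.0, 2.8, 497.0, 3.1)
print(f"difference={diff:.1f}, SE={se_diff:.2f}, p={p:.3f}, significant={significant}")
```

With these hypothetical values, the 8-point difference is not statistically significant at the 5% level, because z is about 1.92, just short of the 1.96 threshold.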
Some analyses explicitly report p-values (e.g. Table I.B1.5.4 in Volume I). P-values represent the probability, under a specified model, that a statistical summary of the data would be equal to or more extreme than its observed value (Wasserstein and Lazar, 2016[1]). For example, in Table I.B1.5.4 in Volume I, the p-value represents the probability of observing, in PISA samples, a trend equal to or more extreme (in either direction) than the reported trend if the country's true trend were flat (equal to 0).
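Following the definition above, a two-sided p-value for a trend can be sketched by comparing the estimate with the null value of 0 (a flat trend), assuming the trend estimator is approximately normal with a known standard error. The function name `two_sided_p_value` and the example trend and standard error are hypothetical; the sketch does not reproduce how the reported trends or their standard errors are computed.

```python
import math


def two_sided_p_value(estimate: float, standard_error: float,
                      null_value: float = 0.0) -> float:
    """Probability, under the null hypothesis, of a result at least as
    extreme as `estimate` in either direction (normal approximation)."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    z = (estimate - null_value) / standard_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical trend of -3.0 score points with SE 1.2, tested against a flat trend.
p = two_sided_p_value(-3.0, 1.2)
print(f"p = {p:.4f}")  # prints "p = 0.0124"
```

Because the test is two-sided, a trend of +3.0 with the same standard error yields the same p-value; a p-value below 0.05 corresponds to |z| above roughly 1.96, matching the significance rule used elsewhere in the report.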