Cohen's kappa, symbolized by the lowercase Greek letter κ (7), is a robust statistic useful for interrater or intrarater reliability testing. Like correlation coefficients, it can range from −1 to +1, where 0 represents the amount of agreement that can be expected from chance alone and 1 represents perfect agreement between the raters. Although kappa values below 0 are possible, Cohen notes that they are unlikely in practice (8). Because kappa is a standardised value, like other correlation statistics, it can be interpreted consistently across studies.

Weighted kappa allows disagreements to be weighted differently [21] and is particularly useful when the codes are ordered. [8]:66 Three matrices are involved: the matrix of observed scores, the matrix of expected scores based on chance agreement, and the matrix of weights. The cells of the weight matrix on the diagonal (upper left to lower right) represent agreement and therefore contain zeros. Off-diagonal cells contain weights indicating the seriousness of the disagreement. Often, cells one step off the diagonal are weighted 1, those two steps off are weighted 2, and so on.
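The three-matrix construction of weighted kappa can be sketched as follows. This is a minimal illustration with made-up ratings on a three-level ordinal scale and linear weights (0 on the diagonal, 1 one step off, 2 two steps off), following the description above:

```python
import numpy as np

# Hypothetical ratings from two raters on a 3-level ordinal scale (0, 1, 2)
rater_a = np.array([0, 1, 2, 2, 0, 1, 1, 2, 0, 1])
rater_b = np.array([0, 2, 2, 1, 0, 1, 0, 2, 0, 1])
k = 3  # number of categories

# Matrix of observed scores: cell (i, j) is the proportion of items
# scored i by rater A and j by rater B.
observed = np.zeros((k, k))
for a, b in zip(rater_a, rater_b):
    observed[a, b] += 1
observed /= observed.sum()

# Matrix of expected scores under chance, from the marginal distributions.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

# Weight matrix: zeros on the diagonal, 1 one step off, 2 two steps off.
weights = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))

# Weighted kappa: 1 minus the ratio of weighted observed to
# weighted expected disagreement.
kappa_w = 1 - (weights * observed).sum() / (weights * expected).sum()
print(round(kappa_w, 3))
```

With quadratic instead of linear weights (squaring the weight matrix), larger disagreements are penalized more heavily; which scheme is appropriate depends on the coding scale.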

The concept of "agreement between raters" is fairly simple, and for many years interrater reliability was measured as percent agreement between data collectors. To obtain a percent agreement figure, the statistician constructed a matrix in which the columns represented the different raters and the rows the variables for which the raters had collected data (Table 1). The cells of the matrix contained the scores the data collectors entered for each variable. In the example in Table 1, there are two raters (Mark and Susan), each of whom recorded scores for variables 1 through 10. To obtain percent agreement, the researcher subtracted Susan's scores from Mark's scores and counted the number of resulting zeros. Dividing the number of zeros by the number of variables gives the agreement between the raters. In Table 1, the agreement is 80%. This means that 20% of the data collected in the study are erroneous, because when the raters disagree, at most one of them can be correct. This statistic is directly interpretable as the percentage of correct data; the value 1.00 − percent agreement can be understood as the percentage of incorrect data.
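The subtract-and-count-zeros procedure described above can be sketched in a few lines. The scores here are hypothetical stand-ins for the Table 1 data, which are not reproduced in this text:

```python
# Hypothetical binary scores for ten variables from the two data collectors
mark  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
susan = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]

# Subtract Susan's scores from Mark's and count the resulting zeros
differences = [m - s for m, s in zip(mark, susan)]
agreement = differences.count(0) / len(differences)
print(agreement)  # 0.8, i.e. 80% agreement
```

Note that this figure takes no account of agreement expected by chance, which is exactly the shortcoming kappa addresses.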

That is, if percent agreement is 82%, then 1.00 − 0.82 = 0.18, and 18% of the data misrepresent the research findings. Note also that Cohen's kappa assumes the raters are deliberately chosen; if your raters are instead drawn at random from a population of raters, use Fleiss' kappa instead. The first mention of a kappa-like statistic is attributed to Galton (1892); [3] see Smeeton (1985). [4]

Kappa is calculated as κ = (Pr(a) − Pr(e)) / (1 − Pr(e)), where Pr(a) is the actual observed agreement and Pr(e) is the agreement expected by chance. Confidence intervals are obtained by adding to and subtracting from kappa the critical value for the desired confidence level multiplied by the standard error of kappa. Since the most commonly requested level is 95%, the formula uses 1.96 as the constant by which the standard error of kappa (SEκ) is multiplied, giving the interval κ ± 1.96 × SEκ.

As Marusteri and Bacarea (9) have noted, there is never 100% certainty about research results, even when statistical significance is achieved. Statistical results used to test hypotheses about the relationship between independent and dependent variables become meaningless if the raters are inconsistent in scoring the variables. If agreement is less than 80%, more than 20% of the data being analyzed are erroneous; at a reliability of only 0.50 to 0.60, one must accept that 40% to 50% of the analyzed data are erroneous.
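The kappa and confidence-interval calculations can be sketched for a two-rater binary example. The counts below are hypothetical, and the standard error uses the simple large-sample approximation SEκ = √(Pr(a)(1 − Pr(a)) / (n(1 − Pr(e))²)); other, more exact SE formulas exist:

```python
import math

# Hypothetical 2x2 agreement counts for two raters scoring yes/no:
# both yes, A yes/B no, A no/B yes, both no
n11, n10, n01, n00 = 20, 5, 10, 15
n = n11 + n10 + n01 + n00

pr_a = (n11 + n00) / n  # observed agreement Pr(a)

# Chance agreement Pr(e) from each rater's marginal proportions
p_both_yes = ((n11 + n10) / n) * ((n11 + n01) / n)
p_both_no  = ((n01 + n00) / n) * ((n10 + n00) / n)
pr_e = p_both_yes + p_both_no

kappa = (pr_a - pr_e) / (1 - pr_e)

# Simple large-sample standard error and 95% confidence interval
se_kappa = math.sqrt(pr_a * (1 - pr_a) / (n * (1 - pr_e) ** 2))
ci_95 = (kappa - 1.96 * se_kappa, kappa + 1.96 * se_kappa)
print(kappa, ci_95)
```

With these counts, Pr(a) = 0.70 and Pr(e) = 0.50, so κ = 0.40; the width of the interval shows how imprecise a kappa from only 50 items can be.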

If kappa values are below 0.60, the confidence intervals around the obtained kappa are so wide that one may assume about half of the data could be erroneous (10). Clearly, statistical significance does not mean much when there is so much error in the results being tested. The formula κ = (Pr(a) − Pr(e)) / (1 − Pr(e)) applies to agreement between two raters. If you have more than two raters, you must use a variant formula; in SAS, for example, the procedure for kappa is PROC FREQ, whereas for multiple raters you must use the SAS MAGREE macro. Calculating pe (the probability of chance agreement) is a simple procedure when the only possible scores are zero and one and there are two data collectors: each rater's marginal proportions of ones and zeros determine how often the two would be expected to agree by chance.
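The simple two-collector, zero/one case can be sketched as follows, reusing the hypothetical Mark and Susan scores from the percent-agreement discussion:

```python
# Hypothetical binary scores from the two data collectors
mark  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
susan = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]
n = len(mark)

# Each rater's marginal probability of scoring 1
p1_mark, p1_susan = sum(mark) / n, sum(susan) / n

# pe: chance both score 1 plus chance both score 0
pe = p1_mark * p1_susan + (1 - p1_mark) * (1 - p1_susan)

# Observed agreement po, then kappa itself
po = sum(m == s for m, s in zip(mark, susan)) / n
kappa = (po - pe) / (1 - pe)
print(pe, kappa)
```

Here the 80% raw agreement shrinks to a noticeably smaller kappa once chance agreement (pe = 0.52) is accounted for, which is the point of the statistic.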