The Guardian Newspaper has an interesting article about the Pisa (Program for International Student Assessment) scores for 2012, and it includes data. Since I was interested to see how my own region scored, I downloaded the data into a file called PISA-summary-2012.csv and created a plot summarizing scores in all the sampled regions, with Canada highlighted.

# Graphical summary

(Click the graph to see the full-size version.)

# Code that makes the graph

First, read the data and set up axes.

1
2
3
4
5
6
7
8
9
10
11
12
13
14

regionHighlight <- "Canada"
d <- read.csv('PISA-summary-2012.csv', skip=16, header=FALSE,
col.names=c("rank","region",
"math","mathLow","mathHigh","mathChange",
"reading",'readingChange',
'science','scienceChange'))
n <- length(d$math)
par(mar=c(0.5, 3, 0.5, 0.5), mgp=c(2, 0.7, 0))
range <- range(c(d$math, d$reading, d$science))
plot(c(0, 6), range,
type='n', xlab="", axes=FALSE,
ylab="PISA Score (2012)")
axis(2)
box()

Next, set parameters for label placement.

1
2
3
4

dy <- diff(par('usr')[3:4]) / 50
x0 <- 0
dx <- 1
cex <- 0.65

Show Mathematics scores. The gist is in the line containing the call to `approx()`

, followed by the one calling `segments()`

; this scheme draws lines between a numerical scale and evenly-spaced labels. Thus, the eye is guided not just to the order of the ranking, but also the differences between ranked elements. For example, there is a remarkable gap in each measure, between the top performer and the second-top one.

1
2
3
4
5
6
7
8

o <- order(d$math, decreasing=TRUE)
y <- approx(1:n, seq(range[2],range[1],length.out=n), 1:n)$y
segments(rep(x0, n), d$math[o], rep(x0+dx, n), y,
col=ifelse(d$region[o]==regionHighlight, "red", "gray"))
lines(rep(x0, 2), range(d$math))
text(rep(x0+dx, n), y, d$region[o], pos=4, cex=cex,
col=ifelse(d$region[o]==regionHighlight, "red", "black"))
text(x0+dx, range[2]+dy, "Maths", pos=4, cex=1.2)

Show Reading scores

1
2
3
4
5
6
7
8

x0 <- x0 + 2 * dx
o <- order(d$reading, decreasing=TRUE)
segments(rep(x0, n), d$reading[o], rep(x0+dx, n), y,
col=ifelse(d$region[o]==regionHighlight, "red", "gray"))
lines(rep(x0, 2), range(d$reading))
text(rep(x0+dx, n), y, d$region[o], pos=4, cex=cex,
col=ifelse(d$region[o]==regionHighlight, "red", "black"))
text(x0+dx, range[2]+dy, "Reading", pos=4, cex=1.2)

Finally, show Science scores.

1
2
3
4
5
6
7
8

x0 <- x0 + 2 * dx
o <- order(d$science, decreasing=TRUE)
segments(rep(x0, n), d$science[o], rep(x0+dx, n), y,
col=ifelse(d$region[o]==regionHighlight, "red", "gray"))
lines(rep(x0, 2), range(d$science))
text(rep(x0+dx, n), y, d$region[o], pos=4, cex=cex,
col=ifelse(d$region[o]==regionHighlight, "red", "black"))
text(x0+dx, range[2]+dy, "Science", pos=4, cex=1.2)

# CSV data used in this analysis

The data can be downloaded from a link given above, but it requires google login.