Monitoring sites in urban areas generally record higher PM2.5 levels than rural sites, and urban traffic sites record the highest values because of local emissions. The following box plots, with mean values shown as red lines, summarise hourly PM2.5 measurements between 2011 and 2015 for 34 urban background sites and 18 urban traffic sites across the UK:
Harwell, Oxfordshire, was the only rural background site in England that provided PM2.5 data over the same period. The mean value at this site was 10.8 µg/m3. The urban background mean was 12.3 µg/m3.
Harwell is in the south-east of England, where regional background levels are higher than in the rest of the UK. Although a single site mean cannot represent the regional background concentration, the fact that it is only 1.5 µg/m3 below the urban background mean still indicates that regional background levels are the dominant contributor to urban background levels. The mean value for the urban traffic (roadside) sites was 14.3 µg/m3, an increment of 2 µg/m3 over the urban background.
Data for the pairs of urban traffic and urban background sites within close proximity of each other is summarised as follows:
The pair showing the largest difference was Glasgow Centre and Glasgow Kerbside. These sites are situated close to each other, but the kerbside site is on a frequently congested road with built-up surroundings forming a street canyon, whereas the background site is within a pedestrianised area with open surroundings. The large difference between the London sites is likely due to similar reasons.
Analysis in Python
Seaborn box plots show distributions with respect to categories. To use its functions, data must be presented in one of two forms. The first is a collection of vectors, one per category, such as the columns of the table obtained by using pandas to read csv files downloaded from the DEFRA website (Seaborn refers to this as wide-form data). The resulting pandas DataFrame can be passed directly to the boxplot function in Seaborn.
The second form is a two-column table in which one column holds the quantitative data and the other holds the categorical data (long-form data). This form was used in the first of the two box plots above, where data from the different monitoring sites was categorised as either urban background or urban traffic.
This involved reading the csv files, one for each of the two site types, and extracting the data into a NumPy N-dimensional array. A 1D array could then be created with the numpy.ravel() function and assigned as a column in a new DataFrame. A second column containing the categorical data could similarly be created from a NumPy array, using a list comprehension to build the list of values for the type of monitoring site.
After combining the two DataFrames using the pandas concat function, the data is in the correct shape to create the box plots.
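The reshaping steps described above might look roughly like the following sketch. The two small DataFrames are synthetic stand-ins for the csv files, and the helper name to_long, the column names and the site names are assumptions made for illustration rather than the code used for the original plots.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic stand-ins for the two csv files (one per site type);
# columns are hypothetical site names, rows are hourly readings.
background = pd.DataFrame(
    rng.gamma(2.0, 6.0, size=(4, 3)), columns=["BG1", "BG2", "BG3"]
)
traffic = pd.DataFrame(
    rng.gamma(2.0, 7.0, size=(4, 2)), columns=["TR1", "TR2"]
)


def to_long(df, site_type):
    """Flatten all site columns into one value column plus a label column."""
    values = np.ravel(df.to_numpy())  # 2D array -> 1D array
    # A list comprehension builds one label per measurement.
    labels = np.array([site_type for _ in values])
    return pd.DataFrame({"PM2.5": values, "Site type": labels})


# Stack the two long-form tables on top of each other.
long_df = pd.concat(
    [to_long(background, "Urban background"), to_long(traffic, "Urban traffic")],
    ignore_index=True,
)
print(long_df.groupby("Site type")["PM2.5"].count())
```

The result has one row per measurement, so the site type column can be used directly as the categorical axis of a box plot.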
The ‘showmeans’ argument adds mean values to the plot, and the ‘meanline’ argument draws each mean as a line instead of a marker. To preserve a sensible scale and improve the clarity of the plot, passing ‘showfliers=False’ removes outlying data points.
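A sketch of how these arguments could be combined is shown below, using synthetic long-form data with assumed column names. Seaborn forwards these keyword arguments to the underlying matplotlib box plot; the meanprops setting used here to colour the mean line red is an assumption about how the red lines could be produced, not something stated above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)

# Synthetic long-form data: one value column and one category column.
long_df = pd.DataFrame(
    {
        "PM2.5": rng.gamma(2.0, 6.0, size=200),
        "Site type": ["Urban background"] * 100 + ["Urban traffic"] * 100,
    }
)

ax = sns.boxplot(
    data=long_df,
    x="Site type",
    y="PM2.5",
    showmeans=True,    # draw the mean of each category
    meanline=True,     # draw it as a line across the box
    meanprops={"color": "red", "linestyle": "-"},  # assumed styling for a red mean line
    showfliers=False,  # hide outlying points to keep the scale readable
)
ax.figure.savefig("site_type_boxplot.png")
```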