Using PaCeQuantAna to analyze and visualize feature values extracted from pavement cell microscopy images with PaCeQuant

Yvonne Poeschl

2018-10-26

PaCeQuant (http://mitobo.informatik.uni-halle.de/index.php/Applications/PaCeQuant) is a tool that provides a platform for fully automatic quantification of pavement cell shape features from confocal input images (Möller, Poeschl, and Bürstenbinder 2017). It runs as a plugin for the publicly available image analysis software ImageJ/Fiji (Schindelin et al. 2012) and provides a user-friendly graphical user interface (GUI) for intuitive data import and result generation. PaCeQuant includes an automatic image segmentation approach, which forms the basis for the extraction of cell contours in a high-throughput fashion. Additionally, PaCeQuant extracts (currently) 27 different shape parameters for each cell, covering global, contour- and skeleton-based, and pavement cell-specific characteristics. The R package PaCeQuantAna described here provides an R-based workflow for comparative statistical analysis and graphical visualization of these pavement cell shape features.

List of pavement cell shape features extracted by PaCeQuant:

Group A: Global features:
| ID | Name | Short Description | Unit |
|---|---|---|---|
| 1 | Area \(A(R)\) | size of cell region \(R\) | \(\mu m^2\) |
| 2 | Perimeter \(P(R)\) | length of contour of cell region \(R\) | \(\mu m\) |
| 3 | Length \(L(R)\) | length of major axis of ellipse fitted to \(R\) | \(\mu m\) |
| 4 | Width \(W(R)\) | length of minor axis of ellipse fitted to \(R\) | \(\mu m\) |
| 5 | Circularity \(C(R)\) | relationship of region area and perimeter | |
| 6 | Eccentricity \(E(R)\) | elongation of the cell region \(R\), the larger the more elongated | |
| 7 | Area of Convex Hull \(A_{CH}(R)\) | size of convex hull of cell region \(R\) | \(\mu m^2\) |
| 8 | Perimeter of Convex Hull \(P_{CH}(R)\) | length of contour of convex hull | \(\mu m\) |
| 9 | Roundness \(round(R)\) | circularity based on area of convex hull \(A_{CH}(R)\) instead of \(A(R)\) | |
| 10 | Convexity \(conv(R)\) | ratio of convex hull and region perimeters \(P_{CH}(R)\) and \(P(R)\) | |
| 11 | Solidity \(S(R)\) | quotient of region area \(A(R)\) and convex hull area \(A_{CH}(R)\) | |


Group B: Contour-based features:
| ID | Name | Short Description | Unit |
|---|---|---|---|
| 12 | Margin roughness \(MR(R)\) | deviation of local tangent orientations from a circle | |
| 13 | Average local contour concavity | average change in tangent orientations along contour | |
| 14 | Standard deviation of contour concavity | standard deviation of tangent orientations (cf. no. 13) | |


Group C: Skeleton-based features:
| ID | Name | Short Description | Unit |
|---|---|---|---|
| 15 | Longest skeleton path length | longest path length between any two end-points in region skeleton | \(\mu m\) |
| 16 | Branch count | total number of branches in cell region skeleton | |
| 17 | Average branch length | average length of all branches from an end-point to next branch point | \(\mu m\) |
| 18 | Average (branch) end-point distance | average distance of branch end-points to background | \(\mu m\) |


Group D: Pavement cell-specific features:
| ID | Name | Short Description | Unit |
|---|---|---|---|
| 19 | Lobe count | number of lobes based on local curvature analysis | |
| 20 | Average lobe length | average maximal distance from lobe baselines to contour | \(\mu m\) |
| 21 | Average apical lobe length | average distance from equators to contour | \(\mu m\) |
| 22 | Average basal lobe length | average distance from lobe baselines to equators | \(\mu m\) |
| 23 | Average basal lobe width | average length of lobe baselines | \(\mu m\) |
| 24 | Average equator lobe width | average length of lobe equators | \(\mu m\) |
| 25 | Non-lobe area | area of cell not being part of any lobe | \(\mu m^2\) |
| 26 | Minimal core width | width of narrowest part of non-lobe core of a cell | \(\mu m\) |
| 27 | Maximal core width | width of widest part of non-lobe core of a cell | \(\mu m\) |

Requirements

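To make use of all the functionality provided by the R package PaCeQuantAna, several additional R packages are required and must be installed beforehand. The example workflow at the end of this document loads caroline, gplots, dunn.test, multtest, RColorBrewer, and extrafont.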
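The following is a minimal sketch for checking which of these packages are still missing. The package names are taken from the library() calls in the example workflow; the installation commands in the comments are an assumption about a typical setup (multtest is distributed via Bioconductor, the others via CRAN) and are not run.

```r
# Packages loaded in the example workflow of this document
required <- c("caroline", "gplots", "dunn.test", "multtest",
              "RColorBrewer", "extrafont")

# TRUE for every package that can be loaded from the current library paths
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Assumed installation route (not run here):
  #   install.packages(setdiff(missing_pkgs, "multtest"))
  #   BiocManager::install("multtest")
} else {
  message("All required packages are installed.")
}
```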

How to work with PaCeQuantAna in general?

In this section we will explain how the functions contained in PaCeQuantAna can be used for statistical analysis and graphical visualization of shape features extracted by PaCeQuant. In the section Example workflow for the analysis of time series data we provide a runnable example workflow for the analysis of time series data.

PaCeQuantAna supports descriptive analysis, e.g., visualization of feature values in box or violin plots, and statistical analysis between different groups, e.g., different time points, species, genotypes or treatments. Input data must be organized in the folder structure established by PaCeQuant. The feature shape tables (“*-table.txt”) must be saved in a folder named “results”. An example data set is described in more detail in section Example time series data.

Set up the working environment

Load the PaCeQuantAna package with library(PaCeQuantAna).

Set the path to your preferred working directory with setwd("path"). By default the working directory is the currently selected directory.

Set the name of the output folder with setOutputFolder("name_of_folder"). All results (figures and tables) are stored below this output folder. When setOutputFolder("name_of_folder") is invoked, subfolders for storing the results in a structured way are generated, e.g. folders for box_plots, histograms or stats.

Set the path to the directory that contains the data fulfilling the PaCeQuant output structure with setDataDir("path_to_data").

Set up the meta data corresponding to the feature data

Read the file containing the meta data with readDescriptionFile("data_description.csv"). The function expects a “*.csv” file with the given name (here “data_description.csv”) located in the selected output folder. The file either has to be placed there manually by an experienced user, or it can be created with default values with createDescriptionFileTemplate(filename="my_data_description_file.csv"). The content of the data description file is organized as a table. It has as many rows as there are folders in the chosen data directory and five columns. The columns contain definitions that are used for producing plots, e.g. plot labels, plot order, colors or font shape (e.g. normal, italic or bold).

Structure of a data description file generated with createDescriptionFileTemplate()
| group | plot_labels | plot_order | plot_colors | plot_font |
|---|---|---|---|---|
| folder 1 | folder 1 | 1 | #FF0000FF | 1 |
| folder 2 | folder 2 | 2 | #00FF00FF | 1 |
| folder 3 | folder 3 | 3 | #0000FFFF | 1 |

The entries in the last four columns can be customized by the user to affect the look of the plots. Also, complete rows can be deleted if a certain species, time point or treatment should be excluded from analysis. The names in the first column have to correspond to the names of the folders (representing the groups) that contain the “results” folders and should not be changed.
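The kind of customization described above can be sketched in plain R on an in-memory copy of the template. This is only an illustration: the labels and colors are hypothetical, and in practice the edits would be made in the “*.csv” file itself before calling readDescriptionFile().

```r
# In-memory copy of the template table (values as in the template)
desc <- data.frame(
  group       = c("folder 1", "folder 2", "folder 3"),
  plot_labels = c("folder 1", "folder 2", "folder 3"),
  plot_order  = 1:3,
  plot_colors = c("#FF0000FF", "#00FF00FF", "#0000FFFF"),
  plot_font   = 1,
  stringsAsFactors = FALSE
)

# Exclude one group from the analysis by deleting its complete row
desc <- desc[desc$group != "folder 2", ]

# Customize the last four columns; the 'group' column stays untouched
# because it has to match the folder names
desc$plot_labels <- c("wild type", "mutant")       # hypothetical labels
desc$plot_order  <- c(2, 1)                        # plot "folder 3" first
desc$plot_colors <- c("#1B9E77FF", "#D95F02FF")    # hypothetical colors

print(desc)
```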

Loading and quality checking the data

Load cell shape feature data with data <- readFeatureData(). This function checks if a data description file was provided and loads data if it is in the correct structure. The function will read all “*-table.txt” files recursively from the data directory (and folders specified in the data description file). Each feature table file corresponds to an image and each line within a feature table corresponds to a cell. For each cell the corresponding image and region IDs, the name of the folder (which corresponds to a group e.g. species) and the feature values are stored in a list object named data. Names of the list elements will have the prefix raw.
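To illustrate the recursive lookup and the mapping from folders to groups, here is a minimal base-R sketch on a mock directory. It is not the implementation of readFeatureData(); the mock folder names and the helper variables are assumptions made for the example.

```r
# Build a small mock data directory in a temporary location
data_dir <- file.path(tempdir(), "time_series_mock")
for (grp in c("3DAG", "5DAG")) {
  res <- file.path(data_dir, grp, "results")
  dir.create(res, recursive = TRUE, showWarnings = FALSE)
  for (img in 1:2) {
    day <- sub("DAG", "", grp)
    file.create(file.path(res, sprintf("%s_%d-table.txt", day, img)))
  }
}

# Recursively collect all feature tables (paths relative to data_dir)
tables <- list.files(data_dir, pattern = "-table\\.txt$", recursive = TRUE)

# Group = top-level folder, image ID = file name without the suffix
overview <- data.frame(
  group      = sub("/.*$", "", tables),
  image      = sub("-table\\.txt$", "", basename(tables)),
  in_results = basename(dirname(tables)) == "results",
  stringsAsFactors = FALSE
)
print(overview)
```

A table whose in_results entry is FALSE would indicate a file that does not follow the expected folder structure.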

For a quality check of the data invoke printStatsOnCells(data,type="raw",printToFile=FALSE). This function prints the number of images per group, cells per group, and cells per image and group either to the console or to a file. The number of cells detected per image should be similar within a group and between different groups if all recommendations for proper image preparation and processing were followed, e.g. same conditions for taking the picture and same parameter settings in PaCeQuant. Images with a very low number of cells compared to the others should be inspected manually. A low number of detected cells can result from bad image quality or from non-optimal parameter settings during image processing with PaCeQuant.
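The idea behind this check can be sketched in base R on toy data. The table layout and the cut-off of half the median cell count are assumptions chosen for illustration; they are not taken from printStatsOnCells().

```r
# Toy per-cell table: one row per cell with its group and image
cells <- data.frame(
  group = rep(c("3DAG", "5DAG"), times = c(14, 25)),
  image = c(rep("3_1", 12), rep("3_2", 2), rep("5_1", 13), rep("5_2", 12)),
  stringsAsFactors = FALSE
)
cells$n_cells <- 1L

# Number of cells per image and group
per_image <- aggregate(n_cells ~ group + image, data = cells, FUN = sum)

# Number of cells per group
per_group <- aggregate(n_cells ~ group, data = cells, FUN = sum)

# Flag images with fewer than half the median cell count (arbitrary cut-off)
per_image$suspicious <- per_image$n_cells < 0.5 * median(per_image$n_cells)

print(per_group)
print(per_image)
```

In this toy example only image 3_2 is flagged, so it would be the candidate for manual inspection.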

Setting or determining cell size thresholds

Size thresholds (on cell areas) can be set, e.g., to remove small cells such as stomata cells, meristemoids or small pavement cells, which are not yet differentiated. By default all analysis procedures run without size filtering. An a priori size filtering can also be done with PaCeQuant. For cell size filtering two values are needed: first an upper bound for small cell sizes (small or \(t_s\)) and second an upper bound for medium size cells (medium or \(t_m\)). All cells having a size above \(t_m\) are referred to as large. There are different ways to set size thresholds; the example workflow at the end of this document, for instance, applies the package defaults with setDefaultSizeThresholds() and queries the current values with getSizeThresholds().

These size thresholds have different meanings depending on the context of follow-up analyses. In case of a time series analysis the thresholds can be used to define developmental stages of the pavement cells based on their sizes. In case of comparative analyses (which are also possible for a time series data set) the thresholds are used to filter out small cells and to retain cells of almost equal sizes.
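How two thresholds partition cells into three size categories can be sketched with base R's cut(). The threshold values are hypothetical and not necessarily the package defaults, and assigning a cell whose area equals a threshold to the lower category is an assumption.

```r
# Hypothetical thresholds in square microns
t_s <- 250    # upper bound for small cells
t_m <- 1500   # upper bound for medium cells

# Toy cell areas in square microns
area <- c(85, 250, 310, 1500, 1650, 5200)

# right = TRUE (the default) puts an area equal to a threshold
# into the lower category
size_class <- cut(area,
                  breaks = c(-Inf, t_s, t_m, Inf),
                  labels = c("small", "medium", "large"))
print(data.frame(area, size_class))

# For comparative analyses only cells above t_s are retained
filtered <- area[area > t_s]
small    <- area[area <= t_s]
```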

Analysis of developmental stages (cells of different size)

For studies on developmental stages time series data is recommended.

To visualize the distribution of cell sizes between different time points in a histogram use plotHistogramOnArea(data, type="raw", show_ts=TRUE, show_tm=TRUE). Earlier time points should contain more small cells than later time points.

The values of all 27 features within the defined size categories (developmental stages) and between the individual time points can be visualized in box plots using boxplotsOnDevelopmentalStages(data). This function produces one PDF file for each of the four feature groups. Each file contains a box plot per feature, with boxes grouped by time point and size category (developmental stage).
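The grouping behind such box plots can be sketched in base R by summarizing one feature per combination of time point and size category. All values and thresholds are toy data chosen for illustration; this is not the package's plotting code.

```r
# Toy data: four cells per time point with area and lobe count
cells <- data.frame(
  time_point = rep(c("3DAG", "5DAG", "7DAG"), each = 4),
  area       = c(90, 150, 300, 700,
                 120, 400, 900, 1600,
                 200, 800, 1700, 2600),
  lobe_count = c(0, 1, 3, 5,
                 1, 4, 6, 9,
                 2, 6, 10, 14)
)

# Hypothetical size thresholds in square microns
t_s <- 250
t_m <- 1500
cells$stage <- cut(cells$area, breaks = c(-Inf, t_s, t_m, Inf),
                   labels = c("small", "medium", "large"))

# Median lobe count per time point and developmental stage;
# combinations without cells are dropped
stage_medians <- aggregate(lobe_count ~ time_point + stage,
                           data = cells, FUN = median)
print(stage_medians)
```

Calling boxplot(lobe_count ~ stage + time_point, data = cells) on the same data frame would draw one box per combination.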

Comparative analyses of cells of the same developmental stage (cells of similar size)

For comparative analysis of shape feature values of cells of similar size (and the same developmental stage) between different groups, such as species, genotypes, time points or treatments, small cells have to be filtered out. The remaining cells should already have initiated lobe formation and expansion. Filter out stomata cells and pavement cells that are too small with data <- filterOutSmallCells(data). This function applies the previously defined threshold for small cells \(t_s\) and extends the list object data with new list elements that have the prefixes filtered and small. Nothing is discarded; the raw data is split into filtered and small.

To check the number of remaining cells for each group after filtering invoke printStatsOnCells(data,type="filtered",printToFile=FALSE).


Descriptive comparisons

For descriptive analyses the PaCeQuantAna package provides plot functions to visualize different aspects of the feature values. These plot functions generate almost publication-ready figures. The default font family is Arial. All plots are saved as *.pdf files so that they can be modified with external graphics editors such as Inkscape.

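Provided plot functions include plotHistograms() (histograms), plotViolinplots() (violin plots), plotBoxplots() (box plots), plotCumulativeDistributions() (cumulative distribution plots), and plotScatterplots() (scatter plots), as used in the example workflow at the end of this document.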

Notes:

All figures are stored in intuitively-named folders in the selected output folder.

All plot functions can be invoked on the types raw, filtered and small.


Statistical comparisons

Additionally, the PaCeQuantAna package provides functions for comparative statistical analyses. Statistical comparisons are done between all groups (defined in the data description file) by applying, for each feature, a Kruskal-Wallis test and post-hoc Dunn tests for all pairwise combinations of groups. These statistical tests can be performed by invoking kwDunnTest(data, type="filtered") on the data object.

Resulting p values are saved in two separate files, one for the Kruskal-Wallis tests and one for the Dunn tests. p values resulting from the Dunn tests are additionally visualized in a heatmap where the values are replaced by colors and significance is labeled with * padj < 0.05, ** padj < 0.01, *** padj < 0.001. Tables containing the p values and the heatmap are saved in the output folder.
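For a single feature, the test procedure can be sketched with kruskal.test() from base R and dunn.test() from the dunn.test package. The data are toy values, the Benjamini-Hochberg adjustment (method = "bh") is an assumption, and the helper stars() is hypothetical; none of this is taken from the implementation of kwDunnTest().

```r
# Toy feature values for three groups (clearly separated on purpose)
values <- c(1:10, 11:20, 21:30)
group  <- factor(rep(c("3DAG", "5DAG", "7DAG"), each = 10))

# Global Kruskal-Wallis test across all groups
kw <- kruskal.test(values ~ group)
print(kw$p.value)

# Post-hoc pairwise Dunn tests, only if the dunn.test package is available
if (requireNamespace("dunn.test", quietly = TRUE)) {
  dt <- dunn.test::dunn.test(values, group, method = "bh",
                             kw = FALSE, table = FALSE)
  print(data.frame(comparison = dt$comparisons, padj = dt$P.adjusted))
}

# Hypothetical helper: significance labels as used in the heatmap
stars <- function(p) {
  ifelse(p < 0.001, "***",
         ifelse(p < 0.01, "**",
                ifelse(p < 0.05, "*", "")))
}
print(stars(c(0.2, 0.03, 0.004, 0.0002)))
```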

Example time series data

Example time series data can be downloaded from http://mitobo.informatik.uni-halle.de/downloads/paceQuant/time_series.zip. The data set comprises three time points (3DAG, 5DAG, and 7DAG), two .lsm pictures per time point and the corresponding feature tables (*-table.txt) organized in the required structure. After downloading and unzipping the “time_series.zip” archive the following folder structure becomes visible:

time_series
|
-3DAG
| |
| -3_1.lsm
| |
| -3_2.lsm
| |
| -results
|  |
|  -...
|  |
|  -3_1-table.txt
|  |
|  -...
|  |
|  -3_2-table.txt
-5DAG
| |
| -5_1.lsm
| |
| -5_2.lsm
| |
| -results
|  |
|  -...
|  |
|  -5_1-table.txt
|  |
|  -...
|  |
|  -5_2-table.txt
-7DAG
  |
  -7_1.lsm
  |
  -7_2.lsm
  |
  -results
   |
   -...
   |
   -7_1-table.txt
   |
   -...
   |
   -7_2-table.txt

The three folders 3DAG, 5DAG, and 7DAG, which contain the time point-specific feature data, represent three groups. The definition of groups is relevant for visualization and statistical testing.

Example workflow for the analysis of time series data

The workflow is runnable. Users only need to adjust the working directory and the data directory to their specific locations.

library(PaCeQuantAna)

# required packages need to be installed
library(caroline)
library(gplots)
library(dunn.test)
library(multtest)
library(RColorBrewer)
# extrafont is needed for using Arial as font family in the plots
library(extrafont)
fonts()

# set working directory
setwd("~/projects/time_series_analysis")

# set output folder
# all plots and results are stored within the "OutputFolder"
# default is "output", it is created if it does not exist in the current working directory
setOutputFolder("out")

# path to the directory containing the feature tables
# either all folders within the "DataDir" or a subset defined within the "DescriptionFile" is included in the analyses
setDataDir(name = "~/projects/data/time_series")

# the "DescriptionFile" must be located in the "OutputFolder"
# option 1: reads an existing (e.g. customized) "DescriptionFile"
# readDescriptionFile(filename="my_data_description_file.csv")

# option 2: creates a "DescriptionFile" with defaults
createDescriptionFileTemplate(filename="my_data_description_file.csv")
# users can modify the newly created "DescriptionFile"
# reads the newly created "DescriptionFile"
readDescriptionFile(filename="my_data_description_file.csv")

# reads the "FeatureData" which are the feature tables generated by PaCeQuant
data <- readFeatureData()

# calculates and prints stats on the number of images and cells
printStatsOnCells(data=data,type="raw",printToFile=TRUE)

# default unit is microns
# sets the default size thresholds
setDefaultSizeThresholds()
getSizeThresholds()

# detailed histogram on cell sizes (areas)
plotHistogramOnArea(data=data, type="raw", show_ts=TRUE, show_tm=FALSE,legend=1)

# for studies on developmental stages time series data is recommended
# box plots for visual inspection of developmental stages
boxplotsOnDevelopmentalStages(data=data)

# filters out small cells for comparative analysis of similar sized cells that have initiated lobe formation and expansion already
data <- filterOutSmallCells(data)

# calculates and prints stats on the number of images and cells
printStatsOnCells(data=data,type="filtered",printToFile=TRUE)
printStatsOnCells(data=data,type="small",printToFile=TRUE)

# detailed histogram on cell sizes (areas)
plotHistogramOnArea(data=data, type="raw", show_ts=TRUE, show_tm=FALSE,legend=2)
plotHistogramOnArea(data=data, type="filtered", show_ts=TRUE, show_tm=FALSE,legend=2)
#plotHistogramOnArea(data=data, type="small", show_ts=TRUE, show_tm=FALSE,legend=1)

# histograms of all cell shape features
plotHistograms(data=data,type="filtered",legend = 1)
# violin plots of all cell shape features
plotViolinplots(data=data,type="filtered",mai_bottom = 1)
# box plots of all cell shape features
plotBoxplots(data=data,type="filtered",mai_bottom = 1)
# cumulative distribution plots of all cell shape features
plotCumulativeDistributions(data=data,type="filtered")
# scatter plots of all cell shape features
plotScatterplots(data=data,type="filtered")

# statistical comparative analyses
kwDunnTest(data=data,type="filtered")

References

Möller, B., Y. Poeschl, and K. Bürstenbinder. 2017. “PaCeQuant: A Tool for High-Throughput Quantification of Pavement Cell Shape Characteristics.” Plant Physiology 175: 998–1017.

Schindelin, J., I. Arganda-Carreras, E. Frise, V. Kaynig, and M. Longair. 2012. “Fiji: An Open-Source Platform for Biological-Image Analysis.” Nature Methods 9: 676–82.