Equipt: qPCR Analysis

Equipt is a suite of tools to make quantitative polymerase chain reaction (qPCR) analysis easier, faster, and more reproducible. qPCR instrument software generally has useful tools for calling Ct values and performing melting curve analyses, but it often falls short when those values are needed for more complicated calculations. Such calculations generally require the user to manually label wells in clunky point-and-click software, which is neither convenient nor reproducible.

Most researchers I know either use Excel for qPCR analysis or write custom scripts in Python or R every time they run an experiment. This is tedious, and using Excel for data analysis is notoriously error-prone.

Equipt solves these problems by:

  1. Enforcing a set of standardized layouts for 96- or 384-well plates.

  2. Reproducibly labeling wells such that errors are easily detected.

  3. Using these standardizations in modules for common qPCR analyses.
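
To make the idea of a standardized layout concrete, here is a minimal sketch of how wells could be labeled programmatically. It is not equipt's implementation: the function names `plate_wells` and `label_wells` are hypothetical, and the scheme (row-major filling with replicates in adjacent wells) is an assumption chosen for illustration.

```python
from itertools import product
from string import ascii_uppercase


def plate_wells(n_rows=8, n_cols=12):
    """Return well positions in row-major order (A1, A2, ..., H12 for 96 wells)."""
    return [f"{ascii_uppercase[r]}{c}"
            for r in range(n_rows)
            for c in range(1, n_cols + 1)]


def label_wells(primers, samples, reps, n_rows=8, n_cols=12):
    """Map each well to a (primer, sample) pair, replicates in adjacent wells.

    Hypothetical scheme: iterate primers, then samples, then replicates.
    """
    wells = plate_wells(n_rows, n_cols)
    needed = len(primers) * len(samples) * reps
    if needed > len(wells):
        raise ValueError(f"{needed} wells needed but the plate has {len(wells)}")
    # Repeat each (primer, sample) pair once per replicate
    labels = [pair for pair in product(primers, samples) for _ in range(reps)]
    return dict(zip(wells, labels))


layout = label_wells(["18S", "Polr2a"], ["Scramble-1", "Scramble-2"], reps=3)
print(layout["A1"], layout["A4"], layout["A7"])
```

Because the labels are derived from the same short lists every time, a mislabeled well shows up as a mismatch between the lists and the plate rather than as a silent typo.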

Using equipt will make your qPCR analysis mostly painless and entirely reproducible. For full details, see equipt’s documentation. Below are some basic examples of equipt’s usage.

Efficiency Testing

A necessary task when using qPCR is determining the efficiency of each primer set. Equipt’s documentation has a more detailed explanation of primer efficiency and equipt’s functions for efficiency analysis, but here is a basic example:

[1]:
import equipt
import bokeh.io

bokeh.io.output_notebook()

First, import the data and automatically label samples, primers, and dilutions:

[2]:
primers = ['Fus (112734868c1)',
         'Fus (15029724a1)',
         'Ewsr1 (6679715a1)',
         'Ewsr1 (88853580c2)',
         'Taf15 (141803447c1)',
         'Taf15 (141803447c2)',
         'Tsix exon4']

samples = ['mESC total cDNA']

reps = 3

config = 'line'

kwargs = {'with_dil':samples,
         'dil_series':[20,40,80,160],
         'dil_rest':None}

df = equipt.namer('22.11.22_PrimerCurve_Ct.csv',
            primers,
            samples,
            reps,
            config,
            **kwargs)

df.iloc[:6]
[2]:
Pos Cp Primer Name NamePrim
0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)
4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)
5 A6 18.54 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)

Equipt will then calculate the efficiency values and the coefficient of determination (Rsquared) of the standard curve:

[3]:
# Kwargs to remove outlier wells
eff_kwargs = {'thresh':0.1,
             'reps':3}

plot_dict, eff_df = equipt.efficiency(df,
                               samples,
                               returnmodel=False,
                               **eff_kwargs)

eff_df
[3]:
Name Primer Efficiency Rsquared
0 mESC total cDNA Ewsr1 (6679715a1) 0.953 0.934628
1 mESC total cDNA Ewsr1 (88853580c2) 0.976 0.934442
2 mESC total cDNA Fus (112734868c1) 0.952 0.921493
3 mESC total cDNA Fus (15029724a1) 0.997 0.932829
4 mESC total cDNA Taf15 (141803447c1) 1.023 0.925023
5 mESC total cDNA Taf15 (141803447c2) 1.000 0.933382
6 mESC total cDNA Tsix exon4 1.108 0.929611
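
For readers who want to see what an efficiency value means, the following is a sketch of the textbook calculation from a dilution series. It is not taken from equipt's source, so equipt's own fitting and outlier handling may differ; the function name `primer_efficiency` and the Ct values are made up for illustration. The idea is to fit Ct against log10 of the dilution factor and convert the slope to an efficiency, where 1.0 corresponds to perfect doubling each cycle.

```python
import numpy as np


def primer_efficiency(dilution_factors, ct_values):
    """Estimate efficiency and R^2 from a standard curve (illustrative only)."""
    log_dil = np.log10(np.asarray(dilution_factors, dtype=float))
    ct = np.asarray(ct_values, dtype=float)

    # Linear fit of Ct against log10(dilution factor)
    slope, intercept = np.polyfit(log_dil, ct, 1)

    # Ct rises with dilution factor, so the slope is positive here.
    # A slope of 1/log10(2) (about 3.32) corresponds to perfect doubling.
    efficiency = 10 ** (1 / slope) - 1

    # Coefficient of determination of the linear fit
    predicted = slope * log_dil + intercept
    ss_res = np.sum((ct - predicted) ** 2)
    ss_tot = np.sum((ct - ct.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot
    return efficiency, r_squared


# Made-up series: Ct increases by exactly one cycle per two-fold dilution
eff, r2 = primer_efficiency([20, 40, 80, 160], [17.5, 18.5, 19.5, 20.5])
print(round(eff, 3), round(r2, 3))
```

If the fit is done against log10 of concentration instead of dilution factor, the slope changes sign and the equivalent formula is 10 ** (-1 / slope) - 1.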

We can see that in this experiment all of the primers except Tsix exon4 (efficiency 1.108) have acceptable efficiency values; a range of roughly 0.9 to 1.1 is a common rule of thumb. Because we applied an outlier filter, we might want to know which wells were removed. This is recorded in a log file:

[4]:
with open('droppedWells.txt','r') as f:
    print(f.read())
Outlier Wells Dropped:
mESC total cDNA_160Fus (112734868c1) - 1

Samples Removed:

Only one well was dropped, and it did not belong to the primer set with poor efficiency. Equipt also outputs plots of the standard curves, so we can check whether Tsix exon4 had some other experiment-specific issue that could explain the poor efficiency:

[5]:
bokeh.io.show(plot_dict['mESC total cDNA'][-1])

The replicate wells are all tightly clustered and the Ct values are reasonable. From this we can conclude that no obvious experiment-specific issue caused the poor efficiency value, and it may be unwise to use this primer set.

ΔΔCt Analysis

ΔCt and ΔΔCt are common metrics for examining changes in gene expression or enrichment between two samples. Equipt allows for easy calculation of these values with built-in outlier removal and error propagation. In this example, we will look at gene expression of four genes following 24 hours of siRNA treatment.
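
As a reminder of what these quantities are, here is a minimal sketch of the standard arithmetic. It is not equipt's implementation: the helper names are hypothetical, the Ct values are invented, averaging housekeeping Ct values with an arithmetic mean is an assumption, and the fold change uses base 2, which assumes roughly 100% primer efficiency.

```python
from statistics import mean


def delta_ct(target_cts, housekeeping_cts):
    """dCt: mean Ct of the target minus mean Ct of the housekeeping gene(s)."""
    return mean(target_cts) - mean(housekeeping_cts)


def delta_delta_ct(exp_dct, ctrl_dct):
    """ddCt: dCt of the experimental sample minus dCt of its control."""
    return exp_dct - ctrl_dct


def fold_change(ddct):
    """Fold change relative to control, assuming perfect doubling per cycle."""
    return 2 ** (-ddct)


# Invented replicate Ct values for one target gene and one housekeeping gene
exp_dct = delta_ct([24.0, 24.5, 23.5], [15.0, 15.0, 15.0])   # knockdown sample
ctrl_dct = delta_ct([22.0, 22.0, 22.0], [15.0, 15.0, 15.0])  # scramble control
ddct_value = delta_delta_ct(exp_dct, ctrl_dct)
fc = fold_change(ddct_value)
print(exp_dct, ctrl_dct, ddct_value, fc)
```

The FoldChange column in the deltact() output shown later in this section is consistent with this convention: for example, a ddCt of 2.218333 gives 2 ** (-2.218333), which is about 0.2149.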

As before, we start by importing the data with namer():

[6]:
primers = ('18S','Polr2a','Hdac3','Fus','Ewsr1','Taf15')

samples = ['Scramble-1','Scramble-2',
           'Hdac3-1','Hdac3-2',
           'Fus-1','Fus-2',
           'Taf15-1',
           'Ewsr1-1','Ewsr1-2',
           'Taf15-2']
reps=3

df = equipt.namer('23.01.31_FET-siRNAs_Ct.csv',
             primers,
             samples,
             reps,
             config='line',
             )

df.head()
[6]:
Pos Cp Primer Name NamePrim
0 A1 11.23 18S Scramble-1 Scramble-118S
1 A2 11.20 18S Scramble-1 Scramble-118S
2 A3 11.22 18S Scramble-1 Scramble-118S
3 A4 11.29 18S Scramble-2 Scramble-218S
4 A5 11.31 18S Scramble-2 Scramble-218S

Then make a dictionary that maps each experimental sample to its control sample and run deltact():

[7]:
housekeeping = ['18S','Polr2a']

exp_ctrl = {'Hdac3-1':'Scramble-1',
            'Hdac3-2':'Scramble-2',
            'Fus-1':'Scramble-1',
            'Fus-2':'Scramble-2',
            'Taf15-1':'Scramble-1',
            'Ewsr1-1':'Scramble-1',
            'Ewsr1-2':'Scramble-2',
            'Taf15-2':'Scramble-2'}

ddct = equipt.deltact(df,
            housekeeping,
            reps,
            dilution=None,
            thresh=0.1,
            exp_ctrl=exp_ctrl,
            foldchange=True
            )

ddct.head()
[7]:
Experimental Control Primer Exp dCt ddCt StdErr FoldChange
0 Hdac3-1 Scramble-1 Ewsr1 3.770000 0.041667 0.049944 0.971532
1 Hdac3-1 Scramble-1 Fus 3.780000 0.098333 0.027183 0.934111
2 Hdac3-1 Scramble-1 Hdac3 9.210000 2.218333 0.080863 0.214889
3 Hdac3-1 Scramble-1 Taf15 5.796667 0.028333 0.046488 0.980552
4 Hdac3-2 Scramble-2 Ewsr1 3.646667 0.093333 0.054160 0.937354

The calculated values are immediately available in tabular format. For plotting, only some minor modifications to the dataframe are necessary:

[8]:
# Import plotting packages
import holoviews as hv
hv.extension('bokeh')
[11]:
# Remove replicate identifier
ddct['Experimental'] = [i.split('-')[0] for i in ddct['Experimental']]

# Plot
boxwhisker = hv.BoxWhisker(data=ddct,
                            kdims=['Experimental', 'Primer'],
                            vdims=['FoldChange'],
                        ).opts(width=600,
                               xrotation=45,
                            box_color='Primer',
                            cmap={'Fus':'#F58817',
                                 'Ewsr1':'#1B76B7',
                                 'Taf15':'#BD7DAC',
                                 'Hdac3':'#4AA177'}
                        )

boxwhisker
[11]:

Here, the lowest categorical axis indicates the primer, and the one above it indicates the siRNA. The y-axis represents the fold change over a scramble siRNA control. All four siRNAs show knockdown of their target genes but not the other transcripts examined.

Computing Environment

[12]:
%load_ext watermark
%watermark -v -p equipt,jupyterlab,bokeh,holoviews
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.12.0

equipt    : 1.0.0
jupyterlab: 3.6.3
bokeh     : 3.1.1
holoviews : 1.16.2