deltact

ΔΔCt is the most common metric for quantifying gene expression changes by qPCR. Say you have a group of genes that you hypothesize varies between an experimental sample and a control. In both samples, you perform qPCR for both the genes of interest and some genes that you expect to be constant between the samples. These constant genes, often called housekeeping genes, are essential to the viability of the cell. 18S rRNA, RNA polymerase II, and GAPDH are common examples.

To normalize for differences in cDNA input, subtract the average housekeeping gene Ct from the gene of interest Ct to get the ΔCt:

\[Ct_{gene \: of \: interest} - Ct_{housekeeping} = \Delta Ct\]

Performing this calculation for both the experimental and control samples leaves you with two Ct values for each gene of interest:

\[\begin{split}\Delta Ct_{experimental} \\ \Delta Ct_{control}\end{split}\]

Subtracting the control ΔCt from the experimental yields the ΔΔCt:

\[\Delta Ct_{experimental} - \Delta Ct_{control} = \Delta \Delta Ct\]

The ΔΔCt is then the normalized cycle difference of the gene of interest between the experimental and control sample. Assuming that the efficiencies of all the primers are approximately 1, the ΔΔCt can be used to calculate fold change in gene expression:

\[Fold \: change = 2^{- \Delta \Delta Ct}\]

So say that the ΔΔCt value is 1. This indicates that the PCR reaction in the experimental sample took 1 additional cycle to achieve the second derivative maximum than the control sample. Assuming perfect doubling with each cycle, the experimental sample should have half as much of the target transcript as the control:

\[2^{-1} = 0.5\]

It is worth emphasizing that this method is semiquantitative, as it assumes primer efficiencies of 1. More advanced mathematical models than equipt currently uses can incorporate empirically determined efficiency values to gain true quantitative gene expression ratios. The “Further Reading” section of this document provides additional information on the complexities of relative gene expression using qPCR.

Importing data with namer

As with all equipt tools, using deltact() begins with labeling the data using namer(). For this example, we will examine a dataset for siRNA validation of four genes:

[2]:

import equipt

[3]:

primers = ('18S','Polr2a','Hdac3','Fus','Ewsr1','Taf15')

samples = ['Scramble-1','Scramble-2',
           'Hdac3-1','Hdac3-2',
           'Fus-1','Fus-2',
           'Taf15-1',
           'Ewsr1-1','Ewsr1-2',
           'Taf15-2']
reps=3

df = equipt.namer('data/23.01.31_FET-siRNAs_Ct.csv',
             primers,
             samples,
             reps,
             config='line',
             )

df.head()

[3]:

	Pos	Cp	Primer	Name	NamePrim
0	A1	11.23	18S	Scramble-1	Scramble-118S
1	A2	11.20	18S	Scramble-1	Scramble-118S
2	A3	11.22	18S	Scramble-1	Scramble-118S
3	A4	11.29	18S	Scramble-2	Scramble-218S
4	A5	11.31	18S	Scramble-2	Scramble-218S

In this case, each sample had two biological replicates. Equipt does not currently perform statistical analyses, but having an indicator of the biological replicate in the samples list is required to prevent equipt from merging them into a single sample.

Using deltact

deltact() takes seven parameters. Their documentation is reproduced here:

Params
______

ct_data : a dataframe
    Output of namer().

housekeeping : list
    A list of housekeeping primers.

reps : int
    The number of replicate wells per sample-primer pair in the experiment.

dilution : int or None
    If all samples have the same dilution factor, set to None. If samples
    have different dilution factors, set to the dilution factor of samples
    to be included in the dCt analysis. Default None

thresh : float or None
    Highest acceptable standard deviation of replicate wells. If None, no
    thresholding is performed. If float, divergent wells are removed until
    replicate wells have acceptable deviation, and the number of removed
    wells for each sample primer pair is sent to droppedWells.txt.

exp_ctrl : dictionary or None
    A dictionary containing pairs of experimental and control samples for
    ddCt analysis, e.g. {'+Drug1':'DMSO','+Drug2':'DMSO'}. If None, only
    dCt will be returned.

foldchange : Bool
    Whether to convert ddCt values to fold change. Requires a valid
    dictionary for exp_ctrl.

The function has a single output, a dataframe containing the requested values:

Returns
_______

output : a dataframe
    A Pandas DataFrame containing the specified calculations.

ΔCt analysis

In some cases, only the ΔCt value is necessary for analysis. For example, when using qPCR for restriction fragment length polymorphism (RFLP) analysis, a ΔCt value comparing plus and minus restriction enzyme in the same sample is often sufficient. As such, equipt has the option to stop after calculating ΔCt:

[5]:

# Identify reference genes
housekeeping = ['18S','Polr2a']

dct = equipt.deltact(df,
            housekeeping,
            reps,
            dilution=None,
            thresh=0.1,
            exp_ctrl=None,
            foldchange=False
            )

dct.head()

[5]:

	Name	Primer	dCt	StdErr
0	Ewsr1-1	Ewsr1	4.455000	0.112151
1	Ewsr1-1	Fus	2.981667	0.057155
2	Ewsr1-1	Hdac3	6.718333	0.061644
3	Ewsr1-1	Taf15	5.395000	0.071957
4	Ewsr1-2	Ewsr1	4.808333	0.029627

In this dataframe, we can see that for each sample a ΔCt value has been calculated between the indicated primer and the arithmetic mean Ct of the indicated housekeeping genes. The standard error is calculated from the standard deviations of individual primers:

\[\begin{split}\sigma_{housekeeping} = 0.5 * \sqrt{\sigma_{18S} + \sigma_{Polr2a}} \\ \sigma_{\Delta Ct} = 0.5 * \sqrt{\sigma_{housekeeping} + \sigma_{gene \: of \: interest}}\end{split}\]

ΔΔCt analysis

For ΔΔCt analysis, equipt requires an additional parameters to identify the experimental and control samples:

[6]:

housekeeping = ['18S','Polr2a']

exp_ctrl = {'Hdac3-1':'Scramble-1',
            'Hdac3-2':'Scramble-2',
            'Fus-1':'Scramble-1',
            'Fus-2':'Scramble-2',
            'Taf15-1':'Scramble-1',
            'Ewsr1-1':'Scramble-1',
            'Ewsr1-2':'Scramble-2',
            'Taf15-2':'Scramble-2'}

ddct = equipt.deltact(df,
            housekeeping,
            reps,
            dilution=None,
            thresh=0.1,
            exp_ctrl=exp_ctrl,
            foldchange=True
            )

ddct.head()

[6]:

	Experimental	Control	Primer	Exp dCt	ddCt	StdErr	FoldChange
0	Hdac3-1	Scramble-1	Ewsr1	3.770000	0.041667	0.049944	0.971532
1	Hdac3-1	Scramble-1	Fus	3.780000	0.098333	0.027183	0.934111
2	Hdac3-1	Scramble-1	Hdac3	9.210000	2.218333	0.080863	0.214889
3	Hdac3-1	Scramble-1	Taf15	5.796667	0.028333	0.046488	0.980552
4	Hdac3-2	Scramble-2	Ewsr1	3.646667	0.093333	0.054160	0.937354

Equipt has performed the desired calculations. It also labels which experimental and control pairs it used so that we can easily check whether we made any errors in the exp_ctrl dictionary. Standard error for ΔΔCt is calculated from the ΔCt values:

\[\sigma_{\Delta \Delta Ct} = 0.5 * \sqrt{\sigma_{{\Delta Ct}_{experimental}} + \sigma_{{\Delta Ct}_{experimental}}}\]

Equipt does not return standard errors for fold change as the log-transformation makes these values difficult to meaningfully interpret. Examining the spread of fold changes among biological replicates is typically more meaningful than looking at the technical variation of an individual fold change (assuming that the error of the underlying ΔΔCt is acceptably low for the effect size). Because of these challenges, reporting the ΔΔCt values directly is often more honest to the data than reporting fold changes. Nonetheless, journals and peer reviewers generally expect to see fold changes, so equipt provides the option to output them automatically.

Now that the desired values are available in a tidy dataframe, plotting the data with a high level data visualization package such as Holoviews is simple:

[7]:

# Import plotting packages
import holoviews as hv
import bokeh.io

hv.extension('bokeh')
bokeh.io.output_notebook()

Loading BokehJS ...

[8]:

# Remove replicate identifier
ddct['Experimental'] = [i.split('-')[0] for i in ddct['Experimental']]

# Plot
boxwhisker = hv.BoxWhisker(data=ddct,
                            kdims=['Experimental', 'Primer'],
                            vdims=['FoldChange'],
                        ).opts(width=600,
                               xrotation=45,
                            box_color='Primer',
                            cmap={'Fus':'#F58817',
                                 'Ewsr1':'#1B76B7',
                                 'Taf15':'#BD7DAC',
                                 'Hdac3':'#4AA177'}
                        )

boxwhisker

[8]:

Here, the lowest categorical axis indicates the primer, and the one above it indicates the siRNA. The y-axis represents the fold change over a scramble siRNA control. All four siRNAs show knockdown of their target genes but not the other transcripts examined.

Computing Environment

[9]:

%load_ext watermark
%watermark -v -p equipt,jupyterlab,bokeh,holoviews

Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.12.0

equipt    : 1.0.0
jupyterlab: 3.6.3
bokeh     : 3.1.1
holoviews : 1.16.2