averager

averager() lets the user check the reproducibility of replicate wells and systematically remove outlier wells. The function can be used on its own but is also wrapped into other functions such as deltact().

Using averager

The first input for averager() is the output of namer(). Here we use the primer efficiency data from the namer() documentation:

[9]:
import equipt

primers = ['Fus (112734868c1)',
         'Fus (15029724a1)',
         'Ewsr1 (6679715a1)',
         'Ewsr1 (88853580c2)',
         'Taf15 (141803447c1)',
         'Taf15 (141803447c2)',
         'Tsix exon4']

samples = ['mESC total cDNA']

reps = 3

config = 'line'

kwargs = {'with_dil':samples,
         'dil_series':[20,40,80,160],
         'dil_rest':None}

df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',
            primers,
            samples,
            reps,
            config,
            **kwargs)

df.head()
[9]:
Pos Cp Primer Name NamePrim
0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)
4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)

Notice that namer() has supplied the column ‘NamePrim’. ‘NamePrim’ allows averager() to identify technical replicates by looking for wells with identical sample-primer pairs.
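To illustrate how a shared ‘NamePrim’ value can mark technical replicates, the sketch below groups the five wells from the output above by that column. It uses plain Python rather than equipt or pandas, and the names wells and replicates are illustrative assumptions. It is not how averager() itself is implemented.

```python
from collections import defaultdict

# Rows copied from the df.head() output above: (Pos, Cp, NamePrim)
wells = [
    ("A1", 17.51, "mESC total cDNA_20Fus (112734868c1)"),
    ("A2", 17.54, "mESC total cDNA_20Fus (112734868c1)"),
    ("A3", 17.55, "mESC total cDNA_20Fus (112734868c1)"),
    ("A4", 18.49, "mESC total cDNA_40Fus (112734868c1)"),
    ("A5", 18.52, "mESC total cDNA_40Fus (112734868c1)"),
]

# Wells with an identical sample-primer label are treated as replicates
replicates = defaultdict(list)
for pos, cp, name_prim in wells:
    replicates[name_prim].append(pos)

for name_prim, positions in replicates.items():
    print(name_prim, positions)
```

Wells A1 to A3 fall into one group and A4 and A5 into another, since each group shares a single sample-primer label.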

averager() takes four parameters. Their documentation is reproduced below:

Params
______

ct_data : a dataframe
    Output of namer().

reps : int
    The number of replicate wells in the sample. Used to flag sample-
    primer pairs where more than half the wells have been removed. Is
    not used if thresh == None.

thresh : float or None
    Highest acceptable standard deviation for a set of sample-primer
    replicate wells. If set to None no wells are removed. Default 0.1

update_data : Bool
    Whether to alter the input dataframe in place or to leave it unaffected.

The simplest use of averager() is to calculate the mean Ct values of replicates and their standard deviation:

[2]:
avg_df = equipt.averager(df,
                        3,
                        thresh=None)

avg_df.head()
[2]:
Primer Name NamePrim AvgCt StdCt
0 Ewsr1 (6679715a1) mESC total cDNA_160 mESC total cDNA_160Ewsr1 (6679715a1) 21.996667 0.016997
1 Ewsr1 (88853580c2) mESC total cDNA_160 mESC total cDNA_160Ewsr1 (88853580c2) 21.000000 0.043205
2 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1) 21.040000 0.558629
3 Fus (15029724a1) mESC total cDNA_160 mESC total cDNA_160Fus (15029724a1) 20.700000 0.029439
4 Taf15 (141803447c1) mESC total cDNA_160 mESC total cDNA_160Taf15 (141803447c1) 22.020000 0.008165

The head of avg_df shows that, of the first five sample-primer pairs, one had an unusually high standard deviation: Fus (112734868c1) paired with the 1:160 cDNA dilution. If we look in the original dataframe, we can see the errant well:

[3]:
df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']
[3]:
Pos Cp Primer Name NamePrim
9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)
10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)
11 A12 21.83 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)
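
The AvgCt and StdCt reported for this pair can be checked by hand from the three Cp values above. The sketch below uses only Python's statistics module. Using the population standard deviation (pstdev) is an assumption on our part, chosen because it reproduces the 0.558629 shown in avg_df; the sample standard deviation would give a larger value.

```python
from statistics import mean, pstdev

# Cp values of wells A10, A11 and A12 from the table above
cp_values = [20.64, 20.65, 21.83]

avg_ct = mean(cp_values)
std_ct = pstdev(cp_values)  # population standard deviation (divides by n)

print(round(avg_ct, 6), round(std_ct, 6))
```

Dropping the 21.83 value and repeating the calculation on the remaining two wells gives a mean of 20.645 and a standard deviation of 0.005.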

Removing outliers

Two wells are within 0.01 cycles of one another while the third diverges by more than 1 cycle, presumably due to pipetting error. This well could be removed manually, but averager() can also do so automatically by searching for the replicate with the highest divergence from the others:

[4]:
avg_df = equipt.averager(df,
                        3,
                        thresh=0.1)

avg_df.head()
[4]:
Primer Name NamePrim AvgCt StdCt
0 Ewsr1 (6679715a1) mESC total cDNA_160 mESC total cDNA_160Ewsr1 (6679715a1) 21.996667 0.016997
1 Ewsr1 (88853580c2) mESC total cDNA_160 mESC total cDNA_160Ewsr1 (88853580c2) 21.000000 0.043205
2 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1) 20.645000 0.005000
3 Fus (15029724a1) mESC total cDNA_160 mESC total cDNA_160Fus (15029724a1) 20.700000 0.029439
4 Taf15 (141803447c1) mESC total cDNA_160 mESC total cDNA_160Taf15 (141803447c1) 22.020000 0.008165

Now the standard deviation for that sample-primer pair has been reduced to 0.005. averager() records this removal in a log file:

[5]:
with open('droppedWells.txt','r') as f:
    print(f.read())
Outlier Wells Dropped:
mESC total cDNA_160Fus (112734868c1) - 1

Samples Removed:

In this case, only one well was removed. If more than half of the replicate wells for a given sample-primer pair are removed, averager() drops that pair from the dataframe entirely and records it in the log file under ‘Samples Removed:’. The threshold standard deviation for exclusion is the user's choice, but a sample-primer pair for which two of three replicate wells have been removed should never be analyzed.
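
The removal logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not equipt's actual implementation: drop_outliers is a hypothetical helper, "divergence" is taken to mean the distance of a well from the mean of the other wells, the population standard deviation is used, and removal repeats until the standard deviation is no greater than thresh or a single well remains.

```python
from statistics import mean, pstdev


def drop_outliers(cp_values, reps, thresh=0.1):
    """Hypothetical helper illustrating threshold-based well removal."""
    kept = list(cp_values)
    dropped = []
    while len(kept) > 1 and pstdev(kept) > thresh:
        # Distance of well i from the mean of the other remaining wells
        def divergence(i):
            others = kept[:i] + kept[i + 1:]
            return abs(kept[i] - mean(others))

        worst = max(range(len(kept)), key=divergence)
        dropped.append(kept.pop(worst))
    # Flag the pair when more than half of the expected replicates are gone
    sample_removed = len(dropped) > reps / 2
    return kept, dropped, sample_removed


kept, dropped, sample_removed = drop_outliers([20.64, 20.65, 21.83], reps=3)
print(kept, dropped, sample_removed)
```

For the three Fus (112734868c1) wells at the 1:160 dilution, this sketch drops only the 21.83 well and does not flag the pair. A set such as 20.0, 21.0, 22.0 would lose two of its three wells and be flagged for removal.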

Updating input dataframe

Note that by default averager() does not remove wells in place, so the original dataframe remains unaffected:

[6]:
df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']
[6]:
Pos Cp Primer Name NamePrim
9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)
10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)
11 A12 21.83 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)

In general, it is not advisable to remove replicates from the original dataframe. In some cases (such as the function efficiency()) it is convenient to modify a dataframe in place. For these rare cases, averager() gives the option to modify the input dataframe:

[7]:
equipt.averager(df,
                3,
                thresh=0.1,
                update_data=True)

df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']
[7]:
Pos Cp Primer Name NamePrim
9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)
10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 mESC total cDNA_160Fus (112734868c1)
[8]:
%load_ext watermark
%watermark -v -p equipt,jupyterlab
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.12.0

equipt    : 1.0.0
jupyterlab: 3.6.3