averager
averager() allows the user to check the reproducibility of replicate wells and to remove outlier wells systematically. The function can be used independently, but it is also wrapped into other functions such as deltact().
Using averager
The first input for averager() is the output of namer(). In this case, we will use the primer efficiency data used in the namer() documentation:
[9]:
import equipt
primers = ['Fus (112734868c1)',
           'Fus (15029724a1)',
           'Ewsr1 (6679715a1)',
           'Ewsr1 (88853580c2)',
           'Taf15 (141803447c1)',
           'Taf15 (141803447c2)',
           'Tsix exon4']
samples = ['mESC total cDNA']
reps = 3
config = 'line'
kwargs = {'with_dil': samples,
          'dil_series': [20, 40, 80, 160],
          'dil_rest': None}
df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',
                  primers,
                  samples,
                  reps,
                  config,
                  **kwargs)
df.head()
[9]:
 | Pos | Cp | Primer | Name | NamePrim |
---|---|---|---|---|---|
0 | A1 | 17.51 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
1 | A2 | 17.54 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
2 | A3 | 17.55 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
3 | A4 | 18.49 | Fus (112734868c1) | mESC total cDNA_40 | mESC total cDNA_40Fus (112734868c1) |
4 | A5 | 18.52 | Fus (112734868c1) | mESC total cDNA_40 | mESC total cDNA_40Fus (112734868c1) |
Notice that namer() has supplied the column ‘NamePrim’. ‘NamePrim’ allows averager() to identify technical replicates by looking for wells with identical sample-primer pairs.
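As a rough illustration of this grouping (not equipt's actual implementation), the sketch below collects the Cp values of the five rows shown above under their sample-primer key. The names `wells` and `replicates` are hypothetical and introduced only for this example.

```python
from collections import defaultdict

# (Pos, Cp, Primer, Name) for the five rows shown by df.head() above
wells = [
    ("A1", 17.51, "Fus (112734868c1)", "mESC total cDNA_20"),
    ("A2", 17.54, "Fus (112734868c1)", "mESC total cDNA_20"),
    ("A3", 17.55, "Fus (112734868c1)", "mESC total cDNA_20"),
    ("A4", 18.49, "Fus (112734868c1)", "mESC total cDNA_40"),
    ("A5", 18.52, "Fus (112734868c1)", "mESC total cDNA_40"),
]

# Group Cp values by the sample-primer key. In the table above,
# NamePrim is the Name string followed directly by the Primer string.
replicates = defaultdict(list)
for pos, cp, primer, name in wells:
    replicates[name + primer].append(cp)

for name_prim, cps in replicates.items():
    print(name_prim, cps)
```

Wells that share a key are treated as technical replicates of one another. Only two wells appear for the 1:40 dilution here because df.head() shows just the first five rows.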
averager() takes four parameters. Their documentation is reproduced below:
Params
______
ct_data : a dataframe
    Output of namer().
reps : int
    The number of replicate wells in the sample. Used to flag sample-
    primer pairs where more than half the wells have been removed. Is
    not used if thresh == None.
thresh : float or None
    Highest acceptable standard deviation for a set of sample-primer
    replicate wells. If set to None no wells are removed. Default 0.1
update_data : Bool
    Whether to alter the input dataframe in place or to leave it unaffected.
The simplest use of averager() is to calculate the mean Ct values of replicates and their standard deviation:
[2]:
avg_df = equipt.averager(df,
                         3,
                         thresh=None)
avg_df.head()
[2]:
 | Primer | Name | NamePrim | AvgCt | StdCt |
---|---|---|---|---|---|
0 | Ewsr1 (6679715a1) | mESC total cDNA_160 | mESC total cDNA_160Ewsr1 (6679715a1) | 21.996667 | 0.016997 |
1 | Ewsr1 (88853580c2) | mESC total cDNA_160 | mESC total cDNA_160Ewsr1 (88853580c2) | 21.000000 | 0.043205 |
2 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) | 21.040000 | 0.558629 |
3 | Fus (15029724a1) | mESC total cDNA_160 | mESC total cDNA_160Fus (15029724a1) | 20.700000 | 0.029439 |
4 | Taf15 (141803447c1) | mESC total cDNA_160 | mESC total cDNA_160Taf15 (141803447c1) | 22.020000 | 0.008165 |
The head of avg_df shows that, of the first five sample-primer pairs, one has an unusually high standard deviation: Fus (112734868c1) paired with the 1:160 cDNA dilution. If we look in the original dataframe, we can see the errant well:
[3]:
df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']
[3]:
 | Pos | Cp | Primer | Name | NamePrim |
---|---|---|---|---|---|
9 | A10 | 20.64 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
10 | A11 | 20.65 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
11 | A12 | 21.83 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
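The AvgCt and StdCt reported for this pair can be checked by hand from the three Cp values above. The sketch below uses only the Python standard library. The reported StdCt of 0.558629 matches the population standard deviation (the equivalent of `ddof=0` in pandas or NumPy); that averager() uses this form is inferred from the numbers rather than taken from its documentation.

```python
from statistics import fmean, pstdev, stdev

cps = [20.64, 20.65, 21.83]  # wells A10, A11, A12

avg_ct = fmean(cps)       # arithmetic mean of the replicates
std_pop = pstdev(cps)     # population standard deviation (divides by n)
std_sample = stdev(cps)   # sample standard deviation (divides by n - 1)

print(f"AvgCt              = {avg_ct:.6f}")
print(f"StdCt (population) = {std_pop:.6f}")
print(f"StdCt (sample)     = {std_sample:.6f}")
```

The sample standard deviation is larger than the population value for the same wells, so a threshold chosen with one convention in mind will behave differently under the other.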
Removing Outliers
Two wells are within 0.01 cycles of one another, while the third diverges by more than 1 cycle, presumably due to pipetting error. This well could be removed manually, but averager() can also do so automatically by searching for the replicate that diverges most from the others:
[4]:
avg_df = equipt.averager(df,
                         3,
                         thresh=0.1)
avg_df.head()
[4]:
 | Primer | Name | NamePrim | AvgCt | StdCt |
---|---|---|---|---|---|
0 | Ewsr1 (6679715a1) | mESC total cDNA_160 | mESC total cDNA_160Ewsr1 (6679715a1) | 21.996667 | 0.016997 |
1 | Ewsr1 (88853580c2) | mESC total cDNA_160 | mESC total cDNA_160Ewsr1 (88853580c2) | 21.000000 | 0.043205 |
2 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) | 20.645000 | 0.005000 |
3 | Fus (15029724a1) | mESC total cDNA_160 | mESC total cDNA_160Fus (15029724a1) | 20.700000 | 0.029439 |
4 | Taf15 (141803447c1) | mESC total cDNA_160 | mESC total cDNA_160Taf15 (141803447c1) | 22.020000 | 0.008165 |
Now the standard deviation for that sample-primer pair has been reduced to 0.005. averager() records this removal in a log file:
[5]:
with open('droppedWells.txt', 'r') as f:
    print(f.read())
Outlier Wells Dropped:
mESC total cDNA_160Fus (112734868c1) - 1
Samples Removed:
In this case, only one well was removed. If more than half the replicate wells for a given sample-primer pair are removed, averager() removes the pair from the dataframe entirely and records it in the log file under ‘Samples Removed:’. The threshold standard deviation for exclusion is up to the user, but a sample-primer pair should never be analyzed when two of its three replicate wells have been removed.
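The exact procedure averager() uses internally is not reproduced in this document, so the following is only a minimal sketch of how such a filter could work. It assumes that the most divergent well is the one farthest from the median of the remaining wells and that the standard deviation is the population form. The function name `drop_outliers` and its return format are hypothetical.

```python
from statistics import median, pstdev

def drop_outliers(cps, reps, thresh=0.1):
    """Return (kept_wells, n_dropped), or (None, n_dropped) if the pair is removed.

    Hypothetical sketch, not equipt's implementation.
    """
    kept = list(cps)
    dropped = 0
    if thresh is None:
        # No threshold: keep every well
        return kept, dropped
    while len(kept) > 1 and pstdev(kept) > thresh:
        mid = median(kept)
        # Drop the well farthest from the median of the remaining wells
        worst = max(kept, key=lambda cp: abs(cp - mid))
        kept.remove(worst)
        dropped += 1
        # More than half of the replicate wells gone: discard the whole pair
        if dropped > reps / 2:
            return None, dropped
    return kept, dropped

# Wells A10-A12 from the example above: one outlier is dropped
print(drop_outliers([20.64, 20.65, 21.83], reps=3))

# Made-up values (not from the dataset) where no two wells agree:
# two of three wells would have to go, so the pair is removed
print(drop_outliers([20.10, 20.90, 21.83], reps=3))
```

With three replicates, the "more than half" rule means that a pair survives the loss of one well but not of two, which is the behaviour described above.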
Updating input dataframe
Note that by default averager() does not remove wells in place, so the original dataframe remains unaffected:
[6]:
df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']
[6]:
 | Pos | Cp | Primer | Name | NamePrim |
---|---|---|---|---|---|
9 | A10 | 20.64 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
10 | A11 | 20.65 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
11 | A12 | 21.83 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
In general, it is not advisable to remove replicates from the original dataframe. In some cases, however (such as within the function efficiency()), it is convenient to modify a dataframe in place. For these rare cases, averager() gives the option to modify the input dataframe:
[7]:
equipt.averager(df,
                3,
                thresh=0.1,
                update_data=True)
df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']
[7]:
 | Pos | Cp | Primer | Name | NamePrim |
---|---|---|---|---|---|
9 | A10 | 20.64 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
10 | A11 | 20.65 | Fus (112734868c1) | mESC total cDNA_160 | mESC total cDNA_160Fus (112734868c1) |
[8]:
%load_ext watermark
%watermark -v -p equipt,jupyterlab
Python implementation: CPython
Python version : 3.9.17
IPython version : 8.12.0
equipt : 1.0.0
jupyterlab: 3.6.3