{
"cells": [
{
"cell_type": "markdown",
"id": "fbc41bf2-ad45-40fb-a3b4-750cf9c76263",
"metadata": {},
"source": [
"# averager"
]
},
{
"cell_type": "markdown",
"id": "0e2f11b4-84d5-4266-b259-6351e2ae5d31",
"metadata": {},
"source": [
"`averager()` allows the user to check well replicate reproducibility and remove outlier wells systematically. The function can be used independently but is also wrapped into other functions such as `deltact()`. "
]
},
{
"cell_type": "markdown",
"id": "fb1a9bc9-37e3-45fb-a242-ef39e9b770bf",
"metadata": {},
"source": [
"## Using averager"
]
},
{
"cell_type": "markdown",
"id": "0b8992eb-a3d5-4117-b557-4cb786bd5809",
"metadata": {},
"source": [
"The first input for `averager()` is the output of `namer()`. In this case, we will use the primer efficiency data used in the `namer()` documentation:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b152de80-b87b-43af-8441-c0287d0673d5",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Pos | \n",
" Cp | \n",
" Primer | \n",
" Name | \n",
" NamePrim | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A1 | \n",
" 17.51 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_20 | \n",
" mESC total cDNA_20Fus (112734868c1) | \n",
"
\n",
" \n",
" 1 | \n",
" A2 | \n",
" 17.54 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_20 | \n",
" mESC total cDNA_20Fus (112734868c1) | \n",
"
\n",
" \n",
" 2 | \n",
" A3 | \n",
" 17.55 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_20 | \n",
" mESC total cDNA_20Fus (112734868c1) | \n",
"
\n",
" \n",
" 3 | \n",
" A4 | \n",
" 18.49 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_40 | \n",
" mESC total cDNA_40Fus (112734868c1) | \n",
"
\n",
" \n",
" 4 | \n",
" A5 | \n",
" 18.52 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_40 | \n",
" mESC total cDNA_40Fus (112734868c1) | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Pos Cp Primer Name \\\n",
"0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 \n",
"1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 \n",
"2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 \n",
"3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 \n",
"4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 \n",
"\n",
" NamePrim \n",
"0 mESC total cDNA_20Fus (112734868c1) \n",
"1 mESC total cDNA_20Fus (112734868c1) \n",
"2 mESC total cDNA_20Fus (112734868c1) \n",
"3 mESC total cDNA_40Fus (112734868c1) \n",
"4 mESC total cDNA_40Fus (112734868c1) "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import equipt\n",
"\n",
"primers = ['Fus (112734868c1)',\n",
" 'Fus (15029724a1)',\n",
" 'Ewsr1 (6679715a1)',\n",
" 'Ewsr1 (88853580c2)',\n",
" 'Taf15 (141803447c1)',\n",
" 'Taf15 (141803447c2)',\n",
" 'Tsix exon4']\n",
"\n",
"samples = ['mESC total cDNA']\n",
"\n",
"reps = 3\n",
"\n",
"config = 'line'\n",
"\n",
"kwargs = {'with_dil':samples,\n",
" 'dil_series':[20,40,80,160],\n",
" 'dil_rest':None} \n",
"\n",
"df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',\n",
" primers,\n",
" samples,\n",
" reps,\n",
" config,\n",
" **kwargs)\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "1d7cd514-bb67-4ffa-906a-be96b9c991e3",
"metadata": {},
"source": [
"Notice that `namer()` has supplied the column 'NamePrim'. 'NamePrim' allows `averager()` to identify technical replicates by looking for wells with identical sample-primer pairs.\n",
"\n",
"`averager()` takes four parameters. Their documentation is reproduced below:\n",
"\n",
" Params\n",
" ______\n",
" \n",
" ct_data : a dataframe\n",
" Output of namer().\n",
" \n",
" reps : int\n",
" The number of replicate wells in the sample. Used to flag sample-\n",
" primer pairs where more than half the wells have been removed. Is\n",
" not used if thresh == None.\n",
"\n",
" thresh : float or None\n",
" Highest acceptable standard deviation for a set of sample-primer\n",
" replicate wells. If set to None no wells are removed. Default 0.1\n",
" \n",
" update_data : Bool\n",
" Whether to alter the input dataframe in place or to leave it unaffected.\n",
" \n",
"The simplest use of `averager()` is to calculate the mean Ct values of replicates and their standard deviation:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8132ebec-5695-46f3-a18f-b53ac03e6465",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Primer | \n",
" Name | \n",
" NamePrim | \n",
" AvgCt | \n",
" StdCt | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Ewsr1 (6679715a1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Ewsr1 (6679715a1) | \n",
" 21.996667 | \n",
" 0.016997 | \n",
"
\n",
" \n",
" 1 | \n",
" Ewsr1 (88853580c2) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Ewsr1 (88853580c2) | \n",
" 21.000000 | \n",
" 0.043205 | \n",
"
\n",
" \n",
" 2 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
" 21.040000 | \n",
" 0.558629 | \n",
"
\n",
" \n",
" 3 | \n",
" Fus (15029724a1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (15029724a1) | \n",
" 20.700000 | \n",
" 0.029439 | \n",
"
\n",
" \n",
" 4 | \n",
" Taf15 (141803447c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Taf15 (141803447c1) | \n",
" 22.020000 | \n",
" 0.008165 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Primer Name \\\n",
"0 Ewsr1 (6679715a1) mESC total cDNA_160 \n",
"1 Ewsr1 (88853580c2) mESC total cDNA_160 \n",
"2 Fus (112734868c1) mESC total cDNA_160 \n",
"3 Fus (15029724a1) mESC total cDNA_160 \n",
"4 Taf15 (141803447c1) mESC total cDNA_160 \n",
"\n",
" NamePrim AvgCt StdCt \n",
"0 mESC total cDNA_160Ewsr1 (6679715a1) 21.996667 0.016997 \n",
"1 mESC total cDNA_160Ewsr1 (88853580c2) 21.000000 0.043205 \n",
"2 mESC total cDNA_160Fus (112734868c1) 21.040000 0.558629 \n",
"3 mESC total cDNA_160Fus (15029724a1) 20.700000 0.029439 \n",
"4 mESC total cDNA_160Taf15 (141803447c1) 22.020000 0.008165 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"avg_df = equipt.averager(df,\n",
" 3,\n",
" thresh=None)\n",
"\n",
"avg_df.head()"
]
},
{
"cell_type": "markdown",
"id": "31c86c2e-1e7d-4ead-a14a-3515ec2a1ce2",
"metadata": {},
"source": [
"The head of avg_df shows that of the first five sample-primer pairs one had an unusually high standard deviation: Fus (112734868c1) paired with 1:160 dilute cDNA. If we look in the original dataframe, we can see the errant well:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c2583b9d-b969-48ff-8948-fbade33f218a",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Pos | \n",
" Cp | \n",
" Primer | \n",
" Name | \n",
" NamePrim | \n",
"
\n",
" \n",
" \n",
" \n",
" 9 | \n",
" A10 | \n",
" 20.64 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
" 10 | \n",
" A11 | \n",
" 20.65 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
" 11 | \n",
" A12 | \n",
" 21.83 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Pos Cp Primer Name \\\n",
"9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 \n",
"10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 \n",
"11 A12 21.83 Fus (112734868c1) mESC total cDNA_160 \n",
"\n",
" NamePrim \n",
"9 mESC total cDNA_160Fus (112734868c1) \n",
"10 mESC total cDNA_160Fus (112734868c1) \n",
"11 mESC total cDNA_160Fus (112734868c1) "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']"
]
},
{
"cell_type": "markdown",
"id": "de44ccc4-9072-4c86-a5b8-8a2997eeed06",
"metadata": {},
"source": [
"## Removing Outliers"
]
},
{
"cell_type": "markdown",
"id": "75698041-54f4-453a-9dbf-a9722c4e1032",
"metadata": {},
"source": [
"Two wells are within 0.01 cycles of one another while the third diverges by more than 1, presumably due to pipetting error. These samples could be removed manually, but `averager()` can also perform this automatically by searching for the replicate with the highest divergence from the others:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a00dd8fc-cf52-452e-8aa0-6bd598515071",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Primer | \n",
" Name | \n",
" NamePrim | \n",
" AvgCt | \n",
" StdCt | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Ewsr1 (6679715a1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Ewsr1 (6679715a1) | \n",
" 21.996667 | \n",
" 0.016997 | \n",
"
\n",
" \n",
" 1 | \n",
" Ewsr1 (88853580c2) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Ewsr1 (88853580c2) | \n",
" 21.000000 | \n",
" 0.043205 | \n",
"
\n",
" \n",
" 2 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
" 20.645000 | \n",
" 0.005000 | \n",
"
\n",
" \n",
" 3 | \n",
" Fus (15029724a1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (15029724a1) | \n",
" 20.700000 | \n",
" 0.029439 | \n",
"
\n",
" \n",
" 4 | \n",
" Taf15 (141803447c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Taf15 (141803447c1) | \n",
" 22.020000 | \n",
" 0.008165 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Primer Name \\\n",
"0 Ewsr1 (6679715a1) mESC total cDNA_160 \n",
"1 Ewsr1 (88853580c2) mESC total cDNA_160 \n",
"2 Fus (112734868c1) mESC total cDNA_160 \n",
"3 Fus (15029724a1) mESC total cDNA_160 \n",
"4 Taf15 (141803447c1) mESC total cDNA_160 \n",
"\n",
" NamePrim AvgCt StdCt \n",
"0 mESC total cDNA_160Ewsr1 (6679715a1) 21.996667 0.016997 \n",
"1 mESC total cDNA_160Ewsr1 (88853580c2) 21.000000 0.043205 \n",
"2 mESC total cDNA_160Fus (112734868c1) 20.645000 0.005000 \n",
"3 mESC total cDNA_160Fus (15029724a1) 20.700000 0.029439 \n",
"4 mESC total cDNA_160Taf15 (141803447c1) 22.020000 0.008165 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"avg_df = equipt.averager(df,\n",
" 3,\n",
" thresh=0.1)\n",
"\n",
"avg_df.head()"
]
},
{
"cell_type": "markdown",
"id": "df9a129b-4dd5-4b86-90e6-6b21ca0d9813",
"metadata": {},
"source": [
"Now the standard deviation for that sample-primer pair has been reduced to 0.005. `averager()` records this removal in a log file:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3593e2fc-c2cc-46d2-8506-512419c33968",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Outlier Wells Dropped:\n",
"mESC total cDNA_160Fus (112734868c1) - 1\n",
"\n",
"Samples Removed:\n",
"\n"
]
}
],
"source": [
"with open('droppedWells.txt','r') as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"id": "b9dd8651-4650-4d15-94b2-d5efaf2d74bb",
"metadata": {},
"source": [
"In this case, only one well was removed. If more than half the replicate wells for a given sample-primer pair are removed, `averager()` removes them completely from the dataframe and records them in the log file under 'Samples Removed:'. Deciding the threshold standard deviation for exclusion is up to the user, but in no case should someone analyze a sample-primer pair where two of three replicate wells have been removed."
]
},
{
"cell_type": "markdown",
"id": "50d6a9a2-4ac1-495b-91bb-39a4269f1ea7",
"metadata": {},
"source": [
"## Updating input dataframe"
]
},
{
"cell_type": "markdown",
"id": "ce4a54ac-2b74-4a58-8d54-423a2fe5ff47",
"metadata": {
"tags": []
},
"source": [
"Note that by default `averager()` does not remove wells in place, so the original dataframe remains unaffected:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "12c3323d-2329-46fa-86ce-bcf1dffe612f",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Pos | \n",
" Cp | \n",
" Primer | \n",
" Name | \n",
" NamePrim | \n",
"
\n",
" \n",
" \n",
" \n",
" 9 | \n",
" A10 | \n",
" 20.64 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
" 10 | \n",
" A11 | \n",
" 20.65 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
" 11 | \n",
" A12 | \n",
" 21.83 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Pos Cp Primer Name \\\n",
"9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 \n",
"10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 \n",
"11 A12 21.83 Fus (112734868c1) mESC total cDNA_160 \n",
"\n",
" NamePrim \n",
"9 mESC total cDNA_160Fus (112734868c1) \n",
"10 mESC total cDNA_160Fus (112734868c1) \n",
"11 mESC total cDNA_160Fus (112734868c1) "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']"
]
},
{
"cell_type": "markdown",
"id": "d32dbf28-589f-4040-82bf-1123807dd9f5",
"metadata": {},
"source": [
"In general, it is not advisable to remove replicates from the original dataframe. In some cases (such as the function `efficiency()`) it is convenient to modify a dataframe in place. For these rare cases, `averager()` gives the option to modify the input dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "35859399-b6f5-4fa4-838a-6849d0c48735",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Pos | \n",
" Cp | \n",
" Primer | \n",
" Name | \n",
" NamePrim | \n",
"
\n",
" \n",
" \n",
" \n",
" 9 | \n",
" A10 | \n",
" 20.64 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
" 10 | \n",
" A11 | \n",
" 20.65 | \n",
" Fus (112734868c1) | \n",
" mESC total cDNA_160 | \n",
" mESC total cDNA_160Fus (112734868c1) | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Pos Cp Primer Name \\\n",
"9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 \n",
"10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 \n",
"\n",
" NamePrim \n",
"9 mESC total cDNA_160Fus (112734868c1) \n",
"10 mESC total cDNA_160Fus (112734868c1) "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"equipt.averager(df,\n",
" 3,\n",
" thresh=0.1,\n",
" update_data=True)\n",
"\n",
"df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5aae24ab-cec8-47d9-a984-3db60fb5262d",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python implementation: CPython\n",
"Python version : 3.9.17\n",
"IPython version : 8.12.0\n",
"\n",
"equipt : 1.0.0\n",
"jupyterlab: 3.6.3\n",
"\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -v -p equipt,jupyterlab"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 5
}