{ "cells": [ { "cell_type": "markdown", "id": "fbc41bf2-ad45-40fb-a3b4-750cf9c76263", "metadata": {}, "source": [ "# averager" ] }, { "cell_type": "markdown", "id": "0e2f11b4-84d5-4266-b259-6351e2ae5d31", "metadata": {}, "source": [ "`averager()` allows the user to check well replicate reproducibility and remove outlier wells systematically. The function can be used independently but is also wrapped into other functions such as `deltact()`. " ] }, { "cell_type": "markdown", "id": "fb1a9bc9-37e3-45fb-a242-ef39e9b770bf", "metadata": {}, "source": [ "## Using averager" ] }, { "cell_type": "markdown", "id": "0b8992eb-a3d5-4117-b557-4cb786bd5809", "metadata": {}, "source": [ "The first input for `averager()` is the output of `namer()`. In this case, we will use the primer efficiency data used in the `namer()` documentation:" ] }, { "cell_type": "code", "execution_count": 9, "id": "b152de80-b87b-43af-8441-c0287d0673d5", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PosCpPrimerNameNamePrim
0A117.51Fus (112734868c1)mESC total cDNA_20mESC total cDNA_20Fus (112734868c1)
1A217.54Fus (112734868c1)mESC total cDNA_20mESC total cDNA_20Fus (112734868c1)
2A317.55Fus (112734868c1)mESC total cDNA_20mESC total cDNA_20Fus (112734868c1)
3A418.49Fus (112734868c1)mESC total cDNA_40mESC total cDNA_40Fus (112734868c1)
4A518.52Fus (112734868c1)mESC total cDNA_40mESC total cDNA_40Fus (112734868c1)
\n", "
" ], "text/plain": [ " Pos Cp Primer Name \\\n", "0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 \n", "1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 \n", "2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 \n", "3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 \n", "4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 \n", "\n", " NamePrim \n", "0 mESC total cDNA_20Fus (112734868c1) \n", "1 mESC total cDNA_20Fus (112734868c1) \n", "2 mESC total cDNA_20Fus (112734868c1) \n", "3 mESC total cDNA_40Fus (112734868c1) \n", "4 mESC total cDNA_40Fus (112734868c1) " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import equipt\n", "\n", "primers = ['Fus (112734868c1)',\n", " 'Fus (15029724a1)',\n", " 'Ewsr1 (6679715a1)',\n", " 'Ewsr1 (88853580c2)',\n", " 'Taf15 (141803447c1)',\n", " 'Taf15 (141803447c2)',\n", " 'Tsix exon4']\n", "\n", "samples = ['mESC total cDNA']\n", "\n", "reps = 3\n", "\n", "config = 'line'\n", "\n", "kwargs = {'with_dil':samples,\n", " 'dil_series':[20,40,80,160],\n", " 'dil_rest':None} \n", "\n", "df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',\n", " primers,\n", " samples,\n", " reps,\n", " config,\n", " **kwargs)\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "id": "1d7cd514-bb67-4ffa-906a-be96b9c991e3", "metadata": {}, "source": [ "Notice that `namer()` has supplied the column 'NamePrim'. 'NamePrim' allows `averager()` to identify technical replicates by looking for wells with identical sample-primer pairs.\n", "\n", "`averager()` takes four parameters. Their documentation is reproduced below:\n", "\n", " Params\n", " ______\n", " \n", " ct_data : a dataframe\n", " Output of namer().\n", " \n", " reps : int\n", " The number of replicate wells in the sample. Used to flag sample-\n", " primer pairs where more than half the wells have been removed. Is\n", " not used if thresh == None.\n", "\n", " thresh : float or None\n", " Highest acceptable standard deviation for a set of sample-primer\n", " replicate wells. If set to None no wells are removed. Default 0.1\n", " \n", " update_data : Bool\n", " Whether to alter the input dataframe in place or to leave it unaffected.\n", " \n", "The simplest use of `averager()` is to calculate the mean Ct values of replicates and their standard deviation:" ] }, { "cell_type": "code", "execution_count": 2, "id": "8132ebec-5695-46f3-a18f-b53ac03e6465", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PrimerNameNamePrimAvgCtStdCt
0Ewsr1 (6679715a1)mESC total cDNA_160mESC total cDNA_160Ewsr1 (6679715a1)21.9966670.016997
1Ewsr1 (88853580c2)mESC total cDNA_160mESC total cDNA_160Ewsr1 (88853580c2)21.0000000.043205
2Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)21.0400000.558629
3Fus (15029724a1)mESC total cDNA_160mESC total cDNA_160Fus (15029724a1)20.7000000.029439
4Taf15 (141803447c1)mESC total cDNA_160mESC total cDNA_160Taf15 (141803447c1)22.0200000.008165
\n", "
" ], "text/plain": [ " Primer Name \\\n", "0 Ewsr1 (6679715a1) mESC total cDNA_160 \n", "1 Ewsr1 (88853580c2) mESC total cDNA_160 \n", "2 Fus (112734868c1) mESC total cDNA_160 \n", "3 Fus (15029724a1) mESC total cDNA_160 \n", "4 Taf15 (141803447c1) mESC total cDNA_160 \n", "\n", " NamePrim AvgCt StdCt \n", "0 mESC total cDNA_160Ewsr1 (6679715a1) 21.996667 0.016997 \n", "1 mESC total cDNA_160Ewsr1 (88853580c2) 21.000000 0.043205 \n", "2 mESC total cDNA_160Fus (112734868c1) 21.040000 0.558629 \n", "3 mESC total cDNA_160Fus (15029724a1) 20.700000 0.029439 \n", "4 mESC total cDNA_160Taf15 (141803447c1) 22.020000 0.008165 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "avg_df = equipt.averager(df,\n", " 3,\n", " thresh=None)\n", "\n", "avg_df.head()" ] }, { "cell_type": "markdown", "id": "31c86c2e-1e7d-4ead-a14a-3515ec2a1ce2", "metadata": {}, "source": [ "The head of avg_df shows that of the first five sample-primer pairs one had an unusually high standard deviation: Fus (112734868c1) paired with 1:160 dilute cDNA. If we look in the original dataframe, we can see the errant well:" ] }, { "cell_type": "code", "execution_count": 3, "id": "c2583b9d-b969-48ff-8948-fbade33f218a", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PosCpPrimerNameNamePrim
9A1020.64Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
10A1120.65Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
11A1221.83Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
\n", "
" ], "text/plain": [ " Pos Cp Primer Name \\\n", "9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 \n", "10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 \n", "11 A12 21.83 Fus (112734868c1) mESC total cDNA_160 \n", "\n", " NamePrim \n", "9 mESC total cDNA_160Fus (112734868c1) \n", "10 mESC total cDNA_160Fus (112734868c1) \n", "11 mESC total cDNA_160Fus (112734868c1) " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']" ] }, { "cell_type": "markdown", "id": "de44ccc4-9072-4c86-a5b8-8a2997eeed06", "metadata": {}, "source": [ "## Removing Outliers" ] }, { "cell_type": "markdown", "id": "75698041-54f4-453a-9dbf-a9722c4e1032", "metadata": {}, "source": [ "Two wells are within 0.01 cycles of one another while the third diverges by more than 1, presumably due to pipetting error. These samples could be removed manually, but `averager()` can also perform this automatically by searching for the replicate with the highest divergence from the others:" ] }, { "cell_type": "code", "execution_count": 4, "id": "a00dd8fc-cf52-452e-8aa0-6bd598515071", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PrimerNameNamePrimAvgCtStdCt
0Ewsr1 (6679715a1)mESC total cDNA_160mESC total cDNA_160Ewsr1 (6679715a1)21.9966670.016997
1Ewsr1 (88853580c2)mESC total cDNA_160mESC total cDNA_160Ewsr1 (88853580c2)21.0000000.043205
2Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)20.6450000.005000
3Fus (15029724a1)mESC total cDNA_160mESC total cDNA_160Fus (15029724a1)20.7000000.029439
4Taf15 (141803447c1)mESC total cDNA_160mESC total cDNA_160Taf15 (141803447c1)22.0200000.008165
\n", "
" ], "text/plain": [ " Primer Name \\\n", "0 Ewsr1 (6679715a1) mESC total cDNA_160 \n", "1 Ewsr1 (88853580c2) mESC total cDNA_160 \n", "2 Fus (112734868c1) mESC total cDNA_160 \n", "3 Fus (15029724a1) mESC total cDNA_160 \n", "4 Taf15 (141803447c1) mESC total cDNA_160 \n", "\n", " NamePrim AvgCt StdCt \n", "0 mESC total cDNA_160Ewsr1 (6679715a1) 21.996667 0.016997 \n", "1 mESC total cDNA_160Ewsr1 (88853580c2) 21.000000 0.043205 \n", "2 mESC total cDNA_160Fus (112734868c1) 20.645000 0.005000 \n", "3 mESC total cDNA_160Fus (15029724a1) 20.700000 0.029439 \n", "4 mESC total cDNA_160Taf15 (141803447c1) 22.020000 0.008165 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "avg_df = equipt.averager(df,\n", " 3,\n", " thresh=0.1)\n", "\n", "avg_df.head()" ] }, { "cell_type": "markdown", "id": "df9a129b-4dd5-4b86-90e6-6b21ca0d9813", "metadata": {}, "source": [ "Now the standard deviation for that sample-primer pair has been reduced to 0.005. `averager()` records this removal in a log file:" ] }, { "cell_type": "code", "execution_count": 5, "id": "3593e2fc-c2cc-46d2-8506-512419c33968", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Outlier Wells Dropped:\n", "mESC total cDNA_160Fus (112734868c1) - 1\n", "\n", "Samples Removed:\n", "\n" ] } ], "source": [ "with open('droppedWells.txt','r') as f:\n", " print(f.read())" ] }, { "cell_type": "markdown", "id": "b9dd8651-4650-4d15-94b2-d5efaf2d74bb", "metadata": {}, "source": [ "In this case, only one well was removed. If more than half the replicate wells for a given sample-primer pair are removed, `averager()` removes them completely from the dataframe and records them in the log file under 'Samples Removed:'. Deciding the threshold standard deviation for exclusion is up to the user, but in no case should someone analyze a sample-primer pair where two of three replicate wells have been removed." ] }, { "cell_type": "markdown", "id": "50d6a9a2-4ac1-495b-91bb-39a4269f1ea7", "metadata": {}, "source": [ "## Updating input dataframe" ] }, { "cell_type": "markdown", "id": "ce4a54ac-2b74-4a58-8d54-423a2fe5ff47", "metadata": { "tags": [] }, "source": [ "Note that by default `averager()` does not remove wells in place, so the original dataframe remains unaffected:" ] }, { "cell_type": "code", "execution_count": 6, "id": "12c3323d-2329-46fa-86ce-bcf1dffe612f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PosCpPrimerNameNamePrim
9A1020.64Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
10A1120.65Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
11A1221.83Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
\n", "
" ], "text/plain": [ " Pos Cp Primer Name \\\n", "9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 \n", "10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 \n", "11 A12 21.83 Fus (112734868c1) mESC total cDNA_160 \n", "\n", " NamePrim \n", "9 mESC total cDNA_160Fus (112734868c1) \n", "10 mESC total cDNA_160Fus (112734868c1) \n", "11 mESC total cDNA_160Fus (112734868c1) " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']" ] }, { "cell_type": "markdown", "id": "d32dbf28-589f-4040-82bf-1123807dd9f5", "metadata": {}, "source": [ "In general, it is not advisable to remove replicates from the original dataframe. In some cases (such as the function `efficiency()`) it is convenient to modify a dataframe in place. For these rare cases, `averager()` gives the option to modify the input dataframe:" ] }, { "cell_type": "code", "execution_count": 7, "id": "35859399-b6f5-4fa4-838a-6849d0c48735", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PosCpPrimerNameNamePrim
9A1020.64Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
10A1120.65Fus (112734868c1)mESC total cDNA_160mESC total cDNA_160Fus (112734868c1)
\n", "
" ], "text/plain": [ " Pos Cp Primer Name \\\n", "9 A10 20.64 Fus (112734868c1) mESC total cDNA_160 \n", "10 A11 20.65 Fus (112734868c1) mESC total cDNA_160 \n", "\n", " NamePrim \n", "9 mESC total cDNA_160Fus (112734868c1) \n", "10 mESC total cDNA_160Fus (112734868c1) " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "equipt.averager(df,\n", " 3,\n", " thresh=0.1,\n", " update_data=True)\n", "\n", "df[df['NamePrim'] == 'mESC total cDNA_160Fus (112734868c1)']" ] }, { "cell_type": "code", "execution_count": 8, "id": "5aae24ab-cec8-47d9-a984-3db60fb5262d", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python implementation: CPython\n", "Python version : 3.9.17\n", "IPython version : 8.12.0\n", "\n", "equipt : 1.0.0\n", "jupyterlab: 3.6.3\n", "\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p equipt,jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.17" } }, "nbformat": 4, "nbformat_minor": 5 }