{ "cells": [ { "cell_type": "markdown", "id": "751edb23-dee8-4d68-bd89-ba0bfe73bd8a", "metadata": {}, "source": [ "# namer" ] }, { "cell_type": "markdown", "id": "a8c158ce-e386-47bf-a8a6-ee20bbb62456", "metadata": {}, "source": [ "`namer()` imports qPCR data and labels wells. Its output is the input for all subsequent equipt functions. \n", "\n", "`namer()` first calls an importer function that reads the Ct values from an .xlsx or .csv file. In the current distribution this is `lc480_importer()`, which is specific to the output of the Roche LightCycler 480. For other instruments and file formats, the user can write their own importer and supply it to `namer()`. The only requirement is that the importer returns a pandas DataFrame with one column named 'Pos' that contains well positions and another named 'Cp' that contains the Ct values. `namer()` does not use the 'Pos' column itself, but keeping it lets the user verify that wells were named correctly." ] }, { "cell_type": "markdown", "id": "c9ea1ea1-c6b1-4294-9e22-5ae210f47492", "metadata": { "tags": [] }, "source": [ "## Using namer for the LightCycler 480" ] }, { "cell_type": "code", "execution_count": 1, "id": "84f0da52-35d5-439e-8c94-12103f258b7e", "metadata": { "tags": [] }, "outputs": [], "source": [ "import equipt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "29abbbe1-c1c3-4d75-bdb5-f6dcfed0f2d5", "metadata": {}, "source": [ "This example uses data from an efficiency curve analysis of seven primer sets tested on four dilutions of a single cDNA sample. 
The first few lines of the .csv file exported by the LightCycler 480 look like this:" ] }, { "cell_type": "code", "execution_count": 2, "id": "5d84470b-2f51-4cda-af70-310e106483b8", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Experiment: DH_22.11.22_PrimerCurve Selected Filter: SYBR Green I / HRM Dye (465-510)\n", "Include\tColor\tPos\tName\tCp\tConcentration\tStandard\tStatus\n", "True\t255\tA1\tSample 1\t17.51\t\t0\t\n", "True\t16711680\tA2\tSample 2\t17.54\t\t0\t? - Detector Call uncertain\n", "True\t255\tA3\tSample 3\t17.55\t\t0\t\n", "True\t255\tA4\tSample 4\t18.49\t\t0\t\n", "True\t255\tA5\tSample 5\t18.52\t\t0\t\n", "True\t255\tA6\tSample 6\t18.54\t\t0\t\n", "True\t255\tA7\tSample 7\t19.53\t\t0\t\n", "True\t255\tA8\tSample 8\t19.59\t\t0\t\n", "True\t255\tA9\tSample 9\t19.59\t\t0\t\n", "True\t255\tA10\tSample 10\t20.64\t\t0\t\n" ] } ], "source": [ "with open('data/22.11.22_PrimerCurve_Ct.csv','r') as f:\n", " print(f.read()[:486])" ] }, { "cell_type": "markdown", "id": "1271675a-f3a7-4c62-9011-3a883f0b27d7", "metadata": {}, "source": [ "The first line contains the experiment name and the filter set used, the second line contains the column names, and the remaining lines contain tab-separated values for the experiment. Before moving on to `namer()`, let's look at the output of the importer function:" ] }, { "cell_type": "code", "execution_count": 3, "id": "585480d7-e857-4959-a600-19c07c9454ff", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
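<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Pos</th>\n", "      <th>Cp</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A1</td>\n", "      <td>17.51</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A2</td>\n", "      <td>17.54</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A3</td>\n", "      <td>17.55</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A4</td>\n", "      <td>18.49</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A5</td>\n", "      <td>18.52</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>A6</td>\n", "      <td>18.54</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>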
" ], "text/plain": [ " Pos Cp\n", "0 A1 17.51\n", "1 A2 17.54\n", "2 A3 17.55\n", "3 A4 18.49\n", "4 A5 18.52\n", "5 A6 18.54" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "equipt.lc480_importer('data/22.11.22_PrimerCurve_Ct.csv').iloc[:6]" ] }, { "cell_type": "markdown", "id": "974488bc-44aa-4059-815b-2e862443858a", "metadata": {}, "source": [ "The importer skips the header and strips away all columns except for 'Pos' and 'Cp'. The 'Cp' column name is important for subsequent analyses, but `namer()` only uses relative positions to name columns.\n", "\n", "We can now use namer to automatically label the wells. `namer()` uses six parameters. Their documentation is reproduced below:\n", " \n", " Params\n", " ______\n", " \n", " ct_file : str\n", " Path to a CSV or Excel file containing the qPCR data. Currently only\n", " data output from a Lightcycler 480 is supported, but the structure of \n", " namer() allows for other importers to be written without disrupting the\n", " rest of the function.\n", " \n", " primers : list of strings\n", " A list, in order, of the primers. See documentation for supported plate\n", " arrangements.\n", " \n", " samples : list of strings\n", " A list, in order, or the sample names. See documentation for supported\n", " plate arrangements.\n", " \n", " reps : int\n", " Number of replicate wells. 2, 3, or 4.\n", " \n", " config : str\n", " A description of how the samples are arranged: 'square' or 'line'. See\n", " documentation for additional details. Default 'line'\n", " \n", " importer : a custom importer function or None\n", " A user-supplied function that imports data from their qPCR instrument \n", " to a Pandas Dataframe with columns 'Pos', for the well position, and\n", " 'Cp' for the Ct values. If None, namer() defaults to an importer for\n", " data from the Roche Lightcycler 480. 
Default None\n", " \n", " **kwargs : dictionary\n", " \n", " with_dil : list of strings\n", " List of names of samples that have dilution curves.\n", " \n", " dil_series : list of ints\n", " List of dilution factors in order on plate. Dilutions\n", " should be entered as integers (e.g. a 1:10 dilution \n", " should be entered as 10).\n", " \n", " dil_rest : int or None\n", " The dilution of samples that do not have a dilution \n", " series. If None, with_dil should contain all samples.\n", "\n", "The \\**kwargs parameter should only be used if one or more samples have a dilution series; otherwise it can be omitted. For this experiment, the following parameter values were used:" ] }, { "cell_type": "code", "execution_count": 4, "id": "3a981eb6-8cd7-4fca-a7f9-ffc1f6ff58f1", "metadata": { "tags": [] }, "outputs": [], "source": [ "primers = ['Fus (112734868c1)',\n", " 'Fus (15029724a1)',\n", " 'Ewsr1 (6679715a1)',\n", " 'Ewsr1 (88853580c2)',\n", " 'Taf15 (141803447c1)',\n", " 'Taf15 (141803447c2)',\n", " 'Tsix exon4']\n", "\n", "samples = ['mESC total cDNA']\n", "\n", "reps = 3\n", "\n", "config = 'line'\n", "\n", "kwargs = {'with_dil':samples,\n", " 'dil_series':[20,40,80,160],\n", " 'dil_rest':None} " ] }, { "cell_type": "markdown", "id": "34dbe609-f72e-4bba-a10b-bfaf61c3f922", "metadata": {}, "source": [ "Supplying these to `namer()` gives the following output:" ] }, { "cell_type": "code", "execution_count": 5, "id": "d6966b8c-10c3-43c5-ae34-fc7bfa72e267", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
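<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Pos</th>\n", "      <th>Cp</th>\n", "      <th>Primer</th>\n", "      <th>Name</th>\n", "      <th>NamePrim</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A1</td>\n", "      <td>17.51</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_20</td>\n", "      <td>mESC total cDNA_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A2</td>\n", "      <td>17.54</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_20</td>\n", "      <td>mESC total cDNA_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A3</td>\n", "      <td>17.55</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_20</td>\n", "      <td>mESC total cDNA_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A4</td>\n", "      <td>18.49</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_40</td>\n", "      <td>mESC total cDNA_40Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A5</td>\n", "      <td>18.52</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_40</td>\n", "      <td>mESC total cDNA_40Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>A6</td>\n", "      <td>18.54</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_40</td>\n", "      <td>mESC total cDNA_40Fus (112734868c1)</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>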
" ], "text/plain": [ " Pos Cp Primer Name \\\n", "0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 \n", "1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 \n", "2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 \n", "3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 \n", "4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 \n", "5 A6 18.54 Fus (112734868c1) mESC total cDNA_40 \n", "\n", " NamePrim \n", "0 mESC total cDNA_20Fus (112734868c1) \n", "1 mESC total cDNA_20Fus (112734868c1) \n", "2 mESC total cDNA_20Fus (112734868c1) \n", "3 mESC total cDNA_40Fus (112734868c1) \n", "4 mESC total cDNA_40Fus (112734868c1) \n", "5 mESC total cDNA_40Fus (112734868c1) " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',\n", " primers,\n", " samples,\n", " reps,\n", " config,\n", " **kwargs)\n", "\n", "df.iloc[:6]" ] }, { "cell_type": "markdown", "id": "87db6eb0-27f0-4104-bd8b-82cf1eae2875", "metadata": {}, "source": [ "`namer()` has correctly labeled the primer, assigned sample names with the dilution factor after an underscore, and created a column called 'NamePrim' that allows for replicate wells to be easily detected. This output can be supplied to any of the other tools in equipt." ] }, { "cell_type": "markdown", "id": "2d0ed4af-d944-4140-92da-23a5112ab7bd", "metadata": {}, "source": [ "## Using a custom importer function with namer" ] }, { "cell_type": "markdown", "id": "314ed823-e655-485c-a055-9a6e7a2f3269", "metadata": {}, "source": [ "For data from an instrument other than the LightCycler 480, the user should write a custom import function. 
For this example, I have taken the exact data used above but reformatted it for a hypothetical qPCR instrument:" ] }, { "cell_type": "code", "execution_count": 6, "id": "d909508a-a6e2-4e01-b7e8-afda658d4e42", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Experiment: DH_22.11.22_PrimerCurve Selected Filter: SYBR Green I / HRM Dye (465-510),,\n", "\"Lorem ipsum dolor sit amet, consectetur adipiscing elit,\",,\n", "ed do eiusmod tempor incididunt ut labore et dolore magna aliqua.,,\n", "\"Ut enim ad minim veniam, quis nostrud exercitation ullamco\",,\n", ",,\n", "Well Location,Name,Ct\n", "A1,Sample 1,17.51\n", "A2,Sample 2,17.54\n", "A3,Sample 3,17.55\n", "A4,Sample 4,18.49\n", "A5,Sample 5,18.52\n", "A6,Sample 6,18.54\n", "A7,Sample 7,19.53\n", "A8,Sample 8,19.59\n", "A9,Sample 9,19.59\n", "A10,Sample 10,20.64\n" ] } ], "source": [ "with open('data/hypotheticaldata.csv','r') as f:\n", " print(f.read()[:488])" ] }, { "cell_type": "markdown", "id": "61d64a0d-2cbe-442c-a2c2-edc2ae9451e3", "metadata": {}, "source": [ "In this case, the file has more header lines at the top, uses different column names ('Well Location' and 'Ct' instead of 'Pos' and 'Cp'), and is comma-separated rather than tab-separated. 
`namer()` will raise an error if it uses the default importer:" ] }, { "cell_type": "code", "execution_count": 7, "id": "50863306-33ab-43a0-ae0f-06d97977de55", "metadata": { "tags": [] }, "outputs": [ { "ename": "ValueError", "evalue": "Usecols do not match columns, columns expected but not found: ['Cp', 'Pos']", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[7], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mequipt\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mnamer\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mdata/hypotheticaldata.csv\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 2\u001b[0m \u001b[43m \u001b[49m\u001b[43mprimers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3\u001b[0m \u001b[43m \u001b[49m\u001b[43msamples\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4\u001b[0m \u001b[43m \u001b[49m\u001b[43mreps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5\u001b[0m \u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 6\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/equipt/opener.py:139\u001b[0m, in \u001b[0;36mnamer\u001b[0;34m(ct_file, primers, samples, reps, config, importer, **kwargs)\u001b[0m\n\u001b[1;32m 137\u001b[0m \u001b[38;5;66;03m# Read in the data\u001b[39;00m\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m importer \u001b[38;5;241m==\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 139\u001b[0m ct_data \u001b[38;5;241m=\u001b[39m \u001b[43mlc480_importer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mct_file\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 140\u001b[0m 
\u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 141\u001b[0m ct_data \u001b[38;5;241m=\u001b[39m importer(ct_file)\n", "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/equipt/opener.py:28\u001b[0m, in \u001b[0;36mlc480_importer\u001b[0;34m(ct_file)\u001b[0m\n\u001b[1;32m 23\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m pd\u001b[38;5;241m.\u001b[39mread_excel(ct_file, \n\u001b[1;32m 24\u001b[0m header\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m1\u001b[39m, \n\u001b[1;32m 25\u001b[0m usecols\u001b[38;5;241m=\u001b[39m[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mPos\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mCp\u001b[39m\u001b[38;5;124m'\u001b[39m],\n\u001b[1;32m 26\u001b[0m sep\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;130;01m\\t\u001b[39;00m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 27\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m---> 28\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mpd\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_csv\u001b[49m\u001b[43m(\u001b[49m\u001b[43mct_file\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\n\u001b[1;32m 29\u001b[0m \u001b[43m \u001b[49m\u001b[43mheader\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\n\u001b[1;32m 30\u001b[0m \u001b[43m \u001b[49m\u001b[43musecols\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mPos\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mCp\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 31\u001b[0m \u001b[43m \u001b[49m\u001b[43msep\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;130;43;01m\\t\u001b[39;49;00m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", "File 
\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:912\u001b[0m, in \u001b[0;36mread_csv\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)\u001b[0m\n\u001b[1;32m 899\u001b[0m kwds_defaults \u001b[38;5;241m=\u001b[39m _refine_defaults_read(\n\u001b[1;32m 900\u001b[0m dialect,\n\u001b[1;32m 901\u001b[0m delimiter,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 908\u001b[0m dtype_backend\u001b[38;5;241m=\u001b[39mdtype_backend,\n\u001b[1;32m 909\u001b[0m )\n\u001b[1;32m 910\u001b[0m kwds\u001b[38;5;241m.\u001b[39mupdate(kwds_defaults)\n\u001b[0;32m--> 912\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_read\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilepath_or_buffer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:577\u001b[0m, in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 574\u001b[0m _validate_names(kwds\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnames\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m))\n\u001b[1;32m 576\u001b[0m \u001b[38;5;66;03m# Create the parser.\u001b[39;00m\n\u001b[0;32m--> 577\u001b[0m parser \u001b[38;5;241m=\u001b[39m 
\u001b[43mTextFileReader\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilepath_or_buffer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 579\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m chunksize \u001b[38;5;129;01mor\u001b[39;00m iterator:\n\u001b[1;32m 580\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m parser\n", "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1407\u001b[0m, in \u001b[0;36mTextFileReader.__init__\u001b[0;34m(self, f, engine, **kwds)\u001b[0m\n\u001b[1;32m 1404\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moptions[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhas_index_names\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m kwds[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhas_index_names\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n\u001b[1;32m 1406\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandles: IOHandles \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m-> 1407\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_engine \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_make_engine\u001b[49m\u001b[43m(\u001b[49m\u001b[43mf\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mengine\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1679\u001b[0m, in \u001b[0;36mTextFileReader._make_engine\u001b[0;34m(self, f, engine)\u001b[0m\n\u001b[1;32m 1676\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(msg)\n\u001b[1;32m 1678\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m-> 
1679\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mmapping\u001b[49m\u001b[43m[\u001b[49m\u001b[43mengine\u001b[49m\u001b[43m]\u001b[49m\u001b[43m(\u001b[49m\u001b[43mf\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1680\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[1;32m 1681\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandles \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n", "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py:140\u001b[0m, in \u001b[0;36mCParserWrapper.__init__\u001b[0;34m(self, src, **kwds)\u001b[0m\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39morig_names \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 137\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39musecols_dtype \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mstring\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mset\u001b[39m(usecols)\u001b[38;5;241m.\u001b[39missubset(\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39morig_names\n\u001b[1;32m 139\u001b[0m ):\n\u001b[0;32m--> 140\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_validate_usecols_names\u001b[49m\u001b[43m(\u001b[49m\u001b[43musecols\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43morig_names\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 142\u001b[0m \u001b[38;5;66;03m# error: Cannot determine type of 'names'\u001b[39;00m\n\u001b[1;32m 143\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mnames) \u001b[38;5;241m>\u001b[39m \u001b[38;5;28mlen\u001b[39m(usecols): \u001b[38;5;66;03m# type: ignore[has-type]\u001b[39;00m\n\u001b[1;32m 144\u001b[0m \u001b[38;5;66;03m# error: Cannot determine type of 'names'\u001b[39;00m\n", "File \u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:959\u001b[0m, in \u001b[0;36mParserBase._validate_usecols_names\u001b[0;34m(self, usecols, names)\u001b[0m\n\u001b[1;32m 957\u001b[0m missing \u001b[38;5;241m=\u001b[39m [c \u001b[38;5;28;01mfor\u001b[39;00m c \u001b[38;5;129;01min\u001b[39;00m usecols \u001b[38;5;28;01mif\u001b[39;00m c \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m names]\n\u001b[1;32m 958\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(missing) \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m--> 959\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 960\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUsecols do not match columns, columns expected but not found: \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 961\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmissing\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 962\u001b[0m )\n\u001b[1;32m 964\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m usecols\n", "\u001b[0;31mValueError\u001b[0m: Usecols do not match columns, columns expected but not found: ['Cp', 'Pos']" ] } ], "source": [ 
"equipt.namer('data/hypotheticaldata.csv',\n", " primers,\n", " samples,\n", " reps,\n", " config,\n", " **kwargs)" ] }, { "cell_type": "markdown", "id": "72ebab4d-4bcd-4a32-b618-7b41c82b2533", "metadata": { "tags": [] }, "source": [ "To get around this, the user can make a custom importer:" ] }, { "cell_type": "code", "execution_count": 8, "id": "a4566af7-d026-4aae-90d9-1d43e039965d", "metadata": { "tags": [] }, "outputs": [], "source": [ "def customimporter(data):\n", " rename_dict = {'Well Location':'Pos',\n", " 'Ct':'Cp'}\n", " \n", " df = pd.read_csv(data,\n", " header=5,\n", " usecols=['Well Location','Ct'])\n", " \n", " return df.rename(rename_dict,axis=1)" ] }, { "cell_type": "code", "execution_count": 9, "id": "80ca29bc-454a-449d-af48-60e1af235704", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
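<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Pos</th>\n", "      <th>Cp</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A1</td>\n", "      <td>17.51</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A2</td>\n", "      <td>17.54</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A3</td>\n", "      <td>17.55</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A4</td>\n", "      <td>18.49</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A5</td>\n", "      <td>18.52</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>A6</td>\n", "      <td>18.54</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>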
" ], "text/plain": [ " Pos Cp\n", "0 A1 17.51\n", "1 A2 17.54\n", "2 A3 17.55\n", "3 A4 18.49\n", "4 A5 18.52\n", "5 A6 18.54" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "customimporter('data/hypotheticaldata.csv').iloc[:6]" ] }, { "cell_type": "markdown", "id": "18ee595e-ab93-46c7-8152-90bc53af8aa5", "metadata": {}, "source": [ "We can now see that the data is imported in an equivalent format to the LightCycler 480 importer distributed with the package. To use this function with `namer()`, simply supply the function to the importer parameter:" ] }, { "cell_type": "code", "execution_count": 10, "id": "a2e36beb-b531-4357-b9a4-d366c7bfd9f0", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
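<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Pos</th>\n", "      <th>Cp</th>\n", "      <th>Primer</th>\n", "      <th>Name</th>\n", "      <th>NamePrim</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A1</td>\n", "      <td>17.51</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_20</td>\n", "      <td>mESC total cDNA_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A2</td>\n", "      <td>17.54</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_20</td>\n", "      <td>mESC total cDNA_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A3</td>\n", "      <td>17.55</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_20</td>\n", "      <td>mESC total cDNA_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A4</td>\n", "      <td>18.49</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_40</td>\n", "      <td>mESC total cDNA_40Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A5</td>\n", "      <td>18.52</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>mESC total cDNA_40</td>\n", "      <td>mESC total cDNA_40Fus (112734868c1)</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>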
" ], "text/plain": [ " Pos Cp Primer Name \\\n", "0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 \n", "1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 \n", "2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 \n", "3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 \n", "4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 \n", "\n", " NamePrim \n", "0 mESC total cDNA_20Fus (112734868c1) \n", "1 mESC total cDNA_20Fus (112734868c1) \n", "2 mESC total cDNA_20Fus (112734868c1) \n", "3 mESC total cDNA_40Fus (112734868c1) \n", "4 mESC total cDNA_40Fus (112734868c1) " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hypodf = equipt.namer('data/hypotheticaldata.csv',\n", " primers,\n", " samples,\n", " reps,\n", " config,\n", " importer=customimporter,\n", " **kwargs)\n", "\n", "hypodf.head()" ] }, { "cell_type": "markdown", "id": "7d86b44f-124e-4c4d-b3c6-a6fd409531a1", "metadata": {}, "source": [ "The output is identical to that of the original function:" ] }, { "cell_type": "code", "execution_count": 11, "id": "8733c365-3f87-4a57-abb2-13db52457b70", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all(df == hypodf)" ] }, { "cell_type": "markdown", "id": "af25b4a2-19d5-4523-bc77-1d300d85642d", "metadata": {}, "source": [ "## Important Caveats" ] }, { "cell_type": "markdown", "id": "3c8d5e43-0caa-4890-97e3-34bf0a0a5b7d", "metadata": {}, "source": [ "* **False positives and negatives**\n", "\n", " To make `namer()` as modular as possible, it relies on every well that should be labeled having a valid Ct value. If a Ct value is not called and Pandas interprets the well as 'NaN', the function will fail. To get around this, any wells that should have Ct values but which did not come up should be manually labeled in the original file as 'exclude'. 
Any wells that contained no sample but yielded Ct values anyway should be manually deleted. This action should be recorded in a lab notebook; such wells may also be a sign of larger issues with the instrument or plate loading. As always, caution should be used when deciding whether to analyze the experiment or repeat it.\n", "\n", "* **Dilutions**\n", "\n", " By default, `namer()` assumes dilutions are grouped by sample, as in the example above. If the dilutions are arranged differently, it may be easiest to run `namer()` without the dilutions and then update the sample names afterwards. For example, suppose that in the experiment above the samples were grouped by dilution rather than the dilutions by sample:" ] }, { "cell_type": "code", "execution_count": 12, "id": "5f6d0ddc-72e0-4e31-8382-b594e574d360", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Let's pretend we had seven samples and one primer for this example\n", "samples = ['cDNA1',\n", " 'cDNA2',\n", " 'cDNA3',\n", " 'cDNA4',\n", " 'cDNA5',\n", " 'cDNA6',\n", " 'cDNA7']\n", "\n", "primers = ['Fus (112734868c1)',]\n", "\n", "# Set the dilutions as an ordered list\n", "dilutions = [20,40,80,160]\n", "\n", "# Expand the sample set so that namer() detects the correct number of wells\n", "new_samples = samples * len(dilutions)" ] }, { "cell_type": "code", "execution_count": 13, "id": "c30b4c78-cf3a-4d07-8425-e400623a6013", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
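<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Pos</th>\n", "      <th>Cp</th>\n", "      <th>Primer</th>\n", "      <th>Name</th>\n", "      <th>NamePrim</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A1</td>\n", "      <td>17.51</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA1</td>\n", "      <td>cDNA1Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A2</td>\n", "      <td>17.54</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA1</td>\n", "      <td>cDNA1Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A3</td>\n", "      <td>17.55</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA1</td>\n", "      <td>cDNA1Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A4</td>\n", "      <td>18.49</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA2</td>\n", "      <td>cDNA2Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A5</td>\n", "      <td>18.52</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA2</td>\n", "      <td>cDNA2Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>A6</td>\n", "      <td>18.54</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA2</td>\n", "      <td>cDNA2Fus (112734868c1)</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>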
" ], "text/plain": [ " Pos Cp Primer Name NamePrim\n", "0 A1 17.51 Fus (112734868c1) cDNA1 cDNA1Fus (112734868c1)\n", "1 A2 17.54 Fus (112734868c1) cDNA1 cDNA1Fus (112734868c1)\n", "2 A3 17.55 Fus (112734868c1) cDNA1 cDNA1Fus (112734868c1)\n", "3 A4 18.49 Fus (112734868c1) cDNA2 cDNA2Fus (112734868c1)\n", "4 A5 18.52 Fus (112734868c1) cDNA2 cDNA2Fus (112734868c1)\n", "5 A6 18.54 Fus (112734868c1) cDNA2 cDNA2Fus (112734868c1)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Run same as last time, but remove kwargs\n", "df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',\n", " primers,\n", " new_samples,\n", " reps,\n", " config)\n", "\n", "df.iloc[:6]" ] }, { "cell_type": "code", "execution_count": 14, "id": "10048e3f-a44d-4a05-93bb-c8821a1d27db", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# If using a square conformation, sort by NamePrim to make sure replicate wells\n", "# are contiguous\n", "\n", "# Expand dilutions\n", "new_dils = []\n", "\n", "for d in dilutions:\n", " for i in range(reps*len(samples)):\n", " new_dils.append(d)\n", " \n", "\n", "# Check that length is correct \n", "len(new_dils) == len(df)" ] }, { "cell_type": "code", "execution_count": 15, "id": "058a35ef-b68e-4030-a958-f2f4e199cfcc", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
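<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Pos</th>\n", "      <th>Cp</th>\n", "      <th>Primer</th>\n", "      <th>Name</th>\n", "      <th>NamePrim</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A1</td>\n", "      <td>17.51</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA1_20</td>\n", "      <td>cDNA1_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A2</td>\n", "      <td>17.54</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA1_20</td>\n", "      <td>cDNA1_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A3</td>\n", "      <td>17.55</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA1_20</td>\n", "      <td>cDNA1_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A4</td>\n", "      <td>18.49</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA2_20</td>\n", "      <td>cDNA2_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A5</td>\n", "      <td>18.52</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA2_20</td>\n", "      <td>cDNA2_20Fus (112734868c1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>A6</td>\n", "      <td>18.54</td>\n", "      <td>Fus (112734868c1)</td>\n", "      <td>cDNA2_20</td>\n", "      <td>cDNA2_20Fus (112734868c1)</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>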
" ], "text/plain": [ " Pos Cp Primer Name NamePrim\n", "0 A1 17.51 Fus (112734868c1) cDNA1_20 cDNA1_20Fus (112734868c1)\n", "1 A2 17.54 Fus (112734868c1) cDNA1_20 cDNA1_20Fus (112734868c1)\n", "2 A3 17.55 Fus (112734868c1) cDNA1_20 cDNA1_20Fus (112734868c1)\n", "3 A4 18.49 Fus (112734868c1) cDNA2_20 cDNA2_20Fus (112734868c1)\n", "4 A5 18.52 Fus (112734868c1) cDNA2_20 cDNA2_20Fus (112734868c1)\n", "5 A6 18.54 Fus (112734868c1) cDNA2_20 cDNA2_20Fus (112734868c1)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add new_dils to Name column\n", "new_dils = [str(i) for i in new_dils]\n", "\n", "# Update Name\n", "df['dils'] = new_dils\n", "df['Name'] = df['Name'] + '_' + df['dils']\n", "\n", "# Update NamePrim\n", "df['NamePrim'] = df['Name'] + df['Primer']\n", "\n", "# Drop dilution column\n", "df.drop('dils',axis=1,inplace=True)\n", "\n", "# Check relabeling\n", "df.iloc[:6]" ] }, { "cell_type": "code", "execution_count": 16, "id": "2ed9e910-feef-4929-a7b7-c9ecae91eb39", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python implementation: CPython\n", "Python version : 3.9.17\n", "IPython version : 8.12.0\n", "\n", "pandas : 2.0.3\n", "equipt : 1.0.0\n", "jupyterlab: 3.6.3\n", "\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p pandas,equipt,jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.17" } }, "nbformat": 4, "nbformat_minor": 5 }