namer
namer() is the function that imports qPCR data and labels wells. The output of namer() is used by all subsequent equipt functions.
The first step of namer() is a call to a function that imports the Ct values from an .xlsx or .csv file. In the current distribution this is performed by lc480_importer(), which is specific to the data output of the Roche LightCycler 480. For other instruments and file formats, the user can write their own importer and supply it to namer(). The only requirements are that it returns a pandas DataFrame with one column named ‘Pos’ that contains well positions and another named ‘Cp’ that contains the Ct values. namer() does not use the ‘Pos’ column, but it allows the user to easily verify that wells were accurately named.
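The two requirements on an importer can be checked before its output is handed to namer(). The snippet below is a minimal sketch; check_importer_output is a hypothetical helper written for this illustration and is not part of equipt.

```python
import pandas as pd

def check_importer_output(df):
    """Hypothetical helper (not part of equipt): confirm an importer's
    output is a DataFrame with the 'Pos' and 'Cp' columns described above."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("importer must return a pandas DataFrame")
    missing = [col for col in ("Pos", "Cp") if col not in df.columns]
    if missing:
        raise ValueError(f"importer output is missing required columns: {missing}")
    return df

# A small hand-made table in the expected format passes silently
good = pd.DataFrame({"Pos": ["A1", "A2"], "Cp": [17.51, 17.54]})
check_importer_output(good)
```

Running a check like this on a new importer gives a clearer message than a failure deep inside a later analysis step.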
Using namer for the LightCycler 480
[1]:
import equipt
import pandas as pd
This example uses data from an efficiency curve analysis of seven primer sets tested on four dilutions of a single cDNA sample. The first few lines of the .csv file exported by the LightCycler 480 look like this:
[2]:
with open('data/22.11.22_PrimerCurve_Ct.csv','r') as f:
    print(f.read()[:486])
Experiment: DH_22.11.22_PrimerCurve Selected Filter: SYBR Green I / HRM Dye (465-510)
Include Color Pos Name Cp Concentration Standard Status
True 255 A1 Sample 1 17.51 0
True 16711680 A2 Sample 2 17.54 0 ? - Detector Call uncertain
True 255 A3 Sample 3 17.55 0
True 255 A4 Sample 4 18.49 0
True 255 A5 Sample 5 18.52 0
True 255 A6 Sample 6 18.54 0
True 255 A7 Sample 7 19.53 0
True 255 A8 Sample 8 19.59 0
True 255 A9 Sample 9 19.59 0
True 255 A10 Sample 10 20.64 0
The first line contains the experiment name and filter sets used, the second line contains the column names, and the remaining lines contain tab-separated values for the experiment. Before moving on to namer(), let's look at the output of the importer function:
[3]:
equipt.lc480_importer('data/22.11.22_PrimerCurve_Ct.csv').iloc[:6]
[3]:
| | Pos | Cp |
|---|---|---|
| 0 | A1 | 17.51 |
| 1 | A2 | 17.54 |
| 2 | A3 | 17.55 |
| 3 | A4 | 18.49 |
| 4 | A5 | 18.52 |
| 5 | A6 | 18.54 |
The importer skips the experiment header line and strips away all columns except for ‘Pos’ and ‘Cp’. The ‘Cp’ column name is important for subsequent analyses, but namer() relies only on the order of the rows, not on the ‘Pos’ values, to label wells.
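The header skipping and column selection described above can be reproduced with pandas alone. This is a minimal sketch on an in-memory string; the read_csv arguments mirror those visible in the package traceback later in this document, while the two data rows are typed in by hand for illustration.

```python
import io
import pandas as pd

# One experiment line, one line of column names, then tab-separated rows,
# mimicking the layout of the LightCycler 480 export shown above
raw = (
    "Experiment: demo  Selected Filter: SYBR Green I\n"
    "Include\tColor\tPos\tName\tCp\tConcentration\tStandard\tStatus\n"
    "True\t255\tA1\tSample 1\t17.51\t\t0\t\n"
    "True\t255\tA2\tSample 2\t17.54\t\t0\t\n"
)

# header=1 takes the column names from the second line and discards the first;
# usecols keeps only the two columns that namer() needs
demo = pd.read_csv(io.StringIO(raw), sep="\t", header=1, usecols=["Pos", "Cp"])
print(demo)
```

The same three arguments (sep, header, usecols) are usually the only ones that need to change when adapting an importer to another instrument's export format.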
We can now use namer() to automatically label the wells. namer() takes six parameters plus optional keyword arguments. Their documentation is reproduced below:
Params
______
ct_file : str
    Path to a CSV or Excel file containing the qPCR data. Currently only
    data output from a Lightcycler 480 is supported, but the structure of
    namer() allows for other importers to be written without disrupting the
    rest of the function.
primers : list of strings
    A list, in order, of the primers. See documentation for supported plate
    arrangements.
samples : list of strings
    A list, in order, of the sample names. See documentation for supported
    plate arrangements.
reps : int
    Number of replicate wells. 2, 3, or 4.
config : str
    A description of how the samples are arranged: 'square' or 'line'. See
    documentation for additional details. Default 'line'
importer : a custom importer function or None
    A user-supplied function that imports data from their qPCR instrument
    to a Pandas Dataframe with columns 'Pos', for the well position, and
    'Cp' for the Ct values. If None, namer() defaults to an importer for
    data from the Roche Lightcycler 480. Default None
**kwargs : dictionary
    with_dil : list of strings
        List of names of samples that have dilution curves.
    dil_series : list of ints
        List of dilution factors in order on plate. Dilutions
        should be entered as integers (e.g. a 1:10 dilution
        should be entered as 10).
    dil_rest : int or None
        The dilution of samples that do not have a dilution
        series. If None, with_dil should contain all samples.
The **kwargs parameter should only be used if one or more samples have a dilution series. Otherwise it need not be supplied. For this experiment, the following parameter values were used:
[4]:
primers = ['Fus (112734868c1)',
           'Fus (15029724a1)',
           'Ewsr1 (6679715a1)',
           'Ewsr1 (88853580c2)',
           'Taf15 (141803447c1)',
           'Taf15 (141803447c2)',
           'Tsix exon4']
samples = ['mESC total cDNA']
reps = 3
config = 'line'
kwargs = {'with_dil':samples,
          'dil_series':[20,40,80,160],
          'dil_rest':None}
Supplying these to namer() gives the following output:
[5]:
df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',
                  primers,
                  samples,
                  reps,
                  config,
                  **kwargs)
df.iloc[:6]
[5]:
| | Pos | Cp | Primer | Name | NamePrim |
|---|---|---|---|---|---|
| 0 | A1 | 17.51 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
| 1 | A2 | 17.54 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
| 2 | A3 | 17.55 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
| 3 | A4 | 18.49 | Fus (112734868c1) | mESC total cDNA_40 | mESC total cDNA_40Fus (112734868c1) |
| 4 | A5 | 18.52 | Fus (112734868c1) | mESC total cDNA_40 | mESC total cDNA_40Fus (112734868c1) |
| 5 | A6 | 18.54 | Fus (112734868c1) | mESC total cDNA_40 | mESC total cDNA_40Fus (112734868c1) |
namer() has correctly labeled the primer, assigned sample names with the dilution factor after an underscore, and created a column called ‘NamePrim’ that allows for replicate wells to be easily detected. This output can be supplied to any of the other tools in equipt.
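To illustrate how the ‘NamePrim’ column identifies replicate wells, the sketch below groups a hand-built copy of the six rows shown above and summarizes each replicate set. It uses only pandas and is an illustration, not a description of how other equipt tools process the data.

```python
import pandas as pd

# Hand-built copy of the first six labeled wells from the output above
named = pd.DataFrame({
    "Pos": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "Cp": [17.51, 17.54, 17.55, 18.49, 18.52, 18.54],
    "NamePrim": ["mESC total cDNA_20Fus (112734868c1)"] * 3
              + ["mESC total cDNA_40Fus (112734868c1)"] * 3,
})

# Wells sharing a NamePrim value are technical replicates of one another
summary = named.groupby("NamePrim", sort=False)["Cp"].agg(["mean", "std", "count"])
print(summary)
```

A replicate set with an unexpectedly large standard deviation, or a count that differs from reps, is a quick sign that a well was mislabeled or failed.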
Using a custom importer function with namer
For data from an instrument other than the LightCycler 480, the user should write a custom import function. For this example, I have taken the exact data used above but reformatted it for a hypothetical qPCR instrument:
[6]:
with open('data/hypotheticaldata.csv','r') as f:
    print(f.read()[:488])
Experiment: DH_22.11.22_PrimerCurve Selected Filter: SYBR Green I / HRM Dye (465-510),,
"Lorem ipsum dolor sit amet, consectetur adipiscing elit,",,
ed do eiusmod tempor incididunt ut labore et dolore magna aliqua.,,
"Ut enim ad minim veniam, quis nostrud exercitation ullamco",,
,,
Well Location,Name,Ct
A1,Sample 1,17.51
A2,Sample 2,17.54
A3,Sample 3,17.55
A4,Sample 4,18.49
A5,Sample 5,18.52
A6,Sample 6,18.54
A7,Sample 7,19.53
A8,Sample 8,19.59
A9,Sample 9,19.59
A10,Sample 10,20.64
In this case, the file has more lines of information at the top, different column names, and is comma-separated rather than tab-separated. namer() will raise an error if it uses the default importer:
[7]:
equipt.namer('data/hypotheticaldata.csv',
             primers,
             samples,
             reps,
             config,
             **kwargs)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[7], line 1
----> 1 equipt.namer('data/hypotheticaldata.csv',
2 primers,
3 samples,
4 reps,
5 config,
6 **kwargs)
File ~/opt/anaconda3/lib/python3.9/site-packages/equipt/opener.py:139, in namer(ct_file, primers, samples, reps, config, importer, **kwargs)
137 # Read in the data
138 if importer == None:
--> 139 ct_data = lc480_importer(ct_file)
140 else:
141 ct_data = importer(ct_file)
File ~/opt/anaconda3/lib/python3.9/site-packages/equipt/opener.py:28, in lc480_importer(ct_file)
23 return pd.read_excel(ct_file,
24 header=1,
25 usecols=['Pos','Cp'],
26 sep='\t')
27 else:
---> 28 return pd.read_csv(ct_file,
29 header=1,
30 usecols=['Pos','Cp'],
31 sep='\t')
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
899 kwds_defaults = _refine_defaults_read(
900 dialect,
901 delimiter,
(...)
908 dtype_backend=dtype_backend,
909 )
910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:577, in _read(filepath_or_buffer, kwds)
574 _validate_names(kwds.get("names", None))
576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
579 if chunksize or iterator:
580 return parser
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1407, in TextFileReader.__init__(self, f, engine, **kwds)
1404 self.options["has_index_names"] = kwds["has_index_names"]
1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1679, in TextFileReader._make_engine(self, f, engine)
1676 raise ValueError(msg)
1678 try:
-> 1679 return mapping[engine](f, **self.options)
1680 except Exception:
1681 if self.handles is not None:
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py:140, in CParserWrapper.__init__(self, src, **kwds)
136 assert self.orig_names is not None
137 if self.usecols_dtype == "string" and not set(usecols).issubset(
138 self.orig_names
139 ):
--> 140 self._validate_usecols_names(usecols, self.orig_names)
142 # error: Cannot determine type of 'names'
143 if len(self.names) > len(usecols): # type: ignore[has-type]
144 # error: Cannot determine type of 'names'
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:959, in ParserBase._validate_usecols_names(self, usecols, names)
957 missing = [c for c in usecols if c not in names]
958 if len(missing) > 0:
--> 959 raise ValueError(
960 f"Usecols do not match columns, columns expected but not found: "
961 f"{missing}"
962 )
964 return usecols
ValueError: Usecols do not match columns, columns expected but not found: ['Cp', 'Pos']
To get around this, the user can make a custom importer:
[8]:
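def customimporter(data):
    # Map the instrument's column names onto the names namer() expects
    rename_dict = {'Well Location':'Pos',
                   'Ct':'Cp'}
    # The column names are on the sixth line of the file (zero-indexed row 5)
    df = pd.read_csv(data,
                     header=5,
                     usecols=['Well Location','Ct'])
    return df.rename(rename_dict,axis=1)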
[9]:
customimporter('data/hypotheticaldata.csv').iloc[:6]
[9]:
| | Pos | Cp |
|---|---|---|
| 0 | A1 | 17.51 |
| 1 | A2 | 17.54 |
| 2 | A3 | 17.55 |
| 3 | A4 | 18.49 |
| 4 | A5 | 18.52 |
| 5 | A6 | 18.54 |
We can now see that the data is imported in a format equivalent to that of the LightCycler 480 importer distributed with the package. To use this function with namer(), simply supply the function to the importer parameter:
[10]:
hypodf = equipt.namer('data/hypotheticaldata.csv',
                      primers,
                      samples,
                      reps,
                      config,
                      importer=customimporter,
                      **kwargs)
hypodf.head()
[10]:
| | Pos | Cp | Primer | Name | NamePrim |
|---|---|---|---|---|---|
| 0 | A1 | 17.51 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
| 1 | A2 | 17.54 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
| 2 | A3 | 17.55 | Fus (112734868c1) | mESC total cDNA_20 | mESC total cDNA_20Fus (112734868c1) |
| 3 | A4 | 18.49 | Fus (112734868c1) | mESC total cDNA_40 | mESC total cDNA_40Fus (112734868c1) |
| 4 | A5 | 18.52 | Fus (112734868c1) | mESC total cDNA_40 | mESC total cDNA_40Fus (112734868c1) |
The output is identical to that of the original function:
[11]:
# DataFrame.equals compares every value; all(df == hypodf) would only iterate over column labels
df.equals(hypodf)
[11]:
True
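When comparing two labeled tables, note that Python's built-in all() applied to a DataFrame iterates over the column labels rather than the cell values, so it cannot detect a differing value. The sketch below, on small hand-made tables, shows this alongside two pandas comparisons that do inspect the values.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

a = pd.DataFrame({"Pos": ["A1", "A2"], "Cp": [17.51, 17.54]})
b = a.copy()
c = a.copy()
c.loc[1, "Cp"] = 30.0  # introduce one differing value

print(a.equals(b))   # True: every value matches
print(a.equals(c))   # False: one Cp value differs
print(all(a == c))   # True: all() only sees the non-empty column labels

# assert_frame_equal raises an AssertionError describing the first mismatch,
# which is useful when the two tables are expected to be identical
assert_frame_equal(a, b)
```

DataFrame.equals gives a single yes/no answer, while assert_frame_equal is the better choice when it matters where the two tables differ.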
Important Caveats
False positives and negatives
To keep namer() as modular as possible, the function relies on every well that should be labeled having a valid Ct value. If a Ct value is not called and pandas interprets the well as ‘NaN’, the function will fail. To get around this, any wells that should have Ct values but did not yield one should be manually labeled in the original file as ‘exclude’. Any wells that did not contain sample but yielded Ct values anyway should be manually deleted. This action should be recorded in a lab notebook, but it may be a sign of larger issues with the instrument or plate loading. As always, caution should be used when deciding whether to analyze the experiment or repeat it.
Dilutions
By default, namer() assumes dilutions are grouped by sample, as in the above example. If the dilutions are in some other arrangement, it may be easiest to run namer() without the dilutions and then update the sample names afterwards. For example, say that in the experiment above we instead had samples grouped by dilution rather than dilutions grouped by sample:
[12]:
# Let's pretend we had seven samples and one primer for this example
samples = ['cDNA1',
           'cDNA2',
           'cDNA3',
           'cDNA4',
           'cDNA5',
           'cDNA6',
           'cDNA7']
primers = ['Fus (112734868c1)',]
# Set the dilutions as an ordered list
dilutions = [20,40,80,160]
# Expand the sample set so that namer() detects the correct number of wells
new_samples = samples * len(dilutions)
[13]:
# Run same as last time, but remove kwargs
df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',
                  primers,
                  new_samples,
                  reps,
                  config)
df.iloc[:6]
[13]:
| | Pos | Cp | Primer | Name | NamePrim |
|---|---|---|---|---|---|
| 0 | A1 | 17.51 | Fus (112734868c1) | cDNA1 | cDNA1Fus (112734868c1) |
| 1 | A2 | 17.54 | Fus (112734868c1) | cDNA1 | cDNA1Fus (112734868c1) |
| 2 | A3 | 17.55 | Fus (112734868c1) | cDNA1 | cDNA1Fus (112734868c1) |
| 3 | A4 | 18.49 | Fus (112734868c1) | cDNA2 | cDNA2Fus (112734868c1) |
| 4 | A5 | 18.52 | Fus (112734868c1) | cDNA2 | cDNA2Fus (112734868c1) |
| 5 | A6 | 18.54 | Fus (112734868c1) | cDNA2 | cDNA2Fus (112734868c1) |
[14]:
# If using a square conformation, sort by NamePrim to make sure replicate wells
# are contiguous
# Expand dilutions: repeat each dilution once per well (reps x samples)
new_dils = []
for d in dilutions:
    for i in range(reps*len(samples)):
        new_dils.append(d)
# Check that length is correct
len(new_dils) == len(df)
[14]:
True
[15]:
# Add new_dils to Name column
new_dils = [str(i) for i in new_dils]
# Update Name
df['dils'] = new_dils
df['Name'] = df['Name'] + '_' + df['dils']
# Update NamePrim
df['NamePrim'] = df['Name'] + df['Primer']
# Drop dilution column
df.drop('dils',axis=1,inplace=True)
# Check relabeling
df.iloc[:6]
[15]:
| | Pos | Cp | Primer | Name | NamePrim |
|---|---|---|---|---|---|
| 0 | A1 | 17.51 | Fus (112734868c1) | cDNA1_20 | cDNA1_20Fus (112734868c1) |
| 1 | A2 | 17.54 | Fus (112734868c1) | cDNA1_20 | cDNA1_20Fus (112734868c1) |
| 2 | A3 | 17.55 | Fus (112734868c1) | cDNA1_20 | cDNA1_20Fus (112734868c1) |
| 3 | A4 | 18.49 | Fus (112734868c1) | cDNA2_20 | cDNA2_20Fus (112734868c1) |
| 4 | A5 | 18.52 | Fus (112734868c1) | cDNA2_20 | cDNA2_20Fus (112734868c1) |
| 5 | A6 | 18.54 | Fus (112734868c1) | cDNA2_20 | cDNA2_20Fus (112734868c1) |
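The expansion of the dilution list can also be written more compactly. The sketch below uses numpy.repeat as an alternative to a nested loop; it assumes the same plate layout as this example (seven samples, three replicates, four dilutions) and is self-contained, so it does not call namer().

```python
import numpy as np

dilutions = [20, 40, 80, 160]
reps = 3
n_samples = 7  # number of distinct samples in this example

# Repeat each dilution once for every well that carries it
# (reps x samples), then convert to strings for use as name suffixes
new_dils = np.repeat(dilutions, reps * n_samples).astype(str)

print(len(new_dils))   # 84 wells in total
print(new_dils[:3])    # the first wells all carry the 20-fold dilution
```

The resulting array can be assigned to a temporary column and appended to the sample names in the same way as a plain list.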
[16]:
%load_ext watermark
%watermark -v -p pandas,equipt,jupyterlab
Python implementation: CPython
Python version : 3.9.17
IPython version : 8.12.0
pandas : 2.0.3
equipt : 1.0.0
jupyterlab: 3.6.3