namer

namer() imports qPCR data and labels wells. Its output is the input for all other equipt functions.

The first step of namer() is to import the Ct values from an .xlsx or .csv file. In the current distribution this is done by lc480_importer(), which is specific to the output of the Roche LightCycler 480. For other instruments and file formats, the user can write their own importer and supply it to namer(). The only requirements are that it returns a pandas DataFrame with one column named ‘Pos’ that contains well positions and another named ‘Cp’ that contains the Ct values. namer() does not use the ‘Pos’ column itself, but the column lets the user easily verify that wells were named correctly.
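As a quick sanity check before handing a custom importer to namer(), the two column requirements described above can be tested directly. The helper below, check_importer_output, is a hypothetical name and is not part of equipt. It is a minimal sketch that checks only the requirements stated in this section.

```python
import pandas as pd


def check_importer_output(df):
    """Raise ValueError if df lacks the columns namer() expects.

    Hypothetical helper, not part of equipt.
    """
    if not isinstance(df, pd.DataFrame):
        raise ValueError("importer must return a pandas DataFrame")
    missing = [col for col in ("Pos", "Cp") if col not in df.columns]
    if missing:
        raise ValueError(f"importer output is missing columns: {missing}")
    return df


# A well-formed result passes through unchanged
good = pd.DataFrame({"Pos": ["A1", "A2"], "Cp": [17.51, 17.54]})
check_importer_output(good)

# A result that still carries the instrument's own column names is rejected
bad = pd.DataFrame({"Well Location": ["A1"], "Ct": [17.51]})
try:
    check_importer_output(bad)
except ValueError as err:
    print(err)
```

Running a check like this on the importer output first gives a more readable error than the one raised from inside namer().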

Using namer for the LightCycler 480

[1]:
import equipt
import pandas as pd

This example uses data from an efficiency curve analysis of seven primer sets tested on four dilutions of a single cDNA sample. The first few lines of the .csv file exported by the LightCycler 480 look like this:

[2]:
with open('data/22.11.22_PrimerCurve_Ct.csv','r') as f:
    print(f.read()[:486])
Experiment: DH_22.11.22_PrimerCurve  Selected Filter: SYBR Green I / HRM Dye (465-510)
Include Color   Pos     Name    Cp      Concentration   Standard        Status
True    255     A1      Sample 1        17.51           0
True    16711680        A2      Sample 2        17.54           0       ? - Detector Call uncertain
True    255     A3      Sample 3        17.55           0
True    255     A4      Sample 4        18.49           0
True    255     A5      Sample 5        18.52           0
True    255     A6      Sample 6        18.54           0
True    255     A7      Sample 7        19.53           0
True    255     A8      Sample 8        19.59           0
True    255     A9      Sample 9        19.59           0
True    255     A10     Sample 10       20.64           0

The first line contains the experiment name and filter sets used, the second line contains the column names, and the remaining lines contain tab-separated values for the experiment. Before moving on to namer(), let's look at the output of the importer function:

[3]:
equipt.lc480_importer('data/22.11.22_PrimerCurve_Ct.csv').iloc[:6]
[3]:
Pos Cp
0 A1 17.51
1 A2 17.54
2 A3 17.55
3 A4 18.49
4 A5 18.52
5 A6 18.54

The importer skips the header and strips away all columns except for ‘Pos’ and ‘Cp’. The ‘Cp’ column name is important for subsequent analyses, but namer() labels wells using only their relative order in the file.
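The import step described above can be approximated in a few lines of pandas. The sketch below is not the lc480_importer() source. It reads a hand-typed, in-memory excerpt laid out like the export shown earlier (one title line, one line of column names, then tab-separated values), and the exact placement of empty fields in the excerpt is an assumption.

```python
import io

import pandas as pd

# Hand-typed excerpt in the layout described above; the positions of the
# empty fields are an assumption made for illustration.
raw = (
    "Experiment: DH_22.11.22_PrimerCurve  Selected Filter: SYBR Green I / HRM Dye (465-510)\n"
    "Include\tColor\tPos\tName\tCp\tConcentration\tStandard\tStatus\n"
    "True\t255\tA1\tSample 1\t17.51\t\t0\t\n"
    "True\t16711680\tA2\tSample 2\t17.54\t\t0\t? - Detector Call uncertain\n"
    "True\t255\tA3\tSample 3\t17.55\t\t0\t\n"
)

# header=1 takes the column names from the second line and skips the title
# line; usecols keeps only the two columns used downstream.
wells = pd.read_csv(io.StringIO(raw), sep="\t", header=1, usecols=["Pos", "Cp"])
print(wells)
```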

We can now use namer() to automatically label the wells. namer() takes six named parameters plus optional keyword arguments. Their documentation is reproduced below:

Params
______

ct_file : str
    Path to a CSV or Excel file containing the qPCR data. Currently only
    data output from a Lightcycler 480 is supported, but the structure of
    namer() allows for other importers to be written without disrupting the
    rest of the function.

primers : list of strings
    A list, in order, of the primers. See documentation for supported plate
    arrangements.

samples : list of strings
    A list, in order, of the sample names. See documentation for supported
    plate arrangements.

reps : int
    Number of replicate wells. 2, 3, or 4.

config : str
    A description of how the samples are arranged: 'square' or 'line'. See
    documentation for additional details. Default 'line'

importer : a custom importer function or None
    A user-supplied function that imports data from their qPCR instrument
    to a Pandas Dataframe with columns 'Pos', for the well position, and
    'Cp' for the Ct values. If None, namer() defaults to an importer for
    data from the Roche Lightcycler 480. Default None

**kwargs : dictionary

    with_dil : list of strings
        List of names of samples that have dilution curves.

    dil_series : list of ints
        List of dilution factors in order on plate. Dilutions
        should be entered as integers (e.g. a 1:10 dilution
        should be entered as 10).

    dil_rest : int or None
        The dilution of samples that do not have a dilution
        series. If None, with_dil should contain all samples.

The **kwargs parameter should only be used if one or more samples have a dilution series. Otherwise it need not be supplied. For this experiment, the following parameter values were used:

[4]:
primers = ['Fus (112734868c1)',
         'Fus (15029724a1)',
         'Ewsr1 (6679715a1)',
         'Ewsr1 (88853580c2)',
         'Taf15 (141803447c1)',
         'Taf15 (141803447c2)',
         'Tsix exon4']

samples = ['mESC total cDNA']

reps = 3

config = 'line'

kwargs = {'with_dil':samples,
         'dil_series':[20,40,80,160],
         'dil_rest':None}

Supplying these to namer() gives the following output:

[5]:
df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',
            primers,
            samples,
            reps,
            config,
            **kwargs)

df.iloc[:6]
[5]:
Pos Cp Primer Name NamePrim
0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)
4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)
5 A6 18.54 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)

namer() has correctly labeled the primer, assigned sample names with the dilution factor after an underscore, and created a column called ‘NamePrim’ that allows for replicate wells to be easily detected. This output can be supplied to any of the other tools in equipt.
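To make the labeling pattern concrete, the sketch below rebuilds the ‘Primer’, ‘Name’ and ‘NamePrim’ values in plain Python. It is not the namer() implementation. The well order (primers changing slowest, then dilutions, with replicate wells adjacent) is an assumption inferred from the first rows of the output above, and only two of the seven primers are used to keep it short.

```python
from itertools import product

primers = ["Fus (112734868c1)", "Fus (15029724a1)"]
sample = "mESC total cDNA"
dil_series = [20, 40, 80, 160]
reps = 3

# Assumed well order: primer (slowest), then dilution, then replicate
labels = []
for primer, dil, _ in product(primers, dil_series, range(reps)):
    name = f"{sample}_{dil}"  # dilution factor follows an underscore
    labels.append({"Primer": primer, "Name": name, "NamePrim": name + primer})

for row in labels[:4]:
    print(row)
```

Because ‘NamePrim’ is the same for every replicate of a sample and primer pair, grouping on that column is enough to collect replicate wells.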

Using a custom importer function with namer

For data from an instrument other than the LightCycler 480, the user should write a custom import function. For this example, I have taken the exact data used above but reformatted it for a hypothetical qPCR instrument:

[6]:
with open('data/hypotheticaldata.csv','r') as f:
    print(f.read()[:488])
Experiment: DH_22.11.22_PrimerCurve  Selected Filter: SYBR Green I / HRM Dye (465-510),,
"Lorem ipsum dolor sit amet, consectetur adipiscing elit,",,
ed do eiusmod tempor incididunt ut labore et dolore magna aliqua.,,
"Ut enim ad minim veniam, quis nostrud exercitation ullamco",,
,,
Well Location,Name,Ct
A1,Sample 1,17.51
A2,Sample 2,17.54
A3,Sample 3,17.55
A4,Sample 4,18.49
A5,Sample 5,18.52
A6,Sample 6,18.54
A7,Sample 7,19.53
A8,Sample 8,19.59
A9,Sample 9,19.59
A10,Sample 10,20.64

In this case, the file has more lines of information at the top, uses different column names, and is comma-separated rather than tab-separated. namer() will raise an error if it uses the default importer:

[7]:
equipt.namer('data/hypotheticaldata.csv',
                primers,
                samples,
                reps,
                config,
                **kwargs)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 equipt.namer('data/hypotheticaldata.csv',
      2                 primers,
      3                 samples,
      4                 reps,
      5                 config,
      6                 **kwargs)

File ~/opt/anaconda3/lib/python3.9/site-packages/equipt/opener.py:139, in namer(ct_file, primers, samples, reps, config, importer, **kwargs)
    137 # Read in the data
    138 if importer == None:
--> 139     ct_data = lc480_importer(ct_file)
    140 else:
    141     ct_data = importer(ct_file)

File ~/opt/anaconda3/lib/python3.9/site-packages/equipt/opener.py:28, in lc480_importer(ct_file)
     23     return pd.read_excel(ct_file,
     24                          header=1,
     25                          usecols=['Pos','Cp'],
     26                          sep='\t')
     27 else:
---> 28     return pd.read_csv(ct_file, 
     29                          header=1, 
     30                          usecols=['Pos','Cp'],
     31                          sep='\t')

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
    899 kwds_defaults = _refine_defaults_read(
    900     dialect,
    901     delimiter,
   (...)
    908     dtype_backend=dtype_backend,
    909 )
    910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:577, in _read(filepath_or_buffer, kwds)
    574 _validate_names(kwds.get("names", None))
    576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
    579 if chunksize or iterator:
    580     return parser

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1407, in TextFileReader.__init__(self, f, engine, **kwds)
   1404     self.options["has_index_names"] = kwds["has_index_names"]
   1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1679, in TextFileReader._make_engine(self, f, engine)
   1676     raise ValueError(msg)
   1678 try:
-> 1679     return mapping[engine](f, **self.options)
   1680 except Exception:
   1681     if self.handles is not None:

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py:140, in CParserWrapper.__init__(self, src, **kwds)
    136 assert self.orig_names is not None
    137 if self.usecols_dtype == "string" and not set(usecols).issubset(
    138     self.orig_names
    139 ):
--> 140     self._validate_usecols_names(usecols, self.orig_names)
    142 # error: Cannot determine type of 'names'
    143 if len(self.names) > len(usecols):  # type: ignore[has-type]
    144     # error: Cannot determine type of 'names'

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:959, in ParserBase._validate_usecols_names(self, usecols, names)
    957 missing = [c for c in usecols if c not in names]
    958 if len(missing) > 0:
--> 959     raise ValueError(
    960         f"Usecols do not match columns, columns expected but not found: "
    961         f"{missing}"
    962     )
    964 return usecols

ValueError: Usecols do not match columns, columns expected but not found: ['Cp', 'Pos']

To get around this, the user can make a custom importer:

[8]:
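def customimporter(data):
    # Map the instrument's column names to the names namer() expects
    rename_dict = {'Well Location':'Pos',
                   'Ct':'Cp'}

    # The column names are on the sixth line of the file (zero-based row 5)
    df = pd.read_csv(data,
                    header=5,
                    usecols=['Well Location','Ct'])

    return df.rename(columns=rename_dict)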
[9]:
customimporter('data/hypotheticaldata.csv').iloc[:6]
[9]:
Pos Cp
0 A1 17.51
1 A2 17.54
2 A3 17.55
3 A4 18.49
4 A5 18.52
5 A6 18.54

The data is now imported in the same format as that produced by the LightCycler 480 importer distributed with the package. To use this function with namer(), simply supply it to the importer parameter:

[10]:
hypodf = equipt.namer('data/hypotheticaldata.csv',
                primers,
                samples,
                reps,
                config,
                importer=customimporter,
                **kwargs)

hypodf.head()
[10]:
Pos Cp Primer Name NamePrim
0 A1 17.51 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
1 A2 17.54 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
2 A3 17.55 Fus (112734868c1) mESC total cDNA_20 mESC total cDNA_20Fus (112734868c1)
3 A4 18.49 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)
4 A5 18.52 Fus (112734868c1) mESC total cDNA_40 mESC total cDNA_40Fus (112734868c1)

The output is identical to that of the original function:

[11]:
# DataFrame.equals compares every element (and the dtypes) of the two tables
df.equals(hypodf)
[11]:
True
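When several instruments differ only in the header row, the column names and the delimiter, the custom importer pattern can be generalized with a small factory. The function make_importer below is a hypothetical helper, not part of equipt, and the in-memory file is a shortened stand-in for a real export.

```python
import io

import pandas as pd


def make_importer(header_row, pos_col, ct_col, sep=","):
    """Build an importer for a delimited export (hypothetical helper).

    header_row : zero-based line number holding the column names
    pos_col    : the instrument's name for the well-position column
    ct_col     : the instrument's name for the Ct column
    """
    def importer(ct_file):
        df = pd.read_csv(ct_file, sep=sep, header=header_row,
                         usecols=[pos_col, ct_col])
        # Rename to the column names namer() expects and fix their order
        df = df.rename(columns={pos_col: "Pos", ct_col: "Cp"})
        return df[["Pos", "Cp"]]
    return importer


# Demonstration on a shortened in-memory file with two preamble lines
raw = (
    "Experiment: example,,\n"
    "free-text line,,\n"
    "Well Location,Name,Ct\n"
    "A1,Sample 1,17.51\n"
    "A2,Sample 2,17.54\n"
)
hypothetical_importer = make_importer(header_row=2,
                                      pos_col="Well Location",
                                      ct_col="Ct")
demo = hypothetical_importer(io.StringIO(raw))
print(demo)
```

The returned function takes a single file argument, so it could be passed to namer() through the importer parameter in the same way as customimporter.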

Important Caveats

  • False positives and negatives

    To make namer() as modular as possible, it relies on every well that should be labeled having a valid Ct value. If a Ct value is not called and pandas reads the well as NaN, the function will fail. To get around this, any wells that should have Ct values but did not produce one should be manually labeled in the original file as ‘exclude’. Any wells that did not contain sample but yielded Ct values anyway should be manually deleted. Such edits should be recorded in a lab notebook, and they may be a sign of larger issues with the instrument or plate loading. As always, use caution when deciding whether to analyze the experiment or repeat it.

  • Dilutions

    By default, namer() assumes dilutions are grouped by sample, as in the above example. If the dilutions are arranged some other way, it may be easiest to run namer() without the dilutions and then update the sample names afterwards. For example, say that in the experiment above we instead had samples grouped by dilution rather than dilutions grouped by sample:

[12]:
# Let's pretend we had seven samples and one primer for this example
samples = ['cDNA1',
           'cDNA2',
           'cDNA3',
           'cDNA4',
           'cDNA5',
           'cDNA6',
           'cDNA7']

primers = ['Fus (112734868c1)',]

# Set the dilutions as an ordered list
dilutions = [20,40,80,160]

# Expand the sample set so that namer() detects the correct number of wells
new_samples = samples * len(dilutions)
[13]:
# Run as before, but without the dilution kwargs
df = equipt.namer('data/22.11.22_PrimerCurve_Ct.csv',
            primers,
            new_samples,
            reps,
            config)

df.iloc[:6]
[13]:
Pos Cp Primer Name NamePrim
0 A1 17.51 Fus (112734868c1) cDNA1 cDNA1Fus (112734868c1)
1 A2 17.54 Fus (112734868c1) cDNA1 cDNA1Fus (112734868c1)
2 A3 17.55 Fus (112734868c1) cDNA1 cDNA1Fus (112734868c1)
3 A4 18.49 Fus (112734868c1) cDNA2 cDNA2Fus (112734868c1)
4 A5 18.52 Fus (112734868c1) cDNA2 cDNA2Fus (112734868c1)
5 A6 18.54 Fus (112734868c1) cDNA2 cDNA2Fus (112734868c1)
[14]:
# If using the 'square' config, sort by NamePrim first so that replicate wells
# are contiguous

# Expand dilutions: each dilution covers reps wells for every sample
new_dils = []

for d in dilutions:
    for i in range(reps*len(samples)):
        new_dils.append(d)


# Check that length is correct
len(new_dils) == len(df)
[14]:
True
[15]:
# Convert the dilutions to strings so they can be appended to the Name column
new_dils = [str(i) for i in new_dils]

# Update Name
df['dils'] = new_dils
df['Name'] = df['Name'] + '_' + df['dils']

# Update NamePrim
df['NamePrim'] = df['Name'] + df['Primer']

# Drop dilution column
df.drop('dils',axis=1,inplace=True)

# Check relabeling
df.iloc[:6]
[15]:
Pos Cp Primer Name NamePrim
0 A1 17.51 Fus (112734868c1) cDNA1_20 cDNA1_20Fus (112734868c1)
1 A2 17.54 Fus (112734868c1) cDNA1_20 cDNA1_20Fus (112734868c1)
2 A3 17.55 Fus (112734868c1) cDNA1_20 cDNA1_20Fus (112734868c1)
3 A4 18.49 Fus (112734868c1) cDNA2_20 cDNA2_20Fus (112734868c1)
4 A5 18.52 Fus (112734868c1) cDNA2_20 cDNA2_20Fus (112734868c1)
5 A6 18.54 Fus (112734868c1) cDNA2_20 cDNA2_20Fus (112734868c1)
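Returning to the first caveat above: because namer() fails when a well has no called Ct value, it can help to inspect the importer output before labeling. The helper wells_missing_ct below is a hypothetical name and is not part of equipt. It is a minimal sketch that assumes the importer output has the ‘Pos’ and ‘Cp’ columns described earlier.

```python
import pandas as pd


def wells_missing_ct(ct_data):
    """Return the positions of wells whose 'Cp' value is NaN.

    Hypothetical helper, not part of equipt.
    """
    return ct_data.loc[ct_data["Cp"].isna(), "Pos"].tolist()


# Toy importer output in which well A2 has no called Ct value
ct_data = pd.DataFrame({"Pos": ["A1", "A2", "A3"],
                        "Cp": [17.51, float("nan"), 17.55]})
missing = wells_missing_ct(ct_data)
print(missing)
```

Any positions reported this way are candidates for the manual handling described in the caveat.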
[16]:
%load_ext watermark
%watermark -v -p pandas,equipt,jupyterlab
Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.12.0

pandas    : 2.0.3
equipt    : 1.0.0
jupyterlab: 3.6.3