aghplctools.data package

aghplctools.data.batch module

Batch data processing tools

aghplctools.data.batch.batch_convert_signals_to_csv(folder_path: str, *additional_signals, verbose: Union[bool, int] = True)

Iterates through all .D files in the target directory and writes the signals of those files to csv. Additional signals may be specified to “reprocess” the data.

Parameters:
  • folder_path – folder path to iterate through
  • additional_signals – additional signals to process. Supported inputs are Agilent specification strings (e.g. ‘DAD1 A, Sig=210,4 Ref=360,100’), DADSignalInfo objects, or dictionaries of keyword arguments for instantiating DADSignalInfo objects
  • verbose – logging flag or level for function (prints progress info to console)
aghplctools.data.batch.batch_report_text_to_xlsx(folder: str, watchfor: str = 'Report.TXT')

Batch converts all report text files in a directory to xlsx

Parameters:
  • folder – directory to search
  • watchfor – file name to watch for
aghplctools.data.batch.pull_hplc_data_from_folder(folder, targets, wiggle=0.01, watchfor='Report.TXT')

Pulls the HPLC integrations for all report files within the specified directory. This function was designed to pull all data from a given day. This method only pulls data which exists in the reports, which can result in asymmetric data for timepoint analysis (i.e. it assumes that subsequent runs are unrelated to others). If the data in a folder are for time-course analysis, use pull_hplc_data_from_folder_timepoint().

Parameters:
  • folder – The folder to search for report files
  • targets – target dictionary of the form {‘name’: [wavelength, retention time], …}
  • wiggle – tolerance (min) around each target retention time within which a peak is matched
  • watchfor – the name of the report file to watch
Returns:

dictionary of HPLCTarget instances in the format {‘name’: HPLCTarget, …}

Return type:

dict
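The wiggle matching described above can be pictured with a minimal sketch. It is not the aghplctools implementation: it assumes a parsed report is a dict of the form {wavelength: {retention_time: area}}, the helper name match_targets is hypothetical, and it returns bare areas rather than HPLCTarget instances.

```python
from typing import Dict, List, Optional

# Hypothetical parsed-report layout: {wavelength: {retention_time: area}}
Report = Dict[float, Dict[float, float]]


def match_targets(
    report: Report,
    targets: Dict[str, List[float]],
    wiggle: float = 0.01,
) -> Dict[str, Optional[float]]:
    """Return the area of the closest peak within +/- wiggle of each target.

    targets follows the documented form {'name': [wavelength, retention_time]}.
    A target with no peak inside its window maps to None.
    """
    matched: Dict[str, Optional[float]] = {}
    for name, (wavelength, retention_time) in targets.items():
        peaks = report.get(wavelength, {})
        # keep peaks inside [retention_time - wiggle, retention_time + wiggle]
        candidates = [rt for rt in peaks if abs(rt - retention_time) <= wiggle]
        if candidates:
            # several peaks in the window: take the one nearest the target
            best = min(candidates, key=lambda rt: abs(rt - retention_time))
            matched[name] = peaks[best]
        else:
            matched[name] = None
    return matched


report = {210.0: {1.502: 1200.0, 3.21: 450.0}, 254.0: {3.205: 980.0}}
targets = {"product": [210.0, 1.5], "impurity": [254.0, 3.2], "missing": [210.0, 5.0]}
print(match_targets(report, targets))
# {'product': 1200.0, 'impurity': 980.0, 'missing': None}
```

Because each report is matched independently, a target absent from one report simply has no value for it, which is the asymmetry the description warns about for time-course data.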

aghplctools.data.batch.pull_hplc_data_from_folder_timepoint(folder, wiggle=0.02, watchfor='Report.TXT')

Pulls all HPLC data from a folder, assuming that the contents of the folder are from an ordered, time-course run (i.e. the contents of one report are related to the others in the folder). The method automatically watches for new retention times and pre-populates the values of newly appearing targets with zeros for the earlier reports, so the resulting targets have a consistent number of values across the folder.

Parameters:
  • folder – The folder to search for report files
  • wiggle – tolerance (min) around each retention time within which a peak is assigned to an existing target
  • watchfor – the name of the report file to watch
Returns:

dictionary of HPLCTarget instances in the format {wavelength: {retention_time: HPLCTarget, …}, …}

Return type:

dict
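A minimal sketch of the zero-padding behaviour, assuming each report has already been parsed into a hypothetical {retention_time: area} dict for a single wavelength (align_time_course is an illustrative name, not part of aghplctools). A retention time that first appears partway through the run is pre-populated with zeros for the earlier reports, and a target missing from a later report also receives a zero, so every series ends up with one value per report.

```python
from typing import Dict, List


def align_time_course(
    reports: List[Dict[float, float]], wiggle: float = 0.02
) -> Dict[float, List[float]]:
    """Align peak areas from ordered reports into equal-length series.

    Each report is a hypothetical {retention_time: area} dict for a single
    wavelength, listed in acquisition order.
    """
    series: Dict[float, List[float]] = {}
    for index, report in enumerate(reports):
        for rt, area in report.items():
            tracked = [key for key in series if abs(key - rt) <= wiggle]
            if tracked:
                # reuse the tracked retention time nearest to this peak
                key = min(tracked, key=lambda k: abs(k - rt))
            else:
                # new retention time: zeros for every earlier report
                key = rt
                series[key] = [0.0] * index
            if len(series[key]) > index:
                continue  # a second peak hit the same target; keep the first
            series[key].append(area)
        # targets missing from this report also get a zero
        for values in series.values():
            if len(values) == index:
                values.append(0.0)
    return series
```

For example, reports [{1.50: 100.0}, {1.51: 90.0, 3.20: 10.0}] yield {1.5: [100.0, 90.0], 3.2: [0.0, 10.0]}: the 1.51 min peak falls within the wiggle of 1.50, while 3.20 is new and gets a leading zero.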

aghplctools.data.sample module

Data types for interacting directly with .D samples (e.g. reprocessing, loading signals directly)

class aghplctools.data.sample.DADSignal(wavelength: Union[float, unithandler.base.UnitFloat], bandwidth: Union[float, unithandler.base.UnitFloat] = 1.0, reference: Union[DADSignal, aghplctools.data.sample.DADSignalInfo, str] = None, name: str = None, spectrum: aghplctools.data.sample.DADSpectrum = None)

Bases: aghplctools.data.sample.DADSignalInfo

Class describing a DAD signal and its data.

Parameters:
  • wavelength – wavelength for the signal
  • bandwidth – band width for the wavelength (signal is centered on the wavelength with this width)
  • reference – reference information for the signal
  • name – convenience name for the signal
  • spectrum – a DADSpectrum object which will be referenced for retrieving data.
as_data_table() → list

Returns the signal as a list-style data table with appropriate headers and data

Returns: data table
as_iterable_data_table()

Returns an iterable which yields a data table with appropriate headers and data

Returns: data table as iterable
band_string

A string representation of the band specified (e.g. “210 (4) nm”)

bandwidth

band width for the signal band

classmethod create_from_DADSignalInfo(obj: aghplctools.data.sample.DADSignalInfo, spectrum: aghplctools.data.sample.DADSpectrum) → aghplctools.data.sample.DADSignal

generates a DADSignal object from a DADSignalInfo object and a spectrum

mean_referenced_intensities

mean referenced intensities for the band (mean unreferenced intensities minus the mean intensities of the reference band)

mean_unreferenced_intensities

mean unreferenced intensities for the band

reference

reference band for the signal band

retention_times

retention times associated with the intensity array

unreferenced_intensities

unreferenced intensities for the band

wavelength

wavelength for the signal

write_signal_to_csv(filename: str, overwrite: bool = False) → str

Writes the signal intensities to the specified csv file.

Parameters:
  • filename – file name to write to
  • overwrite – whether to overwrite the file if it already exists
Returns:

file path that was written
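The overwrite guard of write_signal_to_csv can be sketched with the standard csv module. The two-column layout, the header text, and raising FileExistsError when the file already exists are all assumptions for illustration; write_signal_csv is a hypothetical helper, and the real method may lay out the data or handle the conflict differently.

```python
import csv
from pathlib import Path
from typing import Sequence


def write_signal_csv(
    filename: str,
    retention_times: Sequence[float],
    intensities: Sequence[float],
    overwrite: bool = False,
) -> str:
    """Write a two-column signal table and return the path written."""
    path = Path(filename)
    if path.exists() and not overwrite:
        # refuse to clobber an existing file unless explicitly asked
        raise FileExistsError(f"{path} already exists; pass overwrite=True")
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["retention time (min)", "intensity"])  # assumed header
        writer.writerows(zip(retention_times, intensities))
    return str(path)
```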

class aghplctools.data.sample.DADSignalInfo(wavelength: Union[float, unithandler.base.UnitFloat], bandwidth: Union[float, unithandler.base.UnitFloat] = 1.0, reference: Union[DADSignalInfo, str] = None, name: str = None)

Bases: object

Class describing a DAD signal and its parameters

Parameters:
  • wavelength – wavelength for the signal
  • bandwidth – band width for the wavelength (signal is centered on the wavelength with this width)
  • reference – reference information for the signal
  • name – convenience name for the signal
DEFAULT_TIME_UNIT = 'min'
DEFAULT_WAVELENGTH_UNIT = 'nm'
agilent_specification_string

the specification string describing this instance (can be passed to create_from_string to reinstantiate)

bandwidth

bandwidth for the signal band

classmethod create_from_CH_file(file_path: Union[str, pathlib.Path]) → aghplctools.data.sample.DADSignalInfo

Creates a DADSignalInfo instance from a channel (.CH) file.

Parameters: file_path – target file path
classmethod create_from_agilent_string(string: str, name_override: str = None) → aghplctools.data.sample.DADSignalInfo

Creates a class instance from a standard Agilent signal description string (e.g. ‘DAD1 A, Sig=210,4 Ref=360,100’)

Parameters:
  • string – signal description string
  • name_override – override for name specification
Returns:

DADSignalInfo object

classmethod get_signals_in_directory(file_path: Union[str, pathlib.Path]) → List[aghplctools.data.sample.DADSignalInfo]

Creates a list of signals based on the .CH files in a directory.

Parameters: file_path – path to target directory
Returns: list of signal info objects
classmethod get_values_from_agilent_string(string: str) → dict

Parses a standard Agilent signal description string (e.g. ‘DAD1 A, Sig=210,4 Ref=360,100’) and returns a dictionary of parsed values (can be used to instantiate a DADSignalInfo instance).

Parameters: string – signal description string
Returns: dictionary of parameters
reference

Reference band for the signal band

wavelength

Wavelength for the signal
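The Agilent specification string format used throughout (e.g. ‘DAD1 A, Sig=210,4 Ref=360,100’) can be parsed and rebuilt with a regular expression, which is roughly what get_values_from_agilent_string and agilent_specification_string have to do. This is a sketch, not the library's parser: the returned dict layout, the helper names, and the handling of a ‘Ref=off’ reference are assumptions.

```python
import re

# Matches e.g. 'DAD1 A, Sig=210,4 Ref=360,100' or 'DAD1 B, Sig=254,4 Ref=off'
AGILENT_SIGNAL = re.compile(
    r"(?P<detector>\w+)\s+(?P<channel>\w+),\s*"
    r"Sig=(?P<wavelength>[\d.]+),(?P<bandwidth>[\d.]+)\s+"
    r"Ref=(?:(?P<ref_wavelength>[\d.]+),(?P<ref_bandwidth>[\d.]+)|off)"
)


def parse_signal_string(string: str) -> dict:
    """Parse an Agilent signal string into a plain dict (hypothetical layout)."""
    match = AGILENT_SIGNAL.fullmatch(string.strip())
    if match is None:
        raise ValueError(f"not an Agilent signal string: {string!r}")
    values = {
        "name": f"{match['detector']} {match['channel']}",
        "wavelength": float(match["wavelength"]),
        "bandwidth": float(match["bandwidth"]),
        "reference": None,  # stays None for 'Ref=off'
    }
    if match["ref_wavelength"] is not None:
        values["reference"] = {
            "wavelength": float(match["ref_wavelength"]),
            "bandwidth": float(match["ref_bandwidth"]),
        }
    return values


def format_signal_string(values: dict) -> str:
    """Rebuild the specification string from parse_signal_string output."""
    def num(x: float) -> str:
        return f"{x:g}"  # 210.0 -> '210', 4.5 -> '4.5'

    ref = values["reference"]
    ref_text = "off" if ref is None else f"{num(ref['wavelength'])},{num(ref['bandwidth'])}"
    return (
        f"{values['name']}, Sig={num(values['wavelength'])},"
        f"{num(values['bandwidth'])} Ref={ref_text}"
    )
```

Round-tripping a string through both helpers returns it unchanged, which mirrors the documented promise that agilent_specification_string can be passed back to reinstantiate the signal.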

class aghplctools.data.sample.DADSpectrum(filename=None, ftype=None, data=None)

Bases: aston.tracefile.agilent_uv.AgilentCSDAD2

An object describing an Agilent DAD spectrum for a sample. Inherits Aston AgilentCSDAD2 and has additional methods for retrieving band information.

Parameters:
  • filename – target file name
  • ftype
  • data
classmethod create_from_D_file(file_path: Union[pathlib.Path, str]) → aghplctools.data.sample.DADSpectrum

Creates a DADSpectrum instance from an Agilent .D file

Parameters: file_path – path to .D sample file
Returns: interpreted .D file with metadata and loaded UV data
get_band_intensities(wavelength: float, bandwidth: float = 1.0) → numpy.ndarray

Retrieves the array of intensities for the band described by the wavelength and bandwidth. The returned array has shape [wavelength, retention time]; the corresponding wavelengths are given by DADSpectrum.get_band_wavelengths and the retention times by DADSpectrum.retention_times.

Parameters:
  • wavelength – wavelength
  • bandwidth – band width
Returns:

array of band intensities

get_band_mean_intensity(wavelength: float, bandwidth: float = 1.0) → numpy.ndarray

Retrieves the mean intensity array for the band described by the wavelength and bandwidth. The returned array is the mean of the intensities in the band (wavelength - bandwidth / 2, wavelength + bandwidth / 2) at each retention time.

Parameters:
  • wavelength – wavelength
  • bandwidth – band width
Returns:

array of mean intensities
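The band arithmetic can be sketched in pure Python, assuming the spectrum is stored with shape [wavelength, retention time] as stated for get_band_intensities and that both band edges are inclusive (an assumption; the docstring writes the band as an interval without saying which edges are included). The referenced helper mirrors the description of DADSignal.mean_referenced_intensities: signal band mean minus reference band mean. Both function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def band_mean(
    wavelengths: Sequence[float],
    data: Sequence[Sequence[float]],
    wavelength: float,
    bandwidth: float = 1.0,
) -> List[float]:
    """Mean intensity across a band, one value per retention time.

    data has shape [wavelength, retention time]; wavelengths must be sorted.
    The band is the closed interval
    [wavelength - bandwidth / 2, wavelength + bandwidth / 2] (assumption).
    """
    lo = bisect_left(wavelengths, wavelength - bandwidth / 2)
    hi = bisect_right(wavelengths, wavelength + bandwidth / 2)
    rows = data[lo:hi]
    if not rows:
        raise ValueError("no wavelengths fall inside the requested band")
    # average each retention-time column over the wavelength rows in the band
    return [sum(column) / len(column) for column in zip(*rows)]


def referenced(signal: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Signal band mean minus reference band mean, point by point."""
    return [s - r for s, r in zip(signal, reference)]
```

With ‘Sig=210,4 Ref=360,100’, the signal band would be band_mean(wavelengths, data, 210, 4) and the reference band_mean(wavelengths, data, 360, 100); referenced() then gives the baseline-corrected trace.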

get_band_wavelengths(wavelength: float, bandwidth: float = 1.0) → list

Returns a list of wavelengths corresponding to the band specified.

Parameters:
  • wavelength – wavelength
  • bandwidth – band width
Returns:

list of wavelengths in the band

get_component_spectrum(retention_start: float, retention_end: float) → numpy.ndarray

Retrieves the component spectrum for the provided retention time slice.

Parameters:
  • retention_start – retention time start
  • retention_end – retention time end
Returns:

component spectrum for the retention time slice

get_intensities_from_signal(signal: aghplctools.data.sample.DADSignalInfo) → numpy.ndarray

Retrieves the intensity array described by the DADSignalInfo object.

Parameters: signal – signal descriptor
Returns: array of mean intensities
maximum_wavelength_array

Array of the wavelengths for the maximum intensity at each retention time

retention_times

retention times corresponding to the data array (min)

total_absorbance_chromatogram

The total absorbance chromatogram for the spectrum (sum of all intensities for each retention time)

wavelengths

list of wavelengths for the DAD
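Both maximum_wavelength_array and total_absorbance_chromatogram reduce the [wavelength, retention time] data along the wavelength axis. A pure-Python sketch under that layout assumption follows; the function names are hypothetical, and ties in max_wavelengths resolve to the first (lowest-index) wavelength.

```python
from typing import List, Sequence


def total_absorbance(data: Sequence[Sequence[float]]) -> List[float]:
    """Sum over all wavelengths at each retention time (data is [wavelength, time])."""
    return [sum(column) for column in zip(*data)]


def max_wavelengths(
    wavelengths: Sequence[float], data: Sequence[Sequence[float]]
) -> List[float]:
    """Wavelength of the most intense row at each retention time."""
    result = []
    for column in zip(*data):
        # index of the largest intensity in this retention-time column
        best = max(range(len(column)), key=column.__getitem__)
        result.append(wavelengths[best])
    return result
```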

write_to_allotrope(filename: str)
class aghplctools.data.sample.HPLCSample(sample_file_name: str, method_name: str, signals: Union[List[aghplctools.data.sample.DADSignalInfo], List[aghplctools.data.sample.DADSignal], List[str]], datetimestamp: Union[str, datetime.datetime] = None, dad_spectrum: aghplctools.data.sample.DADSpectrum = None, ms_spectra: List[aghplctools.data.sample.MSSpectrum] = None, directory: str = None)

Bases: aghplctools.data.sample.HPLCSampleInfo

Data class for describing an HPLC sample containing metadata and spectral data.

Parameters:
  • sample_file_name – name for sample
  • datetimestamp – date and time stamp for when the sample was run
  • method_name – name of method used to run the sample
  • signals – list of signals associated with the run
  • dad_spectrum – DADSpectrum object with loaded data
  • ms_spectra – list of mass spectra
  • directory – directory path where the sample may be found
add_signal(new_signal: Union[aghplctools.data.sample.DADSignalInfo, dict, str]) → aghplctools.data.sample.DADSignal

Adds a new signal to the HPLCSample instance.

Parameters: new_signal – new signal to add. Supported inputs are Agilent specification strings (e.g. ‘DAD1 A, Sig=210,4 Ref=360,100’), DADSignalInfo objects, or a dictionary of keyword arguments for instantiating a DADSignalInfo object.
Returns: the created signal
classmethod create_from_D_file(file_path: Union[pathlib.Path, str]) → aghplctools.data.sample.HPLCSample

Creates an HPLCSample instance from a .D file.

Parameters: file_path – file path to Agilent .D folder
Returns: instantiated HPLCSample with loaded data
classmethod create_from_acaml(acaml: Union[str, xml.etree.ElementTree.ElementTree]) → aghplctools.data.sample.HPLCSampleInfo

Not supported for the HPLCSample class.

classmethod create_from_xml(xml_path: Union[str, xml.etree.ElementTree.ElementTree]) → aghplctools.data.sample.HPLCSampleInfo

Creates a sample structure from a Sample.xml file (old-style metadata) in the desired .D folder.

Parameters: xml_path – path to xml file or parsed element tree root
Returns: parsed Sample instance
write_signals_to_csv(directory: Union[str, pathlib.Path] = None, overwrite: bool = False) → List[str]

Writes the signals to csv in the directory specified. If no directory is specified, the csv files will be written to the directory path specified in the directory attribute of the instance.

Parameters:
  • directory – directory path
  • overwrite – whether to overwrite files if they already exist
Returns:

file paths written

write_signals_to_xlsx(output_file: Union[str, pathlib.Path] = None) → str

Writes the signals to a single excel file.

Parameters: output_file – target file path. If this is not specified
Returns: path to the written file
class aghplctools.data.sample.HPLCSampleInfo(sample_file_name: str, method_name: str, signals: Union[List[aghplctools.data.sample.DADSignalInfo], List[str]], datetimestamp: Union[str, datetime.datetime] = None)

Bases: object

Data class for describing an HPLC sample.

Parameters:
  • sample_file_name – name for sample
  • datetimestamp – date and time stamp for when the sample was run
  • method_name – name of method used to run the sample
  • signals – list of signals associated with the run
as_dict() → dict

Returns the sample data as a dictionary

classmethod auto_create(target_path: Union[str, pathlib.Path]) → aghplctools.data.sample.HPLCSampleInfo

Attempts to automatically create an instance from metadata in the target folder

Parameters: target_path – path to metadata file or folder containing metadata files
Returns: HPLCSampleInfo instance
classmethod create_from_acaml(acaml: Union[str, xml.etree.ElementTree.ElementTree]) → aghplctools.data.sample.HPLCSampleInfo

Creates a sample structure from an ACAML file (use sequence.acam_ in the desired .D folder).

Parameters: acaml – path to acaml file or parsed element tree root
Returns: parsed Sample instance
classmethod create_from_xml(xml_path: Union[str, xml.etree.ElementTree.ElementTree]) → aghplctools.data.sample.HPLCSampleInfo

Creates a sample structure from a Sample.xml file (old-style metadata) in the desired .D folder.

Parameters: xml_path – path to xml file or parsed element tree root
Returns: parsed Sample instance
date

date on which the sample was run

static find_acaml(acaml_path: Union[str, pathlib.Path]) → xml.etree.ElementTree.ElementTree

Finds an acaml file and loads the element tree

Parameters: acaml_path – path to acaml file or directory containing acaml file
classmethod find_and_get_metadata(target_path: Union[str, pathlib.Path]) → dict

Attempts to locate and parse metadata files in both old (Result.xml) and new (ACAML) formats. If neither file type can be found, an error will be raised.

Parameters: target_path – target path to search
Returns: parsed dictionary for creating HPLCSampleInfo instance
classmethod get_values_from_acaml(acaml: Union[str, pathlib.Path, xml.etree.ElementTree.ElementTree]) → dict

Gets relevant values from an ACAML file (use sequence.acam_ in the desired .D folder).

Parameters: acaml – path to acaml file or parsed element tree root
Returns: dictionary of values of interest
classmethod get_values_from_result_xml(xml_path: Union[str, pathlib.Path]) → dict

Retrieves values from a Result.xml file. This is an old-style ChemStation metadata file (~B.04.03 era).

Parameters: xml_path – path to xml or directory containing xml file
classmethod get_values_from_sample_xml(xml_path: Union[str, pathlib.Path]) → dict

Retrieves values from a Sample.xml file (ChemStation C.01.07).

Parameters: xml_path – path to xml or directory containing xml file
classmethod get_values_from_xml(xml_path: Union[str, pathlib.Path]) → dict

Attempts to find a Result.xml file and parse sample information from it.

Parameters: xml_path – path to xml or directory containing xml file
timestamp

Time of day at which the sample was run

class aghplctools.data.sample.MSSpectrum(filename=None, ftype=None, data=None)

Bases: aston.tracefile.agilent_ms.AgilentMS

An object describing an Agilent mass spectrometry (MS) data set for a sample. Inherits Aston AgilentMS and has additional methods for extracting ion chromatograms and spectra.

Parameters:
  • filename – target file name
  • ftype
  • data
auto_resolution(npeaks: int = 4) → float

Attempts to automatically determine the resolution of the spectrum.

Parameters: npeaks – number of peaks to try to find
Returns: estimated resolution
classmethod create_from_D_file(file_path: Union[pathlib.Path, str]) → List[aghplctools.data.sample.MSSpectrum]

Creates a list of MSSpectrum instances from an Agilent .D file.

Parameters: file_path – path to .D file
Returns: list of MSSpectrum instances
extract_function_time_tic()

duck-type method for PythoMS

functions

duck-type function information (expected by PythoMS)

get_ion_intensities(start_mz: float, end_mz: float = None) → numpy.ndarray

Returns the intensity integral array (reconstructed single ion monitoring) for the provided ion m/z window.

Parameters:
  • start_mz – start m/z ratio for the region
  • end_mz – end m/z ratio for the region.
get_spectrum_of_retention_period(start_time: float, end_time: float) → numpy.ndarray

Returns the intensity array for the mass spectrum in the retention time region provided.

Parameters:
  • start_time – start retention time (min)
  • end_time – end retention time (min)
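Both extraction methods amount to summing a slice of the data matrix. The sketch below assumes a hypothetical layout of [retention time, m/z] with sorted masses and retention times, treats both window edges as inclusive, and, unlike get_ion_intensities, requires end_mz because the behaviour when it is None is not documented here. The function names are illustrative only.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def ion_intensities(
    masses: Sequence[float],
    data: Sequence[Sequence[float]],
    start_mz: float,
    end_mz: float,
) -> List[float]:
    """Reconstructed ion chromatogram: summed intensity in [start_mz, end_mz] per scan.

    data has shape [retention time, m/z]; masses must be sorted ascending.
    """
    lo = bisect_left(masses, start_mz)
    hi = bisect_right(masses, end_mz)
    return [sum(scan[lo:hi]) for scan in data]


def spectrum_of_period(
    retention_times: Sequence[float],
    data: Sequence[Sequence[float]],
    start_time: float,
    end_time: float,
) -> List[float]:
    """Mass spectrum summed over the scans with start_time <= t <= end_time."""
    lo = bisect_left(retention_times, start_time)
    hi = bisect_right(retention_times, end_time)
    n_masses = len(data[0]) if data else 0
    totals = [0.0] * n_masses
    for scan in data[lo:hi]:
        for i, value in enumerate(scan):
            totals[i] += value
    return totals
```

The first function collapses the m/z axis inside a window (one value per scan); the second collapses the time axis (one value per m/z).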
get_tic_of_function(function: int) → numpy.ndarray

duck-type method for retrieving the TIC (expected in PythoMS)

get_timepoints_of_function(function: int) → numpy.ndarray

duck-type method for retrieving the timepoints (expected in PythoMS)

masses

array of m/z values for the spectrum

retention_times

retention times corresponding to the data array (min)

summed_intensity_array

returns the summed intensity array of the spectrum

summed_spectrum

returns the m/z array and the summed intensity array for the entire run

aghplctools.data.sample.bisect_slice(array, minimum_value: float, maximum_value: float) → Tuple[int, int]

Finds the slice indices for a minimum and maximum value in an array.

Parameters:
  • array – array-like, bisectable sequence (assumed to be sorted)
  • minimum_value – minimum value
  • maximum_value – maximum value
Returns:

slice indices
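A plausible shape for bisect_slice using the standard bisect module, assuming both bounds are inclusive (the docstring does not say). The name bisect_slice_sketch marks it as an illustration rather than the library's code.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def bisect_slice_sketch(
    array: Sequence[float], minimum_value: float, maximum_value: float
) -> Tuple[int, int]:
    """Indices (start, stop) such that array[start:stop] spans the closed range.

    array must be sorted ascending; both bounds are treated as inclusive.
    """
    start = bisect_left(array, minimum_value)  # first element >= minimum_value
    stop = bisect_right(array, maximum_value)  # one past last element <= maximum_value
    return start, stop
```

Two binary searches keep the lookup at O(log n), which is why a sorted retention-time or wavelength axis is required.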

aghplctools.data.sample.check_or_locate_file(path: Union[str, pathlib.Path], file_name: str) → pathlib.Path

Checks whether the provided path points to the provided file name. If not, checks whether the path is a directory and searches for the file in the directory. If there are multiple occurrences of the provided file name in a directory, the first is returned.

Parameters:
  • path – path to search
  • file_name – target file name
Returns:

path to desired file
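A sketch of the lookup logic with pathlib. The recursive search (rglob), sorting the matches so that "first" is deterministic, and raising FileNotFoundError when nothing is found are all assumptions; locate_file is a hypothetical name.

```python
from pathlib import Path
from typing import Union


def locate_file(path: Union[str, Path], file_name: str) -> Path:
    """Return path itself if it is file_name, else the first match below it."""
    path = Path(path)
    if path.is_file() and path.name == file_name:
        return path
    if path.is_dir():
        # assumption: search recursively and sort so "first" is deterministic
        for candidate in sorted(path.rglob(file_name)):
            if candidate.is_file():
                return candidate
    raise FileNotFoundError(f"{file_name!r} not found at or below {path}")
```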

aghplctools.data.sample.retrieve_metadata_from_channel(path: Union[str, pathlib.Path]) → dict

Retrieves metadata from a .CH file

Parameters: path – path to read
Returns: dictionary containing metadata from the channel
aghplctools.data.sample.strptime_agilent_dt(dt_string: str) → datetime.datetime

Parses an Agilent datetime string using strptime.

Parameters: dt_string – Agilent datetime string
Returns: parsed datetime object

aghplctools.data.time_course module

Tools for monitoring time-course data (tracking signals over time)

class aghplctools.data.time_course.HPLCTarget(wavelength: float, retention_time: float, name: str = None, wiggle: float = 0.2, zero_pad: int = 0)

Bases: object

A data storage class for tracking the retention time, area, width, and height of an HPLC target over multiple sample acquisitions.

Parameters:
  • wavelength (float) – wavelength to track the target on
  • retention_time (float) – retention time to look for the target
  • name (str) – convenience name
  • wiggle (float) – wiggle value in minutes for finding the target around the retention_time (the window will be [retention_time-wiggle, retention_time+wiggle])
  • zero_pad (int) – number of zeros to prepend to each value list
add_from_pulled(signals, timepoint=None)

Retrieves values from the output of the pull_hplc_area function and stores them in the instance.

Parameters:
  • signals (dict) – output dictionary from pull_hplc_area
  • timepoint (float) – timepoint to save (if None, the current time will be retrieved)
Returns:

area, height, width, timepoint

Return type:

tuple

add_value(area, width=0.0, height=0.0, timepoint=None)

Adds a value to the tracker lists.

Parameters:
  • area (float) – area to add (required)
  • width (float) – width to add (optional)
  • height (float) – height to add (optional)
  • timepoint (float) – timepoint to use (if None, the current time will be used)
retrieve_index(index)

Retrieves the values of the provided index.

Parameters: index – Python-style list index
Returns: {area, width, height, timepoint}
Return type: dict
retrieve_timepoint(timepoint)

Retrieves the values of the provided timepoint.

Parameters: timepoint (float) – time point to retrieve
Returns: {area, width, height, timepoint}
Return type: dict
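The bookkeeping described for HPLCTarget (a retention window of ±wiggle, zero padding, parallel value lists, lookup by index or timepoint) can be sketched as a small dataclass. TargetTracker is a hypothetical stand-in, not the library class; recording time.time() when no timepoint is given and padding timepoints with None are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TargetTracker:
    """Hypothetical stand-in mirroring the documented HPLCTarget behaviour."""

    wavelength: float
    retention_time: float
    name: Optional[str] = None
    wiggle: float = 0.2
    zero_pad: int = 0
    areas: List[float] = field(default_factory=list)
    widths: List[float] = field(default_factory=list)
    heights: List[float] = field(default_factory=list)
    timepoints: List[Optional[float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # zero_pad prepends n zeros to each value list (timepoints padded with None)
        for values in (self.areas, self.widths, self.heights):
            values[:0] = [0.0] * self.zero_pad
        self.timepoints[:0] = [None] * self.zero_pad

    @property
    def window(self) -> Tuple[float, float]:
        """Search window [retention_time - wiggle, retention_time + wiggle]."""
        return self.retention_time - self.wiggle, self.retention_time + self.wiggle

    def in_window(self, retention_time: float) -> bool:
        low, high = self.window
        return low <= retention_time <= high

    def add_value(self, area: float, width: float = 0.0, height: float = 0.0,
                  timepoint: Optional[float] = None) -> None:
        self.areas.append(area)
        self.widths.append(width)
        self.heights.append(height)
        # assumption: "current time" is recorded as a Unix timestamp
        self.timepoints.append(time.time() if timepoint is None else timepoint)

    def retrieve_index(self, index: int) -> dict:
        return {
            "area": self.areas[index],
            "width": self.widths[index],
            "height": self.heights[index],
            "timepoint": self.timepoints[index],
        }

    def retrieve_timepoint(self, timepoint: float) -> dict:
        # list.index raises ValueError if the timepoint was never recorded
        return self.retrieve_index(self.timepoints.index(timepoint))
```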
aghplctools.data.time_course.find_max_area(signals)

Returns the wavelength and retention time corresponding to the maximum area in a set of HPLC peak data.

Parameters: signals (dict) – dict[wavelength][retention time (float)][width/area/height]
Returns: wavelength and retention time of the peak with the maximum area
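find_max_area can be sketched as a single max() over the nested dictionary described above, assuming each innermost dict has an 'area' key. find_max_area_sketch is a hypothetical name, and the sketch raises ValueError on empty input (max() of an empty sequence).

```python
from typing import Dict, Tuple

# {wavelength: {retention_time: {"area": ..., "width": ..., "height": ...}}}
Signals = Dict[float, Dict[float, Dict[str, float]]]


def find_max_area_sketch(signals: Signals) -> Tuple[float, float]:
    """Return (wavelength, retention_time) of the peak with the largest area."""
    pairs = ((wl, rt) for wl, peaks in signals.items() for rt in peaks)
    return max(pairs, key=lambda pair: signals[pair[0]][pair[1]]["area"])
```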
aghplctools.data.time_course.plot(yvalues, xvalues=None, xlabel='injection #', ylabel=None, hline=None)

Plots one set of values.

Parameters:
  • yvalues – list of y values
  • xvalues – list of x values (optional)
  • xlabel – label for x
  • ylabel – label for y
  • hline – plot a horizontal line at this value if specified

aghplctools.data.time_course.stackedplot(rets, xlabel='injection #')

Creates a stacked plot for the dictionary generated by pull_hplc_data_from_folder.

Parameters:
  • rets – dictionary of retention times
  • xlabel – optional override for the x label