aghplctools.ingestion package

The ingestion package contains modules and methods for parsing and ingesting report files produced by Agilent ChemStation. The exact structure of the exported files may differ between ChemStation installations, but hopefully one of these tools will work for your purposes.

Although the report TXT and CSV files contain some metadata, a more complete set of metadata can be found in the sequence.acam_ file located in the *.D directory. We have written ingesters for this metadata and built data classes around it. You can find these tools in data.sample.

aghplctools.ingestion.csv module

The ingestion.csv module contains methods for retrieving peak tables (pull_hplc_area_from_csv) and metadata (pull_metadata_from_csv) from CSV report files. ChemStation generates numerous CSV files per report, and their structure appears to follow an arcane scheme from which it is difficult to extract context. These extraction tools have only worked for us on runs that are not externally referenced. Your mileage may vary.

aghplctools.ingestion.csv.pull_hplc_area_from_csv(folder, report_name='Report')

Pulls HPLC area data from the specified Agilent HPLC CSV report files. Returns the data tables for each wavelength in dictionary format. Each wavelength table is a dictionary keyed by retention time, whose values hold the peak properties (width, area, height).

Due to the unconventional way Agilent structures its CSV files, pulling the data is a bit awkward. In essence, the report consists of one CSV file containing all the metadata, plus further CSV files (one per detector signal) containing the data but no column headers or other metadata. This function therefore extracts both data and metadata and stores them in the same format as the text-based parser.

Parameters:
  • folder – The folder to search for report files
  • report_name – File name (without number or extension) of the report file
Returns:

dictionary dict[wavelength][retention time (float)][width/area/height]
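
The idea described above, attaching column names obtained elsewhere to header-less data rows, can be sketched with the standard library. This is a hypothetical illustration, not the aghplctools implementation: rows_to_peak_table and the COLUMNS order are assumptions (the real order depends on your report template), and the sketch builds the table for a single signal, i.e. one wavelength entry of the returned dictionary.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column order for one header-less signal CSV; the real order
# depends on the ChemStation report template.
COLUMNS: List[str] = ["Peak", "RetTime", "Type", "Width", "Area", "Height"]


def rows_to_peak_table(csv_text: str) -> Dict[float, dict]:
    """Convert header-less CSV rows into {retention time: {column: value}}."""
    table: Dict[float, dict] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # Attach the externally supplied column names to the bare values
        record = dict(zip(COLUMNS, (cell.strip() for cell in row)))
        record["Peak"] = int(record["Peak"])
        for key in ("RetTime", "Width", "Area", "Height"):
            record[key] = float(record[key])  # 'Type' (e.g. 'BB') stays a string
        table[record["RetTime"]] = record
    return table


# Made-up values, for illustration only
sample = "1,0.743,BB,0.0433,31.65549,11.37090\n2,1.220,VB,0.0512,120.5,35.2\n"
peaks = rows_to_peak_table(sample)
print(sorted(peaks))  # [0.743, 1.22]
```

Keying each record by its float retention time mirrors the dict[wavelength][retention time (float)] layout documented above.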

aghplctools.ingestion.csv.pull_metadata_from_csv(folder, report_name='Report')

Pulls run metadata from the specified Agilent HPLC CSV report files. Returns the metadata describing the sample in dictionary format.

Parameters:
  • folder – The folder to search for report files
  • report_name – File name (without number or extension) of the report file
Returns:

dictionary containing the metadata

aghplctools.ingestion.text module

The ingestion.text module contains methods for parsing report text files (named by default 'Report.TXT'). The method you are most likely to use in this module is pull_hplc_area_from_txt. This method pulls an HPLC area table from a text file when provided with a valid file path. This method will return a dictionary with the following structure:

Example output:

    {
        wavelength: {
            retention_time: {
                'Width': float, 'Area': float, 'Height': float,
                'Peak': int, 'Type': str, 'RetTime': float,
            },
        },
    }

wavelength and retention_time will be floats, and the type of each value is noted in the above dictionary. Regex matches have been included for each column type we have encountered. If you encounter an error, please create an issue and provide an example file so that we might expand our matching capabilities.
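
The structure above can be written down as type hints, which makes downstream code easier to check. The sketch below is an assumption on our part (aghplctools does not ship PeakRecord, AreaReport or largest_peak_per_wavelength); it uses total=False because the columns present vary between reports, and it shows a typical traversal: finding the retention time of the largest peak for each wavelength.

```python
from typing import Dict, TypedDict


class PeakRecord(TypedDict, total=False):
    """One peak entry; total=False because reports may omit some columns."""
    Width: float
    Area: float
    Height: float
    Peak: int
    Type: str
    RetTime: float


# wavelength (float) -> retention time (float) -> peak record
AreaReport = Dict[float, Dict[float, PeakRecord]]


def largest_peak_per_wavelength(report: AreaReport) -> Dict[float, float]:
    """For each wavelength, return the retention time of the peak with the largest area."""
    return {
        wavelength: max(peaks, key=lambda rt: peaks[rt].get("Area", 0.0))
        for wavelength, peaks in report.items()
        if peaks  # signals with no peaks are skipped
    }


# Made-up values, for illustration only
report: AreaReport = {
    254.0: {
        0.743: {"Peak": 1, "RetTime": 0.743, "Type": "BB", "Area": 31.7},
        1.22: {"Peak": 2, "RetTime": 1.22, "Type": "VB", "Area": 120.5},
    },
}
print(largest_peak_per_wavelength(report))  # {254.0: 1.22}
```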

For convenience, a report_text_to_xlsx method is included which parses a peak table from a Report.TXT file and saves it to an Excel xlsx file.

aghplctools.ingestion.text.build_peak_regex(signal_table: str)

Builds a peak-line regex from a signal table.

Parameters: signal_table – block of lines associated with an area table
Returns: peak line regex object (Python <= 3.6: _sre.SRE_Pattern; Python >= 3.7: re.Pattern)
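
One way to derive a peak-line regex from a signal table is sketched below. This is not the aghplctools implementation: it assumes the table contains a dashed separator line such as "----|-------|----|" whose dash runs give the width of each column, and it builds a fixed-width pattern with one named group per column. regex_from_separator and its names argument are hypothetical.

```python
import re
from typing import List


def regex_from_separator(signal_table: str, names: List[str]) -> re.Pattern:
    """Build a fixed-width peak-line regex from a '----|-------|' separator line."""
    separator = next(
        (line.rstrip() for line in signal_table.splitlines()
         if line.startswith("-") and "|" in line),
        None,
    )
    if separator is None:
        raise ValueError("no dashed separator line found in signal table")
    # One column per run of dashes; +1 accounts for the '|' spacing character
    widths = [len(segment) + 1 for segment in separator.rstrip("|").split("|")]
    if len(widths) != len(names):
        raise ValueError(f"expected {len(widths)} column names, got {len(names)}")
    # Fixed-width groups for every column except the last, which takes the rest of the line
    parts = [rf"(?P<{name}>.{{{width}}})" for name, width in zip(names[:-1], widths[:-1])]
    parts.append(rf"(?P<{names[-1]}>.*)")
    return re.compile("".join(parts))


# Constructed example table (not taken from a real report)
table = (
    "----|-------|----|-------|----------|\n"
    "   1   0.743 BB    0.0433   31.65549\n"
)
pattern = regex_from_separator(table, ["Peak", "RetTime", "Type", "Width", "Area"])
match = pattern.match(table.splitlines()[1])
print({key: value.strip() for key, value in match.groupdict().items()})
# {'Peak': '1', 'RetTime': '0.743', 'Type': 'BB', 'Width': '0.0433', 'Area': '31.65549'}
```

Letting the last group consume the rest of the line tolerates data lines whose trailing padding has been trimmed.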

aghplctools.ingestion.text.chunk_string(string, n_chars_list)

Chunks a string by the specified numbers of characters, returning the chunked characters and the remaining string.

Parameters:
  • string (str) – string to chunk
  • n_chars_list (list) – list of the numbers of characters to return
Returns:

chunk, remaining string
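
The documented behaviour can be illustrated with a short sketch. The docstring leaves the exact return shape open, so this version assumes it returns a list of chunks plus the leftover string; chunk_string_sketch is a hypothetical name and may differ from the real function in detail. Cutting fixed-width fields off the front of a line like this is a natural fit for ChemStation's column-aligned peak tables.

```python
from typing import List, Tuple


def chunk_string_sketch(string: str, n_chars_list: List[int]) -> Tuple[List[str], str]:
    """Cut consecutive chunks of the given lengths off the front of `string`."""
    chunks: List[str] = []
    for n_chars in n_chars_list:
        chunks.append(string[:n_chars])  # next chunk
        string = string[n_chars:]  # keep the rest for the following chunk
    return chunks, string


print(chunk_string_sketch("   1   0.743 BB", [4, 8]))  # (['   1', '   0.743'], ' BB')
```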

aghplctools.ingestion.text.parse_area_report(report_text: str) → dict

Interprets report text and parses the area report section, converting it to a dictionary.

Parameters: report_text – plain text version of the report.
Raises: ValueError – if there are no peaks defined in the report text file
Returns: dictionary of signals in the form dict[wavelength][retention time (float)][Width/Area/Height/etc.]
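
A first step in parsing an area report is splitting the text into one section per signal and reading the wavelength from each header. The sketch below assumes headers of the form "Signal 1: DAD1 A, Sig=254,4 Ref=360,100" and takes the number after "Sig=" as the wavelength; both the header format and split_signal_sections are assumptions, not the aghplctools code. Like parse_area_report, it raises ValueError when nothing usable is found.

```python
import re
from typing import Dict

# Assumed header format, e.g. "Signal 1: DAD1 A, Sig=254,4 Ref=360,100";
# the whole header line is consumed so each section starts on the next line.
SIGNAL_HEADER = re.compile(
    r"^Signal \d+:.*?Sig=(?P<wavelength>\d+(?:\.\d+)?).*$", re.MULTILINE
)


def split_signal_sections(report_text: str) -> Dict[float, str]:
    """Map each signal's wavelength to the text between its header and the next one."""
    headers = list(SIGNAL_HEADER.finditer(report_text))
    if not headers:
        raise ValueError("no signal sections found in report text")
    sections: Dict[float, str] = {}
    for i, header in enumerate(headers):
        # A section runs until the next header, or to the end of the text
        end = headers[i + 1].start() if i + 1 < len(headers) else len(report_text)
        sections[float(header.group("wavelength"))] = report_text[header.end():end]
    return sections


report = (
    "Signal 1: DAD1 A, Sig=254,4 Ref=360,100\n"
    "...peak table for 254 nm...\n"
    "Signal 2: DAD1 B, Sig=210,4 Ref=360,100\n"
    "...peak table for 210 nm...\n"
)
print(sorted(split_signal_sections(report)))  # [210.0, 254.0]
```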

aghplctools.ingestion.text.pull_hplc_area(filename)

Legacy name for pull_hplc_area_from_txt.

Returns: dictionary dict[wavelength][retention time (float)][width/area/height]

aghplctools.ingestion.text.pull_hplc_area_from_txt(filename)

Pulls HPLC area data from the specified Agilent HPLC output file. Returns the data tables for each wavelength in dictionary format. Each wavelength table is a dictionary keyed by retention time, whose values hold the peak properties (Width, Area, Height, etc.).

Parameters: filename (str) – path to file
Returns: dictionary dict[wavelength][retention time (float)][Width/Area/Height/etc.]
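
If you need to read a report file yourself, for example to hand the text to parse_area_report, the encoding matters. The sketch below assumes that report files may be UTF-16 with a byte-order mark (as ChemStation Report.TXT files commonly are) or plain 8-bit text; read_report_text is a hypothetical helper, not part of aghplctools, and we have not checked how pull_hplc_area_from_txt itself handles encodings.

```python
import codecs
import tempfile
from pathlib import Path
from typing import Union


def read_report_text(path: Union[str, Path]) -> str:
    """Read a report file, honouring a UTF-16 or UTF-8 byte-order mark if present."""
    raw = Path(path).read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # the BOM tells the codec the byte order
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # strips the BOM
    # No BOM: try UTF-8, then fall back to Latin-1, which accepts any byte sequence
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# Round-trip a UTF-16 file (Python writes a BOM for the "utf-16" codec)
with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "Report.TXT"
    report_path.write_text("Signal 1: DAD1 A, Sig=254,4 Ref=360,100\n", encoding="utf-16")
    text = read_report_text(report_path)
print(text.startswith("Signal 1"))  # True
```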

aghplctools.ingestion.text.report_text_to_xlsx(target_file: Union[str, pathlib.Path], output_file: Union[str, pathlib.Path] = None) → str

Ingests the specified report text and outputs it to an Excel file.

Parameters:
  • target_file – path to the target report text file
  • output_file – path to write the output to (if not provided, it will be saved to “Report.xlsx” in the same directory as the report text file)
Returns:

path to the XLSX file that was written
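
Writing a peak table to a spreadsheet mostly comes down to flattening the nested dict[wavelength][retention time][column] structure into one row per peak. The sketch below shows that step; flatten_area_report and rows_to_csv are hypothetical helpers, and the output is written as CSV with the standard library to avoid assuming which Excel library report_text_to_xlsx uses internally.

```python
import csv
import io
from typing import Dict, List


def flatten_area_report(report: Dict[float, Dict[float, dict]]) -> List[dict]:
    """Turn dict[wavelength][retention time][column] into one row per peak."""
    rows = []
    for wavelength in sorted(report):
        for ret_time in sorted(report[wavelength]):
            row = {"Wavelength": wavelength, "RetTime": ret_time}
            row.update(report[wavelength][ret_time])  # add Width/Area/Height/etc.
            rows.append(row)
    return rows


def rows_to_csv(rows: List[dict]) -> str:
    """Serialise rows to CSV text; the header is the union of keys in first-seen order."""
    fieldnames: List[str] = []
    for row in rows:
        fieldnames += [key for key in row if key not in fieldnames]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up values, for illustration only
report = {254.0: {0.743: {"Area": 31.7, "Height": 11.4}}, 210.0: {0.75: {"Area": 50.2}}}
rows = flatten_area_report(report)
print([(row["Wavelength"], row["RetTime"]) for row in rows])  # [(210.0, 0.75), (254.0, 0.743)]
print(rows_to_csv(rows).splitlines()[0])  # Wavelength,RetTime,Area,Height
```

Taking the union of keys keeps the output valid when different signals report different columns.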