aghplctools.ingestion package
The ingestion package contains modules and methods for parsing and ingesting report files produced by Agilent ChemStation.
The exact structure of the exported files may change between installations of ChemStation, but hopefully one of these
tools will work for your purposes.
While the report txt and csv files contain some metadata, a more complete set may be found in the sequence.acam_ file located in the *.D directory. We have written ingesters for this metadata and created data classes around it. You can find these tools in data.sample.
aghplctools.ingestion.csv module
The ingestion.csv module contains methods for retrieving peak tables (pull_hplc_area_from_csv) and metadata (pull_metadata_from_csv) from csv report files. The structure of the report csv files (numerous files are generated) appears to follow some arcane scheme that is difficult to extract context from. These extraction tools have only worked for us on runs which are not externally referenced. Your mileage may vary.
aghplctools.ingestion.csv.pull_hplc_area_from_csv(folder, report_name='Report')
Pulls HPLC area data from the specified Agilent HPLC CSV report files. Returns the data tables for each wavelength in dictionary format. Each wavelength table is a dictionary keyed by retention time, holding the peak values (width, area, height) for that peak.
Due to the unconventional way Agilent structures its CSV files, pulling the data is a bit awkward. In essence, the report consists of one CSV file containing all the metadata, and further CSV files (one per detector signal) containing the data, but without column headers or other metadata. Thus, this function extracts both data and metadata and stores them in the same format as the text-based data parsing.
Parameters:
- folder – The folder to search for report files
- report_name – File name (without number or extension) of the report file
Returns: dictionary dict[wavelength][retention time (float)][width/area/height]
aghplctools.ingestion.csv.pull_metadata_from_csv(folder, report_name='Report')
Pulls run metadata from the specified Agilent HPLC CSV report files. Returns the metadata describing the sample in dictionary format.
Parameters:
- folder – The folder to search for report files
- report_name – File name (without number or extension) of the report file
Returns: dictionary containing the metadata
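The nested peak-table structure returned by pull_hplc_area_from_csv can be walked with ordinary dictionary iteration. The sketch below does not call aghplctools: it uses a small hand-written dictionary in the documented dict[wavelength][retention time][width/area/height] shape. The helper name total_area_by_wavelength, the sample values, and the lowercase 'area' key are assumptions for illustration, so check the key casing your installation actually returns.

```python
def total_area_by_wavelength(tables, area_key="area"):
    """Sum the peak areas over all retention times for each wavelength."""
    totals = {}
    for wavelength, peaks in tables.items():
        totals[wavelength] = sum(peak[area_key] for peak in peaks.values())
    return totals


# Hypothetical data standing in for the output of
# pull_hplc_area_from_csv(folder); the values are invented.
sample_tables = {
    210.0: {
        1.5: {"width": 0.05, "area": 100.0, "height": 30.0},
        2.25: {"width": 0.07, "area": 50.0, "height": 12.0},
    },
    254.0: {
        1.5: {"width": 0.05, "area": 40.0, "height": 11.0},
    },
}

totals = total_area_by_wavelength(sample_tables)
print(totals)
```

The area key is a parameter so the same helper can be pointed at 'Area' if that is how the keys are spelled in your output.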
aghplctools.ingestion.text module
The ingestion.text module contains methods for parsing report text files (named 'Report.TXT' by default).
The method you are most likely to use in this module is pull_hplc_area_from_txt. This method pulls an HPLC area table from a text file when provided with a valid file path. It returns a dictionary with the following structure:
Example Output::

    {
        wavelength: {
            retention_time: {
                'Width': float,
                'Area': float,
                'Height': float,
                'Peak': int,
                'Type': str,
                'RetTime': float,
            }
        }
    }

wavelength and retention_time will be floats, and the type of each value is noted in the above dictionary. Regex matches have been included for each column type we have encountered. If you encounter an error, please create an issue and provide an example file so that we might expand our matching capabilities.
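Because the peaks are keyed by wavelength and then by retention time, it is often convenient to flatten them into rows or to pick out a single peak. The following is a minimal sketch under the assumption that the dictionary has exactly the documented keys; peaks_to_rows and largest_peak are hypothetical helpers (not part of aghplctools) and the numbers are invented rather than taken from a real report.

```python
def peaks_to_rows(signals):
    """Flatten dict[wavelength][retention_time][column] into row dicts,
    ordered by wavelength and then by retention time."""
    rows = []
    for wavelength in sorted(signals):
        for retention_time in sorted(signals[wavelength]):
            row = {"Wavelength": wavelength}
            row.update(signals[wavelength][retention_time])
            rows.append(row)
    return rows


def largest_peak(signals, wavelength):
    """Return (retention_time, peak) with the largest 'Area' at one wavelength."""
    peaks = signals[wavelength]
    retention_time = max(peaks, key=lambda rt: peaks[rt]["Area"])
    return retention_time, peaks[retention_time]


# Hypothetical values in the documented shape (not from a real report).
example = {
    254.0: {
        2.345: {"Width": 0.08, "Area": 310.5, "Height": 60.2,
                "Peak": 2, "Type": "BB", "RetTime": 2.345},
        1.012: {"Width": 0.05, "Area": 120.0, "Height": 35.1,
                "Peak": 1, "Type": "BV", "RetTime": 1.012},
    },
}

rows = peaks_to_rows(example)
best_rt, best_peak = largest_peak(example, 254.0)
```

Keeping the row form as plain dictionaries means it can be handed to csv.DictWriter or a DataFrame constructor without further reshaping.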
For convenience, a report_text_to_xlsx method is included which parses a peak table from a Report.TXT file and saves it to an Excel xlsx file.
aghplctools.ingestion.text.build_peak_regex(signal_table: str)
Builds a peak regex from a signal table.
Parameters: signal_table – block of lines associated with an area table
Returns: peak line regex object (Python <=3.6: _sre.SRE_Pattern; Python >=3.7: re.Pattern)
aghplctools.ingestion.text.chunk_string(string, n_chars_list)
Chunks a string by the given numbers of characters, returning the chunked characters and the remaining string.
Parameters:
- string (str) – string to chunk
- n_chars_list (list) – list of number of characters to return
Returns: chunk, remaining string
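The idea of cutting fixed-width chunks off the front of a string can be sketched as follows. This is not the library's implementation: split_fixed_width is a hypothetical stand-in, the exact return shape of chunk_string is not specified here beyond "chunk, remaining string", and the example line and column widths are made up.

```python
def split_fixed_width(text, widths):
    """Cut consecutive fixed-width chunks off the front of text.

    Returns a tuple (chunks, remainder), where chunks is a list with one
    string per requested width and remainder is whatever text is left.
    """
    chunks = []
    position = 0
    for width in widths:
        chunks.append(text[position:position + width])
        position += width
    return chunks, text[position:]


# Hypothetical fixed-width line; the widths 4, 8 and 3 are invented.
line = "   1   0.512 BB remaining text"
chunks, rest = split_fixed_width(line, [4, 8, 3])

# Strip the padding to get the bare field values.
fields = [chunk.strip() for chunk in chunks]
```

Slicing by position rather than splitting on whitespace keeps empty columns aligned, which matters for fixed-width tables where a field may be blank.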
aghplctools.ingestion.text.parse_area_report(report_text: str) → dict
Interprets report text and parses the area report section, converting it to a dictionary.
Parameters: report_text – plain text version of the report.
Raises: ValueError – if there are no peaks defined in the report text file
Returns: dictionary of signals in the form dict[wavelength][retention time (float)][Width/Area/Height/etc.]
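Since a report without peaks raises ValueError, batch processing usually needs to catch that case rather than abort. The sketch below shows one way to do so; parse_many is a hypothetical helper, and stand_in_parser is a placeholder that mimics only the documented ValueError so the example runs without aghplctools installed. In real use you would pass parse_area_report as the parser.

```python
def parse_many(report_texts, parser):
    """Parse each named report text, collecting the names of reports
    for which the parser raised ValueError (no peaks defined)."""
    parsed = {}
    without_peaks = []
    for name, text in report_texts.items():
        try:
            parsed[name] = parser(text)
        except ValueError:
            without_peaks.append(name)
    return parsed, without_peaks


def stand_in_parser(report_text):
    """Hypothetical placeholder: raises ValueError when the text has no
    'Peak' header, otherwise returns a dummy one-peak table."""
    if "Peak" not in report_text:
        raise ValueError("no peaks defined in report text")
    return {254.0: {1.0: {"Area": 1.0}}}


# Invented report snippets, keyed by an arbitrary run name.
texts = {"run1": "Peak RetTime ...", "blank": "no signal table here"}
parsed, without_peaks = parse_many(texts, stand_in_parser)
```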
aghplctools.ingestion.text.pull_hplc_area(filename)
Legacy name for pull_hplc_area_from_txt.
Returns: dictionary dict[wavelength][retention time (float)][width/area/height]
aghplctools.ingestion.text.pull_hplc_area_from_txt(filename)
Pulls HPLC area data from the specified Agilent HPLC output file. Returns the data tables for each wavelength in dictionary format. Each wavelength table is a dictionary keyed by retention time, holding the peak values for that peak.
Parameters: filename (str) – path to file
Returns: dictionary dict[wavelength][retention time (float)][Width/Area/Height/etc.]
aghplctools.ingestion.text.report_text_to_xlsx(target_file: Union[str, pathlib.Path], output_file: Union[str, pathlib.Path] = None) → str
Ingests the specified report text and outputs it to an Excel file.
Parameters:
- target_file – path to target report text file
- output_file – path to output to (if not provided, it will be saved to "Report.xlsx" in the same directory as the report text file)
Returns: path to the XLSX file that was written
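The documented default for output_file can be reproduced with pathlib if you want to know in advance where the workbook will land. This is a sketch of the described behavior, not the library's code: default_xlsx_path is a hypothetical helper and the example paths are invented.

```python
from pathlib import Path


def default_xlsx_path(target_file, output_file=None):
    """Return the path the XLSX would be written to under the documented
    default: 'Report.xlsx' next to the report text file, unless an
    explicit output_file is given."""
    if output_file is not None:
        return Path(output_file)
    return Path(target_file).parent / "Report.xlsx"


# Hypothetical report location.
expected = default_xlsx_path("data/run1.D/Report.TXT")
explicit = default_xlsx_path("data/run1.D/Report.TXT", "out/peaks.xlsx")
```

Note that with the default, converting several reports that sit in the same directory would target the same "Report.xlsx", so passing output_file explicitly is the safer choice in that situation.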