Find split peaks, where a single analyte is distributed across two or more peakgroups.

identify_split_peaks(
  mzroll_db_con,
  clamr_config,
  max_rt_deviation = 5,
  ic_floor = 2^10,
  anticorrelation_co = -0.1,
  signal_frac_IQR_co = 0.75,
  spectra_corr_co = 0.8,
  stratify_regex = NULL,
  n_top_spectra_summed = 3L,
  quality_weights = c(purity = 2, quality = 1)
)

Arguments

mzroll_db_con

a connection to an mzroll database as produced by mzroll_db_sqlite.

clamr_config

a named list of mass spec parameters with special formatting of instrument tolerances generated by build_clamr_config.

max_rt_deviation

maximum retention time (RT) deviation between two peakgroups that could be the same analyte.

ic_floor

floor all low or missing signals to this value for the purpose of calculating anti-correlation in log-space.

anticorrelation_co

cutoff for what counts as a strong anti-correlation, which is a sign of peak splitting (when most of the signal falls in group A for some samples and in group B for others, A and B are negatively correlated).

signal_frac_IQR_co

cutoff for interquartile range of fractional signal spread between candidate split peak pairs.

spectra_corr_co

cutoff for correlation between consensus spectra of the two peakgroups.

stratify_regex

NULL for no stratification or a string regular expression indicating categories which should drive peak splitting (e.g., batch or date).

n_top_spectra_summed

integer giving the maximum number of spectra to aggregate.

quality_weights

length 2 named vector with names "purity" and "quality" indicating the relative amount to weight by precursor purity (i.e., the amount of isolated signal matching the precursorMz) versus peak quality (i.e., good peak shapes).

Value

a tibble containing two variables:

  • groupId_old - current groupIds in the mzroll_db_con

  • groupId_new - updated groupIds in the mzroll_db_con
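
As an illustration of how the returned mapping might be consumed, the sketch below relabels a peak table so that merged peakgroups share one groupId. Both data frames are made-up stand-ins: the mapping mimics the documented groupId_old and groupId_new columns, while the peaks table and its sampleId and peakAreaTop columns are hypothetical. It is also an assumption that groupIds missing from the mapping should be left unchanged.

```r
# Hypothetical output of identify_split_peaks(): peakgroups 2 and 3 are merged into 2
split_peaks <- data.frame(
  groupId_old = c(1L, 2L, 3L, 4L),
  groupId_new = c(1L, 2L, 2L, 4L)
)

# Hypothetical peak table keyed by groupId (column names are illustrative)
peaks <- data.frame(
  groupId = c(1L, 2L, 3L, 3L, 4L),
  sampleId = c(1L, 1L, 2L, 3L, 1L),
  peakAreaTop = c(1e5, 2e5, 3e5, 2.5e5, 4e5)
)

# Look up each peak's position in the mapping
idx <- match(peaks$groupId, split_peaks$groupId_old)

# Swap in the new groupId; keep the old one if it is absent from the mapping (assumption)
peaks$groupId <- ifelse(is.na(idx), peaks$groupId, split_peaks$groupId_new[idx])
```

Using match() keeps the row order of the peak table intact, which a merge-based join would not guarantee.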

Details

To be conservative, identified split peak pairs must satisfy all of the following conditions:

  • mass agreement - within the mass tolerance specified in the clamr_config

  • retention time agreement - RTs are within max_rt_deviation of one another

  • mutual exclusivity - log-abundances must be anti-correlated beyond anticorrelation_co, and the interquartile range of the signal split fractions (fraction of signal in A versus B: A / (A + B)) must be above signal_frac_IQR_co.

  • batch-driven [optional] - if batches are provided (using stratify_regex) then signal fraction variation should be explained by batches.

  • fragmentation agreement - fragmentation spectra are correlated above spectra_corr_co
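
The mutual exclusivity condition above can be sketched in base R for a single candidate pair. This is a minimal illustration, not the package's implementation: the ion counts are invented, and the choice of log2, of strict inequalities, and of computing the signal fraction on floored values are all assumptions. The helper floor_signal is hypothetical.

```r
# Hypothetical per-sample ion counts for two candidate peakgroups A and B.
# Samples 1-4 carry most signal in A; samples 5-8 carry most signal in B.
ic_a <- c(5e5, 4e5, 6e5, 5.5e5, 0, 800, NA, 500)
ic_b <- c(0, 900, NA, 600, 4.5e5, 5e5, 5.2e5, 4.8e5)

# Defaults taken from the function signature
ic_floor <- 2^10
anticorrelation_co <- -0.1
signal_frac_IQR_co <- 0.75

# Hypothetical helper: raise low or missing signals to the floor value
floor_signal <- function(x, floor_value) {
  x[is.na(x)] <- floor_value
  pmax(x, floor_value)
}
a <- floor_signal(ic_a, ic_floor)
b <- floor_signal(ic_b, ic_floor)

# Correlation of log-abundances across samples (log base is an assumption)
log_corr <- cor(log2(a), log2(b))

# Fraction of signal in A versus B, and its interquartile range
signal_frac <- a / (a + b)
signal_frac_iqr <- IQR(signal_frac)

# Both criteria must hold for the pair to count as mutually exclusive
is_mutually_exclusive <- log_corr < anticorrelation_co &&
  signal_frac_iqr > signal_frac_IQR_co
```

Flooring before taking logs avoids log(0) and keeps missing values from dropping samples out of the correlation, which is the stated purpose of ic_floor.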