baskerville package
Subpackages
- baskerville.helpers package
- baskerville.scripts package
- Submodules
- baskerville.scripts.hound_data module
- baskerville.scripts.hound_data_align module
- baskerville.scripts.hound_data_read module
- baskerville.scripts.hound_data_write module
- baskerville.scripts.hound_eval module
- baskerville.scripts.hound_eval_spec module
- baskerville.scripts.hound_ism_bed module
- baskerville.scripts.hound_ism_snp module
- baskerville.scripts.hound_predbed module
- baskerville.scripts.hound_snp module
- baskerville.scripts.hound_snp_slurm module
- baskerville.scripts.hound_snpgene module
- baskerville.scripts.hound_train module
- Module contents
Submodules
baskerville.bed module
- baskerville.bed.make_bed_seqs(bed_file, fasta_file, seq_len, stranded=False)[source]
Return BED regions as sequences and as a list of coordinate tuples, extended to a specified length.
- baskerville.bed.read_bed_coords(bed_file, seq_len)[source]
Return BED regions as a list of coordinate tuples, extended to a specified length.
- baskerville.bed.write_bedgraph(preds, targets, data_dir: str, out_dir: str, split_label: str, bedgraph_indexes=None)[source]
Write BEDgraph files for predictions and targets from a dataset.
- Parameters:
preds (np.array) – Predictions.
targets (np.array) – Targets.
data_dir (str) – Data directory, for identifying sequences and statistics.
out_dir (str) – Output directory.
split_label (str) – Split label.
bedgraph_indexes (list) – List of target indexes to write.
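A minimal usage sketch of the readers above; the BED and FASTA paths are hypothetical, and the 131072 bp window is an example length rather than a recommended setting.

```python
from baskerville import bed

# Hypothetical inputs: a BED file of regions and the genome FASTA it refers to.
seqs_dna, seqs_coords = bed.make_bed_seqs(
    "peaks.bed", "hg38.fa", seq_len=131072, stranded=False
)
coords = bed.read_bed_coords("peaks.bed", seq_len=131072)
print(len(seqs_dna), coords[0])  # number of regions and the first extended coordinate tuple
```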
baskerville.blocks module
- baskerville.blocks.conv_block(inputs, filters=None, kernel_size=1, activation='relu', activation_end=None, stride=1, dilation_rate=1, l2_scale=0, dropout=0, conv_type='standard', pool_size=1, pool_type='max', norm_type=None, bn_momentum=0.99, norm_gamma=None, residual=False, kernel_initializer='he_normal', padding='same')[source]
Construct a single convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters – Conv1D filters
kernel_size – Conv1D kernel_size
activation – relu/gelu/etc
stride – Conv1D stride
dilation_rate – Conv1D dilation rate
l2_scale – L2 regularization weight.
dropout – Dropout rate probability
conv_type – Conv1D layer type
residual – Residual connection boolean
pool_size – Max pool width
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
norm_gamma – BatchNorm gamma (defaults according to residual)
- Returns:
[batch_size, seq_length, features] output sequence
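For orientation, a hedged sketch of wiring conv_block into a Keras functional model; the input length, filter count, and kernel size are illustrative placeholders, not recommended hyperparameters.

```python
import tensorflow as tf
from baskerville import blocks

# Illustrative sizes only: a one-hot DNA input followed by one convolution
# block with batch normalization and width-2 max pooling.
sequence = tf.keras.Input(shape=(131072, 4), name="sequence")
current = blocks.conv_block(
    sequence,
    filters=64,
    kernel_size=15,
    activation="gelu",
    norm_type="batch",
    pool_size=2,
)
model = tf.keras.Model(inputs=sequence, outputs=current)
```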
- baskerville.blocks.conv_block_2d(inputs, filters=128, activation='relu', conv_type='standard', kernel_size=1, stride=1, dilation_rate=1, l2_scale=0, dropout=0, pool_size=1, norm_type=None, bn_momentum=0.99, norm_gamma='ones', kernel_initializer='he_normal', symmetric=False)[source]
Construct a single 2D convolution block.
- baskerville.blocks.conv_dna(inputs, filters=None, kernel_size=15, activation='relu', stride=1, l2_scale=0, residual=False, dropout=0, dropout_residual=0, pool_size=1, pool_type='max', norm_type=None, bn_momentum=0.99, norm_gamma=None, use_bias=None, se=False, conv_type='standard', kernel_initializer='he_normal', padding='same')[source]
Construct a single convolution block, assumed to be operating on DNA.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters – Conv1D filters
kernel_size – Conv1D kernel_size
activation – relu/gelu/etc
stride – Conv1D stride
l2_scale – L2 regularization weight.
dropout – Dropout rate probability
conv_type – Conv1D layer type
pool_size – Max pool width
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.conv_nac(inputs, filters=None, kernel_size=1, activation='relu', stride=1, dilation_rate=1, l2_scale=0, dropout=0, conv_type='standard', residual=False, pool_size=1, pool_type='max', norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', padding='same', se=False)[source]
Construct a single convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters – Conv1D filters
kernel_size – Conv1D kernel_size
activation – relu/gelu/etc
stride – Conv1D stride
dilation_rate – Conv1D dilation rate
l2_scale – L2 regularization weight.
dropout – Dropout rate probability
conv_type – Conv1D layer type
residual – Residual connection boolean
pool_size – Max pool width
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.conv_next(inputs, filters=None, kernel_size=7, activation='relu', dense_expansion=2.0, dilation_rate=1, l2_scale=0, dropout=0, residual=False, pool_size=1, pool_type='max', kernel_initializer='he_normal', padding='same', norm_type=None, bn_momentum=0.99)[source]
Construct a single convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters – Conv1D filters
kernel_size – Conv1D kernel_size
activation – relu/gelu/etc
dilation_rate – Conv1D dilation rate
l2_scale – L2 regularization weight.
dropout – Dropout rate probability
residual – Residual connection boolean
pool_size – Max pool width
bn_momentum – BatchNorm momentum
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.conv_tower(inputs, filters_init, filters_end=None, filters_mult=None, divisible_by=1, repeat=1, reprs=[], **kwargs)[source]
Construct a reducing convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters_init – Initial Conv1D filters
filters_end – End Conv1D filters
filters_mult – Multiplier for Conv1D filters
divisible_by – Round filters to be divisible by this value (e.g., a power of two)
repeat – Tower repetitions
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.conv_tower_nac(inputs, filters_init, filters_end=None, filters_mult=None, divisible_by=1, repeat=1, reprs=[], **kwargs)[source]
Construct a reducing convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters_init – Initial Conv1D filters
filters_end – End Conv1D filters
filters_mult – Multiplier for Conv1D filters
divisible_by – Round filters to be divisible by this value (e.g., a power of two)
repeat – Tower repetitions
reprs – List to which intermediate representations are appended.
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.conv_tower_v1(inputs, filters_init, filters_mult=1, repeat=1, **kwargs)[source]
Construct a reducing convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters_init – Initial Conv1D filters
filters_mult – Multiplier for Conv1D filters
repeat – Conv block repetitions
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.convnext_tower(inputs, filters_init, filters_end=None, filters_mult=None, kernel_size=1, dropout=0, pool_size=2, pool_type='max', divisible_by=1, repeat=1, num_convs=2, reprs=[], **kwargs)[source]
Construct a reducing tower of ConvNeXt-style convolution blocks.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters_init – Initial Conv1D filters
filters_end – End Conv1D filters
filters_mult – Multiplier for Conv1D filters
kernel_size – Conv1D kernel_size
dropout – Dropout on subsequent convolution blocks.
pool_size – Pool width.
repeat – Residual block repetitions
num_convs – Conv blocks per residual layer
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.dense_block(inputs, units=None, activation='relu', activation_end=None, flatten=False, dropout=0, l2_scale=0, l1_scale=0, residual=False, norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', **kwargs)[source]
Construct a single dense block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
units – Dense units
activation – relu/gelu/etc
activation_end – Compute activation after the other operations
flatten – Flatten across positional axis
dropout – Dropout rate probability
l2_scale – L2 regularization weight.
l1_scale – L1 regularization weight.
residual – Residual connection boolean
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
norm_gamma – BatchNorm gamma (defaults according to residual)
- Returns:
[batch_size, seq_length(?), features] output sequence
- baskerville.blocks.dense_nac(inputs, units=None, activation='relu', flatten=False, dropout=0, l2_scale=0, l1_scale=0, residual=False, norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', **kwargs)[source]
Construct a single dense block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
units – Dense units
activation – relu/gelu/etc
activation_end – Compute activation after the other operations
flatten – Flatten across positional axis
dropout – Dropout rate probability
l2_scale – L2 regularization weight.
l1_scale – L1 regularization weight.
residual – Residual connection boolean
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
norm_gamma – BatchNorm gamma (defaults according to residual)
- Returns:
[batch_size, seq_length(?), features] output sequence
- baskerville.blocks.dilated_dense(inputs, filters, kernel_size=3, rate_mult=2, conv_type='standard', dropout=0, repeat=1, **kwargs)[source]
Construct a residual dilated dense block.
- baskerville.blocks.dilated_residual(inputs, filters, kernel_size=3, rate_mult=2, dropout=0, repeat=1, conv_type='standard', norm_type=None, round=False, **kwargs)[source]
Construct a residual dilated convolution block.
- baskerville.blocks.dilated_residual_2d(inputs, filters, kernel_size=3, rate_mult=2, dropout=0, repeat=1, symmetric=True, **kwargs)[source]
Construct a residual dilated convolution block.
- baskerville.blocks.dilated_residual_nac(inputs, filters, kernel_size=3, rate_mult=2, dropout=0, repeat=1, **kwargs)[source]
Construct a residual dilated convolution block.
- baskerville.blocks.final(inputs, units, activation='linear', flatten=False, kernel_initializer='he_normal', l2_scale=0, l1_scale=0, **kwargs)[source]
Final simple transformation before comparison to targets.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
units – Dense units
activation – relu/gelu/etc
flatten – Flatten positional axis.
l2_scale – L2 regularization weight.
l1_scale – L1 regularization weight.
- Returns:
[batch_size, seq_length(?), units] output sequence
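Putting several of the blocks above together, a hedged sketch of a small trunk plus a final head. All sizes are made up, and passing pool_size through conv_tower to its inner convolution blocks relies on the **kwargs pass-through, which is an assumption.

```python
import tensorflow as tf
from baskerville import blocks

sequence = tf.keras.Input(shape=(131072, 4), name="sequence")
current = blocks.conv_dna(sequence, filters=64, kernel_size=15, pool_size=2)
current = blocks.conv_tower(
    current, filters_init=64, filters_mult=1.2, repeat=4, pool_size=2
)
predictions = blocks.final(current, units=32, activation="softplus")
model = tf.keras.Model(inputs=sequence, outputs=predictions)
model.summary()
```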
- baskerville.blocks.res_tower(inputs, filters_init, filters_end=None, filters_mult=None, kernel_size=1, dropout=0, pool_size=2, pool_type='max', divisible_by=1, repeat=1, num_convs=2, reprs=[], **kwargs)[source]
Construct a reducing convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters_init – Initial Conv1D filters
filters_end – End Conv1D filters
filters_mult – Multiplier for Conv1D filters
kernel_size – Conv1D kernel_size
dropout – Dropout on subsequent convolution blocks.
pool_size – Pool width.
repeat – Residual block repetitions
num_convs – Conv blocks per residual layer
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.squeeze_excite(inputs, activation='relu', bottleneck_ratio=8, additive=False, norm_type=None, bn_momentum=0.9, **kwargs)[source]
- baskerville.blocks.tconv_nac(inputs, filters=None, kernel_size=1, activation='relu', stride=1, l2_scale=0, dropout=0, conv_type='standard', norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', padding='same')[source]
Construct a single transposed convolution block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters – Conv1D filters
kernel_size – Conv1D kernel_size
activation – relu/gelu/etc
stride – UpSample stride
l2_scale – L2 regularization weight.
dropout – Dropout rate probability
conv_type – Conv1D layer type
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
- Returns:
[batch_size, stride*seq_length, features] output sequence
- baskerville.blocks.transformer(inputs, key_size=None, heads=1, out_size=None, activation='relu', dense_expansion=2.0, content_position_bias=True, dropout=0.25, attention_dropout=0.05, position_dropout=0.01, l2_scale=0, mha_l2_scale=0, num_position_features=None, qkv_width=1, mha_initializer='he_normal', kernel_initializer='he_normal', **kwargs)[source]
Construct a transformer block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
key_size – Attention key dimension
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.transformer2(inputs, key_size=None, heads=1, out_size=None, activation='relu', num_position_features=None, attention_dropout=0.05, position_dropout=0.01, dropout=0.25, dense_expansion=2.0, qkv_width=1, **kwargs)[source]
- Construct a transformer block, with length-wise pooling before returning to full length.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
key_size – Attention key dimension
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.transformer_dense(inputs, out_size, dense_expansion, l2_scale, dropout, kernel_initializer)[source]
Transformer block dense portion.
- baskerville.blocks.transformer_split(inputs, splits=2, key_size=None, heads=1, out_size=None, activation='relu', dense_expansion=2.0, content_position_bias=True, dropout=0.25, attention_dropout=0.05, position_dropout=0.01, l2_scale=0, mha_l2_scale=0, num_position_features=None, qkv_width=1, mha_initializer='he_normal', kernel_initializer='he_normal', **kwargs)[source]
Construct a transformer block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
key_size – Attention key dimension
- Returns:
[batch_size, seq_length, features] output sequence
- baskerville.blocks.transformer_tower(inputs, repeat=2, block_type='transformer', **kwargs)[source]
Construct a tower of repeated transformer blocks.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
repeat – Number of transformer blocks to stack
- Returns:
[batch_size, seq_length, features] output sequence
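A hedged sketch of applying transformer_tower to an existing [batch, length, features] representation; the sequence length, feature width, key size, and head count are placeholders, and the keyword arguments are assumed to be forwarded to each transformer block.

```python
import tensorflow as tf
from baskerville import blocks

features = tf.keras.Input(shape=(1024, 96), name="features")
current = blocks.transformer_tower(
    features, repeat=4, key_size=64, heads=4, dropout=0.2
)
model = tf.keras.Model(inputs=features, outputs=current)
```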
- baskerville.blocks.unet_concat(inputs, unet_repr, activation='relu', stride=2, l2_scale=0, dropout=0, norm_type=None, bn_momentum=0.99, kernel_size=1, kernel_initializer='he_normal')[source]
Construct a U-Net upsampling block that concatenates a skip-connection representation.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
filters – Conv1D filters
kernel_size – Conv1D kernel_size
activation – relu/gelu/etc
stride – UpSample stride
l2_scale – L2 regularization weight.
dropout – Dropout rate probability
conv_type – Conv1D layer type
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
- Returns:
[batch_size, stride*seq_length, features] output sequence
- baskerville.blocks.unet_conv(inputs, unet_repr, activation='relu', stride=2, l2_scale=0, dropout=0, norm_type=None, bn_momentum=0.99, kernel_size=1, kernel_initializer='he_normal', upsample_conv=False)[source]
Construct a feature pyramid network block.
- Parameters:
inputs – [batch_size, seq_length, features] input sequence
kernel_size – Conv1D kernel_size
activation – relu/gelu/etc
stride – UpSample stride
l2_scale – L2 regularization weight.
dropout – Dropout rate probability
norm_type – Apply batch or layer normalization
bn_momentum – BatchNorm momentum
upsample_conv – Conv1D the upsampled input path
- Returns:
[batch_size, seq_length, features] output sequence
baskerville.data module
- class baskerville.data.Contig(genome, chr, start, end)
Bases:
tuple
- chr
Alias for field number 1
- end
Alias for field number 3
- genome
Alias for field number 0
- start
Alias for field number 2
- class baskerville.data.ModelSeq(genome, chr, start, end, label)
Bases:
tuple
- chr
Alias for field number 1
- end
Alias for field number 3
- genome
Alias for field number 0
- label
Alias for field number 4
- start
Alias for field number 2
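Both classes are plain namedtuples, so fields are accessible by the names above or by the listed field numbers; the genome index, coordinates, and label below are made-up illustrations.

```python
from baskerville.data import Contig, ModelSeq

ctg = Contig(genome=0, chr="chr1", start=10_000, end=250_000)
mseq = ModelSeq(genome=0, chr="chr1", start=10_000, end=141_072, label="train")
print(ctg.chr, ctg[1])                  # field access by name or by position
print(mseq.label, mseq.end - mseq.start)
```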
- baskerville.data.annotate_unmap(mseqs, unmap_bed, seq_length, pool_width)[source]
- Intersect the sequence segments with unmappable regions and annotate the segments as NaN so they can optionally be ignored.
- Parameters:
mseqs – list of ModelSeq’s
unmap_bed – unmappable regions BED file
seq_length – sequence length (after cropping)
pool_width – pooled bin width
- Returns:
NxL binary NA indicators
- Return type:
seqs_unmap
- baskerville.data.break_large_contigs(contigs, break_t, verbose=False)[source]
Break large contigs in half until all contigs are under the size threshold.
- baskerville.data.contig_sequences(contigs, seq_length, stride, snap=1, label=None)[source]
Break up a list of Contig’s into a list of model-length sequences tiled at the given stride.
- baskerville.data.load_chromosomes(genome_file)[source]
Load genome segments from either a FASTA file or chromosome length table.
- baskerville.data.rejoin_large_contigs(contigs)[source]
Rejoin large contigs that were broken up before alignment comparison.
- baskerville.data.split_contigs(chrom_segments, gaps_file)[source]
Split the assembly up into contigs defined by the gaps.
- Parameters:
chrom_segments – dict mapping chromosome names to lists of (start,end)
gaps_file – file specifying assembly gaps
- Returns:
The same mapping, with segments broken at the assembly gaps.
- Return type:
chrom_segments
baskerville.dataset module
- class baskerville.dataset.SeqDataset(data_dir: str, split_label: str, batch_size: int, shuffle_buffer: int = 128, seq_length_crop: int = 0, mode: str = 'eval', tfr_pattern: str | None = None, targets_slice_file: str | None = None)[source]
Bases:
object
Labeled sequence dataset for TensorFlow.
- Parameters:
data_dir (str) – Dataset directory.
split_label (str) – Dataset split, e.g. train, valid, test.
batch_size (int) – Batch size.
shuffle_buffer (int) – Shuffle buffer size. Defaults to 128.
seq_length_crop (int) – Sequence length to crop from sides. Defaults to 0.
mode (str) – Dataset mode, e.g. train/eval. Defaults to ‘eval’.
tfr_pattern (str) – TFRecord pattern to glob. Defaults to split_label.
targets_slice_file (str) – Targets table from which to slice a target subset.
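A hedged sketch of constructing a SeqDataset for evaluation; the data directory is hypothetical, and the assumption that the batched tf.data pipeline is exposed as a .dataset attribute should be checked against the class itself.

```python
from baskerville.dataset import SeqDataset

eval_data = SeqDataset(
    data_dir="data/hg38",   # hypothetical dataset directory
    split_label="test",
    batch_size=4,
    mode="eval",
)
# Assumption: the underlying tf.data.Dataset is available as eval_data.dataset.
for seqs, targets in eval_data.dataset:
    print(seqs.shape, targets.shape)
    break
```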
- baskerville.dataset.file_to_records(filename: str)[source]
Read TFRecord file into tf.data.Dataset.
- baskerville.dataset.make_strand_transform(targets_df, targets_strand_df)[source]
Make a sparse matrix to sum strand pairs.
- Parameters:
targets_df (pd.DataFrame) – Targets DataFrame.
targets_strand_df (pd.DataFrame) – Targets DataFrame, with strand pairs collapsed.
- Returns:
Sparse matrix to sum strand pairs.
- Return type:
scipy.sparse.dok_matrix
- baskerville.dataset.targets_prep_strand(targets_df)[source]
Adjust targets table for merged stranded datasets.
- Parameters:
targets_df – pandas DataFrame of targets
- Returns:
- pandas DataFrame of targets, with stranded targets collapsed into a single row
- Return type:
targets_df
- baskerville.dataset.untransform_preds(preds, targets_df, unscale=False, unclip=True)[source]
Undo the squashing transformations performed for the tasks.
- Parameters:
preds (np.array) – Predictions LxT.
targets_df (pd.DataFrame) – Targets information table.
- Returns:
Untransformed predictions LxT.
- Return type:
preds (np.array)
- baskerville.dataset.untransform_preds1(preds, targets_df, unscale=False, unclip=True)[source]
Undo the squashing transformations performed for the tasks.
- Parameters:
preds (np.array) – Predictions LxT.
targets_df (pd.DataFrame) – Targets information table.
- Returns:
Untransformed predictions LxT.
- Return type:
preds (np.array)
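A hedged example of undoing the training-time squashing transforms; it assumes a targets table (hypothetical path) carrying the per-target transform parameters written at data preparation time, and uses random numbers in place of real model predictions.

```python
import numpy as np
import pandas as pd
from baskerville.dataset import untransform_preds

targets_df = pd.read_csv("targets.txt", sep="\t", index_col=0)  # hypothetical targets table
preds_raw = np.random.rand(1024, len(targets_df)).astype("float32")  # stand-in LxT predictions
preds = untransform_preds(preds_raw, targets_df, unscale=True, unclip=True)
```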
baskerville.dna module
- baskerville.dna.dna_1hot(seq: str, seq_len: int | None = None, n_uniform: bool = False, n_sample: bool = False)[source]
Convert a DNA sequence to a 1-hot encoding.
- Parameters:
seq (str) – DNA sequence.
seq_len (int) – Length to extend/trim the sequence to.
n_uniform (bool) – Represent N’s as 0.25, forcing float16.
n_sample (bool) – Sample ACGT for each N.
- Returns:
1-hot encoding of DNA sequence.
- Return type:
seq_code (np.array)
- baskerville.dna.dna_1hot_index(seq: str, n_sample: bool = False)[source]
Convert a DNA sequence to an index encoding.
- Parameters:
seq (str) – DNA sequence.
n_sample (bool) – sample ACGT for N
- Returns:
Index encoding of DNA sequence.
- Return type:
seq_code (np.array)
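A small sketch of the two encoders; the shapes follow the docstrings above, while the inline comments about N handling are illustrative rather than a specification.

```python
from baskerville import dna

seq = "ACGTN"
one_hot = dna.dna_1hot(seq)                        # (5, 4) array; the N row is left unset
uniform = dna.dna_1hot(seq, n_uniform=True)        # N represented as 0.25 per nucleotide
indices = dna.dna_1hot_index(seq, n_sample=True)   # (5,) integer codes, N sampled from ACGT
print(one_hot.shape, indices)
```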
- baskerville.dna.dna_rc(seq: str)[source]
Reverse complement a DNA sequence.
- Parameters:
seq (str) – DNA sequence.
- Returns:
Reverse complement of the input sequence.
- baskerville.dna.hot1_augment(Xb, fwdrc: bool = True, shift: int = 0)[source]
Transform a batch of one hot coded sequences to augment training.
- Parameters:
Xb (np.array) – Batch x Length x 4 one hot coded sequences.
fwdrc (bool) – Use the forward (True) versus reverse complement (False) strand.
shift (int) – Shift sequences by this many positions.
- Returns:
Transformed batch of sequences.
- Return type:
Xbt (np.array)
- baskerville.dna.hot1_delete(seq_1hot, pos: int, delete_len: int, pad_value=None)[source]
- Delete nucleotides starting at a given position in the Lx4 1-hot encoded sequence.
- Parameters:
seq_1hot (np.array) – 1-hot encoded sequence.
pos (int) – Position to start deleting.
delete_len (int) – Number of nucleotides to delete.
pad_value (float) – Value to pad the end with.
- Returns:
In-place transformed sequence.
- Return type:
seq_1hot (np.array)
- baskerville.dna.hot1_dna(seqs_1hot)[source]
Convert 1-hot coded sequences to ACGTN.
- Parameters:
seqs_1hot (np.array) – 1-hot encoded sequences.
- Returns:
List of DNA sequences.
- Return type:
seqs [str]
- baskerville.dna.hot1_get(seqs_1hot, pos: int)[source]
- Return the nucleotide corresponding to the one hot coding of position “pos” in the Lx4 array seqs_1hot.
- Parameters:
seqs_1hot (np.array) – 1-hot encoded sequences.
pos (int) – Position to get nucleotide.
- Returns:
Nucleotide.
- Return type:
nt (str)
- baskerville.dna.hot1_insert(seq_1hot, pos: int, insert_seq: str)[source]
Insert sequence at a given position in the 1-hot encoded sequence.
- Parameters:
seq_1hot (np.array) – 1-hot encoded sequence.
pos (int) – Position to insert sequence.
insert_seq (str) – Sequence to insert.
- Returns:
In-place transformed sequence.
- Return type:
seq_1hot (np.array)
- baskerville.dna.hot1_rc(seqs_1hot)[source]
- Reverse complement a batch of one hot coded sequences, while being robust to additional tracks beyond the four nucleotides.
- Parameters:
seqs_1hot (np.array) – 1-hot encoded sequences.
- Returns:
Reverse complemented sequences.
- Return type:
seqs_1hot_rc (np.array)
- baskerville.dna.hot1_set(seq_1hot, pos: int, nt: str)[source]
Set position in a 1-hot encoded sequence to given nucleotide.
- Parameters:
seq_1hot (np.array) – 1-hot encoded sequence.
pos (int) – Position to set nucleotide.
nt (str) – Nucleotide to set.
- Returns:
In-place transformed sequence.
- Return type:
seq_1hot (np.array)
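A hedged sketch chaining the 1-hot editing helpers; positions and the inserted motif are arbitrary, and passing a single Lx4 array to hot1_dna is an assumption about that function’s input handling.

```python
from baskerville import dna

seq_1hot = dna.dna_1hot("ACGTACGTACGT")
print(dna.hot1_get(seq_1hot, 2))       # nucleotide at position 2
dna.hot1_set(seq_1hot, 2, "T")         # point substitution, in place
dna.hot1_insert(seq_1hot, 4, "GATA")   # insert a motif; downstream bases shift
dna.hot1_delete(seq_1hot, 4, 4)        # delete 4 nt starting at position 4
print(dna.hot1_dna(seq_1hot))          # back to an ACGTN string (assumed single-sequence use)
```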
baskerville.gene module
- class baskerville.gene.Gene(chrom, strand, kv)[source]
Bases:
object
Class for managing genes in an isoform-agnostic way, taking the union of exons across isoforms.
baskerville.layers module
- class baskerville.layers.CenterAverage(*args, **kwargs)[source]
Bases:
Layer
Average the center of the input.
- Parameters:
center (int) – Length of the center slice.
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
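A hedged sketch of CenterAverage in a functional model; it assumes the constructor takes the center length as its single argument, per the Parameters entry above, and the sequence length and feature width are placeholders.

```python
import tensorflow as tf
from baskerville import layers

x = tf.keras.Input(shape=(1024, 16))
# Assumption: CenterAverage(center) averages over the central `center` positions.
pooled = layers.CenterAverage(128)(x)
model = tf.keras.Model(x, pooled)
```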
- class baskerville.layers.CenterSlice(*args, **kwargs)[source]
Bases:
Layer
Slice the center of the input.
- Parameters:
center (int) – Length of the center slice.
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.ConcatDist2D(*args, **kwargs)[source]
Bases:
Layer
Concatenate the pairwise distance to the 2D feature matrix.
- call(inputs)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class baskerville.layers.ConcatPosition(*args, **kwargs)[source]
Bases:
Layer
Concatenate position to the 1D feature vectors.
- call(inputs)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.EnsembleReverseComplement(*args, **kwargs)[source]
Bases:
Layer
Expand tensor to include reverse complement of one hot encoded DNA sequence.
- call(seqs_1hot)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class baskerville.layers.EnsembleShift(*args, **kwargs)[source]
Bases:
Layer
Expand tensor to include shifts of one hot encoded DNA sequence.
- call(seqs_1hot)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.FactorInverse(*args, **kwargs)[source]
Bases:
Layer
Invert a target matrix factorization.
- call(W)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.GlobalContext(*args, **kwargs)[source]
Bases:
Layer
- build(input_shape)[source]
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().
This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters:
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class baskerville.layers.LengthAverage(*args, **kwargs)[source]
Bases:
Layer
Average across a variable length sequence.
- call(x, seq)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class baskerville.layers.MultiheadAttention(*args, **kwargs)[source]
Bases:
Layer
Multi-head attention.
- call(inputs, training=False)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.OneToTwo(*args, **kwargs)[source]
Bases:
Layer
Transform a 1D representation to 2D by combining the vectors at each position pair (i, j).
- call(oned)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.Scale(*args, **kwargs)[source]
Bases:
Layer
Scale the input by a learned value.
- Parameters:
axis (int or [int]) – Axis/axes along which to scale.
initializer – Initializer for the scale weight.
- build(input_shape)[source]
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().
This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters:
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: - inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value of a keyword argument.
NumPy array or Python scalar values in inputs get cast as tensors.
Keras mask metadata is only collected from inputs.
Layers are built (build(input_shape) method) using shape info from inputs only.
input_spec compatibility is only checked against inputs.
Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
The SavedModel input specification is generated using inputs only.
Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: - training: Boolean scalar tensor of Python boolean indicating
whether the call is meant for training or inference.
mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.SoftmaxPool1D(*args, **kwargs)[source]
Bases:
Layer
Pooling operation with optional weights.
- build(input_shape)[source]
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().
This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters:
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(inputs)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.Softplus(*args, **kwargs)[source]
Bases:
Layer
Safe softplus, clipping large values.
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.SqueezeExcite(*args, **kwargs)[source]
Bases:
Layer
- build(input_shape)[source]
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().
This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters:
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.StochasticReverseComplement(*args, **kwargs)[source]
Bases:
Layer
Stochastically reverse complement a one hot encoded DNA sequence.
- call(seq_1hot, training=None)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class baskerville.layers.StochasticShift(*args, **kwargs)[source]
Bases:
Layer
Stochastically shift a one hot encoded DNA sequence.
- call(seq_1hot, training=None)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
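Example (illustrative sketch) of StochasticShift as a training-time augmentation. The signature above only exposes *args/**kwargs, so the shift_max constructor argument used here is an assumption; presumably the random shift is applied only when training=True.

```python
import numpy as np
import tensorflow as tf
from baskerville import layers

# One hot encode a toy 8 bp sequence: [batch, length, 4]
seq_1hot = tf.constant(np.eye(4)[[0, 1, 2, 3, 0, 1, 2, 3]][np.newaxis], dtype=tf.float32)

# shift_max is an assumed constructor argument (maximum shift in bp)
shift_layer = layers.StochasticShift(shift_max=2)

augmented = shift_layer(seq_1hot, training=True)   # randomly shifted copy
unchanged = shift_layer(seq_1hot, training=False)  # inference: pass-through
print(augmented.shape, unchanged.shape)            # (1, 8, 4) (1, 8, 4)
```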
- class baskerville.layers.SwitchReverse(*args, **kwargs)[source]
Bases:
Layer
Reverse predictions if the inputs were reverse complemented.
- call(x_reverse)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
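The stochastic reverse-complement and switch-reverse layers are designed to work as a pair: one augments the input sequence, the other un-reverses the model’s predictions. The sketch below assumes StochasticReverseComplement returns both the (possibly reverse-complemented) sequence and a boolean flag that SwitchReverse consumes; the Conv1D/Dense trunk is a stand-in, not baskerville’s architecture.

```python
import tensorflow as tf
from baskerville import layers

seq_in = tf.keras.Input(shape=(4096, 4), name="sequence")

# Assumption: the layer returns (sequence, was_reversed_flag)
current, reverse_bool = layers.StochasticReverseComplement()(seq_in)

# Placeholder trunk and head standing in for the real model
current = tf.keras.layers.Conv1D(8, 15, padding="same", activation="relu")(current)
preds = tf.keras.layers.Dense(4, activation="softplus")(current)

# Flip predictions back when the input was reverse complemented
preds = layers.SwitchReverse()([preds, reverse_bool])

model = tf.keras.Model(seq_in, preds)
model.summary()
```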
- class baskerville.layers.SwitchReverseTriu(*args, **kwargs)[source]
Bases:
Layer
- call(x_reverse)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- class baskerville.layers.Symmetrize2D(*args, **kwargs)[source]
Bases:
Layer
Take the average of a matrix and its transpose to enforce symmetry.
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- class baskerville.layers.UpperTri(*args, **kwargs)[source]
Bases:
Layer
Unroll matrix to its upper triangular portion.
- call(inputs)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
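For 2D (contact-map style) outputs, Symmetrize2D averages a map with its transpose and UpperTri then unrolls it to its upper-triangular entries. A small sketch; the [batch, length, length, channels] layout and UpperTri’s default diagonal handling are assumptions here.

```python
import tensorflow as tf
from baskerville import layers

# Toy 2D map: [batch, length, length, channels]
x = tf.random.normal((1, 16, 16, 3))

x_sym = layers.Symmetrize2D()(x)   # average of x and its transpose over the two length axes
x_triu = layers.UpperTri()(x_sym)  # unrolled upper-triangular entries

print(x_sym.shape)   # (1, 16, 16, 3)
print(x_triu.shape)  # (1, n_upper_entries, 3); the count depends on the diagonal offset
```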
- class baskerville.layers.WheezeExcite(*args, **kwargs)[source]
Bases:
Layer
- build(input_shape)[source]
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().
This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters:
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(x)[source]
This is where the layer’s logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns:
A tensor or list/tuple of tensors.
- get_config()[source]
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns:
Python dictionary.
- baskerville.layers.positional_features(positions: Tensor, feature_size: int, seq_length: int, symmetric=False)[source]
Compute relative positional encodings/features.
Each positional feature function will compute/provide the same fraction of features, making up the total of feature_size.
- Parameters:
positions – Tensor of relative positions of arbitrary shape.
feature_size – Total number of basis functions.
seq_length – Sequence length denoting the characteristic length that the individual positional features can use. This is required since the parametrization of the input features should be independent of positions, while it may still need to make use of the total number of features.
symmetric – If True, the resulting features will be symmetric across the relative position of 0 (i.e. only the absolute value of positions will matter). If False, then both the symmetric and asymmetric versions (symmetric multiplied by sign(positions)) of the features will be used.
- Returns:
Tensor of shape positions.shape + (feature_size,).
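A minimal call sketch for positional_features using the documented signature; the one extra assumption is that feature_size should be divisible by the number of underlying basis functions, so a generous value such as 12 is used here.

```python
import tensorflow as tf
from baskerville.layers import positional_features

# Relative positions from -8 to +8
positions = tf.range(-8, 9, dtype=tf.float32)

feats = positional_features(
    positions,
    feature_size=12,  # total number of basis features
    seq_length=16,    # characteristic length scale
    symmetric=False,
)
print(feats.shape)  # positions.shape + (feature_size,) -> (17, 12)
```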
baskerville.metrics module
- class baskerville.metrics.MeanSquaredErrorUDot(udot_weight: float = 1, reduction='auto', name: str = 'mse_udot')[source]
Bases:
LossFunctionWrapper
Mean squared error with mean-normalized specificity term.
- Parameters:
udot_weight – Weight of the mean-normalized specificity term.
- class baskerville.metrics.PearsonR(*args, **kwargs)[source]
Bases:
Metric
PearsonR metric for multi-task data.
- Parameters:
num_targets (int) – Number of tasks.
summarize (bool) – Whether to summarize over all tasks.
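PearsonR can be passed to model.compile like any Keras metric. A sketch, assuming the constructor is PearsonR(num_targets, summarize=...) per the parameters listed above; the toy model is only a placeholder.

```python
import tensorflow as tf
from baskerville import metrics

num_targets = 10
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(num_targets, activation="softplus"),
])

model.compile(
    optimizer="adam",
    loss="poisson",
    metrics=[metrics.PearsonR(num_targets)],  # per-task Pearson R, summarized over tasks
)
```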
- class baskerville.metrics.PoissonKL(kl_weight: int = 1, reduction='auto', name='poisson_kl')[source]
Bases:
LossFunctionWrapper
Poisson decomposition with KL specificity term.
- Parameters:
kl_weight (float) – Weight of the KL specificity term.
- class baskerville.metrics.PoissonMultinomial(total_weight: float = 1, weight_range: float = 1, weight_exp: int = 4, reduction='auto', name: str = 'poisson_multinomial')[source]
Bases:
LossFunctionWrapper
Poisson decomposition with multinomial specificity term.
- Parameters:
total_weight (float) – Weight of the Poisson total term.
- class baskerville.metrics.R2(*args, **kwargs)[source]
Bases:
Metric
R2 metric for multi-task data.
- Parameters:
num_targets (int) – Number of tasks.
summarize (bool) – Whether to summarize over all tasks.
- class baskerville.metrics.SeqAUC(*args, **kwargs)[source]
Bases:
AUC
AUC metric for multi-task sequence data.
- Parameters:
curve (str) – Metric type: ‘ROC’ or ‘PR’.
summarize (bool) – Whether to summarize over all tasks.
- baskerville.metrics.mean_squared_error_udot(y_true, y_pred, udot_weight: float = 1)[source]
Mean squared error with mean-normalized specificity term.
- Parameters:
udot_weight – Weight of the mean-normalized specificity term.
- baskerville.metrics.poisson(yt, yp, epsilon: float = 1e-07)[source]
Poisson loss, without mean reduction.
- baskerville.metrics.poisson_kl(y_true, y_pred, kl_weight=1, epsilon=1e-07)[source]
Poisson decomposition with KL specificity term.
- Parameters:
kl_weight (float) – Weight of the KL specificity term.
epsilon (float) – Added small value to avoid log(0).
- baskerville.metrics.poisson_multinomial(y_true, y_pred, total_weight: float = 1, weight_range: float = 1, weight_exp: int = 4, epsilon: float = 1e-07, rescale: bool = False)[source]
Poisson decomposition with multinomial specificity term.
- Parameters:
total_weight (float) – Weight of the Poisson total term.
epsilon (float) – Added small value to avoid log(0).
rescale (bool) – Rescale loss after re-weighting.
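The Poisson/multinomial decomposition scores a prediction’s total coverage with a Poisson term and its positional distribution with a multinomial term, with total_weight balancing the two. A minimal call sketch, assuming [batch, length, targets]-shaped tensors (the shape convention used elsewhere in this package); the returned loss shape depends on the internal reduction.

```python
import tensorflow as tf
from baskerville import metrics

# Toy coverage tracks: [batch, length, targets]
y_true = tf.random.uniform((2, 32, 4), maxval=5.0)
y_pred = tf.random.uniform((2, 32, 4), maxval=5.0)

loss = metrics.poisson_multinomial(y_true, y_pred, total_weight=0.2)
print(loss.shape)
```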
baskerville.seqnn module
- class baskerville.seqnn.SeqNN(params: dict)[source]
Bases:
object
Sequence neural network model.
- Parameters:
params (dict) – Model specification and parameters.
- build_block(current, block_params)[source]
Construct a SeqNN block.
- Parameters:
current – Current Tensor.
block_params (dict) – Block parameters.
- Returns:
New current Tensor.
- Return type:
current
- build_embed(conv_layer_i: int, batch_norm: bool = True)[source]
Build model to embed sequences into specific layer.
- build_ensemble(ensemble_rc: bool = False, ensemble_shifts=[0])[source]
Build ensemble of models computing on augmented input sequences.
- build_slice(target_slice=None, target_sum: bool = False)[source]
Slice and/or sum across tasks, in graph.
- evaluate(seq_data, head_i=None, loss_label: str = 'poisson', loss_fn=None)[source]
Evaluate model on SeqDataset.
- gradients(seq_1hot, head_i=None, target_slice=None, pos_slice=None, pos_mask=None, pos_slice_denom=None, pos_mask_denom=None, chunk_size=None, batch_size=1, track_scale=1.0, track_transform=1.0, clip_soft=None, pseudo_count=0.0, no_transform=False, use_mean=False, use_ratio=False, use_logodds=False, subtract_avg=True, input_gate=True, smooth_grad=False, n_samples=5, sample_prob=0.875, dtype='float16')[source]
Compute input gradients for sequences (GPU-friendly).
- gradients_func(model, seq_1hot, target_slice, pos_slice, pos_mask=None, pos_slice_denom=None, pos_mask_denom=True, track_scale=1.0, track_transform=1.0, clip_soft=None, pseudo_count=0.0, no_transform=False, use_mean=False, use_ratio=False, use_logodds=False, subtract_avg=True, input_gate=True)[source]
- gradients_func_orig(model, seq_1hot, pos_slice)[source]
Compute input gradients for each task.
- Parameters:
model (tf.keras.Model) – Model to compute gradients for.
seq_1hot (tf.Tensor) – 1-hot encoded sequence.
pos_slice ([int]) – Sequence positions to consider.
- Returns:
Gradients for each task.
- Return type:
grads (tf.Tensor)
- gradients_orig(seq_1hot, head_i=None, pos_slice=None, batch_size=8, dtype='float16')[source]
Compute input gradients for each task.
- Parameters:
seq_1hot (np.array) – 1-hot encoded sequence.
head_i (int) – Model head index.
pos_slice ([int]) – Sequence positions to consider.
batch_size (int) – Number of tasks to compute gradients for at once.
dtype – Returned data type.
- Returns:
Gradients for each task.
- predict(seq_data, head_i: int | None = None, generator: bool = False, stream: bool = False, step: int = 1, dtype: str = 'float32', **kwargs)[source]
Predict targets for SeqDataset, with more options.
- Parameters:
seq_data (SeqDataset) – Dataset to predict on.
head_i (int) – Model head index.
generator (bool) – Use generator to predict on dataset.
stream (bool) – Stream predictions from dataset.
step (int) – Step size.
dtype (str) – Data type to return.
- predict_transform(seq_1hot: array, targets_df, strand_transform: array | None = None, untransform_old: bool = False)[source]
Predict a single sequence and transform.
- Parameters:
seq_1hot (np.array) – 1-hot encoded sequence.
targets_df (pd.DataFrame) – Targets dataframe.
strand_transform (np.array) – Strand merging transform.
untransform_old (bool) – Apply old untransform.
- save(model_file, trunk=False)[source]
Save model weights to file.
- Parameters:
model_file (str) – Path to save model weights.
trunk (bool) – Save trunk weights only.
- set_defaults()[source]
Set default parameters.
Only necessary for my bespoke parameters. Others are best defaulted closer to the source.
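A typical end-to-end sketch for SeqNN: read a params JSON, build the model, load trained weights, and predict on a dataset. The file paths are placeholders, and the params['model'] layout, restore() weight loading, and dataset.SeqDataset arguments are assumptions drawn from the hound_eval script conventions rather than the signatures above.

```python
import json
from baskerville import dataset, seqnn

params_file = "params.json"   # placeholder paths
model_file = "model_best.h5"
data_dir = "data_out"

with open(params_file) as params_open:
    params = json.load(params_open)

seqnn_model = seqnn.SeqNN(params["model"])  # 'model' sub-dict layout assumed
seqnn_model.restore(model_file)             # restore() assumed for loading saved weights

# SeqDataset arguments assumed from the evaluation scripts
test_data = dataset.SeqDataset(data_dir, split_label="test", batch_size=4, mode="eval")

test_preds = seqnn_model.predict(test_data, stream=True)
print(test_preds.shape)
```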
baskerville.snps module
- class baskerville.snps.GeneSNPCluster[source]
Bases:
SNPCluster
- baskerville.snps.cluster_genes(transcriptome, seq_length: int, center_pct: float)[source]
Cluster genes into regions that will satisfy the required center_pct.
- Parameters:
transcriptome (Transcriptome) – Transcriptome object.
seq_length (int) – Sequence length.
center_pct (float) – Percent of sequence length to cluster genes.
- baskerville.snps.cluster_snps(snps, seq_len: int, center_pct: float)[source]
Cluster a sorted list of SNPs into regions that will satisfy the required center_pct.
- Parameters:
snps ([SNP]) – List of SNPs.
seq_len (int) – Sequence length.
center_pct (float) – Percent of sequence length to cluster SNPs.
- baskerville.snps.compute_scores(ref_preds, alt_preds, snp_stats, strand_transform=None)[source]
Compute SNP scores from reference and alternative predictions.
- Parameters:
ref_preds (np.array) – Reference allele predictions.
alt_preds (np.array) – Alternative allele predictions.
snp_stats ([str]) – List of SAD stats to compute.
strand_transform (scipy.sparse) – Strand transform matrix.
- baskerville.snps.initialize_output_h5(out_dir, snp_stats, snps, targets_length, targets_df, num_shifts, geneseq_clusters=None)[source]
Initialize an output HDF5 file for SAD stats.
- Parameters:
out_dir (str) – Output directory.
snp_stats ([str]) – List of SAD stats to compute.
snps ([SNP]) – List of SNPs.
targets_length (int) – Targets’ sequence length.
targets_df (pd.DataFrame) – Targets DataFrame.
num_shifts (int) – Number of shifts.
geneseq_clusters ([GeneSNPCluster]) – Gene sequence clusters.
- baskerville.snps.make_alt_1hot(ref_1hot, snp_seq_pos, ref_allele, alt_allele)[source]
Return alternative allele one hot coding.
- Parameters:
ref_1hot (np.array) – Reference allele one hot coding.
snp_seq_pos (int) – SNP position in sequence.
ref_allele (str) – Reference allele.
alt_allele (str) – Alternative allele.
- Returns:
Alternative allele one hot coding.
- Return type:
np.array
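make_alt_1hot swaps the reference allele for the alternative allele inside an existing one hot coding. A toy sketch that builds the reference coding by hand; the A/C/G/T column order is an assumption of this example.

```python
import numpy as np
from baskerville import snps

# Hand-rolled one hot coding for "ACGTA" with assumed column order A, C, G, T
alphabet = {"A": 0, "C": 1, "G": 2, "T": 3}
ref_seq = "ACGTA"
ref_1hot = np.zeros((len(ref_seq), 4), dtype="float32")
for i, nt in enumerate(ref_seq):
    ref_1hot[i, alphabet[nt]] = 1

# Swap the middle base G -> T
alt_1hot = snps.make_alt_1hot(ref_1hot, snp_seq_pos=2, ref_allele="G", alt_allele="T")
print(alt_1hot[2])  # expected to one hot encode T at position 2
```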
- baskerville.snps.make_gene_bedt(genesnp_clusters)[source]
Make a BedTool object for all gene sequences.
- baskerville.snps.score_gene_snps(params_file, model_file, vcf_file, worker_index, options)[source]
Score SNPs in a VCF file with a SeqNN model.
- Parameters:
params_file – Model parameters file.
model_file – Saved model weights file.
vcf_file – VCF file of SNPs.
worker_index – Worker index.
options – Options from command line arguments.
- baskerville.snps.score_snps(params_file, model_file, vcf_file, worker_index, options)[source]
Score SNPs in a VCF file with a SeqNN model.
- Parameters:
params_file – Model parameters file.
model_file – Saved model weights file.
vcf_file – VCF file of SNPs.
worker_index – Worker index.
options – Options from command line arguments.
- baskerville.snps.stitch_preds(preds, shifts, pos=None)[source]
Stitch indel left and right compensation shifts.
- Parameters:
preds ([np.array]) – List of predictions.
shifts ([int]) – List of shifts.
pos (int) – SNP position to stitch at.
- baskerville.snps.write_pct(scores_out, snp_stats)[source]
Compute percentile values for each target and write to HDF5.
- Parameters:
scores_out (h5py.File) – Output HDF5 file.
snp_stats ([str]) – List of SAD stats to compute.
- baskerville.snps.write_snp(ref_preds_sum, alt_preds_sum, scores_out, si, snp_stats)[source]
Write SNP predictions to HDF, assuming the length dimension has been collapsed.
- Parameters:
ref_preds_sum (np.array) – Reference allele predictions.
alt_preds_sum (np.array) – Alternative allele predictions.
scores_out (h5py.File) – Output HDF5 file.
si (int) – SNP index.
snp_stats ([str]) – List of SAD stats to compute.
baskerville.trainer module
- class baskerville.trainer.Cyclical1LearningRate(initial_learning_rate: float, maximal_learning_rate: float, final_learning_rate: float, step_size, name: str = 'Cyclical1LearningRate')[source]
Bases:
LearningRateSchedule
A LearningRateSchedule that uses a cyclical schedule. See https://yashuseth.blog/2018/11/26/hyper-parameter-tuning-best-practices-learning-rate-batch-size-momentum-weight-decay/
- Parameters:
initial_learning_rate (float) – The initial learning rate.
maximal_learning_rate (float) – The maximal learning rate after warm up.
final_learning_rate (float) – The final learning rate after cycle.
step_size (int) – Cycle step size.
name (str, optional) – The name of the schedule. Defaults to “Cyclical1LearningRate”.
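Cyclical1LearningRate follows the documented constructor and, like any LearningRateSchedule, can be handed directly to a Keras optimizer:

```python
import tensorflow as tf
from baskerville import trainer

schedule = trainer.Cyclical1LearningRate(
    initial_learning_rate=1e-4,
    maximal_learning_rate=1e-2,
    final_learning_rate=1e-5,
    step_size=500,
)
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)
```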
- class baskerville.trainer.EarlyStoppingMin(min_epoch: int = 0, **kwargs)[source]
Bases:
EarlyStopping
Stop training when a monitored quantity has stopped improving.
- Parameters:
min_epoch – Minimum number of epochs before considering stopping.
- on_epoch_end(epoch, logs=None)[source]
Called at the end of an epoch.
Subclasses should override for any actions to run. This function should only be called during TRAIN mode.
- Parameters:
epoch – Integer, index of epoch.
logs – Dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_. For training epoch, the values of the Model’s metrics are returned. Example: {‘loss’: 0.2, ‘accuracy’: 0.7}.
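EarlyStoppingMin behaves like Keras EarlyStopping but will not stop before min_epoch. The extra keyword arguments below (monitor, patience, restore_best_weights) are standard EarlyStopping options passed through **kwargs.

```python
from baskerville import trainer

early_stop = trainer.EarlyStoppingMin(
    min_epoch=10,              # never stop within the first 10 epochs
    monitor="val_loss",        # standard EarlyStopping kwargs pass through
    patience=8,
    restore_best_weights=True,
)
# e.g. model.fit(..., callbacks=[early_stop])
```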
- class baskerville.trainer.Trainer(params: dict, train_data, eval_data, out_dir: str, log_dir: str, strategy=None, num_gpu: int = 1, keras_fit: bool = False)[source]
Bases:
object
Model training class.
- Parameters:
params (dict) – Training parameters dictionary.
train_data – Dataset object or list of Dataset objects.
eval_data – Dataset object or list of Dataset objects.
out_dir (str) – Output directory name.
strategy – tf.distribute.Strategy object.
num_gpu (int) – Number of GPUs to use. Default: 1.
keras_fit (bool) – Use Keras fit method instead of custom loop.
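A construction sketch for Trainer using the documented arguments; the params['train'] layout, the SeqDataset arguments, and the commented training entry points are assumptions based on the training scripts, not confirmed by the signatures above.

```python
import json
from baskerville import dataset, seqnn, trainer

with open("params.json") as params_open:   # placeholder path
    params = json.load(params_open)

# SeqDataset arguments assumed from the training scripts
train_data = dataset.SeqDataset("data_out", split_label="train", batch_size=4, mode="train")
eval_data = dataset.SeqDataset("data_out", split_label="valid", batch_size=4, mode="eval")

seqnn_model = seqnn.SeqNN(params["model"])

seqnn_trainer = trainer.Trainer(
    params["train"], train_data, eval_data,
    out_dir="train_out", log_dir="train_out/logs",
)

# Assumed entry points (not documented above):
# seqnn_trainer.compile(seqnn_model)
# seqnn_trainer.fit_tape(seqnn_model)
```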
- class baskerville.trainer.WarmUp(initial_learning_rate: float, warmup_steps: int, decay_schedule: None, power: float = 1.0, name: str | None = None)[source]
Bases:
LearningRateSchedule
Applies a warmup schedule on a given learning rate decay schedule. (h/t HuggingFace.)
- Parameters:
initial_learning_rate (float) – Initial learning rate after the warmup (so this will be the learning rate at the end of the warmup).
decay_schedule (Callable) – The learning rate or schedule function to apply after the warmup for the rest of training.
warmup_steps (int) – The number of steps for the warmup part of training.
power (float, optional) – Power to use for the polynomial warmup (default is a linear warmup).
name (str, optional) – Optional name prefix for the returned tensors during the schedule.
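WarmUp wraps another schedule (or a constant rate) with a ramp-up phase; a sketch pairing it with a cosine decay for the post-warmup portion of training:

```python
import tensorflow as tf
from baskerville import trainer

decay = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3, decay_steps=10000
)

schedule = trainer.WarmUp(
    initial_learning_rate=1e-3,  # rate reached at the end of warmup
    warmup_steps=1000,
    decay_schedule=decay,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```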
- baskerville.trainer.adaptive_clip_grad(parameters, gradients, clip_factor: float = 0.1, eps: float = 0.001)[source]
Adaptive gradient clipping.
- baskerville.trainer.compute_norm(x, axis, keepdims)[source]
Compute L2 norm of a tensor across an axis.
- baskerville.trainer.parse_loss(loss_label, strategy=None, keras_fit: bool = True, spec_weight: float = 1, total_weight: float = 1, weight_range: float = 1, weight_exp: int = 1)[source]
Parse loss function from label, strategy, and fitting method.
- Parameters:
loss_label (str) – Loss function label.
strategy – tf.distribute.Strategy object.
keras_fit (bool) – Use Keras fit method instead of custom loop.
spec_weight (float) – Specificity weight for PoissonKL.
total_weight (float) – Total weight for PoissonMultinomial.
- Returns:
tf.keras.losses.Loss object.
- Return type:
loss_fn
baskerville.vcf module
- class baskerville.vcf.SNP(vcf_line, pos2=False)[source]
Bases:
object
Represent SNPs read in from a VCF file.
- vcf_line
- Type:
str
- baskerville.vcf.dna_length_1hot(seq, length)[source]
Adjust the sequence length and compute a 1hot coding.
- baskerville.vcf.intersect_seqs_snps(vcf_file, seqs, vision_p=1)[source]
Intersect a VCF file with a list of sequence coordinates.
- In:
vcf_file – VCF file of SNPs.
seqs – List of objects with chrom, start, end attributes.
vision_p – Proportion of sequences visible to center genes.
- Out:
seqs_snps – List of lists mapping segment indexes to overlapping SNP indexes.
- baskerville.vcf.intersect_snps_seqs(vcf_file, seq_coords, vision_p=1)[source]
Intersect a VCF file with a list of sequence coordinates.
- In:
vcf_file – VCF file of SNPs.
seq_coords – List of sequence coordinates.
vision_p – Proportion of sequences visible to center genes.
- Out:
snp_segs – List of lists mapping SNP indexes to overlapping sequence indexes.
- baskerville.vcf.snp_seq1(snp, seq_len, genome_open)[source]
Produce one hot coded sequences for a SNP.
- Attrs:
snp (SNP) – SNP to code.
seq_len (int) – Sequence length to code.
genome_open (File) – Open genome FASTA file.
- Returns:
list of one hot coded sequences surrounding the SNP
- Return type:
seq_vecs_list [array]
- baskerville.vcf.snps2_seq1(snps, seq_len, genome1_fasta, genome2_fasta, return_seqs=False)[source]
Produce an array of one hot coded sequences for a list of SNPs.
- Attrs:
snps ([SNP]) – List of SNPs.
seq_len (int) – Sequence length to code.
genome1_fasta (str) – Major allele genome FASTA file.
genome2_fasta (str) – Minor allele genome FASTA file.
- Returns:
seq_vecs (array) – One hot coded sequences surrounding the SNPs.
seq_headers ([str]) – Headers for sequences.
seq_snps ([SNP]) – List of used SNPs.
- baskerville.vcf.snps_seq1(snps, seq_len, genome_fasta, return_seqs=False)[source]
Produce an array of one hot coded sequences for a list of SNPs.
- Attrs:
snps ([SNP]) – List of SNPs.
seq_len (int) – Sequence length to code.
genome_fasta (str) – Genome FASTA file.
- Returns:
seq_vecs (array) – One hot coded sequences surrounding the SNPs.
seq_headers ([str]) – Headers for sequences.
seq_snps ([SNP]) – List of used SNPs.
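Putting the vcf helpers together: parse SNP objects from a VCF and produce one hot coded sequences centered on each variant. The VCF and genome FASTA paths are placeholders; the three-way return of snps_seq1 follows the Returns block above, and skipping '#' header lines before constructing SNP(vcf_line) is an assumption about the expected input.

```python
from baskerville import vcf

vcf_file = "variants.vcf"  # placeholder paths
genome_fasta = "hg38.fa"

# Build SNP objects line by line, per the SNP(vcf_line) constructor above
snps = []
with open(vcf_file) as vcf_open:
    for line in vcf_open:
        if not line.startswith("#"):
            snps.append(vcf.SNP(line))

# One hot coded sequences of length 2048 centered on each SNP
seq_vecs, seq_headers, seq_snps = vcf.snps_seq1(snps, 2048, genome_fasta)
print(seq_vecs.shape)
```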