baskerville package

Subpackages

Submodules

baskerville.bed module

baskerville.bed.make_bed_seqs(bed_file, fasta_file, seq_len, stranded=False)[source]

Return BED regions as sequences, together with a list of their coordinate tuples, extended to a specified length.

baskerville.bed.read_bed_coords(bed_file, seq_len)[source]

Return BED regions as a list of coordinate tuples, extended to a specified length.
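One plausible reading of "extended to a specified length" is that each region is centered on its midpoint and resized to seq_len. The sketch below illustrates that reading only; it is an assumption, not baskerville's implementation, and extend_region and read_bed_regions are hypothetical helper names.

```python
def extend_region(start, end, seq_len):
    """Center a region on its midpoint and resize it to seq_len (assumed behavior)."""
    mid = (start + end) // 2
    new_start = mid - seq_len // 2
    return new_start, new_start + seq_len


def read_bed_regions(bed_lines, seq_len):
    """Parse BED lines into (chrom, start, end) tuples of length seq_len."""
    coords = []
    for line in bed_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        coords.append((chrom, *extend_region(start, end, seq_len)))
    return coords


example_bed = ["chr1\t100\t200\n", "chr2\t1000\t1010\n"]
print(read_bed_regions(example_bed, 64))
```

Note that this sketch can produce negative starts for regions near the beginning of a chromosome; a real reader would need to decide how to handle that case.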

baskerville.bed.write_bedgraph(preds, targets, data_dir: str, out_dir: str, split_label: str, bedgraph_indexes=None)[source]

Write BEDgraph files for predictions and targets from a dataset.

Parameters:
  • preds (np.array) – Predictions.

  • targets (np.array) – Targets.

  • data_dir (str) – Data directory, for identifying sequences and statistics.

  • out_dir (str) – Output directory.

  • split_label (str) – Split label.

  • bedgraph_indexes (list) – List of target indexes to write.
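For orientation, a bedGraph file holds one line per interval with four tab-separated fields: chromosome, start, end, and value. The sketch below formats a vector of binned values that way. The helper name bedgraph_lines, the fixed bin width, and the four-decimal formatting are all assumptions made for illustration; they are not taken from write_bedgraph.

```python
def bedgraph_lines(chrom, start, bin_width, values):
    """Yield one bedGraph line (chrom, start, end, value) per bin."""
    for i, value in enumerate(values):
        bin_start = start + i * bin_width
        yield f"{chrom}\t{bin_start}\t{bin_start + bin_width}\t{value:.4f}"


for line in bedgraph_lines("chr1", 1000, 32, [0.5, 1.25]):
    print(line)
```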

baskerville.blocks module

baskerville.blocks.center_average(inputs, center, **kwargs)[source]
baskerville.blocks.center_slice(inputs, center, **kwargs)[source]
baskerville.blocks.concat_dist_2d(inputs, **kwargs)[source]
baskerville.blocks.concat_position(inputs, transform='abs', power=1, **kwargs)[source]
baskerville.blocks.conv_block(inputs, filters=None, kernel_size=1, activation='relu', activation_end=None, stride=1, dilation_rate=1, l2_scale=0, dropout=0, conv_type='standard', pool_size=1, pool_type='max', norm_type=None, bn_momentum=0.99, norm_gamma=None, residual=False, kernel_initializer='he_normal', padding='same')[source]

Construct a single convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters – Conv1D filters

  • kernel_size – Conv1D kernel_size

  • activation – relu/gelu/etc

  • stride – Conv1D stride

  • dilation_rate – Conv1D dilation rate

  • l2_scale – L2 regularization weight.

  • dropout – Dropout rate probability

  • conv_type – Conv1D layer type

  • residual – Residual connection boolean

  • pool_size – Max pool width

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

  • norm_gamma – BatchNorm gamma (defaults according to residual)

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.conv_block_2d(inputs, filters=128, activation='relu', conv_type='standard', kernel_size=1, stride=1, dilation_rate=1, l2_scale=0, dropout=0, pool_size=1, norm_type=None, bn_momentum=0.99, norm_gamma='ones', kernel_initializer='he_normal', symmetric=False)[source]

Construct a single 2D convolution block.

baskerville.blocks.conv_dna(inputs, filters=None, kernel_size=15, activation='relu', stride=1, l2_scale=0, residual=False, dropout=0, dropout_residual=0, pool_size=1, pool_type='max', norm_type=None, bn_momentum=0.99, norm_gamma=None, use_bias=None, se=False, conv_type='standard', kernel_initializer='he_normal', padding='same')[source]

Construct a single convolution block, assumed to be operating on DNA.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters – Conv1D filters

  • kernel_size – Conv1D kernel_size

  • activation – relu/gelu/etc

  • stride – Conv1D stride

  • l2_scale – L2 regularization weight.

  • dropout – Dropout rate probability

  • conv_type – Conv1D layer type

  • pool_size – Max pool width

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.conv_nac(inputs, filters=None, kernel_size=1, activation='relu', stride=1, dilation_rate=1, l2_scale=0, dropout=0, conv_type='standard', residual=False, pool_size=1, pool_type='max', norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', padding='same', se=False)[source]

Construct a single convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters – Conv1D filters

  • kernel_size – Conv1D kernel_size

  • activation – relu/gelu/etc

  • stride – Conv1D stride

  • dilation_rate – Conv1D dilation rate

  • l2_scale – L2 regularization weight.

  • dropout – Dropout rate probability

  • conv_type – Conv1D layer type

  • residual – Residual connection boolean

  • pool_size – Max pool width

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.conv_next(inputs, filters=None, kernel_size=7, activation='relu', dense_expansion=2.0, dilation_rate=1, l2_scale=0, dropout=0, residual=False, pool_size=1, pool_type='max', kernel_initializer='he_normal', padding='same', norm_type=None, bn_momentum=0.99)[source]

Construct a single convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters – Conv1D filters

  • kernel_size – Conv1D kernel_size

  • activation – relu/gelu/etc

  • dilation_rate – Conv1D dilation rate

  • l2_scale – L2 regularization weight.

  • dropout – Dropout rate probability

  • residual – Residual connection boolean

  • pool_size – Max pool width

  • bn_momentum – BatchNorm momentum

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.conv_tower(inputs, filters_init, filters_end=None, filters_mult=None, divisible_by=1, repeat=1, reprs=[], **kwargs)[source]

Construct a reducing convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters_init – Initial Conv1D filters

  • filters_end – End Conv1D filters

  • filters_mult – Multiplier for Conv1D filters

  • divisible_by – Round filters to be divisible by (eg a power of two)

  • repeat – Tower repetitions

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.conv_tower_nac(inputs, filters_init, filters_end=None, filters_mult=None, divisible_by=1, repeat=1, reprs=[], **kwargs)[source]

Construct a reducing convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters_init – Initial Conv1D filters

  • filters_end – End Conv1D filters

  • filters_mult – Multiplier for Conv1D filters

  • divisible_by – Round filters to be divisible by (eg a power of two)

  • repeat – Tower repetitions

  • reprs – Append representations.

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.conv_tower_v1(inputs, filters_init, filters_mult=1, repeat=1, **kwargs)[source]

Construct a reducing convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters_init – Initial Conv1D filters

  • filters_mult – Multiplier for Conv1D filters

  • repeat – Conv block repetitions

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.convnext_tower(inputs, filters_init, filters_end=None, filters_mult=None, kernel_size=1, dropout=0, pool_size=2, pool_type='max', divisible_by=1, repeat=1, num_convs=2, reprs=[], **kwargs)[source]

Construct a reducing tower of ConvNeXt-style convolution blocks.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters_init – Initial Conv1D filters

  • filters_end – End Conv1D filters

  • filters_mult – Multiplier for Conv1D filters

  • kernel_size – Conv1D kernel_size

  • dropout – Dropout on subsequent convolution blocks.

  • pool_size – Pool width.

  • repeat – Residual block repetitions

  • num_convs – Conv blocks per residual layer

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.cropping_2d(inputs, cropping, **kwargs)[source]
baskerville.blocks.dense_block(inputs, units=None, activation='relu', activation_end=None, flatten=False, dropout=0, l2_scale=0, l1_scale=0, residual=False, norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', **kwargs)[source]

Construct a single dense block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • units – Dense units

  • activation – relu/gelu/etc

  • activation_end – Compute activation after the other operations

  • flatten – Flatten across positional axis

  • dropout – Dropout rate probability

  • l2_scale – L2 regularization weight.

  • l1_scale – L1 regularization weight.

  • residual – Residual connection boolean

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

  • norm_gamma – BatchNorm gamma (defaults according to residual)

Returns:

[batch_size, seq_length(?), features] output sequence

baskerville.blocks.dense_nac(inputs, units=None, activation='relu', flatten=False, dropout=0, l2_scale=0, l1_scale=0, residual=False, norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', **kwargs)[source]

Construct a single dense block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • units – Dense units

  • activation – relu/gelu/etc

  • activation_end – Compute activation after the other operations

  • flatten – Flatten across positional axis

  • dropout – Dropout rate probability

  • l2_scale – L2 regularization weight.

  • l1_scale – L1 regularization weight.

  • residual – Residual connection boolean

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

  • norm_gamma – BatchNorm gamma (defaults according to residual)

Returns:

[batch_size, seq_length(?), features] output sequence

baskerville.blocks.dilated_dense(inputs, filters, kernel_size=3, rate_mult=2, conv_type='standard', dropout=0, repeat=1, **kwargs)[source]

Construct a residual dilated dense block.

Args:

Returns:

baskerville.blocks.dilated_residual(inputs, filters, kernel_size=3, rate_mult=2, dropout=0, repeat=1, conv_type='standard', norm_type=None, round=False, **kwargs)[source]

Construct a residual dilated convolution block.

Args:

Returns:

baskerville.blocks.dilated_residual_2d(inputs, filters, kernel_size=3, rate_mult=2, dropout=0, repeat=1, symmetric=True, **kwargs)[source]

Construct a residual dilated convolution block.

baskerville.blocks.dilated_residual_nac(inputs, filters, kernel_size=3, rate_mult=2, dropout=0, repeat=1, **kwargs)[source]

Construct a residual dilated convolution block.

Args:

Returns:

baskerville.blocks.factor_inverse(inputs, components_file, **kwargs)[source]
baskerville.blocks.final(inputs, units, activation='linear', flatten=False, kernel_initializer='he_normal', l2_scale=0, l1_scale=0, **kwargs)[source]

Final simple transformation before comparison to targets.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • units – Dense units

  • activation – relu/gelu/etc

  • flatten – Flatten positional axis.

  • l2_scale – L2 regularization weight.

  • l1_scale – L1 regularization weight.

Returns:

[batch_size, seq_length(?), units] output sequence

baskerville.blocks.global_context(inputs, **kwargs)[source]
baskerville.blocks.one_to_two(inputs, operation='mean', **kwargs)[source]
baskerville.blocks.res_tower(inputs, filters_init, filters_end=None, filters_mult=None, kernel_size=1, dropout=0, pool_size=2, pool_type='max', divisible_by=1, repeat=1, num_convs=2, reprs=[], **kwargs)[source]

Construct a reducing convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters_init – Initial Conv1D filters

  • filters_end – End Conv1D filters

  • filters_mult – Multiplier for Conv1D filters

  • kernel_size – Conv1D kernel_size

  • dropout – Dropout on subsequent convolution blocks.

  • pool_size – Pool width.

  • repeat – Residual block repetitions

  • num_convs – Conv blocks per residual layer

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.squeeze_excite(inputs, activation='relu', bottleneck_ratio=8, additive=False, norm_type=None, bn_momentum=0.9, **kwargs)[source]
baskerville.blocks.swin_transformer(inputs, **kwargs)[source]
baskerville.blocks.symmetrize_2d(inputs, **kwargs)[source]
baskerville.blocks.tconv_nac(inputs, filters=None, kernel_size=1, activation='relu', stride=1, l2_scale=0, dropout=0, conv_type='standard', norm_type=None, bn_momentum=0.99, norm_gamma=None, kernel_initializer='he_normal', padding='same')[source]

Construct a single transposed convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters – Conv1D filters

  • kernel_size – Conv1D kernel_size

  • activation – relu/gelu/etc

  • stride – UpSample stride

  • l2_scale – L2 regularization weight.

  • dropout – Dropout rate probability

  • conv_type – Conv1D layer type

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

Returns:

[batch_size, stride*seq_length, features] output sequence

baskerville.blocks.transformer(inputs, key_size=None, heads=1, out_size=None, activation='relu', dense_expansion=2.0, content_position_bias=True, dropout=0.25, attention_dropout=0.05, position_dropout=0.01, l2_scale=0, mha_l2_scale=0, num_position_features=None, qkv_width=1, mha_initializer='he_normal', kernel_initializer='he_normal', **kwargs)[source]

Construct a transformer block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • key_size – Attention key size

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.transformer2(inputs, key_size=None, heads=1, out_size=None, activation='relu', num_position_features=None, attention_dropout=0.05, position_dropout=0.01, dropout=0.25, dense_expansion=2.0, qkv_width=1, **kwargs)[source]
Construct a transformer block, with length-wise pooling before returning to full length.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • key_size – Attention key size

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.transformer_dense(inputs, out_size, dense_expansion, l2_scale, dropout, kernel_initializer)[source]

Transformer block dense portion.

baskerville.blocks.transformer_split(inputs, splits=2, key_size=None, heads=1, out_size=None, activation='relu', dense_expansion=2.0, content_position_bias=True, dropout=0.25, attention_dropout=0.05, position_dropout=0.01, l2_scale=0, mha_l2_scale=0, num_position_features=None, qkv_width=1, mha_initializer='he_normal', kernel_initializer='he_normal', **kwargs)[source]

Construct a transformer block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • key_size – Attention key size

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.transformer_tower(inputs, repeat=2, block_type='transformer', **kwargs)[source]

Construct a tower of repeated transformer blocks.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • repeat – Transformer block repetitions

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.unet_concat(inputs, unet_repr, activation='relu', stride=2, l2_scale=0, dropout=0, norm_type=None, bn_momentum=0.99, kernel_size=1, kernel_initializer='he_normal')[source]

Construct a single transposed convolution block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • filters – Conv1D filters

  • kernel_size – Conv1D kernel_size

  • activation – relu/gelu/etc

  • stride – UpSample stride

  • l2_scale – L2 regularization weight.

  • dropout – Dropout rate probability

  • conv_type – Conv1D layer type

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

Returns:

[batch_size, stride*seq_length, features] output sequence

baskerville.blocks.unet_conv(inputs, unet_repr, activation='relu', stride=2, l2_scale=0, dropout=0, norm_type=None, bn_momentum=0.99, kernel_size=1, kernel_initializer='he_normal', upsample_conv=False)[source]

Construct a feature pyramid network block.

Parameters:
  • inputs – [batch_size, seq_length, features] input sequence

  • kernel_size – Conv1D kernel_size

  • activation – relu/gelu/etc

  • stride – UpSample stride

  • l2_scale – L2 regularization weight.

  • dropout – Dropout rate probability

  • norm_type – Apply batch or layer normalization

  • bn_momentum – BatchNorm momentum

  • upsample_conv – Conv1D the upsampled input path

Returns:

[batch_size, seq_length, features] output sequence

baskerville.blocks.upper_tri(inputs, diagonal_offset=2, **kwargs)[source]
baskerville.blocks.wheeze_excite(inputs, pool_size, **kwargs)[source]

baskerville.data module

class baskerville.data.Contig(genome, chr, start, end)

Bases: tuple

chr

Alias for field number 1

end

Alias for field number 3

genome

Alias for field number 0

start

Alias for field number 2

class baskerville.data.ModelSeq(genome, chr, start, end, label)

Bases: tuple

chr

Alias for field number 1

end

Alias for field number 3

genome

Alias for field number 0

label

Alias for field number 4

start

Alias for field number 2

baskerville.data.annotate_unmap(mseqs, unmap_bed, seq_length, pool_width)[source]
Intersect the sequence segments with unmappable regions and annotate the segments as NaN, so that they can be ignored.

Parameters:
  • mseqs – list of ModelSeq’s

  • unmap_bed – unmappable regions BED file

  • seq_length – sequence length (after cropping)

  • pool_width – pooled bin width

Returns:

NxL binary NA indicators

Return type:

seqs_unmap

baskerville.data.break_large_contigs(contigs, break_t, verbose=False)[source]

Break large contigs in half until all contigs are under the size threshold.
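The halving procedure can be sketched as follows. This is an illustrative reading of the description, not the library's code: the namedtuple mirrors the Contig fields listed above, break_in_half is a hypothetical name, and treating "under the threshold" as "at most break_t" is an assumption.

```python
from collections import namedtuple

# Mirrors the fields of baskerville.data.Contig listed above.
Contig = namedtuple("Contig", ["genome", "chr", "start", "end"])


def break_in_half(contigs, break_t):
    """Repeatedly halve the largest contig until none is longer than break_t."""
    if break_t < 1:
        raise ValueError("break_t must be at least 1")
    contigs = list(contigs)
    while contigs:
        # Find the current largest contig.
        i = max(range(len(contigs)),
                key=lambda j: contigs[j].end - contigs[j].start)
        big = contigs[i]
        if big.end - big.start <= break_t:
            break  # the largest contig fits, so all contigs fit
        mid = (big.start + big.end) // 2
        # Replace the contig with its two halves, preserving order.
        contigs[i:i + 1] = [big._replace(end=mid), big._replace(start=mid)]
    return contigs


for ctg in break_in_half([Contig(0, "chr1", 0, 1000)], break_t=300):
    print(ctg)
```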

baskerville.data.contig_sequences(contigs, seq_length, stride, snap=1, label=None)[source]

Break up a list of Contigs into a list of model-length sequences, spaced by the given stride.
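A minimal sketch of tiling contigs into fixed-length windows is given below. It assumes that windows which would run past the end of a contig are dropped, and it ignores the snap argument entirely. The namedtuples mirror the Contig and ModelSeq fields listed above, and tile_contigs is a hypothetical name.

```python
from collections import namedtuple

# Mirror the fields of baskerville.data.Contig and ModelSeq listed above.
Contig = namedtuple("Contig", ["genome", "chr", "start", "end"])
ModelSeq = namedtuple("ModelSeq", ["genome", "chr", "start", "end", "label"])


def tile_contigs(contigs, seq_length, stride, label=None):
    """Slide a seq_length window across each contig in steps of stride."""
    if stride < 1:
        raise ValueError("stride must be positive")
    mseqs = []
    for ctg in contigs:
        start = ctg.start
        # Assumed: only windows that fit entirely inside the contig are kept.
        while start + seq_length <= ctg.end:
            mseqs.append(
                ModelSeq(ctg.genome, ctg.chr, start, start + seq_length, label))
            start += stride
    return mseqs


for mseq in tile_contigs([Contig(0, "chr1", 0, 1000)], 400, 300, label="train"):
    print(mseq)
```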

baskerville.data.load_chromosomes(genome_file)[source]

Load genome segments from either a FASTA file or chromosome length table.

baskerville.data.rejoin_large_contigs(contigs)[source]

Rejoin large contigs that were broken up before alignment comparison.

baskerville.data.split_contigs(chrom_segments, gaps_file)[source]

Split the assembly up into contigs defined by the gaps.

Parameters:
  • chrom_segments – dict mapping chromosome names to lists of (start,end)

  • gaps_file – file specifying assembly gaps

Returns:

same, with segments broken by the assembly gaps.

Return type:

chrom_segments
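The splitting step can be illustrated with in-memory gaps instead of a gaps file. In the sketch below, split_by_gaps is a hypothetical helper, and the gaps argument (a dict mapping chromosome names to lists of (start, end) gap intervals) is an assumed stand-in for the parsed contents of gaps_file.

```python
def split_by_gaps(chrom_segments, gaps):
    """Cut each (start, end) segment wherever an assembly gap overlaps it."""
    split_segments = {}
    for chrom, segments in chrom_segments.items():
        chrom_gaps = sorted(gaps.get(chrom, []))
        new_segments = []
        for seg_start, seg_end in segments:
            cursor = seg_start  # left edge of the piece being built
            for gap_start, gap_end in chrom_gaps:
                if gap_end <= cursor or gap_start >= seg_end:
                    continue  # gap does not overlap the remaining segment
                if gap_start > cursor:
                    new_segments.append((cursor, gap_start))
                cursor = max(cursor, gap_end)
            if cursor < seg_end:
                new_segments.append((cursor, seg_end))
        split_segments[chrom] = new_segments
    return split_segments


segments = {"chr1": [(0, 1000)], "chr2": [(0, 500)]}
gaps = {"chr1": [(100, 200), (500, 600)]}
print(split_by_gaps(segments, gaps))
```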

baskerville.data.write_seqs_bed(bed_file, seqs, labels=False)[source]

Write sequences to BED file.

baskerville.dataset module

class baskerville.dataset.SeqDataset(data_dir: str, split_label: str, batch_size: int, shuffle_buffer: int = 128, seq_length_crop: int = 0, mode: str = 'eval', tfr_pattern: str | None = None, targets_slice_file: str | None = None)[source]

Bases: object

Labeled sequence dataset for Tensorflow.

Parameters:
  • data_dir (str) – Dataset directory.

  • split_label (str) – Dataset split, e.g. train, valid, test.

  • batch_size (int) – Batch size.

  • shuffle_buffer (int) – Shuffle buffer size. Defaults to 128.

  • seq_length_crop (int) – Sequence length to crop from sides. Defaults to 0.

  • mode (str) – Dataset mode, e.g. train/eval. Defaults to ‘eval’.

  • tfr_pattern (str) – TFRecord pattern to glob. Defaults to split_label.

  • targets_slice_file (str) – Targets table from which to slice a target subset.

batches_per_epoch()[source]

Compute number of batches per epoch.

compute_stats()[source]

Iterate over the TFRecords to count sequences, and infer seq_depth and num_targets.

distribute(strategy)[source]

Wrap Dataset to distribute across devices.

generate_parser(raw: bool = False)[source]

Generate parser function for TFRecordDataset.

make_dataset(cycle_length=4)[source]

Make tf.data.Dataset with transformations.

numpy(return_inputs=True, return_outputs=True, step=1, target_slice=None, dtype='float16')[source]

Convert TFR inputs and/or outputs to numpy arrays.

baskerville.dataset.file_to_records(filename: str)[source]

Read TFRecord file into tf.data.Dataset.

baskerville.dataset.make_strand_transform(targets_df, targets_strand_df)[source]

Make a sparse matrix to sum strand pairs.

Parameters:
  • targets_df (pd.DataFrame) – Targets DataFrame.

  • targets_strand_df (pd.DataFrame) – Targets DataFrame, with strand pairs collapsed.

Returns:

Sparse matrix to sum strand pairs.

Return type:

scipy.sparse.dok_matrix

baskerville.dataset.targets_prep_strand(targets_df)[source]

Adjust targets table for merged stranded datasets.

Parameters:

targets_df – pandas DataFrame of targets

Returns:

pandas DataFrame of targets, with stranded targets collapsed into a single row

Return type:

targets_df

baskerville.dataset.untransform_preds(preds, targets_df, unscale=False, unclip=True)[source]

Undo the squashing transformations performed for the tasks.

Parameters:
  • preds (np.array) – Predictions LxT.

  • targets_df (pd.DataFrame) – Targets information table.

Returns:

Untransformed predictions LxT.

Return type:

preds (np.array)

baskerville.dataset.untransform_preds1(preds, targets_df, unscale=False, unclip=True)[source]

Undo the squashing transformations performed for the tasks.

Parameters:
  • preds (np.array) – Predictions LxT.

  • targets_df (pd.DataFrame) – Targets information table.

Returns:

Untransformed predictions LxT.

Return type:

preds (np.array)

baskerville.dna module

baskerville.dna.dna_1hot(seq: str, seq_len: int | None = None, n_uniform: bool = False, n_sample: bool = False)[source]

Convert a DNA sequence to a 1-hot encoding.

Parameters:
  • seq (str) – DNA sequence.

  • seq_len (int) – length to extend/trim sequences to.

  • n_uniform (bool) – represent N’s as 0.25, forcing float16.

  • n_sample (bool) – sample ACGT for N

Returns:

1-hot encoding of DNA sequence.

Return type:

seq_code (np.array)
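The encoding itself can be sketched without dependencies. The example below uses plain nested lists in place of NumPy arrays, assumes an A, C, G, T column order, and assumes that unknown bases become all-zero rows unless n_uniform is set. The function name one_hot is hypothetical, and the seq_len and n_sample options are left out.

```python
def one_hot(seq, n_uniform=False):
    """Encode DNA as an L x 4 list of rows in A, C, G, T column order (assumed)."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for nt in seq.upper():
        if nt in index:
            row = [0.0] * 4
            row[index[nt]] = 1.0
        elif n_uniform:
            row = [0.25] * 4  # spread unknown bases evenly over ACGT
        else:
            row = [0.0] * 4   # assumed: unknown bases stay all zero
        rows.append(row)
    return rows


for row in one_hot("ACGTN", n_uniform=True):
    print(row)
```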

baskerville.dna.dna_1hot_index(seq: str, n_sample: bool = False)[source]

Convert a DNA sequence to an index encoding.

Parameters:
  • seq (str) – DNA sequence.

  • n_sample (bool) – sample ACGT for N

Returns:

Index encoding of DNA sequence.

Return type:

seq_code (np.array)

baskerville.dna.dna_rc(seq: str)[source]

Reverse complement a DNA sequence.

Parameters:

seq (str) – DNA sequence.

Returns:

Reverse complement of the input sequence.
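Reverse complementing a string is commonly done with a translation table followed by a reversal. The sketch below shows that general technique; reverse_complement is a hypothetical name, and leaving characters outside ACGT (such as N) unchanged is an assumption.

```python
# Map each base to its complement; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACGN"))
```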

baskerville.dna.hot1_augment(Xb, fwdrc: bool = True, shift: int = 0)[source]

Transform a batch of one hot coded sequences to augment training.

Parameters:
  • Xb (np.array) – Batch x Length x 4 one hot coded sequences.

  • fwdrc (bool) – Representing forward versus reverse complement strand.

  • shift (int) – Shift sequences by this many positions.

Returns:

Transformed batch of sequences.

Return type:

Xbt (np.array)

baskerville.dna.hot1_delete(seq_1hot, pos: int, delete_len: int, pad_value=None)[source]
Delete nucleotides starting at a given position in the Lx4 1-hot encoded sequence.

Parameters:
  • seq_1hot (np.array) – 1-hot encoded sequence.

  • pos (int) – Position to start deleting.

  • delete_len (int) – Number of nucleotides to delete.

  • pad_value (float) – Value to pad the end with.

Returns:

In-place transformed sequence.

Return type:

seq_1hot (np.array)
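An in-place deletion of this kind can be pictured as shifting the downstream rows left and padding the vacated end. The sketch below uses nested lists in place of NumPy arrays; delete_in_place is a hypothetical name, and the default pad value of 0.0 is an assumption (the documented default is None, whose meaning is not specified here).

```python
def delete_in_place(seq_1hot, pos, delete_len, pad_value=0.0):
    """Remove delete_len rows starting at pos, keeping the length fixed."""
    length = len(seq_1hot)
    width = len(seq_1hot[0]) if length else 0
    # Shift the rows after the deletion to the left.
    for i in range(pos, length - delete_len):
        seq_1hot[i] = seq_1hot[i + delete_len]
    # Pad the vacated rows at the end.
    for i in range(max(pos, length - delete_len), length):
        seq_1hot[i] = [pad_value] * width
    return seq_1hot


seq = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
delete_in_place(seq, pos=1, delete_len=2)
for row in seq:
    print(row)
```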

baskerville.dna.hot1_dna(seqs_1hot)[source]

Convert 1-hot coded sequences to ACGTN.

Parameters:

seqs_1hot (np.array) – 1-hot encoded sequences.

Returns:

List of DNA sequences.

Return type:

seqs [str]

baskerville.dna.hot1_get(seqs_1hot, pos: int)[source]
Return the nucleotide corresponding to the one hot coding of position “pos” in the Lx4 array seqs_1hot.

Parameters:
  • seqs_1hot (np.array) – 1-hot encoded sequences.

  • pos (int) – Position to get nucleotide.

Returns:

Nucleotide.

Return type:

nt (str)

baskerville.dna.hot1_insert(seq_1hot, pos: int, insert_seq: str)[source]

Insert sequence at a given position in the 1-hot encoded sequence.

Parameters:
  • seq_1hot (np.array) – 1-hot encoded sequence.

  • pos (int) – Position to insert sequence.

  • insert_seq (str) – Sequence to insert.

Returns:

In-place transformed sequence.

Return type:

seq_1hot (np.array)

baskerville.dna.hot1_rc(seqs_1hot)[source]
Reverse complement a batch of one hot coded sequences, while being robust to additional tracks beyond the four nucleotides.

Parameters:

seqs_1hot (np.array) – 1-hot encoded sequences.

Returns:

Reverse complemented sequences.

Return type:

seqs_1hot_rc (np.array)
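With an A, C, G, T column order, reverse complementing a one hot sequence amounts to reversing the positions and reversing the first four columns (which swaps A with T and C with G). The sketch below shows this on nested lists and simply reverses any extra tracks along the length axis. The column order, the handling of extra tracks, and the name one_hot_rc are assumptions made for illustration.

```python
def one_hot_rc(seqs_1hot):
    """Reverse complement a batch of L x C one hot sequences (C >= 4)."""
    rc_batch = []
    for seq in seqs_1hot:
        rc_seq = []
        for row in reversed(seq):            # reverse the positions
            acgt = list(reversed(row[:4]))   # swap A<->T and C<->G
            rc_seq.append(acgt + list(row[4:]))  # extra tracks kept as-is
        rc_batch.append(rc_seq)
    return rc_batch


# "AC" with one extra track; its reverse complement is "GT".
batch = [[[1, 0, 0, 0, 7], [0, 1, 0, 0, 9]]]
print(one_hot_rc(batch))
```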

baskerville.dna.hot1_set(seq_1hot, pos: int, nt: str)[source]

Set position in a 1-hot encoded sequence to given nucleotide.

Parameters:
  • seq_1hot (np.array) – 1-hot encoded sequence.

  • pos (int) – Position to set nucleotide.

  • nt (str) – Nucleotide to set.

Returns:

In-place transformed sequence.

Return type:

seq_1hot (np.array)

baskerville.gene module

class baskerville.gene.Gene(chrom, strand, kv)[source]

Bases: object

Class for managing genes in an isoform-agnostic way, taking the union of exons across isoforms.

add_exon(start, end)[source]

BED 0-indexing assumed.

get_exons()[source]
midpoint()[source]
output_slice(seq_start, seq_len, model_stride, span=False, majority_overlap=False)[source]
span()[source]
class baskerville.gene.GenomicInterval(start, end, chrom=None, strand=None)[source]

Bases: object

class baskerville.gene.Transcriptome(gtf_file)[source]

Bases: object

bedtool_exon()[source]
bedtool_span()[source]
read_gtf(gtf_file)[source]
write_bed_exon(bed_file)[source]
write_bed_span(bed_file)[source]
baskerville.gene.gtf_kv(s)[source]

Convert the last gtf section of key/value pairs into a dict.
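The attribute column of a GTF line is a semicolon-separated list of key "value" pairs. A minimal sketch of turning it into a dict is shown below; parse_gtf_attributes is a hypothetical name, and the sketch assumes values contain no semicolons and that later duplicate keys overwrite earlier ones.

```python
def parse_gtf_attributes(s):
    """Parse 'key "value"; key "value";' into a dict of strings."""
    attrs = {}
    for field in s.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after the final semicolon
        key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


print(parse_gtf_attributes('gene_id "G1"; gene_type "lncRNA";'))
```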

baskerville.layers module

class baskerville.layers.CenterAverage(*args, **kwargs)[source]

Bases: Layer

Average the center of the input.

Parameters:

center (int) – Length of the center slice.

call(x)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.CenterSlice(*args, **kwargs)[source]

Bases: Layer

Slice the center of the input.

Parameters:

center (int) – Length of the center slice.

call(x)[source]

get_config()[source]

class baskerville.layers.ConcatDist2D(*args, **kwargs)[source]

Bases: Layer

Concatenate the pairwise distance to 2d feature matrix.

call(inputs)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances,

in __init__(), or in the build() method that is

called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.
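As a rough illustration of what "concatenate the pairwise distance" could mean, here is a NumPy sketch. It assumes the input is shaped [batch, length, length, features] and that the distance is |i - j|; the function name concat_dist_2d_sketch is hypothetical, and the real layer operates on TensorFlow tensors and may differ in detail.

```python
import numpy as np


def concat_dist_2d_sketch(inputs):
    """Append |i - j| as one extra channel to a [batch, L, L, features] array."""
    batch_size, seq_len = inputs.shape[0], inputs.shape[1]
    pos = np.arange(seq_len)
    # dist[i, j] = |i - j|
    dist = np.abs(pos[:, None] - pos[None, :]).astype(inputs.dtype)
    # Broadcast to [batch, L, L, 1] so it can be joined on the last axis.
    dist = np.broadcast_to(
        dist[None, :, :, None], (batch_size, seq_len, seq_len, 1)
    )
    return np.concatenate([inputs, dist], axis=-1)


x = np.zeros((2, 4, 4, 3), dtype=np.float32)
y = concat_dist_2d_sketch(x)
print(y.shape)  # (2, 4, 4, 4)
```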

class baskerville.layers.ConcatPosition(*args, **kwargs)[source]

Bases: Layer

Concatenate position to 1d feature vectors.

call(inputs)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.EnsembleReverseComplement(*args, **kwargs)[source]

Bases: Layer

Expand tensor to include the reverse complement of a one-hot encoded DNA sequence.

call(seqs_1hot)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.
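The reverse complement of a one-hot sequence can be sketched in NumPy as follows. This assumes seqs_1hot is shaped [batch, length, 4] with channels ordered A, C, G, T; the helper names are hypothetical, and the structure the real layer returns may differ from the simple pair used here.

```python
import numpy as np


def reverse_complement_1hot(seqs_1hot):
    """Reverse-complement [batch, length, 4] one-hot DNA (A, C, G, T assumed)."""
    # Reversing the length axis reverses the sequence; reversing the channel
    # axis swaps A<->T and C<->G under the assumed A, C, G, T ordering.
    return seqs_1hot[:, ::-1, ::-1]


def ensemble_rc_sketch(seqs_1hot):
    """Return the forward and reverse-complement versions as a pair."""
    return seqs_1hot, reverse_complement_1hot(seqs_1hot)


# "ACG" one-hot encoded, batch of one.
seq = np.array(
    [[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]], dtype=np.float32
)
fwd, rc = ensemble_rc_sketch(seq)
print(rc[0].argmax(axis=-1))  # [1 2 3], i.e. "CGT"
```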

class baskerville.layers.EnsembleShift(*args, **kwargs)[source]

Bases: Layer

Expand tensor to include shifts of a one-hot encoded DNA sequence.

call(seqs_1hot)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.
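One plausible reading of "shifts of a one-hot encoded DNA sequence" is sketched below in NumPy. The function names, the shifts argument, and the pad_value of 0.25 (a uniform fill for positions shifted in from outside the sequence) are all assumptions made for illustration, not the layer's confirmed behavior.

```python
import numpy as np


def shift_sequence_sketch(seqs_1hot, shift, pad_value=0.25):
    """Shift [batch, length, 4] along the length axis, filling with pad_value."""
    if shift == 0:
        return seqs_1hot
    pad = np.full_like(seqs_1hot[:, : abs(shift), :], pad_value)
    if shift > 0:
        # Shift right: pad on the left, drop the rightmost positions.
        return np.concatenate([pad, seqs_1hot[:, :-shift, :]], axis=1)
    # Shift left: drop the leftmost positions, pad on the right.
    return np.concatenate([seqs_1hot[:, -shift:, :], pad], axis=1)


def ensemble_shift_sketch(seqs_1hot, shifts=(0, 1, -1)):
    """Return one shifted copy of the batch per requested shift."""
    return [shift_sequence_sketch(seqs_1hot, s) for s in shifts]


# "ACG" one-hot encoded as floats, batch of one.
seq = np.array(
    [[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]], dtype=np.float32
)
shifted = ensemble_shift_sketch(seq)
print([s.shape for s in shifted])  # three arrays of shape (1, 3, 4)
```

The input is kept as a float array here because an integer array could not hold the fractional pad value.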

class baskerville.layers.FactorInverse(*args, **kwargs)[source]

Bases: Layer

Invert a target matrix factorization.

call(W)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.GlobalContext(*args, **kwargs)[source]

Bases: Layer

build(input_shape)[source]

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(x)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class baskerville.layers.LengthAverage(*args, **kwargs)[source]

Bases: Layer

Average across a variable length sequence.

call(x, seq)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.
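Averaging across a variable-length sequence amounts to a masked mean. The NumPy sketch below assumes x is [batch, length, features], seq is a [batch, length, 4] one-hot array of the same length, and padded positions are all-zero rows in seq; these shapes and the function name length_average_sketch are assumptions, not taken from the layer's source.

```python
import numpy as np


def length_average_sketch(x, seq):
    """Mean of x over positions where seq has a non-zero one-hot row."""
    # mask[b, l] is 1.0 where sequence is present and 0.0 where it is padding.
    mask = (seq.sum(axis=-1) > 0).astype(x.dtype)
    x_sum = (x * mask[..., None]).sum(axis=1)
    x_len = mask.sum(axis=1, keepdims=True)
    return x_sum / x_len


# batch=1, length=3, features=1; the last position is padding.
x = np.array([[[2.0], [4.0], [100.0]]])
seq = np.array([[[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]])
avg = length_average_sketch(x, seq)
print(avg)  # [[3.]]
```

A sequence made entirely of padding would divide by zero in this sketch, so a real implementation would need to guard that case.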

class baskerville.layers.MultiheadAttention(*args, **kwargs)[source]

Bases: Layer

Multi-head attention.

call(inputs, training=False)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.OneToTwo(*args, **kwargs)[source]

Bases: Layer

Transform a 1D sequence into a 2D map by applying an operation to each pair of position vectors i, j.

call(oned)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.
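The 1D-to-2D transform described for this class can be pictured with a NumPy sketch in which entry (i, j) of the output combines the feature vectors at positions i and j. The function name and the operation names used here ("mean", "max", "multiply", "concat") are assumptions for illustration; the layer's actual options are not listed in this reference.

```python
import numpy as np


def one_to_two_sketch(oned, operation="mean"):
    """Map [batch, L, features] to [batch, L, L, features] (2x for concat)."""
    vi = oned[:, :, None, :]  # vector at position i, broadcast over j
    vj = oned[:, None, :, :]  # vector at position j, broadcast over i
    if operation == "mean":
        return (vi + vj) / 2
    if operation == "max":
        return np.maximum(vi, vj)
    if operation == "multiply":
        return vi * vj
    if operation == "concat":
        bi, bj = np.broadcast_arrays(vi, vj)
        return np.concatenate([bi, bj], axis=-1)
    raise ValueError(f"unknown operation: {operation}")


oned = np.array([[[1.0], [3.0]]])  # batch=1, L=2, features=1
twod = one_to_two_sketch(oned, "mean")
print(twod.shape)  # (1, 2, 2, 1)
print(twod[0, 0, 1, 0])  # 2.0
```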

class baskerville.layers.Scale(*args, **kwargs)[source]

Bases: Layer

Scale the input by a learned value.

Parameters:
  • axis (int or [int]) – Axis/axes along which to scale.

  • initializer – Initializer for the scale weight.

build(input_shape)[source]

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(x)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.
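The division of labor between build(), call(), and get_config() described for this class can be imitated in plain Python. The sketch below is not the Keras machinery: LazyScale and its initial_value argument are hypothetical, only a single integer axis is handled, and a NumPy array stands in for the learned weight.

```python
import numpy as np


class LazyScale:
    """Hypothetical plain-Python imitation of a Keras-style scaling layer."""

    def __init__(self, axis=-1, initial_value=1.0):
        self.axis = axis
        self.initial_value = initial_value
        self.built = False
        self.scale = None

    def build(self, input_shape):
        # State is created here, once the input shape is known.
        shape = [1] * len(input_shape)
        shape[self.axis] = input_shape[self.axis]
        self.scale = np.full(shape, self.initial_value)
        self.built = True

    def call(self, x):
        # No state is created here; the weight is only read.
        return x * self.scale

    def __call__(self, x):
        # Mirrors the rule that build() runs before the first call().
        if not self.built:
            self.build(x.shape)
        return self.call(x)

    def get_config(self):
        # Constructor arguments only; no weights and no connectivity.
        return {"axis": self.axis, "initial_value": self.initial_value}


layer = LazyScale(axis=-1, initial_value=0.5)
out = layer(np.ones((2, 3, 4)))
print(layer.scale.shape)  # (1, 1, 4)
print(out[0, 0])  # [0.5 0.5 0.5 0.5]

# Re-create an equivalent, not yet built, layer from its config.
clone = LazyScale(**layer.get_config())
```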

class baskerville.layers.SoftmaxPool1D(*args, **kwargs)[source]

Bases: Layer

Pooling operation with optional weights.

build(input_shape)[source]

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.Softplus(*args, **kwargs)[source]

Bases: Layer

Safe softplus, clipping large values.

call(x)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.SqueezeExcite(*args, **kwargs)[source]

Bases: Layer

build(input_shape)[source]

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(x)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.StochasticReverseComplement(*args, **kwargs)[source]

Bases: Layer

Stochastically reverse complement a one hot encoded DNA sequence.
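
A minimal NumPy sketch of the operation follows. It assumes the one-hot channels are ordered A, C, G, T, so that reversing the channel axis performs the complement. The 0.5 flip probability and the returned flag are assumptions made for illustration, so that a later step could undo the reversal.

```python
import numpy as np


def reverse_complement_1hot(seq_1hot):
    """Reverse complement a [batch, length, 4] one-hot array.

    Assumes channels are ordered A, C, G, T, so reversing the channel axis
    swaps A<->T and C<->G; reversing the length axis reverses the sequence.
    """
    return seq_1hot[:, ::-1, ::-1]


def stochastic_reverse_complement(seq_1hot, training, rng):
    """Return (sequence, reversed_flag); flips with probability 0.5 in training."""
    if training and rng.random() < 0.5:
        return reverse_complement_1hot(seq_1hot), True
    return seq_1hot, False


# "ACG" one-hot encoded; its reverse complement is "CGT".
seq = np.eye(4)[[0, 1, 2]][np.newaxis]
rc = reverse_complement_1hot(seq)
```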

call(seq_1hot, training=None)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class baskerville.layers.StochasticShift(*args, **kwargs)[source]

Bases: Layer

Stochastically shift a one hot encoded DNA sequence.

call(seq_1hot, training=None)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.SwitchReverse(*args, **kwargs)[source]

Bases: Layer

Reverse predictions if the inputs were reverse complemented.
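
The idea can be sketched in NumPy as follows. The documented call(x_reverse) signature suggests the predictions and the reversal flag arrive together; here they are passed as two separate arguments for clarity, which is an assumption of this example, as is the [batch, length, targets] layout.

```python
import numpy as np


def switch_reverse(preds, reverse):
    """Flip [batch, length, targets] predictions along length if reverse is True."""
    return preds[:, ::-1, :] if reverse else preds


preds = np.arange(6, dtype=float).reshape(1, 3, 2)
restored = switch_reverse(preds, reverse=True)
```

Flipping the length axis puts predictions made on a reverse-complemented input back into the coordinate order of the original sequence.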

call(x_reverse)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.SwitchReverseTriu(*args, **kwargs)[source]

Bases: Layer

call(x_reverse)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.Symmetrize2D(*args, **kwargs)[source]

Bases: Layer

Take the average of a matrix and its transpose to enforce symmetry.
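
In NumPy terms, the operation amounts to the sketch below. The [batch, length, length, features] layout is an assumption about how 2D maps are stored; the function name is illustrative.

```python
import numpy as np


def symmetrize_2d(x):
    """Average a [batch, length, length, features] array with its transpose."""
    return (x + np.transpose(x, (0, 2, 1, 3))) / 2.0


x = np.array([[0.0, 2.0], [4.0, 6.0]]).reshape(1, 2, 2, 1)
sym = symmetrize_2d(x)[0, :, :, 0]
```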

call(x)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class baskerville.layers.UpperTri(*args, **kwargs)[source]

Bases: Layer

Unroll matrix to its upper triangular portion.
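
A small NumPy sketch of unrolling the upper triangle is given here. The diagonal_offset parameter and the [batch, length, length, features] layout are assumptions for illustration; the layer's real constructor arguments are not documented in this section.

```python
import numpy as np


def upper_tri(x, diagonal_offset=0):
    """Flatten the upper triangle of a [batch, length, length, features] array.

    diagonal_offset is an illustrative parameter: 0 keeps the main diagonal,
    k > 0 starts k diagonals above it.
    Returns an array of shape [batch, n_pairs, features].
    """
    length = x.shape[1]
    rows, cols = np.triu_indices(length, k=diagonal_offset)
    return x[:, rows, cols, :]


# Matrix [[0, 1, 2], [3, 4, 5], [6, 7, 8]] with a single feature channel.
x = np.arange(9, dtype=float).reshape(1, 3, 3, 1)
flat = upper_tri(x, diagonal_offset=1)[0, :, 0]
```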

call(inputs)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

class baskerville.layers.WheezeExcite(*args, **kwargs)[source]

Bases: Layer

build(input_shape)[source]

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(x)[source]

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

baskerville.layers.activate(current, activation, verbose=False)[source]
baskerville.layers.positional_features(positions: Tensor, feature_size: int, seq_length: int, symmetric=False)[source]

Compute relative positional encodings/features.

Each positional feature function will compute/provide the same fraction of features, making up the total of feature_size.

Parameters:
  • positions – Tensor of relative positions of arbitrary shape.

  • feature_size – Total number of basis functions.

  • seq_length – Sequence length denoting the characteristic length that the individual positional features can use. This is required because the parametrization of the input features should be independent of positions, even though it may still need to use the total number of features.

  • symmetric – If True, the resulting features will be symmetric across the relative position of 0 (i.e. only the absolute value of positions will matter). If False, then both the symmetric and asymmetric versions (the symmetric features multiplied by sign(positions)) of the features will be used.

Returns:

Tensor of shape positions.shape + (feature_size,).
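
The symmetric and asymmetric layout described above can be sketched with a toy basis. The exponential-decay basis and its half-life spacing are assumptions chosen for illustration and are not claimed to match the basis functions baskerville uses. Only the way the symmetric half and the sign-multiplied half are combined, and the output shape, follow the description.

```python
import numpy as np


def positional_features_sketch(positions, feature_size, seq_length, symmetric=False):
    """Toy relative positional features using exponential-decay bases.

    The exponential basis and its half-life spacing are assumptions made for
    illustration; only the symmetric/asymmetric layout follows the text above.
    """
    if not symmetric and feature_size % 2 != 0:
        raise ValueError("feature_size must be even when symmetric is False")
    positions = np.asarray(positions, dtype=np.float64)
    num_basis = feature_size if symmetric else feature_size // 2
    # Half-lives spaced geometrically between 1 and seq_length.
    half_life = np.logspace(0.0, np.log2(seq_length), num_basis, base=2.0)
    sym = np.exp(-np.log(2.0) / half_life * np.abs(positions)[..., np.newaxis])
    if symmetric:
        return sym
    # Asymmetric half: the symmetric features multiplied by sign(positions).
    asym = np.sign(positions)[..., np.newaxis] * sym
    return np.concatenate([sym, asym], axis=-1)


positions = np.arange(-3, 4)
features = positional_features_sketch(positions, feature_size=4, seq_length=8)
```

The result has shape positions.shape + (feature_size,); the first half of the features is mirrored around position 0 and the second half changes sign.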

baskerville.layers.positional_features_central_mask(positions: Tensor, feature_size: int, seq_length: int)[source]

Positional features using a central mask (allow only central features).

baskerville.layers.relative_shift(x)[source]

Shift the relative logits like in TransformerXL.
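
The TransformerXL-style shift can be sketched in NumPy as a pad, reshape, and slice. The sketch assumes a [batch, heads, t1, t2] input whose last axis indexes relative offsets from -(t1 - 1) to t1 - 1; the function name is illustrative.

```python
import numpy as np


def relative_shift_np(x):
    """Shift relative logits so column j of row i holds relative offset j - i.

    x has shape [batch, heads, t1, t2], where the last axis indexes relative
    offsets -(t1 - 1) .. (t1 - 1), so t2 = 2 * t1 - 1.
    """
    # Prepend one zero column, then reinterpret the memory layout so each
    # successive row is shifted by one position.
    x = np.concatenate([np.zeros_like(x[..., :1]), x], axis=-1)
    batch, heads, t1, t2 = x.shape
    x = x.reshape(batch, heads, t2, t1)[:, :, 1:, :]
    x = x.reshape(batch, heads, t1, t2 - 1)
    return x[..., : (t2 + 1) // 2]


t = 3
# Each entry stores its own relative offset: -2, -1, 0, 1, 2.
offsets = np.arange(-(t - 1), t, dtype=float)
logits = np.broadcast_to(offsets, (1, 1, t, 2 * t - 1))
shifted = relative_shift_np(logits)[0, 0]
```

Because each input entry stores its own offset, the output entry at row i, column j equals j - i, which confirms that the shift aligns relative offsets with absolute key positions.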

baskerville.layers.shift_sequence(seq, shift, pad_value=0)[source]

Shift a sequence left or right by shift.

Parameters:
  • seq – [batch_size, seq_length, seq_depth] sequence

  • shift – signed shift value (tf.int32 or int)

  • pad_value – value to fill the padding (primitive or scalar tf.Tensor)
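
A NumPy sketch of the shift follows. The sign convention (positive shifts move content to the right and pad on the left) is an assumption of this example, and shift_sequence_np is an illustrative name rather than the library function.

```python
import numpy as np


def shift_sequence_np(seq, shift, pad_value=0):
    """Shift a [batch, length, depth] array along the length axis.

    Positive shift moves content right (padding on the left); negative
    shift moves it left. The sign convention is an assumption.
    """
    if shift == 0:
        return seq
    pad = np.full_like(seq[:, : abs(shift), :], pad_value)
    if shift > 0:
        return np.concatenate([pad, seq[:, :-shift, :]], axis=1)
    return np.concatenate([seq[:, -shift:, :], pad], axis=1)


seq = np.arange(1, 5, dtype=float).reshape(1, 4, 1)  # positions 1, 2, 3, 4
right = shift_sequence_np(seq, 1)[0, :, 0]
left = shift_sequence_np(seq, -2)[0, :, 0]
```

The output keeps the input length: values shifted past either end are dropped and the vacated positions are filled with pad_value.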

baskerville.metrics module

class baskerville.metrics.MeanSquaredErrorUDot(udot_weight: float = 1, reduction='auto', name: str = 'mse_udot')[source]

Bases: LossFunctionWrapper

Mean squared error with mean-normalized specificity term.

Parameters:

udot_weight – Weight of the mean-normalized specificity term.

class baskerville.metrics.PearsonR(*args, **kwargs)[source]

Bases: Metric

PearsonR metric for multi-task data.

Parameters:
  • num_targets (int) – Number of tasks.

  • summarize (bool) – Whether to summarize over all tasks.

reset_state()[source]

Reset metric state.

result()[source]

Compute PearsonR result from state.

update_state(y_true, y_pred, sample_weight=None)[source]

Update metric state for a batch.
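
The reset_state, update_state, and result pattern lends itself to a streaming computation from running sums. The NumPy class below is a sketch of that idea; its name and attributes are illustrative and are not claimed to mirror baskerville's internals.

```python
import numpy as np


class StreamingPearsonR:
    """Per-target Pearson correlation accumulated over batches (sketch)."""

    def __init__(self, num_targets, summarize=True):
        self.num_targets = num_targets
        self.summarize = summarize
        self.reset_state()

    def reset_state(self):
        n = self.num_targets
        self.count = np.zeros(n)
        self.sum_true, self.sum_pred = np.zeros(n), np.zeros(n)
        self.sumsq_true, self.sumsq_pred = np.zeros(n), np.zeros(n)
        self.product = np.zeros(n)

    def update_state(self, y_true, y_pred):
        # Collapse every axis except the last (targets) axis.
        y_true = np.reshape(y_true, (-1, self.num_targets))
        y_pred = np.reshape(y_pred, (-1, self.num_targets))
        self.count += y_true.shape[0]
        self.sum_true += y_true.sum(axis=0)
        self.sum_pred += y_pred.sum(axis=0)
        self.sumsq_true += (y_true**2).sum(axis=0)
        self.sumsq_pred += (y_pred**2).sum(axis=0)
        self.product += (y_true * y_pred).sum(axis=0)

    def result(self):
        mean_true = self.sum_true / self.count
        mean_pred = self.sum_pred / self.count
        covariance = self.product / self.count - mean_true * mean_pred
        var_true = self.sumsq_true / self.count - mean_true**2
        var_pred = self.sumsq_pred / self.count - mean_pred**2
        r = covariance / np.sqrt(var_true * var_pred)
        return r.mean() if self.summarize else r


rng = np.random.default_rng(0)
y_true = rng.normal(size=(4, 5, 2))
y_pred = y_true + rng.normal(size=(4, 5, 2))

metric = StreamingPearsonR(num_targets=2, summarize=False)
metric.update_state(y_true[:2], y_pred[:2])  # first batch
metric.update_state(y_true[2:], y_pred[2:])  # second batch
per_target = metric.result()
```

Accumulating sums rather than storing predictions keeps memory constant in the number of batches; with summarize=True the per-target values are averaged into one scalar.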

class baskerville.metrics.PoissonKL(kl_weight: int = 1, reduction='auto', name='poisson_kl')[source]

Bases: LossFunctionWrapper

Poisson decomposition with KL specificity term.

Parameters:

kl_weight (float) – Weight of the KL specificity term.

class baskerville.metrics.PoissonMultinomial(total_weight: float = 1, weight_range: float = 1, weight_exp: int = 4, reduction='auto', name: str = 'poisson_multinomial')[source]

Bases: LossFunctionWrapper

Poisson decomposition with multinomial specificity term.

Parameters:

total_weight (float) – Weight of the Poisson total term.

class baskerville.metrics.R2(*args, **kwargs)[source]

Bases: Metric

R2 metric for multi-task data.

Parameters:
  • num_targets (int) – Number of tasks.

  • summarize (bool) – Whether to summarize over all tasks.

reset_state()[source]

Reset metric state.

result()[source]

Compute R2 result from state.

update_state(y_true, y_pred, sample_weight=None)[source]

Update metric state for a batch.
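
A streaming R2 can be sketched from running sums in the same way. The class below uses the usual definition, 1 minus the residual sum of squares divided by the total sum of squares, computed per target; the names are illustrative and the accumulation strategy is an assumption.

```python
import numpy as np


class StreamingR2:
    """Per-target R2 accumulated over batches (illustrative sketch)."""

    def __init__(self, num_targets, summarize=True):
        self.num_targets = num_targets
        self.summarize = summarize
        self.reset_state()

    def reset_state(self):
        self.count = np.zeros(self.num_targets)
        self.sum_true = np.zeros(self.num_targets)
        self.sumsq_true = np.zeros(self.num_targets)
        self.sumsq_resid = np.zeros(self.num_targets)

    def update_state(self, y_true, y_pred):
        # Collapse every axis except the last (targets) axis.
        y_true = np.reshape(y_true, (-1, self.num_targets))
        y_pred = np.reshape(y_pred, (-1, self.num_targets))
        self.count += y_true.shape[0]
        self.sum_true += y_true.sum(axis=0)
        self.sumsq_true += (y_true**2).sum(axis=0)
        self.sumsq_resid += ((y_true - y_pred) ** 2).sum(axis=0)

    def result(self):
        mean_true = self.sum_true / self.count
        # Total sum of squares around the mean of y_true.
        total = self.sumsq_true - self.count * mean_true**2
        r2 = 1.0 - self.sumsq_resid / total
        return r2.mean() if self.summarize else r2


y_true = np.array([[1.0], [2.0], [3.0], [4.0]])
metric = StreamingR2(num_targets=1, summarize=False)
metric.update_state(y_true[:2], y_true[:2])        # perfect predictions
metric.update_state(y_true[2:], y_true[2:] + 1.0)  # each off by one
r2 = metric.result()
```

In this example the residual sum of squares is 2 and the total sum of squares is 5, so the result is 1 - 2/5 = 0.6.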

class baskerville.metrics.SeqAUC(*args, **kwargs)[source]

Bases: AUC

AUC metric for multi-task sequence data.

Parameters:
  • curve (str) – Metric type: ‘ROC’ or ‘PR’.

  • summarize (bool) – Whether to summarize over all tasks.

interpolate_pr_auc()[source]

Add option to remove summary.

result()[source]

Add option to remove summary. For reasons that are not clear, the metrics_utils == comparisons do not work from TF 2.6 onward, so this method compares the values instead as a workaround.

update_state(y_true, y_pred, **kwargs)[source]

Flatten sequence length before update.

baskerville.metrics.mean_squared_error_udot(y_true, y_pred, udot_weight: float = 1)[source]

Mean squared error with mean-normalized specificity term.

Parameters:

udot_weight – Weight of the mean-normalized specificity term.

baskerville.metrics.poisson(yt, yp, epsilon: float = 1e-07)[source]

Poisson loss, without mean reduction.
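
For reference, the elementwise Poisson loss is commonly written as y_pred - y_true * log(y_pred + epsilon), which is the Poisson negative log-likelihood up to a term that depends only on y_true. The NumPy sketch below uses that form; that baskerville's function matches it term for term is an assumption.

```python
import numpy as np


def poisson_loss(y_true, y_pred, epsilon=1e-7):
    """Elementwise Poisson loss y_pred - y_true * log(y_pred + epsilon)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return y_pred - y_true * np.log(y_pred + epsilon)


# No mean reduction: the output keeps the input shape.
loss = poisson_loss([[1.0, 0.0]], [[1.0, 2.0]])
```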

baskerville.metrics.poisson_kl(y_true, y_pred, kl_weight=1, epsilon=1e-07)[source]

Poisson decomposition with KL specificity term.

Parameters:
  • kl_weight (float) – Weight of the KL specificity term.

  • epsilon (float) – Added small value to avoid log(0).

baskerville.metrics.poisson_multinomial(y_true, y_pred, total_weight: float = 1, weight_range: float = 1, weight_exp: int = 4, epsilon: float = 1e-07, rescale: bool = False)[source]

Poisson decomposition with multinomial specificity term.

Parameters:
  • total_weight (float) – Weight of the Poisson total term.

  • epsilon (float) – Added small value to avoid log(0).

  • rescale (bool) – Rescale loss after re-weighting.
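
The decomposition can be sketched as a Poisson term on the per-target totals plus a multinomial term on how those totals are distributed along the sequence. The NumPy version below is a simplified illustration: the [batch, length, targets] layout and the normalization by sequence length are assumptions, and the positional re-weighting (weight_range, weight_exp) and rescale options are deliberately left out.

```python
import numpy as np


def poisson_multinomial_sketch(y_true, y_pred, total_weight=1.0, epsilon=1e-7):
    """Poisson term on per-target totals plus multinomial term on positions.

    Inputs have shape [batch, length, targets]. Positional re-weighting
    (weight_range, weight_exp) and rescale are deliberately omitted.
    """
    seq_len = y_true.shape[1]
    y_true = y_true + epsilon
    y_pred = y_pred + epsilon

    # Totals over the length axis.
    s_true = y_true.sum(axis=1, keepdims=True)
    s_pred = y_pred.sum(axis=1, keepdims=True)

    # Poisson loss on the totals.
    poisson_term = (s_pred - s_true * np.log(s_pred)) / seq_len

    # Multinomial loss on the predicted distribution across positions.
    p_pred = y_pred / s_pred
    multinomial_term = -(y_true * np.log(p_pred)).sum(axis=1, keepdims=True) / seq_len

    return multinomial_term + total_weight * poisson_term


y_true = np.array([[[3.0], [1.0]]])
loss_match = poisson_multinomial_sketch(y_true, np.array([[[3.0], [1.0]]]))
loss_flipped = poisson_multinomial_sketch(y_true, np.array([[[1.0], [3.0]]]))
```

Both predictions have the same total, so the Poisson term is identical; only the multinomial term penalizes the flipped profile, which is what lets total_weight trade off magnitude against shape.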

baskerville.seqnn module

class baskerville.seqnn.SeqNN(params: dict)[source]

Bases: object

Sequence neural network model.

Parameters:

params (dict) – Model specification and parameters.

build_block(current, block_params)[source]

Construct a SeqNN block.

Parameters:
  • current – Current Tensor.

  • block_params (dict) – Block parameters.

Returns:

current – New current Tensor.

build_embed(conv_layer_i: int, batch_norm: bool = True)[source]

Build model to embed sequences into specific layer.

build_ensemble(ensemble_rc: bool = False, ensemble_shifts=[0])[source]

Build ensemble of models computing on augmented input sequences.

build_model(save_reprs: bool = True)[source]

Build the model.

build_sad()[source]

Sum across length axis, in graph.

build_slice(target_slice=None, target_sum: bool = False)[source]

Slice and/or sum across tasks, in graph.

downcast(dtype=tf.float16, head_i=None)[source]

Downcast model output type.

evaluate(seq_data, head_i=None, loss_label: str = 'poisson', loss_fn=None)[source]

Evaluate model on SeqDataset.

get_bn_layer(bn_layer_i=0)[source]

Return specified batch normalization layer.

get_conv_layer(conv_layer_i=0)[source]

Return specified convolution layer.

get_conv_weights(conv_layer_i=0)[source]

Return kernel weights for specified convolution layer.

get_dense_layer(layer_i=0)[source]

Return specified dense layer.

gradients(seq_1hot, head_i=None, target_slice=None, pos_slice=None, pos_mask=None, pos_slice_denom=None, pos_mask_denom=None, chunk_size=None, batch_size=1, track_scale=1.0, track_transform=1.0, clip_soft=None, pseudo_count=0.0, no_transform=False, use_mean=False, use_ratio=False, use_logodds=False, subtract_avg=True, input_gate=True, smooth_grad=False, n_samples=5, sample_prob=0.875, dtype='float16')[source]

Compute input gradients for sequences (GPU-friendly).

gradients_func(model, seq_1hot, target_slice, pos_slice, pos_mask=None, pos_slice_denom=None, pos_mask_denom=True, track_scale=1.0, track_transform=1.0, clip_soft=None, pseudo_count=0.0, no_transform=False, use_mean=False, use_ratio=False, use_logodds=False, subtract_avg=True, input_gate=True)[source]
gradients_func_orig(model, seq_1hot, pos_slice)[source]

Compute input gradients for each task.

Parameters:
  • model (tf.keras.Model) – Model to compute gradients for.

  • seq_1hot (tf.Tensor) – 1-hot encoded sequence.

  • pos_slice ([int]) – Sequence positions to consider.

Returns:

Gradients for each task.

Return type:

tf.Tensor

gradients_orig(seq_1hot, head_i=None, pos_slice=None, batch_size=8, dtype='float16')[source]

Compute input gradients for each task.

Parameters:
  • seq_1hot (np.array) – 1-hot encoded sequence.

  • head_i (int) – Model head index.

  • pos_slice ([int]) – Sequence positions to consider.

  • batch_size (int) – Number of tasks to compute gradients for at once.

  • dtype – Returned data type.

Returns:

Gradients for each task.

num_targets(head_i=None)[source]

Return number of targets.

predict(seq_data, head_i: int | None = None, generator: bool = False, stream: bool = False, step: int = 1, dtype: str = 'float32', **kwargs)[source]

Predict targets for SeqDataset, with more options.

Parameters:
  • seq_data (SeqDataset) – Dataset to predict on.

  • head_i (int) – Model head index.

  • generator (bool) – Use generator to predict on dataset.

  • stream (bool) – Stream predictions from dataset.

  • step (int) – Step size.

  • dtype (str) – Data type to return.

predict_transform(seq_1hot: array, targets_df, strand_transform: array | None = None, untransform_old: bool = False)[source]

Predict a single sequence and transform.

Parameters:
  • seq_1hot (np.array) – 1-hot encoded sequence.

  • targets_df (pd.DataFrame) – Targets dataframe.

  • strand_transform (np.array) – Strand merging transform.

  • untransform_old (bool) – Apply old untransform.

restore(model_file, head_i=0, trunk=False)[source]

Restore weights from saved model.

save(model_file, trunk=False)[source]

Save model weights to file.

Parameters:
  • model_file (str) – Path to save model weights.

  • trunk (bool) – Save trunk weights only.

set_defaults()[source]

Set default parameters.

Only necessary for parameters specific to this class. Others are best defaulted closer to the source.

step(step=2, head_i=None)[source]

Create new model to step positions across sequence.

Parameters:
  • step (int) – Step size.

  • head_i (int) – Model head index.

track_sequence(sequence)[source]

Track pooling, striding, and cropping of sequence.

Parameters:

sequence (tf.Tensor) – Sequence input.

baskerville.snps module

class baskerville.snps.GeneSNPCluster[source]

Bases: SNPCluster

add_gene(gene)[source]

Add gene to cluster.

delimit(seq_len, crop=0)[source]

Delimit sequence boundaries.

class baskerville.snps.SNPCluster[source]

Bases: object

add_snp(snp)[source]

Add SNP to cluster.

delimit(seq_len)[source]

Delimit sequence boundaries.

get_1hots(genome_open)[source]

Get list of one hot coded sequences.

baskerville.snps.cluster_genes(transcriptome, seq_length: int, center_pct: float)[source]

Cluster genes into regions that will satisfy the required center_pct.

Parameters:
  • transcriptome (Transcriptome) – Transcriptome object.

  • seq_length (int) – Sequence length.

  • center_pct (float) – Percent of sequence length to cluster genes.

baskerville.snps.cluster_snps(snps, seq_len: int, center_pct: float)[source]

Cluster a sorted list of SNPs into regions that will satisfy the required center_pct.

Parameters:
  • snps ([SNP]) – List of SNPs.

  • seq_len (int) – Sequence length.

  • center_pct (float) – Percent of sequence length to cluster SNPs.
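
The clustering idea can be illustrated with a small plain-Python sketch. The function name `cluster_positions` and the rule used here (a cluster is closed once the next position lies more than `center_pct * seq_len` from the cluster's first position) are assumptions made for illustration; the actual baskerville logic may differ.

```python
def cluster_positions(positions, seq_len, center_pct):
    """Group sorted positions so each cluster spans at most center_pct * seq_len.

    Hypothetical helper for illustration; not part of baskerville.
    """
    max_span = center_pct * seq_len
    clusters = []
    for pos in positions:
        # Open a new cluster if none exists yet, or if this position is too
        # far from the first position of the current cluster.
        if not clusters or pos - clusters[-1][0] > max_span:
            clusters.append([pos])
        else:
            clusters[-1].append(pos)
    return clusters


# With seq_len=1000 and center_pct=0.5, a cluster may span at most 500 bp.
clusters = cluster_positions([100, 150, 400, 1200, 1250], seq_len=1000, center_pct=0.5)
print(clusters)
```

Each resulting cluster can then be covered by a single model input sequence, so that all of its SNPs fall within the central portion of that sequence.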

baskerville.snps.compute_scores(ref_preds, alt_preds, snp_stats, strand_transform=None)[source]

Compute SNP scores from reference and alternative predictions.

Parameters:
  • ref_preds (np.array) – Reference allele predictions.

  • alt_preds (np.array) – Alternative allele predictions.

  • snp_stats ([str]) – List of SAD stats to compute.

  • strand_transform (scipy.sparse) – Strand transform matrix.
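
As a rough sketch of what one such statistic might look like, the example below computes a summed difference between alternative and reference predictions for each target. The function name `sad_score`, the nested-list layout (positions by targets), and the exact definition of the statistic are assumptions for illustration, not the library's implementation.

```python
def sad_score(ref_preds, alt_preds):
    """Sum predictions across the length axis, then take alt minus ref per target.

    ref_preds, alt_preds: lists of positions, each a list of per-target values.
    Hypothetical helper for illustration; not part of baskerville.
    """
    num_targets = len(ref_preds[0])
    ref_sum = [sum(row[t] for row in ref_preds) for t in range(num_targets)]
    alt_sum = [sum(row[t] for row in alt_preds) for t in range(num_targets)]
    return [a - r for a, r in zip(alt_sum, ref_sum)]


ref_preds = [[1.0, 2.0], [3.0, 4.0]]  # 2 positions x 2 targets
alt_preds = [[1.5, 2.0], [3.0, 3.0]]
scores = sad_score(ref_preds, alt_preds)
print(scores)
```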

baskerville.snps.initialize_output_h5(out_dir, snp_stats, snps, targets_length, targets_df, num_shifts, geneseq_clusters=None)[source]

Initialize an output HDF5 file for SAD stats.

Parameters:
  • out_dir (str) – Output directory.

  • snp_stats ([str]) – List of SAD stats to compute.

  • snps ([SNP]) – List of SNPs.

  • targets_length (int) – Targets’ sequence length.

  • targets_df (pd.DataFrame) – Targets DataFrame.

  • num_shifts (int) – Number of shifts.

  • geneseq_clusters ([GeneSNPCluster]) – Gene sequence clusters.

baskerville.snps.make_alt_1hot(ref_1hot, snp_seq_pos, ref_allele, alt_allele)[source]

Return alternative allele one hot coding.

Parameters:
  • ref_1hot (np.array) – Reference allele one hot coding.

  • snp_seq_pos (int) – SNP position in sequence.

  • ref_allele (str) – Reference allele.

  • alt_allele (str) – Alternative allele.

Returns:

Alternative allele one hot coding.

Return type:

np.array
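
A minimal sketch of the substitution case, using nested lists instead of NumPy arrays so that it runs without dependencies. The helper names `one_hot` and `substitute_allele`, the A/C/G/T column order, and the restriction to same-length alleles are assumptions for illustration; indels are not handled here.

```python
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """One hot code a DNA string; unknown characters such as N become all zeros."""
    return [[1 if nt == base else 0 for base in NUCLEOTIDES] for nt in seq]


def substitute_allele(ref_1hot, snp_seq_pos, ref_allele, alt_allele):
    """Return a copy of ref_1hot with a same-length alt allele written in.

    Hypothetical helper for illustration; not part of baskerville.
    """
    if len(ref_allele) != len(alt_allele):
        raise ValueError("this sketch handles substitutions only, not indels")
    alt_1hot = [row[:] for row in ref_1hot]  # copy so the reference is untouched
    alt_1hot[snp_seq_pos:snp_seq_pos + len(alt_allele)] = one_hot(alt_allele)
    return alt_1hot


ref_1hot = one_hot("ACGTAC")
alt_1hot = substitute_allele(ref_1hot, snp_seq_pos=2, ref_allele="G", alt_allele="T")
print(ref_1hot[2], alt_1hot[2])
```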

baskerville.snps.make_gene_bedt(genesnp_clusters)[source]

Make a BedTool object for all gene sequences.

baskerville.snps.make_snp_bedt(snps)[source]

Make a BedTool object for all SNPs.

baskerville.snps.map_snps_genes(snps, genesnp_clusters)[source]

Map SNPs to gene sequences.

baskerville.snps.score_gene_snps(params_file, model_file, vcf_file, worker_index, options)[source]

Score SNPs in a VCF file with a SeqNN model.

Parameters:
  • params_file – Model parameters.

  • model_file – Saved model weights.

  • vcf_file – VCF file.

  • worker_index – Worker index.

  • options – Options from command-line arguments.

baskerville.snps.score_snps(params_file, model_file, vcf_file, worker_index, options)[source]

Score SNPs in a VCF file with a SeqNN model.

Parameters:
  • params_file – Model parameters.

  • model_file – Saved model weights.

  • vcf_file – VCF file.

  • worker_index – Worker index.

  • options – Options from command-line arguments.

baskerville.snps.stitch_preds(preds, shifts, pos=None)[source]

Stitch indel left and right compensation shifts.

Parameters:
  • preds ([np.array]) – List of predictions.

  • shifts ([int]) – List of shifts.

  • pos (int) – SNP position to stitch at.

baskerville.snps.write_pct(scores_out, snp_stats)[source]

Compute percentile values for each target and write to HDF5.

Parameters:
  • scores_out (h5py.File) – Output HDF5 file.

  • snp_stats ([str]) – List of SAD stats to compute.

baskerville.snps.write_snp(ref_preds_sum, alt_preds_sum, scores_out, si, snp_stats)[source]

Write SNP predictions to HDF5, assuming the length dimension has been collapsed.

Parameters:
  • ref_preds_sum (np.array) – Reference allele predictions.

  • alt_preds_sum (np.array) – Alternative allele predictions.

  • scores_out (h5py.File) – Output HDF5 file.

  • si (int) – SNP index.

  • snp_stats ([str]) – List of SAD stats to compute.

baskerville.trainer module

class baskerville.trainer.Cyclical1LearningRate(initial_learning_rate: float, maximal_learning_rate: float, final_learning_rate: float, step_size, name: str = 'Cyclical1LearningRate')[source]

Bases: LearningRateSchedule

A LearningRateSchedule that uses a cyclical schedule. See https://yashuseth.blog/2018/11/26/hyper-parameter-tuning-best-practices-learning-rate-batch-size-momentum-weight-decay/

Parameters:
  • initial_learning_rate (float) – The initial learning rate.

  • maximal_learning_rate (float) – The maximal learning rate after warm up.

  • final_learning_rate (float) – The final learning rate after cycle.

  • step_size (int) – Cycle step size.

  • name (str, optional) – The name of the schedule. Defaults to “Cyclical1LearningRate”.
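
To make the three learning rates concrete, here is a plain-Python sketch of one plausible single-cycle shape: a linear ramp from the initial to the maximal rate over `step_size` steps, a linear descent to the final rate over the next `step_size` steps, then a constant final rate. The function name `one_cycle_lr` and this exact shape are assumptions for illustration; the class itself may interpolate differently.

```python
def one_cycle_lr(step, initial_lr, maximal_lr, final_lr, step_size):
    """Learning rate at a given step of an assumed single linear cycle.

    Hypothetical helper for illustration; not part of baskerville.
    """
    if step <= step_size:
        # Warm up: initial -> maximal.
        frac = step / step_size
        return initial_lr + frac * (maximal_lr - initial_lr)
    if step <= 2 * step_size:
        # Cool down: maximal -> final.
        frac = (step - step_size) / step_size
        return maximal_lr + frac * (final_lr - maximal_lr)
    # After the cycle, hold the final rate.
    return final_lr


schedule = [one_cycle_lr(s, 0.0, 1.0, 0.5, step_size=10) for s in (0, 5, 10, 15, 20, 100)]
print(schedule)
```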

get_config()[source]
class baskerville.trainer.EarlyStoppingMin(min_epoch: int = 0, **kwargs)[source]

Bases: EarlyStopping

Stop training when a monitored quantity has stopped improving.

Parameters:

min_epoch – Minimum number of epochs before considering stopping.
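
The effect of `min_epoch` can be sketched without Keras by replaying a list of validation losses. The function name `stopping_epoch` and the patience bookkeeping shown here are assumptions for illustration; they mimic the usual early-stopping logic with one extra condition on the epoch index.

```python
def stopping_epoch(val_losses, patience, min_epoch=0):
    """Return the epoch index at which training would stop, or None.

    Hypothetical helper for illustration; not part of baskerville.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
        # Stopping is only considered once min_epoch has been reached.
        if epoch >= min_epoch and wait >= patience:
            return epoch
    return None


losses = [1.0, 0.9, 0.95, 0.96, 0.97, 0.98]
early = stopping_epoch(losses, patience=2)
delayed = stopping_epoch(losses, patience=2, min_epoch=5)
print(early, delayed)
```

In this sketch, epochs without improvement still accumulate before `min_epoch`, so training can stop as soon as the minimum epoch is reached.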

on_epoch_end(epoch, logs=None)[source]

Called at the end of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Parameters:
  • epoch – Integer, index of epoch.

  • logs – Dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_. For training epoch, the values of the Model’s metrics are returned. Example: {‘loss’: 0.2, ‘accuracy’: 0.7}.

class baskerville.trainer.Trainer(params: dict, train_data, eval_data, out_dir: str, log_dir: str, strategy=None, num_gpu: int = 1, keras_fit: bool = False)[source]

Bases: object

Model training class.

Parameters:
  • params (dict) – Training parameters dictionary.

  • train_data – Dataset object or list of Dataset objects.

  • eval_data – Dataset object or list of Dataset objects.

  • out_dir (str) – Output directory name.

  • log_dir (str) – Log directory name.

  • strategy – tf.distribute.Strategy object.

  • num_gpu (int) – Number of GPUs to use. Default: 1.

  • keras_fit (bool) – Use Keras fit method instead of custom loop.

compile(seqnn_model)[source]
fit2(seqnn_model)[source]

Train the model using a custom loop for two separate datasets.

fit_keras(seqnn_model)[source]
fit_tape(seqnn_model)[source]

Train the model using a custom tf.GradientTape loop.

make_optimizer()[source]

Make optimizer object from given parameters.

class baskerville.trainer.WarmUp(initial_learning_rate: float, warmup_steps: int, decay_schedule: None, power: float = 1.0, name: str | None = None)[source]

Bases: LearningRateSchedule

Applies a warmup schedule on a given learning rate decay schedule. (h/t HuggingFace.)

Parameters:
  • initial_learning_rate (float) – Initial learning rate after the warmup (so this will be the learning rate at the end of the warmup).

  • decay_schedule (Callable) – The learning rate or schedule function to apply after the warmup for the rest of training.

  • warmup_steps (int) – The number of steps for the warmup part of training.

  • power (float, optional) – Power to use for the polynomial warmup (default is a linear warmup).

  • name (str, optional) – Optional name prefix for the returned tensors during the schedule.
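
The parameters above suggest a polynomial warmup followed by the wrapped decay schedule. A plain-Python sketch of that behavior is given below; the function name `warmup_lr` is hypothetical, and passing `step - warmup_steps` to the decay schedule after the warmup is an assumption modeled on the HuggingFace formulation the docstring credits.

```python
import math


def warmup_lr(step, initial_learning_rate, warmup_steps, decay_schedule, power=1.0):
    """Learning rate at a given step for an assumed polynomial warmup.

    Hypothetical helper for illustration; not part of baskerville.
    """
    if step < warmup_steps:
        # Ramp from 0 up to initial_learning_rate; power=1.0 is linear.
        return initial_learning_rate * (step / warmup_steps) ** power
    # After the warmup, defer to the decay schedule (assumed step offset).
    return decay_schedule(step - warmup_steps)


def constant_schedule(step):
    """Toy decay schedule that always returns the same rate."""
    return 0.01


rates = [warmup_lr(s, 0.01, 100, constant_schedule) for s in (0, 50, 100, 500)]
print(rates)
```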

get_config()[source]
baskerville.trainer.adaptive_clip_grad(parameters, gradients, clip_factor: float = 0.1, eps: float = 0.001)[source]

Adaptive gradient clipping.
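
In its commonly described form, adaptive gradient clipping rescales a gradient whenever the ratio of its norm to the norm of the corresponding parameter exceeds `clip_factor`, with `eps` guarding against zero-valued parameters. The sketch below applies that rule to a single flat vector; the helper names `l2_norm` and `adaptive_clip` are hypothetical, and the library function may compute norms per unit rather than per vector.

```python
import math


def l2_norm(values):
    """L2 norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def adaptive_clip(parameter, gradient, clip_factor=0.1, eps=0.001):
    """Rescale gradient so its norm is at most clip_factor * max(|parameter|, eps).

    Hypothetical helper for illustration; not part of baskerville.
    """
    max_norm = max(l2_norm(parameter), eps) * clip_factor
    grad_norm = l2_norm(gradient)
    if grad_norm <= max_norm:
        return list(gradient)  # small enough, leave unchanged
    scale = max_norm / grad_norm
    return [g * scale for g in gradient]


parameter = [3.0, 4.0]                              # norm 5, so max gradient norm is 0.5
clipped = adaptive_clip(parameter, [6.0, 8.0])      # norm 10, gets rescaled
unchanged = adaptive_clip(parameter, [0.03, 0.04])  # norm 0.05, left alone
print(clipped, unchanged)
```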

baskerville.trainer.compute_norm(x, axis, keepdims)[source]

Compute L2 norm of a tensor across an axis.

baskerville.trainer.parse_loss(loss_label, strategy=None, keras_fit: bool = True, spec_weight: float = 1, total_weight: float = 1, weight_range: float = 1, weight_exp: int = 1)[source]

Parse loss function from label, strategy, and fitting method.

Parameters:
  • loss_label (str) – Loss function label.

  • strategy – tf.distribute.Strategy object.

  • keras_fit (bool) – Use Keras fit method instead of custom loop.

  • spec_weight (float) – Specificity weight for PoissonKL.

  • total_weight (float) – Total weight for PoissonMultinomial.

Returns:

Loss function object.

Return type:

tf.keras.losses.Loss

baskerville.trainer.safe_next(data_iter, retry=5, sleep=10)[source]
baskerville.trainer.unitwise_norm(x)[source]

Compute L2 norm of a tensor across its last dimension.

baskerville.vcf module

class baskerville.vcf.SNP(vcf_line, pos2=False)[source]

Bases: object

Represent SNPs read in from a VCF file.

vcf_line
Type:

str

flip_alleles()[source]

Flip reference and first alt allele.

get_alleles()[source]

Return a list of all alleles.

indel_size()[source]

Return the size of the indel.

longest_alt()[source]

Return the longest alt allele.

baskerville.vcf.cap_allele(allele, cap=5)[source]

Cap the length of an allele in the figures.

baskerville.vcf.dna_length_1hot(seq, length)[source]

Adjust the sequence length and compute a 1hot coding.
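
A plain-Python sketch of the two steps, length adjustment followed by one hot coding. The helper names `adjust_length` and `one_hot`, the choice to trim or pad symmetrically around the center, the use of N for padding, and the A/C/G/T column order are all assumptions for illustration.

```python
def adjust_length(seq, length):
    """Trim or N-pad a sequence around its center to the requested length.

    Hypothetical helper for illustration; not part of baskerville.
    """
    if len(seq) > length:
        trim_left = (len(seq) - length) // 2
        return seq[trim_left:trim_left + length]
    pad_left = (length - len(seq)) // 2
    pad_right = length - len(seq) - pad_left
    return "N" * pad_left + seq + "N" * pad_right


def one_hot(seq):
    """One hot code a DNA string; N (or any unknown character) becomes all zeros."""
    return [[1 if nt == base else 0 for base in "ACGT"] for nt in seq.upper()]


trimmed = adjust_length("ACGTAC", 4)
padded = adjust_length("ACG", 6)
coded = one_hot(padded)
print(trimmed, padded, coded[:2])
```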

baskerville.vcf.intersect_seqs_snps(vcf_file, seqs, vision_p=1)[source]

Intersect a VCF file with a list of sequence coordinates.

Parameters:
  • vcf_file – VCF file.

  • seqs – List of objects with chrom, start, end.

  • vision_p – Proportion of sequences visible to center genes.

Returns:

seqs_snps – List of lists mapping sequence indexes to overlapping SNP indexes.
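
The shape of the returned mapping can be illustrated with a small overlap sketch. It uses plain tuples in place of a VCF file and sequence objects, treats coordinates as 0-based half-open intervals, and ignores `vision_p`; the function name `intersect_seqs_positions` and all of these conventions are assumptions for illustration.

```python
def intersect_seqs_positions(seqs, snps):
    """Map each sequence index to the indexes of SNPs that fall inside it.

    seqs: list of (chrom, start, end) tuples, 0-based half-open.
    snps: list of (chrom, pos) tuples, 0-based.
    Hypothetical helper for illustration; not part of baskerville.
    """
    seqs_snps = []
    for chrom, start, end in seqs:
        overlapping = [
            snp_i
            for snp_i, (snp_chrom, pos) in enumerate(snps)
            if snp_chrom == chrom and start <= pos < end
        ]
        seqs_snps.append(overlapping)
    return seqs_snps


seqs = [("chr1", 0, 1000), ("chr1", 500, 1500), ("chr2", 0, 1000)]
snps = [("chr1", 250), ("chr1", 750), ("chr2", 999), ("chr2", 1000)]
seqs_snps = intersect_seqs_positions(seqs, snps)
print(seqs_snps)
```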

baskerville.vcf.intersect_snps_seqs(vcf_file, seq_coords, vision_p=1)[source]

Intersect a VCF file with a list of sequence coordinates.

Parameters:
  • vcf_file – VCF file.

  • seq_coords – List of sequence coordinates.

  • vision_p – Proportion of sequences visible to center genes.

Returns:

snp_segs – List of lists mapping SNP indexes to overlapping sequence indexes.

baskerville.vcf.snp_seq1(snp, seq_len, genome_open)[source]

Produce one hot coded sequences for a SNP.

Parameters:
  • snp (SNP) – SNP to code.

  • seq_len (int) – Sequence length to code.

  • genome_open (File) – Open genome FASTA file.

Returns:

List of one hot coded sequences surrounding the SNP.

Return type:

[array]

baskerville.vcf.snps2_seq1(snps, seq_len, genome1_fasta, genome2_fasta, return_seqs=False)[source]

Produce an array of one hot coded sequences for a list of SNPs.

Parameters:
  • snps ([SNP]) – List of SNPs.

  • seq_len (int) – Sequence length to code.

  • genome1_fasta (str) – Major allele genome FASTA file.

  • genome2_fasta (str) – Minor allele genome FASTA file.

Returns:
  • seq_vecs (array) – One hot coded sequences surrounding the SNPs.

  • seq_headers ([str]) – Headers for sequences.

  • seq_snps ([SNP]) – List of used SNPs.

baskerville.vcf.snps_seq1(snps, seq_len, genome_fasta, return_seqs=False)[source]

Produce an array of one hot coded sequences for a list of SNPs.

Parameters:
  • snps ([SNP]) – List of SNPs.

  • seq_len (int) – Sequence length to code.

  • genome_fasta (str) – Genome FASTA file.

Returns:
  • seq_vecs (array) – One hot coded sequences surrounding the SNPs.

  • seq_headers ([str]) – Headers for sequences.

  • seq_snps ([SNP]) – List of used SNPs.

baskerville.vcf.vcf_count(vcf_file)[source]

Count SNPs in a VCF file.

baskerville.vcf.vcf_snps(vcf_file, require_sorted=False, validate_ref_fasta=None, flip_ref=False, pos2=False, start_i=None, end_i=None)[source]

Load SNPs from a VCF file.

baskerville.vcf.vcf_sort(vcf_file)[source]

Module contents