baskerville.scripts package
Submodules
baskerville.scripts.hound_data module
- baskerville.scripts.hound_data.curate_peaks(targets_df, out_dir, pool_width, crop_bp)[source]
Merge all peaks, round to nearest pool_width, and add cropped bp.
- baskerville.scripts.hound_data.divide_contigs_chr(contigs, test_chrs, valid_chrs)[source]
Divide list of contigs into train/valid/test lists by chromosome.
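As a rough illustration of a chromosome-based split, the sketch below assigns each contig to a set by looking up its chromosome name. The `Contig` record and the `split_by_chromosome` helper are hypothetical names used only for this example; the contig type and internals of `divide_contigs_chr` may differ.

```python
from collections import namedtuple

# Hypothetical contig record; the real script's contig type may differ.
Contig = namedtuple("Contig", ["chr", "start", "end"])


def split_by_chromosome(contigs, test_chrs, valid_chrs):
    """Assign each contig to train, valid, or test by its chromosome name."""
    train, valid, test = [], [], []
    for ctg in contigs:
        if ctg.chr in test_chrs:
            test.append(ctg)
        elif ctg.chr in valid_chrs:
            valid.append(ctg)
        else:
            # Any chromosome not held out goes to training.
            train.append(ctg)
    return train, valid, test


contigs = [
    Contig("chr1", 0, 1000),
    Contig("chr8", 0, 500),
    Contig("chr9", 100, 900),
]
train, valid, test = split_by_chromosome(
    contigs, test_chrs={"chr8"}, valid_chrs={"chr9"}
)
```

Holding out whole chromosomes keeps every contig from a given chromosome in the same set.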
- baskerville.scripts.hound_data.divide_contigs_folds(contigs, folds)[source]
Divide list of contigs into cross fold lists.
- baskerville.scripts.hound_data.divide_contigs_pct(contigs, test_pct, valid_pct, pct_abstain=0.2)[source]
Divide list of contigs into train/valid/test lists, aiming for the specified nucleotide percentages.
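One way to aim for nucleotide percentages is a greedy assignment: visit contigs from longest to shortest and place each one in the set that is currently furthest below its target. The sketch below shows only that idea. The helper name `split_by_nucleotide_pct` is hypothetical, it works on bare contig lengths, and it does not model `pct_abstain`, so it should not be read as the behavior of `divide_contigs_pct`.

```python
def split_by_nucleotide_pct(contig_lengths, test_pct, valid_pct):
    """Greedy sketch: return contig indices per set, targeting nucleotide shares."""
    total = sum(contig_lengths)
    targets = {
        "train": total * (1 - test_pct - valid_pct),
        "valid": total * valid_pct,
        "test": total * test_pct,
    }
    filled = {name: 0 for name in targets}
    assignment = {name: [] for name in targets}

    # Visit contigs from longest to shortest.
    order = sorted(range(len(contig_lengths)), key=lambda i: -contig_lengths[i])
    for idx in order:
        # Choose the set with the largest remaining nucleotide deficit.
        name = max(targets, key=lambda n: targets[n] - filled[n])
        assignment[name].append(idx)
        filled[name] += contig_lengths[idx]
    return assignment


assignment = split_by_nucleotide_pct([80, 10, 10], test_pct=0.1, valid_pct=0.1)
```

Because contigs are assigned whole, the achieved percentages only approximate the requested ones.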
baskerville.scripts.hound_data_align module
- class baskerville.scripts.hound_data_align.GraphSeq(genome, net, chr, start, end)
Bases: tuple
- chr
Alias for field number 2
- end
Alias for field number 4
- genome
Alias for field number 0
- net
Alias for field number 1
- start
Alias for field number 3
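The field numbers listed above match a named tuple with fields in the order genome, net, chr, start, end. The sketch below rebuilds an equivalent type with `collections.namedtuple` to show that positional and named access agree. The field values are invented for illustration, and the value types the script actually stores are not stated here.

```python
from collections import namedtuple

# Equivalent stand-in for GraphSeq, with fields in the documented order.
GraphSeq = namedtuple("GraphSeq", ["genome", "net", "chr", "start", "end"])

# Illustrative values only; the real value types are an assumption.
node = GraphSeq(genome=0, net=False, chr="chr1", start=1000, end=2000)

# Named access and positional access refer to the same fields.
same_chr = node.chr == node[2]
length = node.end - node.start
```

Since instances are tuples, they are hashable and can serve as graph nodes or dictionary keys.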
- baskerville.scripts.hound_data_align.break_large_contigs(contigs, break_t, verbose=False)[source]
Break large contigs in half until all contigs are under the size threshold.
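The halving procedure can be sketched with a max-heap keyed on contig length: pop the longest contig, split it at its midpoint, and push both halves back until the longest remaining contig fits. The helper `break_in_half` and the `(chrom, start, end)` tuple layout are assumptions for this example, as is treating a contig exactly at the threshold as acceptable.

```python
import heapq


def break_in_half(contigs, max_len):
    """Split (chrom, start, end) contigs in half until none is longer than max_len."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    # heapq is a min-heap, so store negative lengths to pop the longest first.
    heap = [(-(end - start), chrom, start, end) for chrom, start, end in contigs]
    heapq.heapify(heap)

    while heap and -heap[0][0] > max_len:
        _, chrom, start, end = heapq.heappop(heap)
        mid = start + (end - start) // 2
        heapq.heappush(heap, (-(mid - start), chrom, start, mid))
        heapq.heappush(heap, (-(end - mid), chrom, mid, end))

    return sorted((chrom, start, end) for _, chrom, start, end in heap)


pieces = break_in_half([("chr1", 0, 1000), ("chr2", 0, 300)], max_len=300)
```

Here the 1000 bp contig is halved twice into four 250 bp pieces, while the 300 bp contig is left intact.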
- baskerville.scripts.hound_data_align.connect_contigs(contigs, align_net_file, net_fill_min, net_olap_min, out_dir, genome_out_dirs)[source]
Connect contigs across genomes by forming a graph that includes net format aligning regions and contigs. Compute contig components as connected components of that graph.
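To illustrate the connected-components step, the sketch below groups nodes with a small union-find structure. Nodes stand in for contigs and aligning net regions, and each edge stands in for an overlap between them. The node labels and the `connected_components` helper are hypothetical, and the script may use a graph library instead.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, halving the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


# Hypothetical labels: two contigs from different genomes share one net region.
nodes = ["hg38:ctg0", "hg38:ctg1", "mm10:ctg0", "net0"]
edges = [("hg38:ctg0", "net0"), ("mm10:ctg0", "net0")]
components = connected_components(nodes, edges)
```

Contigs that end up in the same component are linked by alignment, so keeping each component within one train/valid/test set avoids splitting aligned sequence across sets.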
- baskerville.scripts.hound_data_align.contig_stats_genome(contigs)[source]
Compute contig statistics within each genome.
- baskerville.scripts.hound_data_align.divide_components_folds(contig_components, folds)[source]
Divide contig connected components into cross fold lists.
- baskerville.scripts.hound_data_align.divide_components_pct(contig_components, test_pct, valid_pct, pct_abstain=0.5)[source]
Divide contig connected components into train/valid/test lists, aiming for the specified nucleotide percentages.
- baskerville.scripts.hound_data_align.intersect_contigs_nets(graph_contigs_nets, genome_i, out_dir, genome_out_dir, min_olap=128)[source]
Intersect the contigs and nets from genome_i, adding the overlaps as edges to graph_contigs_nets.
- baskerville.scripts.hound_data_align.make_net_graph(align_net_file, net_fill_min, out_dir)[source]
Construct a Graph with aligned net intervals connected by edges.
- baskerville.scripts.hound_data_align.quantify_leakage(align_net_file, train_contigs, valid_contigs, test_contigs, out_dir)[source]
Quantify the leakage across sequence sets.
baskerville.scripts.hound_data_read module
baskerville.scripts.hound_data_write module
- baskerville.scripts.hound_data_write.feature_bytes(values)[source]
Convert numpy arrays to bytes features.
- baskerville.scripts.hound_data_write.feature_floats(values)[source]
Convert numpy arrays to floats features. This requires more space than bytes features for float16 data.
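The space difference can be seen by packing the same values at two widths. The sketch below uses only the standard library `struct` module and assumes that float features are stored as 32-bit values (as in TensorFlow's `FloatList`), while a bytes feature can hold the raw 16-bit encoding. It does not call the functions documented here.

```python
import struct

# Values chosen to be exactly representable in half precision.
values = [0.5, 1.25, 3.0, 10.0]

# Raw float16 encoding: 2 bytes per value ("e" is the half-precision format).
as_float16 = struct.pack(f"<{len(values)}e", *values)

# 32-bit float encoding: 4 bytes per value.
as_float32 = struct.pack(f"<{len(values)}f", *values)

size_ratio = len(as_float32) / len(as_float16)

# The half-precision bytes decode back to the original values.
decoded = list(struct.unpack(f"<{len(values)}e", as_float16))
```

Under this assumption, the float representation takes twice the space of the float16 bytes.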
- baskerville.scripts.hound_data_write.fetch_dna(fasta_open, chrm, start, end)[source]
Fetch DNA when start/end may reach beyond chromosomes.
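A common way to handle coordinates that run past a chromosome is to pad the missing positions so the returned sequence always has length `end - start`. The sketch below does this on a plain Python string and pads with `N`. The padding character, the helper name `fetch_padded`, and the use of a string in place of an open FASTA handle are all assumptions for this example.

```python
def fetch_padded(chrom_seq, start, end, pad_char="N"):
    """Return chrom_seq[start:end], padding positions that fall off either end."""
    n = len(chrom_seq)

    # Count requested positions that fall before the chromosome start.
    left_pad = max(0, min(end, 0) - start)
    # Count requested positions that fall past the chromosome end.
    right_pad = max(0, end - max(start, n))

    # Clamp the request to the valid range before slicing.
    core_start = min(max(start, 0), n)
    core_end = min(max(end, 0), n)
    core = chrom_seq[core_start:core_end]

    return pad_char * left_pad + core + pad_char * right_pad


chrom = "ACGTAC"
left_overhang = fetch_padded(chrom, -2, 3)
right_overhang = fetch_padded(chrom, 4, 9)
fully_outside = fetch_padded(chrom, 8, 10)
```

Keeping the output length fixed means downstream code can assume every fetched sequence has the same length.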
baskerville.scripts.hound_eval module
baskerville.scripts.hound_eval_spec module
baskerville.scripts.hound_ism_bed module
baskerville.scripts.hound_ism_snp module
baskerville.scripts.hound_predbed module
- baskerville.scripts.hound_predbed.bigwig_open(bw_file, genome_file)[source]
Open the bigwig file for writing and write the header.
- baskerville.scripts.hound_predbed.bigwig_write(signal, seq_coords, bw_file, genome_file, seq_crop=0)[source]
Write a signal track to a BigWig file over the region specified by seq_coords.
Args:
- signal: Sequences x Length signal array.
- seq_coords: (chr,start,end)
- bw_file: BigWig filename.
- genome_file: Chromosome lengths file.
- seq_crop: Sequence length cropped from each side of the sequence.