cogent3.core.moltype.MolType

class MolType(name: str, monomers: TStrOrBytes, make_seq: type[c3_sequence.Sequence] | str, gap: str | None = '-', missing: str | None = '?', complements: dict[str, str] | None = None, ambiguities: dict[str, frozenset[str]] | None = None, colors: dict[str, str] | None = None, pairing_rules: dict[frozenset[str], bool] | None = None, mw_calculator: WeightCalculator | None = None, coerce_to: Callable[[bytes], bytes] | None = None)

MolType handles operations that depend on the sequence type.

Attributes:
alphabet

monomers

degen_alphabet

monomers + ambiguous characters

degen_gapped_alphabet

monomers + gap + ambiguous characters

gapped_alphabet

monomers + gap

gapped_missing_alphabet

monomers + gap + missing

gaps

is_nucleic

is a nucleic acid moltype

label

synonym for name

matching_rules
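
The alphabet attributes above are nested supersets of the monomers. A minimal sketch of that relationship in plain Python follows. It assumes DNA monomers and a small subset of the IUPAC ambiguity codes; the variable values are illustrative, and plain strings stand in for cogent3's alphabet objects, so this is not how the library builds its alphabets internally.

```python
# Illustrative only: plain strings standing in for cogent3 alphabet objects.
monomers = "TCAG"      # assumed DNA monomers
gap = "-"              # default gap character from the MolType signature
ambiguities = "RYN"    # assumed subset of IUPAC ambiguity codes

alphabet = monomers
gapped_alphabet = monomers + gap
degen_alphabet = monomers + ambiguities
degen_gapped_alphabet = monomers + gap + ambiguities

# Every richer alphabet contains all characters of the plain alphabet.
for richer in (gapped_alphabet, degen_alphabet, degen_gapped_alphabet):
    assert set(alphabet) <= set(richer)

print(degen_gapped_alphabet)
```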

Methods

can_match(first, second)

Returns True if every position in first could match the same position in second.

can_mispair(first, second)

Returns True if any position in first could mispair with the same position in second.

complement(-> str  -> bytes)

converts a string or bytes into its nucleic acid complement

count_degenerate(seq[, validate])

returns the number of degenerate characters in a sequence

count_gaps(seq)

returns the number of gap characters in a sequence

count_variants(seq)

Counts number of possible sequences matching the sequence, given any ambiguous characters in the sequence.

degap(-> str  -> bytes)

removes all gap and missing characters from a sequence

degenerate_from_seq(seq)

Returns least degenerate symbol that encompasses a set of characters

disambiguate(-> str  -> bytes)

Returns a non-degenerate sequence from a degenerate one.

get_css_style([colors, font_size, font_family])

returns string of CSS classes and {character: <CSS class name>, ...}

get_degenerate_positions(seq[, include_gap, ...])

Return list of position indices of degenerate characters in the sequence.

has_ambiguity(seq[, validate])

whether sequence has an ambiguity character

is_ambiguity(query_motif[, validate])

Return True if query_motif is an ambiguity character in alphabet.

is_compatible_alphabet(alphabet[, strict])

checks that characters in alphabet are equal to a bound alphabet

is_degenerate(seq[, validate])

checks if a sequence contains degenerate characters

is_gapped(seq[, validate])

checks if a sequence contains gaps

is_valid(seq)

checks against most degenerate alphabet

iter_alphabets()

yield alphabets in order of most to least degenerate

make_seq(*, seq[, name, check_seq])

creates a Sequence object corresponding to the molecular type of this instance.

most_degen_alphabet()

returns the most degenerate alphabet for this instance

mw(seq[, method, delta])

Returns the molecular weight of the sequence.

random_disambiguate(-> str  -> bytes)

disambiguates a sequence by randomly selecting a non-degenerate character

rc(-> str  -> bytes)

reverse complement of a sequence

resolve_ambiguity(ambig_motif[, alphabet, ...])

Returns tuple of all possible canonical characters corresponding to ambig_motif

strand_symmetric_motifs([motif_length])

returns ordered pairs of strand complementary motifs

strip_bad(-> str)

Removes any symbols not in the alphabet.

strip_bad_and_gaps(-> str)

Removes any symbols not in the alphabet, and any gaps.

strip_degenerate(-> str  -> bytes)

removes degenerate characters

to_json()

returns a JSON formatted string

to_regex(seq)

returns a regex pattern with ambiguities expanded to a character set

to_rich_dict(**kwargs)

returns dict suitable for serialisation

can_pair
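
Several of the methods above (count_variants, to_regex, rc) depend on expanding ambiguity codes or mapping characters to their complements. The standalone functions below sketch that behaviour in plain Python so the semantics are easier to follow. They reuse the method names for readability only: the `AMBIGUITIES` and `COMPLEMENTS` tables are an assumed subset of the IUPAC DNA codes, and none of this is cogent3's implementation.

```python
import re
from math import prod

# Assumed subset of the IUPAC nucleotide ambiguity codes (illustrative).
AMBIGUITIES = {
    "R": frozenset("AG"),
    "Y": frozenset("CT"),
    "N": frozenset("ACGT"),
}
# Complements for the canonical DNA characters only.
COMPLEMENTS = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_variants(seq: str) -> int:
    """Number of unambiguous sequences consistent with seq."""
    # A canonical character contributes 1, an ambiguity code contributes
    # the size of the set it stands for.
    return prod(len(AMBIGUITIES.get(char, char)) for char in seq)


def to_regex(seq: str) -> str:
    """Expand each ambiguity code into a regex character set."""
    parts = []
    for char in seq:
        if char in AMBIGUITIES:
            parts.append("[" + "".join(sorted(AMBIGUITIES[char])) + "]")
        else:
            parts.append(re.escape(char))
    return "".join(parts)


def rc(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENTS[char] for char in reversed(seq))


print(count_variants("ARN"))  # 1 * 2 * 4 = 8
print(to_regex("ARN"))        # A[AG][ACGT]
print(rc("ACGGT"))            # ACCGT
```

The pattern from `to_regex` can be passed straight to `re.fullmatch` to test whether a concrete sequence is one of the variants counted by `count_variants`.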

Notes

The only way to create sequences is via a MolType instance. The instance defines different alphabets that are used for data conversions. Create a moltype using the get_moltype() function.
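
A minimal usage sketch of the workflow described in this note. It assumes cogent3 is installed, that `get_moltype()` is importable from the top-level `cogent3` package, and that it accepts the name "dna"; `demo_moltype_usage` is a hypothetical helper written for this example. The import guard only keeps the snippet from failing where the package is absent.

```python
import importlib.util


def demo_moltype_usage():
    """Create a DNA moltype, build a sequence from it, reverse complement a string."""
    import cogent3  # assumed to be installed

    dna = cogent3.get_moltype("dna")               # assumed moltype name
    seq = dna.make_seq(seq="ACGGT", name="demo")   # keyword-only arguments
    return str(seq), dna.rc("ACGGT")


# Only run the demonstration when cogent3 is available.
if importlib.util.find_spec("cogent3") is not None:
    print(demo_moltype_usage())
```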