Module eremitalpa.eremitalpa
Functions
def add_sequences_to_tree(tree: dendropy.datamodel.treemodel._tree.Tree,
path: str,
labeller:| None = None,
seq_formatter:| None = None) ‑> None -
Add sequences to leaves inplace.
Args
tree- Dendropy tree.
path- Path to multiple sequence alignment. Taxon labels in tree must match the fasta description.
labeller-
Function that takes a FASTA description and returns the name of associated taxon in the tree. For instance, a full fasta description might look like:
>cdsUUB77424 A/Maryland/12786/2022 2022/04/07 HARAxML would call this taxon just 'cdsUUB77424'. So the callable would have to be something like: lambda x: x.split()[0]
seq_formatter
def color_stack(tree: Tree,
values: dict[str, typing.Any],
color_dict: dict[str, str],
default_color: str | None = None,
x: float = 0,
ax: matplotlib.axes._axes.Axes | None = None,
leg_kwds: dict | None = None) ‑> tuple[matplotlib.axes._axes.Axes, matplotlib.legend.Legend]-
A stack of colored patches that can be plotted adjacent to a tree to show how values vary on the tree leaves.
Must have called eremitalpa.compute_layout on the tree in order to know y values for leaves (done anyway by eremitalpa.plot_tree).
Args
tree- The tree to be plotted next to.
values- Maps taxon labels to values to be plotted.
color_dict- Maps values to colors.
default_color- Color to use for values missing from color_dict.
x- The x value to plot the stack at.
ax- Matplotlib ax
def compare_trees(left,
right,
gap=0.1,
x0=0,
connect_kwds={},
extend_kwds={},
extend_every=10,
left_kwds={},
right_kwds={},
connect_colors={},
extend_colors={})-
Plot two phylogenies side by side, and join the same taxa in each tree.
Args
- left (dendropy Tree)
- right (dendropy Tree)
gap:float- Space between the two trees.
x0:float- The x coordinate of the root of the left hand tree.
connect_kwds:dict- Keywords passed to matplotlib LineCollection. These are used for the lines that connect matching taxa.
extend_kwds:dict- Keywords passed to matplotlib LineCollection. These are used for lines that connect taxa to the connection lines.
extend_every:n- Draw branch extension lines every n leaves.
left_kwds:dict- Passed to plot_tree for the left tree.
right_kwds:dict- Passed to plot_tree for the right tree.
connect_colors:dictorCallable- Maps taxon labels to colors. Ignored if 'colors' is used in connect_kwds.
extend_colors:dictorCallable- Maps taxon labels to colors. Ignored if 'colors' is used in extend_kwds.
Returns
(2-tuple) containing dendropy Trees with _x and _y plot locations on nodes.
def compute_tree_layout(tree: dendropy.datamodel.treemodel._tree.Tree,
has_brlens: bool = True,
copy: bool = False,
round_brlens: int | None = None) ‑> dendropy.datamodel.treemodel._tree.Tree-
Computes layout parameters for a tree.
Each node gets _x and _y values. The tree gets _xlim and _ylim values (tuples).
Args
tree:dp.Tree- The tree to lay out.
has_brlens:bool- Whether the tree has branch lengths.
copy:bool- If True, a fresh copy of the tree is made.
round_brlens:int, optional- The number of digits to round branch lengths to. Defaults to None.
Returns
dp.Tree- The tree with layout parameters.
def deepest_leaf(tree, attr='_x')-
Find the deepest leaf node in the tree.
Args
- tree (dendropy Tree)
attr:str- Either _x or _y. Gets node with max attribute.
Returns
dendropy Node
def estimate_unit_branch_length(branch_lengths: list[float], min_diff: float = 1e-06, round_to: int = 6) ‑> float-
Estimates the fundamental unit length in a set of phylogenetic branch lengths. Assumes that branch lengths occur in approximate integer multiples of a small unit. Algorithm:
1. Compute all pairwise absolute differences between branch lengths. 2. Construct a histogram of these differences with an adaptive bin size. 3. Identify the most common small difference (mode of the histogram), which represents the estimated unit length.Args
branch_lengths:listornp.array- A list of branch lengths.
min_diff:float- Minimum difference between branch lengths to consider. Branch length differences smaller than this value are considered to be zero.
round_to:int- Round branch_lengths to this many decimal places.
Returns
float- Estimated fundamental unit length.
def get_label(node: dendropy.datamodel.treemodel._node.Node)-
Gets the label of a node.
If the node itself has a label, that is returned. Otherwise, the label of the node's taxon is returned.
Args
node:dp.Node- The node to get the label from.
Returns
str- The label of the node.
def get_trunk(tree, attr='_x')-
Ordered nodes in tree, from deepest leaf to root.
Args
tree (dendropy Tree) attr (str)
Returns
tuple containing dendropy Nodes
def node_x_y(nodes: Iterable[dendropy.datamodel.treemodel._node.Node],
jitter_x: float | None = None) ‑> tuple[tuple, tuple]-
Gets the x and y coordinates of nodes.
Args
nodes:Iterable[dp.Node]- An iterable of dendropy Node objects.
jitter_x:float, optional- The amount of jitter to add to the x coordinates. X is jittered by a quarter of this value in both directions. Defaults to None.
Returns
tuple[tuple, tuple]- A tuple containing two tuples: one for x coordinates and one for y coordinates.
def node_x_y_from_taxon_label(tree: Tree,
taxon_label: str) ‑> tuple[float, float]-
Finds the x and y attributes of a node from its taxon label.
Args
tree:Tree- The tree to search in.
taxon_label:str- The taxon label of the node.
Returns
tuple[float, float]- The x and y coordinates of the node.
def plot_leaves_with_labels(tree: dendropy.datamodel.treemodel._tree.Tree,
labels: list[str],
ax: matplotlib.axes._axes.Axes = None,
**kwds)-
Plots leaves that have taxon labels in a given list.
Args
tree:dp.Tree- The tree to plot.
labels:list[str]- A list of taxon labels to plot.
ax:mp.axes.Axes, optional- The matplotlib axes to plot on. Defaults to None.
**kwds- Additional keyword arguments passed to plt.scatter.
def plot_path_to_taxon(tree: dendropy.datamodel.treemodel._tree.Tree | Tree,
taxon_label: str,
ax: matplotlib.axes._axes.Axes | None = None,
label_taxon: bool = True,
label_kwds: dict | None = None,
**kwds) ‑> matplotlib.collections.LineCollection-
Plots the path from the root to a given taxon.
Args
tree:dp.Tree | Tree- The tree to plot.
taxon_label:str- The taxon label of the node to plot the path to.
ax:mp.axes.Axes, optional- The matplotlib axes to plot on.
label_taxon:bool- If True, label the taxon at the end of the path.
label_kwds:dict, optional- Keyword arguments passed to plt.text.
Returns
mp.collections.LineCollection
def plot_subs_on_tree(tree: dendropy.datamodel.treemodel._tree.Tree,
sequences: dict[str, str],
exclude_leaves: bool = True,
on_path_to_taxon: str | None = None,
site_offset: int = 0,
ignore_chars: str = 'X-',
arrow_length: float = 40,
arrow_facecolor: str = 'black',
fontsize: float = 6,
xytext_transform: tuple[float, float] = (1.0, 1.0),
**kwds) ‑> collections.Counter-
Plots substitutions on a tree.
This function plots substitutions on the tree by finding substitutions between each node and its parent node. The substitutions are then plotted at the midpoint of the edge between the node and its parent.
Args
tree:dp.Tree- The tree to annotate.
sequences:dict[str, str]- A mapping of node labels to sequences.
exclude_leaves:bool- If True, exclude leaves from substitution plotting.
on_path_to_taxon:str, optional- If provided, only plot substitutions on the path from the root to this taxon. Defaults to None.
site_offset:int- Value to add to substitution site numbers.
ignore_chars:str- Substitutions involving these characters will not be shown.
arrow_length:float- The length of the arrow pointing to the mutation.
arrow_facecolor:str- The face color of the arrow.
fontsize:float- The font size of the text.
xytext_transform:tuple[float, float]- Multipliers for the xytext offsets.
**kwds- Other keyword arguments passed to plt.annotate.
Returns
Counter- A counter of the number of times each substitution appears in the tree.
def plot_tree(tree: dendropy.datamodel.treemodel._tree.Tree | Tree,
has_brlens: bool = True,
edge_kwds: dict = {'color': 'black', 'linewidth': 0.5, 'clip_on': False, 'capstyle': 'round', 'zorder': 10},
leaf_kwds: dict = {'zorder': 15, 'color': 'black', 's': 0, 'marker': 'o', 'edgecolor': 'white', 'lw': 0.1, 'clip_on': False},
internal_kwds: dict = {'zorder': 12, 'color': 'black', 's': 0, 'marker': 'o', 'edgecolor': 'white', 'lw': 0.1, 'clip_on': False},
ax: matplotlib.axes._axes.Axes = None,
labels: Iterable[str] | Literal['all'] | None = None,
label_kwds: dict = {'horizontalalignment': 'left', 'verticalalignment': 'center', 'fontsize': 8, 'zorder': 15},
label_x_offset: float = 0.0,
compute_layout: bool = True,
fill_dotted_lines: bool = False,
round_brlens: int | None = None,
color_leaves_by_site_aa: int | None = None,
hide_aa: str | None = None,
color_internal_nodes_by_site_aa: int | None = None,
sequences: dict[str, str] | None = None,
jitter_x: float | str | None = None,
scale_bar: bool | None = True,
scale_bar_x_start: float = 0.0) ‑> matplotlib.axes._axes.Axes-
Plots a dendropy tree object.
Tree nodes are plotted in their current order. To ladderize, call tree.ladderize() before plotting.
Args
tree:dp.Tree | Tree- The tree to plot.
has_brlens:bool- If False, all branch lengths are plotted as 1.
edge_kwds:dict- Keyword arguments for edges, passed to matplotlib.collections.LineCollection.
leaf_kwds:dict- Keyword arguments for leaves, passed to ax.scatter.
label_kwds:dict- Keyword arguments passed to plt.text.
internal_kwds:dict- Keyword arguments for internal nodes, passed to ax.scatter.
ax:mp.axes.Axes, optional- The matplotlib axes to plot on. Defaults to None.
- labels (Optional[Union[Iterable[str], Literal["all"]]]): Taxon labels
- to annotate, or "all".
label_kwds:dict- Keyword arguments passed to plt.text.
leaf_label_x_offset:float- Amount to offset leaf labels in the x direction.
compute_layout:bool- If True, compute the layout. If False, assumes the tree nodes already have _x and _y attributes.
fill_dotted_lines:bool- If True, show dotted lines from leaves to the right-hand edge of the tree.
round_brlens:int, optional- The number of decimal places to round
branch lengths to. Passed to
compute_tree_layout(). color_leaves_by_site_aa:int, optional- Color leaves by the amino
acid at this site (1-based). Overwrites 'c' in
leaf_kwds. Requiressequences. hide_aa:str, optional- A string of amino acids to hide when coloring by site.
color_internal_nodes_by_site_aa:int, optional- Same as
color_leaves_by_site_aabut for internal nodes. sequences:dict[str, str], optional- A mapping of taxon labels to sequences. Required for coloring by site.
jitter_x:float | str, optional- Amount of noise to add to the x value of leaves to avoid overplotting. Can be a float or 'auto'.
scale_bar:bool- If True, show a scale bar.
scale_bar_x_start:float- The leftmost x position of the scale bar.
Returns
mp.axes.Axes- The matplotlib axes with the plotted tree. The tree object is returned with added attributes: _xlim, _ylim, and _x, _y on each node.
def plot_tree_interactive(tree: dendropy.datamodel.treemodel._tree.Tree,
has_brlens: bool = True,
leaf_colors: dict | None = None,
default_leaf_color: str = 'black',
leaf_sizes: dict | None = None,
default_leaf_size: int = 5)-
Plots a dendropy tree object interactively using plotly.
Args
tree:dp.Tree- The tree to plot.
has_brlens:bool- If False, all branch lengths are plotted as 1.
leaf_colors:dict, optional- A dictionary mapping taxon labels to colors.
default_leaf_color:str- The default color for taxa not in
leaf_colors. leaf_sizes:dict, optional- A dictionary mapping taxon labels to sizes.
default_leaf_size:int- The default size for taxa not in
leaf_sizes.
def plot_tree_with_subplots(tree: Tree,
aa_seqs: dict,
site: int,
subplot_taxa_shifts: dict[str, tuple[float, float]],
fun: Callable,
fun_kwds: dict | None = None,
subplot_width: float = 0.2,
subplot_height: float = 0.1,
figsize: tuple[float, float] = (8, 12),
sharex: bool = True,
sharey: bool = True,
snap_x: float | None = None,
snap_y: float | None = None,
arrow_origins: dict[str, tuple[float, float]] | None = None,
**kwds) ‑> matplotlib.axes._axes.Axes-
Plot a phylogeny tree with subplots for specified taxa.
This function draws a phylogeny based on a given tree and amino acid sequences. It colors leaves (and internal nodes) according to their amino acid at a specified site, and attaches additional subplots at user-defined nodes for further custom visualization.
Args
tree- eremitalpa.Tree The phylogenetic tree to be plotted.
aa_seqs- dict A dictionary containing amino acid sequences for each taxon. Keys should match the node names in the tree, and values should be the sequences.
site- int The site (1-based) to color the tree's leaves and internal nodes.
subplot_taxa_shifts- dict of str -> tuple of float A mapping from taxon names to tuples (dx, dy). These values control the position of the subplot axes relative to their respective nodes. Uses axes coordinates (i.e. a value of 1 would shift an entire ax worth of distance).
fun- Callable A callable function to generate each subplot. Must accept the current taxon as the first argument and an axes object as the second argument.
fun_kwds- dict
A dictionary of additional keyword arguments passed to the subplot function
fun. subplot_width- float, optional The width of each subplot in figure coordinates, by default 0.2.
subplot_height- float, optional The height of each subplot in figure coordinates, by default 0.1.
figsize- tuple of float, optional The overall size of the figure, by default (8, 12).
sharex- bool, Have the sub axes share x-axes.
sharey- bool, Have the sub axes share y-axes.
snap_x- float, Snap x position of the subplots to a grid. This argument sets the grid size.
snap_y- float, Snap y position of the subplots to a grid. This argument sets the grid size.
arrow_origins- dict of str -> tuple of float. Pass the axes coordinates of each subplot for where its arrow should originate. By default arrows originate from the center.
**kwds- Passed to plot_tree.
Returns
2-tuple containing: matplotlib.axes.Axes - the main axes. dict [str, matplotlib.axes.Axes] containing sub plots.
def prune_nodes_with_labels(tree, labels)-
Prune nodes from tree that have a taxon label in labels.
Args
tree (dendropy Tree) labels (iterable containing str)
Returns
(dendropy Tree)
def read_iqtree_ancestral_states(state_file,
partition_names: list[str] | None = None,
translate_nt: bool = False) ‑> dict[slice(, dict[str, str], None)] | dict[slice( , , None)] -
Read an ancestral state file generated by IQ-TREE. If the file contains multiple partitions (i.e. a 'Part' column is present), then return a dict of dicts containing sequences accessed by [partition][node]. Otherwise return a dict of sequences accessed by node.
Args
state_file- Path to .state file generated by iqtree –ancestral
partition_names- Partitions are numbered from 1 in the .state file. Pass names for each segment (i.e. the order that partition_names appear in the partitions). Only takes effect if multiple partitions are present.
translate_nt- If ancestral states are nucleotide sequences then translate them.
Returns
dict of dicts that maps [node][partition] -> sequence, or dict that maps node -> sequence.
def read_raxml_ancestral_sequences(tree, node_labelled_tree, ancestral_seqs, leaf_seqs=None)-
Read a tree and ancestral sequences estimated by RAxML.
RAxML can estimate marginal ancestral sequences for internal nodes on a tree using a call like:
raxmlHPC -f A -t {treeFile} -s {sequenceFile} -m {model} -n {name}The analysis outputs several files:
- RAxML_nodeLabelledRootedTree.{name} contains a copy of the input tree where all internal nodes have a unique identifier {id}.
- RAxML_marginalAncestralStates.{name} contains the ancestral sequence for each internal node. The format of each line is '{id} {sequence}'
- RAxML_marginalAncestralProbabilities.{name} contains probabilities of each base at each site for each internal node. (Not used by this function.)
Notes
Developed with output from RAxML version 8.2.12.
Args
tree:str- Path to original input tree ({treeFile}).
node_labelled_tree:str- Path to the tree with node labels. (RAxML_nodeLabelledRootedTree.{name})
ancestral_seqs:str- Path to file containing the ancestral sequences. (RAxML_marginalAncestralStates.{name})
leaf_seqs:str- (Optional) path to fasta file containing leaf sequences. ({sequenceFile}). If this is provided, also attach sequences to leaf nodes.
Returns
(dendropy Tree) with sequences attached to nodes. Sequences are attached as 'sequence' attributes on Nodes.
def snap(value: float, grid_size: float) ‑> float-
Find the closest point on a grid for a value.
def sorted_leaf_labels(node)-
Tuple containing the sorted leaf labels of a node.
def taxon_in_node_label(label, node)-
Checks if a node has a matching taxon label.
Args
label:str- The label to check for.
node:dp.Node- The node to check.
Returns
bool- True if the node's taxon label matches, False otherwise.
def taxon_in_node_labels(labels, node)-
Checks if a node's taxon label is in a set of labels.
Args
labels:iterable- A collection of labels to check against.
node:dp.Node- The node to check.
Returns
bool- True if the node's taxon label is in the labels, False otherwise.
Classes
class Column (site, aas)-
Column(site, aas)
Ancestors
- builtins.tuple
Instance variables
var aas-
Alias for field number 1
var site-
Alias for field number 0
class MultipleSequenceAlignment (records, alphabet=None, annotations=None, column_annotations=None)-
Represents a classical multiple sequence alignment (MSA).
By this we mean a collection of sequences (usually shown as rows) which are all the same length (usually with gap characters for insertions or padding). The data can then be regarded as a matrix of letters, with well defined columns.
You would typically create an MSA by loading an alignment file with the AlignIO module:
>>> from Bio import AlignIO >>> align = AlignIO.read("Clustalw/opuntia.aln", "clustal") >>> print(align) Alignment with 7 rows and 156 columns TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273285|gb|AF191659.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273284|gb|AF191658.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273287|gb|AF191661.1|AF191 TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273286|gb|AF191660.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273290|gb|AF191664.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273289|gb|AF191663.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273291|gb|AF191665.1|AF191In some respects you can treat these objects as lists of SeqRecord objects, each representing a row of the alignment. Iterating over an alignment gives the SeqRecord object for each row:
>>> len(align) 7 >>> for record in align: ... print("%s %i" % (record.id, len(record))) ... gi|6273285|gb|AF191659.1|AF191 156 gi|6273284|gb|AF191658.1|AF191 156 gi|6273287|gb|AF191661.1|AF191 156 gi|6273286|gb|AF191660.1|AF191 156 gi|6273290|gb|AF191664.1|AF191 156 gi|6273289|gb|AF191663.1|AF191 156 gi|6273291|gb|AF191665.1|AF191 156You can also access individual rows as SeqRecord objects via their index:
>>> print(align[0].id) gi|6273285|gb|AF191659.1|AF191 >>> print(align[-1].id) gi|6273291|gb|AF191665.1|AF191And extract columns as strings:
>>> print(align[:, 1]) AAAAAAAOr, take just the first ten columns as a sub-alignment:
>>> print(align[:, :10]) Alignment with 7 rows and 10 columns TATACATTAA gi|6273285|gb|AF191659.1|AF191 TATACATTAA gi|6273284|gb|AF191658.1|AF191 TATACATTAA gi|6273287|gb|AF191661.1|AF191 TATACATAAA gi|6273286|gb|AF191660.1|AF191 TATACATTAA gi|6273290|gb|AF191664.1|AF191 TATACATTAA gi|6273289|gb|AF191663.1|AF191 TATACATTAA gi|6273291|gb|AF191665.1|AF191Combining this alignment slicing with alignment addition allows you to remove a section of the alignment. For example, taking just the first and last ten columns:
>>> print(align[:, :10] + align[:, -10:]) Alignment with 7 rows and 20 columns TATACATTAAGTGTACCAGA gi|6273285|gb|AF191659.1|AF191 TATACATTAAGTGTACCAGA gi|6273284|gb|AF191658.1|AF191 TATACATTAAGTGTACCAGA gi|6273287|gb|AF191661.1|AF191 TATACATAAAGTGTACCAGA gi|6273286|gb|AF191660.1|AF191 TATACATTAAGTGTACCAGA gi|6273290|gb|AF191664.1|AF191 TATACATTAAGTATACCAGA gi|6273289|gb|AF191663.1|AF191 TATACATTAAGTGTACCAGA gi|6273291|gb|AF191665.1|AF191Note - This object does NOT attempt to model the kind of alignments used in next generation sequencing with multiple sequencing reads which are much shorter than the alignment, and where there is usually a consensus or reference sequence with special status.
Initialize a new MultipleSeqAlignment object.
Arguments: - records - A list (or iterator) of SeqRecord objects, whose sequences are all the same length. This may be an empty list. - alphabet - For backward compatibility only; its value should always be None. - annotations - Information about the whole alignment (dictionary). - column_annotations - Per column annotation (restricted dictionary). This holds Python sequences (lists, strings, tuples) whose length matches the number of columns. A typical use would be a secondary structure consensus string.
You would normally load a MSA from a file using Bio.AlignIO, but you can do this from a list of SeqRecord objects too:
>>> from Bio.Seq import Seq >>> from Bio.SeqRecord import SeqRecord >>> from Bio.Align import MultipleSeqAlignment >>> a = SeqRecord(Seq("AAAACGT"), id="Alpha") >>> b = SeqRecord(Seq("AAA-CGT"), id="Beta") >>> c = SeqRecord(Seq("AAAAGGT"), id="Gamma") >>> align = MultipleSeqAlignment([a, b, c], ... annotations={"tool": "demo"}, ... column_annotations={"stats": "CCCXCCC"}) >>> print(align) Alignment with 3 rows and 7 columns AAAACGT Alpha AAA-CGT Beta AAAAGGT Gamma >>> align.annotations {'tool': 'demo'} >>> align.column_annotations {'stats': 'CCCXCCC'}Ancestors
- Bio.Align.MultipleSeqAlignment
Methods
def plot(self,
ax: matplotlib.axes._axes.Axes | None = None,
fontsize: int = 6,
variable_sites_kwds: dict | None = None,
rotate_xtick_labels: bool = False,
sites: Iterable[int] | None = None) ‑> matplotlib.axes._axes.Axes-
Plot variable sites in the alignment.
Args
ax- Matplotlib ax.
fontsize- Fontsize of the character labels.
variable_sites_kwds- Passed to MultipleSequenceAlignment.variable_sites.
rotate_xtick_labels- Rotate the xtick labels 90 degrees.
sites- Only plot these sites. (Note: Only variable sites are plotted, so if a site is passed in this argument but it is not variable it will not be displayed.)
def variable_sites(self, min_2nd_most_freq: int = 1) ‑> Generator[Column, None, None]-
Generator for variable sites in the alignment.
Args
min_2nd_most_freq- Used to filter out sites that have low variability. For instance if min_2nd_most_freq is 2 a column containing 'AAAAT' should be excluded because the second most frequent character (T) has a frequency of 1.
class Tree (*args, **kwargs)-
An arborescence, i.e. a fully-connected directed acyclic graph with all edges directing away from the root and toward the tips. The "root" of the tree is represented by the :attr:
Tree.seed_nodeattribute. In unrooted trees, this node is an algorithmic artifact. In rooted trees this node is semantically equivalent to the root.The constructor can optionally construct a |Tree| object by cloning another |Tree| object passed as the first positional argument, or out of a data source if
streamandschemakeyword arguments are passed with a file-like object and a schema-specification string object values respectively.Parameters
*args : positional argument, optional If given, should be exactly one |Tree| object. The new |Tree| will then be a structural clone of this argument.
**kwargs : keyword arguments, optional The following optional keyword arguments are recognized and handled by this constructor:
<code>label</code> The label or description of the new |Tree| object. <code>taxon\_namespace</code> Specifies the |TaxonNamespace| object to be that the new |Tree| object will reference.Examples
Tree objects can be instantiated in the following ways::
# /usr/bin/env python try: from StringIO import StringIO except ImportError: from io import StringIO from dendropy import Tree, TaxonNamespace # empty tree t1 = Tree() # Tree objects can be instantiated from an external data source # using the 'get()' factory class method # From a file-like object t2 = Tree.get(file=open('treefile.tre', 'r'), schema="newick", tree_offset=0) # From a path t3 = Tree.get(path='sometrees.nexus', schema="nexus", collection_offset=2, tree_offset=1) # From a string s = "((A,B),(C,D));((A,C),(B,D));" # tree will be '((A,B),(C,D))' t4 = Tree.get(data=s, schema="newick") # tree will be '((A,C),(B,D))' t5 = Tree.get(data=s, schema="newick", tree_offset=1) # passing keywords to underlying tree parser t7 = dendropy.Tree.get( data="((A,B),(C,D));", schema="newick", taxon_namespace=t3.taxon_namespace, suppress_internal_node_taxa=False, preserve_underscores=True) # Tree objects can be written out using the 'write()' method. t1.write(file=open('treefile.tre', 'r'), schema="newick") t1.write(path='treefile.nex', schema="nexus") # Or returned as a string using the 'as_string()' method. s = t1.as_string("nexml") # tree structure deep-copied from another tree t8 = dendropy.Tree(t7) assert t8 is not t7 # Trees are distinct assert t8.symmetric_difference(t7) == 0 # and structure is identical assert t8.taxon_namespace is t7.taxon_namespace # BUT taxa are not cloned. nds3 = [nd for nd in t7.postorder_node_iter()] # Nodes in the two trees nds4 = [nd for nd in t8.postorder_node_iter()] # are distinct objects, for i, n in enumerate(nds3): # and can be manipulated assert nds3[i] is not nds4[i] # independentally. egs3 = [eg for eg in t7.postorder_edge_iter()] # Edges in the two trees egs4 = [eg for eg in t8.postorder_edge_iter()] # are also distinct objects, for i, e in enumerate(egs3): # and can also be manipulated assert egs3[i] is not egs4[i] # independentally. lves7 = t7.leaf_nodes() # Leaf nodes in the two trees lves8 = t8.leaf_nodes() # are also distinct objects, for i, lf in enumerate(lves3): # but order is the same, assert lves7[i] is not lves8[i] # and associated Taxon objects assert lves7[i].taxon is lves8[i].taxon # are the same. # To create deep copy of a tree with a different taxon namespace, # Use 'copy.deepcopy()' t9 = copy.deepcopy(t7) # Or explicitly pass in a new TaxonNamespace instance taxa = TaxonNamespace() t9 = dendropy.Tree(t7, taxon_namespace=taxa) assert t9 is not t7 # As above, the trees are distinct assert t9.symmetric_difference(t7) == 0 # and the structures are identical, assert t9.taxon_namespace is not t7.taxon_namespace # but this time, the taxa *are* different assert t9.taxon_namespace is taxa # as the given TaxonNamespace is used instead. lves3 = t7.leaf_nodes() # Leaf nodes (and, for that matter other nodes lves5 = t9.leaf_nodes() # as well as edges) are also distinct objects for i, lf in enumerate(lves3): # and the order is the same, as above, assert lves7[i] is not lves9[i] # but this time the associated Taxon assert lves7[i].taxon is not lves9[i].taxon # objects are distinct though the taxon assert lves7[i].taxon.label == lves9[i].taxon.label # labels are the same. # to 'switch out' the TaxonNamespace of a tree, replace the reference and # reindex the taxa: t11 = Tree.get(data='((A,B),(C,D));', 'newick') taxa = TaxonNamespace() t11.taxon_namespace = taxa t11.reindex_subcomponent_taxa() # You can also explicitly pass in a seed node: seed = Node(label="root") t12 = Tree(seed_node=seed) assert t12.seed_node is seedAncestors
- dendropy.datamodel.treemodel._tree.Tree
- dendropy.datamodel.taxonmodel.TaxonNamespaceAssociated
- dendropy.datamodel.basemodel.Annotable
- dendropy.datamodel.basemodel.Deserializable
- dendropy.datamodel.basemodel.NonMultiReadable
- dendropy.datamodel.basemodel.Serializable
- dendropy.datamodel.basemodel.DataObject
Static methods
def find_closest_leaf_node(node: dendropy.datamodel.treemodel._node.Node) ‑> dendropy.datamodel.treemodel._node.Node-
Find the leaf node that has the shortest path length to the given node.
Args
node- The node of interest.
Returns
The leaf node closest to the node of interest.
def from_disk(path: str,
schema: str = 'newick',
preserve_underscores: bool = True,
outgroup: str | None = None,
msa_path: str | None = None,
get_kwds: dict | None = None,
**kwds) ‑> Tree-
Loads a tree from a file.
Args
path:str- Path to the file containing the tree.
schema:str- The schema of the tree file (e.g., "newick"). See dendropy.Tree.get for options.
preserve_underscores:bool- If True, preserve underscores in taxon labels.
outgroup:str, optional- The name of the taxon to use as the outgroup. Defaults to None.
msa_path:str, optional- Path to a FASTA file containing leaf sequences. Defaults to None.
get_kwds:dict, optional- Keyword arguments passed to dendropy.Tree.get. Defaults to None.
**kwds- Additional keyword arguments passed to add_sequences_to_tree.
Returns
Tree- The loaded tree object.
Instance variables
prop multiple_sequence_alignment-
Generates a MultipleSequenceAlignment object from the tree.
Leaf nodes on the tree must have 'sequence' attributes and taxon labels.
Returns
MultipleSequenceAlignment- The generated alignment object.
Methods
def clade_bbox(self, taxon_labels: list[str]) ‑> dict[str, float]-
Calculates the bounding box of a clade.
The bounding box is determined by finding the most recent common ancestor (MRCA) of the specified taxa and then calculating the minimum and maximum x and y coordinates of its child nodes.
Args
taxon_labels:list[str]- A list of taxon labels that define the clade.
Returns
dict[str, float]- A dictionary containing the coordinates of the bounding box with keys: 'min_x', 'max_x', 'min_y', 'max_y'.
def distance_between(self,
node1: dendropy.datamodel.treemodel._node.Node,
node2: dendropy.datamodel.treemodel._node.Node) ‑> float-
Calculates the distance between two nodes.
The distance is calculated as the sum of the branch lengths along the path connecting the two nodes.
Args
node1:dp.Node- The first node.
node2:dp.Node- The second node.
Returns
float- The distance between the two nodes.
def internal_node_mrca(self,
node1: dendropy.datamodel.treemodel._node.Node,
node2: dendropy.datamodel.treemodel._node.Node) ‑> dendropy.datamodel.treemodel._node.Node-
Finds the MRCA of two nodes.
Note
dendropy.tree.mrca only works on leaf nodes (or nodes with taxon labels).
Args
node1:dp.Node- The first node.
node2:dp.Node- The second node.
Returns
dp.Node- The MRCA of the two nodes.
def node_to_root_path(self, taxon: str | dendropy.datamodel.treemodel._node.Node) ‑> Generator[dendropy.datamodel.treemodel._node.Node, None, None]-
Nodes from a taxon to the root node.
Args
taxon:strordendropy.Node- The taxon label of the starting node, or the node object.
Yields
dp.Node- The nodes from the taxon to the root.
def plot_clade_bbox(self,
taxon_labels: list[str],
ax: matplotlib.axes._axes.Axes | None = None,
extend_right: float = 0.0,
extend_down: float = 0.0,
label: str | None = None,
label_kwds: dict | None = None,
**kwds)-
Plots a rectangle around the bounding box of a clade.
Args
taxon_labels:list[str]- A list of taxon labels that define the clade.
ax:mp.axes.Axes, optional- The matplotlib axes to plot on. Defaults to None.
extend_right:float- Amount to extend the box to the right, in axes coordinates.
extend_down:float- Amount to extend the box down, in axes coordinates.
label:str, optional- A label to apply to the box. Defaults to None.
label_kwds:dict, optional- Keyword arguments passed to matplotlib.axes.Axes.text. Defaults to None.
**kwds- Additional keyword arguments passed to matplotlib.patches.Rectangle.
Returns
matplotlib.patches.Rectangle- The rectangle patch added to the axes.
def plot_tree_msa(self,
msa_plot_kwds: dict | None = None,
axes: tuple[matplotlib.axes._axes.Axes, matplotlib.axes._axes.Axes] | None = None) ‑> tuple[matplotlib.axes._axes.Axes, matplotlib.axes._axes.Axes]-
Plots the tree and multiple sequence alignment.
Args
msa_plot_kwds:dict, optional- Keyword arguments passed to the multiple sequence alignment plot function. Defaults to None.
axes:tuple[mp.axes.Axes, mp.axes.Axes], optional- A tuple of two matplotlib axes to plot on. If None, new axes are created. Defaults to None.
Returns
tuple[mp.axes.Axes, mp.axes.Axes]- The matplotlib axes used for plotting.