Package 'phylotools' reference manual

Title:	Phylogenetic Tools for Eco-Phylogenetics
Description:	A collection of tools for building RAxML supermatrix using PHYLIP or aligned FASTA files. These functions will be useful for building large phylogenies using multiple markers.
Authors:	Jinlong Zhang [aut, cre], Nancai Pei [ctb], Xiangcheng Mi [ctb]
Maintainer:	Jinlong Zhang <[email protected]>
License:	GPL-2
Version:	0.2.5
Built:	2025-01-08 03:57:19 UTC
Source:	https://github.com/helixcn/phylotools

Phylogenetic tools for building PHYLIP supermatrix and more

Description

A collection of a few functions for handling DNA-barcoding sequences, building PHYLIP supermatrix for RAxML etc.

Details

Package:	phylotools
Type:	Package
Version:	0.2.1
Date:	2017-12-08
License:	GLP-2
LazyLoad:	yes

Author(s)

Jinlong Zhang

Maintainer: Jinlong Zhang <[email protected]>

Clean the name of a fasta file

Description

Cleaning the names of sequences for a fasta file. The punctuation characters and the white space will be replaced with "_".

Usage

clean.fasta.name(infile = NULL, outfile = "name_cleaned.fasta")
clean.fasta.name(infile = NULL, outfile = "name_cleaned.fasta")

Arguments

`infile`	character string representing the name of the fasta file.
`outfile`	Character string representing the file name to be generated.

Details

Punctuation characters and white space will be replaced by "_". More information can be found at regex.

Value

This is a subroutine without a return value. A fasta file with all the names of sequences renamed will be saved to the working directory.

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


  cat(
    ">seq_1*66",  "--TTACAAATTGACTTATTATA",
    ">seq_2()r",  "GATTACAAATTGACTTATTATA",
    ">seq_3:test",  "GATTACAAATTGACTTATTATA",
    ">seq_588",  "GATTACAAATTGACTTATTATA",
    ">seq_8$$yu",  "GATTACAAATTGACTTATTATA",
    ">seq_10", "---TACAAATTGAATTATTATA",
    file = "matk.fasta", sep = "\n")

  clean.fasta.name(infile = "matk.fasta")
  get.fasta.name("name_cleaned.fasta")

  # Delete file
  unlink("matk.fasta")
  unlink("name_cleaned.fasta")
cat(
    ">seq_1*66",  "--TTACAAATTGACTTATTATA",
    ">seq_2()r",  "GATTACAAATTGACTTATTATA",
    ">seq_3:test",  "GATTACAAATTGACTTATTATA",
    ">seq_588",  "GATTACAAATTGACTTATTATA",
    ">seq_8$$yu",  "GATTACAAATTGACTTATTATA",
    ">seq_10", "---TACAAATTGAATTATTATA",
    file = "matk.fasta", sep = "\n")

  clean.fasta.name(infile = "matk.fasta")
  get.fasta.name("name_cleaned.fasta")

  # Delete file
  unlink("matk.fasta")
  unlink("name_cleaned.fasta")

Convert and Save sequence data frame to fasta file

Description

Convert and Save sequence data frame to fasta file.

Usage

dat2fasta(dat, outfile = "out.fasta")
dat2fasta(dat, outfile = "out.fasta")

Arguments

`dat`	data frame by `read.phylip` or `read.fasta`
`outfile`	a character string, representing the name of the fasta file to be generated

Details

The column of the data frame must be: 1. seq.name, 2. seq.text, represent the name of the sequences, the content of the sequence, eg. ATCGGGAAC.

Value

This is a routine without return value.

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


cat(
">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
file = "trn1.fasta", sep = "\n")

res <- read.fasta("trn1.fasta")
dat2fasta(res)
unlink("trn1.fasta")
unlink("out.fasta")
cat(
">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
file = "trn1.fasta", sep = "\n")

res <- read.fasta("trn1.fasta")
dat2fasta(res)
unlink("trn1.fasta")
unlink("out.fasta")

Conver the data frame to sequential PHYLIP format file

Description

Convert and save a data frame to sequential PHYLIP file.

Usage

dat2phylip(dat, outfile = "out.phy")
dat2phylip(dat, outfile = "out.phy")

Arguments

`dat`	the data frame returned by `read.phylip`, `read.fasta`.
`outfile`	character string represents the phylip file to be generated.

Details

The output will be in sequential PHYLIP format.

Value

This is a subroutine, there is no return value.

Note

The names of the sequences should not contain white space or Punctuation characters. See regex for more details.

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


  cat(
  ">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  ">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  ">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  ">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  ">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.fasta", sep = "\n")

res <- read.fasta("trn1.fasta")
dat2phylip(res)
unlink("trn1.fasta")
unlink("out.phy")
cat(
  ">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  ">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  ">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  ">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  ">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.fasta", sep = "\n")

res <- read.fasta("trn1.fasta")
dat2phylip(res)
unlink("trn1.fasta")
unlink("out.phy")

get the names of all the sequences of fasta file

Description

get the names of all the sequences of a fasta file, and perform cleaning of the names of the sequences

Usage

get.fasta.name(infile, clean_name = FALSE)
get.fasta.name(infile, clean_name = FALSE)

Arguments

`infile`	character string representing the name of the fasta file.
`clean_name`	logical, representing cleaning of the names will be performed.

Value

a character vector containing the names of the sequences

Note

Punctuation characters and white space be replaced by "_". Definition of Punctuation characters can be found at regex.

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


  cat(
  ">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  ">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  ">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  ">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  ">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.fasta", sep = "\n")
get.fasta.name("trn1.fasta")
unlink("trn1.fasta")
cat(
  ">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  ">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  ">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  ">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  ">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.fasta", sep = "\n")
get.fasta.name("trn1.fasta")
unlink("trn1.fasta")

get the names of sequences from a PHYLIP file

Description

get the names of sequences from a PHYLIP file.

Usage

get.phylip.name(infile, clean_name = FALSE)
get.phylip.name(infile, clean_name = FALSE)

Arguments

`infile`	character representing the name or path of the phylip file.
`clean_name`	logical, representing cleaning of the names will be performed.

Details

Punctuation characters and white space be replaced by "_". Definition of Punctuation characters can be found at regex.

Value

a character vector of the names of the sequences

Author(s)

Jinlong Zhang <[email protected]>

Examples


  cat("6 22",
  "seq_1    --TTACAAATTGACTTATTATA",
  "seq_2    GATTACAAATTGACTTATTATA",
  "seq_3    GATTACAAATTGACTTATTATA",
  "seq_5    GATTACAAATTGACTTATTATA",
  "seq_8    GATTACAAATTGACTTATTATA",
  "seq_10   ---TACAAATTGAATTATTATA",
  file = "matk.phy", sep = "\n")
get.phylip.name("matk.phy")
unlink("matk.phy")
cat("6 22",
  "seq_1    --TTACAAATTGACTTATTATA",
  "seq_2    GATTACAAATTGACTTATTATA",
  "seq_3    GATTACAAATTGACTTATTATA",
  "seq_5    GATTACAAATTGACTTATTATA",
  "seq_8    GATTACAAATTGACTTATTATA",
  "seq_10   ---TACAAATTGAATTATTATA",
  file = "matk.phy", sep = "\n")
get.phylip.name("matk.phy")
unlink("matk.phy")

Read FASTA file

Description

Read and convert the fasta file to data frame

Usage

read.fasta(file = NULL, clean_name = FALSE)
read.fasta(file = NULL, clean_name = FALSE)

Arguments

`file`	character string representing the name of the fasta file.
`clean_name`	logical, representing cleaning of the names will be performed. Punctuation characters and white space be replaced by "_" . See `regex` for more details.

Details

In this function, names of the sequences are identified by ">", and all the lines before next ">" will be concatenated.

Value

a data frame with two columns: (1) seq.name, the names for all the sequences. (2) seq.text, the raw sequence data.

Note

Punctuation characters and white space in the names of the sequences will be replaced by "_".

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


cat(
">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
file = "trn1.fasta", sep = "\n")

res <- read.fasta("trn1.fasta")
unlink("trn1.fasta")
cat(
">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
file = "trn1.fasta", sep = "\n")

res <- read.fasta("trn1.fasta")
unlink("trn1.fasta")

read phylip file

Description

read the phylip file, and store the sequences and their names in a dataframe.

Usage

read.phylip(infile, clean_name = TRUE, seq.name.length = NA)
read.phylip(infile, clean_name = TRUE, seq.name.length = NA)

Arguments

`infile`	character string for the name of the phylip file.
`clean_name`	logical, representing cleaning of the names will be performed.
`seq.name.length`	Number of characters for the sequence names, if not specified, the function will use the first white space as the identifier. All the character before the first white space will be treated as sequence names, and the rest will be treated as sequences.

Details

read.phylip accepts both interleaved and sequential phylip, the number of sequences is identified by parsing the first line of the file. Sequences and their names will be stored in a data frame.

If clean_name is TRUE, punctuation characters and white space be replaced by "_". Definition of punctuation characters can be found at regex.

Value

a data frame with two columns: (1) seq.name, the names for all the sequences; (2) seq.text, the raw sequence data.

Note

the Punctuation characters and white space in the names of the sequences will be replaced by "_".

Author(s)

Jinlong Zhang <[email protected]>

Examples


  cat("6 22",
  "seq_1    --TTACAAATTGACTTATTATA",
  "seq_2    GATTACAAATTGACTTATTATA",
  "seq_3    GATTACAAATTGACTTATTATA",
  "seq_5    GATTACAAATTGACTTATTATA",
  "seq_8    GATTACAAATTGACTTATTATA",
  "seq_10   ---TACAAATTGAATTATTATA",
  file = "matk.phy", sep = "\n")

res <- read.phylip(infile = "matk.phy")
unlink("matk.phy")
cat("6 22",
  "seq_1    --TTACAAATTGACTTATTATA",
  "seq_2    GATTACAAATTGACTTATTATA",
  "seq_3    GATTACAAATTGACTTATTATA",
  "seq_5    GATTACAAATTGACTTATTATA",
  "seq_8    GATTACAAATTGACTTATTATA",
  "seq_10   ---TACAAATTGAATTATTATA",
  file = "matk.phy", sep = "\n")

res <- read.phylip(infile = "matk.phy")
unlink("matk.phy")

Rename the sequences for a fasta file

Description

Rename the sequences within a fasta file according to a data frame supplied.

Usage

rename.fasta(infile = NULL, ref_table, outfile = "renamed.fasta")
rename.fasta(infile = NULL, ref_table, outfile = "renamed.fasta")

Arguments

`infile`	character string containing the name of the fasta file.
`ref_table`	a data frame with first column for original name, second column for the new name of the sequence.
`outfile`	The name of the fasta file with sequences renamed.

Details

If the orginal name was not found in the ref_table, the name for the sequence will be changed into "old_name_" + orginal name.

Value

This is a subroutine without return value.

Note

Since whitespace and punctuation characters will be replaced with "_", name of a sequence might change. It is suggest to obtain the name of the sequences by calling read.fasta first, and save the data.frame to a csv file to obtain the "original" name for the sequences.

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


cat(
    ">seq_1",  "--TTACAAATTGACTTATTATA",
    ">seq_2",  "GATTACAAATTGACTTATTATA",
    ">seq_3",  "GATTACAAATTGACTTATTATA",
    ">seq_5",  "GATTACAAATTGACTTATTATA",
    ">seq_8",  "GATTACAAATTGACTTATTATA",
    ">seq_10", "---TACAAATTGAATTATTATA",
    file = "matk.fasta", sep = "\n")
old_name <- get.fasta.name("matk.fasta")
new_name <- c("Magnolia", "Ranunculus", "Carex", "Morus", "Ulmus", "Salix")
ref2 <- data.frame(old_name, new_name)
rename.fasta(infile = "matk.fasta", ref_table = ref2, outfile = "renamed.fasta")
unlink("matk.fasta")
unlink("renamed.fasta")
cat(
    ">seq_1",  "--TTACAAATTGACTTATTATA",
    ">seq_2",  "GATTACAAATTGACTTATTATA",
    ">seq_3",  "GATTACAAATTGACTTATTATA",
    ">seq_5",  "GATTACAAATTGACTTATTATA",
    ">seq_8",  "GATTACAAATTGACTTATTATA",
    ">seq_10", "---TACAAATTGAATTATTATA",
    file = "matk.fasta", sep = "\n")
old_name <- get.fasta.name("matk.fasta")
new_name <- c("Magnolia", "Ranunculus", "Carex", "Morus", "Ulmus", "Salix")
ref2 <- data.frame(old_name, new_name)
rename.fasta(infile = "matk.fasta", ref_table = ref2, outfile = "renamed.fasta")
unlink("matk.fasta")
unlink("renamed.fasta")

Delete sequences from fasta file

Description

Delete sequences from fasta file

Usage

rm.sequence.fasta(infile, outfile = "sequence.removed.fasta", to.rm = NULL)
rm.sequence.fasta(infile, outfile = "sequence.removed.fasta", to.rm = NULL)

Arguments

`infile`	Character string representing the name of the fasta file.
`outfile`	Character string representing the name of the output fasta file.
`to.rm`	Vector of character string containing the names of sequences to be deleted.

Details

Delete sequences from a fasta file.

Value

This is a subroutine without return value.

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


cat(
">seq_1",  "---TCCGCCCCCCTACTCTA",
">seq_3",  "CTCTCCGCCCCTCTACTCTA",
">seq_5",  "---TCCGCCC-TTTACTCTA",
">seq_6",  "---TCCGCCCCTCTACTCTA",
">seq_9",  "---TCCGCCC-TCTACTCTA",
">seq_12", "CTCTCCGCCC-TCTACTCTA",
file = "trn2.fasta", sep = "\n")

rm.sequence.fasta(infile = "trn2.fasta", to.rm = c("seq_1","seq_12"))

unlink("trn2.fasta")
unlink("sequence.removed.fasta")
cat(
">seq_1",  "---TCCGCCCCCCTACTCTA",
">seq_3",  "CTCTCCGCCCCTCTACTCTA",
">seq_5",  "---TCCGCCC-TTTACTCTA",
">seq_6",  "---TCCGCCCCTCTACTCTA",
">seq_9",  "---TCCGCCC-TCTACTCTA",
">seq_12", "CTCTCCGCCC-TCTACTCTA",
file = "trn2.fasta", sep = "\n")

rm.sequence.fasta(infile = "trn2.fasta", to.rm = c("seq_1","seq_12"))

unlink("trn2.fasta")
unlink("sequence.removed.fasta")

grouping the data frame containing sequences and names and generate fasta file

Description

Splite the data frame of sequences based on the reference table of grouping.

Usage

split_dat(dat, ref_table)
split_dat(dat, ref_table)

Arguments

`dat`	data frame generated by `read.phylip` or `read.fasta`
`ref_table`	data frame with first column for the name of the sequence, second column for the group the sequence belongs to.

Details

Each group of sequences will be saved to a fasta file. Sequences not included in the ref_table will be saved in "Ungrouped.fasta"

Value

This is a subroutine, there is no return value.

Author(s)

Jinlong Zhang <[email protected]>

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Examples


  cat(
  ">seq_1",   "--TTACAAATTGACTTATTATA",
  ">seq_2",   "GATTACAAATTGACTTATTATA",
  ">seq_3",   "GATTACAAATTGACTTATTATA",
  ">seq_5",   "GATTACAAATTGACTTATTATA",
  ">seq_8",   "GATTACAAATTGACTTATTATA",
  ">seq_10",  "---TACAAATTGAATTATTATA",
  ">seq_11",  "--TTACAAATTGACTTATTATA",
  ">seq_12",  "GATTACAAATTGACTTATTATA",
  ">seq_13",  "GATTACAAATTGACTTATTATA",
  ">seq_15",  "GATTACAAATTGACTTATTATA",
  ">seq_16",  "GATTACAAATTGACTTATTATA",
  ">seq_17",  "---TACAAATTGAATTATTATA",
  file = "trnh.fasta", sep = "\n")

sequence_name <- get.fasta.name("trnh.fasta")
sequence_group <- c("group1","group1","group1","group1","group1",
"group2","group2","group2","group3","group3","group3","group3")
group <- data.frame(sequence_name, sequence_group)

fasta <- read.fasta("trnh.fasta")
split_dat(fasta, group)

unlink("trnh.fasta")
unlink("ungrouped.fasta")
unlink("group1.fasta")
unlink("group2.fasta")
unlink("group3.fasta")

cat(
  ">seq_1",   "--TTACAAATTGACTTATTATA",
  ">seq_2",   "GATTACAAATTGACTTATTATA",
  ">seq_3",   "GATTACAAATTGACTTATTATA",
  ">seq_5",   "GATTACAAATTGACTTATTATA",
  ">seq_8",   "GATTACAAATTGACTTATTATA",
  ">seq_10",  "---TACAAATTGAATTATTATA",
  ">seq_11",  "--TTACAAATTGACTTATTATA",
  ">seq_12",  "GATTACAAATTGACTTATTATA",
  ">seq_13",  "GATTACAAATTGACTTATTATA",
  ">seq_15",  "GATTACAAATTGACTTATTATA",
  ">seq_16",  "GATTACAAATTGACTTATTATA",
  ">seq_17",  "---TACAAATTGAATTATTATA",
  file = "trnh.fasta", sep = "\n")

sequence_name <- get.fasta.name("trnh.fasta")
sequence_group <- c("group1","group1","group1","group1","group1",
"group2","group2","group2","group3","group3","group3","group3")
group <- data.frame(sequence_name, sequence_group)

fasta <- read.fasta("trnh.fasta")
split_dat(fasta, group)

unlink("trnh.fasta")
unlink("ungrouped.fasta")
unlink("group1.fasta")
unlink("group2.fasta")
unlink("group3.fasta")

Substitute the tip labels of a phylogenetic tree

Description

Substitute the tip labels of a phylogenetic tree according to a reference data table.

Usage

sub.taxa.label(tree, dat)
sub.taxa.label(tree, dat)

Arguments

`tree`	Phylogenetic tree
`dat`	A dataframe with the first column the tip labels and the second column the new names.

Value

A Phylogenetic tree with the tip labels substituted

Author(s)

Jinlong Zhang [email protected]

Examples


library(ape)
data(bird.families)
tips <- bird.families$tip.label
abr <- paste("fam",1:length(tips), sep = "")
dat <- data.frame(tips, abr)
ntree <- sub.taxa.label(bird.families, dat)

library(ape)
data(bird.families)
tips <- bird.families$tip.label
abr <- paste("fam",1:length(tips), sep = "")
dat <- data.frame(tips, abr)
ntree <- sub.taxa.label(bird.families, dat)

Build PHYLIP supermatrix and RAxML partition file using aligned FASTA or PHYLIP files.

Description

Build PHYLIP supermatrix and create RAxML partition file using aligned fasta or phylip files.

Usage

supermat(infiles, outfile = "supermat.out.fas",
         partition.file = "gene_partition.txt")
supermat(infiles, outfile = "supermat.out.fas",
         partition.file = "gene_partition.txt")

Arguments

`infiles`	a character string vector for phylip or aligned fasta file.
`outfile`	the name of the PHYLIP supermatrix
`partition.file`	partition data summary describing the genes.

Details

Supermatrix here means a phylip file with combined aligned sequences. The missing sequences should be replaced with either "?" or "-".

Value

A list containing: (1)supermat.dat:a list containing all the data frames read by read.phylip or read.fasta (2)res.super.dat: a data frame containing the sequences and the names (3)partition.dat: summary for all the fasta or phylip files (4)partition.dat.vector: character string vector for the partition file for RAxML

Note

Punctuation characters and white space in the names of the sequences will be replaced by "_". More information can be found at regex. Type of the sequence in the RAxML partition file should be changed manually according to the manual of RAxML.

Author(s)

Jinlong Zhang <[email protected]>

References

Kress, W. J., Erickson, D. L., Jones, F. A., Swenson, N. G., Perez, R., Sanjur, O., & Bermingham, E. (2009). Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proceedings of the National Academy of Sciences, 106(44), 18621-18626.

de Queiroz, A.and Gatesy, J. (2007). The supermatrix approach to systematics. Trends in Ecology & Evolution, 22(1), 34-41.

https://github.com/stamatak/standard-RAxML

Examples


  cat("6 22",
  "seq_1    --TTACAAATTGACTTATTATA",
  "seq_2    GATTACAAATTGACTTATTATA",
  "seq_3    GATTACAAATTGACTTATTATA",
  "seq_5    GATTACAAATTGACTTATTATA",
  "seq_8    GATTACAAATTGACTTATTATA",
  "seq_10   ---TACAAATTGAATTATTATA",
  file = "matk.phy", sep = "\n")

  cat("5 15",
  "seq_1     GATTACAAATTGACT",
  "seq_3     GATTACAAATTGACT",
  "seq_4     GATTACAAATTGACT",
  "seq_5     GATTACAAATTGACT",
  "seq_8     GATTACAAATTGACT",
  file = "rbcla.phy", sep = "\n")

  cat("5 50",
  "seq_2          GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  "seq_3          GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  "seq_5          GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  "seq_8          ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  "seq_9          ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.phy", sep = "\n")

supermat(infiles = c("matk.phy", "rbcla.phy", "trn1.phy"))
unlink(c("matk.phy", "rbcla.phy", "trn1.phy"))
unlink(c("supermat.out.phy","gene_partition.txt"))

cat(
  ">seq_1",  "--TTACAAATTGACTTATTATA",
  ">seq_2",  "GATTACAAATTGACTTATTATA",
  ">seq_3",  "GATTACAAATTGACTTATTATA",
  ">seq_5",  "GATTACAAATTGACTTATTATA",
  ">seq_8",  "GATTACAAATTGACTTATTATA",
  ">seq_10", "---TACAAATTGAATTATTATA",
  file = "matk.fasta", sep = "\n")

cat(
  ">seq_1", "GATTACAAATTGACT",
  ">seq_3", "GATTACAAATTGACT",
  ">seq_4", "GATTACAAATTGACT",
  ">seq_5", "GATTACAAATTGACT",
  ">seq_8", "GATTACAAATTGACT",
  file = "rbcla.fasta", sep = "\n")

cat(
  ">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  ">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  ">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  ">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  ">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.fasta", sep = "\n")

supermat(infiles = c("matk.fasta", "rbcla.fasta", "trn1.fasta"))
unlink(c("matk.fasta", "rbcla.fasta", "trn1.fasta"))

unlink(c("supermat.out.phy","gene_partition.txt"))

cat("6 22",
  "seq_1    --TTACAAATTGACTTATTATA",
  "seq_2    GATTACAAATTGACTTATTATA",
  "seq_3    GATTACAAATTGACTTATTATA",
  "seq_5    GATTACAAATTGACTTATTATA",
  "seq_8    GATTACAAATTGACTTATTATA",
  "seq_10   ---TACAAATTGAATTATTATA",
  file = "matk.phy", sep = "\n")

  cat("5 15",
  "seq_1     GATTACAAATTGACT",
  "seq_3     GATTACAAATTGACT",
  "seq_4     GATTACAAATTGACT",
  "seq_5     GATTACAAATTGACT",
  "seq_8     GATTACAAATTGACT",
  file = "rbcla.phy", sep = "\n")

  cat("5 50",
  "seq_2          GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  "seq_3          GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  "seq_5          GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  "seq_8          ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  "seq_9          ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.phy", sep = "\n")

supermat(infiles = c("matk.phy", "rbcla.phy", "trn1.phy"))
unlink(c("matk.phy", "rbcla.phy", "trn1.phy"))
unlink(c("supermat.out.phy","gene_partition.txt"))

cat(
  ">seq_1",  "--TTACAAATTGACTTATTATA",
  ">seq_2",  "GATTACAAATTGACTTATTATA",
  ">seq_3",  "GATTACAAATTGACTTATTATA",
  ">seq_5",  "GATTACAAATTGACTTATTATA",
  ">seq_8",  "GATTACAAATTGACTTATTATA",
  ">seq_10", "---TACAAATTGAATTATTATA",
  file = "matk.fasta", sep = "\n")

cat(
  ">seq_1", "GATTACAAATTGACT",
  ">seq_3", "GATTACAAATTGACT",
  ">seq_4", "GATTACAAATTGACT",
  ">seq_5", "GATTACAAATTGACT",
  ">seq_8", "GATTACAAATTGACT",
  file = "rbcla.fasta", sep = "\n")

cat(
  ">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA",
  ">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG",
  ">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC",
  ">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG",
  ">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA",
  file = "trn1.fasta", sep = "\n")

supermat(infiles = c("matk.fasta", "rbcla.fasta", "trn1.fasta"))
unlink(c("matk.fasta", "rbcla.fasta", "trn1.fasta"))

unlink(c("supermat.out.phy","gene_partition.txt"))

Package 'phylotools'

Help Index

Phylogenetic tools for building PHYLIP supermatrix and more

Description

Details

Author(s)

Clean the name of a fasta file

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Convert and Save sequence data frame to fasta file

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Conver the data frame to sequential PHYLIP format file

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

get the names of all the sequences of fasta file

Description

Usage

Arguments

Value

Note

Author(s)

References

See Also

Examples

get the names of sequences from a PHYLIP file

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Read FASTA file

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

read phylip file

Description

Usage

Arguments

Details

Value

Note

Author(s)

See Also

Examples

Rename the sequences for a fasta file

Description

Usage