BUSCO: A Tool for Assessing Transcriptome and Genome Assembly Quality

Date: 2024-10-24 20:59:34

1. Usage: python BUSCO.py -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

2. Required arguments:
-i FASTA FILE, --in FASTA FILE
    Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set. This is the assembled file to be evaluated.
-o OUTPUT, --out OUTPUT
    Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path. This is a bare name only.
-m MODE, --mode MODE
    Specify which BUSCO analysis mode to run. There are three valid modes:
    - geno or genome, for genome assemblies (DNA)
    - tran or transcriptome, for transcriptome assemblies (DNA)
    - prot or proteins, for annotated gene sets (protein)
-l LINEAGE, --lineage LINEAGE
    Specify location of the BUSCO lineage data to be used. Visit http://busco.ezlab.org for available lineages. This is the database the input is compared against.
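The two constraints above that are easiest to get wrong (the mode aliases and the "no path" rule for -o) can be checked before launching a run. The following is a minimal sketch; the helper names `normalize_mode` and `check_output_name` are hypothetical and are not part of BUSCO itself.

```python
import os

# Mode aliases exactly as listed in the help text above.
MODE_ALIASES = {
    "geno": "genome", "genome": "genome",
    "tran": "transcriptome", "transcriptome": "transcriptome",
    "prot": "proteins", "proteins": "proteins",
}


def normalize_mode(mode):
    """Return the long mode name, or raise ValueError for an unknown mode."""
    try:
        return MODE_ALIASES[mode]
    except KeyError:
        raise ValueError("unknown BUSCO mode: %r" % mode)


def check_output_name(name):
    """Reject -o values that contain a path component (hypothetical helper)."""
    if not name or name in (".", "..") or os.path.basename(name) != name:
        raise ValueError("output name must not contain a path: %r" % name)
    return name
```

Calling `check_output_name("L")` returns the name unchanged, while a value such as `"results/L"` raises `ValueError`.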

3. Optional arguments:
-c N, --cpu N
    Specify the number (N=integer) of threads/cores to use.
-e N, --evalue N
    E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
-f, --force
    Force rewriting of existing files. Must be used when output files with the provided name already exist.
-r, --restart
    Restart an uncompleted run. Not available for the protein mode.
-sp SPECIES, --species SPECIES
    Name of existing Augustus species gene finding parameters. See Augustus documentation for available options.
--augustus_parameters AUGUSTUS_PARAMETERS
    Additional parameters for the fine-tuning of Augustus run. For the species, do not use this option. Use single quotes as follow: '--param1=1 --param2=2', see Augustus documentation for available options.
-t PATH, --tmp PATH
    Where to store temporary files (Default: ./tmp)
--limit REGION_LIMIT
    How many candidate regions to consider (default: 3)
--long
    Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms.
-q, --quiet
    Disable the info logs, displays only errors.
-z, --tarzip
    Tarzip the output folders likely to contain thousands of files.
-v, --version
    Show this version and exit.
-h, --help
    Show this help message and exit.

4. Example: /USER/xwf/software/busco/BUSCO.py -i ../bridger_out_dir/Bridger.fasta -o L -l /USER/xwf/database/eukaryota_odb9 -m tran -c 30 -f -e 1e-10
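When BUSCO is run from a pipeline, it can help to assemble the same command as an argument list instead of a hand-typed string. The sketch below only builds the list and uses only flags shown in the help text above; the function name `build_busco_command` is hypothetical, and the paths are the ones from the example.

```python
import shlex


def build_busco_command(script, fasta, out_name, lineage, mode,
                        cpus=None, evalue=None, force=False):
    """Assemble the argument list for a BUSCO.py run (does not execute it)."""
    cmd = ["python", script,
           "-i", fasta, "-o", out_name, "-l", lineage, "-m", mode]
    if cpus is not None:
        cmd += ["-c", str(cpus)]      # number of threads/cores
    if evalue is not None:
        cmd += ["-e", str(evalue)]    # BLAST E-value cutoff
    if force:
        cmd.append("-f")              # overwrite existing output
    return cmd


cmd = build_busco_command(
    "/USER/xwf/software/busco/BUSCO.py",
    "../bridger_out_dir/Bridger.fasta",
    "L",
    "/USER/xwf/database/eukaryota_odb9",
    "tran",
    cpus=30, evalue="1e-10", force=True,
)
# Print a shell-safe rendering of the command for logging.
print(" ".join(shlex.quote(part) for part in cmd))
```

On a machine where BUSCO is installed, the list could be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with unusual file names.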

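5. The run produces two items: run_L (named after the output name L set in the example above) and tmp. The file to check is short_summary_L.txt inside run_L, where S = Single copy, D = Duplicated, F = Fragmented, M = Missing. The combined S + D value should not be too low. The database BUSCO uses consists of conserved proteins from related species, so a good assembly should recover a substantial share of those conserved proteins.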
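The S + D check can be scripted. The sketch below assumes the summary file contains a one-line score of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`, which is not shown in this article, so the pattern should be verified against a real short_summary file. The sample numbers are made up for illustration, and `parse_busco_summary` is a hypothetical helper.

```python
import re

# Assumed layout of the score line in short_summary_*.txt (verify locally).
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Extract the C/S/D/F/M percentages and the total n from summary text."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    scores = {key: float(value)
              for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


# Made-up sample line, not taken from a real run.
sample = "\tC:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023"
scores = parse_busco_summary(sample)
complete = scores["S"] + scores["D"]  # the S + D value discussed above
```

In practice the text would come from reading run_L/short_summary_L.txt, and `complete` could be compared against whatever threshold the project considers acceptable.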
