LICENSE.txt · master · Satria Ardhe Kautsar / bigslice_v2 · GitLab

added original BiG-SLiCE v1.1.1 code (squashed all previous commits) · a4c00e8a

Satria A Kautsar authored Oct 14, 2019

Added biosynthetic pfam domains not covered by the ECDomainMiner dataset

added a script to generate biosynthetic hmm library

Include 19 PFAMs missed by the ECDomainMiner into biopfam list

Added subpfam generating script

fix generate_database.py not copying final hmms from temp folder

added md5sum check for the generated databases
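
A checksum check of this kind can be sketched as follows. This is a minimal illustration using Python's standard `hashlib`, not the project's actual code: the helper names `md5sum` and `verify_md5` are hypothetical, and where the expected checksum is stored is an assumption.

```python
import hashlib


def md5sum(path, block_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Hypothetical helper: compare a file's digest with an expected value."""
    return md5sum(path) == expected_hex.strip().lower()
```

Reading in blocks keeps memory use flat even for large HMM library files.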

make subpfams directory if not exists

on_progress: sqlite3-based output storage

implemented chem_class mapping for sqlite3 storage

fix typo in antiSMASH5 candidate cluster type

update requirements.txt

Implements MIBiG 2.0 parser

added support for antiSMASH4 and plantiSMASH GBKs

fix typo in sql schema

Implements HMM database processing and storage

also downloads MIBiG GBKs when generating databases

update database wrapper and sql schema

locus_tag is not required for CDS

Implemented run management

put placeholders for run phases

fix generate_databases script

speed up cached re-runs

throw error for db-update cases:

speed up inserts at the cost of limiting multiple inserts to one process

preserve db connection throughout the program to reduce overheads and enabled memory-based db

implements in-memory database manipulation (will be dumped into the db file after program ends)
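
The in-memory approach described here can be sketched with the standard `sqlite3` module. This is only an illustration under assumptions: the function names `load_into_memory` and `dump_to_file` are hypothetical, and it relies on `sqlite3.Connection.backup`, which requires Python 3.7 or later; the project's own implementation may differ.

```python
import os
import sqlite3


def load_into_memory(db_path):
    """Open an in-memory database, pre-filled from db_path if it exists."""
    mem = sqlite3.connect(":memory:")
    if os.path.exists(db_path):
        disk = sqlite3.connect(db_path)
        disk.backup(mem)  # copy every page from the file into memory
        disk.close()
    return mem


def dump_to_file(mem, db_path):
    """Write the whole in-memory database back to db_path."""
    disk = sqlite3.connect(db_path)
    mem.backup(disk)  # overwrites the contents of the on-disk database
    disk.close()
```

All reads and writes during the run then go through the single `mem` connection, and the file is touched only at load time and at dump time.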

update hmm.py class structure

update run.py structure

reimplements bgc.taxon storage

added sanity check for buffered inputs

generate_databases also does hmm_press

implemented hmmscanning (first half of it)

backup old database file when closing on memory mode rather than deleting it

add confirmation when database backup exists and using memory

completed biosyn_pfam scans handling

fix --resume mode

swap the confirmation message for output folder checks

fix program not reading the input folder

fix chem_class assignments

change class assignment of 'blactam'

implemented subpfam_scan (up to hmmscan calls) TODO: too much I/O

implements subpfam scans (up to hmmscanning)

fix bug in subpfam_scan

fix error type

for antiSMASH4 clustergbks, use filename as the bgc name

remove print statement from debugging

be more verbose when loading / dumping in-memory database

apply cutoff thresholds for biosynpfam_scan and subpfam_scan

change schema structure, only save hsp alignments on biosyn_pfam_scans

fix get_chunk returning the wrong bins

fix bottleneck when querying list of bgcs and core pfams

fix error type

fix antiSMASH4 parsing

added parameter for setting up subpfam chunk size

dump database file in-between phases

optimize subpfam_scan queries

improve wording

fix multiprocessing only running on 1 core

perform hmmscan in chunks

assign mp_pool to each core

fix taskset

perform subpfam and hmmscan checking in chunks

fix error message in osx

hmmscan should commit only once per chunk

fix sql schema

store BGC's taxonomy level in the database

when using --mem and parsing input folder, only dump to file if there are new BGCs

added SQLite3 indexes

fix run status not updated

added features extraction

linux: set main process to use all pooled CPUs

remove redundant parameter in features class

core: implemented BIRCH-based clustering step

performance: build_subpfam now uses models-long sequences

bugfix: handle cases where no input gbk is included (reference-only)

update: subpfam_scan now uses hsp sequences for hmmscan (rather than full gene) for optimization

core: change subpfam construction

update: sub_pfam features now only take top-3 hits and rank normalize it

fix: fix subpfam generation script

core: now include antiSMASH biosynthetic domains

core: apply sub_pfams generation for antiSMASH domains

fix: sub_pfam building will skip core domains without enough ref

Update README.md

fix: biosyn_pfam generation not taking correct antismash HMMs

fix: sub_pfam generation not taking the correct core domains

update: remove build_subpfam from requirements.txt

overhaul: optimize hsp parsing code

overhaul: when hmmscanning for subpfam, use --noali to save parsing time

overhaul: no longer use reference folder (mibig) for clustering. takes only BGCs inside the input folder

overhaul: organize input gbks into datasets

overhaul: now thresholds are adjustable

overhaul: implement logging

fix: parse only antiSMASH clustergbks/regiongbks

fix: parsed subpfam returns < 255 values

overhaul: for subpfam features, use max() instead of mean()

overhaul: for antiSMASH 5 gbks, use regions instead of candidate clusters

fix: also include mibig 2.0 clustergbk pattern

overhaul: when chunk_size is too large, use equally split chunks
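
One way to read this change is sketched below. The commit message does not say what counts as "too large", so the rule used here (the requested chunk size would leave some workers without a chunk) is an assumption, and the function name `make_chunks` is hypothetical.

```python
import math


def make_chunks(items, chunk_size, num_workers):
    """Split items into chunks, shrinking chunk_size if workers would idle."""
    if chunk_size < 1 or num_workers < 1:
        raise ValueError("chunk_size and num_workers must be >= 1")
    if not items:
        return []
    # Assumed rule: if the requested size cannot give every worker a chunk,
    # fall back to an (almost) equal split across the workers.
    if chunk_size * num_workers > len(items):
        chunk_size = math.ceil(len(items) / num_workers)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
```

Because the list is non-empty when the fallback runs, the recomputed chunk size is always at least 1, so the split terminates even when there are fewer items than workers.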

overhaul: turned off db-checking for previously parsed datasets

overhaul: implement "download mode" as the default database initialization strategy

style: fix indentation

style: change exit style

style: fix indentation

style: remove unused import

style: change shebang line to python3

feature: enable "--complete" to build GCF centroids based on complete BGCs only

core: implements gcf membership assignment

style: split the gcf building and membership assignment blocks

overhaul: taxonomy data structure and input method overhaul

fix: taxonomy check failed to look in buffer

overhaul: --mem is now the default mode (replaced with --scratch to turn off)

overhaul: change table structure for bgc.orig_gbk_path

overhaul: remove run.num_resumes, should be trackable from run_log

misc: added LICENSE.txt

fix: leftover code

core: implements output visualization module

fix: sql query error

feature: added the ability to set port for output viz

ui: improve navigation layout

ui: added navigation auto show/collapse for sub items

ui: implement dataset page, visual improvements

ui: add pretty basic server-side datatables for bgc list in dataset page

ui: implements 'browse' datatable in dataset page

ui: incremental improvement

ui: implement ui for Runs

ui: implements overview block on "Runs" page

ui: incremental improvement

ui: incremental improvement

ui: incremental improvement

ui: implements table data fetch for "Runs-statistics"

ui: implements "Run-statistics" page

ui: implement data fetch for "Run-Browse"

ui: incremental improvement

ui: new logo, incremental improvement

overhaul: save incremental GCF IDs per run (for accession)

core: enables --n_ranks to set the number of top hits for membership assignment

overhaul: store bgc and gcf features in database, to enable query access

fix: error when --complete is turned on

ui: logo improvement

ui: logo fix

ui: incremental improvement

ui: logo improvement

ui: add favico, update logo

ui: add favico (html link)

ui: fix taxonomy information on "Dataset" page

ui: added "BGC" details page

ui: fix dataset page failed to recover on_contig_edge info

fix: error when --complete is turned on

core: remove typo on CDS fasta preparation

ui: added copy to clipboard functionality

ui: implemented "BGC -> annotations -> gene table"

ui: incremental improvement

ui: implements BGC page "features word cloud"

ui: implements GCF hits datatable for BGC page

ui: incremental improvement

ui: implements "homologous BGCs" datatable

ui: use local js file for plotlyJS

ui: adds arrower visualization for BGC annotation page

ui: don't show view button if domains not present (bgc-annotation)

ui: add modal opener

ui: improve modal function

ui: implements per-homolog arrower visualization

ui: use responsive datatables

ui: add GCF page (with a temporary visualization for demo purpose)

core: added the ability to specify specific run_id to --resume

fix: --resume failed when status = 7

fix: --resume not using the original parameters

fix: infinite loop when n < num_cpu

core: implements --query mode (todo: visualization)

fix: membership assignment fails when in regular mode

core: cache GCF models via pickle

fix: membership assignment takes all bgcs in regular runs

general: replace all occurrences of "bigsscuit" with "bigslice"

workflow: change installation method to setuptools

Update README.md

Update README.md

Update README.md

fix: folder structure re-organization to support setuptools

Update README.md

core: add sql indexing for hsp_alignment

fix: setuptools packaging

fix: setuptools packaging

core: implements --version command

core: store hsp relationship from subpfam -> core pfam hits

Update README.md

fix: --version exception catching

git: update .gitignore

core: update db_download script

db: update HMM libraries md5 checksum

ui: enable showing subpfam clades annotation in the arrowers

ui: shows sub_pfam signatures on BGC's gene table domain details

fix: sql schema referencing for --query mode

ui: temporarily hide GCF hits bar chart and heatmap blocks until its implementation later

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

misc: add input template folder

ui: implements gcf view (with members datatables and arrowers)

misc: add images for github readme

misc: resize image

misc: added picture for readme

misc: replace figure_1 with higher resolution version

ui: delete figures, github won't support full res image previews

ui: delete unused gcf demo template

ui: implements help & reports page

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

Update README.md

git: update .gitignore

Update README.md

ui: update version

Update README.md

ui: added dummy placeholders for help page

core: refactor reports folder and data structure

misc: tidy up --help messages

core: change default n_ranks to 5

core: change default --threshold to 300

core: store run id in reports index

ui: implements query results page (overview)

fix: reports query detail view not copied

ui: incremental improvement (query overview page)

ui: implements gcf -> models page

ui: bgc page -> color gcf & bgc rows based on distance threshold

ui: update FAQ

ui: summary page -> incremental improvement

ui: gcf detail page --> implement graphs

ui: incremental improvement

ui: temporarily disable 'how it works' section until we fill it with contents

ui: add feedback message

ui: fix feedback view page linking

ui: add 'about' page

ui: do not show bgc class with count < 1 on the dataset page

ui: change link to LICENSE.txt

ui: add --query mode explanation

ui: implements query detail page (1 of 2)

ui: implements query detail: gcf hits table

ui: implements query detail -> homologous bgcs

ui: show threshold in query overview

ui: add query similar bgc arrower

ui: fix query detail similar bgcs showing the wrong counts

ui: another fix

ui: fix bgc homology table not displaying the correct pagination

misc: add long_description to setup.py

Update README.md

fix: slowdown when fetching data due to lack of indexing (bump schema version to 1.0.1)

Update README.md

fix: regex extraction of sql schema

fix: taxonomy parser assigning double taxa for e.g. A143/ when A14/ is present

fix: download_bigslice_hmmdb doesn't work correctly when used from source

misc: add ascii-art for help text

sql: bump query report sql version to 1.0.1

bump big-slice version to 1.1.0

misc: update download link for hmm models

Update README.md

fix: input error when running fresh

Update README.md

Update README.md

fix broken link on pre-processed input data

Update README.md

Update README.md

Update README.md

misc: added a script for generating custom antiSMASH GenBank files

Update README.md

misc: added script for generating taxonomy from GTDB-API

misc: calling --version does not need to present input folder path

fix: html output summary page breaks when results contain runs with no clustering

Select cpus from os.sched_getaffinity() (#30)

Fix: on a multi-user cluster with cgroups enforcing CPU affinity, the taskset lines fail, as cores not assigned to the job may be selected.
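
The idea behind this fix can be sketched as follows: ask the operating system which CPUs the process is allowed to use, and choose only from that set. `os.sched_getaffinity` is a real standard-library call but exists only on some platforms (notably Linux); the helper names `available_cpus` and `pick_cpus` and the fallback behavior are assumptions made for illustration.

```python
import os


def available_cpus():
    """Return the sorted CPU ids this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honors affinity masks set by cgroups, taskset, or a scheduler.
        return sorted(os.sched_getaffinity(0))
    # Assumed fallback for platforms without sched_getaffinity (e.g. macOS).
    return list(range(os.cpu_count() or 1))


def pick_cpus(num_requested):
    """Pick at most num_requested CPU ids, always at least one."""
    cpus = available_cpus()
    return cpus[:max(1, min(num_requested, len(cpus)))]
```

Selecting from the affinity set rather than from `range(os.cpu_count())` avoids pinning workers to cores that the job was never granted.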

fix: taxonomy script throws error from empty GTDB-API results (Resolves #33)

fix: Resolves #36

misc: track HMM databases versions and display accordingly

misc: limit biopython required version to at most v1.76

input: accepts antiSMASH6 regiongbks

core: treat MIBiG BGCs as incomplete (therefore won't be used to generate GCF models)

core: sort BGC features low-to-high before feeding into BIRCH algorithm

misc: change default --n_ranks to 1

release: update version to 1.1.1

misc: added a script to extract features matrix into a tsv file

misc: added a script to extract bgcs' metadata into a tsv file

update citation

core: add classes mapping for antiSMASH 6.0 BGCs

This project is licensed under the GNU Affero General Public License v3.0.