Satria A Kautsar authored
Added biosynthetic pfam domains not covered by the ECDomainMiner dataset
added a script to generate biosynthetic hmm library
Include 19 PFAMs missed by the ECDomainMiner into biopfam list
Added subpfam generating script
fix generate_database.py not copying final hmms from temp folder
added md5sum check for the generated databases
make subpfams directory if not exists
on_progress: sqlite3-based output storage
implemented chem_class mapping for sqlite3 storage
fix typo in antiSMASH5 candidate cluster type
update requirements.txt
Implements MIBiG 2.0 parser
added support for antiSMASH4 and plantiSMASH GBKs
fix typo in sql schema
Implements HMM database processing and storage
also downloads MIBiG GBKs when generating databases
update database wrapper and sql schema
locus_tag is not required for CDS
Implemented run management
put placeholders for run phases
fix generate_databases script
speed up cached re-runs
throw error for db-update cases: speed up inserts at the cost of limiting multiple inserts to one process
preserve db connection throughout the program to reduce overheads and enabled memory-based db
implements in-memory database manipulation (will be dumped into the db file after program ends)
update hmm.py class structure
update run.py structure
reimplements bgc.taxon storage
added sanity check for buffered inputs
generate_databases also do hmm_press
implemented hmmscanning (first half of it)
backup old database file when closing on memory mode rather than deleting it
add confirmation when database backup exists and using memory
completed biosyn_pfam scans handling
fix --resume mode
swap the confirmation message for output folder checks
fix program not reading the input folder
fix chem_class assignments
change class assignment of 'blactam'
implemented subpfam_scan (up to hmmscan calls) TODO: too much I/O
implements subpfam scans (up to hmmscanning)
fix bug in subpfam_scan
fix error type for antiSMASH4 clustergbks, use filename as the bgc name
remove print statement from debugging
be more verbose when loading / dumping in-memory database
apply cutoff thresholds for biosynpfam_scan and subpfam_scan
change schema structure, only save hsp alignments on biosyn_pfam_scans
fix get_chunk returning the wrong bins
fix bottleneck when querying list of bgcs and core pfams
fix error type
fix antiSMASH4 parsing
added parameter for setting up subpfam chunk size
dump database file in-between phases
optimize subpfam_scan queries
improve wording
fix multiprocessing only run on 1 core
perform hmmscan in chunks
assign mp_pool to each core
fix taskset
perform subpfam and hmmscan checking in chunks
fix error message in osx
hmmscan should commit only once per chunk
fix sql schema
store BGC's taxonomy level in the database
when using --mem and parsing input folder, only dump to file if there are new BGCs
added SQLite3 indexes
fix run status not updated
added features extraction
linux: set main process to use all pooled CPUs
remove redundant parameter in features class
core: implemented BIRCH-based clustering step
performance: build_subpfam now uses models-long sequences
bugfix: handle cases where no input gbk is included (reference-only)
update: subpfam_scan now use hsp sequences for hmmscan (rather than full gene) for optimization
core: change subpfam construction
update: sub_pfam features now only take top-3 hits and rank normalize it
fix: fix subpfam generation script
core: now include antiSMASH biosynthetic domains
core: apply sub_pfams generation for antiSMASH domains
fix: sub_pfam building will skip core domains without enough ref
Update README.md
fix: biosyn_pfam generation not taking correct antismash HMMs
fix: sub_pfam generation not taking the correct core domains
update: remove build_subpfam from requirements.txt
overhaul: optimize hsp parsing code
overhaul: when hmmscanning for subpfam, use --noali to save parsing time
overhaul: no longer use reference folder (mibig) for clustering. takes only BGCs inside the input folder
overhaul: organize input gbks into datasets
overhaul: now thresholds are adjustable
overhaul: implement logging
fix: parse only antiSMASH clustergbks/regiongbks
fix: parsed subpfam returns < 255 values
overhaul: for subpfam features, use max() instead of mean()
overhaul: for antiSMASH 5 gbks, use regions instead of candidate clusters
fix: also include mibig 2.0 clustergbk pattern
overhaul: when chunk_size is too large, use equally split chunks
overhaul: turned off db-checking for previously parsed datasets
overhaul: implement "download mode" as the default database initialization strategy
style: fix indentation
style: change exit style
style: fix indentation
style: remove unused import
style: change shebang line to python3
feature: enable "--complete" to build GCF centroids based on complete BGCs only
core: implements gcf membership assignment
style: split the gcf building and membership assignment blocks
overhaul: taxonomy data structure and input method overhaul
fix: taxonomy check failed to look in buffer
overhaul: --mem is now the default mode (replaced with --scratch to turn off)
overhaul: change table structure for bgc.orig_gbk_path
overhaul: remove run.num_resumes, should be trackable from run_log
misc: added LICENSE.txt
fix: leftover code
core: implements output visualization module
fix: sql query error
feature: added the ability to set port for output viz
ui: improve navigation layout
ui: added navigation auto show/collapse for sub items
ui: implement dataset page, visual improvements
ui: add pretty basic server-side datatables for bgc list in dataset page
ui: implements 'browse' datatable in dataset page
ui: incremental improvement
ui: implement ui for Runs
ui: implements overview block on "Runs" page
ui: incremental improvement
ui: incremental improvement
ui: incremental improvement
ui: implements table data fetch for "Runs-statistics"
ui: implements "Run-statistics" page
ui: implement data fetch for "Run-Browse"
ui: incremental improvement
ui: new logo, incremental improvement
overhaul: save incremental GCF IDs per run (for accession)
core: enables --n_ranks to set the number of top hits for membership assignment
overhaul: store bgc and gcf features in database, to enable query access
fix: error when --complete is turned on
ui: logo improvement
ui: logo fix
ui: incremental improvement
ui: logo improvement
ui: add favico, update logo
ui: add favico (html link)
ui: fix taxonomy information on "Dataset" page
ui: added "BGC" details page
ui: fix dataset page failed to recover on_contig_edge info
fix: error when --complete is turned on
core: remove typo on CDS fasta preparation
ui: added copy to clipboard functionality
ui: implemented "BGC -> annotations -> gene table"
ui: incremental improvement
ui: implements BGC page "features word cloud"
ui: implements GCF hits datatable for BGC page
ui: incremental improvement
ui: implements "homologous BGCs" datatable
ui: use local js file for plotlyJS
ui: adds arrower visualization for BGC annotation page
ui: don't show view button if domains not present (bgc-annotation)
ui: add modal opener
ui: improve modal function
ui: implements per-homolog arrower visualization
ui: use responsive datatables
ui: add GCF page (with a temporary visualization for demo purpose)
core: added the ability to specify specific run_id to --resume
fix: --resume failed when status = 7
fix: --resume not using the original parameters
fix: infinite loop when n < num_cpu
core: implements --query mode (todo: visualization)
fix: membership assignment fails when in regular mode
core: cache GCF models via pickle
fix: membership assignment takes all bgcs in regular runs
general: replace all occurrences of "bigsscuit" with "bigslice"
workflow: change installation method to setuptools
Update README.md
Update README.md
Update README.md
fix: folder structure re-organization to support setuptools
Update README.md
core: add sql indexing for hsp_alignment
fix: setuptools packaging
fix: setuptools packaging
core: implements --version command
core: store hsp relationship from subpfam -> core pfam hits
Update README.md
fix: --version exception catching
git: update .gitignore
core: update db_download script
db: update HMM libraries md5 checksum
ui: enable showing subpfam clades annotation in the arrowers
ui: shows sub_pfam signatures on BGC's gene table domain details
fix: sql schema referencing for --query mode
ui: temporarily hide GCF hits bar chart and heatmap blocks until its implementation later
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
misc: add input template folder
ui: implements gcf view (with members datatables and arrowers)
misc: add images for github readme
misc: resize image
misc: added picture for readme
misc: replace figure_1 with higher resolution version
ui: delete figures, github won't support full res image previews
ui: delete unused gcf demo template
ui: implements help & reports page
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
Update README.md
git: update .gitignore
Update README.md
ui: update version
Update README.md
ui: added dummy placeholders for help page
core: refactor reports folder and data structure
misc: tidy up --help messages
core: change default n_ranks to 5
core: change default --threshold to 300
core: store run id in reports index
ui: implements query results page (overview)
fix: reports query detail view not copied
ui: incremental improvement (query overview page)
ui: implements gcf -> models page
ui: bgc page -> color gcf & bgc rows based on distance threshold
ui: update FAQ
ui: summary page -> incremental improvement
ui: gcf detail page --> implement graphs
ui: incremental improvement
ui: temporarily disable 'how it works' section until we fill it with contents
ui: add feedback message
ui: fix feedback view page linking
ui: add 'about' page
ui: do not show bgc class with count < 1 on the dataset page
ui: change link to LICENSE.txt
ui: add --query mode explanation
ui: implements query detail page (1 of 2)
ui: implements query detail: gcf hits table
ui: implements query detail -> homologous bgcs
ui: show threshold in query overview
ui: add query similar bgc arrower
ui: fix query detail similar bgcs showing the wrong counts
ui: another fix
ui: fix bgc homology table not displaying the correct pagination
misc: add long_description to setup.py
Update README.md
fix: slowdown when fetching data due to lack of indexing (bump schema version to 1.0.1)
Update README.md
fix: regex extraction of sql schema
fix: taxonomy parser assigning double taxa for e.g. A143/ when A14/ is present
fix: download_bigslice_hmmdb doesn't work correctly when used from source
misc: add ascii-art for help text
sql: bump query report sql version to 1.0.1
bump big-slice version to 1.1.0
misc: update download link for hmm models
Update README.md
fix: input error when running fresh
Update README.md
Update README.md
fix broken link on pre-processed input data
Update README.md
Update README.md
Update README.md
misc: added a script for generating custom antiSMASH GenBank files
Update README.md
misc: added script for generating taxonomy from GTDB-API
misc: calling --version does not need to present input folder path
fix: html output summary page breaks when results contain runs with no clustering
Select cpus from os.sched_getaffinity() (#30)
Fix: on a multi-user cluster with cgroups enforcing CPU affinity, the taskset lines fail, as cores not assigned to the job may be selected.
fix: taxonomy script throws error from empty GTDB-API results (Resolves #33)
fix: Resolves #36
misc: track HMM databases versions and display accordingly
misc: limit biopython required version for up to v1.76
input: accepts antiSMASH6 regiongbks
core: treat MIBiG BGCs as incomplete (therefore won't be used to generate GCF models)
core: sort BGC features low-to-high before feeding into BIRCH algorithm
misc: change default --n_ranks to 1
release: update version to 1.1.1
misc: added a script to extract features matrix into a tsv file
misc: added a script to extract bgcs' metadata into a tsv file
update citation
core: add classes mapping for antiSMASH 6.0 BGCs
a4c00e8a
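
The log entry "Select cpus from os.sched_getaffinity() (#30)" describes choosing worker CPUs from the set the process is actually allowed to run on, so that a cgroup-restricted job never pins itself to a core it does not own. The following is a minimal sketch of that idea, not the project's actual code: the function names `available_cpus` and `pick_cpus` are hypothetical, and the fallback for platforms without `os.sched_getaffinity` (such as macOS) is an assumption.

```python
import os


def available_cpus():
    """Return the sorted CPU ids this process is allowed to run on.

    Hypothetical helper for illustration only.
    """
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours affinity masks imposed by cgroups, taskset,
        # or a cluster scheduler.
        return sorted(os.sched_getaffinity(0))
    # Assumed fallback where sched_getaffinity is unavailable:
    # treat every CPU reported by the OS as usable.
    return list(range(os.cpu_count() or 1))


def pick_cpus(requested):
    """Pick at most `requested` CPU ids (at least one) from the allowed set."""
    cpus = available_cpus()
    count = max(1, min(requested, len(cpus)))
    return cpus[:count]


if __name__ == "__main__":
    print("allowed CPUs:", available_cpus())
    print("using CPUs:", pick_cpus(4))
```

Ids returned this way could then be handed to whatever pinning mechanism is in use (for example a `taskset` call), since they are guaranteed to lie inside the job's affinity mask.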
This project is licensed under the GNU Affero General Public License v3.0.