    added original BiG-SLiCE v1.1.1 code (squashed all previous commits) · a4c00e8a
    Satria A Kautsar authored
    Added biosynthetic pfam domains not covered by the ECDomainMiner dataset
    
    added a script to generate biosynthetic hmm library
    
Include 19 PFAMs missed by the ECDomainMiner in the biopfam list
    
    Added subpfam generating script
    
    fix generate_database.py not copying final hmms from temp folder
    
    added md5sum check for the generated databases
    
make subpfams directory if it does not exist
    
    on_progress: sqlite3-based output storage
    
    implemented chem_class mapping for sqlite3 storage
    
    fix typo in antiSMASH5 candidate cluster type
    
    update requirements.txt
    
    Implements MIBiG 2.0 parser
    
    added support for antiSMASH4 and plantiSMASH GBKs
    
    fix typo in sql schema
    
    Implements HMM database processing and storage
    
    also downloads MIBiG GBKs when generating databases
    
    update database wrapper and sql schema
    
    locus_tag is not required for CDS
    
    Implemented run management
    
    put placeholders for run phases
    
    fix generate_databases script
    
    speed up cached re-runs
    
    throw error for db-update cases:
    
    speed up inserts at the cost of limiting multiple inserts to one process
    
preserve db connection throughout the program to reduce overheads and enable memory-based db
    
    implements in-memory database manipulation (will be dumped into the db file after program ends)
    
    update hmm.py class structure
    
    update run.py structure
    
    reimplements bgc.taxon storage
    
    added sanity check for buffered inputs
    
generate_databases also does hmm_press
    
    implemented hmmscanning (first half of it)
    
    backup old database file when closing on memory mode rather than deleting it
    
    add confirmation when database backup exists and using memory
    
    completed biosyn_pfam scans handling
    
    fix --resume mode
    
    swap the confirmation message for output folder checks
    
    fix program not reading the input folder
    
    fix chem_class assignments
    
    change class assignment of 'blactam'
    
    implemented subpfam_scan (up to hmmscan calls) TODO: too much I/O
    
    implements subpfam scans (up to hmmscanning)
    
    fix bug in subpfam_scan
    
    fix error type
    
    for antiSMASH4 clustergbks, use filename as the bgc name
    
    remove print statement from debugging
    
    be more verbose when loading / dumping in-memory database
    
    apply cutoff thresholds for biosynpfam_scan and subpfam_scan
    
    change schema structure, only save hsp alignments on biosyn_pfam_scans
    
    fix get_chunk returning the wrong bins
    
    fix bottleneck when querying list of bgcs and core pfams
    
    fix error type
    
    fix antiSMASH4 parsing
    
    added parameter for setting up subpfam chunk size
    
    dump database file in-between phases
    
    optimize subpfam_scan queries
    
    improve wording
    
fix multiprocessing only running on 1 core
    
    perform hmmscan in chunks
    
    assign mp_pool to each core
    
    fix taskset
    
    perform subpfam and hmmscan checking in chunks
    
    fix error message in osx
    
    hmmscan should commit only once per chunk
    
    fix sql schema
    
    store BGC's taxonomy level in the database
    
    when using --mem and parsing input folder, only dump to file if there are new BGCs
    
    added SQLite3 indexes
    
    fix run status not updated
    
    added features extraction
    
    linux: set main process to use all pooled CPUs
    
    remove redundant parameter in features class
    
    core: implemented BIRCH-based clustering step
    
performance: build_subpfam now uses models-long sequences
    
    bugfix: handle cases where no input gbk is included (reference-only)
    
update: subpfam_scan now uses hsp sequences for hmmscan (rather than full gene) for optimization
    
    core: change subpfam construction
    
update: sub_pfam features now only take top-3 hits and rank-normalize them
    
    fix: fix subpfam generation script
    
    core: now include antiSMASH biosynthetic domains
    
    core: apply sub_pfams generation for antiSMASH domains
    
    fix: sub_pfam building will skip core domains without enough ref
    
    Update README.md
    
    fix: biosyn_pfam generation not taking correct antismash HMMs
    
    fix: sub_pfam generation not taking the correct core domains
    
    update: remove build_subpfam from requirements.txt
    
    overhaul: optimize hsp parsing code
    
    overhaul: when hmmscanning for subpfam, use --noali to save parsing time
    
    overhaul: no longer use reference folder (mibig) for clustering. takes only BGCs inside the input folder
    
    overhaul: organize input gbks into datasets
    
    overhaul: now thresholds are adjustable
    
    overhaul: implement logging
    
    fix: parse only antiSMASH clustergbks/regiongbks
    
    fix: parsed subpfam returns < 255 values
    
    overhaul: for subpfam features, use max() instead of mean()
    
    overhaul: for antiSMASH 5 gbks, use regions instead of candidate clusters
    
    fix: also include mibig 2.0 clustergbk pattern
    
    overhaul: when chunk_size is too large, use equally split chunks
    
    overhaul: turned off db-checking for previously parsed datasets
    
    overhaul: implement "download mode" as the default database initialization strategy
    
    style: fix indentation
    
    style: change exit style
    
    style: fix indentation
    
    style: remove unused import
    
    style: change shebang line to python3
    
    feature: enable "--complete" to build GCF centroids based on complete BGCs only
    
    core: implements gcf membership assignment
    
    style: split the gcf building and membership assignment blocks
    
    overhaul: taxonomy data structure and input method overhaul
    
    fix: taxonomy check failed to look in buffer
    
    overhaul: --mem is now the default mode (replaced with --scratch to turn off)
    
    overhaul: change table structure for bgc.orig_gbk_path
    
    overhaul: remove run.num_resumes, should be trackable from run_log
    
    misc: added LICENSE.txt
    
    fix: leftover code
    
    core: implements output visualization module
    
    fix: sql query error
    
    feature: added the ability to set port for output viz
    
    ui: improve navigation layout
    
    ui: added navigation auto show/collapse for sub items
    
    ui: implement dataset page, visual improvements
    
    ui: add pretty basic server-side datatables for bgc list in dataset page
    
    ui: implements 'browse' datatable in dataset page
    
    ui: incremental improvement
    
    ui: implement ui for Runs
    
    ui: implements overview block on "Runs" page
    
    ui: incremental improvement
    
    ui: incremental improvement
    
    ui: incremental improvement
    
    ui: implements table data fetch for "Runs-statistics"
    
    ui: implements "Run-statistics" page
    
    ui: implement data fetch for "Run-Browse"
    
    ui: incremental improvement
    
    ui: new logo, incremental improvement
    
    overhaul: save incremental GCF IDs per run (for accession)
    
    core: enables --n_ranks to set the number of top hits for membership assignment
    
    overhaul: store bgc and gcf features in database, to enable query access
    
    fix: error when --complete is turned on
    
    ui: logo improvement
    
    ui: logo fix
    
    ui: incremental improvement
    
    ui: logo improvement
    
ui: add favicon, update logo
    
ui: add favicon (html link)
    
    ui: fix taxonomy information on "Dataset" page
    
    ui: added "BGC" details page
    
    ui: fix dataset page failed to recover on_contig_edge info
    
    fix: error when --complete is turned on
    
    core: remove typo on CDS fasta preparation
    
    ui: added copy to clipboard functionality
    
    ui: implemented "BGC -> annotations -> gene table"
    
    ui: incremental improvement
    
    ui: implements BGC page "features word cloud"
    
    ui: implements GCF hits datatable for BGC page
    
    ui: incremental improvement
    
    ui: implements "homologous BGCs" datatable
    
    ui: use local js file for plotlyJS
    
    ui: adds arrower visualization for BGC annotation page
    
    ui: don't show view button if domains not present (bgc-annotation)
    
    ui: add modal opener
    
    ui: improve modal function
    
    ui: implements per-homolog arrower visualization
    
    ui: use responsive datatables
    
    ui: add GCF page (with a temporary visualization for demo purpose)
    
    core: added the ability to specify specific run_id to --resume
    
    fix: --resume failed when status = 7
    
    fix: --resume not using the original parameters
    
    fix: infinite loop when n < num_cpu
    
    core: implements --query mode (todo: visualization)
    
    fix: membership assignment fails when in regular mode
    
    core: cache GCF models via pickle
    
    fix: membership assignment takes all bgcs in regular runs
    
general: replace all occurrences of "bigsscuit" with "bigslice"
    
    workflow: change installation method to setuptools
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    fix: folder structure re-organization to support setuptools
    
    Update README.md
    
    core: add sql indexing for hsp_alignment
    
    fix: setuptools packaging
    
    fix: setuptools packaging
    
    core: implements --version command
    
    core: store hsp relationship from subpfam -> core pfam hits
    
    Update README.md
    
    fix: --version exception catching
    
    git: update .gitignore
    
    core: update db_download script
    
    db: update HMM libraries md5 checksum
    
    ui: enable showing subpfam clades annotation in the arrowers
    
    ui: shows sub_pfam signatures on BGC's gene table domain details
    
    fix: sql schema referencing for --query mode
    
ui: temporarily hide GCF hits bar chart and heatmap blocks until they are implemented
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    misc: add input template folder
    
    ui: implements gcf view (with members datatables and arrowers)
    
    misc: add images for github readme
    
    misc: resize image
    
    misc: added picture for readme
    
misc: replace figure_1 with higher resolution version
    
    ui: delete figures, github won't support full res image previews
    
    ui: delete unused gcf demo template
    
    ui: implements help & reports page
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    git: update .gitignore
    
    Update README.md
    
    ui: update version
    
    Update README.md
    
    ui: added dummy placeholders for help page
    
    core: refactor reports folder and data structure
    
    misc: tidy up --help messages
    
    core: change default n_ranks to 5
    
    core: change default --threshold to 300
    
    core: store run id in reports index
    
    ui: implements query results page (overview)
    
    fix: reports query detail view not copied
    
    ui: incremental improvement (query overview page)
    
    ui: implements gcf -> models page
    
    ui: bgc page -> color gcf & bgc rows based on distance threshold
    
    ui: update FAQ
    
    ui: summary page -> incremental improvement
    
    ui: gcf detail page --> implement graphs
    
    ui: incremental improvement
    
    ui: temporarily disable 'how it works' section until we fill it with contents
    
    ui: add feedback message
    
    ui: fix feedback view page linking
    
    ui: add 'about' page
    
    ui: do not show bgc class with count < 1 on the dataset page
    
    ui: change link to LICENSE.txt
    
    ui: add --query mode explanation
    
    ui: implements query detail page (1 of 2)
    
    ui: implements query detail: gcf hits table
    
    ui: implements query detail -> homologous bgcs
    
    ui: show threshold in query overview
    
    ui: add query similar bgc arrower
    
    ui: fix query detail similar bgcs showing the wrong counts
    
    ui: another fix
    
    ui: fix bgc homology table not displaying the correct pagination
    
    misc: add long_description to setup.py
    
    Update README.md
    
    fix: slowdown when fetching data due to lack of indexing (bump schema version to 1.0.1)
    
    Update README.md
    
    fix: regex extraction of sql schema
    
    fix: taxonomy parser assigning double taxa for e.g. A143/ when A14/ is present
    
    fix: download_bigslice_hmmdb doesn't work correctly when used from source
    
    misc: add ascii-art for help text
    
    sql: bump query report sql version to 1.0.1
    
    bump big-slice version to 1.1.0
    
    misc: update download link for hmm models
    
    Update README.md
    
    fix: input error when running fresh
    
    Update README.md
    
    Update README.md
    
    fix broken link on pre-processed input data
    
    Update README.md
    
    Update README.md
    
    Update README.md
    
    misc: added a script for generating custom antiSMASH GenBank files
    
    Update README.md
    
    misc: added script for generating taxonomy from GTDB-API
    
misc: calling --version does not require the input folder path
    
    fix: html output summary page breaks when results contain runs with no clustering
    
    Select cpus from os.sched_getaffinity() (#30)
    
    Fix: on a multi-user cluster with cgroups enforcing CPU affinity, the taskset lines fail, as cores not assigned to the job may be selected.
    
    fix: taxonomy script throws error from empty GTDB-API results (Resolves #33)
    
    fix: Resolves #36
    
    misc: track HMM databases versions and display accordingly
    
misc: limit required biopython version to v1.76 or lower
    
    input: accepts antiSMASH6 regiongbks
    
    core: treat MIBiG BGCs as incomplete (therefore won't be used to generate GCF models)
    
    core: sort BGC features low-to-high before feeding into BIRCH algorithm
    
    misc: change default --n_ranks to 1
    
    release: update version to 1.1.1
    
    misc: added a script to extract features matrix into a tsv file
    
    misc: added a script to extract bgcs' metadata into a tsv file
    
    update citation
    
    core: add classes mapping for antiSMASH 6.0 BGCs
This project is licensed under the GNU Affero General Public License v3.0.