OCULAR Streamlining: Training & Pipeline

Getting Started

Clone this repository, as well as CSI-Cancer/csi_utils to the same directory.

git clone git@github.com:CSI-Cancer/ocular_streamlining.git
git clone git@github.com:CSI-Cancer/csi_utils.git

Enter the ocular_streamlining directory, create a virtual environment, activate it, and install the package and all dependencies:

cd ocular_streamlining
python -m venv .venv
source .venv/bin/activate
make install

If you do not have Poetry installed (it is used by `make install`), you can install it globally with pipx:

sudo apt install pipx
pipx install poetry

Or just locally, inside the active virtual environment:

pip install poetry

You should now be able to run scripts and training.
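As a quick sanity check that the install worked, you can verify from inside the virtual environment that the two project packages are importable. (`check_packages` is an illustrative helper, not part of the repository; the package names come from the project layout below.)

```python
import importlib.util

def check_packages(names):
    """Map each package name to whether it is importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# After `make install`, both project packages should report True:
print(check_packages(["ocular_streamlining", "streamlining_training"]))
```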

Project Organization

├── Makefile                <- `make` aliases; `make install`, `make data`, etc.
├── README.md               <- You really should read this
├── pyproject.toml          <- Project configuration file with package metadata,
│                              configurations, and dependencies.
├── requirements.txt        <- Install dependencies manually with
│                              `pip install -r requirements.txt`
├── data                    <- Directory for storing data
│   ├── external            <- Data from third-party sources
│   ├── interim             <- Intermediate data that has been transformed
│   ├── processed           <- The final, canonical data sets for modeling
│   └── raw                 <- The original, immutable data dump
│
├── docs                    <- pdocs-generated HTML documentation
│
├── models                  <- Trained models, predictions, or model summaries
│
├── notebooks               <- Jupyter notebooks. Naming convention is a number
│                              (for ordering), the creator's initials, and a
│                              short `-`-delimited description, e.g.
│                              `1.0-RMN-initial-data-exploration`
│
├── references              <- Data dictionaries, manuals, etc.
│
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated graphics and figures for reporting
│
├── scripts                 <- Pipeline scripts
│   └── do_streamlining.py  <- Script to run models with OCULAR outputs
│
├── ocular_streamlining     <- Model application source code for pipeline use
│   ├── __init__.py
│   ├── channel_classifier.py
│   └── streamlining_classifier.py
│
└── streamlining_training   <- Training source code, including supportive modules
    ├── __init__.py
    ├── config.py           <- Store useful variables and configuration
    ├── dataset.py          <- Scripts to download or generate data
    ├── features.py         <- Code to create features for modeling
    ├── plots.py            <- Code to create visualizations
    └── modeling            <- Model training and evaluation source code
        ├── __init__.py
        ├── eval.py         <- Code to test models
        ├── predict.py      <- Code to infer using trained models
        └── train.py        <- Code to train models

Input file

The raw, immutable input is an Excel sheet listing slides; the canonical data fed to the model is derived from it. The file name does not matter as long as the extension is .xlsx. Typically, the sheet has two columns, "slide_id" and "classification" (the "classification" column is optional). This file needs to be manually placed in the ./data/raw/ folder.

slide_id    classification
0B58703     NBD
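A minimal sketch of validating rows parsed from that sheet, assuming each row has been read into a dict (e.g. via pandas or openpyxl). `validate_slide_rows` is a hypothetical helper, not part of the repository's API:

```python
def validate_slide_rows(rows):
    """Check rows parsed from the input sheet.

    Each row must have a non-empty "slide_id"; "classification" is optional.
    Returns the list of slide IDs.
    """
    slide_ids = []
    for i, row in enumerate(rows):
        slide_id = (row.get("slide_id") or "").strip()
        if not slide_id:
            raise ValueError(f"row {i}: missing slide_id")
        slide_ids.append(slide_id)
    return slide_ids

# Example matching the table above:
rows = [{"slide_id": "0B58703", "classification": "NBD"}]
print(validate_slide_rows(rows))  # ['0B58703']
```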

Commands

The Makefile contains the central entry points for common tasks related to this project:

To extract and create the interim dataset:

make prepare_data

To select features from the set of features that exist in OCULAR:

make feature_select

To train the model:

make train

To evaluate the trained model on the test and validation datasets:

make evaluate

To clean the intermediate and processed canonical data:

make clean_venv

To clean up the trained models:

make clean_models
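The first four targets are typically run in sequence. A minimal Python sketch of that workflow (target names taken from this README; the order is inferred from the descriptions above, and `run_pipeline` is a hypothetical helper, not part of the repository):

```python
import subprocess

# Order implied by the descriptions above: data preparation,
# feature selection, training, then evaluation.
PIPELINE_TARGETS = ["prepare_data", "feature_select", "train", "evaluate"]

def run_pipeline(dry_run=True):
    """Run the make targets in order; with dry_run, just list the commands."""
    commands = [["make", target] for target in PIPELINE_TARGETS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing target
    return [" ".join(cmd) for cmd in commands]

print(run_pipeline())
# ['make prepare_data', 'make feature_select', 'make train', 'make evaluate']
```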
