OCULAR Streamlining: Training & Pipeline

Getting Started

Clone this repository, as well as CSI-Cancer/csi_utils to the same directory.

git clone git@github.com:CSI-Cancer/ocular_streamlining.git
git clone git@github.com:CSI-Cancer/csi_utils.git

Enter the ocular_streamlining directory, create a virtual environment, activate it, and install the package and all dependencies:

cd ocular_streamlining
python -m venv .venv
source .venv/bin/activate
make install

If you do not have Poetry installed (it is used by `make install`), you can install it globally with pipx:

sudo apt install pipx
pipx install poetry

Or just locally, inside the active virtual environment:

pip install poetry

You should now be able to run scripts and training.
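As a quick sanity check that the install worked, you can verify from inside the virtual environment that the two project packages are importable. (`check_packages` is an illustrative helper, not part of the repository; the package names come from the project layout below.)

```python
import importlib.util

def check_packages(names):
    """Map each package name to whether it is importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# After `make install`, both project packages should report True:
print(check_packages(["ocular_streamlining", "streamlining_training"]))
```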

Project Organization

├── Makefile                <- `make` aliases; `make install`, `make data`, etc.
├── README.md               <- You really should read this
├── pyproject.toml          <- Project configuration file with package metadata,
│                              configurations, and dependencies.
├── requirements.txt        <- Install dependencies manually with
│                              `pip install -r requirements.txt`
├── data                    <- Directory for storing data
│   ├── external            <- Data from third-party sources
│   ├── interim             <- Intermediate data that has been transformed
│   ├── processed           <- The final, canonical data sets for modeling
│   └── raw                 <- The original, immutable data dump
│
├── docs                    <- pdocs-generated HTML documentation
│
├── models                  <- Trained models, predictions, or model summaries
│
├── notebooks               <- Jupyter notebooks. Naming convention is a number
│                              (for ordering), the creator's initials, and a
│                              short `-`-delimited description, e.g.
│                              `1.0-RMN-initial-data-exploration`
│
├── references              <- Data dictionaries, manuals, etc.
│
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated graphics and figures for reporting
│
├── scripts                 <- Pipeline scripts
│   └── do_streamlining.py  <- Script to run models with OCULAR outputs
│
├── ocular_streamlining     <- Model application source code for pipeline use
│   ├── __init__.py
│   ├── channel_classifier.py
│   └── streamlining_classifier.py
│
└── streamlining_training   <- Training source code, including supportive modules
    ├── __init__.py
    ├── config.py           <- Store useful variables and configuration
    ├── dataset.py          <- Scripts to download or generate data
    ├── features.py         <- Code to create features for modeling
    ├── plots.py            <- Code to create visualizations
    └── modeling            <- Model training and evaluation source code
        ├── __init__.py
        ├── eval.py         <- Code to test models
        ├── predict.py      <- Code to infer using trained models
        └── train.py        <- Code to train models

Input file

The raw, immutable input is an Excel sheet listing slides; the canonical data fed to the model is derived from it. The file name does not matter as long as the extension is .xlsx. Typically, the sheet has two columns, "slide_id" and "classification" (the "classification" column is optional). This file needs to be manually placed in the ./data/raw/ folder.

slide_id    classification
0B58703     NBD
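A minimal sketch of validating rows parsed from that sheet, assuming each row has been read into a dict (e.g. via pandas or openpyxl). `validate_slide_rows` is a hypothetical helper, not part of the repository's API:

```python
def validate_slide_rows(rows):
    """Check rows parsed from the input sheet.

    Each row must have a non-empty "slide_id"; "classification" is optional.
    Returns the list of slide IDs.
    """
    slide_ids = []
    for i, row in enumerate(rows):
        slide_id = (row.get("slide_id") or "").strip()
        if not slide_id:
            raise ValueError(f"row {i}: missing slide_id")
        slide_ids.append(slide_id)
    return slide_ids

# Example matching the table above:
rows = [{"slide_id": "0B58703", "classification": "NBD"}]
print(validate_slide_rows(rows))  # ['0B58703']
```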

Commands

The Makefile contains the central entry points for common tasks related to this project:

To extract and create the interim dataset:

make prepare_data

To select features from the set of features that exist in OCULAR:

make feature_select

To train the model:

make train

To evaluate the trained model on the test and validation datasets:

make evaluate

To clean the intermediate and processed canonical data:

make clean_venv

To clean up the trained models:

make clean_models
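The first four targets are typically run in sequence. A minimal Python sketch of that workflow (target names taken from this README; the order is inferred from the descriptions above, and `run_pipeline` is a hypothetical helper, not part of the repository):

```python
import subprocess

# Order implied by the descriptions above: data preparation,
# feature selection, training, then evaluation.
PIPELINE_TARGETS = ["prepare_data", "feature_select", "train", "evaluate"]

def run_pipeline(dry_run=True):
    """Run the make targets in order; with dry_run, just list the commands."""
    commands = [["make", target] for target in PIPELINE_TARGETS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing target
    return [" ".join(cmd) for cmd in commands]

print(run_pipeline())
# ['make prepare_data', 'make feature_select', 'make train', 'make evaluate']
```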
