Protein function prediction with GO - Part 3 #64
Merged
Commits (78)
bdba442
script to evaluate go predictions
aditya0by0 264bd94
Merge branch 'dev' into protein_prediction
aditya0by0 6c0fce1
add fmax to evaluation script
aditya0by0 154e827
Merge branch 'dev' into protein_prediction
aditya0by0 58ae92d
add base code for deep_go data migration
aditya0by0 78a38de
vary fmax threshold as per paper (see the Fmax sketch after the commit list)
aditya0by0 3a4e007
go_uniprot: add sequence len to docstring
aditya0by0 227a014
update experiment evidence codes as per DeepGo SE
aditya0by0 33436e8
Merge branch 'dev' into protein_prediction
aditya0by0 c6d60cd
consider `X` as a valid amino acid as per DeepGO-SE
aditya0by0 ca5461f
deepgo se migration: add class to migrate
aditya0by0 af54954
Merge branch 'dev' into protein_prediction
aditya0by0 dfb9430
migration: rectify errors
aditya0by0 085b13b
protein trigram containing tokens with `X`
aditya0by0 3e0bae0
protein token unigrams contain `X`
aditya0by0 99b5af1
add migration for deepgo1 - 2018 paper
aditya0by0 a15d492
deepgo1: create non-exclusive val set as a placeholder
aditya0by0 e0a8524
deepgo1: further split train set into train and val for
aditya0by0 093be28
migration script update
aditya0by0 14db9d6
add classes to use migrated deepgo data
aditya0by0 8922d4d
deepgo: minor code change
aditya0by0 796356c
modify prints to display actual file name
aditya0by0 3c11a69
create sub dir for deepgo dataset and move rel files
aditya0by0 2b571c5
update imports as per new deepGO dir
aditya0by0 f75e30b
update import dir for pretrain test
aditya0by0 1b8b270
migration fix: truncate seq and save data with labels
aditya0by0 bcda11c
Delete protein_protein_interactions.py
aditya0by0 85c47a0
migration: replace invalid amino acid with "X" notation
aditya0by0 fbb5c58
update deepgo configs
aditya0by0 272446d
add esm2 reader for deepGO
aditya0by0 a12354b
increase electra vocab size
66732a7
fix: print right name of missing file
aditya0by0 e7b3d80
migration: add esm2 embeddings
aditya0by0 862c8ef
scope dataset: add scope abstract code
aditya0by0 7da8963
base: make _name property abstract method
aditya0by0 976f2b8
add simple Feed-forward network (for ESM2->chebi task)
sfluegel05 3b17487
reformat using Black
sfluegel05 f4d1d74
scope: data preparation code
aditya0by0 431da47
scope: include all levels
aditya0by0 43d4550
scope: remove domain level from one hot encoding
aditya0by0 6735e41
scope: add documentation
aditya0by0 c3ba8da
scope: add OverX classes and their derivatives
aditya0by0 764b812
scope: modify select classes and labels save operation
aditya0by0 b23f1f6
scope: data config
aditya0by0 6370572
deepgo: remove label_number from docstring
aditya0by0 191c979
ffn: update as per deepgo2 mlp architecture
aditya0by0 b7ca0e5
scope: map invalid amino acids to "X"
aditya0by0 ad24fa7
fix MLPBlock hidden_dim
sfluegel05 357752f
esm2 reader: save reader to default global data dir
aditya0by0 a52d8de
configs: move configs to data specific sub-dir
aditya0by0 b2f51f9
adding SCOPe50 dataset
sfluegel05 e0bbb0e
add scope50 config
sfluegel05 e3659cb
Merge branch 'protein_prediction' of https://github.com/ChEB-AI/pytho…
sfluegel05 22aa985
scope: version number should be str, not float
aditya0by0 45c1015
Merge branch 'protein_prediction' of https://github.com/ChEB-AI/pytho…
aditya0by0 d3fd0f2
scope: data filtering update
aditya0by0 c791893
scope: avoid data fragmentation and add progress bar
aditya0by0 aad16d9
scope: vectorized operation instead of df.iterrows
aditya0by0 13b8795
scope: fix multiple chain filtering
aditya0by0 4572272
scope: tutorial for scope data exploration
aditya0by0 eba0417
scope: update tutorial
aditya0by0 dad6f76
scope: add more scope details to tutorial
aditya0by0 fd6dd01
minor changes: deepgo configs + scope
aditya0by0 4a8f821
deepgo2 migration: exp_annotations not needed
aditya0by0 1c432de
fix scope version in scope50.yml
sfluegel05 f13e935
modify notebook introduction
sfluegel05 36e6162
ffn: fix error for loss kwargs
aditya0by0 93c7fc5
scope: fix for no True labels for some classes/columns
aditya0by0 6d7b467
Merge branch 'dev' into protein_prediction
aditya0by0 767b210
scope: fix for true values less than given threshold for some labels
aditya0by0 081b44d
go_notebook: update import statement
aditya0by0 81c1348
scope notebook: add scope description and minor changes
aditya0by0 2b0ed0a
electra config: increase max_position_embeddings to 3000
aditya0by0 ef4bc0b
comment protein-related requirements
aditya0by0 58bcf05
Revert "comment protein-related requirements"
aditya0by0 831f70d
scope: filter out sequences with len greater than given len
aditya0by0 158d6f3
Revert "electra config: increase max_postional_embeddings to 3000"
aditya0by0 bb0b4db
electra config: reset the vocab size to previous default value for scope
aditya0by0
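The commits `add fmax to evaluation script` and `vary fmax threshold as per paper` point at a protein-centric Fmax evaluation in the CAFA/DeepGO style. The PR's actual script is not shown in this diff, so the following is only a hedged sketch; the function name `fmax_score` and the averaging details are assumptions, not the PR's code:

```python
from typing import Tuple

import numpy as np


def fmax_score(y_true: np.ndarray, y_scores: np.ndarray) -> Tuple[float, float]:
    """Protein-centric Fmax: the best F-measure over a sweep of thresholds.

    y_true:   (n_proteins, n_terms) binary ground-truth GO annotations
    y_scores: (n_proteins, n_terms) predicted scores in [0, 1]
    Returns the best F-measure and the threshold that achieves it.
    """
    best_f, best_t = 0.0, 0.0
    for t in np.arange(0.01, 1.0, 0.01):  # vary the threshold, as per the paper
        pred = y_scores >= t
        covered = pred.sum(axis=1) > 0  # proteins with at least one prediction at t
        if not covered.any():
            continue
        tp = np.logical_and(pred, y_true > 0).sum(axis=1)
        # precision is averaged only over proteins that have predictions at t
        precision = float((tp[covered] / pred[covered].sum(axis=1)).mean())
        # recall is averaged over all proteins (assumes every protein is annotated)
        recall = float((tp / y_true.sum(axis=1)).mean())
        if precision + recall > 0:
            f = 2 * precision * recall / (precision + recall)
            if f > best_f:
                best_f, best_t = f, float(t)
    return best_f, best_t
```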
New file, a DeepGO2-style feed-forward network (cf. commit `add simple Feed-forward network (for ESM2->chebi task)`):

```python
from typing import Any, Dict, List, Optional, Tuple

import torch
from torch import Tensor, nn

from chebai.models import ChebaiBaseNet


class FFN(ChebaiBaseNet):
    # Reference: https://github.com/bio-ontology-research-group/deepgo2/blob/main/deepgo/models.py#L121-L139

    NAME = "FFN"

    def __init__(
        self,
        input_size: int,
        hidden_layers: List[int] = [
            1024,
        ],
        **kwargs
    ):
        super().__init__(**kwargs)

        # Stack MLP blocks with residual refinement, as in DeepGO2's MLP model.
        layers = []
        current_layer_input_size = input_size
        for hidden_dim in hidden_layers:
            layers.append(MLPBlock(current_layer_input_size, hidden_dim))
            layers.append(Residual(MLPBlock(hidden_dim, hidden_dim)))
            current_layer_input_size = hidden_dim

        layers.append(nn.Linear(current_layer_input_size, self.out_dim))
        layers.append(nn.Sigmoid())
        self.model = nn.Sequential(*layers)

    def _get_prediction_and_labels(self, data, labels, model_output):
        d = model_output["logits"]
        loss_kwargs = data.get("loss_kwargs", dict())
        if "non_null_labels" in loss_kwargs:
            n = loss_kwargs["non_null_labels"]
            # keep only samples with non-null labels; the diff's `data[n]`
            # would index the batch dict rather than the logits
            d = d[n]
        # note: self.model already ends in a Sigmoid, so this applies a second
        # sigmoid to values that are already probabilities
        return torch.sigmoid(d), labels.int() if labels is not None else None

    def _process_for_loss(
        self,
        model_output: Dict[str, Tensor],
        labels: Tensor,
        loss_kwargs: Dict[str, Any],
    ) -> Tuple[Tensor, Tensor, Dict[str, Any]]:
        """
        Process the model output for calculating the loss.

        Args:
            model_output (Dict[str, Tensor]): The output of the model.
            labels (Tensor): The target labels.
            loss_kwargs (Dict[str, Any]): Additional loss arguments.

        Returns:
            tuple: A tuple containing the processed model output, labels, and loss arguments.
        """
        kwargs_copy = dict(loss_kwargs)
        if labels is not None:
            labels = labels.float()
        return model_output["logits"], labels, kwargs_copy

    def forward(self, data, **kwargs):
        x = data["features"]
        return {"logits": self.model(x)}


class Residual(nn.Module):
    """
    A residual layer that adds the output of a function to its input.

    Args:
        fn (nn.Module): The function to be applied to the input.

    References:
        https://github.com/bio-ontology-research-group/deepgo2/blob/main/deepgo/base.py#L6-L35
    """

    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        """Return the input tensor plus the result of applying `fn` to it."""
        return x + self.fn(x)


class MLPBlock(nn.Module):
    """
    A basic Multi-Layer Perceptron (MLP) block with one fully connected layer.

    Args:
        in_features (int): The number of input features.
        out_features (int): The number of output features.
        bias (bool): Add bias to the linear layer.
        layer_norm (bool): Apply layer normalization.
        dropout (float): The dropout probability.
        activation (nn.Module): The activation class applied after the linear layer.

    References:
        https://github.com/bio-ontology-research-group/deepgo2/blob/main/deepgo/base.py#L38-L73

    Example:
        # Create an MLP block (one linear layer) with ReLU activation;
        # note that the activation is passed as a class, not an instance
        mlp_block = MLPBlock(in_features=64, out_features=10, activation=nn.ReLU)
        # Apply the MLP block to an input tensor
        input_tensor = torch.randn(32, 64)
        output = mlp_block(input_tensor)  # shape: (32, 10)
    """

    def __init__(
        self,
        in_features,
        out_features,
        bias=True,
        layer_norm=True,
        dropout=0.1,
        activation=nn.ReLU,
    ):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias)
        self.activation = activation()
        self.layer_norm: Optional[nn.LayerNorm] = (
            nn.LayerNorm(out_features) if layer_norm else None
        )
        self.dropout: Optional[nn.Dropout] = nn.Dropout(dropout) if dropout else None

    def forward(self, x):
        x = self.activation(self.linear(x))
        if self.layer_norm:
            x = self.layer_norm(x)
        if self.dropout:
            x = self.dropout(x)
        return x
```
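For orientation, here is a hedged usage sketch of `FFN`. The `out_dim` keyword is assumed to be consumed by `ChebaiBaseNet` and surfaced as `self.out_dim` (not verified against this PR), and the 2560-dimensional input matches a typical ESM2 embedding size only by assumption:

```python
import torch

# A batch of 8 proteins represented by (assumed) 2560-dim ESM2 embeddings
batch = {"features": torch.randn(8, 2560)}

# hidden_layers follows the DeepGO2-style MLP stack defined above;
# out_dim is an assumption about the ChebaiBaseNet constructor
model = FFN(input_size=2560, hidden_layers=[1024, 512], out_dim=1000)

output = model(batch)
probs = output["logits"]  # shape (8, 1000); already in (0, 1) due to the Sigmoid head
```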
The amino-acid token vocabulary gains `X` (matching the commit `consider `X` as a valid amino acid as per DeepGO-SE`):

```diff
@@ -18,3 +18,4 @@ W
 E
 V
 H
+X
```
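This vocabulary change pairs with the commits `migration: replace invalid amino acid with "X" notation` and `scope: map invalid amino acids to "X"`. A minimal sketch of that preprocessing step, assuming the standard 20-letter amino-acid alphabet (the helper name and exact alphabet are illustrative, not the PR's code):

```python
import re

# The 20 standard amino acids plus the wildcard "X"; any other residue
# symbol is collapsed to "X"
_STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"
_INVALID_AA = re.compile(f"[^{_STANDARD_AA}X]")


def normalize_sequence(seq: str) -> str:
    """Upper-case a protein sequence and replace non-standard residues with 'X'."""
    return _INVALID_AA.sub("X", seq.upper())


assert normalize_sequence("mkvBZU") == "MKVXXX"
```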