Merged
Changes from 25 commits
Commits (32)
876d946 basic data processing for go-uniprot dataset (aditya0by0, Jul 15, 2024)
b2d13e9 Merge branch 'dev' into protein_prediction (aditya0by0, Jul 21, 2024)
4844380 prepare_data : sequence added to graph creation process (aditya0by0, Jul 21, 2024)
795c017 prepare_data: filter out any rows without any True value (aditya0by0, Jul 21, 2024)
4f06b62 setup data phase : preprocessing (aditya0by0, Jul 25, 2024)
1367975 add reader for protein data (aditya0by0, Jul 26, 2024)
f202579 config : GO 50 (aditya0by0, Jul 26, 2024)
a07c020 Update setup.py (aditya0by0, Jul 26, 2024)
07e5114 fix - local permission error for swiss data (aditya0by0, Jul 26, 2024)
b334929 go_uniprot : docstrings + variable namings (aditya0by0, Jul 28, 2024)
5cdc9b8 chebi.py : additional/more specific docstrings (aditya0by0, Jul 31, 2024)
0ee241a base class for datasets following new dynamics splits feature (aditya0by0, Aug 2, 2024)
d182a22 update _ChEBIDataExtractor as per newly inherited _DynamicDataset bas… (aditya0by0, Aug 2, 2024)
25a9594 update _GOUniprotDataExtractor to inherit _DynamicDataset (aditya0by0, Aug 2, 2024)
4ac6bc2 Merge branch 'dev' into protein_prediction (aditya0by0, Aug 9, 2024)
5a4860d add load_processed_data to base (aditya0by0, Aug 10, 2024)
53daf97 go data: changes (aditya0by0, Aug 13, 2024)
499fafc update _graph_to_raw_dataset method (aditya0by0, Aug 14, 2024)
19c47c1 fix tokenizing process in reader class for protein (aditya0by0, Aug 14, 2024)
ecb276a protein tokens - 20 natural amino acid tokens (aditya0by0, Aug 14, 2024)
5f9ff93 minor updates (aditya0by0, Aug 14, 2024)
b916994 filter out swiss protein as per given criterias in paper (aditya0by0, Aug 14, 2024)
079269b fixes: go_branch filtering, protein sequence (aditya0by0, Aug 15, 2024)
638598a update logic to select go classes based on proteins dataset (aditya0by0, Aug 15, 2024)
9200b73 fix: dataframe column addition performance warning (aditya0by0, Aug 16, 2024)
f9c10f7 consistent prefix "GOUniProt" for all classes (aditya0by0, Aug 25, 2024)
f39916b update go configs for new class names (aditya0by0, Aug 25, 2024)
4db76ce extra documentation for ragged coll as per the comment (aditya0by0, Sep 9, 2024)
06ab981 minor changes (aditya0by0, Sep 9, 2024)
62a3f45 parameter for maximum length (default: 1002) (aditya0by0, Sep 21, 2024)
6f463de remove label number for GO_UniProt classes (aditya0by0, Sep 21, 2024)
108d9ca trigrams / n-grams combining several amino acids into one token (aditya0by0, Sep 21, 2024)
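Commit 108d9ca introduces n-gram tokens (e.g. trigrams) that combine several amino acids into one token. The commit message does not say whether the n-grams overlap or how they are indexed, so the following is only a minimal sketch of the general idea, assuming a sliding window by default. The names `ngrams` and `ngram_vocab`, the `step` parameter, and the index offset are hypothetical and not part of the chebai reader's API.

```python
from itertools import product
from typing import Dict, List

# The 20 single-letter amino acid codes, in the order of tokens.txt.
AMINO_ACIDS = "MSIGATRLQNDKYPCFWEVH"


def ngrams(sequence: str, n: int = 3, step: int = 1) -> List[str]:
    """Split a protein sequence into n-grams.

    step=1 yields overlapping n-grams (sliding window); step=n yields
    non-overlapping chunks. In this sketch, trailing residues that do not
    fill a complete n-gram are dropped.
    """
    if n < 1 or step < 1:
        raise ValueError("n and step must be positive")
    return [sequence[i:i + n] for i in range(0, len(sequence) - n + 1, step)]


def ngram_vocab(n: int = 3, offset: int = 1) -> Dict[str, int]:
    """Enumerate all 20**n possible n-grams and give each an integer index.

    `offset` is a hypothetical number of indices reserved for special tokens.
    """
    return {
        "".join(combo): i + offset
        for i, combo in enumerate(product(AMINO_ACIDS, repeat=n))
    }
```

For example, `ngrams("MSIGA", 3)` gives `["MSI", "SIG", "IGA"]`, while `ngrams("MSIGA", 3, step=3)` gives `["MSI"]`. The trade-off is vocabulary size: it grows as 20^n, i.e. 20, 400 and 8000 tokens for n = 1, 2, 3, in exchange for shorter token sequences when the n-grams do not overlap.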
chebai/preprocessing/bin/protein_token/tokens.txt: 20 changes (20 additions, 0 deletions)
@@ -0,0 +1,20 @@
M
S
I
G
A
T
R
L
Q
N
D
K
Y
P
C
F
W
E
V
H
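The new tokens.txt lists the 20 standard amino acids, one single-letter code per line. A minimal sketch of how such a file could be turned into a token-to-index vocabulary and used to encode a sequence, assuming each token's index is its line position plus an offset reserved for special tokens such as padding. `TOKEN_OFFSET`, `load_tokens`, `build_vocab` and `encode` are hypothetical names, not the chebai reader's API, and the real reader may treat non-standard residues differently instead of raising an error.

```python
from pathlib import Path
from typing import Dict, List

# The 20 tokens exactly as listed in tokens.txt, in file order.
AMINO_ACID_TOKENS = ["M", "S", "I", "G", "A", "T", "R", "L", "Q", "N",
                     "D", "K", "Y", "P", "C", "F", "W", "E", "V", "H"]

# Hypothetical: indices below this value are assumed to be reserved for
# special tokens (e.g. padding). The actual offset in chebai may differ.
TOKEN_OFFSET = 1


def load_tokens(path: Path) -> List[str]:
    """Read one token per line from a file such as tokens.txt, skipping blank lines."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def build_vocab(tokens: List[str], offset: int = TOKEN_OFFSET) -> Dict[str, int]:
    """Map each token to its position in the list plus the offset."""
    return {tok: i + offset for i, tok in enumerate(tokens)}


def encode(sequence: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a protein sequence, rejecting residues outside the 20-letter alphabet."""
    unknown = set(sequence) - set(vocab)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [vocab[aa] for aa in sequence]
```

With the default offset, `encode("MSIG", build_vocab(AMINO_ACID_TOKENS))` returns `[1, 2, 3, 4]`, and the last token `H` maps to 20. Raising on unknown letters such as `X` or `B` keeps the sketch strict; a real pipeline might instead filter such proteins out beforehand or map them to a dedicated unknown token.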