A proposed algorithm for recognizing the Baybayin writing system using support vector machines (SVMs). The full paper is available at https://peerj.com/articles/cs-360/.
NOTE: The complete system files (generated SVM models, example images, etc.) described here can be downloaded from the Releases section. The archive is named An.OCR.System.for.Baybayin.Scripts.using.SVM.zip and is available on the following release page:
https://github.com/rbp0803/An-OCR-System-for-Baybayin-Scripts-using-SVM/releases/tag/v1.0.2
The following code files and variables were produced entirely in MATLAB. Their functions and uses are described below:
Multi-SVM classifiers
• Latin_Character_Classifier_00330.mat - for classification of Latin characters
• Baybayin_Character_Classifier_00379.mat - for classification of Baybayin characters
Binary classifiers for script categorization and Baybayin diacritic classification
• LvsB_classifier_00125.mat
• Subscript_Classifier_00000.mat
• Superscript_Classifier_00000.mat
Binary classifiers used to reclassify confusable Baybayin characters
• AVSMa_00225.mat
• KaVSEI_00100.mat
• HaVSSa_00050.mat
• LaVSTa_00100.mat
• PaVSYa_00550.mat
(Note: smile.mat has no function of its own. It is only a placeholder that preserves the ordering of the confuselist.)
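To illustrate how a multi-class SVM model of the kind listed above might be trained, saved to a .mat file, and reloaded for prediction, here is a minimal sketch on synthetic data. It assumes the Statistics and Machine Learning Toolbox is installed. The class labels, the linear kernel, the one-vs-one coding, the variable name `model`, and the file name `toy_multiclass_svm.mat` are all assumptions made for illustration; they are not taken from the released classifiers.

```matlab
% Hypothetical toy data: 3 classes, 20 samples each, 1x3136 feature vectors.
rng(1);                                   % reproducible noise
numPerClass = 20;
numFeat = 3136;
labels = {'A'; 'Ba'; 'Ka'};               % illustrative labels only
X = [];
Y = {};
for k = 1:numel(labels)
    center = zeros(1, numFeat);
    center((k-1)*100 + (1:100)) = 1;      % each class activates a different block of features
    X = [X; repmat(center, numPerClass, 1) + 0.1*randn(numPerClass, numFeat)];
    Y = [Y; repmat(labels(k), numPerClass, 1)];
end

% Multi-class SVM built from binary SVM learners (assumed settings).
tmpl = templateSVM('KernelFunction', 'linear', 'Standardize', false);
model = fitcecoc(X, Y, 'Learners', tmpl, 'Coding', 'onevsone');

% Save the trained model to a .mat file and load it back, as a stand-in
% for how a released classifier file could be consumed.
modelFile = fullfile(tempdir, 'toy_multiclass_svm.mat');
save(modelFile, 'model');
loaded = load(modelFile);

% Classify one new feature vector that resembles the second class.
probe = zeros(1, numFeat);
probe(101:200) = 1;
predicted = predict(loaded.model, probe); % 1x1 cell array holding the predicted label
disp(predicted{1});
```

To see which variables a released model file actually stores, `whos('-file', 'Baybayin_Character_Classifier_00379.mat')` lists them without loading the file.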
• Proposed_Baybayin_OCR.m - the main system is coded here, together with subfunctions that support the recognition algorithm.
• Baybayin_letter_revised_segueway.m - a subfunction of the proposed OCR algorithm that classifies one-component Baybayin characters.
• Latin_Letter_guesser.m - a subfunction of the proposed OCR algorithm that classifies Latin characters.
• seg.m - a subfunction of the proposed OCR algorithm that classifies Baybayin characters with diacritics.
• seg2.m - a subfunction of the proposed OCR algorithm written specifically to recognize the Baybayin character 'E/I'. Otherwise, it checks whether the other components are part of the main body of the character or simply its accent.
• kmeans_mod.m - a subfunction of the proposed OCR algorithm that clusters a grayscale image into 2 intensity levels for image binarization.
• c2bw.m - a subfunction of the proposed OCR algorithm that converts the raw input image into a binary image using the modified k-means function.
• feature_vector_extractor.m - a subfunction of the proposed OCR algorithm that outputs the 1x3136 feature vector of the input square matrix.
• data_training_testing2_revised2.m - generates the binary SVM models for script classification, Baybayin diacritic categorization, and reclassification of confusable Baybayin characters.
• data_traintest3_multiclass_17classes_v2_revised.m - generates the multiclass SVM models for classifying the 17 Baybayin characters.
• data_traintest3_multiclass_26classes_v2_revised_latin.m - generates the multiclass SVM models for classifying the 26 Latin characters.
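The binarization and feature-extraction steps described above (kmeans_mod.m, c2bw.m, feature_vector_extractor.m) can be pictured with the following minimal sketch. It is not the repository's code: it runs a plain two-cluster k-means on pixel intensities of a synthetic 56x56 image and then flattens the binary result into a 1x3136 row vector. The 56x56 size is an assumption chosen only because 56 x 56 = 3136, and the actual feature extractor may compute something different. All variable names are hypothetical.

```matlab
% Synthetic grayscale image (values in [0,1]): bright background, dark stroke.
% A real input would come from an image file instead.
[xx, yy] = meshgrid(1:56);
gray = 0.75 + 0.1*(xx/56);                              % background, about 0.75 to 0.85
gray(20:35, 25:30) = 0.15 + 0.1*(yy(20:35, 25:30)/56);  % stroke, about 0.19 to 0.21

% Two-cluster k-means on intensities (Lloyd's algorithm in one dimension).
% Assumes the image is not constant, so both clusters stay non-empty.
pix = gray(:);
c = [min(pix); max(pix)];                 % initial centroids: darkest and brightest pixel
for iter = 1:100
    isBright = abs(pix - c(2)) < abs(pix - c(1));       % nearest-centroid assignment
    cNew = [mean(pix(~isBright)); mean(pix(isBright))]; % recompute both centroids
    if max(abs(cNew - c)) < 1e-6
        break;                            % centroids stopped moving
    end
    c = cNew;
end

% Binary image: true where the pixel belongs to the dark (ink) cluster.
bw = reshape(~isBright, size(gray));

% Flatten the 56x56 binary matrix into a 1x3136 row vector (assumed layout).
featVec = double(bw(:))';
disp(size(featVec));
```

Starting the centroids at the darkest and brightest pixels keeps the dark cluster in the first position, so no relabeling is needed after convergence.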
Example_run.m contains an example script to execute the proposed algorithm.