This project (available on GitHub) was part of the Computer Vision and Pattern Recognition (CVaPR) course and focuses on classifying medical data related to tumour metastasis. Using both simple classifiers and advanced neural networks, the goal was to develop a robust model that classifies samples as "metastasis" (class 0) or "no metastasis" (class 1). The project uses anonymized data derived from real-world patient studies, in compliance with privacy standards.
The original final report with the results of the different classification methods (in Polish) is available on GitHub. An English translation was added for this submission and is also available on GitHub; it was prepared quickly due to limited time and may contain some errors.
/src/ directory
Contains key scripts and notebooks for building and evaluating classifiers:
- cnn_classification.ipynb: Implements a Convolutional Neural Network with optimized hyperparameters.
- simple_classifier_knn_svm_bayes.ipynb: Implements simple classifiers, including KNN, SVM, and Naive Bayes.
- dense_network_classification.ipynb: Explores a Dense Neural Network as a baseline for comparison.
- xtrain_feature_selection/: Scripts for feature selection on training data:
  - ranking_method.ipynb: Assesses the impact of individual features.
  - wrapper_method.ipynb: Tests combinations of features.
  - embedded_method.ipynb: Selects features during training.
  - simple_classifier_ranking_method.ipynb: Applies the ranking method to simple classifiers.
- all_feature_selection/: Tests feature selection methods on the full dataset.
- optimizers/: Contains scripts for hyperparameter tuning:
  - optimalization_test.ipynb
  - hiperparametr_optymalization.ipynb
  - learning_optimalization.ipynb
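/models/ directory
Stores trained models, including:
- Models from the feature selection experiments (xtrain_feature_selection/ and all_feature_selection/).
- Models from hyperparameter optimization (optimizers/).
/data/ directory
Houses all data used for training and testing:
- labels_features.csv: Combines features and labels.
- features.csv: Contains features only.
- labels.csv: Contains labels only.
- clinical_radiomics_imported_from_tsv.xlsx: The original dataset.
Feature selection was critical for reducing dimensionality and improving model performance. Three families of methods were explored: ranking methods, which assess the impact of each feature individually; wrapper methods, which test combinations of features; and embedded methods, which select features as part of model training.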
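The three feature selection families can be contrasted in a short sketch. It assumes scikit-learn (the text does not say which library the notebooks use) and runs on synthetic data from `make_classification` rather than the project's radiomics features. The ANOVA F-test ranking, forward sequential selection with KNN, and Random-Forest-importance selection are illustrative choices, not necessarily the ones used in the project.

```python
# Sketch: ranking, wrapper, and embedded feature selection (assumes scikit-learn; synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (SelectFromModel, SelectKBest,
                                       SequentialFeatureSelector, f_classif)
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Ranking: score each feature on its own (ANOVA F-test) and keep the top k.
ranking = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper: greedily add features, scoring each candidate subset with a classifier via cross-validation.
wrapper = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors=5),
                                    n_features_to_select=5, direction="forward", cv=3).fit(X, y)

# Embedded: importances learned while training a Random Forest decide which features survive.
embedded = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0),
                           threshold="mean").fit(X, y)

for name, selector in [("ranking", ranking), ("wrapper", wrapper), ("embedded", embedded)]:
    print(name, selector.get_support(indices=True))
```

Ranking is cheapest but ignores feature interactions. The wrapper evaluates subsets jointly at the cost of many model fits. The embedded approach gets its selection as a by-product of a single training run.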
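Support Vector Machines (SVMs) map input data into a high-dimensional feature space, usually implicitly through a kernel function. In that space, the classes can be separated by a hyperplane, and the SVM chooses the hyperplane with the largest margin between them. SVMs are effective for both linear and non-linear classification, and variants are also used for regression and outlier detection.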
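The effect of the kernel can be seen on data that no straight line separates. This sketch assumes scikit-learn's `SVC` and uses the synthetic two-moons dataset instead of the project's data. It compares a linear kernel with an RBF kernel; the value `C=1.0` is an arbitrary illustrative choice, not a tuned hyperparameter from the project.

```python
# Sketch: linear vs. RBF-kernel SVM on non-linearly separable data (assumes scikit-learn).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

scores = {}
for kernel in ("linear", "rbf"):
    # Scaling matters for SVMs because the margin is defined in terms of distances between points.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # test-set accuracy

print(scores)
```

The RBF kernel lets the SVM draw a curved boundary in the original space while still fitting a flat hyperplane in the implicit feature space.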
k-Nearest Neighbors (KNN) classifies a data point based on the closest training samples in the feature space. The class is determined by majority vote among the point's k nearest neighbors, with proximity measured by a distance metric such as Euclidean distance.
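The voting rule can be written directly in a few lines of NumPy. This is a minimal from-scratch sketch with Euclidean distance. The helper `knn_predict` and the toy points are hypothetical and serve only to show the mechanism; they are not taken from the project's notebooks.

```python
# Sketch: k-nearest-neighbors classification by majority vote (hypothetical helper, Euclidean distance).
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Classify a single point x by majority vote among its k nearest training samples."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training sample
    nearest = np.argsort(distances)[:k]              # indices of the k closest samples
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.15, 0.15]), k=3))  # 0
print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3))   # 1
```

With two classes, an odd k avoids tied votes. Because distances drive the decision, features are usually scaled to comparable ranges first.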
Random Forest is an ensemble learning method that builds multiple decision trees, each trained on a random subset of the data and considering random subsets of the features. The final classification is determined by majority voting among the trees, which makes the ensemble more robust to overfitting than a single decision tree.
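This sketch assumes scikit-learn's `RandomForestClassifier` and uses synthetic data. The ensemble size and `max_features` setting are illustrative. Note that scikit-learn combines its trees by averaging their predicted class probabilities rather than by hard voting, so the code also computes an explicit majority vote over `forest.estimators_` for comparison.

```python
# Sketch: Random Forest with bootstrap samples and random feature subsets (assumes scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

# 100 trees, each fit on a bootstrap sample; max_features="sqrt" limits the features tried at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
cv_accuracy = cross_val_score(forest, X, y, cv=5).mean()

forest.fit(X, y)
# Explicit hard majority vote across the individual trees (labels here are 0/1).
tree_votes = np.stack([tree.predict(X) for tree in forest.estimators_])
majority = (tree_votes.mean(axis=0) > 0.5).astype(int)
agreement = (majority == forest.predict(X)).mean()

print(f"CV accuracy: {cv_accuracy:.3f}, vote/forest agreement: {agreement:.3f}")
```

Hard voting and probability averaging almost always agree. They can differ only on samples where the trees are nearly evenly split.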
Naive Bayes is a probabilistic classifier based on Bayes' theorem that assumes the features are conditionally independent given the class. It computes the posterior probability of each class for a given data point and assigns the point to the class with the highest probability.
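For continuous features such as radiomics measurements, a common variant models each feature with a per-class Gaussian. This is an assumption here, since the text does not state which Naive Bayes variant was used. The from-scratch NumPy sketch below works in log space. The independence assumption appears as a plain sum of per-feature log-likelihoods. The helpers `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical names.

```python
# Sketch: Gaussian Naive Bayes from scratch (hypothetical helpers; assumes Gaussian class-conditional features).
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class, per-feature Gaussian means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9  # avoid division by zero
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Return the class with the highest (unnormalized) log posterior for each row of X."""
    classes, priors, means, variances = model
    # log N(x_j | mean_cj, var_cj) summed over features j: the "naive" independence assumption.
    log_likelihood = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                             + (X[:, None, :] - means[None, :, :]) ** 2
                             / variances[None, :, :]).sum(axis=2)
    log_posterior = np.log(priors)[None, :] + log_likelihood  # shape (n_samples, n_classes)
    return classes[np.argmax(log_posterior, axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([[0.0, 0.0], [4.0, 4.0]])))  # [0 1]
```

The evidence term P(x) is the same for every class, so it can be dropped when only the argmax is needed.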
Convolutional Neural Networks (CNNs) are specialized neural networks designed for grid-structured data such as images. They consist of convolutional layers that extract local patterns, such as edges and textures, followed by pooling layers that reduce dimensionality. The extracted features are then passed to fully connected layers for classification. Because CNNs can capture complex patterns in their input, they are well suited to medical data analysis.
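The convolution, pooling, and fully connected stages can be traced by hand on a tiny 1D input. This NumPy sketch shows a single forward pass only, with no training. The signal, the edge-detecting kernel, and the dense-layer weights are hypothetical values chosen for illustration. The project's actual CNN architecture and framework are not described here.

```python
# Sketch: forward pass of a minimal 1D CNN (conv -> ReLU -> max pool -> dense -> sigmoid); weights are hypothetical.
import numpy as np


def conv1d(x, kernel):
    """'Valid' 1D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])


def max_pool1d(x, size=2):
    """Non-overlapping max pooling; drops a trailing remainder shorter than `size`."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)


def relu(x):
    return np.maximum(x, 0.0)


signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
edge_kernel = np.array([-1.0, 1.0])  # responds to a rising edge (local pattern)
features = max_pool1d(relu(conv1d(signal, edge_kernel)), size=2)

# Fully connected layer + sigmoid on the pooled features (arbitrary illustrative weights).
w, b = np.array([1.0, 0.5, 0.5]), -0.5
probability = 1.0 / (1.0 + np.exp(-(features @ w + b)))
print(features, probability)
```

The convolution turns the 8-value signal into 7 edge responses, and pooling halves that to 3 features before the dense layer makes the final decision.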
Dense Neural Networks (DNNs) are a simpler type of neural network in which every neuron in one layer is connected to every neuron in the next. They were tested as a baseline but lack the specialized feature extraction capabilities of CNNs.
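Full connectivity means each layer is a matrix multiplication, with one weight per input-output pair. This NumPy sketch runs a forward pass through a small dense network and counts its parameters. The layer widths (30 inputs, hidden layers of 16 and 8, one sigmoid output) and the random weights are hypothetical, not the configuration used in the project.

```python
# Sketch: forward pass and parameter count of a small fully connected network (hypothetical sizes).
import numpy as np


def dense_forward(x, layers):
    """Forward pass through fully connected layers; each layer is (W, b) with W of shape (n_in, n_out)."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b                # every input unit contributes to every output unit
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)   # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid output for binary classification


rng = np.random.default_rng(0)
sizes = [30, 16, 8, 1]               # hypothetical widths: 30 input features -> 1 output
layers = [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

n_params = sum(W.size + b.size for W, b in layers)
probs = dense_forward(rng.normal(size=(4, 30)), layers)
print(n_params, probs.shape)  # 641 (4, 1)
```

A layer with n_in inputs and n_out outputs has n_in * n_out + n_out parameters, here 496 + 136 + 9 = 641. A convolutional layer instead reuses one small kernel across every position of its input.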
This combination of simple classifiers and advanced neural networks provided a comprehensive evaluation of the classification task, balancing interpretability and accuracy.
This project demonstrated the importance of combining advanced classification models with effective feature selection and hyperparameter optimization. The use of anonymized real-world data ensures the results are both impactful and ethically sound.