FDA Product Classification Using Machine Learning and Deep Learning Models
This project presents a multi-class classification pipeline that predicts the marketing category of FDA-listed pharmaceutical products. It combines classical ML models with deep learning and uses structured product metadata to assign each product to one of five categories: NDA, ANDA, OTC, BLA, or UNAPPROVED.
Project Highlights
Task: Multi-class classification of FDA drug entries by marketing category
Deep learning outperformed traditional ML after sufficient tuning
Category imbalance was addressed by grouping categories; future work can explore SMOTE oversampling
SHAP analysis showed that the route and proprietary-name features had a strong impact on predictions
Future Directions:
Experiment with BERT-style tabular encoders or TabNet
Integrate external drug description texts for enrichment
Add explainability dashboard (Gradio or Streamlit)
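The category grouping mentioned in the highlights can be sketched as follows. This is a minimal illustration, not the project's actual code: the prefix-matching rule, the function name `group_marketing_category`, the `OTHER` fallback, and the sample labels are all assumptions made for this example.

```python
from collections import Counter

# The five target groups named in this README.
TARGET_GROUPS = ("UNAPPROVED", "ANDA", "NDA", "OTC", "BLA")


def group_marketing_category(raw_label: str) -> str:
    """Map a raw marketing-category string to one of the five groups.

    Assumption: a raw label belongs to the group whose name it starts with,
    e.g. "OTC MONOGRAPH FINAL" -> "OTC". Labels that match no group fall
    back to a hypothetical "OTHER" bucket.
    """
    label = raw_label.strip().upper()
    for group in TARGET_GROUPS:
        if label.startswith(group):
            return group
    return "OTHER"


# Illustrative raw labels (not taken from the project's dataset).
raw_labels = [
    "NDA",
    "NDA AUTHORIZED GENERIC",
    "ANDA",
    "OTC MONOGRAPH FINAL",
    "OTC MONOGRAPH NOT FINAL",
    "BLA",
    "UNAPPROVED DRUG OTHER",
    "UNAPPROVED HOMEOPATHIC",
]

# Count how many raw labels land in each group after merging.
counts = Counter(group_marketing_category(label) for label in raw_labels)
print(dict(counts))
# {'NDA': 2, 'ANDA': 1, 'OTC': 2, 'BLA': 1, 'UNAPPROVED': 2}
```

Merging sparse sub-categories into a parent group gives each class more training examples, which is one simple way to reduce imbalance before trying resampling methods such as SMOTE.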
Summary
This project showcases a complete, reproducible pipeline for real-world regulatory data classification using both traditional ML and deep learning. It highlights preprocessing, feature importance, visual evaluation, and production-ready model deployment.
Deployed models and data are hosted on Hugging Face for public access and further experimentation.