Evaluating LLMs: Reproducing Hugging Face Leaderboard Benchmarks