Building CLIP from Scratch: A Tutorial on Multi-Modal Learning