Training a Transformer Model from Scratch