Rebuilding the Transformer Architecture for Improved Reasoning in Long Context and Pretrained Facts