Deep learning based methods have been widely used in industrial recommendation systems. Previous works adopt an Embedding&MLP paradigm: raw features are embedded into low-dimensional vectors, which are then fed into an MLP for the final recommendation. This paper proposes using a transformer model to capture the sequential signals underlying user behavior.
In the era of deep learning, embedding and MLP have been the standard paradigm for industrial RSs: large numbers of raw features are embedded into low-dimensional spaces as vectors, and then fed into fully connected layers, known as a multilayer perceptron (MLP), to predict whether a user will click an item or not. Here we apply the self-attention mechanism to learn a better representation for each item in a user’s behavior sequence by considering the sequential information in the embedding stage, and then feed these representations into MLPs to predict users’ responses to candidate items.
Therefore, we propose the user behavior sequence transformer (BST) for e-commerce recommendation.
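To make the self-attention idea more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes identity query/key/value projections and a single head, whereas a real transformer layer uses learned projections and multiple heads. The names `softmax`, `self_attention`, and the toy `behavior` sequence are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption).

    seq: list of item embeddings, each a list of floats of equal length.
    Returns one context-aware vector per item in the sequence.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        # Similarity of this item to every item in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Each output is a weighted average of all item vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


# Hypothetical behavior sequence: 3 clicked items, 2-dimensional embeddings.
behavior = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(behavior)
```

Each row of `attended` is a blend of the whole sequence, which is how an item's representation comes to depend on the other items the user interacted with.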
In the rank stage, we model the recommendation task as a Click-Through Rate (CTR) prediction problem, which can be defined as follows: given a user’s behavior sequence
Depending on the dataset at hand, the features will look different in each case. For example, I was not able to implement their technique for acquiring positional embeddings due to data constraints.
The architecture diagram shows each group of features in a different colour. Features (in red) and positional features (in dark blue) are concatenated to create an embedding matrix
The dataset will vary between implementations, so you will need to choose which features are necessary for your use case and how to separate them so that the model can make sense of them.
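As a rough illustration of the concatenation step described above, the sketch below looks up an item embedding and a position embedding for each element of a sequence and joins them into one row of an embedding matrix. Everything here is an assumption for demonstration: the table sizes, the random initial values (standing in for learned weights), the helper names, and the use of plain sequence indices as positions, which is not the paper's positional technique.

```python
import random

random.seed(0)  # reproducible toy values


def make_table(num_rows, dim):
    """Randomly initialised embedding table (a stand-in for learned weights)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


ITEM_DIM, POS_DIM = 4, 2               # assumed sizes, chosen for illustration
item_table = make_table(10, ITEM_DIM)  # 10 hypothetical items
pos_table = make_table(5, POS_DIM)     # 5 hypothetical positions


def embed_sequence(item_ids, position_ids):
    """Concatenate each item's embedding with its position embedding."""
    return [item_table[i] + pos_table[p] for i, p in zip(item_ids, position_ids)]


# A hypothetical sequence of three clicked items at positions 0, 1, 2.
matrix = embed_sequence([3, 7, 1], [0, 1, 2])
```

Here `matrix` has one row per item in the sequence and `ITEM_DIM + POS_DIM` columns. With your own dataset you would swap in whichever feature tables your use case needs.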
The original transformer paper proposed a position embedding layer. This paper uses one as well; however, the position value
In this publication I will not explain how the transformer works; otherwise it would be too lengthy.
To predict whether a user will click the target item
The researchers used AUC, a common metric for binary classification tasks. Their performance compared with other models is shown below.
To finish off, here are some of the model settings used in the paper.
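For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen clicked example receives a higher score than a randomly chosen non-clicked one. The sketch below computes it by brute force over all positive/negative pairs; it is only meant to show the definition, not the evaluation code used in the paper, and the labels and scores are invented.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = clicked, 0 = not clicked.
labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]
result = auc(labels, scores)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect one. The pairwise loop is quadratic, so for large datasets a rank-based implementation is the usual choice.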
They used the