Time2Vec: Learning a Vector Representation of Time
Learn time embeddings?!
- Main idea: provide a model-agnostic representation of time, specifically a learned vector representation of time
- Related work:
  - modeling time directly, e.g. by discretizing it into bins, or with point processes and flows
  - time-decomposition techniques that encode a temporal signal as a set of frequencies, like Fourier transforms (Time2Vec instead lets the frequencies be learned)
  - concatenating time with the input
  - new neural architectures that take time into account (e.g. TimeLSTM)
  - LSTM+T: time is an extra feature concatenated with the input, and a standard LSTM is used
  - TimeLSTM: adds time gates to the architecture of an LSTM with peepholes
- Desirable properties:
  - periodicity: capture periodic as well as non-periodic patterns
  - invariance to time rescaling
  - simplicity: easy to combine with many models
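Invariance to rescaling can be checked on a single component \(\mathcal{F}(\omega_i\tau+\varphi_i)\). If time is measured in different units, \(\tau' = \alpha\tau\) for some \(\alpha > 0\) (e.g. \(\alpha = 24\) when switching from days to hours), then dividing the frequency by \(\alpha\) gives back exactly the same value, so a learned \(\omega_i\) can absorb the change of units. The same holds for the linear component (take \(\mathcal{F}\) to be the identity):

\[
\tau' = \alpha\,\tau,\quad \omega_i' = \frac{\omega_i}{\alpha}
\;\Longrightarrow\;
\mathcal{F}\left(\omega_i'\,\tau' + \varphi_i\right)
= \mathcal{F}\left(\tfrac{\omega_i}{\alpha}\,\alpha\,\tau + \varphi_i\right)
= \mathcal{F}\left(\omega_i\,\tau + \varphi_i\right)
\]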
- Time2Vec formulation: for a scalar time \(\tau\), \(\mathbf{t2v}(\tau)\) is a vector of size \(k+1\) whose \(i\)-th element is
  \(\mathbf{t2v}(\tau)[i]=\begin{cases}\omega_{i}\tau+\varphi_{i}, & \text{if } i=0\\ \mathcal{F}\left(\omega_{i}\tau+\varphi_{i}\right), & \text{if } 1\leq i\leq k\end{cases}\)
  - \(\mathcal{F}\) is a periodic activation function (sine in the paper's main experiments); \(\omega_i\) and \(\varphi_i\) are the frequency and phase shift of the \(i\)-th component, and both are learnable parameters
  - the linear term (\(i=0\)) captures non-periodic patterns that depend on time
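A minimal, framework-free sketch of the formula, assuming a scalar time \(\tau\) and sine as \(\mathcal{F}\). The names `t2v`, `omega` and `phi` are illustrative, and the parameters are plain floats here rather than trainable weights:

```python
import math
from typing import Callable, List, Sequence


def t2v(tau: float, omega: Sequence[float], phi: Sequence[float],
        f: Callable[[float], float] = math.sin) -> List[float]:
    """Time2Vec embedding of a scalar time tau, a vector of size k + 1.

    omega and phi hold the k + 1 frequencies and phase shifts; in a real model
    they would be trainable parameters, here they are fixed floats.
    """
    if len(omega) != len(phi):
        raise ValueError("omega and phi must have the same length (k + 1)")
    # Index 0: linear term, for non-periodic trends in time.
    out = [omega[0] * tau + phi[0]]
    # Indices 1..k: periodic activation of an affine function of tau.
    out.extend(f(w * tau + p) for w, p in zip(omega[1:], phi[1:]))
    return out


# One linear component plus one periodic component with period 7.
omega = [0.1, 2 * math.pi / 7]
phi = [0.0, 0.5]
print(t2v(3.0, omega, phi))
print(t2v(10.0, omega, phi))
```

With \(\omega_1 = 2\pi/7\), the periodic component takes the same value at \(\tau = 3\) and \(\tau = 10\), while the linear component keeps growing; in a model, `omega` and `phi` would be learned by backpropagation together with the downstream network.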
- Datasets:
  - synthetic (classification): the input is an integer representing the day of the year; the class is positive if the day is divisible by 7
  - Event-MNIST (classification): pixel intensities are fed as a time series
  - TIDIGITS (classification): the input is the time and the index of the channel (out of 64) that is activated
  - Stack Overflow (recommendation): sequence of badges; predict the next badge
  - Last.fm (recommendation): listening history
  - CiteULike (recommendation): which articles users posted, and when
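To see why a periodic component suits the synthetic task, here is a hypothetical reconstruction of it (days 1 to 365, label 1 iff the day is divisible by 7; the exact range used in the paper is not given in these notes) together with one hand-set sine feature with \(\omega = 2\pi/7\) and \(\varphi = \pi/2\). Its value is 1 on multiples of 7 and at most \(\cos(2\pi/7) \approx 0.62\) elsewhere, so a single threshold separates the classes; a learned t2v has to find such parameters on its own:

```python
import math
from typing import List, Tuple


def make_toy_dataset(num_days: int = 365) -> List[Tuple[int, int]]:
    """Hypothetical toy data: (day, label) pairs, label 1 iff day % 7 == 0."""
    return [(day, int(day % 7 == 0)) for day in range(1, num_days + 1)]


def weekly_feature(day: int) -> float:
    """One hand-set periodic component: omega = 2*pi/7, phi = pi/2, F = sin."""
    return math.sin(2 * math.pi / 7 * day + math.pi / 2)


data = make_toy_dataset()
# Threshold the single feature: 1.0 on multiples of 7, <= ~0.62 otherwise.
predictions = [int(weekly_feature(day) > 0.9) for day, _ in data]
accuracy = sum(p == y for p, (_, y) in zip(predictions, data)) / len(data)
print(accuracy)  # → 1.0
```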
- Metrics: accuracy (classification) and Recall@q (recommendation)
  - Recall@q: for each test instance, build a list of k items (the correct item plus k−1 random ones) and let the model rank them; Recall@q is the percentage of lists in which the correct item is ranked in the top q
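A small sketch of the Recall@q protocol described above. The function name, the `score(context, item)` interface and uniform sampling of the k−1 negatives without replacement are assumptions for illustration; ties are counted against the model so that a constant scorer gets no credit:

```python
import random
from typing import Callable, Hashable, Sequence, Tuple


def recall_at_q(
    test_cases: Sequence[Tuple[Hashable, Hashable]],
    score: Callable[[Hashable, Hashable], float],
    all_items: Sequence[Hashable],
    k: int,
    q: int,
    rng: random.Random,
) -> float:
    """Percentage of candidate lists whose correct item is ranked in the top q.

    Each list holds the correct item plus k - 1 items sampled uniformly at
    random (without replacement) from the rest of the catalogue.
    """
    hits = 0
    for context, correct in test_cases:
        negatives = rng.sample([it for it in all_items if it != correct], k - 1)
        correct_score = score(context, correct)
        # 0-based rank of the correct item; negatives that tie count as ranked above it.
        rank = sum(score(context, neg) >= correct_score for neg in negatives)
        if rank < q:
            hits += 1
    return 100.0 * hits / len(test_cases)


# Usage: a scorer that always prefers the true item gets Recall@1 = 100.
items = list(range(100))
cases = [(i, i) for i in range(10)]
perfect = lambda ctx, item: 1.0 if item == ctx else 0.0
print(recall_at_q(cases, perfect, items, k=20, q=1, rng=random.Random(0)))  # → 100.0
```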
- Main questions:
  - Is it effective?
    - on all datasets, adding t2v never hurts performance, and on most it improves it
    - it works with long sequences and long time horizons
  - Can it be used with other architectures?
    - yes, e.g. with TimeLSTM
  - What do the sine functions learn?
    - they learn the correct periods and phase shifts
  - Can non-periodic activation functions be used instead?
    - non-periodic activations cannot capture periodic behavior; moreover, as time grows large, sigmoid and tanh saturate, and ReLU suffers from exploding/vanishing gradients
  - Should the frequencies be learned at all?
    - it is not always clear that learning them is better, but for asynchronous time series learning them may help
    - the parameter overhead is small and learning them does not hurt (cf. the fixed sinusoid frequencies of the positional encodings in "Attention Is All You Need")
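A quick numerical illustration of the saturation argument for non-periodic activations, using one component with \(\omega = 0.5\) and \(\varphi = 0.1\) (values picked only for illustration). As \(\tau\) grows, sigmoid and tanh get stuck at 1.0, so consecutive time steps become indistinguishable; ReLU grows linearly with \(\tau\); sine stays within \([-1, 1]\) and keeps varying:

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in exp for very negative x).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    return max(0.0, x)


ACTIVATIONS = {"sin": math.sin, "sigmoid": sigmoid, "tanh": math.tanh, "relu": relu}
OMEGA, PHI = 0.5, 0.1  # illustrative frequency and phase shift

for tau in (10.0, 100.0, 1000.0):
    for name, f in ACTIVATIONS.items():
        value = f(OMEGA * tau + PHI)
        # Change of the feature between consecutive time steps tau and tau + 1.
        step = f(OMEGA * (tau + 1) + PHI) - value
        print(f"tau={tau:>6}  {name:<7} value={value:10.4f}  step={step:+.4f}")
```

This only shows the numerical behavior behind the argument; it does not reproduce the paper's experiments.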
- Conclusion: a learned vector representation of time that captures periodic and non-periodic patterns, is invariant to time rescaling, and is portable across models