Learn time embeddings?!

  • Main idea: provide a model-agnostic representation of time, specifically a learned vector embedding of time
  • Related work:
    • model time directly with bins, point processes and flows
    • time decomposition techniques that encode a temporal signal into a set of frequencies, e.g. Fourier transforms (but here the frequencies can be learned)
    • concatenate time with input
    • new neural architectures that take time into account (e.g. TimeLSTM)
  • LSTM + T: time is treated as just another feature, concatenated with the input and fed to a standard LSTM (sketched below)
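A minimal sketch of this baseline (my reconstruction, not the paper's code; batch-first tensors and one scalar timestamp per step assumed):

```python
import torch
import torch.nn as nn

class LSTMPlusT(nn.Module):
    """LSTM + T baseline: the timestamp is just one more input feature."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # +1 input dimension for the scalar time feature
        self.lstm = nn.LSTM(input_size + 1, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, input_size), t: (batch, seq, 1)
        out, _ = self.lstm(torch.cat([x, t], dim=-1))
        return out
```

Swapping t for its t2v embedding (defined below) gives the LSTM + t2v variant the paper compares against.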
  • TimeLSTM: adds time gates to the architecture of an LSTM with peephole connections
  • Desirable properties:
    1. Periodicity
    2. Invariance to rescaling
    3. Simplicity
  • time2vec formulation: \(\mathbf{t2v}(\tau)[i] = \begin{cases} \omega_i \tau + \varphi_i, & \text{if } i = 0 \\ \mathcal{F}(\omega_i \tau + \varphi_i), & \text{if } 1 \leq i \leq k \end{cases}\)
    • $\mathcal F$ is a periodic activation function (sine in the paper's experiments), and $\omega_i, \varphi_i$ are the frequencies and phase shifts, respectively (both learnable)
    • the linear term ($i = 0$) captures non-periodic patterns that progress with time (see the sketch below)
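A minimal PyTorch sketch of the layer (my reconstruction from the formula above, not the authors' code), using sine as \(\mathcal F\):

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """t2v(tau): one linear component plus k periodic (sine) components."""
    def __init__(self, k: int):
        super().__init__()
        # learnable frequencies and phase shifts for all k + 1 components
        self.omega = nn.Parameter(torch.randn(k + 1))
        self.phi = nn.Parameter(torch.randn(k + 1))

    def forward(self, tau: torch.Tensor) -> torch.Tensor:
        # tau: (..., 1) scalar time(s) -> (..., k + 1) embedding
        v = self.omega * tau + self.phi
        # component i = 0 stays linear; components 1..k are periodic
        return torch.cat([v[..., :1], torch.sin(v[..., 1:])], dim=-1)
```

For example, Time2Vec(63) maps each scalar timestamp to a 64-dimensional vector that can replace the raw time feature in any of the models above.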
  • different datasets used:
    • synthetic dataset (classification): integers representing the day of the year, labeled positive if divisible by 7
    • event mnist (classification): pixel intensities fed as a time series
    • tidigits (classification): input is the times and channel indices (out of 64) that are activated
    • stack overflow: sequence of badges, predict next badge (recommendation)
    • last.fm: listening history (recommendation)
    • citeulike: what users posted and when (recommendation)
  • metrics: accuracy and recall@q
    • recall@q: for each test case, generate a recommendation list of k items (k-1 random items plus the correct one) and let the model rank them; recall@q is the percentage of lists whose correct item appears in the top q (sketched below)
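A small sketch of this protocol (model_score and the sampling details are my hypothetical stand-ins, not the paper's code):

```python
import random

def recall_at_q(model_score, test_events, all_items, k: int, q: int) -> float:
    """Fraction of candidate lists where the true item lands in the top q.

    model_score(history, item) -> float: higher means more likely next item.
    test_events: iterable of (history, true_next_item) pairs.
    """
    hits = 0
    for history, target in test_events:
        # k - 1 random negatives plus the ground-truth item
        negatives = random.sample([i for i in all_items if i != target], k - 1)
        ranked = sorted(negatives + [target],
                        key=lambda item: model_score(history, item),
                        reverse=True)
        hits += target in ranked[:q]
    return hits / len(test_events)
```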
  • Main questions:
    • is it effective?
      • on all datasets, t2v does no harm, and on most it improves results
      • can be used with long sequences and time horizons
    • used with other architectures?
      • yes, e.g. t2v can be plugged into TimeLSTM in place of its raw time input
    • what do the sine functions learn?
      • learns correct period and phase-shifts
    • can non-periodic activation functions be learned?
      • non-periodic activations cannot capture periodic behavior; they also saturate as time becomes large, and with ReLU there are exploding/vanishing gradients
    • should the frequencies even be learned?
      • not always clearly better, but for asynchronous time series it may help to learn them
      • the parameter overhead is small, and learning them does not hurt (cf. the fixed sinusoidal positional encodings of “Attention Is All You Need”)
  • Conclusion: a learned vector representation of time that captures both periodic and non-periodic patterns, is invariant to time rescaling, and is portable across models