Robust Factorization of Real-world Tensor Streams with Patterns, Missing Values, and Outliers (ICDE'21)
This repository contains the source code for the paper Robust Factorization of Real-world Tensor Streams with Patterns, Missing Values, and Outliers, by Dongjin Lee and Kijung Shin, presented at ICDE 2021.
In this work, we propose SOFIA, an online algorithm for factorizing real-world tensors that evolve over time and contain missing entries and outliers. By tightly integrating tensor factorization, outlier detection, and temporal-pattern detection, SOFIA offers the following advantages over state-of-the-art competitors:
- Robust and accurate: SOFIA yields up to 76% lower imputation error and up to 71% lower forecasting error than its best competitors.
- Fast: SOFIA performs imputation up to 935x faster than the second-most accurate method.
- Scalable: SOFIA incrementally processes new entries in a time-evolving tensor, and it scales linearly with the number of new entries per time step.
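
To make the per-time-step processing concrete, here is a minimal Python sketch of an update that touches only the entries observed at one time step, so its cost grows linearly with their number. It is not the SOFIA algorithm and not the MATLAB code in this repository. The rank-1 model, the median-based initial estimate, the function names, and the `threshold` parameter are all assumptions made for illustration.

```python
from statistics import median


def process_time_step(entries, a, b, threshold):
    """Estimate a rank-1 temporal weight from one time step's observed entries.

    entries   : list of (i, j, value) tuples observed at this time step
    a, b      : non-temporal factor vectors (hypothetical, assumed already learned)
    threshold : residual size above which an entry is flagged as an outlier
    Returns (w, outliers), where w is the temporal weight for this step.
    """
    # Robust initial guess: median of the per-entry ratios value / (a[i] * b[j]).
    ratios = [v / (a[i] * b[j]) for i, j, v in entries if a[i] * b[j] != 0]
    if not ratios:
        return 0.0, []
    w0 = median(ratios)

    # Split entries into inliers and outliers using the initial guess.
    inliers, outliers = [], []
    for i, j, v in entries:
        residual = abs(v - w0 * a[i] * b[j])
        (outliers if residual > threshold else inliers).append((i, j, v))

    # Least-squares refit on the inliers only; missing entries never appear here.
    num = sum(v * a[i] * b[j] for i, j, v in inliers)
    den = sum((a[i] * b[j]) ** 2 for i, j, v in inliers)
    w = num / den if den else w0
    return w, outliers


def impute(i, j, w, a, b):
    """Reconstruct a (possibly missing or corrupted) entry from the rank-1 model."""
    return w * a[i] * b[j]
```

For example, with `a = [1.0, 2.0]`, `b = [1.0, 3.0]`, and the entries `(0, 0, 2.0)`, `(0, 1, 6.0)`, `(1, 0, 4.0)`, `(1, 1, 100.0)`, the last entry is flagged as an outlier and `impute(1, 1, w, a, b)` gives the value the model expects in its place.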
Name | Description | Size | Granularity in Time | Processed Dataset | Original Source |
---|---|---|---|---|---|
Intel Lab Sensor | locations x sensors x time | 54 x 4 x 1152 | every 10 minutes | Dataset | Link |
Network Traffic | sources x destinations x time | 23 x 23 x 2000 | hourly | Dataset | Link |
Chicago Taxi | sources x destinations x time | 77 x 77 x 2016 | hourly | Dataset | Link |
NYC Taxi | sources x destinations x time | 265 x 265 x 904 | daily | Dataset | Link |
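
The sizes and time granularities in the table can be turned into a quick sanity check. The short Python sketch below computes, for each dataset, the total number of cells (observed or missing) and the time span covered. The dictionary layout and the `summarize` helper are assumptions introduced for illustration. It reads "every 10 minutes", "hourly", and "daily" as 10, 60, and 1440 minutes per time step.

```python
from math import prod

# (shape, minutes per time step), copied from the table above.
DATASETS = {
    "Intel Lab Sensor": ((54, 4, 1152), 10),
    "Network Traffic": ((23, 23, 2000), 60),
    "Chicago Taxi": ((77, 77, 2016), 60),
    "NYC Taxi": ((265, 265, 904), 1440),
}


def summarize(shape, minutes_per_step):
    """Return the number of tensor cells and the covered time span in days."""
    steps = shape[-1]  # the last mode is time
    return {
        "cells": prod(shape),
        "span_days": steps * minutes_per_step / 1440,
    }


if __name__ == "__main__":
    for name, (shape, minutes) in DATASETS.items():
        print(name, summarize(shape, minutes))
```

Under this reading, the Intel Lab Sensor tensor covers 8 days and the Chicago Taxi tensor covers 84 days (12 weeks).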
The code runs in MATLAB and requires:
- Tensor Toolbox v3.1 for tensor computation.
  - Download and link the library.
- Optimization Toolbox for the nonlinear programming solver (the `fmincon` function in MATLAB).
We provide two runnable examples, one for online tensor completion and one for tensor forecasting:
- Online tensor completion
- Tensor forecasting
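
As a rough picture of what tensor forecasting means in this setting, the Python sketch below extends a temporal factor into the future and rebuilds the corresponding slices. It uses a seasonal-naive rule (repeat the last observed season), which is a simple stand-in and not the forecasting method used by SOFIA. The rank-1 model, the function names, and the `period` parameter are assumptions made for illustration.

```python
def seasonal_naive_forecast(history, period, horizon):
    """Forecast by repeating the most recent full season of the temporal factor.

    history : past temporal weights, oldest first
    period  : season length in time steps (assumed known)
    horizon : number of future steps to forecast
    """
    if period <= 0 or len(history) < period:
        raise ValueError("need at least one full season of history")
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]


def forecast_slice(w, a, b):
    """Rebuild one future slice from a forecast weight and rank-1 factors a, b."""
    return [[w * ai * bj for bj in b] for ai in a]
```

For instance, with `history = [1, 2, 3, 1, 2, 3]` and `period = 3`, a four-step forecast repeats the pattern as `[1, 2, 3, 1]`, and `forecast_slice` maps each forecast weight back to a full slice.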
Please see supplementary
This code is free and open source for academic and research (non-commercial) purposes only. If you use this code in published research, please cite the following paper:
@inproceedings{lee2021robust,
title={Robust factorization of real-world tensor streams with patterns, missing values, and outliers},
author={Lee, Dongjin and Shin, Kijung},
booktitle={2021 IEEE 37th International Conference on Data Engineering (ICDE)},
pages={840--851},
year={2021},
organization={IEEE}
}