This is a forecasting problem. The main part of the prediction, the chess steps, has already been done on the training data set, and so have first predictions of the noise between the steps. The goal is to find out whether the noise between the steps, which most probably also depends on the preceding chess steps, is predictable.
Finally, a baseline evaluation has to be done.
Data attached: dtwin.ipynb -- Program so far
train_new.npy -- train data set
Slides on evaluation and on digital twins are also attached.
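A baseline evaluation could start from something like the sketch below. It is only an illustration under assumptions: the layout of train_new.npy is not described here, so the code uses synthetic stand-in data of shape (time, features), and the helper names (chronological_split, persistence_forecast, mean_forecast, rmse) are hypothetical. The two baselines (repeat the last state, repeat the training mean) are common reference points, not something taken from the slides.

```python
import numpy as np

def chronological_split(states, holdout_fraction=0.2):
    """Split without shuffling, so the holdout lies strictly after the training part."""
    n_holdout = int(round(len(states) * holdout_fraction))
    split = len(states) - n_holdout
    return states[:split], states[split:]

def persistence_forecast(train, horizon):
    """Baseline 1: repeat the last observed state for every future step."""
    return np.repeat(train[-1:], horizon, axis=0)

def mean_forecast(train, horizon):
    """Baseline 2: repeat the per-feature training mean for every future step."""
    return np.repeat(train.mean(axis=0, keepdims=True), horizon, axis=0)

def rmse(y_true, y_pred):
    """Root mean squared error over all steps and features."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for the real data (assumption: rows are time steps).
# With the real file one would start from np.load("train_new.npy") and
# flatten each state to one row if the states are matrices.
rng = np.random.default_rng(0)
t = np.arange(1000)
states = np.stack(
    [np.sin(2 * np.pi * t / 100), np.cos(2 * np.pi * t / 100)], axis=1
) + 0.1 * rng.standard_normal((1000, 2))

train, holdout = chronological_split(states)
scores = {
    "persistence": rmse(holdout, persistence_forecast(train, len(holdout))),
    "mean": rmse(holdout, mean_forecast(train, len(holdout))),
}
print(scores)
```

Any model for the noise between the steps would then be scored on the same holdout with the same metric, and is only interesting if it beats both numbers.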
(original task: Available data
12220 historical states (training set)
You need to build a digital model from the data
ARIMA, XGBoost, LSTM, Variational Autoencoder, Support Vector Machine,
Linear regression, home-brewed model, . . . ?
Prediction task
You need to predict 10660 future states of the physical twin
Secret test set not available
Available Domain Information:
You don't know much about the physical twin
Goal: Focus on KDDM methods instead of using general knowledge
Some information available
State matrix indices refer to locations (spatial data)
True states are obscured by noise
12220 states are sorted chronologically
Further Note on Difficulty
The prediction task is not directly solvable
Major components of spatio-temporal process are not predictable
→ Need human intellect to solve problem
→ Need to build good digital model to help human
Solution Example (not the true task) I
1. You receive historical states
2. You remove all noise using Butterworth filter
3. You use PCA to find three important features
4. You visualize these features in a 3d plot
Human realizes that these are x, y, z positions
Human observes regular rotation
5. You detect that the rotation period is 3600 states
6. You realize you are observing a clock's minute hand
7. You realize that forecasting the future state is trivial
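The steps of this example can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the real task: the rotating signal, the 10-dimensional mixing, the filter order and cutoff, and the helper names (lowpass, pca_features, dominant_period) are all assumptions. PCA is done with a plain SVD and the period is read off the largest FFT peak, which is one of several ways to detect it.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass(states, cutoff=0.01, order=3):
    """Zero-phase Butterworth low-pass along the time axis.
    cutoff is a fraction of the Nyquist frequency (illustrative value)."""
    sos = butter(order, cutoff, btype="low", output="sos")
    return sosfiltfilt(sos, states, axis=0)

def pca_features(states, n_components=3):
    """Project the centered states onto the leading principal directions."""
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def dominant_period(series):
    """Period (in states) of the strongest non-constant FFT component."""
    spectrum = np.abs(np.fft.rfft(series - series.mean()))
    k = int(np.argmax(spectrum[1:])) + 1  # skip the zero-frequency bin
    return len(series) / k

# Synthetic stand-in: a rotation with period 3600 states, observed through
# a random linear mixing into 10 noisy channels (all of this is assumed).
rng = np.random.default_rng(0)
n_states, true_period = 7200, 3600
angle = 2 * np.pi * np.arange(n_states) / true_period
latent = np.stack([np.cos(angle), np.sin(angle)], axis=1)
mixing = rng.standard_normal((2, 10))
states = latent @ mixing + 0.3 * rng.standard_normal((n_states, 10))

smooth = lowpass(states)                # step 2: remove noise
features = pca_features(smooth, 3)      # step 3: three PCA features
period = dominant_period(features[:, 0])  # step 5: rotation period
print(period)
```

Step 4, the 3d plot of the three features, is left out here; with matplotlib it would be a scatter or line plot of the columns of features.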
Recommendations:
Focus effort on pre-processing, decomposition, stationarity, etc.
Use many visualizations)
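As one possible starting point for the decomposition and stationarity recommendations, the sketch below splits a series into a moving-average trend and a residual, then compares simple statistics of the first and second half. Everything in it is an assumption made for illustration: the synthetic series, the window length, and the helper names (moving_average, decompose, half_split_stats). The half-split comparison is a crude check, not a formal stationarity test.

```python
import numpy as np

def moving_average(series, window):
    """Centered moving average; near the edges it averages over the
    samples that are actually available."""
    kernel = np.ones(window)
    sums = np.convolve(series, kernel, mode="same")
    counts = np.convolve(np.ones_like(series), kernel, mode="same")
    return sums / counts

def decompose(series, window):
    """Split a series into a smooth trend and the remaining residual."""
    trend = moving_average(series, window)
    return trend, series - trend

def half_split_stats(series):
    """Means and variances of the first and second half of the series.
    Large differences between the halves hint at non-stationarity."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return (first.mean(), second.mean()), (first.var(), second.var())

# Synthetic stand-in: linear drift plus white noise (assumed, not real data).
rng = np.random.default_rng(0)
t = np.arange(2000)
series = 0.01 * t + rng.standard_normal(2000)

trend, residual = decompose(series, window=101)
raw_means, raw_vars = half_split_stats(series)
res_means, res_vars = half_split_stats(residual)
print("raw means:", raw_means, "residual means:", res_means)
```

On real data the same comparison would be run per location or per PCA feature, and plotted next to the raw series, the trend and the residual.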