Efficient tracking algorithms are a crucial part of particle tracking detectors. While a lot of work has been done in designing a plethora of algorithms, these usually require tedious tuning for each use case. (Weakly) supervised Machine Learning-based approaches can leverage the actual raw data for maximal performance. Yet in realistic scenarios, sufficient high-quality labeled data is not available. While training might be performed on simulated data, the reproduction of realistic signal and noise in the detector requires substantial effort, compromising this approach. Here we propose a novel, fully unsupervised, approach to track reconstruction. The introduced model for learning to disentangle the factors of variation in a geometrically meaningful way employs geometrical space invariances. We train it through constraints on the equivariance between the image space and the latent representation in a Deep Convolutional Autoencoder. Using experimental results on synthetic data we show that a combination of different space transformations is required for meaningful disentanglement of factors of variation. We also demonstrate the performance of our model on real data from tracking detectors.