Human violence recognition is an area of great interest in the scientific community, given its broad range of applications, especially in video surveillance systems, where detecting violence in real time could prevent criminal acts and save lives. Despite the number of existing proposals and studies, most focus on the accuracy of results, leaving aside efficiency and practical implementation. Thus, this work proposes a model that is both effective and efficient at recognizing human violence in real time. The proposed model consists of three modules: the Spatial Motion Extractor (SME), which extracts regions of interest from a frame; the Short Temporal Extractor (STE), which extracts temporal features of rapid movements; and the Global Temporal Extractor (GTE), which identifies long-lasting temporal features and fine-tunes the model. The proposal was evaluated in terms of efficiency, effectiveness, and ability to operate in real time. The results obtained on the Hockey, Movies, and RWF-2000 datasets demonstrate that this approach is highly efficient compared with other alternatives. To validate real-time applicability, a new dataset, VioPeru, was created from violent and non-violent videos captured by real video surveillance cameras in Peru. On this dataset, the proposal's effectiveness exceeded that of the best existing proposal. Therefore, our proposal contributes in terms of efficiency, effectiveness, and real-time operation.
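To make the data flow through the three modules concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how each module works internally, so every mechanism here is hypothetical. The spatial stage keeps only pixels that change between consecutive grayscale frames (simple frame differencing with an assumed `threshold`). The short temporal stage averages non-overlapping windows of an assumed `window` of 3 frames. The global temporal stage pools those window summaries over the whole clip. In the real model these stages would be learned networks; the function names and parameters below are illustrative only.

```python
import numpy as np


def spatial_motion_extractor(frames: np.ndarray, threshold: int = 25) -> np.ndarray:
    """Hypothetical spatial stage: zero out pixels that did not change.

    frames: (T, H, W) uint8 grayscale clip. Returns (T-1, H, W) uint8 where
    each frame keeps only pixels whose absolute difference from the previous
    frame exceeds `threshold` (an assumed motion criterion).
    """
    diffs = np.abs(np.diff(frames.astype(np.int16), axis=0))
    mask = diffs > threshold
    return np.where(mask, frames[1:], 0).astype(np.uint8)


def short_temporal_extractor(motion: np.ndarray, window: int = 3) -> np.ndarray:
    """Hypothetical short-term stage: summarize each short window of frames.

    Splits the motion frames into non-overlapping windows of `window` frames
    (dropping any leftover frames) and averages each window -> (N, H, W).
    """
    n = motion.shape[0] // window
    trimmed = motion[: n * window].astype(np.float32)
    return trimmed.reshape(n, window, *motion.shape[1:]).mean(axis=1)


def global_temporal_extractor(short_feats: np.ndarray) -> np.ndarray:
    """Hypothetical long-term stage: pool window summaries over the clip -> (H, W)."""
    return short_feats.mean(axis=0)


if __name__ == "__main__":
    # Toy clip: a single pixel "moves" (turns bright) at frame 3.
    clip = np.zeros((7, 4, 4), dtype=np.uint8)
    clip[3:, 0, 0] = 200
    motion = spatial_motion_extractor(clip)
    short = short_temporal_extractor(motion)
    clip_descriptor = global_temporal_extractor(short)
    print(motion.shape, short.shape, clip_descriptor.shape)
```

The point of splitting the work this way is that the cheap spatial stage discards static background before any temporal processing. As a result, the short and long temporal stages operate on much sparser input, which is one plausible route to the efficiency the abstract emphasizes.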