In this paper, we propose a new approach for dynamic hand gesture recognition using intensity, depth and skeleton joint data captured by the Kinect™ sensor. The proposed approach integrates global and local information of a dynamic gesture. First, we represent the skeleton 3D trajectory in spherical coordinates. Then, we extract the key frames corresponding to the points with the largest angular and distance differences. In each key frame, we calculate the spherical distance from the hands, wrists and elbows to the shoulder center, and we also record the hand position changes to obtain the global information. Finally, we segment the hands and apply the SIFT descriptor to the intensity and depth data, and the Bag of Visual Words (BoW) approach is used to extract local information. The system was tested with the ChaLearn 2013 gesture dataset and our own Brazilian Sign Language dataset, achieving an accuracy of 88.39% and 98.28%, respectively.
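As a rough illustration of the global descriptor described above, the sketch below (Python/NumPy) converts a hypothetical hand trajectory to spherical coordinates relative to the shoulder center and keeps the frames with the largest angular and distance changes; the exact key-frame criterion and joint set used in the paper may differ.

```python
import numpy as np

def to_spherical(joints, origin):
    """Convert 3D joint positions to spherical coordinates (r, theta, phi)
    relative to an origin joint (e.g., the shoulder center)."""
    v = joints - origin
    r = np.linalg.norm(v, axis=-1)
    theta = np.arccos(np.clip(v[..., 2] / np.maximum(r, 1e-8), -1.0, 1.0))  # polar angle
    phi = np.arctan2(v[..., 1], v[..., 0])                                   # azimuth
    return np.stack([r, theta, phi], axis=-1)

def select_keyframes(hand_traj, shoulder_center, n_keyframes=10):
    """Rank frames by frame-to-frame change of the hand's distance and angles
    relative to the shoulder center, and keep the largest changes
    (angle wrap-around is ignored for simplicity)."""
    sph = to_spherical(hand_traj, shoulder_center)   # shape (T, 3)
    diff = np.abs(np.diff(sph, axis=0))              # per-frame change
    score = diff.sum(axis=1)                         # combined distance + angular change
    idx = np.argsort(score)[::-1][:n_keyframes]
    return np.sort(idx + 1)                          # key-frame indices in temporal order
```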
Bone suppression in radiography is a suitable technique to evaluate the health of soft tissues in exams. For instance, these techniques are essential when evaluating chest radiography images during the COVID-19 outbreak. The purpose of this work is to propose an alternative way to solve the bone suppression task in chest radiography images using Generative Adversarial Networks (GANs). Specifically, we used a conditional GAN (cGAN) to produce a bone-suppressed version of the input image. To quantify the results, we reviewed the main metrics and some state-of-the-art papers related to ours, and compared our results to works from the literature that used the same dataset or related techniques. The most widely used dataset in these works was that of the Japanese Society of Radiological Technology (JSRT). With this set of images, we reached a PSNR of 34.96, better than the values reported in the literature, and a similarity coefficient, known as SSIM, of 0.94. As...
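For reference, the two reported metrics can be computed with scikit-image; this is a minimal sketch assuming `reference` is the ground-truth soft-tissue image and `generated` is the cGAN output, both as NumPy arrays.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_suppression(reference, generated):
    """PSNR and SSIM between a reference soft-tissue image and a
    generated bone-suppressed image."""
    reference = reference.astype(np.float64)
    generated = generated.astype(np.float64)
    data_range = reference.max() - reference.min()
    psnr = peak_signal_noise_ratio(reference, generated, data_range=data_range)
    ssim = structural_similarity(reference, generated, data_range=data_range)
    return psnr, ssim
```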
The increasing popularity of social networks has changed the way people interact. These interactions produce a huge amount of data and open the door to new strategies and marketing analyses. According to Instagram (https://instagram.com/press/) and Tumblr (https://www.tumblr.com/press), an average of 80 and 59 million photos respectively are published every day, and those pictures contain several implicit or explicit brand logos. The analysis and detection of logos in natural images can provide information about how widespread a brand is. In this paper, we propose a real-time brand logo recognition system that outperforms all other state-of-the-art methods on the challenging FlickrLogos-32 dataset. We experimented with 5 different approaches, all based on the Single Shot MultiBox Detector (SSD). Our best results were achieved with the pretrained SSD 512, with which we outperform the best previous results on this dataset by 2.5% in F-score and 7.4% in recall. Besides the higher accuracy,...
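A minimal sketch of how detection F-score and recall of this kind are typically computed, assuming per-image lists of predicted boxes with confidences and ground-truth boxes; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def precision_recall_f1(detections, ground_truths, iou_thr=0.5):
    """Match each detection (x1, y1, x2, y2, score) to at most one ground-truth
    box per image and count true/false positives."""
    tp, fp, n_gt = 0, 0, sum(len(g) for g in ground_truths)
    for dets, gts in zip(detections, ground_truths):
        matched = set()
        for det in sorted(dets, key=lambda d: -d[4]):      # highest confidence first
            best, best_iou = None, iou_thr
            for j, gt in enumerate(gts):
                if j not in matched and iou(det[:4], gt) >= best_iou:
                    best, best_iou = j, iou(det[:4], gt)
            if best is None:
                fp += 1
            else:
                matched.add(best)
                tp += 1
    precision = tp / max(tp + fp, 1)
    recall = tp / max(n_gt, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return precision, recall, f1
```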
The prediction of university dropout is a complex problem, given the number and diversity of variables involved. Therefore, different strategies are applied to understand this educational phenomenon, although the most outstanding derive from the joint application of statistical approaches and computational techniques based on machine learning. Student Dropout Prediction (SDP) is a challenging problem that can be addressed following various strategies. On the one hand, machine learning approaches formulate it as a classification task whose objective is to compute the probability of belonging to a class based on a specific feature vector that will help us to predict who will drop out. Alternatively, survival analysis techniques are applied in a time-varying context to predict when abandonment will occur. This work considered analytical mechanisms for supporting the decision-making process on higher education dropout. We evaluated different computational methods from both approaches fo...
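A minimal sketch of the classification formulation mentioned above, using scikit-learn with purely hypothetical features and labels; the actual feature set and model family used in the study are not specified here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per student (e.g., GPA, credits approved,
# socioeconomic indicators) and a binary dropout label (1 = dropped out).
X = np.random.rand(500, 6)
y = (np.random.rand(500) < 0.3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of belonging to the dropout class for each held-out student.
dropout_prob = clf.predict_proba(X_test)[:, 1]
```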
Clustering methods are the most used algorithms for unsupervised learning. However, there is no unique optimal approach for all datasets, since different clustering algorithms produce different partitions. To overcome the issue of selecting an appropriate technique and its corresponding parameters, cluster ensemble strategies are used to improve accuracy and robustness through a weighted combination of two or more approaches. However, this process is often carried out almost blindly, testing different combinations of methods and assessing whether their performance is beneficial for the defined purpose. Thus, the procedure for selecting the best combination tests many clustering ensembles until the desired result is achieved. This paper proposes a novel analytic tool for clustering ensemble generation, based on quantitative metrics and interactive visual resources. Our approach allows analysts to display different results from state-of-the-art clustering methods and analyze their performance based on specific metrics and visual inspection. Based on their requirements and experience, analysts can interactively assign weights to the different methods to set their contributions and manage (create, store, compare, and merge) ensembles. Our approach's effectiveness is shown through a set of experiments and case studies, attesting to its usefulness in practical applications.
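One common way to realize a weighted combination of base clusterings is through a co-association (consensus) matrix; the sketch below, with hypothetical labelings and analyst-chosen weights, is only illustrative of the idea and not the tool's actual ensemble algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def weighted_coassociation(labelings, weights):
    """Weighted co-association matrix: entry (i, j) is the weighted fraction
    of base clusterings that place samples i and j in the same cluster."""
    n = len(labelings[0])
    coassoc = np.zeros((n, n))
    for labels, w in zip(labelings, weights):
        labels = np.asarray(labels)
        coassoc += w * (labels[:, None] == labels[None, :])
    return coassoc / sum(weights)

def consensus_clustering(labelings, weights, n_clusters):
    """Consensus partition via average-linkage clustering of 1 - co-association."""
    dist = 1.0 - weighted_coassociation(labelings, weights)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method='average')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```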
Ear recognition has gained attention within the biometrics community recently. Ear images can be captured from a distance without contact, and the explicit cooperation of the subject is not needed. In addition, ears do not suffer extreme changes over time and are not affected by facial expressions. All these characteristics are convenient when implementing surveillance and security applications. At the same time, applying any Deep Learning (DL) algorithm usually demands large amounts of samples to train networks. Thus, we introduce a large-scale database and explore fine-tuning pre-trained Convolutional Neural Networks (CNNs) to adapt them to ear images taken under uncontrolled conditions. We built an ear dataset from the VGGFace dataset by leveraging the face recognition field. Moreover, according to our experiments, adapting the VGGFace model to the ear domain leads to a better performance than using a model trained on general image recognition. The efficiency of the trained models ha...
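A minimal sketch of the fine-tuning idea in PyTorch, using an ImageNet-pretrained VGG-16 from torchvision as a stand-in backbone (the actual VGGFace weights are distributed separately) and a hypothetical number of ear identities.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in backbone: VGG-16 pretrained on ImageNet.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Replace the classification head with one sized for the ear identities.
num_identities = 500   # hypothetical number of subjects in the ear dataset
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_identities)

# Optionally freeze the convolutional layers and fine-tune only the head first.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```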
Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the possibility of ignoring important events due to human limitations when paying attention to multiple targets at a time. Researchers have proposed several methods to detect violent events automatically to overcome this problem. So far, most previous studies have focused only on classifying short clips without performing spatial localization. In this work, we tackle this problem by proposing a weakly supervised method to detect spatially and temporally violent actions in surveillance videos using only video-level labels. The proposed method follows a Fast-RCNN-style architecture that has been temporally extended. First, we generate spatiotemporal proposals (action tubes) leveraging pre-trained person detectors,...
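A minimal sketch of how per-frame person detections can be greedily linked into spatiotemporal action tubes by overlap between consecutive frames; this is an illustrative simplification, not the paper's proposal mechanism.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / (union + 1e-8)

def link_tubes(frame_detections, iou_thr=0.3):
    """Greedily link per-frame person boxes into tubes: a detection extends
    the tube whose box in the previous frame overlaps it the most."""
    tubes = []
    for t, boxes in enumerate(frame_detections):     # boxes: list of (x1, y1, x2, y2)
        for box in boxes:
            best, best_iou = None, iou_thr
            for tube in tubes:
                last_t, last_box = tube[-1]
                if last_t == t - 1 and box_iou(last_box, box) > best_iou:
                    best, best_iou = tube, box_iou(last_box, box)
            if best is None:
                tubes.append([(t, box)])
            else:
                best.append((t, box))
    return tubes
```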
High and persistent dropout rates represent one of the biggest challenges for improving the efficiency of the educational system, particularly in underdeveloped countries. A range of features influence college dropout, some belonging to the educational field and others to non-educational fields. Understanding the interplay of these variables to identify a student as a potential dropout could help decision makers interpret the situation and decide what to do next to reduce student dropout rates through corrective actions. This paper presents SDA-Vis, a visualization system that supports counterfactual explanations for student dropout dynamics, considering various academic, social, and economic variables. In contrast to conventional systems, our approach provides information about feature-perturbed versions of a student using counterfactual explanations. SDA-Vis comprises a set of linked views that allow users to identify variable alterations to change predefined stude...
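A minimal sketch of a single-feature counterfactual search against a fitted scikit-learn classifier, with hypothetical feature ranges; SDA-Vis's actual counterfactual engine is not described here.

```python
import numpy as np

def single_feature_counterfactual(clf, x, feature_ranges, n_steps=50):
    """For one student vector x, search each feature independently for the
    smallest change that flips the classifier's dropout prediction."""
    original = clf.predict(x.reshape(1, -1))[0]
    candidates = []
    for f, (lo, hi) in enumerate(feature_ranges):
        for value in np.linspace(lo, hi, n_steps):
            x_cf = x.copy()
            x_cf[f] = value
            if clf.predict(x_cf.reshape(1, -1))[0] != original:
                candidates.append((abs(value - x[f]), f, value))
                break
    return sorted(candidates)   # smallest perturbations first
```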
Skin cancer is one of the most serious public health problems among cancers, and melanoma is its most aggressive type. Early diagnosis is essential to increase the possibility of adequate treatment and to reduce the mortality rate. Dermatologists generally use manual methods to diagnose skin lesions. These methods, besides being time-consuming, can give different results for the same lesion when analyzed by different specialists. Therefore, automated diagnosis may be necessary to deal with this issue and to avoid invasive tests. For this, segmenting the skin lesion in the dermoscopic image is fundamental, as it is a basic task in the image analysis process. In the present work, a Convolutional Neural Network (CNN) model based on the U-Net is used to segment the lesion in dermoscopic images. This proposal achieved an accuracy of 0.949 and a Jaccard index of 0.833 on the ISIC 2017 dataset, and an accuracy of 0.954 and a Jaccard index of 0.850 on the ISIC 2018 dataset. The proposed model has a simpler architecture and requires fewer computational resources. The experiments show that the proposed model's results are promising compared with other CNN models presented in the literature.
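The two reported measures for binary lesion masks can be computed directly; a minimal NumPy sketch assuming `pred_mask` and `gt_mask` are same-sized binary arrays.

```python
import numpy as np

def segmentation_scores(pred_mask, gt_mask):
    """Pixel accuracy and Jaccard index (IoU) for binary lesion masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    accuracy = (pred == gt).mean()
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    jaccard = intersection / union if union > 0 else 1.0
    return accuracy, jaccard
```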
Person re-identification consists of re-identifying a person across sets of images taken from different camera views. Despite recent advances in this field, the problem remains challenging due to partial occlusions, changes in illumination, and variations in human body pose. In this paper, we present an enhanced triplet CNN based on body parts for person re-identification (AETCNN). We design a new model able to learn local body-part features and integrate them to produce the final feature representation of each input person. In addition, to avoid over-fitting due to the small size of the dataset, we propose an improvement in triplet assignment to speed up convergence and improve performance. Experiments show that our approach achieves very promising results on the CUHK01 dataset, improving most state-of-the-art results with a simpler architecture and achieving 76.50% rank-1 accuracy.
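A minimal sketch of the standard triplet loss on embedding batches in PyTorch; the paper's improved triplet assignment strategy is not reproduced here.

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss on L2-normalized embeddings: pull the anchor
    towards the positive (same identity) and away from the negative."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    negative = F.normalize(negative, dim=1)
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()

# Equivalent built-in: torch.nn.TripletMarginLoss(margin=0.3)
```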
Computer science is a dynamic field of study that requires constant review and updating of the curricular designs in academic programs. In general, measuring the impact of curricular changes has been little explored in the literature. In most cases, prior work focuses only on structuring the curricula, leaving aside several factors associated with important events such as student dropout, retention, and inclusion. However, these features provide academic institutions with many opportunities to understand student performance and propose more effective preventive/corrective actions to avoid dropout. This work focuses on the influence of curricular changes on student gender imbalance, socioeconomic provenance, and dropout. Specifically, we employ three different approaches for our analysis: (i) a longitudinal study of four curricula from the informatics engineering to computer science transition at San Pablo Catholic University, (ii) an exploratory analysis for identifying essential features that...
Ear recognition has gained attention in recent years. The possibility of being captured from a distance, contactlessly, without the cooperation of the subject, and without being affected by facial expressions makes ear recognition a captivating choice for surveillance and security applications, even more so in the current COVID-19 pandemic context, where modalities like face recognition fail due to the use of face masks. Applying any deep learning (DL) algorithm usually demands a large amount of training data and appropriate network architectures; therefore, we introduce a large-scale database and explore fine-tuning pre-trained convolutional neural networks (CNNs), looking for a robust representation of ear images taken under uncontrolled conditions. Taking advantage of the face recognition field, we built an ear dataset based on the VGGFace dataset and used Mask-RCNN for ear detection. Moreover, adapting the VGGFace model to the ear domain leads to a better performance than using a model trained for general image recognition. Experiments on the UERC dataset show that fine-tuning from a face recognition model and using a larger dataset leads to a significant improvement of around 9% over state-of-the-art methods in ear recognition. In addition, we explored score-level fusion by combining the matching scores of the fine-tuned models, which yields a further improvement of around 4%. Open-set and closed-set experiments have been performed and evaluated using the Rank-1 and Rank-5 recognition rate metrics.
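A minimal sketch of score-level fusion and Rank-k evaluation, assuming each model produces a probes-by-gallery matching-score matrix; the normalization and weighting actually used in the paper may differ.

```python
import numpy as np

def fuse_scores(score_matrices, weights=None):
    """Score-level fusion: weighted sum of min-max normalized matching-score
    matrices (probes x gallery) from several fine-tuned models."""
    weights = weights or [1.0] * len(score_matrices)
    fused = np.zeros_like(score_matrices[0], dtype=float)
    for s, w in zip(score_matrices, weights):
        s = (s - s.min()) / (s.max() - s.min() + 1e-8)
        fused += w * s
    return fused

def rank_k_accuracy(scores, probe_labels, gallery_labels, k):
    """Fraction of probes whose true identity appears among the k
    best-scoring gallery entries (Rank-1 when k=1, Rank-5 when k=5)."""
    gallery_labels = np.asarray(gallery_labels)
    order = np.argsort(-scores, axis=1)[:, :k]
    hits = [probe_labels[i] in gallery_labels[order[i]] for i in range(len(probe_labels))]
    return float(np.mean(hits))
```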
The mosquito Aedes aegypti can transmit several diseases, which makes studying the proliferation of this vector a necessary task. Using traps made in the laboratory, called ovitraps, it is possible to map egg deposition in a community. Images containing the elements (eggs) to be counted are acquired through a camera coupled with a magnifying glass. First, the goal is to find pixels with a color similar to that of mosquito eggs; for that, we take advantage of the color slicing method. From these pre-processed images, transfer learning with a convolutional neural network (CNN) is carried out, with the intention of separating the eggs from the other elements. In 10% of the test images, the count produced by the model and the ground-truth number of eggs were only weakly correlated. This problem occurs in images that have a high density of eggs or that contain dark elements resembling mosquito eggs. For the remaining 90% of the test images, the counting was considered to be perfectly correlated.
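A minimal OpenCV sketch of the color slicing step, with illustrative HSV bounds that would have to be tuned on the ovitrap images.

```python
import cv2
import numpy as np

def egg_color_mask(image_bgr, hsv_low=(0, 40, 20), hsv_high=(25, 255, 160)):
    """Keep only pixels whose HSV values fall inside a range chosen to match
    the dark color of Aedes aegypti eggs (the bounds here are illustrative)."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_low), np.array(hsv_high))
    sliced = cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
    return mask, sliced
```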
In this paper, we propose an approach for dynamic hand gesture recognition that exploits depth and skeleton joint data captured by the Kinect™ sensor. We also select the most relevant points in the hand trajectory with our proposed keyframe extraction method, reducing the processing time of a video. In addition, this approach combines pose and motion information of a dynamic hand gesture, taking advantage of the transfer learning property of CNNs. First, we use the optical flow method to generate a flow image for each keyframe; next, we extract the pose and motion information using two pre-trained CNNs: a CNN-flow for flow images and a CNN-pose for depth images. Finally, we analyze different schemes to fuse both types of information in order to find the best combination. The proposed approach was evaluated on different datasets, achieving promising results and outperforming state-of-the-art methods.
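A minimal OpenCV sketch of generating a flow image from two consecutive keyframes with Farneback dense optical flow, using the usual HSV encoding; the paper's exact flow method and encoding may differ.

```python
import cv2
import numpy as np

def flow_image(prev_gray, next_gray):
    """Dense optical flow between two grayscale keyframes, encoded as an image
    (hue = flow direction, value = flow magnitude) suitable as CNN input."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2          # hue: flow direction
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```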
The first step for video-content analysis, content-based video browsing and retrieval is the partitioning of a video sequence into shots. A shot is the fundamental unit of a video; it captures a continuous action from a single camera and represents a spatio-temporally coherent sequence of frames. Thus, shots are considered the primitives for higher-level content analysis, indexing and classification. Although many video shot boundary detection algorithms have been proposed in the literature, most approaches require several parameters and thresholds to be set in order to achieve good results. In this paper, we present a robust learning-based detector of sharp cuts with no threshold to set, no pre-processing step to compensate for motion, and no post-processing filtering to eliminate falsely detected transitions. The experiments, following strictly the TRECVID 2002 competition protocol, provide very good results dealing with a large amount of features thanks to our kernel-based SVM classif...
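A minimal sketch of a threshold-free cut detector in that spirit: histogram-difference features between consecutive frames fed to a kernel SVM (scikit-learn); the features actually used by the paper's classifier are richer.

```python
import numpy as np
import cv2
from sklearn.svm import SVC

def hist_diff_features(frames, bins=64):
    """Per-transition feature: absolute difference between the gray-level
    histograms of consecutive frames (large jumps suggest a sharp cut)."""
    feats = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        h1 = cv2.calcHist([prev], [0], None, [bins], [0, 256]).ravel()
        h2 = cv2.calcHist([curr], [0], None, [bins], [0, 256]).ravel()
        h1, h2 = h1 / (h1.sum() + 1e-8), h2 / (h2.sum() + 1e-8)
        feats.append(np.abs(h1 - h2))
    return np.array(feats)

# The kernel SVM learns the cut / no-cut decision, so no manual threshold is needed.
# X: features for labeled transitions, y: 1 for a cut, 0 otherwise.
# clf = SVC(kernel='rbf').fit(X, y)
```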
The amount of data produced on the internet increases every day, and with the growing popularity of social networks the number of published photos is huge; many of those pictures contain implicit or explicit brand logos. Detecting these logos in natural images can provide information about how widespread a brand is, help discover unwanted copyright distribution, support the analysis of marketing campaigns, etc. In this paper, we propose a real-time brand logo recognition system that outperforms all other state-of-the-art methods on two different datasets. Our approach is based on the Single Shot MultiBox Detector (SSD); we explore this tool in a different domain and also examine the impact of training with pretrained weights and of warp transformations in the input images. We conducted our experiments on two datasets, FlickrLogos-32 (FL32) and Logos-32Plus (L32plus), which is an extension of the training set of FL32. On FL32, we outperform the state-of-the-art b...
Sign language recognition has made significant advances in recent years. Many researchers show interest in encouraging the development of different applications to simplify the daily life of deaf people and to integrate them into the hearing society. The use of the Kinect sensor (developed by Microsoft) for sign language recognition is steadily increasing. However, there are few publicly available RGB-D and skeleton joint datasets that provide complete information for dynamic signs captured by the Kinect sensor; most of them lack effective and accurate labeling or are stored in a single data format. Given the limitations of existing datasets, this article presents a challenging public dataset, named LIBRAS-UFOP. The dataset is based on the concept of minimal pairs, which follows specific categorization criteria; the signs are labeled correctly and validated by an expert in sign language; and the dataset provides complete RGB-D and skeleton data. It consists of 56 different signs with high similarity, grouped into four categories. In addition, a baseline method is presented that consists of generating dynamic images from each data modality, which serve as input to two-stream CNN architectures. Finally, we propose an experimental protocol to conduct evaluations on the proposed dataset. Due to the high similarity between signs, the experimental results using the baseline method report a recognition rate of 74.25% on the proposed dataset. This result highlights how challenging this dataset is for sign language recognition and leaves room for future research to improve the recognition rate.
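A minimal sketch of building a dynamic image from a frame sequence using approximate rank pooling coefficients, a common construction in the literature and assumed here; the baseline's exact formulation may differ.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a sequence of frames into a single 'dynamic image' with
    approximate rank pooling: each frame gets a fixed temporal coefficient,
    so the result summarizes the motion of the whole sequence."""
    T = len(frames)
    harmonic = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, T + 1))])
    coeffs = np.array([2 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])
                       for t in range(1, T + 1)])
    stack = np.stack([f.astype(np.float64) for f in frames])
    di = np.tensordot(coeffs, stack, axes=1)      # weighted temporal sum
    # Rescale to 0-255 for use as CNN input.
    di = 255 * (di - di.min()) / (di.max() - di.min() + 1e-8)
    return di.astype(np.uint8)
```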
Recognition using ear images has been an active field of research in recent years. Besides faces and fingerprints, ears have a unique structure that can identify people and can be captured from a distance, contactlessly, and without the subject's cooperation. Therefore, they represent an appealing choice for building surveillance, forensic, and security applications. However, many techniques used in those applications, such as convolutional neural networks (CNNs), usually demand large-scale datasets for training. This research work introduces a new dataset of ear images taken under uncontrolled conditions that presents high inter-class and intra-class variability. We built this dataset using an existing face dataset called VGGFace, which gathers more than 3.3 million images. In addition, we perform ear recognition using transfer learning with CNNs pretrained on image and face recognition. Finally, we performed two experiments on two unconstrained datasets and report our results using Rank-bas...
Hierarchical image segmentation provides a region-oriented scale-space, i.e., a set of image segmentations at different detail levels in which the segmentations at finer levels are nested with respect to those at coarser levels. However, most image segmentation algorithms, among them the graph-based method relying on a region-merging criterion proposed by Felzenszwalb and Huttenlocher in 2004, do not lead to a hierarchy. To cope with the demand for hierarchical segmentation, Guimarães et al. proposed in 2012 a method for hierarchizing the popular Felzenszwalb-Huttenlocher method, without providing an algorithm to compute the proposed hierarchy. This paper is devoted to providing a series of algorithms to compute the result of this hierarchical graph-based image segmentation method efficiently, based mainly on two ideas: optimal dissimilarity measuring and incremental update of the hierarchical structure. Experiments show that, for an image of size 321 [Formul...
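For contrast, the non-hierarchical baseline is readily available in scikit-image; running it at several scale values yields independent, non-nested partitions, which is precisely the limitation the hierarchical version addresses. A minimal sketch:

```python
from skimage import data, segmentation

# Felzenszwalb's graph-based method at several scales produces partitions
# that are not nested in one another.
image = data.astronaut()
partitions = {scale: segmentation.felzenszwalb(image, scale=scale, sigma=0.8, min_size=50)
              for scale in (50, 200, 800)}
for scale, labels in partitions.items():
    print(f"scale={scale}: {labels.max() + 1} regions")
```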
Despite recent developments in spatiotemporal local features for action recognition in video sequences, local color information has so far been ignored. However, color has proved to be an important element for the success of automated recognition of objects and scenes. In this paper, we extend the space-time interest point descriptor STIP to take into account the color information in the features' neighborhood. We compare the performance of our color-aware version of STIP (which we have called HueSTIP) with the original one.
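A minimal sketch of the kind of color cue involved: a hue histogram accumulated over the spatiotemporal neighborhood of an interest point (OpenCV/NumPy); HueSTIP's actual descriptor construction may differ.

```python
import cv2
import numpy as np

def hue_histogram(frames_bgr, t, y, x, radius=8, t_radius=2, bins=36):
    """Hue histogram over the spatiotemporal neighborhood of an interest
    point (t, y, x): a color cue that could complement the gradient-based
    part of a STIP-style descriptor."""
    hist = np.zeros(bins)
    for dt in range(-t_radius, t_radius + 1):
        ti = min(max(t + dt, 0), len(frames_bgr) - 1)
        patch = frames_bgr[ti][max(y - radius, 0):y + radius,
                               max(x - radius, 0):x + radius]
        hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
        h, _ = np.histogram(hsv[..., 0], bins=bins, range=(0, 180))  # OpenCV hue in [0, 180)
        hist += h
    return hist / (hist.sum() + 1e-8)
```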


The International Journal of Computer Science and Information Security (IJCSIS, established in May 2009) is a global venue to promote research and development results of high significance in the theory, design, implementation, analysis, and application of computing and security. As a scholarly open-access, peer-reviewed international journal, its main objective is to provide the academic community and industry with a forum for the dissemination of original research related to computer science and security. High-caliber authors regularly contribute to this journal by submitting articles that illustrate research results, projects, survey works and industrial experiences relevant to the latest advances in computer science and information security.
IJCSIS archives all publications in major academic/scientific databases; abstracting/indexing, editorial board and other important information are available online on the journal homepage.

https://sites.google.com/site/ijcsis/Home