The document discusses using machine learning algorithms and supervised learning methods to develop an automated system for detecting nanoparticles and estimating their size and spatial distribution from scanning electron microscope images. The goal is to enable industrial-scale manufacturing of nanomaterials by applying quality control tools. Specifically, the research uses support vector machines and scale-invariant feature transform to extract features from images and classify pixels as nanorods or background in order to predict locations and dimensions of nanorods.
This document discusses using particle swarm optimization (PSO) to design optimal close-range photogrammetry networks. PSO is introduced as a heuristic optimization algorithm inspired by bird flocking behavior that can be used to solve complex optimization problems. The document then provides an overview of close-range photogrammetry network design and the four design stages. It explains that PSO will be used to optimize the first stage of determining optimal camera station positions. Mathematical models of PSO for close-range photogrammetry network design are developed. Experimental tests are carried out to develop a PSO algorithm that can determine optimum camera positions and evaluate the accuracy of the developed network.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As biomedical databases grow in size day by day, finding essential features for disease prediction has become more complex due to high dimensionality and sparsity problems. Also, given the availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret feature information using traditional feature-selection-based classification models. Most traditional feature-selection-based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. The ensemble classifier is one of the scalable models for the extreme learning machine due to its high efficiency and fast processing speed for real-time applications. The main objective of feature-selection-based ensemble learning models is to classify high-dimensional data with high computational efficiency and a high true positive rate. In this proposed model, an optimized Particle Swarm Optimization (PSO) based ensemble classification model was developed on high-dimensional microarray datasets. Experimental results proved that the proposed model has high computational efficiency compared to traditional feature-selection-based classification models as far as accuracy, true positive rate and error rate are concerned.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Utilization of Super Pixel Based Microarray Image Segmentationijtsrd
In computer vision image segmentation, superpixels have acted as a key component for the past decade. There are various algorithms and approaches for extracting superpixels, but among all of them Simple Linear Iterative Clustering (SLIC) has become pivotal in recent years. The extraction of microarray gene expression information is useful for recognizing tumors and other malignant diseases, and the cDNA microarray is a well-established tool for analyzing the same. Segmentation of microarray images is the primary step in microarray analysis. In this paper, we propose an algorithm for segmenting cDNA microarray images using a Simple Linear Iterative Clustering (SLIC) based Self-Organizing Maps (SOM) method. The proposed algorithm also takes up the challenging task of handling low-quality images. There are two phases to segment the image: first, preprocessing the input image to reduce noise levels, and second, segmenting the image using the SLIC-based SOM approach. Mr. Davu Manikanta | Mr. Parasurama N | K Keerthi "Utilization of Super Pixel Based Microarray Image Segmentation" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-5, August 2021, URL: https://www.ijtsrd.com/papers/ijtsrd46274.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/46274/utilization-of-super-pixel-based-microarray-image-segmentation/mr-davu-manikanta
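To make the superpixel idea concrete, here is a stripped-down sketch of SLIC-style clustering: centers seeded on a regular grid, then k-means-like iterations under a combined intensity + spatial distance. Real SLIC restricts each search to a 2S×2S window and the paper layers SOM on top; this toy does neither, and every parameter here is an illustrative assumption.

```python
import math

def slic_like(image, k=4, m=10.0, n_iters=5):
    """Toy SLIC-style superpixels on a 2-D grayscale image (list of lists).

    Seeds cluster centers on a regular grid, then alternates between
    assigning every pixel to the nearest center under a combined
    intensity + spatial distance, and re-averaging the centers.
    """
    h, w = len(image), len(image[0])
    step = max(1, int(math.sqrt(h * w / k)))
    # grid-seeded centers: (intensity, row, col)
    centers = [(float(image[r][c]), float(r), float(c))
               for r in range(step // 2, h, step)
               for c in range(step // 2, w, step)][:k]
    labels = [[0] * w for _ in range(h)]
    for _ in range(n_iters):
        for r in range(h):
            for c in range(w):
                best, best_d = 0, float("inf")
                for i, (ci, cr, cc) in enumerate(centers):
                    d_color = (image[r][c] - ci) ** 2
                    d_space = (r - cr) ** 2 + (c - cc) ** 2
                    d = d_color + (m / step) ** 2 * d_space
                    if d < best_d:
                        best, best_d = i, d
                labels[r][c] = best
        # recompute each center as the mean of its assigned pixels
        sums = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for r in range(h):
            for c in range(w):
                s = sums[labels[r][c]]
                s[0] += image[r][c]; s[1] += r; s[2] += c; s[3] += 1
        centers = [(s[0] / s[3], s[1] / s[3], s[2] / s[3]) if s[3] else centers[i]
                   for i, s in enumerate(sums)]
    return labels

# Synthetic 8x8 image: dark left half, bright right half.
image = [[0] * 4 + [255] * 4 for _ in range(8)]
labels = slic_like(image, k=2)
```

The compactness weight `m` plays the same role as in SLIC proper: larger values favor spatially regular superpixels over color-coherent ones.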
GRC-MS: A GENETIC RULE-BASED CLASSIFIER MODEL FOR ANALYSIS OF MASS SPECTRA DATAcscpconf
Many studies use different data mining techniques to analyze mass spectrometry data and extract useful knowledge about biomarkers. These biomarkers allow medical experts to determine whether an individual has a disease or not. Some of these studies have proposed models that have obtained high accuracy. However, the black-box nature and complexity of the proposed models have posed significant issues. Thus, to address this problem and build an accurate model, we use a genetic algorithm for feature selection along with a rule-based classifier, namely the Genetic Rule-Based Classifier algorithm for Mass Spectra data (GRC-MS). According to the literature, rule-based classifiers provide understandable rules but are not accurate. In addition, genetic algorithms have achieved excellent results when used with different classifiers for feature selection. Experiments are conducted on a real dataset, and the proposed classifier GRC-MS achieves 99.7% accuracy. In addition, the generated rules are more understandable than those of other classifier models.
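The genetic-algorithm feature-selection step the abstract describes can be sketched generically: evolve 0/1 feature masks with selection, crossover, and mutation, scoring each mask with the downstream classifier's accuracy. The sketch below is a generic GA with a toy fitness, not GRC-MS itself; every parameter is an assumption.

```python
import random

def ga_feature_selection(fitness, n_features, pop_size=20, n_gens=40,
                         crossover_rate=0.8, mutation_rate=0.05, seed=1):
    """Generic GA over 0/1 feature masks: tournament selection,
    one-point crossover, bit-flip mutation, elitism of one."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(n_gens):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = [scored[0][:]]                 # elitism: keep the best
        while len(next_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = p1[:]
            if rng.random() < crossover_rate:     # one-point crossover
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]
            for d in range(n_features):           # bit-flip mutation
                if rng.random() < mutation_rate:
                    child[d] = 1 - child[d]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy fitness standing in for a rule-based classifier's accuracy:
# features 0-2 are "informative", extra features cost a little.
def toy_fitness(mask):
    return sum(mask[:3]) - 0.1 * sum(mask)

best = ga_feature_selection(toy_fitness, n_features=10)
```

In the paper's setting, the fitness would be the rule-based classifier's accuracy on the masked mass spectra features, which keeps the final rule set both small and interpretable.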
This document discusses random forest machine learning algorithms and their use in predictive modeling. It provides context on random forests, including that they perform well for both classification and regression tasks, are less prone to overfitting than decision trees, and provide good predictive accuracy while also being interpretable. The document then discusses preprocessing methods like stemming, removing punctuation and stop words that can be applied before using natural language processing algorithms. It highlights the advantages of random forests, such as their ability to handle different data types, parallelizability, and stability. It also notes limitations like lack of interpretability for some users and potential for overfitting on some data sets.
Comparison between the genetic algorithms optimization and particle swarm opt...IAEME Publication
The document compares the genetic algorithms optimization and particle swarm optimization methods for designing close range photogrammetry networks. It presents the genetic algorithm and particle swarm optimization as two popular meta-heuristic algorithms inspired by natural evolution and collective animal behavior, respectively. The document develops mathematical models representing the genetic algorithm and particle swarm optimization for close range photogrammetry network design and evaluates them in a test field to reinforce the theoretical aspects.
This document provides an overview of a project report on simulating a single server queuing problem. The report includes an introduction to operations research, simulation, and the queuing problem. It discusses the research methodology, which involves defining the problem, developing a simulation model, validating the model, analyzing the data, and presenting findings and recommendations. The goal is to use simulation to provide optimal solutions to the queuing problem under study.
Plant Leaf Disease Analysis using Image Processing Technique with Modified SV...Tarun Kumar
In this computing era, image processing has spread its wings in human life to the extent that images have become an integral part of it. There are various applications of image processing in the fields of commerce, engineering, graphic design, journalism, architecture and historical research. In this research work, image processing is considered for the analysis of plant leaf diseases. Plant leaf diseases can be detected based on the disease symptoms. Here, a dataset of disease-affected leaves is considered for experimentation. This dataset contains plant leaves suffering from Alternaria Alternata, Cercospora Leaf Spot, Anthracnose and Bacterial Blight, along with some healthy leaf images. For this analysis, an autonomous approach of modified SVM-CS is introduced. Here, the concept of cuckoo search is considered to optimize the classification parameters. These parameters further help to find more accurate solutions. This autonomous approach also extracts the healthy portion and the disease-affected leaf portion, along with the accuracy of results.
Identification of Disease in Leaves using Genetic Algorithmijtsrd
Plant disease is an impairment of the normal state of a plant that interrupts or modifies its vital functions. Many leaf diseases are caused by pathogens. Agriculture is the mainstay of the Indian economy. The perception of the human eye is not strong enough to observe minute variations in the infected part of a leaf. In this paper, we provide a software solution to automatically detect and classify plant leaf diseases. We use image processing techniques to classify diseases so that diagnosis can be carried out quickly for each disease. This approach will enhance the productivity of crops. It includes image processing techniques starting from image acquisition, preprocessing, testing, and training. K. Beulah Suganthy "Identification of Disease in Leaves using Genetic Algorithm" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22901.pdf
Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/22901/identification-of-disease-in-leaves-using-genetic-algorithm/k-beulah-suganthy
IRJET- Plant Leaf Disease Detection using Image ProcessingIRJET Journal
This document discusses a technique for early detection of plant diseases through image processing. The technique involves preprocessing leaf images through color space conversion and enhancement. The region of interest (disease area) is segmented and features are extracted. A minimum distance classifier compares the features to a database of known plant diseases and identifies the disease. The methodology achieves over 90% accuracy in detecting diseases. The system could help farmers monitor crops efficiently and apply treatments early to reduce losses from diseases. Future work may involve integrating audio cues and recommending specific treatments to increase productivity and reduce costs and pollution.
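The "minimum distance classifier" step mentioned above amounts to nearest-class-mean matching over extracted feature vectors. The sketch below illustrates that idea; the feature values, class names, and two-feature setup are invented for illustration and are not taken from the paper's database.

```python
import math

def class_means(train):
    """train: {class_name: [feature_vector, ...]} -> {class_name: mean vector}."""
    means = {}
    for label, vectors in train.items():
        n, dim = len(vectors), len(vectors[0])
        means[label] = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return means

def min_distance_classify(features, means):
    """Assign the class whose mean feature vector is nearest (Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(means, key=lambda label: dist(features, means[label]))

# Invented example: 2-D (mean hue, lesion-area ratio) features per class.
train = {
    "healthy":   [[0.30, 0.02], [0.32, 0.01]],
    "leaf_spot": [[0.12, 0.40], [0.10, 0.45]],
}
means = class_means(train)
print(min_distance_classify([0.11, 0.42], means))  # prints "leaf_spot"
```

In the described system the "database of known plant diseases" would supply the per-class mean vectors, and the segmented disease region's features would be the query.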
Plant disease detection and classification using deep learning JAVAID AHMAD WANI
This document describes a project on plant disease detection and classification using deep learning. The objectives are to automatically detect plant diseases as early as symptoms appear on leaves in order to increase crop productivity. Deep learning techniques like convolutional neural networks (CNNs) are implemented using libraries like TensorFlow and Keras. Two CNN models, VGG16 and VGG19, are compared for classifying diseases in a dataset of 38 classes and 87k images of 14 crop species. The system achieved over 95% accuracy on validation. Future work involves developing a mobile app and integrating disease recommendations to help farmers.
This document outlines a proposed plant leaf disease detection system using image processing on Android mobile phones. The system aims to help farmers easily and cost-effectively detect plant diseases, identify severity levels, and receive treatment suggestions. It will use algorithms like blob detection and HSV color modeling to analyze leaf images and determine diseases. The Android app is intended to provide an affordable solution to identify a variety of disease types and inform farmers in their local language.
This document discusses a face recognition system that aims to improve verification rates under varying lighting conditions. It proposes a framework that combines image normalization, feature extraction, and subspace representation. Each stage increases resistance to illumination variations. The framework achieves significant improvements over other methods, with a verification rate of 88.1% at a 0.1% false acceptance rate. Key components of the system include preprocessing techniques like integral normalized gradient images, feature extraction methods like local binary patterns and Gabor wavelets, and classification using score fusion based on log-likelihood ratios of classifier outputs.
IRJET - Disease Detection in Plant using Machine LearningIRJET Journal
This document discusses using machine learning and image processing techniques to detect diseases in plants. The proposed system utilizes convolutional neural networks (CNNs) to classify plant images as either healthy or diseased based on features extracted from the images. The system architecture includes preprocessing the images, extracting color and texture features, running the features through a CNN model for classification training and testing, and outputting whether plants are normal or abnormal. The goal is to help farmers automatically detect plant diseases early on by analyzing images of plant leaves.
Feature selection for multiple water quality status: integrated bootstrapping...IJECEIAES
STORET is one method to determine river water quality and to classify it into four classes (very good, good, medium and bad) based on the water data for each attribute or feature. The success of building a pattern recognition model depends greatly on the quality of the data. Two issues are the concern of this research: data having a disproportionate amount among the classes (class imbalance), and noise found in the attributes. Therefore, this research integrates the SMOTE technique and bootstrapping to handle the class imbalance problem, while an experiment is conducted to eliminate the noise in the attributes by using several feature selection algorithms with a filter approach (information gain, rule, derivation, correlation and chi square). This research has the following stages: data understanding, pre-processing, imbalance class handling, feature selection, classification and performance evaluation. Based on the result of testing using 10-fold cross validation, the use of the SMOTE-bootstrapping technique is able to increase the accuracy from 83.3% to 98.8%, while the process of noise elimination on the data attributes is able to increase the accuracy to 99.5% (using the feature subset produced by the information gain algorithm and the decision tree classification algorithm).
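The core of SMOTE is simple: synthesize new minority samples by interpolating between a minority point and one of its nearest minority neighbors. Here is a minimal sketch of that idea; the paper combines SMOTE with bootstrapping, which this toy omits, and all parameters and data below are illustrative assumptions.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating each
    chosen sample toward one of its k nearest minority neighbors."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()                      # interpolation factor in [0, 1]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Toy imbalanced case: only 3 minority points in a 2-D feature space.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
new_points = smote_like(minority, n_new=5)
```

Because each synthetic point lies on a segment between two real minority points, the oversampled class fills out its own region of feature space instead of simply duplicating records.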
The classification of different types of tumors is of great importance in cancer diagnosis and its drug discovery. Cancer classification via gene expression data is known to contain the keys for solving the fundamental problems relating to the diagnosis of cancer. The recent advent of DNA microarray technology has made rapid monitoring of thousands of gene expressions possible. With this large quantity of gene expression data, scientists have started to explore the opportunities of classification of cancer using a gene expression dataset. To gain a profound understanding of the classification of cancer, it is necessary to take a closer look at the problem, the proposed solutions, and the related issues altogether. In this research thesis, I present a new way for Leukemia classification using the latest AI technique of Deep learning using Google TensorFlow on gene expression data.
The physical-mathematical model of actual natural or technological phenomena can include different variables, the finite number of which is defined by a researcher/conscious observer. The a priori overall error inherent in this model due to its finiteness can be compared with the actual experimental measurement error and should be useful in guiding future investigations. In this context, we propose a strategy relying on the thermodynamic theory of information processes to estimate this error, which cannot be made arbitrarily small. For the considered assumptions, the calculated error of the main researched variable, measured in conventional field studies, should not be less than the error caused by the limited number of dimensional variables of the physical-mathematical model. Examples of practical application of the proposed concept for spacecraft heating, climate prediction, thermal energy storage and food freezing are discussed.
The document discusses how to optimize social media campaigns using hashtags. It explains what hashtags are and how they can be used to search for and promote content. The document provides tips for effective hashtag use, such as being specific, catering hashtags to the social network, and using relevant unbranded hashtags. It also lists tips to avoid, like having too many hashtags or hashtags that are too long. Finally, the document describes several tools that can be used to track and analyze hashtag usage and conversations.
Hashtags are a fundamental part of Social Media Marketing. Have you created your own yet? Here are 7 rules for creating the perfect hashtag to boost your business!
This document provides an overview of Twitter, including how it allows users to communicate in real time with the world through short messages and features like @replies, mentions, retweets, hashtags, and direct messages. It discusses how businesses can use Twitter for marketing, customer service, event promotion, and more. The document also presents different types of Twitter users and advises becoming a "Twitter Dad" to manage customer relations, crises, corporate reputation, promote products and events, and advocate for issues.
Hashtag 101 - All You Need to Know About HashtagsModicum
Social media today moves at a mind-blowing pace. As soon as we feel like we've gotten the hang of one thing, something new flies onto the radar. It's tough to keep up with it all. For example, it's likely you've heard of hashtags. Suddenly, they've become part of our everyday lives, but many of us don't truly understand how to use them. Never fear! In this #Hashtag 101 infographic, our adorable friend the hashbot has fun teaching exactly what you need to know about optimizing the usage of hashtags for business.
This document provides a lighthearted guide to typography using dating and relationships as a metaphor. It discusses various typographic concepts such as typeface vs. font, type anatomy, font families, pairing typefaces, kerning and leading, and more. Each section relates these concepts to different stages of a relationship from the initial attraction and compatibility to long-term commitment. The document uses humor and analogies to make typically dry typographic topics more engaging and accessible.
Artificial intelligence (AI) is everywhere, promising self-driving cars, medical breakthroughs, and new ways of working. But how do you separate hype from reality? How can your company apply AI to solve real business problems?
Here are the AI lessons your business should keep in mind for 2017.
The document describes a skin cancer detection mobile application that uses image processing and machine learning. The application analyzes skin images for characteristics of melanoma like asymmetry, border, color, diameter and texture. It trains a model using the MobileNet-v2 architecture on datasets containing thousands of images. The trained model achieves 70% accuracy in detecting melanoma and differentiating normal and abnormal skin lesions when tested on new images. The application has potential to help identify skin cancer in early stages and assist medical practitioners.
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNINGgerogepatton
This paper proposes a deep learning-based model to segment gastrointestinal tract (GI) magnetic resonance images (MRI). The application of this model will be useful in potentially accelerating treatment times and possibly improving the quality of the treatments for patients who must undergo radiation treatments in cancer centers. The proposed model employs the U-net architecture, which provides outstanding overall performance in medical image segmentation tasks. The model that was developed through this project has a score of 81.86% using a combination of the dice coefficient and the Hausdorff distance measures, rendering it highly accurate in segmenting and contouring organs in the gastrointestinal system.
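The 81.86% score above combines the dice coefficient with the Hausdorff distance. The dice part is straightforward to compute for binary masks, as this sketch shows (flat 0/1 lists stand in for the 2-D segmentation masks):

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total

pred  = [0, 1, 1, 1, 0, 0]   # 3 predicted foreground pixels
truth = [0, 1, 1, 0, 0, 0]   # 2 ground-truth foreground pixels
print(dice_coefficient(pred, truth))  # 2*2 / (3+2) = 0.8
```

Dice rewards overlap in region area, while the Hausdorff distance penalizes the worst boundary deviation, which is why the two are often combined for organ contouring.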
An efficient convolutional neural network-based classifier for an imbalanced ...IAESIJAI
Imbalanced datasets pose a major challenge for researchers addressing machine learning tasks. In these types of datasets, samples of different classes are not in equal proportion; rather, the gap between the numbers of individual class samples is significantly large. Classification models perform better for datasets having an equal proportion of data tuples in both classes. But, in reality, medical image datasets are skewed and hence are not always suitable for a model to achieve improved classification performance. Therefore, various techniques have been suggested in the literature to overcome this challenge. This paper applies an oversampling technique on an imbalanced dataset and focuses on a customized convolutional neural network model that classifies the images into two categories: diseased and non-diseased. The outcome of the proposed model can assist health experts in the detection of oral cancer. The proposed model exhibits 99% accuracy after data augmentation. Performance metrics such as precision, recall and F1-score values are very close to 1. In addition, a statistical test is performed to validate the statistical significance of the model. It has been found that the proposed model is an optimised classifier in terms of the number of network layers and the number of neurons.
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...gerogepatton
In the era of the fourth industrial revolution, measuring and ensuring the reliability, efficiency and safety of industrial systems and components is one of the uppermost key concerns. In addition, predicting performance degradation or remaining useful life (RUL) of equipment over time based on its historical sensor data enables companies to greatly reduce their maintenance cost. In this way, companies can prevent costly unexpected breakdowns and become more profitable and competitive in the marketplace. This paper introduces a deep learning-based method combining CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) neural networks to predict RUL for industrial equipment. The proposed method does not depend upon any degradation trend assumptions, and it can learn complex temporal representative and distinguishing patterns in the sensor data. In order to evaluate the efficiency and effectiveness of the proposed method, we evaluated it on two different experiments: RUL estimation and predicting the status of IoT devices over a 2-week period. Experiments are conducted on a publicly available NASA turbofan-engine dataset. Based on the experiment results, the deep learning-based approach achieved high prediction accuracy. Moreover, the results show that the method outperforms standard well-accepted machine learning algorithms and accomplishes competitive performance when compared to the state-of-the-art methods.
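A CNN+LSTM model of this kind consumes fixed-length windows of multivariate sensor readings. A common way to prepare turbofan-style run-to-failure data is sliding windows with a capped RUL label per window, sketched below; the window size and the piecewise RUL cap are illustrative assumptions, not the paper's settings.

```python
def make_rul_windows(sensor_rows, window=3, rul_cap=130):
    """Turn one engine's run-to-failure sensor rows into (window, label) pairs.

    sensor_rows: list of per-cycle feature vectors, ordered in time, where
    the last row is the failure cycle. Each window of `window` consecutive
    rows is labeled with the capped remaining useful life at its last row.
    """
    n = len(sensor_rows)
    samples = []
    for end in range(window, n + 1):
        rul = n - end                       # cycles left after this window
        samples.append((sensor_rows[end - window:end], min(rul, rul_cap)))
    return samples

# Toy engine with 5 cycles of 2 sensor readings each.
rows = [[0.1, 10], [0.2, 11], [0.3, 12], [0.4, 13], [0.5, 14]]
pairs = make_rul_windows(rows, window=3)
# windows end at cycles 3, 4, 5 -> RUL labels 2, 1, 0
```

Capping the label (here at 130 cycles) reflects the common assumption that degradation is negligible early in an engine's life, which keeps the regression target well behaved.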
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...ijsc
The document proposes a Particle Swarm Optimization (PSO) based ensemble classification model to improve classification of high-dimensional biomedical datasets. It develops an optimized PSO technique to select optimal features and initialize weights for base classifiers in the ensemble model. Experimental results on microarray datasets show the proposed model achieves higher accuracy, true positive rate, and lower error rate compared to traditional feature selection based classification models.
Simplified Knowledge Prediction: Application of Machine Learning in Real LifePeea Bal Chakraborty
Machine learning is the scientific study of algorithms and statistical models that machines use to perform a specific task depending on patterns and inference rather than explicit instructions. This research and analysis aims to observe how precisely a machine can predict whether a patient suspected of breast cancer has malignant or benign cancer. In this paper, the classification of cancer type and prediction of risk levels is done by various machine learning models and is pictorially depicted by various visual analytics tools.
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly.
For more information:
http://societyofdatascientists.com/controlling-informative-features-for-improved-accuracy-and-faster-predictions-in-omentum-cancer-models/?src=slideshare
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET Journal
This document compares three machine learning techniques - Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) - for predicting breast cancer using a dataset of 198 patient records. It finds that SVM achieved the highest accuracy of 96.97% for classification, followed by RF at 96.45% and NB at 95.45%. SVM also had the highest recall rate at 0.97, indicating it was best at correctly identifying malignant tumors. While NB had the lowest precision of 0.92, meaning it incorrectly identified some benign cases as malignant, all three techniques showed high performance in predicting breast cancer.
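Of the three classifiers compared, Naive Bayes is the simplest to sketch from scratch: fit per-class Gaussian statistics, then pick the class with the highest posterior. The code below is a minimal Gaussian NB on invented tumor-like features, not the survey's 198-record dataset.

```python
import math

def fit_gnb(X, y):
    """Per-class feature means/variances plus class priors."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        dim, n = len(rows[0]), len(rows)
        means = [sum(r[d] for r in rows) / n for d in range(dim)]
        vars_ = [max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-9)
                 for d in range(dim)]
        model[label] = (means, vars_, n / len(X))
    return model

def predict_gnb(model, x):
    """Pick the class maximizing log prior + sum of Gaussian log-likelihoods."""
    def log_post(label):
        means, vars_, prior = model[label]
        ll = math.log(prior)
        for d in range(len(x)):
            ll += -0.5 * math.log(2 * math.pi * vars_[d]) \
                  - (x[d] - means[d]) ** 2 / (2 * vars_[d])
        return ll
    return max(model, key=log_post)

# Invented 2-D features (e.g. mean radius, texture) for two classes.
X = [[1.0, 2.0], [1.1, 1.9], [3.0, 4.0], [3.2, 4.1]]
y = ["benign", "benign", "malignant", "malignant"]
model = fit_gnb(X, y)
print(predict_gnb(model, [3.1, 4.0]))  # prints "malignant"
```

The "naive" independence assumption behind the per-feature product is also why NB tends to trail SVM and Random Forest on correlated tumor features, as the survey's precision figures suggest.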
IRJET- A Novel Segmentation Technique for MRI Brain Tumor ImagesIRJET Journal
This document summarizes several research papers on techniques for segmenting brain tumors in MRI images. It discusses challenges in brain tumor segmentation and describes various approaches that have been proposed, including methods using feature selection, kernel sparse representation, multiple kernel learning (MKL), and post-processing techniques. The document also reviews state-of-the-art segmentation, registration, and modeling methods for brain tumor images and their performance.
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...IRJET Journal
This document describes a study that uses supervised machine learning algorithms to predict breast cancer. Three algorithms - decision tree, logistic regression, and random forest - are applied to preprocessed breast cancer data. The random forest model achieved the best accuracy at 98.6% for predicting whether a tumor was benign or malignant. The study aims to develop an early prediction system for breast cancer using machine learning techniques.
This document reviews different machine learning methods for classifying malignant mesothelioma using a dataset of 324 patient records. It evaluates the performance of support vector machine (SVM), decision tree, neural network, bagging, and AdaBoost in terms of classification accuracy and computational time. SVM with a linear kernel, decision tree, and bagging using decision trees achieved the highest accuracy of 100% across different training set sizes. However, bagging had longer computational time than SVM and decision tree. Therefore, the document concludes that SVM with a linear kernel or decision tree would be the best methods for classifying malignant mesothelioma, especially for large datasets, due to their high accuracy and low computational time.
Melanoma Skin Cancer Detection using Deep LearningIRJET Journal
This document presents research on developing a deep learning model to detect melanoma skin cancer. The researchers created a convolutional neural network called Xception to analyze images of skin lesions and classify them as benign or malignant. They developed a web application using Flask that allows users to upload images for analysis. The Xception model achieved 97% accuracy on a test dataset. The web app was also able to accurately classify images, demonstrating its potential to assist dermatologists in early detection of melanoma skin cancer. However, further improvements are still needed before the model and web app can be fully relied upon for clinical diagnosis.
Efficiency of Prediction Algorithms for Mining Biological DatabasesIOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
3D Segmentation of Brain Tumor ImagingIJAEMSJORNAL
A brain tumor is a collection of anomalous cells that grow in or around the brain. Brain tumors can severely affect patients: they can disrupt proper brain function and be life-threatening. In this project, we have proposed a system to detect, segment, and classify tumors present in the brain. If a brain tumor is identified early, proper treatment can be administered and the disease may be cured.
Software Cost Estimation Using Clustering and Ranking SchemeEditor IJMTER
Software cost estimation is an important task in the software design and development process.
Planning and budgeting are carried out with reference to the estimated cost values. A variety of
software properties, including hardware, product, technology, and methodology factors, are used
in the estimation process. The quality of a software cost estimate is measured by its accuracy.
Software cost estimation is carried out using three types of techniques: regression-based models,
analogy-based models, and machine learning models. Each category contains a set of techniques for
the software cost estimation process. Eleven cost estimation techniques under these three categories
are used in the system. The Attribute-Relation File Format (ARFF) is used to maintain the software
product property values, and the ARFF file serves as the main input for the system.
The proposed system is designed to perform clustering and ranking of software cost
estimation methods. A non-overlapping clustering technique is enhanced with an optimal centroid
estimation mechanism. The system improves the accuracy of the clustering and ranking process and
produces efficient ranking results for software cost estimation methods.
Computer Aided System for Detection and Classification of Breast CancerIJITCA Journal
Breast cancer is one of the leading causes of cancer death among adult and
older women, mainly in developed countries, and its incidence is rising. Since the cause of this disease is not yet
known, early detection is the best way to decrease the breast cancer mortality. At present, early detection of
breast cancer is attained by means of mammography. An intelligent computer-aided diagnosis system can
be very helpful for radiologist in detecting and diagnosing cancerous cell patterns earlier and faster than
typical screening programs. This paper proposes a computer aided system for automatic detection and
classification of breast cancer in mammogram images. Intuitionistic Fuzzy C-Means clustering technique
has been used to identify the suspicious region or the Region of Interest automatically. Then, the feature
database is designed using histogram features, Gray Level Co-occurrence wavelet features, and wavelet
energy features. Finally, the feature database is submitted to self-adaptive resource allocation network
classifier for classification of mammogram image as normal, benign or malignant. The proposed system is
verified with 322 mammograms from the Mammographic Image Analysis Society Database. The results
show that the proposed system performs well.
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISIRJET Journal
This document presents a semi-supervised spatial EM framework for microarray analysis to efficiently classify and predict diseases based on gene expression data. It uses a spatial EM algorithm to cluster gene expression data, followed by an SVM classifier to predict diseases and their severity levels. The proposed approach is evaluated based on classification accuracy, computation time, and ability to identify biologically significant genes. Experimental results on disease datasets show improved accuracy compared to other supervised and unsupervised methods. The authors conclude that using the same classifier for gene selection and classification enhances predictive performance, and future work will focus on partitioning genes into clusters correlated with sample categories to further improve accuracy.
PriMA Prosthetics developed a 3D printed myoelectric prosthetic arm that is strong, lightweight, and life-like in appearance and function for under $1,500. The arm uses DC motors, force sensors, and myoelectric sensors to allow for a full range of motion and gestures through machine learning algorithms. It was designed through client and industry surveys to be comfortable, easy to use, and mimic the motion of a natural hand. 3D printing allowed for complex contours to be manufactured quickly and inexpensively.
Here are the key points from the literature review:
- Weight and size of the prosthesis are crucial factors that determine comfort and usability. Even if the same weight as a natural arm, an additional prosthesis is perceived as heavy.
- Range of motion, grasp types, and grip force ability determine how well a prosthesis can mimic the functions of a natural arm. A wider range is better.
- Battery life, durability, cosmetic appearance, and sensory feedback also contribute to making a prosthetic arm similar to a natural one.
The design should focus on making the prosthesis lightweight, customizable to individual sizes, capable of a wide range of motion and grip forces, durable, and providing sensory feedback.
PriMA is developing a lower-cost prosthetic arm using 3D printing and sensory feedback. The project is being conducted by interdisciplinary students at Florida Institute of Technology as part of their capstone design project. The goal is to create an affordable prosthetic arm option with high functionality. The team aims to eventually start a 3D printing technology company and bring innovation to the prosthetics industry.
This document summarizes a project to develop a 3D printer capable of constructing hybrid rocket engine solid fuel components in complex shapes. 3D printing porous fuel can exponentially increase oxidizer flux and improve performance. The document outlines hybrid rocket engine fundamentals, advantages over solid-fuel rockets like controllable regression rates. It proposes innovations to an existing 3D printer like adding a vacuum chamber, extrusion pump and nozzle to print liquid fuel into a solid shape, and addresses issues like potential clogging and need for a cooling system. The goal is eliminating high costs and enabling industrial-scale production of hybrid rockets.
This report analyzes and proposes a solution to a vibration issue in BMW's X3 SAV side mirror design. The initial design analysis found high vibration and resonance when the door closed. The source was identified as a low damping ratio of 0.0644.
A solution of increasing the damping ratio to at least 0.456 through adding a damper was proposed. Calculations showed this would reduce maximum overshoot to below 20% and lower settling time. A spreadsheet calculated damping ratios for different dampers, selecting one with ratio 0.5-0.6.
Numerical analysis and CAD models showed the modified design with damper met requirements. Recommendations included the damper solution and considering fatigue in future designs.
1. The document describes modifications made to improve an industrial pipe cooling system. The original system cooled extruded metal pipes with water nozzles but took too long (52 feet) and inconsistently cooled the pipes before the plastic coating melted.
2. The proposed solution added chillers and increased the water pressure and flow rate through nozzles. Modeling showed this would reduce the required cooling length to 24 feet, freeing up plant space.
3. A cost analysis determined the modifications would cost $34,838.05 but that the additional plant space alone would offset the cost within a quarter by enabling new processes and manufacturing.
This document outlines an experiment to analyze converging-diverging (CD) nozzles. The experiment has two parts: 1) analyzing the outlet conditions of a single nozzle by comparing experimental and theoretical data, and 2) studying relationships between nozzle design parameters (area ratio, divergence angle) and flow characteristics (outlet speed, efficiency) for several nozzles. The document provides relevant theory, equipment descriptions, procedures, and methods to determine the success of each part. The goal is to better understand subsonic flow through CD nozzles and relationships between nozzle geometry and performance.
This document summarizes research on developing a machine learning system to automatically detect and characterize nanoparticles in micrographs. The research used supervised learning algorithms like support vector machines (SVM) and feature extraction methods like SIFT to classify nanoparticles by size and spatial distribution. Initial results showed SIFT features and features based on neighboring pixels improved classifier accuracy over simple (x,y) coordinates. Future work could optimize clustering methods, classification algorithms, and develop software for industrial nanoparticle analysis.
1. Berger 1
QUALITY ISSUES IN
NANOMANUFACTURING
THADDEUS BERGER1 , MOSTAFA GILANIFAR2 , TANMOY DAS2 , GRANT KLEINER2 , DR. ABHISHEK
SHRIVASTAVA2 *
1 FLORIDA INSTITUTE OF TECHNOLOGY
2 FLORIDA STATE UNIVERSITY
HIGH-PERFORMANCE MATERIALS INSTITUTE
FLORIDA A&M UNIVERSITY - FLORIDA STATE UNIVERSITY, COLLEGE OF ENGINEERING
2525 POTTSDAMER STREET, TALLAHASSEE, FLORIDA 32310
ABSTRACT
Nanoparticles have potential in a variety of applications, including cancer diagnosis and treatment,
structural health monitoring and βsmartβ buildings, and improved solar cells. Nanoparticle
fabrication, however, is currently not standardized and not viable on an industrial scale. The scale-
up of nanomanufacturing requires application of quality engineering tools for optimizing process
yield, variance reduction, and process monitoring and control. This requires methods for estimating
nanoparticle dimensions and spatial arrangement, as these significantly influence nanomaterial
thermal, physical, optical, and electromagnetic properties. The objective of this research is to use
supervised learning algorithms and machine learning to develop a system to automatically detect
nanoparticles and estimate size and spatial distribution.
This paper will first go into some detail on the applications of nanotechnology, the necessity of an
industrial-scale manufacturing procedure, and how supervised learning ties into achieving the goal
of commercialization. Then, background will be given on supervised learning before discussing
classification. After outlining the flow of the project, detail will be given on our chosen classification
and feature extraction techniques as well as clustering methods used for multi-class classification.
1: INTRODUCTION
Improving the scalability of nanomaterial
production has many commercial applications.
Composites with nanomaterials are becoming
increasingly common in research, and
applications for nanomaterials exist in many
commercial venues including cancer cell
targeting and treatment [1] [2], structural
health monitoring [3] [4], and more effective
solar panels [5]. The high surface area-to-
volume ratios and ideal thermal, physical,
optical, and electromagnetic properties of
nanoparticles [6] make nanoparticles critical to
the commercialization of modern technology.
Currently, labs around the country are using
nanomaterials for research and design of
lighter, stronger materials. Numerous labs use
scanning electron microscopes (SEM), but largely
for qualitative purposes. No all-in-one
process or software package currently exists to
provide a quick, affordable way to make
nanomaterial production viable at an
industrial scale. Designing such a process
would allow the development of quality
engineering tools to optimize process yield and
reduce variance. Without such tools,
nanotechnology will be prohibitively expensive to
produce on an industrial scale. The system to
enable the development of the necessary
engineering tools would use data extracted from
the SEM images to learn patterns from the data
and allow predictions to be made. Machine
learning can be applied to complete this task and
provide the tools needed to scale up
nanomanufacturing. Therefore, the development
of automated systems for estimating size and
spatial distribution, which heavily influence
material properties at the nanoscale, is necessary
for the scale-up of nanomanufacturing processes.
Dimensional estimation and control would allow
industries to quickly, accurately, and affordably
determine the best nanomanufacturing processes.
This capability would also allow for the
standardization of nanomanufacturing processes,
further improving scalability.
FIGURE 1: APPLICATION OF NANOMATERIALS IN
CANCER TREATMENT [2].
FIGURE 2: APPLICATION OF NANOMATERIALS (IN
THIS CASE, SILVER NANOWIRES) IN IMPROVING
SOLAR CELL TECHNOLOGY [5].
FIGURE 3: APPLICATION OF NANOMATERIALS IN
STRUCTURAL HEALTH MONITORING [4].
to draw more information about the model. Cross validation is one of the most popular sampling
methods. Cross validation splits the training set into groups, and fits the model to each group,
assessing the model's performance by averaging all the groups' errors. K-fold cross validation is very
common in statistical learning; the training data, with n observations, is separated into K groups of
almost equal size. A special case is K = n, where each fold contains a single observation and the model
is fit on the remaining n - 1; this is called leave-one-out cross
validation (LOOCV). LOOCV gives an approximately unbiased error estimate, but becomes extremely computationally
expensive when dealing with big data.
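The fold-splitting logic is simple enough to sketch. Our experiments used R; the following Python sketch (function names are ours) illustrates K-fold splitting and error averaging, with LOOCV as the K = n case:

```python
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k nearly equal folds (LOOCV when k == n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, error, xs, ys, k):
    """Average held-out error over k folds.

    fit(train_xs, train_ys) returns a model; error(model, xs, ys) scores it."""
    folds = kfold_indices(len(xs), k)
    errs = []
    for held_out in folds:
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs.append(error(model, [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return sum(errs) / len(errs)
```

With `k = len(xs)` this reduces to LOOCV, which is why the cost grows so quickly on large data sets.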
One of the most important aspects of our research was feature extraction. We needed to find the
features which distinguish the pixels of a nanorod from the background in order to build our model.
The effect is very similar to human memory: for example, you may recognize the people, places, and
objects you know based on their sounds, smells, physical features, or mannerisms (or a combination
of them). There are several feature detection algorithms available, and nearly limitless combinations
of features can be extracted, as there are approximately 10,000-30,000 different object categories
[9]. We needed to find the features that could not only segment out the nanorods in the images
correctly, but we needed to do so under varying levels of brightness, sharpness, or intensity across
multiple images. These differences can be significant, as highlighted in Figure 4.
2.2: CLASSIFICATION
All supervised learning models are built with the goal of predicting a response to one or more
predictors. Models can be built to serve a variety of purposes and to fit a variety of trends. For this
reason, selecting the proper method is of vital importance. Since the first goal of our research was to
pick out nanorods from images, it was clear that we would be using classification algorithms. Humans
perform classification instinctively many times per day. Classification is the simple association of
items to their descriptions (or other people to their names). While linear regression is used when
predicting a quantitative (numerical) response, classification is used when predicting a qualitative
(categorical) response. For our data, we needed to use classification to determine which pixels were
parts of nanorods and which were not.
There are many types of classifiers, or models used to classify data. One popular classifier is logistic
regression. This classifier assumes that the logarithm of the odds that an observation will be in a
certain class, or log-odds, is linear. The coefficients of the linear portion can be estimated using the
maximum likelihood method, which estimates the coefficients such that all training observations can
(a) (b)
FIGURE 4: NANOROD MICROGRAPHS. SEM IMAGES OFTEN VARY
SIGNIFICANTLY IN BRIGHTNESS, SHARPNESS, AND INTENSITY. A) [10],
B) [11].
be classified correctly [8, pp. 132]. This can be formalized mathematically using the likelihood
function, where β0 and β1 are regression coefficients:

\ell(\beta_0, \beta_1) = \prod_{i : y_i = 1} p(x_i) \prod_{i' : y_{i'} = 0} \bigl(1 - p(x_{i'})\bigr) \quad (2).
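To make Equation 2 concrete, a small Python sketch (illustrative only; `p` and `likelihood` are our names, not from the paper) evaluates the likelihood of candidate coefficients on toy data. Maximum likelihood estimation picks the (β0, β1) pair that maximizes this product:

```python
import math

def p(x, b0, b1):
    """Logistic model: P(y = 1 | x) with log-odds b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def likelihood(xs, ys, b0, b1):
    """Equation 2: product of p(x_i) over observations with y_i = 1
    and (1 - p(x_i')) over observations with y_i' = 0."""
    L = 1.0
    for x, y in zip(xs, ys):
        L *= p(x, b0, b1) if y == 1 else (1.0 - p(x, b0, b1))
    return L
```

On separable toy data such as `xs = [-2, -1, 1, 2]`, `ys = [0, 0, 1, 1]`, a steep slope like `b1 = 5` yields a much higher likelihood than `b1 = 0`.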
Logistic regression is mostly used for binary classification. When dealing with more than two classes,
linear discriminant analysis (LDA) is commonly used. LDA attempts to approximate the Bayes
classifier, which is the ideal classifier [8, pp. 37]. Quadratic discriminant analysis (QDA) is similar to
LDA, but uses quadratic discriminant functions instead of linear. K-nearest neighbors (KNN) is a
classification method which attempts to classify observations based on the K nearest observations.
Some popular, more computer-intensive methods include decision trees, random forests, boosting,
and support vector machines (SVM) [8, pp. 127]. Our group used SVM for classification, which is
covered in section 2.4.
2.3: OVERALL FLOW
Our experiments followed the structure of the flow chart in Figure 5.
FIGURE 5: GENERAL FLOW FOR AN EXPERIMENT.
For an experiment, the features of a training image were extracted as a data matrix and then used to
train a model. Then, the model was validated using a new image. This process was repeated for
various methods and feature extraction techniques. The success or failure of the model was
determined by classification error or 0/1 loss. Classification error has two components: misdetection
(false negative) error and false alarm (false positive) error. The general goal was to solve the
following optimization problem:
\text{Maximize} \left( \frac{\text{Number of correctly classified observations}}{\text{Number of observations}} \right) \quad (3).
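The objective in Equation 3 can be computed directly from actual and predicted labels. A minimal sketch (function name ours), which also reports the two error components described above:

```python
def classification_accuracy(actual, predicted, positive=1):
    """Accuracy per Equation 3, with the 0/1 loss split into misdetection
    (false negative) and false alarm (false positive) components.
    Labels: 0 = background, 1 = nanorod (the `positive` class)."""
    n = len(actual)
    misdetect = sum(1 for a, p in zip(actual, predicted)
                    if a == positive and p != positive)
    false_alarm = sum(1 for a, p in zip(actual, predicted)
                      if a != positive and p == positive)
    return {"misdetection": misdetect / n,
            "false_alarm": false_alarm / n,
            "accuracy": 1 - (misdetect + false_alarm) / n}
```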
2.4: SUPPORT VECTOR MACHINES (SVM)
SVM is a vector space-based classifier which
separates training data based on their class labels
with a hyperplane such that the hyperplane is the
farthest possible from points in either class [13].
SVMs are one of the most popular machine learning
techniques available today [14]. SVM can be used to
construct linear or nonlinear classifiers (Figure 6)
using the kernel trick. A kernel is a function which
quantifies the similarity of two observations and
implicitly maps the data to a higher dimensional
feature space. The SVM then learns a linear
classifier (a hyperplane) in this high-dimensional
feature space, resulting in nonlinear classification
boundaries in the original space [8, pp. 350]. Using
kernels is computationally less expensive than
creating new features (data transformations)
explicitly. For our research, we used mostly radial
and some linear kernels. SVM can have a hard or soft
margin. A hard margin is a classifier that does not allow any misclassified observations. A soft margin
yields a smoother classifier by allowing some misclassifications [15]. The soft marginβs ability to
ignore some observations usually results in a better overall fit. The parameter which controls the
margin of an SVM is C (cost). Another parameter, γ (gamma), parametrizes the kernel function [16].
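The two kernels we used are easy to write out. A sketch, assuming feature vectors are plain Python lists; in e1071's `svm()` these correspond to `kernel = "radial"` (with its `gamma` argument) and `kernel = "linear"`:

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial (Gaussian) kernel: exp(-gamma * ||u - v||^2). Larger gamma
    makes similarity fall off faster, giving a more flexible boundary."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq)

def linear_kernel(u, v):
    """Linear kernel: the plain inner product of the two vectors."""
    return sum(a * b for a, b in zip(u, v))
```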
2.5: SCALE-INVARIANT FEATURE TRANSFORM (SIFT)
SIFT is a robust image descriptor developed by
David Lowe in 1999 which is commonly used in the
field of computer vision for object detection and
point matching between different views of 3D
objects. SIFT detects key points (points of interest)
for features. Histograms are used to generate a
vector of 128 features at each key point. These
vectors (descriptors) are used to classify the image
[9]. Object detection was critical for this research,
because we needed to be able to detect the features
which distinguished nanorods from the
background and because these features had to hold
up to the differences shown in the "Supervised
Learning" section. We take this ability for granted as
humans because it is so easy for us. Humans are
capable of distinguishing thousands of types of
objects [9] with almost no difficulty through a wide range of illuminations, orientations, distances,
and distractions. For example, you may be able to recognize your car very quickly from relatively far
away. But how do you know the car is yours? Surely you do not own the only car of that type in the
world. You simply recognize the car intuitively, immediately recognizing all the features of the car
FIGURE 7: FLOW OF IMAGE CLASSIFICATION
USING SIFT FEATURE EXTRACTION ALGORITHM
[9].
FIGURE 6: SAMPLE SVM CLASSIFICATION PLOT.
that make it yours, and this is the goal of object detection. When
using SIFT, key points are first detected as the scale-space
extrema of the Difference-of-Gaussian (DoG) values, and SIFT
extracts a 128-dimensional descriptor vector for each key point
[16]. Figure 8a shows a plot of SIFT key points overlaid on an
image. Plotting the SIFT descriptors was unnecessary for our
research, as the descriptors were simply extracted into tables
in CSV files. However, the results of plotting the SIFT
descriptors for an image can be seen in Figure 8b.
The DoG operator normalizes SIFT features and makes
them scale invariant, so the features do
not vary with rotation, translation, or scaling. SIFT features can
be detected through wide differences in intensity, illumination,
and sharpness of an image. This made SIFT a top option, giving
us 128 low-variance predictors to study and to train our models
instead. However, one issue with SIFT is that it may eliminate
critical variations which could help our SVM correctly classify
images. This issue was studied further by using K-means
clustering, an unsupervised learning method, to consider a
multi-level classification problem.
2.6: K-MEANS CLUSTERING & MULTI-LEVEL SVM CLASSIFICATION
Clustering is an unsupervised learning method which separates a data set into several groups of
similar observations. We used clustering to allow us to use more than two response classes.
Previously, we classified pixels or SIFT descriptors as belonging to either the foreground (nanorods)
or background (not nanorods). However, foreground observations may themselves have a variety of
patterns. Separating these patterns into separate groups, using clustering, can improve the accuracy
of the learned classifiers. K-means clustering was used to group the foreground and background data
so that we could use a multi-class SVM, which is an SVM with more than two classes to separate. K-
means clustering simply divides n observations into K clusters, where K is a selected value. In K-
means clustering each observation is placed into the cluster with the nearest mean. This is very
similar to K-medoids clustering, where each observation is placed into the cluster with the nearest
median. In R, there is a function to partition around medoids (PAM), part of a package called Flexible
Procedures for Clustering (FPC), which estimates the best value of K [18]. While PAM is meant for K-
medoids clustering, the similarities between K-means and K-medoids mean that PAM also gives a
good estimate of K for K-means. After using K-means to cluster a data set, cluster number can be used
as the response in a multi-class SVM. Foreground and background data were clustered separately,
with K = 10 and K = 2, respectively. Then, the data was combined to run a 12-class SVM on the entire
image.
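A minimal sketch of this clustering step, using plain Lloyd's-algorithm K-means in Python rather than the R/PAM tooling described above (function names and the random initialization are ours):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign each point to the nearest mean,
    then recompute the means; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for pt in points:
            j = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(pt, centers[c])))
            groups[j].append(pt)
        # Keep the old center if a cluster ends up empty.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g
                   else centers[j] for j, g in enumerate(groups)]
    return [min(range(k), key=lambda c: sum(
        (a - b) ** 2 for a, b in zip(pt, centers[c]))) for pt in points]

def multiclass_labels(fg_points, bg_points, k_fg=10, k_bg=2):
    """Cluster foreground and background separately, then offset the
    background cluster ids so the combined response has k_fg + k_bg
    classes (10 + 2 = 12 in our experiments)."""
    fg = kmeans(fg_points, k_fg)
    bg = [c + k_fg for c in kmeans(bg_points, k_bg)]
    return fg + bg
```

The combined label vector is what we used as the response for the 12-class SVM.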
FIGURE 8: A) SIFT KEYPOINTS ON A
TEST IMAGE [17]; B) SIFT
DESCRIPTORS.
2.7: EXPERIMENT SETUP
When building a supervised learning model, the first step is to acquire training data to build a model
to predict test data. We began by creating "Ground Truth" (GT) from micrographs which we acquired
from publications and from other labs. GT was used to generate training data which was used to train
our models. GT was created by using Microsoft Paint to color over distinguishable nanorods in the
selected micrographs. Each nanorod was assigned a set of RGB (red, green, blue) color values.
The first predictors tested were the (x, y) pixel coordinates. The (x-y) coordinate system in image
processing is different from typical Cartesian coordinates. With a Cartesian system, the positive y axis
is oriented vertically upward; in image processing it points vertically downward. Next, SIFT features
were used as predictors. For an image with N descriptors, the data was an Nx128 matrix. The next
set of features used was based on a pixel's neighbors. For every pixel in a training image, we used 25
features: a 5x5 neighborhood descriptor per pixel. Thus, for a training image with P pixels, the data was a
Px25 matrix. These features were extracted in CSV format in MATLAB to be used in R to train an SVM.
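The 5x5 neighborhood extraction can be sketched as follows. This is an illustrative Python re-implementation, not the MATLAB code we used; border pixels are handled by clamping coordinates to the image, which is an assumption, since the treatment of image edges is not specified above:

```python
def neighborhood_features(img, size=5):
    """Build a P x (size*size) matrix: one row per pixel, holding the
    intensities of its size x size neighborhood. Out-of-bounds neighbors
    are clamped to the nearest valid pixel (assumed edge handling)."""
    h, w = len(img), len(img[0])
    r = size // 2
    rows = []
    for y in range(h):
        for x in range(w):
            row = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    row.append(img[yy][xx])
            rows.append(row)
    return rows
```

For a P-pixel image with `size=5`, this yields the Px25 matrix described above.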
When implementing the SVM in R, we used tuning to determine the best combination of parameters
for our model. Since we were dealing with large, high-dimensional data sets, tuning became highly
computationally expensive. SVM tuning could take hours or even days to execute. The only solutions
were to use a faster computer or write faster code. R, by default, does not utilize parallel computing,
so using a multi-core computer (almost all of today's computers have multiple cores) initially
provides no advantage. To address the need for faster computing, it was necessary to develop an SVM
implementation which worked in parallel on a high-performance multi-core machine. Tuning
optimizes an SVM over a range of parameters. R tunes the SVM by testing one model at a time by
default. The doParallel library in R provides a parallel backend, or a parallel network of workers, for
the foreach loop. DoParallel must be combined with the parallel library, which is included in recent
versions of R. The foreach library enables the foreach loop, and along with the iterators library,
enables the parallel and doParallel libraries to be installed. The e1071 library is needed to run and
tune SVM. The foreach loop can be easily nested to run an SVM for variations of C and γ [19].
The foreach loop did not tune the SVM; instead, it ran an SVM for each combination of C and γ, all at
once. For a given training data set, the model with the best accuracy (lowest error) was selected and
a single SVM (a much faster computation) was run with that set of parameters. This implementation
provided us with a major speed increase when dealing with big data and would be advantageous for
industrial nanomanufacturers with access to high-performance servers.
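The same grid-search structure can be sketched in Python with a worker pool (our R implementation used foreach/doParallel with e1071; here `evaluate` is a hypothetical stand-in for fitting an SVM and returning its error):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def tune_parallel(evaluate, costs, gammas, workers=4):
    """Evaluate every (C, gamma) pair concurrently and return the pair
    with the lowest error, mirroring the nested-foreach grid search.
    `evaluate(C, gamma)` is assumed to fit a model and return its error."""
    grid = list(product(costs, gammas))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(lambda cg: evaluate(*cg), grid))
    best = min(range(len(grid)), key=errors.__getitem__)
    return grid[best], errors[best]
```

As in our R workflow, a single final model is then fit with the winning (C, γ) pair, which is a much cheaper computation than the grid itself.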
3: RESULTS AND DATA ANALYSIS
3.1: DATA COLLECTION
GT was created for 16 SEM images. Numerical data matrices were extracted using MATLAB in the
form of CSV files for both training (GT) images and test images. Models were trained using R on a 40-
core Linux (Ubuntu) platform. Using GT images to train our models allowed us to predict the locations
of nanorods in other images.
3.2: FEATURE EXTRACTION
Our first tests used the (x, y) coordinate values
as the only variables to predict the response of
foreground or background. Next, we used SIFT
features as predictors, first with a binary
response, and then using K-means clustering to
conduct 12-class classifications. Using PAM, we
found the ideal value of K for the foreground
and background data to be 10 and 2,
respectively. A section of the data matrix for
Figure 8 is shown in Figure 9 to visualize our data. A
sample training image and its GT are shown in
Figure 10.
FIGURE 9: EXAMPLE SIFT IMAGE DATA MATRIX. THE
IMAGE HAD 2,480 DESCRIPTORS AND A BINARY
RESPONSE (0 FOR BACKGROUND AND 1 FOR
FOREGROUND).
FIGURE 10: EXAMPLE ORIGINAL IMAGE (A) AND GT (B) [20].
3.3: CLASSIFICATION
SVM tuning originally lasted from as little as 30 minutes for smaller data sets to days for larger data
sets. To address this issue, I developed a parallel computing method as discussed in the Experiment
Setup section. Using nested foreach loops, I was able to run an
SVM for all combinations of C and γ very quickly. When testing
the effectiveness of my code, running on 25 of our 40 cores, I was
able to increase the speed of the tuning process by a factor of 30
(the image took ~30 minutes to tune originally but only 57
seconds with my code) when running nine combinations of C and
γ. This implementation became very useful for tuning large data
sets like those generated by the 25 feature descriptors for every
pixel. I tuned one of these images in parallel over three hours,
meaning that the default tuning process would have lasted
almost four days. After tuning in parallel, a new SVM had to be
made using the parameters from the loop which resulted in the
lowest error. However, the parallel implementation was still an
order of magnitude faster than the default tuning.
When we applied a model from our SVM to a test image, we could
generate a table called a confusion matrix, which is simply a table of predicted values vs. real values
where each value corresponds to a class. A binary SVM classification was run for the GT image in
Figure 10 and tested with the image in Figure 8a, resulting in a
confusion matrix (Figure 12) and an SVM classification plot (Figure 11).
Using the confusion matrix and Equation 3, accuracy (η) can be
computed for each test of a classifier. Table 1 (next page) shows η for
17 experiments which used six images (designated Image 1 - Image 6,
see Figures 13 and 14 on the next page). Table 2 (pp. 12) shows the
average and standard deviation of η for each set of features and each
value of C and γ.
FIGURE 11: SVM CLASSIFICATION
PLOT FOR EXAMPLE TEST IMAGE.
FIGURE 12: CONFUSION
MATRIX FOR EXAMPLE SVM.
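A confusion matrix and the accuracy η of Equation 3 can be built directly from the two label vectors. A sketch (names ours):

```python
def confusion_matrix(actual, predicted, classes):
    """Table of (actual, predicted) counts; accuracy (eta) is the sum of
    the diagonal entries divided by the number of observations."""
    table = {(a, p): 0 for a in classes for p in classes}
    for a, p in zip(actual, predicted):
        table[(a, p)] += 1
    eta = sum(table[(c, c)] for c in classes) / len(actual)
    return table, eta
```

The same function works unchanged for the binary case (classes 0 and 1) and the 12-class case.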
4: DISCUSSION AND CONCLUSIONS
4.1: DISCUSSION
The main assumption in this research is
that all nanorods extend into the
substrate. This assumption will allow
us to use the top and side edges and the
angle of the side edges to find the
nanorodsβ projection lengths.
Tables 1 and 2 show that the features
have the strongest correlation to SVM
classifier accuracy. As expected, using
only the (x, y) coordinates of the pixels
resulted in poor accuracy. The SIFT
features were quite robust, and the 25
feature image was well above the
average, but more trials from all three
sets of features are necessary to be able
to say definitively which features are the
best. Varying C and γ shows very little
effect on the accuracy of the classifier,
but low values of C and γ did result in a marked improvement in accuracy. This makes sense because
we were dealing with big data sets, meaning that giving single data points too much influence could
lower the classifier accuracy. Although more testing will be needed to confirm the positive effect,
using multiple classes by clustering, using cluster number as the response, may improve accuracy
dramatically. Based on these results, using SIFT feature descriptors or the 25 feature descriptors with
low values of C and γ for a 12-class SVM classifier should result in high accuracy. Maximizing classifier
accuracy will allow industries to classify images with very little error and accurately determine the
optimal nanomanufacturing processes.
The SVM classification plot in Figure 11 appears to give an upside-down view of the test image (Figure 8a). This makes sense because the positive y-axis in images is oriented vertically downward, as mentioned in the Experimental Setup section. With that in mind, the classifier clearly performs reasonably well, as the boundary separating the nanorods from the background appears clearly in the SVM classification plot. Some misclassifications are visible both in the nanorods and in the background, which is consistent with this particular SVM's accuracy of about 86%.
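The ~86% figure can be read directly off a confusion matrix like the one in Figure 12: accuracy is the trace divided by the total count. The counts below are invented purely to illustrate the arithmetic, not taken from the actual experiment:

```python
import numpy as np

# Hypothetical 2x2 confusion matrix (rows = true class, columns =
# predicted class); counts are invented to illustrate an accuracy
# of roughly 86%, not read from Figure 12.
cm = np.array([[5200,  800],   # background pixels
               [ 600, 3400]])  # nanorod pixels

eta = np.trace(cm) / cm.sum()  # accuracy = correctly labeled / total
print(f"classifier accuracy eta = {eta:.2f}")  # -> 0.86
```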
Most of the error in our experiments was due to the propagation of human error in coloring the GT images. Using multiple GT images as training data in the future may reduce this effect. Any remaining error is due to random intrinsic errors, so running more experiments will be critical to finding the features and parameters that maximize classifier accuracy. Another important source of error is the estimation of nanoparticle dimensions from projection lengths. In the future, it will be important for nanomanufacturers to understand that the extracted dimensions are not exact. However, the goal is to reduce these errors to the point where valid observations of changes in length can be made.
Features   Avg. η       Diff. from Total Avg. η   Std. Dev.
(x, y)     0.740117     -0.118846                 0.021319
SIFT       0.871043      0.012080                 0.029669
25*        0.927537*     0.068574*                0*

C
0.1        0.870503      0.011539                 0.049352
10         0.865906**    0.006943**               0**
100        0.846418     -0.012545                 0.072270
1000       0.865906**    0.006943**               0**

γ
0.5        0.888165      0.029201                 0.047467
1          0.865906**    0.006943**               0**
5          0.815590     -0.043373                 0.063082
10         0.865906**    0.006943**               0**

TABLE 2: EFFECTS OF FEATURE TYPE, C, AND γ ON ACCURACY. *ONLY ONE TRIAL USING 25 FEATURES. **ALL EXPERIMENTS YIELDED THE SAME RESULTS.
Experiments 8-16 in Table 1 resulted in the same accuracy. This suggests that one or more of the following are true:
1. SIFT features are unaffected by C and γ. This is unlikely, because experiments 4-7 show a
change in accuracy using SIFT features with low values of C and γ. One possible conclusion is
that the classifier accuracy stagnated because the C and γ values were too high, meaning that
SIFT features should only be used with low C and γ.
2. For all the experiments where η = 0.865906, there were zeroes in the bottom-right corner of
the confusion matrix. This may mean that there were errors in this data or that binary
classification does not work well for SIFT features. Multi-class classification should be tested
for these experimental setups.
3. The accuracy data in Tables 1 and 2 are biased. As mentioned before, more experiments
should be run for all combinations of parameters for binary and multi-level classification in
order to get a clearer picture of how to maximize accuracy.
4. The SIFT features are too robust to properly classify our images. SIFT was designed to use
only features that are insensitive to changes in lighting, viewing angle, and image brightness,
sharpness, or intensity. Some of the factors that are critical to properly classifying our
images may therefore be discarded by SIFT. For example, changes in size may affect the optical
properties of the nanoparticle being examined, and the SIFT features may be too robust to
register the change.
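The symptom in point 2 can be checked mechanically: a confusion-matrix column that sums to zero means the classifier never predicted that class at all. A small numpy check, using an invented matrix chosen to reproduce the repeated accuracy value:

```python
import numpy as np

# Invented confusion matrix reproducing the symptom in point 2: the
# classifier never predicts the nanorod class, so the right-hand
# column (including the bottom-right cell) is all zeros.
cm = np.array([[8659,    0],
               [1341,    0]])

# A class whose column sums to zero was never predicted at all.
never_predicted = [c for c in range(cm.shape[1]) if cm[:, c].sum() == 0]
eta = np.trace(cm) / cm.sum()  # 0.8659, near the repeated eta value
```

Such a degenerate classifier can still score a deceptively high accuracy when the classes are imbalanced, which is why the repeated η = 0.865906 deserves suspicion.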
4.2: FUTURE WORK AND RECOMMENDATIONS
For future research, more experiments covering all combinations of the methods and parameters
shown in Table 1 are recommended to get a clearer picture of which features and parameters work
best. Wider ranges of C and γ, as well as intermediate values, should be tested to establish
relationships between SVM classifier accuracy and the combination of C and γ for each feature
set. Further use of multi-class classification is also recommended, given its high accuracy (and
the need for more data to confirm it). Furthermore, different clustering algorithms such as
K-medoids and hierarchical clustering should be tried for multi-class classification, and the
effect of boosting or of changing the classification method should be investigated.
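The clustering-based multi-class labeling discussed above can be sketched with a minimal k-means implementation, where each pixel's cluster index becomes the class label for the SVM. The three Gaussian blobs and the seeded initialization are invented for a deterministic demo:

```python
import numpy as np

def kmeans(X, init, iters=20):
    """Minimal k-means; the resulting cluster index per point can
    serve as the response variable for a multi-class SVM."""
    centers = X[init].copy()
    for _ in range(iters):
        # Distance from every point to every center, then reassign.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Three well-separated blobs standing in for pixel feature vectors;
# seeding one center per region keeps the demo deterministic.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in (0.0, 5.0, 10.0)])
labels = kmeans(X, init=[0, 50, 100])
```

K-medoids or hierarchical clustering would slot into the same pipeline: only the labeling step changes, and the SVM then trains on (features, cluster label) pairs.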
In industry, speed, automation, and accuracy will be critical. Therefore, once classifier accuracy is
optimized, I propose the following recommendations before commercializing this research:
1. The code should be ported to a faster, more widely taught language, such as C++ or Python, to
improve computation speed and to create a single software package that can be licensed.
Computer science professionals are increasingly educated in these languages, and our
programs should be written in languages they are familiar with, while also taking advantage
of optimized algorithms, multiple processors, and GPUs.
2. All code should be written to utilize multi-core machines and, if possible, should
automatically run on a specified percentage of the available cores. This would maximize
throughput for all nanomanufacturers who use the software.
3. An autorun script should be implemented, and a user-friendly Graphical User Interface (GUI)
should be developed. This would reduce the cost of labor for nanomanufacturers.
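Recommendation 2 can be sketched with Python's standard library: compute a worker count from a user-specified fraction of the cores, then map the per-image classification work over a pool. The thread-backed `multiprocessing.dummy.Pool` is used here for a portable demo; `multiprocessing.Pool` has the identical interface for true multi-core use, and `classify_tile` is a hypothetical placeholder for the real SVM step:

```python
import os
from multiprocessing.dummy import Pool  # thread-backed Pool; swap in
# multiprocessing.Pool for true multi-core use in production.

def classify_tile(tile_id):
    # Placeholder for the per-image SVM classification work;
    # here it just returns a dummy result.
    return tile_id * tile_id

core_fraction = 0.5  # user-specified share of the available cores
n_workers = max(1, int((os.cpu_count() or 1) * core_fraction))

with Pool(n_workers) as pool:
    results = pool.map(classify_tile, range(8))

print(results)  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```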
4.3: CONCLUSIONS
This research shows that supervised learning algorithms have the potential to be an excellent solution
for nanoparticle dimension estimation and control, which would be a significant quality engineering
development. The completion of this research will enable the scale-up, standardization, and
commercialization of nanomanufacturing processes. The initial classifier accuracy is promising, and
points to a bright future for this technology as further improvements are discovered.