Projects
During my internship at Optum, I played a pivotal role in developing the PDUT (Production Database
Update Tool) Hybrid Application Model. This project aimed to automate and streamline production
database updates across both on-premises and Azure cloud environments. Using Java and the
Spring Boot framework, we designed and implemented a robust hybrid application architecture. Integration
with Azure services such as Azure SQL Database and Virtual Machines ensured seamless deployment
and efficient performance. My responsibilities included developing server-side code, optimizing
database queries, and collaborating closely with cross-functional teams including developers, cloud
services experts, and database administrators. This experience significantly enhanced my technical
proficiency in hybrid application development, cloud computing, and version control with Git. It also
fostered growth in teamwork, communication, and problem-solving skills, preparing me effectively
for future challenges in the tech industry.
Bharat Intern
During my internship at JPMorgan Chase & Co., I had the opportunity to immerse myself in a
dynamic environment focused on financial technology and data analysis. My role primarily involved
interfacing with stock price data feeds and using company frameworks and tools to analyze and
visualize this data for traders. This hands-on experience allowed me to apply my skills in Python and
data manipulation libraries to real-world financial datasets, enhancing my understanding of
quantitative analysis and market dynamics.
Throughout the internship, I collaborated closely with a team of experienced professionals who
provided guidance and mentorship, helping me navigate complex financial data systems and hone
my problem-solving abilities. I also gained valuable insights into the operations of a global financial
institution, learning how technology and data science drive decision-making processes in the
financial markets.
Additionally, the internship at JPMorgan equipped me with practical knowledge of financial tools
and industry best practices, contributing to my professional growth and preparing me for future
roles in fin-tech and data analytics. This experience solidified my interest in leveraging technology to
solve challenges in the financial sector and provided a solid foundation for my career aspirations in
quantitative finance and data-driven decision-making.
A Novel Framework for Plant Disease Detection using Keras and API Implementation
A Novel Framework for Plant Disease Detection using Keras and API Implementation
represents a significant project aimed at enhancing agricultural practices through advanced
technology. Leveraging Keras, a deep learning framework, the project focuses on automating
the detection of plant diseases using image processing techniques. The framework involves
training convolutional neural networks (CNNs) on a large dataset of plant images annotated
with disease labels.
The implementation integrates an API (Application Programming Interface) to facilitate
seamless interaction between the trained model and end-users, such as farmers or agricultural
experts. This API allows users to upload images of diseased plants, which are then processed
through the CNN model to accurately diagnose the presence and type of disease. The system
provides real-time feedback, enabling prompt decisions on disease management and
treatment.
Key features of the framework include its scalability, as it can accommodate new disease
patterns and plant varieties through ongoing model training and dataset expansion. It also
emphasizes usability, with a user-friendly interface that simplifies the disease detection
process for non-technical users in the agricultural sector.
Overall, this project represents a pioneering approach to leveraging deep learning and API
technology to address agricultural challenges, promising improved crop management, higher
yields, and sustainable farming practices.
1. Data Flow:
• Plant image data is collected and pre-processed, including augmentation (artificially increasing
the size and diversity of the training dataset by creating new data points from existing ones).
• The pre-processed data is split into training, testing, and validation sets (the validation set is
used to fine-tune the hyperparameters of a machine learning model during training; it helps prevent
overfitting and ensures the model generalizes well to unseen data).
• The trained convolutional neural network (CNN) models are deployed as APIs for disease
detection.
• Users can feed images into the deployed APIs for live disease prediction.
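As a rough sketch of the augmentation and splitting step described above (the folder path, image size, and augmentation settings are illustrative assumptions, not the project's actual configuration):

# Keras-based augmentation with an 80/20 train/validation split.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values
    rotation_range=20,        # augmentation: random rotations
    zoom_range=0.2,           # augmentation: random zoom
    horizontal_flip=True,     # augmentation: random flips
    validation_split=0.2,     # hold out 20% of images for validation
)

train_gen = datagen.flow_from_directory(
    "data/plant_images", target_size=(224, 224),
    batch_size=32, class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory(
    "data/plant_images", target_size=(224, 224),
    batch_size=32, class_mode="categorical", subset="validation")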
2. Model Training:
• Transfer learning and fine-tuning techniques are utilized on pre-trained CNN models (e.g.,
VGG16, ResNet) using a Kaggle dataset.
VGG16:
• Strengths: Achieves high accuracy on various tasks, relatively simple and easy to implement.
ResNet:
• Strengths: Addresses the vanishing gradient problem, achieves better accuracy than VGG16
with fewer parameters.
• Regular retraining of models is performed to adapt to new data and evolving conditions.
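A minimal sketch of the transfer-learning setup, assuming an ImageNet-pre-trained VGG16 base; the number of classes and the size of the dense head are placeholders rather than the project's actual values:

from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

num_classes = 38  # placeholder: depends on the Kaggle dataset used

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base for transfer learning

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# Fine-tuning: unfreeze the top convolutional blocks later and retrain
# with a lower learning rate.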
3. Model Evaluation:
• Model evaluation is conducted using a confusion matrix, which provides insights into the
model's performance across different classes.
• Other evaluation metrics such as accuracy, loss, precision, recall, and F1-score may also be
utilized to assess model performance.
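A brief sketch of this evaluation step, continuing from the training sketch above (the test generator is assumed to be created with shuffle=False so predictions and labels line up):

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

probs = model.predict(test_gen)       # class probabilities per test image
y_pred = np.argmax(probs, axis=1)     # predicted class indices
y_true = test_gen.classes             # ground-truth class indices

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))  # precision, recall, F1 per class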
4. Deployment and Maintenance:
• Security measures such as encryption, authentication, and access control are implemented
to ensure data security.
• Monitoring tools and logging mechanisms are established to track the health and
performance of the system.
• Regular maintenance and updates are conducted to optimize system performance and
address any issues or vulnerabilities.
• Features such as image upload, prediction display, and model performance metrics are
included.
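The write-up does not name the API framework, so the following is only a hypothetical sketch of an image-upload prediction endpoint using Flask; the route, model file name, and class labels are placeholders:

import io
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("plant_disease_cnn.h5")                  # assumed file name
CLASS_NAMES = ["healthy", "early_blight", "late_blight"]    # placeholder labels

@app.route("/predict", methods=["POST"])
def predict():
    # Read the uploaded image, resize, and normalize it for the CNN.
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    x = np.expand_dims(np.array(img.resize((224, 224))) / 255.0, axis=0)
    probs = model.predict(x)[0]
    idx = int(np.argmax(probs))
    return jsonify({"disease": CLASS_NAMES[idx], "confidence": float(probs[idx])})

if __name__ == "__main__":
    app.run()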
Heart Disease Prediction using Neural Networks
Patient data is gathered from the University of California, Irvine (UCI) Machine Learning Repository,
focusing on attributes relevant to heart disease diagnosis. The dataset contains various clinical and
demographic features of patients.
1. Data Preprocessing:
• Data cleaning is performed to handle missing values and outliers and to ensure data quality.
Missing values, typically represented as "?", are removed or imputed.
• The dataset is transformed to ensure all data is in numeric format, which is necessary for
machine learning algorithms to process.
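A short sketch of this cleaning step with pandas; the file name and column names follow the common layout of the UCI Cleveland data but should be treated as assumptions:

import pandas as pd

cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach",
        "exang", "oldpeak", "slope", "ca", "thal", "target"]
df = pd.read_csv("processed.cleveland.data", names=cols, na_values="?")

df = df.dropna()              # drop rows with missing values (or impute instead)
df = df.apply(pd.to_numeric)  # make sure every column is numeric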
2. Data Splitting:
• The dataset is divided into training (80%) and testing (20%) sets using the train_test_split
function from Scikit-Learn. This is crucial for assessing the model's generalization performance.
3. Label Encoding:
• Class values representing different heart disease severity levels are converted into
categorical labels for classification. This ensures the model can understand and predict the severity
levels.
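Continuing the sketch above, the split and label encoding might look like this (the 80/20 ratio and the five severity classes come from the description; the random seed is arbitrary):

from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

X = df.drop("target", axis=1).values
y = to_categorical(df["target"].values, num_classes=5)  # one-hot classes 0-4

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)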
4. Model Architecture:
• A neural network model is designed using the Keras library.
5. Model Training:
• The neural network model is trained using categorical cross-entropy loss and the Adam
optimizer. The training process involves optimizing weights and biases through iterations to
minimize the loss function.
Categorical cross-entropy loss, often used in multiclass classification problems, measures the
performance of a classification model whose output is a probability value between 0 and 1. It’s
calculated as the negative log-likelihood of the true labels given the predictions provided by the
model. For a model that outputs probabilities ( p_{i,j} ) for class ( j ) on instance ( i ), with true
labels ( y_{i,j} ), the loss over ( N ) instances and ( C ) classes is
( L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{i,j} \log(p_{i,j}) ).
The Adam optimizer, short for Adaptive Moment Estimation, is an optimization algorithm that is
particularly efficient for problems with large datasets or many parameters. Adam computes adaptive
learning rates for each parameter based on estimates of the first moment (the mean) and second
moment (the uncentered variance) of the gradients.
• The model is trained for a specific number of epochs with a defined batch size.
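A minimal sketch of the network and training call, continuing from the preprocessing sketches above; the layer sizes, epoch count, and batch size are illustrative assumptions, not the project's actual hyperparameters:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(X_train.shape[1],)),
    layers.Dense(32, activation="relu"),
    layers.Dense(5, activation="softmax"),   # five severity classes (0-4)
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1)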
6. Binary Classification:
• After obtaining promising results, the problem is simplified into a binary classification task:
distinguishing the presence from the absence of heart disease, where class 0 represents no heart
disease and classes 1-4 (varying degrees of severity) are grouped together as heart disease present.
• Metrics such as accuracy, precision, recall, and F1-score are calculated to assess the model's
effectiveness.
• Classification reports are generated using the Scikit-Learn library to provide detailed insights
into the model's performance across different classes.
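A short sketch of the binary reformulation and the resulting report, continuing from the sketches above:

import numpy as np
from sklearn.metrics import classification_report

y_true = np.argmax(y_test, axis=1)                  # back to integer labels 0-4
y_pred = np.argmax(model.predict(X_test), axis=1)

y_true_bin = (y_true > 0).astype(int)               # 0 = no disease, 1 = disease
y_pred_bin = (y_pred > 0).astype(int)

print(classification_report(y_true_bin, y_pred_bin,
                            target_names=["no disease", "disease"]))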
Task Management System
The task management system I developed is a comprehensive tool designed to streamline and
organize workflow processes efficiently. Built with Python Flask for the backend and
MySQL for the database, it offers robust features for task creation, assignment, tracking, and
completion. Users can categorize tasks, set priorities, assign deadlines, and monitor progress
through an intuitive user interface.
Key functionalities include user authentication and authorization, ensuring secure access and
data integrity. The system supports real-time updates and notifications, enhancing
collaboration among team members. It also integrates reporting capabilities, allowing
stakeholders to generate custom reports and insights on project statuses and performance
metrics.
Overall, the task management system facilitates productivity by centralizing task
management activities, promoting transparency, and optimizing workflow efficiency across
organizational levels.
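As a rough illustration of the Flask-plus-MySQL design, here is a minimal, hypothetical task endpoint; the ORM (Flask-SQLAlchemy), the table fields, and the connection string are assumptions rather than the system's actual schema:

from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://user:pass@localhost/tasks_db"
db = SQLAlchemy(app)

class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(120), nullable=False)
    priority = db.Column(db.String(20), default="medium")
    done = db.Column(db.Boolean, default=False)

@app.route("/tasks", methods=["POST"])
def create_task():
    # Create a task from the JSON payload and persist it to MySQL.
    data = request.get_json()
    task = Task(title=data["title"], priority=data.get("priority", "medium"))
    db.session.add(task)
    db.session.commit()
    return jsonify({"id": task.id, "title": task.title, "done": task.done}), 201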
Twitter Sentiment Analysis
1. Data Collection:
• Tweets are collected from Twitter using the Twitter API. The dataset may include tweets
already labeled with sentiment (positive, negative, neutral) or may require manual labeling.
2. Data Preprocessing:
• Text data preprocessing techniques are applied to clean and prepare the tweets for analysis.
This includes removing special characters, punctuation, stopwords, and performing tokenization.
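A small sketch of this cleaning step; NLTK is assumed here for stopwords and tokenization, though any equivalent library would do:

import re
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.tokenize import word_tokenize    # requires nltk.download("punkt")

STOPWORDS = set(stopwords.words("english"))

def clean_tweet(text):
    text = re.sub(r"http\S+|@\w+|#", "", text)       # strip URLs, mentions, '#'
    text = re.sub(r"[^a-zA-Z\s]", "", text.lower())  # drop punctuation and digits
    tokens = word_tokenize(text)                     # tokenization
    return " ".join(t for t in tokens if t not in STOPWORDS)

clean_tweet("Loving the new release! http://example.com #happy")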
3. Feature Extraction:
• Techniques such as Count Vectorization (a tool in the scikit-learn library for Python that
transforms text data into numerical representations based on word frequencies) are used to
convert the text data into numerical vectors. These vectors represent each tweet by the frequency
of the words it contains, giving the model a numerical view of the corpus vocabulary.
4. Model Selection:
• Logistic Regression is chosen as the classification algorithm due to its efficiency and
effectiveness in text classification tasks like sentiment analysis.
• Linear Regression:
o The goal is to find the best-fit line (linear relationship) that can predict the output for a
continuous dependent variable based on one or more independent variables.
• Logistic Regression:
o It’s a classification algorithm used to predict binary outcomes (0 or 1, Yes or No, True or
False).
o The probability of the default class (usually “1”) is modeled as a function of independent
variables.
5. Model Training:
• The selected model is trained on the preprocessed and feature-engineered dataset. During
training, the model learns to classify tweets into different sentiment categories (positive, negative,
neutral).
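A compact sketch of the feature-extraction and training steps, assuming lists `tweets` (cleaned text) and `labels` (sentiment classes) already exist:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.2, random_state=42)

vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)   # learn the vocabulary on the training set
X_test_vec = vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=1000)           # handles multi-class sentiment labels
clf.fit(X_train_vec, y_train)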
6. Model Evaluation:
• The trained model is evaluated using metrics such as accuracy, precision, recall, and F1-score
to assess its performance. Cross-validation techniques may be used to ensure the model's
generalization ability.
Accuracy: This measures the overall correctness of the model’s predictions. It’s the ratio of correctly
predicted observations to the total observations.
Precision: This quantifies how many of the predicted positive examples are actually positive, i.e. the
ratio of true positives to all positive predictions.
Recall: Also known as sensitivity or the true positive rate, this metric quantifies the number of true
positive predictions made out of all actual positive examples in the dataset.
F1-score: This is the harmonic mean of precision and recall, providing a balance between the two by
considering both false positives and false negatives. It’s particularly useful when the class
distribution is imbalanced.
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
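Continuing the sketch above, the evaluation step might be computed as follows (macro averaging and 5-fold cross-validation are assumptions):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score

y_pred = clf.predict(X_test_vec)
print("accuracy:", accuracy_score(y_test, y_pred))

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro")              # average over sentiment classes
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Cross-validation on the training folds as a check on generalization.
scores = cross_val_score(clf, X_train_vec, y_train, cv=5, scoring="f1_macro")
print("5-fold macro F1:", scores.mean())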