Projects
During my internship at Optum, I played a pivotal role in developing the PDUT (Production Database
Update Tool) Hybrid Application Model. This project aimed to automate and streamline production
database updates across both on-premises and Azure cloud environments. Using Java and the
Spring Boot framework, we designed and implemented a robust hybrid application architecture. Integration
with Azure services such as Azure SQL Database and Virtual Machines ensured seamless deployment
and efficient performance. My responsibilities included developing server-side code, optimizing
database queries, and collaborating closely with cross-functional teams including developers, cloud
services experts, and database administrators. This experience significantly enhanced my technical
proficiency in hybrid application development, cloud computing, and version control with Git. It also
fostered growth in teamwork, communication, and problem-solving skills, preparing me effectively
for future challenges in the tech industry.
Bharat Intern
During my internship at JPMorgan Chase & Co., I had the opportunity to immerse myself in a
dynamic environment focused on financial technology and data analysis. My role primarily involved
interfacing with stock price data feeds and using company frameworks and tools to analyze and
visualize this data for traders. This hands-on experience allowed me to apply my skills in Python and
data manipulation libraries to real-world financial datasets, enhancing my understanding of
quantitative analysis and market dynamics.
Throughout the internship, I collaborated closely with a team of experienced professionals who
provided guidance and mentorship, helping me navigate complex financial data systems and hone
my problem-solving abilities. I also gained valuable insights into the operations of a global financial
institution, learning how technology and data science drive decision-making processes in the
financial markets.
Additionally, the internship at JPMorgan equipped me with practical knowledge of financial tools
and industry best practices, contributing to my professional growth and preparing me for future
roles in fin-tech and data analytics. This experience solidified my interest in leveraging technology to
solve challenges in the financial sector and provided a solid foundation for my career aspirations in
quantitative finance and data-driven decision-making.
A Novel Framework for Plant Disease Detection using Keras and API Implementation
A Novel Framework for Plant Disease Detection using Keras and API Implementation
represents a significant project aimed at enhancing agricultural practices through advanced
technology. Leveraging Keras, a deep learning framework, the project focuses on automating
the detection of plant diseases using image processing techniques. The framework involves
training convolutional neural networks (CNNs) on a large dataset of plant images annotated
with disease labels.
The implementation integrates an API (Application Programming Interface) to facilitate
seamless interaction between the trained model and end-users, such as farmers or agricultural
experts. This API allows users to upload images of diseased plants, which are then processed
through the CNN model to accurately diagnose the presence and type of disease. The system
provides real-time feedback, enabling prompt decisions on disease management and
treatment.
Key features of the framework include its scalability, as it can accommodate new disease
patterns and plant varieties through ongoing model training and dataset expansion. It also
emphasizes usability, with a user-friendly interface that simplifies the disease detection
process for non-technical users in the agricultural sector.
Overall, this project represents a pioneering approach to leveraging deep learning and API
technology to address agricultural challenges, promising improved crop management, higher
yields, and sustainable farming practices.
1. Data Flow:
• Plant image data is collected and pre-processed, including augmentation (artificially increasing
the size and diversity of the training dataset by creating new data points from existing ones).
• The pre-processed data is split into training, testing, and validation sets (the validation set is
used to fine-tune the hyperparameters of a machine learning model during training; it helps prevent
overfitting and ensures the model generalizes well to unseen data).
• The trained convolutional neural network (CNN) models are deployed as APIs for disease
detection.
• Users can feed images into the deployed APIs for live disease prediction.
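As a rough sketch of the augmentation and splitting step described above (the folder path, image size, and augmentation settings are illustrative assumptions, not the project's actual configuration):

# Keras-based augmentation with an 80/20 train/validation split.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values
    rotation_range=20,        # augmentation: random rotations
    zoom_range=0.2,           # augmentation: random zoom
    horizontal_flip=True,     # augmentation: random flips
    validation_split=0.2,     # hold out 20% of images for validation
)

train_gen = datagen.flow_from_directory(
    "data/plant_images", target_size=(224, 224),
    batch_size=32, class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory(
    "data/plant_images", target_size=(224, 224),
    batch_size=32, class_mode="categorical", subset="validation")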
2. Model Training:
• Transfer learning and fine-tuning techniques are utilized on pre-trained CNN models (e.g.,
VGG16, ResNet) using a Kaggle dataset.
VGG16:
• Strengths: Achieves high accuracy on various tasks, relatively simple and easy to implement.
ResNet:
• Strengths: Addresses the vanishing gradient problem, achieves better accuracy than VGG16
with fewer parameters.
• Regular retraining of models is performed to adapt to new data and evolving conditions.
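A minimal sketch of the transfer-learning setup, assuming an ImageNet-pre-trained VGG16 base; the number of classes and the size of the dense head are placeholders rather than the project's actual values:

from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

num_classes = 38  # placeholder: depends on the Kaggle dataset used

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base for transfer learning

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# Fine-tuning: unfreeze the top convolutional blocks later and retrain
# with a lower learning rate.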
3. Model Evaluation:
• Model evaluation is conducted using a confusion matrix, which provides insights into the
model's performance across different classes.
• Other evaluation metrics such as accuracy, loss, precision, recall, and F1-score may also be
utilized to assess model performance.
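A brief sketch of this evaluation step, continuing from the training sketch above (the test generator is assumed to be created with shuffle=False so predictions and labels line up):

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

probs = model.predict(test_gen)       # class probabilities per test image
y_pred = np.argmax(probs, axis=1)     # predicted class indices
y_true = test_gen.classes             # ground-truth class indices

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))  # precision, recall, F1 per class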
4. Deployment and Maintenance:
• Security measures such as encryption, authentication, and access control are implemented
to ensure data security.
• Monitoring tools and logging mechanisms are established to track the health and
performance of the system.
• Regular maintenance and updates are conducted to optimize system performance and
address any issues or vulnerabilities.
• Features such as image upload, prediction display, and model performance metrics are
included.
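The write-up does not name the API framework, so the following is only a hypothetical sketch of an image-upload prediction endpoint using Flask; the route, model file name, and class labels are placeholders:

import io
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("plant_disease_cnn.h5")                  # assumed file name
CLASS_NAMES = ["healthy", "early_blight", "late_blight"]    # placeholder labels

@app.route("/predict", methods=["POST"])
def predict():
    # Read the uploaded image, resize, and normalize it for the CNN.
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    x = np.expand_dims(np.array(img.resize((224, 224))) / 255.0, axis=0)
    probs = model.predict(x)[0]
    idx = int(np.argmax(probs))
    return jsonify({"disease": CLASS_NAMES[idx], "confidence": float(probs[idx])})

if __name__ == "__main__":
    app.run()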
Heart Disease Prediction using Neural Networks
Patient data is gathered from the University of California, Irvine (UCI) Machine Learning Repository,
focusing on attributes relevant to heart disease diagnosis. The dataset contains various clinical and
demographic features of patients.
1. Data Preprocessing:
• Data cleaning is performed to handle missing values and outliers and to ensure data quality.
Missing values, typically represented as "?", are removed or imputed.
• The dataset is transformed to ensure all data is in numeric format, which is necessary for
machine learning algorithms to process.
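A short sketch of this cleaning step with pandas; the file name and column names follow the common layout of the UCI Cleveland data but should be treated as assumptions:

import pandas as pd

cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach",
        "exang", "oldpeak", "slope", "ca", "thal", "target"]
df = pd.read_csv("processed.cleveland.data", names=cols, na_values="?")

df = df.dropna()              # drop rows with missing values (or impute instead)
df = df.apply(pd.to_numeric)  # make sure every column is numeric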
2. Data Splitting:
• The dataset is divided into training (80%) and testing (20%) sets using the train_test_split
function from Scikit-Learn. This is crucial for assessing the model's generalization performance.
3. Label Encoding:
• Class values representing different heart disease severity levels are converted into
categorical labels for classification. This ensures the model can understand and predict the severity
levels.
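Continuing the sketch above, the split and label encoding might look like this (the 80/20 ratio and the five severity classes come from the description; the random seed is arbitrary):

from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

X = df.drop("target", axis=1).values
y = to_categorical(df["target"].values, num_classes=5)  # one-hot classes 0-4

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)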
4. Model Architecture:
• A neural network model is designed using the Keras library.
5. Model Training:
• The neural network model is trained using categorical cross-entropy loss and the Adam
optimizer. The training process involves optimizing weights and biases through iterations to
minimize the loss function.
Categorical cross-entropy loss, often used in multiclass classification problems, measures the
performance of a classification model whose output is a probability value between 0 and 1. It’s
calculated as the negative log-likelihood of the true labels given the predictions provided by the
model. For a model that outputs probabilities ( p_{i,j} ) for class ( j ) on instance ( i ), with true
labels ( y_{i,j} ), the loss over ( N ) instances and ( C ) classes is
( L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{i,j} \log(p_{i,j}) ).
The Adam optimizer, short for Adaptive Moment Estimation, is an optimization algorithm that is
particularly efficient for problems with large datasets or many parameters. Adam computes adaptive
learning rates for each parameter based on estimates of the first moment (the mean) and second
moment (the uncentered variance) of the gradients.
• The model is trained for a specific number of epochs with a defined batch size.
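A minimal sketch of the network and training call, continuing from the preprocessing sketches above; the layer sizes, epoch count, and batch size are illustrative assumptions, not the project's actual hyperparameters:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(X_train.shape[1],)),
    layers.Dense(32, activation="relu"),
    layers.Dense(5, activation="softmax"),   # five severity classes (0-4)
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1)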
6. Binary Classification:
• After obtaining promising results, the problem is simplified into a binary classification task:
distinguishing the presence from the absence of heart disease, where class 0 represents no heart
disease and classes 1-4 (varying degrees of severity) are grouped together as heart disease present.
• Metrics such as accuracy, precision, recall, and F1-score are calculated to assess the model's
effectiveness.
• Classification reports are generated using the Scikit-Learn library to provide detailed insights
into the model's performance across different classes.
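A short sketch of the binary reformulation and the resulting report, continuing from the sketches above:

import numpy as np
from sklearn.metrics import classification_report

y_true = np.argmax(y_test, axis=1)                  # back to integer labels 0-4
y_pred = np.argmax(model.predict(X_test), axis=1)

y_true_bin = (y_true > 0).astype(int)               # 0 = no disease, 1 = disease
y_pred_bin = (y_pred > 0).astype(int)

print(classification_report(y_true_bin, y_pred_bin,
                            target_names=["no disease", "disease"]))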
Task Management System
The task management system I developed is a comprehensive tool designed to streamline and
organize workflow processes efficiently. Built with Python Flask for the backend and
MySQL for the database, it offers robust features for task creation, assignment, tracking, and
completion. Users can categorize tasks, set priorities, assign deadlines, and monitor progress
through an intuitive user interface.
Key functionalities include user authentication and authorization, ensuring secure access and
data integrity. The system supports real-time updates and notifications, enhancing
collaboration among team members. It also integrates reporting capabilities, allowing
stakeholders to generate custom reports and insights on project statuses and performance
metrics.
Overall, the task management system facilitates productivity by centralizing task
management activities, promoting transparency, and optimizing workflow efficiency across
organizational levels.
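As a rough illustration of the Flask-plus-MySQL design, here is a minimal, hypothetical task endpoint; the ORM (Flask-SQLAlchemy), the table fields, and the connection string are assumptions rather than the system's actual schema:

from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://user:pass@localhost/tasks_db"
db = SQLAlchemy(app)

class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(120), nullable=False)
    priority = db.Column(db.String(20), default="medium")
    done = db.Column(db.Boolean, default=False)

@app.route("/tasks", methods=["POST"])
def create_task():
    # Create a task from the JSON payload and persist it to MySQL.
    data = request.get_json()
    task = Task(title=data["title"], priority=data.get("priority", "medium"))
    db.session.add(task)
    db.session.commit()
    return jsonify({"id": task.id, "title": task.title, "done": task.done}), 201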
Twitter Sentiment Analysis
1. Data Collection:
• Tweets are collected from Twitter using the Twitter API. The dataset may include tweets
already labeled with sentiment (positive, negative, neutral) or may require manual labeling.
2. Data Preprocessing:
• Text data preprocessing techniques are applied to clean and prepare the tweets for analysis.
This includes removing special characters, punctuation, stopwords, and performing tokenization.
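A small sketch of this cleaning step; NLTK is assumed here for stopwords and tokenization, though any equivalent library would do:

import re
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.tokenize import word_tokenize    # requires nltk.download("punkt")

STOPWORDS = set(stopwords.words("english"))

def clean_tweet(text):
    text = re.sub(r"http\S+|@\w+|#", "", text)       # strip URLs, mentions, '#'
    text = re.sub(r"[^a-zA-Z\s]", "", text.lower())  # drop punctuation and digits
    tokens = word_tokenize(text)                     # tokenization
    return " ".join(t for t in tokens if t not in STOPWORDS)

clean_tweet("Loving the new release! http://example.com #happy")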
3. Feature Extraction:
• Techniques such as Count Vectorization (a tool in the scikit-learn library for Python that
transforms text data into numerical representations based on word frequencies) are used to
convert the text data into numerical vectors. These vectors represent each tweet by the frequency
of the words it contains, giving the model a numerical view of the corpus vocabulary.
4. Model Selection:
• Logistic Regression is chosen as the classification algorithm due to its efficiency and
effectiveness in text classification tasks like sentiment analysis.
• Linear Regression:
o The goal is to find the best-fit line (linear relationship) that can predict the output for a
continuous dependent variable based on one or more independent variables.
• Logistic Regression:
o It’s a classification algorithm used to predict binary outcomes (0 or 1, Yes or No, True or
False).
o The probability of the default class (usually “1”) is modeled as a function of independent
variables.
5. Model Training:
• The selected model is trained on the preprocessed and feature-engineered dataset. During
training, the model learns to classify tweets into different sentiment categories (positive, negative,
neutral).
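A compact sketch of the feature-extraction and training steps, assuming lists `tweets` (cleaned text) and `labels` (sentiment classes) already exist:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.2, random_state=42)

vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)   # learn the vocabulary on the training set
X_test_vec = vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=1000)           # handles multi-class sentiment labels
clf.fit(X_train_vec, y_train)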
6. Model Evaluation:
• The trained model is evaluated using metrics such as accuracy, precision, recall, and F1-score
to assess its performance. Cross-validation techniques may be used to ensure the model's
generalization ability.
Accuracy: This measures the overall correctness of the model’s predictions. It’s the ratio of correctly
predicted observations to the total observations.
Precision: This quantifies how many of the predicted positive examples are actually positive, i.e. the
ratio of true positives to all positive predictions.
Recall: Also known as sensitivity or the true positive rate, this metric quantifies the number of true
positive predictions made out of all actual positive examples in the dataset.
F1-score: This is the harmonic mean of precision and recall, providing a balance between the two by
considering both false positives and false negatives. It’s particularly useful when the class
distribution is imbalanced.
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
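Continuing the sketch above, the evaluation step might be computed as follows (macro averaging and 5-fold cross-validation are assumptions):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score

y_pred = clf.predict(X_test_vec)
print("accuracy:", accuracy_score(y_test, y_pred))

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro")              # average over sentiment classes
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Cross-validation on the training folds as a check on generalization.
scores = cross_val_score(clf, X_train_vec, y_train, cv=5, scoring="f1_macro")
print("5-fold macro F1:", scores.mean())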