Thanks for stopping by! I'm a data scientist and program evaluator with over 6 years of experience in the United States and India. I've led and worked on data science teams building predictive ML models for government clients, creating GIS-based systems, and developing widely used data products in the urban and tech space, among other things. I hold an MSc in Computer Science and Public Policy (MS-CAPP) from the University of Chicago. I am passionate about applying my technical skills to build and refine products and programs that improve people's lives.
Feel free to take a look at my resume and to connect with me via LinkedIn.
- Programming languages: Python, R, and SQL.
- Cloud computing stack: AWS and GCP.
- Other skills: Power BI, PySpark, Tableau, QGIS, Computer Vision.
- Currently working on: Improving expertise in LLMs and NLP.
Some of the projects I have worked on:
- Conducted a comprehensive analysis of climate change impacts using NLP models, computer vision, and dashboards.
- Developed supervised & unsupervised ML models to evaluate the impact of transit programs and analyze crime patterns.
- Deployed a boosted-trees ML model (in R) to optimize methane leak inspections at oil & gas facilities.
- Used OpenCV and QGIS to classify and georeference urban land use images into interactive raster maps to show environmental degradation over time.
- Built NLP-based techniques and models to investigate fake news.
You can see some of my work in the following repositories:
- Rides to Safety: Chicago Crime, UChicago, and the Lyft Smart Ride Program: This repository presents a comprehensive analysis of transportation and crime in Chicago. The project uses big data processing, machine learning, and interactive visualization to evaluate the impact of safety programs and analyze crime patterns. Technologies include GCP and PySpark for handling large datasets, AWS for cloud computing, and R for data visualization and interactive applications. Components include exploratory data analysis, supervised and unsupervised machine learning, and Shiny applications for interactive data exploration.
- Climate Dynamics Decoded: Analyzing Impact, Opinion, and Change: This repository is an integrated analysis of climate change impacts, covering disaster frequency and cost, public sentiment, and urban land use changes. It uses NLP models, PyTorch, and OpenCV for the analysis, and Dash for interactive data exploration.
- The economic and environmental costs of congestion: In this repository, I developed a novel method to estimate the economic & social costs of congestion in cities using half a billion Uber data points. I also built an interactive dashboard & published a paper highlighting the impacts of removing bottlenecks on labor markets & overall economic productivity. Techniques included big data processing, PostgreSQL with PostGIS for data management, and Python & R for analysis.
Please get in touch, I'm keen to chat. I always look forward to meeting enthusiastic and interesting new people.