Geospatial ML - Feature Engineering
Course Title: Environmental Data Analytics.
Degree Program: Master's in Data Science.
Instructor: Mohammad Mahdi Rajabi
Feature engineering helps identify and retain the most relevant features,
reducing input dimensionality and making training more efficient. Furthermore, raw
data often contains irrelevant or redundant information, and feature engineering aims
to eliminate these elements, which not only speeds up training but also reduces the
risk of overfitting, ultimately improving the model's accuracy and reliability.
For example, in predicting wildfire risk using remote sensing maps, there is a wide
range of potential inputs, such as vegetation type, soil moisture, temperature, wind
speed, humidity, topography, and historical fire occurrence. Each of these could
potentially influence wildfire risk, but not all of them may be equally informative, and
some might even introduce noise if left unfiltered. Through feature engineering, we
can focus on the most impactful features and prepare them in ways that make them
more useful to the model. For instance, we might select vegetation type, soil moisture,
and temperature as key predictors after analyzing their correlation with past fire events.
We can also create interaction features, like combining temperature and humidity to
capture their combined effect on fire risk. Additionally, we might compress highly
detailed topographic data into simpler elevation or slope variables, which the model
can more easily interpret. This careful feature selection and transformation process
tailors the inputs, ensuring that only the most relevant and refined information is fed
into the model, making wildfire prediction both more efficient and accurate.
Historically, feature engineering was a manual task where data scientists relied heavily
on domain expertise to select and transform features that would enhance model
performance. However, with the advent of deep neural networks, models have gained
the ability to automatically learn relevant features from raw data. These models can
capture complex patterns and representations directly, without manual intervention—a
process known as feature learning, where the model discovers and refines the most
useful features on its own. This capability has been transformative, particularly for high-
dimensional and unstructured data like images, text, and audio.
However, there are still limitations: deep neural networks require vast amounts of data
and computational resources to learn effectively, and they may not automatically
capture certain domain-specific nuances. For many applications, particularly those
with smaller datasets or unique domain requirements, manual feature engineering
remains important, and it’s likely to stay essential for achieving optimal results in various contexts, including environmental data analytics.
Spatial Feature Engineering
Spatial feature engineering is a specialized subset of feature engineering focused on
creating features from data with spatial or geographical components. It emphasizes
leveraging spatial relationships (such as proximity, neighborhood effects, or spatial autocorrelation) to create features that make geographic structure usable by a model.
Feature Selection
Feature selection is the process of choosing the most relevant inputs (features) for a
model while discarding those that may be redundant, irrelevant, or noisy, helping the
model focus on high-quality data for better performance.
For example, in remote sensing or geospatial data, we may have multiple bands (e.g.,
red, green, blue, near-infrared) and additional data layers like elevation, soil moisture,
and land cover type. Suppose the goal is to predict vegetation health. Using feature
selection, we might find that near-infrared and red bands, combined with soil moisture,
are highly predictive of vegetation health, while the blue band or elevation data has
little effect. By keeping only these relevant features, the model can focus on the most
impactful data, reducing noise and potentially speeding up processing. There is a wide range of methods for feature selection; here we focus on two popular ones:
1) Cluster-based Feature Selection:
Cluster-based Feature Selection is a technique used to reduce feature redundancy by
grouping similar features into clusters and then selecting representative features from
each group. This process begins by measuring the similarity between features, often
using correlation or mutual information, and then clustering them using algorithms like
hierarchical clustering or k-means. Within each cluster, a representative feature is
chosen—typically the feature that is most central, has the highest variance, or shows the
strongest relationship with the target variable. By selecting one feature from each
cluster, this method ensures that the final subset retains the diversity of information
across clusters while eliminating redundant features. This approach is especially useful
in datasets where high correlations between features can lead to overfitting or
inefficient model training.
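As a rough illustration, here is a minimal Python sketch of the idea using scipy's hierarchical clustering. The feature matrix, the redundant pair of features, and the cluster count are all hypothetical, and the representative of each cluster is chosen by variance:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical feature matrix: 200 samples, 8 candidate features,
# with f1 deliberately made redundant with f0.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 8)), columns=[f"f{i}" for i in range(8)])
X["f1"] = 0.9 * X["f0"] + rng.normal(scale=0.1, size=200)

# Feature-to-feature distance: 1 - |correlation|, so strongly
# correlated features end up close together.
dist = 1.0 - X.corr().abs().values
np.fill_diagonal(dist, 0.0)

# Hierarchical clustering of the features, cut into 3 clusters.
labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                  t=3, criterion="maxclust")

# Keep one representative per cluster: here, the highest-variance member.
selected = [X[X.columns[labels == c]].var().idxmax() for c in np.unique(labels)]
print(selected)
```

Choosing the member most strongly related to the target variable, instead of the highest-variance member, is an equally common variant.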
2) Gradient-based Feature Selection:
This method ranks features by the gradient of the model's output with respect to each input feature: large gradients indicate that small changes in a feature have a strong effect on the output, suggesting higher importance.
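A minimal numpy sketch of this idea, using finite differences to approximate the gradient of a hypothetical stand-in model; with a differentiable model such as a neural network, the same quantity would typically come from automatic differentiation:

```python
import numpy as np

# Hypothetical trained model: any callable mapping (n_samples, n_features)
# to predictions of shape (n_samples,).
def model_predict(X):
    return 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.01 * X[:, 2]

X = np.random.default_rng(1).normal(size=(500, 3))
eps = 1e-4
importance = np.zeros(X.shape[1])

for j in range(X.shape[1]):
    X_plus = X.copy()
    X_plus[:, j] += eps
    # Mean absolute change in output per unit change in feature j,
    # i.e. a finite-difference estimate of the gradient magnitude.
    importance[j] = np.mean(np.abs(model_predict(X_plus) - model_predict(X)) / eps)

print(importance)  # a larger value suggests a more influential feature
```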
Feature Combination
Feature combination creates new features by merging existing ones, for example the interaction between temperature and humidity used earlier to capture their joint effect on wildfire risk.
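As a small illustration, the sketch below builds a hypothetical interaction term from the temperature and humidity example above (the column names and values are invented):

```python
import pandas as pd

# Invented weather observations.
df = pd.DataFrame({"temperature_c": [31.0, 24.5, 38.2],
                   "humidity_pct": [22.0, 65.0, 12.0]})

# Interaction feature: hot-and-dry conditions jointly raise fire risk,
# so combine temperature with a dryness term into one feature.
df["temp_x_dryness"] = df["temperature_c"] * (100.0 - df["humidity_pct"])
```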
Dimensionality Reduction
Principal Component Analysis (PCA) is popular in geospatial feature engineering. Geospatial data often includes many correlated features collected over extensive areas and time periods, making PCA a valuable tool for reducing complexity while retaining essential spatial patterns and relationships. A common use case is multispectral and hyperspectral imaging: in remote sensing, PCA is frequently applied to satellite or aerial imagery, which often includes many spectral bands. By reducing the number of bands to a few principal components, PCA simplifies the data, making it easier to analyze without losing critical information.
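A minimal sketch of this workflow with scikit-learn, where random values stand in for a real multispectral image (the band count and image size are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for a 6-band multispectral image of 100x100 pixels.
bands, height, width = 6, 100, 100
image = np.random.default_rng(4).random((bands, height, width))

# Treat each pixel as a sample with one value per band.
X = image.reshape(bands, -1).T          # shape: (10000, 6)

# Keep the first three principal components of the band space.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)            # shape: (10000, 3)

# Reassemble the components into three "component images".
components = X_pca.T.reshape(3, height, width)
print(pca.explained_variance_ratio_)    # variance retained per component
```

The explained variance ratio indicates how much of the original band-to-band variability each retained component captures.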
Feature Extraction
Feature extraction is the process of creating new features by transforming raw data into
informative representations, often in cases where raw data is high-dimensional or
complex (e.g., images, text, or audio). It involves generating meaningful features that
summarize or capture essential information from the data, which can improve model
performance. Unlike feature selection, which simply chooses existing features, feature
extraction focuses on creating entirely new representations from the raw data.
In spatial feature engineering, two types of feature extraction techniques are common. The first is the spatial summary feature, which aggregates information from a spatial area to produce a single metric for a location. For example, taking the average income within a city block or the total vegetation cover within a specific radius around a point creates spatial summary features. These features capture the overall characteristics of an area or its immediate surroundings.
Key categories of spatial summary features include:
• Zonal Statistics Features: These summarize values within defined spatial zones using statistics such as the mean, median, standard deviation, or sum, for example average elevation, mean rainfall, or population density over a defined area (a minimal sketch follows this list). Zonal statistics are broadly used in fields like urban planning, hydrology, and ecology.
• Spatial Density Features: These features capture the density of objects or
events within a specified area, like population density or the density of trees in a
forest. Density features are applicable in fields like ecology, public health (e.g.,
disease spread), and urban studies.
• Spatial Autocorrelation Features: Measures like Moran's I capture how similar
values are spatially clustered or dispersed. These features help in identifying
patterns, such as clustering of high-pollution areas, and are widely used in
environmental studies, economics, and crime analysis.
• Edge and Boundary Features: These focus on the properties of boundaries or
transitions between different spatial areas, such as land cover edges or
coastlines. Boundary features are useful in studying ecological zones, urban-
rural boundaries, and habitat fragmentation across fields like geography,
environmental science, and conservation.
• Texture Features: In remote sensing, texture features summarize the spatial patterns of pixel values within a specified area, capturing characteristics like roughness, smoothness, or variability. They fit the concept of spatial summary features because they aggregate information from a defined spatial area into a single descriptive metric for the surface's overall texture within that region. The resulting texture features, for example, enhance the ability to differentiate between land cover types (e.g., forests, urban areas, water bodies); a texture sketch also follows this list.
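Here is a minimal numpy sketch of zonal statistics, as referenced in the list above; both rasters are randomly generated stand-ins for a real elevation grid and a zone-label grid:

```python
import numpy as np

# Random stand-ins for an elevation raster and an integer zone-label raster.
rng = np.random.default_rng(2)
elevation = rng.uniform(0, 3000, size=(50, 50))
zones = rng.integers(1, 4, size=(50, 50))  # zone labels 1..3

# Zonal statistics: summarize the value raster within each zone.
for z in np.unique(zones):
    in_zone = elevation[zones == z]
    print(f"zone {z}: mean={in_zone.mean():.1f} m, std={in_zone.std():.1f} m")
```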
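And a minimal texture-feature sketch using gray-level co-occurrence matrices, assuming a recent scikit-image (where the functions are spelled graycomatrix/graycoprops); the image patch is a random stand-in for a real single-band patch:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Random stand-in for a single-band patch quantized to 8 gray levels.
patch = np.random.default_rng(3).integers(0, 8, size=(64, 64), dtype=np.uint8)

# Gray-level co-occurrence matrix for horizontally adjacent pixel pairs.
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=8,
                    symmetric=True, normed=True)

# Scalar texture summaries for the whole patch.
contrast = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
print(contrast, homogeneity)
```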
Feature Transformation
Coordinate encoding and normalization transform raw latitude and longitude values
into features that are more interpretable and effective for machine learning models.
Since longitude values repeat every 360 degrees and latitude has physical constraints
(bounded between -90 and 90 degrees), these geographic coordinates often need
special handling to maintain their spatial relationships. Here are some key methods:
1) Cyclic Encoding (Sinusoidal or Trigonometric Encoding)
Longitude in particular is circular in nature: -180 degrees is equivalent to +180 degrees, so there is a wraparound effect. Directly using raw coordinate values in a model can lead to incorrect distance and direction relationships, because 180 and -180, though numerically distant, are geographically adjacent.
• Solution: Transform latitude and longitude into cyclical features using sine and
cosine transformations. This encoding helps maintain the periodic or
wraparound nature of these coordinates, which can be particularly important for
capturing spatial patterns or proximities.
In cyclic encoding, each coordinate (latitude and longitude) is transformed into two
separate values, one for the sine (_sin) and one for the cosine (_cos).
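A minimal pandas sketch of this encoding; the two sample coordinate pairs are purely illustrative:

```python
import numpy as np
import pandas as pd

# Two illustrative coordinate pairs.
df = pd.DataFrame({"lat": [52.52, -33.87], "lon": [13.41, 151.21]})

# Convert degrees to radians, then take sine and cosine so that
# -180 and +180 degrees map to the same encoded point.
lat_rad, lon_rad = np.radians(df["lat"]), np.radians(df["lon"])
df["lat_sin"], df["lat_cos"] = np.sin(lat_rad), np.cos(lat_rad)
df["lon_sin"], df["lon_cos"] = np.sin(lon_rad), np.cos(lon_rad)
print(df)
```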
2) Scaling and Normalization
Latitude and longitude are on different ranges (latitude between -90 and 90, longitude
between -180 and 180), which can lead to scaling issues in models sensitive to feature
magnitude (e.g., distance-based methods like K-nearest neighbors).
• Solution: Normalize both latitude and longitude to a consistent scale, typically
between 0 and 1, or standardize them to have zero mean and unit variance. This
step ensures that both coordinates contribute comparably to model predictions
and that no single coordinate overwhelms the distance metric.
For example, both coordinates can be min-max scaled to [0, 1] using their known physical bounds, as in the short sketch below (the sample coordinates are illustrative):
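```python
import numpy as np

# Illustrative sample coordinates.
lat = np.array([-33.87, 40.71, 52.52])
lon = np.array([151.21, -74.01, 13.41])

# Min-max scale each coordinate to [0, 1] using its physical bounds,
# so neither coordinate dominates a distance metric.
lat_norm = (lat + 90.0) / 180.0
lon_norm = (lon + 180.0) / 360.0
```

Standardizing to zero mean and unit variance (e.g., with scikit-learn's StandardScaler) is a common alternative when the data covers a small region rather than the full globe.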
Encoding Categorical Features
Encoding categorical features is the process of converting categorical data (e.g., labels
or categories) into numerical values that a machine learning model can understand.
Common methods include one-hot encoding (creating binary columns for each
category, where 1 indicates presence and 0 indicates absence) and label encoding
(assigning a unique integer to each category).
For example, in geospatial or remote sensing data, suppose we have a feature
representing land cover type with categories like "forest," "urban," and "water." Using
one-hot encoding, we would create three new features—one for each type—so that each
area is marked as either forest, urban, or water. This approach allows the model to work
with categorical data more effectively, treating land cover as a distinct factor while still
handling it in numerical form.
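A minimal pandas sketch of both encodings for the land cover example (the sample data is invented):

```python
import pandas as pd

# Invented land cover labels for four areas.
df = pd.DataFrame({"land_cover": ["forest", "urban", "water", "forest"]})

# One-hot encoding: one binary column per land cover class.
one_hot = pd.get_dummies(df["land_cover"], prefix="lc")

# Label encoding: a unique integer per class.
df["land_cover_code"] = df["land_cover"].astype("category").cat.codes

print(one_hot)
print(df)
```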
Feature Binning
Feature binning is the process of dividing continuous data into discrete intervals, or
"bins," to simplify analysis and help models detect broader patterns within ranges
rather than exact values. This technique can be particularly useful for managing outliers
or creating categorical levels from numerical data.
For example, in geospatial or remote sensing data, suppose we have a feature for
elevation that ranges from 0 to 3000 meters. Instead of using exact elevation values,
we could create bins like "low" (0–1000m), "medium" (1000–2000m), and "high" (2000–
3000m). By grouping elevation this way, the model can recognize general elevation
patterns across regions, which may simplify its understanding and improve predictions
when elevation only matters within these broader categories.
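A minimal pandas sketch of the elevation binning described above, with invented sample values:

```python
import pandas as pd

# Invented elevation values in meters.
elevation = pd.Series([120, 850, 1450, 2100, 2950])

# Bin continuous elevation into the three categories from the text.
bins = [0, 1000, 2000, 3000]
labels = ["low", "medium", "high"]
elevation_class = pd.cut(elevation, bins=bins, labels=labels, include_lowest=True)
print(elevation_class)
```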