A3 110006223
In this question I use the route from our school, National Tsing Hua University, to Big City, a department store in Hsinchu.
The picture below shows the route recommended by Google Maps.
From the map above, the breakpoints of the line segments are selected by hand, and each line segment is computed using the least squares method. Here I divide the route into 6 segments, which corresponds to 7 breakpoint indices, and set seg_idx = [1, 9, 17, 54, 70, 102, 109].
Figure 1 and Figure 2
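As a minimal sketch of this step (the synthetic stand-in data and variable names are assumptions; in the actual assignment x and y come from the GPX file and t is the point index):

import numpy as np

# Synthetic stand-in for the GPX coordinates (an assumption; in the real
# code, x and y are parsed from the GPX file).
rng = np.random.default_rng(0)
n = 110
x = np.cumsum(rng.normal(1.0, 0.2, n))
y = np.cumsum(rng.normal(0.5, 0.2, n))

# Hand-picked breakpoints: 6 segments, 7 indices.
seg_idx = [1, 9, 17, 54, 70, 102, 109]

# Fit x(t) and y(t) on each segment with a straight line, i.e. solve
# min_w ||Aw - X||^2 and min_z ||Az - Y||^2 where each row of A is [t_i, 1].
for i0, i1 in zip(seg_idx[:-1], seg_idx[1:]):
    t = np.arange(i0, i1 + 1, dtype=float)
    A = np.vstack([t, np.ones_like(t)]).T
    w, res_x, *_ = np.linalg.lstsq(A, x[i0:i1 + 1], rcond=None)
    z, res_y, *_ = np.linalg.lstsq(A, y[i0:i1 + 1], rcond=None)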
3. a. As stated in question 2, we can find the errors of the linear system from the residuals returned by numpy.linalg.lstsq. We first compute the total error of the two fits, min_w ∥Aw − X∥^2 + min_z ∥Az − Y∥^2, for the current candidate segment, and then compare it to the given error bound ϵ. The requirement is that the error of each line segment is no more than ϵ, so if the total error exceeds ϵ we append idx2, the index of the current point in x, to seg_idx as a new breakpoint. We only iterate over x because x and y have the same length, but the residuals are still computed with respect to both x and y. The first and last indices of x (or y) must always be appended to seg_idx. In this way, the algorithm automatically breaks a trajectory into line segments.
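A minimal sketch of this splitting procedure (assuming x and y are NumPy arrays of the GPX coordinates and eps is the error bound ϵ; the index bookkeeping may differ slightly from my actual code):

import numpy as np

def auto_segments(x, y, eps):
    """Greedily grow each segment until the summed least-squares
    residuals in x and y exceed the error bound eps."""
    n = len(x)
    seg_idx = [0]                      # the first index is always a breakpoint
    start = 0
    for end in range(2, n):
        t = np.arange(start, end + 1, dtype=float)
        A = np.vstack([t, np.ones_like(t)]).T
        _, res_x, *_ = np.linalg.lstsq(A, x[start:end + 1], rcond=None)
        _, res_y, *_ = np.linalg.lstsq(A, y[start:end + 1], rcond=None)
        # lstsq returns an empty residual array for a two-point (exact) fit
        err = (res_x.sum() if res_x.size else 0.0) + \
              (res_y.sum() if res_y.size else 0.0)
        if err > eps:
            seg_idx.append(end - 1)    # split just before the bound is exceeded
            start = end - 1
    if seg_idx[-1] != n - 1:
        seg_idx.append(n - 1)          # the last index is always a breakpoint
    return seg_idx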
The code below (a minimal sketch of the plotting step, assuming x, y, and seg_idx as above) shows how all the line segments are connected:
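import numpy as np
import matplotlib.pyplot as plt

# Connect the breakpoints with straight lines: each vertex is an actual
# GPX point, so consecutive segments share an endpoint and stay connected.
x = np.asarray(x)
y = np.asarray(y)
plt.plot(x, y, color="lightgray", label="original GPX trajectory")
plt.plot(x[seg_idx], y[seg_idx], "o-", label="line segments")
plt.legend()
plt.show()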
Here, we plot based on x and y because they are the points stored in the GPX file. By doing this, the line segments are always connected: each breakpoint is an element of the GPX points, i.e. one of the actual points that generate the original map, and consecutive segments share that breakpoint.
b. The time complexity of this algorithm is O(nmF(k)), where F(k) is the cost of one least-squares solve. My code performs a linear search through the input lists to find the segment breakpoints, and for each candidate segment it calls np.linalg.lstsq once. Thus, if np.linalg.lstsq has a time complexity of O(k^2), the overall time complexity of the algorithm is O(nmk^2).
4. Writing the quadratic model for every data point gives the two linear systems

a1 t1^2 + a2 t1 + a3 = x1        b1 t1^2 + b2 t1 + b3 = y1
a1 t2^2 + a2 t2 + a3 = x2        b1 t2^2 + b2 t2 + b3 = y2
⋮
a1 tn^2 + a2 tn + a3 = xn        b1 tn^2 + b2 tn + b3 = yn
Here, t is the independent variable, and x and y are the dependent variables, i.e. the position coordinates.
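Stacking these equations gives the matrix systems A·a = x and A·b = y, where each row of A is [t_i^2, t_i, 1]; a minimal sketch (the synthetic data here is an assumption):

import numpy as np

# Synthetic stand-in data: t is the time/index parameter, x and y the coordinates.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
x = 2.0 * t**2 - 1.0 * t + 0.3 + rng.normal(0, 0.01, t.size)
y = -1.5 * t**2 + 0.8 * t + 0.1 + rng.normal(0, 0.01, t.size)

# Each row of A is [t_i^2, t_i, 1], so A @ a = x stacks the equations above.
A = np.vstack([t**2, t, np.ones_like(t)]).T
a, *_ = np.linalg.lstsq(A, x, rcond=None)   # a1, a2, a3
b, *_ = np.linalg.lstsq(A, y, rcond=None)   # b1, b2, b3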
The next step is to find the values of the coefficients that minimize the sum of the squared differences between the observed values and the predicted values. The observed values are the actual values of the dependent variables (here, x and y) measured in the data set. The predicted values are the values of the dependent variables predicted by the model, i.e. the quadratic curve being fitted to the data.
The goal of the least squares method is to find the coefficients of the quadratic curve that produce the best fit to the observed data. This is done by minimizing the sum of the squared differences between the observed and predicted values: the smaller this sum, the better the model fits the data.
Therefore, we use numpy.polyfit to find the coefficient vectors a and b, because this function fits a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y) and returns a vector of coefficients p that minimizes the squared error, ordered deg, deg−1, …, 0. We then pass a and b as inputs to the function drawCurve, and plot the x and y values it returns. The plotted graph is shown in the figure below.
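For reference, a minimal sketch of this step; drawCurve is reconstructed here as a hypothetical helper that samples the two fitted quadratics, and the synthetic data is an assumption:

import numpy as np
import matplotlib.pyplot as plt

def drawCurve(a, b, t0, t1, num=100):
    """Hypothetical helper: sample the fitted curves x(t) and y(t)."""
    t = np.linspace(t0, t1, num)
    return np.polyval(a, t), np.polyval(b, t)

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
x = 2.0 * t**2 - 1.0 * t + 0.3 + rng.normal(0, 0.01, t.size)
y = -1.5 * t**2 + 0.8 * t + 0.1 + rng.normal(0, 0.01, t.size)

a = np.polyfit(t, x, 2)   # coefficients of x(t), highest degree first
b = np.polyfit(t, y, 2)   # coefficients of y(t)
xc, yc = drawCurve(a, b, t.min(), t.max())
plt.plot(x, y, ".", label="GPX points")
plt.plot(xc, yc, label="fitted quadratic curve")
plt.legend()
plt.show()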
5. The total least squares method is a regression method that fits a curve to a set of data points by minimizing the sum of the squared perpendicular distances between the points and the fitted line or plane, rather than just the distance along one dimension as in linear least squares.
The algorithm for total least squares is similar to linear least squares, but it uses a different criterion for the best fit. Instead of minimizing the sum of the squared residuals, i.e. the distances along one dimension from the data points to the fitted line or plane, total least squares finds the line or plane that minimizes the sum of the squared orthogonal distances from the data points to it. The picture below illustrates how the total least squares method and the least squares method differ.
We can see that the total least squares method accounts for errors in both the x and y dimensions, while the linear least squares method only accounts for errors in the y dimension. This can make the total least squares method more accurate, especially when dealing with data that has measurement errors in every variable. The results of using the total least squares method to compress trajectory data would therefore likely be more accurate than using the linear least squares method, since it handles errors in both the x and y dimensions.
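As an illustration, here is a minimal sketch of total least squares for a straight line (the orthogonal-regression direction is the leading singular vector of the centered data), compared with the ordinary least-squares slope; the data is synthetic:

import numpy as np

rng = np.random.default_rng(0)
s = np.linspace(0.0, 10.0, 200)
x = s + rng.normal(0, 0.5, s.size)              # noise in x as well
y = 2.0 * s + 1.0 + rng.normal(0, 0.5, s.size)  # noise in y

# Ordinary least squares: minimizes vertical distances only.
A = np.vstack([x, np.ones_like(x)]).T
(m_ols, c_ols), *_ = np.linalg.lstsq(A, y, rcond=None)

# Total least squares: minimizes orthogonal distances. The best-fit line
# passes through the centroid of the data, in the direction of the first
# right singular vector of the centered data matrix.
P = np.column_stack([x, y])
mean = P.mean(axis=0)
_, _, Vt = np.linalg.svd(P - mean, full_matrices=False)
dx, dy = Vt[0]
m_tls = dy / dx
c_tls = mean[1] - m_tls * mean[0]

print(f"OLS slope {m_ols:.3f}, TLS slope {m_tls:.3f}")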
In terms of applications, the linear least squares method is often used when the errors are confined to the dependent variable and are small. The total least squares method is more suitable when all measured variables contain errors, and it can be used to find the best-fit line or plane for a set of data points in any number of dimensions.
Sources:
https://realpython.com/k-means-clustering-python/
https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html
https://towardsdatascience.com/total-least-squares-in-comparison-with-ols-and-odr-f050ffc1a86a