Vector 8 Problem Set 8

Due: Tuesday December 15 by 11:55pm CST.
Upload your solutions to Moodle in a PDF.
Please feel free to use RStudio for all calculations, including row reduction, matrix multiplication, eigenvector calculation, inverse matrices, and the Gram-Schmidt process.
You can download the Rmd source file for this problem set.
Four out of five questions require you to use RStudio. I’ve provided a PS8 solution template if you want to use it.

This problem set covers Sections 6.1 - 6.5 Orthogonality and the explorations on Voting Patterns in the US Senate and Least Squares Approximation.

8.1 Two Orthogonality Properties

Give rigorous proofs for each of the following problems. Here “rigorous” means that the heart of your explanation must use equations and calculations to explicitly justify any intuitive reasoning that you provide.

Let \(W\) be a subspace of \(\mathbb{R}^5\) with orthonormal basis \(\mathsf{w}_1, \mathsf{w}_2, \mathsf{w}_3\). Let \(W^{\perp}\) be its orthogonal complement with orthonormal basis \(\mathsf{v}_1, \mathsf{v}_2\). Prove that if \(\mathsf{u}\) is in both \(W\) and \(W^{\perp}\) then \(\mathsf{u}\) must be the zero vector.
Let \(U\) and \(V\) both be \(n \times n\) orthogonal matrices. Prove that the matrix \(UV\) is also an orthogonal matrix.

8.2 Bases for a Subspace and its Orthogonal Complement

Let \(W \subset \mathbb{R}^6\) be the set of solutions to \[\begin{align} x_1 + x_2 + x_3 + x_4 + x_5 + x_6 &= 0 \\ x_1 + x_2 - x_3 + x_4 + x_5 - x_6 &= 0 \end{align}\]

Find a basis for \(W\) by using row reduction on the appropriate matrix.
Use your answer from (a) to find a “human readable” orthogonal basis for \(W\) where the entries have integer values.
Find a basis for \(W^{\perp}\) without using the Gram-Schmidt process.
Use your answer from (c) to find a “human readable” orthogonal basis for \(W^{\perp}\) where the entries have integer values.

Turn in: Your code and its output. You must make it clear what your final answers are for each problem.

8.3 Eigensystem of a Symmetric Matrix

Recall that a square \(n \times n\) matrix is symmetric when \(A^{\top} = A\). We learned that the eigenvectors of a symmetric matric form an orthogonal basis of \(\mathbb{R}^n\). In this problem, you will confirm that this holds for the following symmetric matrix

\[ A = \begin{bmatrix} 0 & 8 & 10 & -4 \\ 8 & 4 & 28 & 6 \\ 10 & 28 & 3 & -4 \\ -4 & 6 & -4 & -7 \end{bmatrix} . \]

Find the eigenvalues and eigenvectors of \(A\).
Confirm that the eigenvectors returned by R are an orthonormal set. (Pro tip: you can do this in one calculation!)
Express the vector \(\mathsf{v} = \begin{bmatrix} 2 & -4 & -9 & -2 \end{bmatrix}^{\top}\) as a linear combination of the eigenvectors. Use the fact that the eigenvectors are orthonormal. (Don’t augment and row reduce.)
Let \(P\) be the matrix of these normalized, orthogonal eigenvectors. Diagonalize \(A\) using \(P\). Just write out \(A = P D P^{-1}\). Congratulations: you have orthogonally diagonalized the symmetric matrix \(A\)!

Turn in: Your R code and the output for each part. For parts (c) and (d), you need to make it clear what your final answers are.

8.4 Modeling Fertility in 1888 Switzerland

RStudio comes with some sample datasets, including swiss, an 1888 dataset of Swiss socio-economic data. In 1888, Switzerland was entering a demographic transition: birth rates were beginning to decrease as the country entered the industrial age.

The rows correspond to 47 French-speaking provinces. These provinces were at different stages of industrialization. There are 6 socio-economic columns, all scaled to the range \([0,100]\).

Fertility fert: standardized fertility rate
Agriculture agric: percent of men in agricultural occupations
Examination exam: percent of draftees receiving highest mark on army examination
Education educ: percent education beyond primary school for draftees
Catholic cath: percent Catholic (instead of Protestant)
Infant.Mortality mort: percent of live births who live less than one year

We want to find the best-fit linear model fert = \(c_1\) + \(c_2\) agric + \(c_3\) exam + \(c_4\) educ + \(c_5\) cath + \(c_6\) mort.

Obviously, this data set shows its age. We are attempting to model fertility rates of women by using indirect measures about male occupation and education levels. This is problematic and would have to be addressed in a thorough analysis. We limit our use of this classic data to illustrate the method of least squares.

Model this problem as a least-squares approximation of an inconsistent system \(A \mathsf{x} = \mathsf{y}\). Find the least squares solution \(\hat{\mathsf{x}}\) and confirm that the residual vector \(\mathsf{z}\) is orthogonal to the columns of \(A\).

To get you started, here is some convenient code to get the column vectors of the swiss dataset.

fert = swiss[,1]
agric = swiss[,2]
exam = swiss[,3]
educ = swiss[,4]
cath = swiss[,5]
mort = swiss[,6]

Turn in: Your code and the output.

The residual vector \(\mathsf{z}\) measures the quality of fit of our model. But how do we turn this into a meaningful quantity? One method is to look at the coefficient of determination, which is more commonly refered to as the “\(R^2\) value.”

Let \(\mathsf{y} = [ y_1, y_2, \ldots, y_n ]^{\top}\) be our target vector with least squares solution \(\hat{\mathsf{y}} = A \hat{\mathsf{x}}\) and residual vector \(\mathsf{z} = \mathsf{y} - \hat{\mathsf{y}}\). Let \[ a = \frac{1}{n} ( y_1 + y_2 + \cdots + y_n) \] be the average of the entries of target vector \(\mathsf{y}\) and let \(\mathsf{y}^* = [a, a, \ldots, a]\). (We call this vector “y star”, so ystar would be a fine name in R.) The \(R^2\) value is \[ R^2 = 1 - \frac{\| \mathsf{y} - \hat{\mathsf{y}} \| }{\| \mathsf{y} - \mathsf{y}^* \|} = 1 - \frac{\| \mathsf{z} \|}{\| \mathsf{y} - \mathsf{y}^* \|}. \] This is a number in \([0,1]\). If the \(R^2\) value is near 1, then our model does a good job at “explaining” the behavior of \(\mathsf{y}\) via a linear combination of the columns of \(A\).

Find the \(R^2\) value for our least squares solution to the Swiss data. Here are some helpful functions:
- sum(vec) returns the sum of the entries of the vector vec
- length(vec) returns the number of entries in the vector vec
- rep(a, n) creates a constant vector of length \(n\) where every entry is \(a\).
- Norm(vec) from the pracma package returns the magnitude (Euclidean length) of the vector vec.
Turn in: Your code and the \(R^2\) value that you calculated.

Remark: There is certainly more to say about ethical use of data, as well as \(R^2\) values and fitting models to data. To learn more, you should take STAT 155 Introduction to Statistical Modeling.

8.5 Cosine Similarity for US Senators

In high dimensional space \(\mathbb{R}^n\) a common measure of similarity between two vectors is cosine similarity: the cosine of the angle \(\theta\) between the vectors. We calculate this value as follows: \[ \cos(\theta) = \frac{ \mathsf{v} \cdot \mathsf{w}} {\| \mathsf{v}\| \, \|\mathsf{w}\|} = \frac{ \mathsf{v} \cdot \mathsf{w}} {\sqrt{\mathsf{v} \cdot \mathsf{v}} \sqrt{\mathsf{w} \cdot \mathsf{w}}}. \] This measure has the following nice properties:

\(-1 \le \cos(\theta) \le 1\),
\(\cos(\theta)\) is close to 1 if \(\mathsf{u}\) and \(\mathsf{v}\) are closely aligned,
\(\cos(\theta)\) is close to 0 if \(\mathsf{u}\) and \(\mathsf{v}\) are are orthogonal,
\(\cos(\theta)\) is close to \(-1\) if \(\mathsf{u}\) and \(\mathsf{v}\) are polar opposites.

Let’s use cosine similarity to compare the 99 US Senators (one senate seat was not filled at the time) from the 109th US Congress (2007-2008).

Here is code that loads in the voting records. Each row contains the senator’s name, party affiliation, state, and then their record on 46 resolutions. The votes are encoded as a sequence of 0s, 1s, and -1s, where 1 means a ‘aye’ vote, -1 means a ‘nay’ vote, and 0 means the senator abstained. For example, the row corresponding to Joseph Biden is biden = ["Biden", "D", "DE", -1, -1, 1, 1, ... , 1, -1].

library(readr)

senate.vote.file = "https://raw.github.com/mathbeveridge/math236_f20//main/data/SenateVoting109.csv"

record <- read_csv(senate.vote.file, col_names = TRUE)

## Rows: 99 Columns: 49
## ── Column specification ────────────────────────────────────────
## Delimiter: ","
## chr  (3): Name, Party, State
## dbl (46): V01, V02, V03, V04, V05, V06, V07, V08, V09, V10, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

clinton = record[record$Name == 'Clinton',]
collins = record[record$Name == 'Collins',]
frist = record[record$Name == 'Frist',]
mccain = record[record$Name == 'McCain',]
obama = record[record$Name == 'Obama',]
reid = record[record$Name == 'Reid',]


knitr::kable(
  head(record[,1:10],10), booktabs = TRUE,
  caption = 'First 10 rows of the 109th US Senate voting records'
)

Table 8.1: First 10 rows of the 109th US Senate voting records
Name	Party	State	V01	V02	V03	V04	V05	V06	V07
Akaka	D	HI	-1	-1	1	1	1	-1	-1
Alexander	R	TN	1	1	1	1	1	1	1
Allard	R	CO	1	1	1	1	1	1	1
Allen	R	VA	1	1	1	1	1	1	1
Baucus	D	MT	-1	1	1	1	1	1	-1
Bayh	D	IN	1	-1	1	1	1	1	-1
Bennett	R	UT	1	1	1	1	1	1	1
Biden	D	DE	-1	-1	1	1	1	1	-1
Bingaman	D	NM	1	0	1	1	1	1	-1
Bond	R	MO	1	1	1	1	1	1	1

Your first task is to implement our cosine function.
- get_vote_cosine_similarity(senator1, senator2) returns the cosine of the angle between the vote vectors of get_votes(senator1) and get_votes(senator2)
Note: The R command length(vec) returns the number of entries in the vector vec. You can get the magnitude (Euclidean length) of vector vec using the pracma function Norm(vec).

To get you started, we have written the helper function get_votes(senator). This function returns the vote vector for the senator. These are the values in columns 3 to 49.

Turn In: Your implementation of the function.

# returns the vote vector for the senator
get_votes <- function(senator) {
  #unlist makes this dataframe row into a regular vector
  return (unlist(senator[,4:49])) 
}

# Implement this function!
# Return the cosine of the angle between the vote vectors 
# of senator1 and senator2
get_vote_cosine_similarity <- function(senator1, senator2) {
  votes1 = get_votes(senator1)
  votes2 = get_votes(senator2)
  
  # your code goes here!
  # find the cosine of the angle between votes1 and votes2
  cosine = 1
  
  return (cosine)
}

Find the cosine of the angles between every pair of the following senators of note:
- clinton: Hilary Clinton (D, NY), presidential candidate 2016
- mccain: John McCain (R, AZ), presidential candidate 2008
- obama: Barack Obama (D, IL), president 2008-2016
- collins: Susan Collins (R, ME), moderate Republican
Does the cosine similarity pick up on the fact that Senator Collins is a “moderate Republican”?

Turn In: Your code and the output. Comment on the similarities between all pairs of senators.
The senate majority leader of the 109th Congress was Bill Frist (R, TN). The senate minority leader was Harry Reid (D, NV).
- Create a function classify_senator(senator) that returns “R” or “D” depending on the cosine similarity of senator to frist and to reid. You will have to write an “if … else statement” (here is the syntax).
- Then run the classify_all() function that we have written for you. You will see how many senators are misclassified, meaning that their votes are more similar to the leader of the opposing party.
Note: Your misclassified list will include Jim Jeffords (I, VT), a Republican who became and Independent in 2001 and then caucused with the Democrats. Your classifer correctly aligns the Independent Jeffords with the Democrats.

Turn In: Your code and the output. Comment on how many senators are misclassified, as well as their party affiliations.

# Implement this function!
# return "R" if senator is closer to frist
# return "D" if senator is closer to reid
classify_senator <- function(senator) {

  # your code goes here!
  # use the get_vote_cosine_similarity() method to compare senator with frist and reid
  val = "R"
  
  return (val)
}

# returns a dataframe containing all misclassified senator names
# and their party affiliations
classify_all <- function() {

  mismatch = vector()
  
  for (i in 1:99) {
    senator = record[i,]
    party = classify_senator(senator)
    if (party !=  record[i,2]) {
      mismatch = c(mismatch, i)
    }
  }
  
  record[mismatch,1:2]
}


# Uncomment the next line to classify all senators!

# classify_all()