PCA, Density Estimation, and Bayesian Classification


Project Part 1 [15 points] PCA, Density Estimation, and Bayesian Classification

(Due Tuesday, Oct. 29, 11:59pm)

This part of the project uses a subset of images (with modifications) from the Fashion-MNIST dataset. The original Fashion-MNIST dataset contains 70,000 images of objects, divided into 60,000 training images and 10,000 testing images. We use only images for class “T-shirt” and class “Sneaker” in this project, and the images have been slightly modified to suit this project.

The data is stored in “.mat” files. You may use the following piece of code to read the dataset in Python (or you may use the load filename command in Matlab, since these are .mat files):

    import scipy.io
    data = scipy.io.loadmat('matlabfile.mat')

Following are the statistics for the data you are going to use:

Number of samples in the training set: "T-shirt": 6000; "Sneaker": 6000
Number of samples in the testing set: "T-shirt": 1000; "Sneaker": 1000

For the classification task, we assume that the prior probabilities are the same (i.e., P(0) = P(1) = 0.5).

In the original .mat file, each image is stored as a 28x28 array. We need to “vectorize” an image by concatenating its columns to form a 784-dimensional vector. In the 784-d space, it would be difficult to apply Bayesian decision theory (e.g., the minimum error rate classification). Hence, we will use PCA to do dimensionality reduction first.

Specifically, you will practice doing the following five tasks in this project:

Task 1. Feature normalization (Data conditioning).

You need to normalize the data in the following way, before starting any subsequent tasks. Using all the training images (each viewed as a 784-d vector, $X = [x_1, x_2, \ldots, x_{784}]^T$), compute the mean $m_i$ and standard deviation (STD) $s_i$ for each feature $x_i$ (remember that we have 784 features) from all the training samples. The mean and STD will be used to normalize all the data samples (training and testing): for each feature $x_i$ in any given sample, the normalized feature will be $y_i = (x_i - m_i) / s_i$.
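The vectorization and normalization steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the required implementation: the function names are hypothetical, the images are assumed to arrive as an array of shape (n_samples, 28, 28), and random arrays stand in for the contents of the .mat files because the variable names inside those files are not given here. The sketch uses the population STD (NumPy's default) and leaves constant features unscaled; both are choices the handout does not specify.

```python
import numpy as np


def vectorize_columns(images):
    """Concatenate the columns of each 28x28 image into one 784-d row.

    Assumes `images` has shape (n_samples, 28, 28); adjust the transpose
    if the .mat file stores the sample axis in a different position.
    """
    n = images.shape[0]
    # Swapping the two image axes makes a row-major reshape read column by column.
    return images.transpose(0, 2, 1).reshape(n, -1).astype(np.float64)


def fit_normalizer(X_train):
    """Per-feature mean and STD, estimated from the training samples only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population STD (ddof=0), an assumption
    # Assumption: features with zero STD are left unscaled to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def normalize(X, mean, std):
    """Apply y_i = (x_i - m_i) / s_i to every sample in X."""
    return (X - mean) / std


# Synthetic stand-in data; replace with the arrays read via scipy.io.loadmat.
rng = np.random.default_rng(0)
train_images = rng.random((200, 28, 28))
test_images = rng.random((50, 28, 28))

X_train = vectorize_columns(train_images)
X_test = vectorize_columns(test_images)

mean, std = fit_normalizer(X_train)
Y_train = normalize(X_train, mean, std)
Y_test = normalize(X_test, mean, std)  # test data reuses the training statistics
```

Note that the testing samples are normalized with the mean and STD computed from the training set, as the task description requires, so their per-feature mean will generally not be exactly zero.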

Task 2. PCA using the training samples.

Use all the training samples to do PCA. You cannot use a built-in function pca or similar, if your platform provides such a function. You have to explicitly code the key steps of PCA: computing the covariance matrix, doing eigen analysis (you can use built-in functions for this), and then identifying the principal components.

Task 3. Dimension reduction using PCA.

Consider 2-d projections of the samples on the first and second principal components. These are the new 2-d representations of the samples. Plot/visualize the training and testing samples in this 2-d space. Observe how the two classes are clustered in this 2-d space. Does each class look like a normal distribution?

Task 4. Density estimation.

We further assume that, in the 2-d space defined above, samples from each class follow a Gaussian distribution. You will need to estimate the parameters for the 2-d normal distribution for each class, using the training data. Note: You will have two distributions, one for each class.

Task 5. Bayesian Decision Theory for optimal classification.

Use the estimated distributions for doing minimum-error-rate classification. Report the accuracy for the training set and the testing set respectively.
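One possible shape for the PCA, projection, density estimation, and classification steps is sketched below. It is an illustration under assumptions rather than a reference solution: all function names are hypothetical, the covariance for PCA uses the unbiased (n - 1) divisor while the per-class Gaussians use the maximum-likelihood (n) divisor, and two synthetic 10-d clusters stand in for the normalized 784-d training data. Only the eigen decomposition relies on a built-in routine (`np.linalg.eigh`), which the task description permits.

```python
import numpy as np


def pca_fit(Y, n_components=2):
    """Return the sample mean and the top principal directions of Y (rows = samples)."""
    mean = Y.mean(axis=0)
    centered = Y - mean
    cov = centered.T @ centered / (Y.shape[0] - 1)    # covariance matrix, coded explicitly
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    return mean, eigvecs[:, order]


def project(Y, mean, W):
    """Project samples onto the principal directions stored in the columns of W."""
    return (Y - mean) @ W


def fit_gaussian(Z):
    """Maximum-likelihood mean and covariance for one class."""
    mu = Z.mean(axis=0)
    diff = Z - mu
    return mu, diff.T @ diff / Z.shape[0]


def log_density(Z, mu, sigma):
    """Log of the multivariate normal density at each row of Z."""
    diff = Z - mu
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (maha + logdet + Z.shape[1] * np.log(2.0 * np.pi))


def classify(Z, params, priors=(0.5, 0.5)):
    """Minimum-error-rate rule: choose the class with the largest log posterior."""
    scores = np.column_stack([
        log_density(Z, mu, sigma) + np.log(p)
        for (mu, sigma), p in zip(params, priors)
    ])
    return scores.argmax(axis=1)


# Synthetic stand-in for the normalized training data: two clusters in 10-d.
rng = np.random.default_rng(0)
Y_train = np.vstack([
    rng.normal(-2.0, 1.0, (300, 10)),
    rng.normal(2.0, 1.0, (300, 10)),
])
labels = np.repeat([0, 1], 300)

mean, W = pca_fit(Y_train)
Z_train = project(Y_train, mean, W)
params = [fit_gaussian(Z_train[labels == c]) for c in (0, 1)]
accuracy = np.mean(classify(Z_train, params) == labels)
print(f"training accuracy: {accuracy:.3f}")
```

With equal priors the `np.log(p)` term is the same for both classes and does not affect the decision; it is kept so the rule stays valid if the priors change. Testing samples would be projected with the same `mean` and `W` obtained from the training set, then passed to `classify` with the same `params`.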

What to submit:

  1. Your code for doing the above.
  2. A report summarizing the results with the following format:
     a. Introduction – start with problem statement, data description etc.
     b. Method – implementation details, steps followed etc.
     c. Results and observation – the results asked in each of the steps, e.g., the estimated parameters of the distributions and the final classification accuracy number (any intermediate results for each of the tasks you want to show) along with your observations
     d. Conclusion
     Note: There is no minimum or maximum length requirement for the report. Writing the report is the opportunity for you to reflect on your understanding of the problems/tasks through organizing your results.
  3. The report should be typed (handwritten reports are not allowed) and in a .pdf format (to be submitted as separate document, not included within the code file).
  4. Do not submit a .zip file. Submit multiple individual files on Canvas instead.

The data files for the project are uploaded in the Files/Assignments folder: train_data.mat, test_data.mat