Fixed = fixed-charge covering ratio (income/debt)
RoR = rate of return on capital
Cost = cost per kilowatt capacity in place
Load = annual load factor
Demand = peak kilowatt-hour demand growth from 1974 to 1975
Sales = sales (kilowatt-hour use per year)
Nuclear = percent nuclear
Fuel Cost = total fuel costs (cents per kilowatt-hour)
Please load the data as a pandas DataFrame and set the row names (index) to the utilities column (company). Convert all remaining columns to float.
1) a. Use “from sklearn.metrics import pairwise” to calculate the pairwise Euclidean distance between each pair of utilities, and show the distance matrix.
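The loading step could look roughly like the sketch below. The file name is not given here, so the example reads from an in-memory CSV string; the two rows, their values, and the column headers (including "Company") are made up for illustration and are not the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: two made-up rows, NOT the actual data.
csv_text = """Company,Fixed,RoR,Cost,Load,Demand,Sales,Nuclear,Fuel Cost
Utility A,1.1,9,150,54,2,9000,0,0.6
Utility B,0.9,11,200,57,3,5000,25,1.5
"""

# With the real data, pass the path of your file to pd.read_csv instead.
df = pd.read_csv(io.StringIO(csv_text))

# Use the company names as row labels (the index).
df = df.set_index("Company")

# Convert every remaining column to float.
df = df.astype(float)

print(df.dtypes)
```

Setting the index before calling `astype(float)` keeps the text column out of the conversion, so the cast does not fail on company names.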
b. Standardize the features based on mean and std and recalculate the pairwise distance matrix using Euclidean distance.
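As a rough illustration of parts a and b, the sketch below computes both distance matrices on a made-up three-row table (the names `toy`, `raw_df`, and `std_df` are placeholders, and only two of the eight features are used). It assumes the sample standard deviation, which is what pandas `std()` returns by default; a population standard deviation would give slightly different values.

```python
import pandas as pd
from sklearn.metrics import pairwise

# Made-up toy data with two features on very different scales.
toy = pd.DataFrame(
    {"Sales": [9000.0, 5000.0, 7000.0], "Fuel Cost": [0.6, 1.5, 1.0]},
    index=["A", "B", "C"],
)

# a. Euclidean distances on the raw features.
raw = pairwise.pairwise_distances(toy, metric="euclidean")
raw_df = pd.DataFrame(raw, index=toy.index, columns=toy.index)

# b. Standardize each column (subtract the mean, divide by the std), then repeat.
#    Assumption: pandas' default sample std (ddof=1) is used.
standardized = (toy - toy.mean()) / toy.std()
std = pairwise.pairwise_distances(standardized, metric="euclidean")
std_df = pd.DataFrame(std, index=toy.index, columns=toy.index)

print(raw_df.round(2))
print(std_df.round(2))
```

On the raw features the distances are driven almost entirely by Sales because of its much larger scale; after standardization both features contribute on comparable terms.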
2) a. Use “from scipy.cluster.hierarchy import linkage” and plot the dendrogram using single linkage.
c. Use “from scipy.cluster.hierarchy import fcluster” and apply it to the dendrograms for both single and average linkage to separate the data points into 6 clusters, then print the clusters with their corresponding members. (Set criterion='maxclust' for fcluster.)
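One possible shape for the linkage and fcluster steps is sketched below on random stand-in data (the array `X` and the labels in `names` are made up). To stay free of a plotting dependency, `dendrogram` is called with `no_plot=True`, which only computes the tree layout; with matplotlib installed, dropping that argument draws the figure.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Made-up stand-in for the standardized feature matrix: 12 rows, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
names = [f"U{i}" for i in range(12)]

clusters = {}
for method in ("single", "average"):
    # Build the hierarchical merge tree for this linkage method.
    Z = linkage(X, method=method, metric="euclidean")

    # Compute the dendrogram layout; remove no_plot=True to draw it.
    tree = dendrogram(Z, labels=names, no_plot=True)

    # Cut the tree so that at most 6 flat clusters remain.
    labels = fcluster(Z, t=6, criterion="maxclust")

    # Group member names by cluster label.
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    clusters[method] = groups

for method, groups in clusters.items():
    print(method)
    for label in sorted(groups):
        print(f"  cluster {label}: {groups[label]}")
```

Note that `fcluster` takes the linkage matrix `Z`, not the dendrogram object, and that `criterion="maxclust"` treats `t` as an upper bound on the number of clusters.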
3) a. Use “from sklearn.cluster import KMeans” to cluster the data into 6 clusters. Set the random state for KMeans to “0”. Print the clusters and their members.
b. For the number of clusters from 1 to 7, plot the average SSE vs. the number of clusters as a line plot. Use the “inertia_” attribute of KMeans to get the SSE. Make sure that you divide it by the number of clusters to get the average SSE.
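A minimal sketch of the KMeans steps follows, again on random stand-in data rather than the real table. Setting `n_init=10` explicitly is a choice made here to keep results stable across scikit-learn versions; the assignment itself only fixes `random_state=0`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up stand-in for the standardized feature matrix: 22 rows, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(22, 3))

# a. Cluster into 6 groups with a fixed random state.
km = KMeans(n_clusters=6, random_state=0, n_init=10).fit(X)
members = {c: np.flatnonzero(km.labels_ == c).tolist() for c in range(6)}
for c, rows in members.items():
    print(f"cluster {c}: rows {rows}")

# b. Average SSE = inertia_ divided by the number of clusters, for k = 1..7.
ks = list(range(1, 8))
avg_sse = []
for k in ks:
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    avg_sse.append(model.inertia_ / k)

for k, value in zip(ks, avg_sse):
    print(k, round(value, 3))
```

Passing `ks` and `avg_sse` to a line-plot call, for example matplotlib's `plt.plot(ks, avg_sse, marker="o")`, produces the requested curve. With the real data, the row positions in `members` can be mapped back to company names through the DataFrame index.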