Specific assignment information and instructions
The challenge: You have been approached by the ministry of buildings about a proposal to build a
new airport. They want the airport to provide the highest level of customer satisfaction. You have been
provided with a data set made up of features that customers look for in an airport and their level of
satisfaction with the airport. The descriptions of the features/columns making up the data set are
self-explanatory. You can look up further information about them in the Appendix below. The data set
contains 3502 data points and 37 features. Your task is to develop a machine learning model to
predict/estimate customer satisfaction based on airport features.
Tools to use: Most of the MATLAB code you need to complete the assignment is available from the
various lab sessions. If you are comfortable using Python, you are free to use it. You are also free to use
Orange for various aspects of the coursework as required.
Tasks and Mark Scheme: The aim of this coursework is to design, implement and evaluate an
effective machine learning pipeline for predicting customer satisfaction. The specific tasks and
corresponding mark scheme are given in the table below. It is up to you how you approach this
problem, design a solution and write up your results. For each task, the mark within the grade
boundary will be based on your description in your report, results and code.

Task/Assessment Description    Mark Range    Level of achievement
Conduct a domain analysis and present your findings as related to
the domain of the coursework.

Discuss how what you have found from your domain analysis will
support and be carried over to other parts of your coursework.

0-15% 1
Achieve level 1 as well as conduct data cleaning, pre-processing and
feature engineering.

Discuss how you used your understanding of the domain from level
1 to support this task.

15-25% 2
Achieve the previous levels plus discuss the steps taken for dimensionality
reduction and for preventing bias in the dataset used to train the
machine learning algorithms.

Answer the following questions:
• Which data features capture the most variability in the dataset,
and why do you think they do so? (Hint: Perform PCA
first and extract the Principal Components (PCs) that capture the
highest variability in the dataset. Then see which features
contribute to the PCs.) Highlight the PCs together with the
features that contribute most to them.

• Which 5 variables correlate most closely with the customer
satisfaction column? Using your knowledge of the domain
(Hint: Use your travel experience), explain why you think
they correlate with the customer satisfaction column.

25-40% 3
Achieve all the previous levels as well as explain how:
• You decided on the choice of the best two machine learning
algorithms to apply to the problem.

• You used Orange (or Python/MATLAB) to develop an
effective machine learning pipeline from data cleaning up to
the point of evaluation.
40-60% 4
Achieve all the previous levels plus discuss how you applied
cross-validation techniques in the machine learning pipeline.

60-70% 5
Achieve all the previous levels as well as discuss how effective your
pipeline is at preventing overfitting and underfitting through the
application of learning curves.

70-80% 6
Achieve all the previous levels and the below:
• Compare your choice of machine learning algorithm
with at least two other algorithms that we have not covered in
class.

• Discuss the mathematical peculiarities of the algorithms you
have chosen (strengths and weaknesses) and how they impact
the results you obtained.

• Apply the appropriate metrics to compare the algorithms you
have chosen with the ones we have used in class.

• Discuss the effects of the model complexity of the chosen
algorithms on the learning curves generated.

80-100% 7
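
The PCA and correlation hints given for level 3 can be sketched in Python as follows. This is a minimal illustration, not a model answer: it uses NumPy only, a small synthetic data set in place of the real airport data, and hypothetical feature names. The helper functions `pca_loadings` and `top_correlated` are names introduced here for illustration, and the code assumes every column is numeric and non-constant.

```python
import numpy as np

def pca_loadings(X, n_components=2):
    """Return explained-variance ratios and loadings for the top PCs.

    X is an (n_samples, n_features) array of numeric, non-constant columns.
    """
    # Standardise so that no feature dominates purely because of its scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardised data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return explained[:n_components], Vt[:n_components]

def top_correlated(X, y, names, k=5):
    """Rank features by absolute Pearson correlation with the target y."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))[:k]
    return [(names[j], float(r[j])) for j in order]

# Synthetic stand-in for the airport data (NOT the real data set).
rng = np.random.default_rng(0)
names = ["check_in_wait", "cleanliness", "restaurants", "wifi"]  # hypothetical
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # two strongly related features
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=200)

ratios, loadings = pca_loadings(X, n_components=2)
for i, (ratio, pc) in enumerate(zip(ratios, loadings), start=1):
    strongest = names[int(np.argmax(np.abs(pc)))]
    print(f"PC{i}: {ratio:.1%} of variance, strongest loading: {strongest}")

# Features most correlated with the (synthetic) satisfaction score.
print(top_correlated(X, y, names, k=2))
```

Features with the largest absolute loadings on a PC contribute most to it. The correlation ranking is computed separately, on the original features rather than on the PCs.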

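The cross-validation and learning-curve requirements of levels 5 and 6 can be illustrated in the same way. The sketch below is a hedged example only: it uses a plain least-squares linear model as a stand-in for whichever algorithms you choose, synthetic data instead of the airport data, and helper names (`k_fold_rmse`, `learning_curve`) that are assumptions made for this illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with an intercept; returns the weights."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def k_fold_rmse(X, y, k=5, seed=0):
    """Mean validation RMSE over k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit_linear(X[train], y[train])
        scores.append(rmse(y[val], predict(w, X[val])))
    return float(np.mean(scores))

def learning_curve(X, y, sizes, n_val=100):
    """Train and validation RMSE as the training set grows.

    The last n_val rows are held out as a fixed validation set.
    """
    X_tr, y_tr, X_val, y_val = X[:-n_val], y[:-n_val], X[-n_val:], y[-n_val:]
    curve = []
    for n in sizes:
        w = fit_linear(X_tr[:n], y_tr[:n])
        curve.append((n,
                      rmse(y_tr[:n], predict(w, X_tr[:n])),
                      rmse(y_val, predict(w, X_val))))
    return curve

# Synthetic stand-in data (NOT the real data set).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.3, 0.8]) + 0.5 * rng.normal(size=500)

cv_score = k_fold_rmse(X, y, k=5)
print(f"5-fold RMSE: {cv_score:.3f}")
curve = learning_curve(X, y, sizes=[10, 50, 200, 400])
for n, train_err, val_err in curve:
    print(f"n={n:3d}  train RMSE={train_err:.3f}  validation RMSE={val_err:.3f}")
```

A large, persistent gap between the training and validation curves suggests overfitting. Two curves that converge at a high error suggest underfitting.
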
Technical Report and code
Write up your results in a technical report of no more than 15 pages. Make sure your report has a table
of contents, sections, a discussion and a conclusion.
You must create MATLAB code and an Orange pipeline design for your solution(s), and submit both
in support of your report. Make sure you provide comments in your MATLAB code as well as
instructions on how to run it. Hand in your report (.pdf) and software (Orange and MATLAB) via
Blackboard by 11pm on the 17th of December 2021.
This coursework makes up 60% of your total module mark.

Appendix
Feature                                       Type
Quarter (of the year)                         Plain Text
Date recorded                                 Date & Time
Departure time                                Plain Text
Ground transportation to/from airport         Number
Parking facilities                            Number
Parking facilities (value for money)          Number
Availability of baggage carts                 Number
Efficiency of check-in staff                  Number
Check-in wait time                            Number
Courtesy of check-in staff                    Number
Wait time at passport inspection              Number
Courtesy of inspection staff                  Number
Courtesy of security staff                    Number
Thoroughness of security inspection           Number
Wait time of security inspection              Number
Feeling of safety and security                Number
Ease of finding your way through the airport  Number
Flight information screens                    Number
Walking distance inside terminal              Number
Ease of making connections                    Number
Courtesy of airport staff                     Number
Restaurants                                   Number
Restaurants (value for money)                 Number
Availability of banks/ATM/money changing      Number
Shopping facilities                           Number
Shopping facilities (value for money)         Number
Internet access                               Number
Business/executive lounges                    Number
Availability of washrooms                     Number
Cleanliness of washrooms                      Number
Comfort of waiting/gate areas                 Number
Cleanliness of airport terminal               Number
Ambience of airport                           Number
Arrivals passport and visa inspection         Number
Speed of baggage delivery                     Number
Customs inspection                            Number
Overall satisfaction                          Number