Capstone Projects
The capstone project is designed to put knowledge into practice and to develop critical skills such as problem-solving and collaboration. Students are matched with research labs within the UCY community and with industry partners to investigate pressing issues by applying data science methods. Capstone projects aim to give students professional experience in a real work environment and to strengthen their soft skills. Each project involves a group of roughly 3-4 students working in partnership.
The process is the following:
Final assessment is carried out by the company and the supervisor.
Summer 2022
Price & Energy Performance Certificate (EPC) prediction of residential assets: Based on data from the Department of Lands and Surveys (DLS) and web scraping, two machine learning models should be constructed (one regression and one classification) to predict the price of the asset and its corresponding energy certificate.
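A minimal sketch of the intended two-model setup, assuming a tabular dataset with hypothetical columns for the asset features, price and EPC rating, might look like this:

```python
# Minimal sketch of the two intended models, a price regressor and an EPC classifier.
# The file name and columns (area_sqm, year_built, district, price, epc_rating) are
# hypothetical placeholders for the DLS / web-scraped data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

df = pd.read_csv("residential_assets.csv")                    # assumed file
X = pd.get_dummies(df[["area_sqm", "year_built", "district"]], columns=["district"])

# Regression: predict the asset price.
X_tr, X_te, y_tr, y_te = train_test_split(X, df["price"], test_size=0.2, random_state=0)
price_model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("Price model R^2:", price_model.score(X_te, y_te))

# Classification: predict the EPC rating (e.g. classes A to G).
X_tr, X_te, y_tr, y_te = train_test_split(X, df["epc_rating"], test_size=0.2, random_state=0)
epc_model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("EPC model accuracy:", epc_model.score(X_te, y_te))
```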
Customer behavioural expenses: Investigate and predict customers' expected monthly expenses. A two-layer modelling approach will be followed: homogeneous clusters of customers will first be constructed with a classification model, which will then feed into a regression model predicting the absolute amount per month.
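A minimal sketch of this two-layer idea, assuming a customer table with hypothetical profile columns and a monthly-expense target, could look like the following (a simple k-means segmentation stands in for the grouping model here):

```python
# Minimal sketch of the two-layer approach, with invented column names (age, income,
# monthly_expense): customers are first segmented into homogeneous groups, then a
# per-cluster regressor predicts the absolute monthly amount.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

df = pd.read_csv("customer_expenses.csv")            # assumed file
profile_cols = ["age", "income"]                     # assumed customer-profile features

# Layer 1: segment customers into homogeneous clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(df[profile_cols])
df["cluster"] = kmeans.labels_

# Layer 2: one regression model per cluster for the absolute amount per month.
models = {c: LinearRegression().fit(g[profile_cols], g["monthly_expense"])
          for c, g in df.groupby("cluster")}

# Scoring a new customer: assign a cluster, then apply that cluster's regressor.
new_customer = pd.DataFrame({"age": [35], "income": [2400]})
cluster = kmeans.predict(new_customer)[0]
print("Predicted monthly expense:", models[cluster].predict(new_customer)[0])
```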
Many people with dementia residing in long-term care face barriers in accessing experiences beyond their physical premises, whether due to location, mobility constraints, or legal and/or mental-health restrictions. Previous research suggests that institutionalization increases the co-existing symptoms of dementia, such as aggression, depression, apathy, lack of motivation and loss of interest in oneself and others. Despite the importance of supporting the mental wellbeing of people with dementia, in many cases it remains undertreated. In recent years there has been growing research interest in designing non-pharmacological interventions to improve the Health-Related Quality of Life of people with dementia in long-term care. With computer technology, and especially Virtual Reality, offering vast opportunities for mental support, we must consider how Virtual Reality for people with dementia can be sensitively designed to provide comfortable, enriching experiences of the world outside.

Working closely with dementia patients and medical and paramedical personnel, we will co-design an intelligent, personalized Virtual Reality system to enhance symptom management for dementia patients residing in long-term care. We have already published a paper that thoroughly explains the screening process and analysis we ran to identify which environments patients would like to receive as a Virtual Reality intervention to minimize the aforementioned co-existing symptoms of dementia. We will then develop an intelligent system using the selected environments that adapts the content of the Virtual Reality experience based on physiological and eye-tracking data from the patients and their personal preferences. Both physiological and eye-tracking data will be analysed and correlated with the subjective reports of the patient. This in-depth big-data analysis will allow us to identify the various areas of interest (AOIs) needed to enhance the system; these AOIs will be annotated using an ontology-based approach derived from the themes identified during exposure. The above analysis will help us refine the system: based on the analysed eye-tracking and physiological data (i.e., heart rate and stress level) of the user, the system will be trained to analyse the data in real time and adjust the content of the experience accordingly.
Security in Industrial Control Systems
Industry 4.0 is the term used for manufacturing/production industries that digitize their processes with the aim of increasing the quality and quantity of their production. Digitizing previously isolated systems and connecting them to the most untrusted network of all, the Internet, makes them vulnerable: an attacker can penetrate the system and damage the industry's operations. This project is about detecting the presence of unknown attacks/faults within an industry using data from its Industrial Control System. The challenge is to define what constitutes the normal operational environment.
Project aims:
Analysis of network and operational traffic
Apply AI, ML and/or Data mining techniques
Datasets: iTrust, HAI (Kaggle)
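As a hedged illustration of the aims above, the sketch below runs unsupervised anomaly detection over a HAI-style table of sensor readings; the file name and column handling are assumptions, and defining "normal operation" for the training window remains the real challenge.

```python
# Unsupervised anomaly detection over a HAI-style table of numeric sensor/actuator
# readings. The file name is an assumption, and the real dataset layout (timestamps,
# label columns) should be checked before dropping or scaling columns.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

readings = pd.read_csv("hai_train.csv")                       # assumed file
sensor_cols = readings.select_dtypes("number").columns        # keep numeric channels only

# Fit on data assumed to represent normal operation, then flag deviations from it.
X = StandardScaler().fit_transform(readings[sensor_cols])
detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
readings["anomaly"] = detector.predict(X)                     # -1 = anomalous, 1 = normal
print(readings["anomaly"].value_counts())
```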
Use of ML Techniques to Enable Intrusion Detection in IoT Networks
Networks are vulnerable to costly attacks. Thus, the ability to detect these intrusions early on and minimize their impact is imperative to the financial security and reputation of an institution.
There are two mainstream types of intrusion detection systems (IDS): signature-based and anomaly-based. Signature-based IDS identify intrusions by referencing a database of known identities, or signatures, of previous intrusion events. Anomaly-based IDS attempt to identify intrusions by referencing a baseline or learned patterns of normal behavior; under this approach, deviations from the baseline are considered intrusions.
Project aims:
This MSc project will utilize existing work on Lightweight Intrusion Detection for Wireless Sensor Networks (using BLR, SVM, SOM, Isol. Trees) and extend it, considering the characteristics of IoT networks, new attacks, new topologies, and especially new classification algorithms.
Reading/Datasets: Works by V. Vassiliou and C. Ioannou (University of Cyprus), suitable datasets from Kaggle.
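In the spirit of the BLR/SVM/Isolation-Tree baselines mentioned above, a minimal supervised starting point on a labelled Kaggle flow dataset might look like the following sketch; the file name and label column are assumptions about whichever dataset is chosen.

```python
# Simple supervised baseline: a logistic-regression classifier over labelled flow
# features. The file name and the numeric 0/1 "label" column are assumptions about
# whichever Kaggle IoT intrusion dataset the project ends up using.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

flows = pd.read_csv("iot_ids_flows.csv")                      # assumed file
X = flows.drop(columns=["label"]).select_dtypes("number")     # numeric flow features
y = flows["label"]                                            # 0 = benign, 1 = attack (assumed)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```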
Predicting YouTube Trending Video
A new trend in 5G and 6G networks is the use of the network edge (base station) for caching and processing, and for supporting the quality of service and quality of experience of end users consuming visual content and interactive media.
Project aims:
This project will provide the ability to group and predict users' and/or content needs. One way to do this is to predict which videos will trend on social media.
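By way of illustration, a first trending-prediction baseline could classify videos from simple metadata, as sketched below; the features and file name loosely follow the public YouTube trending datasets on Kaggle and are assumptions, not project requirements.

```python
# Illustrative baseline only: a classifier that predicts whether a video trends from
# simple metadata. All column names are assumed placeholders to be replaced by the
# project's own data collection.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

videos = pd.read_csv("youtube_videos.csv")                    # assumed file
features = ["category_id", "duration_sec", "channel_subscribers", "publish_hour"]  # assumed

X_tr, X_te, y_tr, y_te = train_test_split(
    videos[features], videos["is_trending"], test_size=0.2, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("Trending-prediction accuracy:", model.score(X_te, y_te))
```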
Predicting Motor-vehicle Accidents
Motor vehicle crashes cause loss of life, property damage and financial losses. Vehicle accidents are therefore a focus of traffic safety research, which uncovers useful information that can be directly applied to reduce these losses.
Traditionally, crash events have been modelled using machine learning techniques that consider crash-level variables such as roadway characteristics, lighting conditions, weather conditions and the prevalence of drugs or alcohol.
Project aims:
Incorporate crash report data, road data, and demographic data to better understand crash locations and the associated cost. Use a mixed linear modeling technique that enables data fusion in a principled way to build a better predictive model. Analyze the natural clustering of events in space by different geographic levels.
Datasets: Need to get relevant information from the Cyprus Police and the Association of Insurance Companies.
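As an illustration of the mixed linear modelling idea in the aims above, the sketch below fits crash cost with crash-level fixed effects and a district-level random effect using statsmodels; since the Police and insurance data are not yet available, every file and column name is hypothetical.

```python
# Sketch of the mixed linear modelling idea: crash-level fixed effects plus a random
# effect grouping crashes by district, so geographic clustering enters the model in a
# principled way. All names (cost, lighting, weather, alcohol, district) are invented.
import pandas as pd
import statsmodels.formula.api as smf

crashes = pd.read_csv("crash_reports.csv")                    # assumed file
model = smf.mixedlm(
    "cost ~ lighting + weather + alcohol",                    # crash-level fixed effects
    data=crashes,
    groups=crashes["district"],                               # geographic random effect
).fit()
print(model.summary())
```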
Preventive to Predictive Maintenance
Maintenance is an integral component of operating manufacturing equipment. Preventive maintenance occurs on the same schedule every cycle — whether or not the upkeep is actually needed. Preventive maintenance is designed to keep parts in good repair but does not take the state of a component or process into account.
Predictive maintenance occurs as needed, drawing on real-time collection and analysis of machine operation data to identify issues at the nascent stage before they can interrupt production. With predictive maintenance, repairs happen during machine operation and address an actual problem. If a shutdown is required, it will be shorter and more targeted.
While the planned downtime in preventive maintenance may be inconvenient and represents a decrease in overall capacity availability, it’s highly preferable to the unplanned downtime of reactive maintenance, where costs and duration may be unknown until the problem is diagnosed and addressed.
Preventive to Predictive Maintenance is about the transition from a preventive maintenance strategy to a predictive maintenance strategy for a replaceable part.
Project aims:
The objective of the project is to use the associated detailed dataset to precisely predict the remaining useful life (RUL) of the element in question, so that a transition to predictive maintenance becomes possible.
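A minimal starting point, assuming the dataset resembles a typical run-to-failure table (unit id, cycle, sensor columns and an RUL target), might look like the sketch below; all names are placeholders until the actual dataset is confirmed.

```python
# Minimal RUL sketch: map the sensor readings at a given cycle to the remaining useful
# life of the part. The file and column layout (sensor_* columns plus an RUL target)
# are placeholders in the style of typical run-to-failure datasets.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

runs = pd.read_csv("run_to_failure.csv")                      # assumed file
sensor_cols = [c for c in runs.columns if c.startswith("sensor_")]

X_tr, X_te, y_tr, y_te = train_test_split(
    runs[sensor_cols], runs["RUL"], test_size=0.2, random_state=0)
rul_model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("MAE in cycles:", mean_absolute_error(y_te, rul_model.predict(X_te)))
```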
Optimizing food supply chain for large vessels
A major pain point for Oceanic is controlling the accuracy of the monthly stock, as it has a direct and absolute impact on the calculated performance of the vessel and thus on the profitability of the company: Opening Stock + Purchases - Remaining Stock = Consumption, and Consumption / Meal Days = Performance, so if the "remaining stock on board" figure is wrong, the performance figure is wrong.
The fact is that monthly stock taking is not accurate, resulting in a domino effect on many other parts of the procurement and reporting chain.
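To make the sensitivity concrete, the toy calculation below applies the performance formula above to made-up numbers and shows how an error in the remaining-stock figure shifts the reported performance.

```python
# Toy numbers only, to show how the monthly performance figure is derived and how
# sensitive it is to the reported remaining stock.
opening_stock = 12_000.0      # value of food on board at the start of the month
purchases = 8_500.0           # value purchased during the month
remaining_stock = 9_000.0     # value reported on board at month end
meal_days = 1_550             # crew meal days served in the month

consumption = opening_stock + purchases - remaining_stock
performance = consumption / meal_days
print(f"Consumption: {consumption:.2f}, performance: {performance:.2f} per meal day")

# If the stock count is off by 1,000, the reported performance shifts with it:
print(f"With a 1,000 stock error: {(consumption + 1_000) / meal_days:.2f} per meal day")
```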
Project aims:
The objective of the project is to create a proactive, rather than reactive, approach to the procurement flow. Ideally, we would like to identify stock, calculate consumption, identify vessel trading patterns and proactively (rather than reactively, as is currently the case) recommend items and quantities, enabling the procurement officer to focus more on controlling and less on administering.
Datasets:
We will need to coordinate with CYENS so that all information on menus, recipes, ships, providers, cooks, etc. is collected.
Interns shall integrate a smart obstacle-detection system into CyRIC’s car infotainment system. The integrated solution shall process, in real time, the images from a CyRIC camera installed at the front of the car and detect obstacles. Specifically, the developed system shall:
The scope of this project is to create conversion models (a customer’s propensity to buy an insurance product) for the two main sales channels of Hellas Direct (“Direct” and “Aggregators”). The students will work under the close guidance of our Senior Data Scientist to create an automated and structured data preparation process, then apply two different model types per channel and compare their pros and cons.
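As a hedged illustration of what a per-channel propensity model could look like before the real data preparation is defined, the sketch below trains one classifier per channel and scores it with ROC AUC; the quote-level columns and file name are invented.

```python
# Hedged sketch of one propensity model per sales channel, scored with ROC AUC.
# The quote-level features, label and file name are invented placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

quotes = pd.read_csv("quotes.csv")                             # assumed file
features = ["driver_age", "vehicle_value", "quoted_premium"]   # assumed features

for channel in ["Direct", "Aggregators"]:
    subset = quotes[quotes["channel"] == channel]
    X_tr, X_te, y_tr, y_te = train_test_split(
        subset[features], subset["converted"], test_size=0.2, random_state=0)
    model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{channel}: ROC AUC = {auc:.3f}")
```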
Resources:
Our company will provide
Desired Deliverables:
Insurance Claims Prediction:
Development of a set of algorithms to detect relations between insurance claims, handling of high-dimensional data so that all levels can be reached, detection of missing observations, etc.
Main objectives & outcomes would be:
Resources: We will provide
Desired Deliverables:
Automating AB Test Decision-making
AB testing helps avoid unnecessary risks by allowing companies to target resources for maximum effect and efficiency, which helps increase ROI, whether it is measured in short-term conversions, long-term customer loyalty, or other important metrics.
In an effort to maximize product conversions (e.g. credit card applications), there is always an inevitable cost. During the execution lifespan of AB tests, a sizable portion of the traffic is routed to a losing variant, directly reducing critical business metrics (e.g. revenue). Minimizing this regret is especially important in time-sensitive situations (e.g. seasonal AB tests on Black Friday marketing campaigns), or in cases where the cost of serving losing variants is so high that companies hesitate to run AB tests at all.
Factors such as the reliance on a three- to four-week window to reach statistical significance via traditional statistical calculators (Chi-squared tests and Bayesian inference), or the team’s limited human-resource capacity, make the opportunity cost of lost conversions appear too high.
If only we had a Machine Learning Model in place to help us sort this out 😃
The proposal:
Build a model that can detect the best version much earlier and direct an increasing amount of traffic to it.
The goal is to accurately predict whether a test is successful earlier than the default three to four weeks currently used. Ocean Finance has a huge amount of data from historical experiments run over the last two years. This vast amount of data gives plenty of room for backtesting, i.e. assessing the viability of a testing strategy by discovering how tests would have played out on historical data. If backtesting works, product analysts may have the confidence to employ the strategy going forward.
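One candidate approach, sketched below on simulated data, is a Bernoulli Thompson-sampling bandit that gradually shifts traffic towards the variant whose posterior conversion rate looks best; the conversion rates used here are invented, and the real evaluation would come from backtesting on the historical experiments mentioned above.

```python
# Bernoulli Thompson sampling over two variants with invented conversion rates: the
# bandit samples a plausible rate for each variant from its Beta posterior and routes
# each visitor to the best draw, so traffic drifts towards the stronger variant well
# before a fixed-horizon test would conclude.
import numpy as np

rng = np.random.default_rng(0)
true_rates = {"A": 0.040, "B": 0.052}            # hypothetical conversion rates
successes = {v: 1 for v in true_rates}           # Beta(1, 1) priors
failures = {v: 1 for v in true_rates}

for _ in range(20_000):                          # visitors arriving one by one
    draws = {v: rng.beta(successes[v], failures[v]) for v in true_rates}
    chosen = max(draws, key=draws.get)           # show the variant with the best draw
    converted = rng.random() < true_rates[chosen]
    successes[chosen] += int(converted)
    failures[chosen] += int(not converted)

for v in true_rates:
    shown = successes[v] + failures[v] - 2
    rate = (successes[v] - 1) / max(shown, 1)
    print(f"Variant {v}: shown {shown} times, observed conversion rate {rate:.3f}")
```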
Resources: Our company will provide technical and business support though
Desired Deliverables:
AI adoption in manufacturing can lead to enhanced processes through data-driven decision making in various aspects of the production process. Unplanned downtime can be prevented through predictive maintenance and failure-mode prediction services that leverage sensor data and machine learning techniques. However, in order for the predictions of the machine learning models to be successfully integrated into decision making, they need to be trusted by the respective users.
Explainable AI (XAI) techniques have therefore been developed as part of an ongoing effort to help stakeholders outside the data science community understand, trust, or even challenge, machine learning results.
The target of this project is to create a complete AI-enabled application for extracting accurate insights from manufacturing sensor data that can be leveraged to predict when a piece of the monitored equipment might fail.
The project will span across the complete flow of required data processing steps, starting from raw data exploration and cleaning (focusing on commonly encountered challenges in sensor data processing, including handling of faulty/missing measurements), feature engineering, experimenting with different machine learning models, application of XAI techniques and finally the creation of an interactive dashboard for presenting the extracted insights to the users and allowing them to drill down, question and understand the results.
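As a rough illustration of the later pipeline stages (model training plus an XAI layer), the sketch below assumes a cleaned table of engineered sensor features with a binary failure label; SHAP is used here as one common XAI technique, though the project is not committed to it, and all file and column names are placeholders.

```python
# Sketch of the final pipeline stages: train a failure-prediction model on engineered
# sensor features, then attach a SHAP explainer whose outputs a dashboard could show
# to maintenance engineers. All file/column names are assumed placeholders.
import pandas as pd
import shap
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

data = pd.read_csv("sensor_features.csv")                     # assumed engineered features
X, y = data.drop(columns=["failure"]), data["failure"]        # assumed binary label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

explainer = shap.TreeExplainer(model)                         # per-prediction attributions
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)                          # global view of feature impact
```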
Tickmill is a retail FX broker with over 250 employees around the globe. Its average monthly trading volume is more than $150 billion. Tickmill has its own quantitative research team with proprietary trading systems, developed using a vast amount of tabular data stored in various databases. The main task of the team is to take investment decisions using various types of unconventional data. For example, can we predict Starbucks' share price if we know how many people have visited their stores? If yes, can we achieve better results if we combine this data with the average amount of money that people have spent in a Starbucks store?
During the capstone project, students who spend time with the Tickmill Quantitative Research team are expected to be involved in data mining, data storage in various types of databases (depending on the data) and, most importantly, data analysis. In the financial industry, data analysis is one of the biggest challenges nowadays for two main reasons. First, and most importantly, data cannot be created (in contrast with voice, image or video data). Secondly, the signal-to-noise ratio is quite low, which makes this one of the most difficult tasks in machine learning (and data science).
Students involved in this project will have two weeks of training on the financial industry and trading (no prior knowledge is needed). After that, students will be exposed to the various types of data and the data analysis that could be done in that scope. Students will use their preferred data analysis tools (we use Jupyter notebooks, but they are free to use whatever they feel comfortable with) to perform the analysis. At the end of the project, they should be in a position to understand which types of data are useful for predicting financial asset price changes and which are full of noise. This involves (but is not limited to) understanding correlation and causation. In addition to this first-stage data analysis, students will have the opportunity to create their own features and the freedom to combine different types of data in order to reach meaningful conclusions.
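As a toy example of the first-stage analysis described above, the sketch below checks whether an invented alternative-data series (store visits) shows any correlation with next-day price moves; the file and columns are placeholders, and correlation alone, of course, says nothing about causation.

```python
# Toy "first stage" check: does an alternative-data series (store visits) carry any
# signal about next-day price changes? File and column names are invented.
import pandas as pd

df = pd.read_csv("starbucks_altdata.csv", parse_dates=["date"]).sort_values("date")
df["next_day_return"] = df["close_price"].pct_change().shift(-1)   # target: next-day move
df["visits_change"] = df["store_visits"].pct_change()              # candidate feature

print(df[["visits_change", "next_day_return"]].corr())
# A weak correlation here would not be surprising: the low signal-to-noise ratio is
# exactly what makes this kind of feature engineering difficult.
```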
Due to confidentiality agreements, students will not be allowed to disclose data vendors names in their capstone reports.