Capstone Projects
The capstone project has been designed to put knowledge into practice and to develop and improve critical skills such as problem-solving and collaboration. Students are matched with research labs within the UCY community and with industry partners to investigate pressing issues by applying data science methods. Capstone projects aim to give students professional experience in a real work environment and help enhance their soft skills. These projects involve groups of roughly 3-4 students working in partnership.
The process is the following:
Final assessment is carried out by the company and the supervisor.
Summer 2024
Internal Fraud Detection System
Description
The goal of this project is to develop an Internal Fraud Detection System. The system aims to identify and flag suspicious activities suggestive of internal fraud by detecting abnormal patterns in employee behaviour and actions. Using unsupervised learning techniques, students will develop models to identify anomalies. This project includes data collection and analysis, model development and model serving.
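As an illustration of the unsupervised approach, the sketch below fits an Isolation Forest to a few hypothetical employee-activity features and flags the outliers; the feature names and contamination level are assumptions, not part of the project brief.

```python
# Minimal sketch of the anomaly-detection step, assuming employee activity
# has already been aggregated into numeric features (names are illustrative).
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Hypothetical per-employee daily activity features.
activity = pd.DataFrame({
    "logins_after_hours": [0, 1, 0, 12, 2],
    "records_accessed":   [40, 55, 38, 900, 60],
    "failed_approvals":   [0, 0, 1, 7, 0],
})

X = StandardScaler().fit_transform(activity)

# Unsupervised model: flags observations that deviate from the bulk of the data.
model = IsolationForest(contamination=0.05, random_state=0)
activity["anomaly"] = model.fit_predict(X)   # -1 = flagged as anomalous
print(activity[activity["anomaly"] == -1])
```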
Milestones
Possible set of data sources
Curating the Emotional Journey: A Museum Music Recommendation System with Emotion Recognition
Project Goal:
This capstone project challenges you to design and develop a data science application that personalizes the museum visitor experience through music. The application will leverage emotion recognition and stress-level estimation to analyze the visitor's emotional state from video footage and heart-rate signals, and to generate background music that fits both their visiting experience and the thematic era of the exhibit.
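Purely as an illustrative starting point, the sketch below maps an already-detected emotion label and a heart-rate reading to a playlist for the current exhibit era; the emotion labels, thresholds, and playlist names are all assumptions.

```python
# Minimal sketch of the recommendation step, assuming an upstream emotion
# recognizer yields an emotion label per visitor and a heart-rate stream is
# available; playlist names and thresholds are illustrative.

PLAYLISTS = {
    ("calm", "low"):      "ambient_renaissance",
    ("calm", "high"):     "uplifting_baroque",
    ("stressed", "low"):  "soothing_impressionist",
    ("stressed", "high"): "slow_tempo_medieval",
}

def stress_level(heart_rate_bpm: float, resting_bpm: float = 70.0) -> str:
    """Very rough proxy: heart rate well above rest => 'high' stress."""
    return "high" if heart_rate_bpm > resting_bpm * 1.2 else "low"

def recommend(emotion: str, heart_rate_bpm: float, exhibit_era: str) -> str:
    state = "stressed" if emotion in {"fear", "anger", "sadness"} else "calm"
    playlist = PLAYLISTS[(state, stress_level(heart_rate_bpm))]
    return f"{exhibit_era}/{playlist}"

print(recommend("fear", 95, "renaissance"))  # -> renaissance/slow_tempo_medieval
```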
Problem Statement:
Museums strive to create engaging and impactful experiences for visitors. Music plays a significant role in setting the atmosphere and influencing visitor emotions. However, traditional, static background music may not effectively cater to the diverse emotional states and interests of visitors navigating various exhibits. This project aims to bridge this gap by developing a music generation system tailored to each visitor's individual emotional state.
Data Sources:
Technical Stack:
Project Deliverables:
Success Criteria:
A successful project will demonstrate the following:
Description:
As part of its Real Estate offering, Deloitte would like to build an end-to-end real estate asset monitoring and analysis infrastructure through capturing data on a recurring basis from various publicly and non-publicly available sources. These datasets will be subsequently used to populate Deloitte’s internally developed PowerBI tool, which the business is solely going to use for internal purposes.
Objective:
Data:
Deliverables:
Title: Telco Network Failure Prediction
Duration: 2 months
Project Description:
This project aims to leverage advanced data science techniques to anticipate and mitigate network failures in telecommunication systems. Predicting network failures in telecommunication systems is crucial for maintaining service quality, minimizing downtime, and optimizing resource allocation. By analyzing historical data such as network performance metrics, environmental conditions, equipment status, and maintenance logs, the data science team will develop predictive models to anticipate potential failures before they occur. This will allow the Telco provider to proactively address issues, improve network reliability, and enhance customer satisfaction.
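One possible baseline, sketched below, treats this as a supervised classification problem on joined historical records; the file and column names are assumptions for illustration only.

```python
# Minimal sketch of the predictive-modelling step, assuming historical records
# have been joined into one table with a binary "failure" label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("network_history.csv")          # hypothetical export
features = ["latency_ms", "packet_loss_pct", "temperature_c",
            "equipment_age_months", "maintenance_events_90d"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["failure"], test_size=0.2, stratify=df["failure"])

# Failures are typically rare, so the class weighting matters.
model = RandomForestClassifier(n_estimators=300, class_weight="balanced")
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```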
Resources:
Our company will provide:
Desired Deliverables:
Measuring the Impact of Wildfire Risk on Property Market Prices
Objective:
Develop a predictive model to measure the impact of wildfire risk on property market prices.
Data:
The project will utilize a combination of the following datasets:
Deliverables:
Resources:
Expected Outcome:
The project aims to provide a robust predictive model that accurately measures the impact of wildfire risk on property market prices. Results are expected to be presented as an interactive visualization web app (built with Shiny, Flask, etc.). The findings will help users make informed decisions regarding property investments and risk management.
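A minimal sketch of one possible modelling approach is shown below, assuming a merged dataset of property transactions with a wildfire-risk score per property; all file and column names are illustrative.

```python
# Minimal sketch: regression of market price on property features plus a
# wildfire-risk score (hypothetical merged dataset).
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("properties_with_wildfire_risk.csv")   # hypothetical file
X = df[["size_sqm", "rooms", "distance_to_city_km", "wildfire_risk_score"]]
y = df["market_price"]

model = GradientBoostingRegressor()
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Cross-validated R^2:", scores.mean())

# The marginal effect of wildfire risk can then be examined, e.g. with
# partial-dependence plots or by comparing predictions at different risk levels.
```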
Machine Learning for Fraud Detection in Insurance Claims
Background
Insurance claim fraud is a global issue that drains billions of dollars from the industry each year, escalating costs for both insurers and policyholders. In Greece, for instance, fraudulent claims tally up to €200 million annually, accounting for about 10% of all claim payouts. These challenges are not unique to Greece but are mirrored worldwide, necessitating a unified and innovative approach across the sector to tackle this pervasive problem.
Insurance claim fraud is defined as any deliberate and misleading act or omission by any individual or legal entity aimed at gaining an unlawful financial benefit for themselves or facilitating such gain for others within the context of a valid or invalid insurance contract.
The need for advanced data analytics and machine learning in fraud detection is becoming increasingly critical as fraudsters employ more sophisticated methods to circumvent traditional detection systems. Globally, insurers are turning to technology to gain an edge against these tactics, utilising big data and predictive analytics to spot irregularities and inconsistencies in claim submissions. This technological shift represents a move from reactive to proactive fraud management, where potential frauds are flagged and investigated before they result in financial loss.
Moreover, the ongoing digital transformation in the insurance industry highlights the importance of continuous innovation in fraud detection systems. The integration of AI and machine learning technologies enhances fraud detection accuracy and speeds up the process, allowing insurers to handle claims more efficiently while reducing the chances of fraudulent claims slipping through the net.
Engaging with this project offers an opportunity to contribute to significant advancements in this procedure and to apply cutting-edge data science techniques to real-world problems. It is an excellent opportunity for aspiring data scientists to make a tangible difference, helping to shape the future of an industry on which everyone relies.
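As a concrete illustration of such a predictive approach, the sketch below trains a simple supervised classifier on hypothetical claim features; the columns, the availability of labelled claims, and the choice of gradient boosting are assumptions rather than project requirements.

```python
# Minimal sketch of a supervised fraud-detection baseline; class imbalance is
# the usual difficulty in this setting, so precision/recall are reported.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

claims = pd.read_csv("claims.csv")               # hypothetical labelled export
features = ["claim_amount", "days_since_policy_start",
            "previous_claims", "late_reporting_days"]
X_train, X_test, y_train, y_test = train_test_split(
    claims[features], claims["is_fraud"], test_size=0.2,
    stratify=claims["is_fraud"])

clf = GradientBoostingClassifier().fit(X_train, y_train)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, clf.predict(X_test), average="binary")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```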
Objectives
Briefly, the core objectives of this project are defined as:
Methodology
The principal methodology pipeline will consist of:
Resources
All the necessary resources for the completion of the project will be delivered, such as:
Title: Customer Behavior prediction
Project Description:
The project focuses on analyzing customer data of a Telecommunication company, specifically related to their mobile plans. The data contains information related to customers such as monthly payments, mobile usage, contract information and competitive data from other telecom vendors. The objective is to develop a methodology to process the data, extract meaningful insights through analysis and create new features that will be used to build predictive models. The models will be trained to predict customer behavior that leads to switching to a different mobile plan. This project will provide an opportunity to gain practical experience in Data Analysis, Feature Engineering, and Machine Learning, while also gaining valuable insights into solving real-world problems in the telecommunications industry.
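A minimal sketch of the feature-engineering and modelling steps might look as follows, assuming monthly usage records and a label indicating whether the customer later switched plans; all file and column names are illustrative.

```python
# Minimal sketch: derive per-customer features from monthly records, then fit
# a baseline classifier for plan switching (hypothetical data layout).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

usage = pd.read_csv("monthly_usage.csv")          # hypothetical export

# Example engineered features: spend level, spend trend, usage volatility.
feats = usage.groupby("customer_id").agg(
    avg_monthly_payment=("payment", "mean"),
    payment_trend=("payment", lambda s: s.iloc[-1] - s.iloc[0]),
    data_usage_std=("data_gb", "std"),
    months_on_contract=("month", "count"),
).fillna(0).reset_index()

labels = pd.read_csv("plan_switch_labels.csv")    # hypothetical: customer_id, switched
df = feats.merge(labels, on="customer_id")

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["customer_id", "switched"]), df["switched"], test_size=0.2)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```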
Resources:
Our company will provide:
Desired Deliverables:
Health Data Analysis and/or Pricing model development
Project Description:
This dynamic and interactive project will provide students with hands-on experience applying their knowledge in data science to the actuarial field, particularly in data analysis and optimization, as well as Health Insurance Pricing model development. It features a unique collaboration between Milliman’s local and international teams, ETHNIKI, THE HELLENIC GENERAL INSURANCE CO. S.A., one of Greece’s largest insurance companies, and the University of Cyprus, which promotes the project.
To get a flavor of what the project will include, have a look at the tasks below (note that this is not an exhaustive list):
*The intention will be to anonymize all data utilized for this Capstone project.
Resources:
UCY students will have the opportunity to work with real life Insurance market data that will include information regarding historical claims with a wealth of parameters around each claim. Milliman will provide guidance and support throughout the duration of the project and will ensure that materials provided to the students are adequate to perform the tasks.
Desired Deliverables:
Scope
Today, accurate flight delay prediction is among the most critical and challenging problems when scheduling and managing flights for all involved aviation data value chain stakeholders. As part of the capstone project, the flight delay prediction problem will be addressed from the perspective of an airport and for a day-ahead time horizon in terms of:
The flight delay forecasts need to be accompanied by appropriate explanations in order to shed light on the provided forecasts, so Explainable AI (XAI) techniques (e.g., feature relevance and counterfactual explanations) need to be leveraged.
The aviation data that will be used as part of the capstone project are confidential flight data from an airport in Europe.
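Since the real flight data are confidential, the sketch below uses synthetic features to show how feature-relevance explanations could be produced with SHAP for a tree-based delay classifier; SHAP is one possible XAI choice, not a mandated one, and all names are illustrative.

```python
# Minimal sketch of the explainability step on synthetic stand-in data.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the confidential flight features (illustrative only),
# e.g. wind speed, traffic load, turnaround buffer, slot congestion.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Feature-relevance explanations for individual delay forecasts.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])
print(shap_values.shape)   # one contribution per flight and per feature
```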
Deliverables
D1. XAI Models for the Flight Delay Forecasting (Classification) & Problematic Flight Forecast Index – end of June 2024
D2. XAI Models for the Flight Delay Forecasting (Regression) & XAI Models Updates for the Flight Delay Forecasting (Classification) & Problematic Flight Forecast Index – end of July 2024
Summer 2023
As part of its risk assessment, the bank is required to predict the returns from the sales of its real estate collaterals. One of the parameters that determines those returns is the ‘recovery rate’, i.e., the % of the property market value that the Bank will recover by selling the property.
Objective: develop a predictive model for the recovery rate parameter of collaterals, using the Bank’s most recent internal database.
Data: contains information on collateral sales transactions (e.g., open market value, sale price, sale date) and property characteristics (e.g., location, property type, size), for historic collaterals onboarded from 2016 onwards. Publicly available data from CY Statistical Service related to property price indexes can be used to supplement the analysis.
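A minimal sketch of the target construction and a baseline model, assuming the internal table contains the columns mentioned above (names are illustrative):

```python
# Minimal sketch: compute the recovery rate and fit a baseline regressor.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

sales = pd.read_csv("collateral_sales.csv")       # hypothetical export

# Recovery rate = realized sale price as a fraction of open market value.
sales["recovery_rate"] = sales["sale_price"] / sales["open_market_value"]

X = pd.get_dummies(sales[["location", "property_type", "size_sqm"]])
model = RandomForestRegressor()
print(cross_val_score(model, X, sales["recovery_rate"], cv=5, scoring="r2").mean())
```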
Deliverables:
Short description: The project aims to develop and evaluate a solution to automate the processes of model validation and to identify variables that can challenge and enhance the performance of the Bank’s behavioural credit models.
Objective: The objective of the project is to improve the efficiency and effectiveness of the model validation function in the Bank, and to ensure compliance with validation unit’s internal procedures and methods.
Data: The project will use the following data sources:
Deliverables: The expected outcomes and deliverables of the project are:
Project Title: Cancer Prevention AI tools
Project Description: The Cancer Prevention AI tools project is a cross-functional capstone project that aims to develop a mobile application that monitors people’s daily living and suggests new food, activity, and lifestyle routines to reduce their risk of cancer. The application will be built by a team of two or three students with expertise in data analytics (visual, textual, multisensory signals analysis) and one in business development (optional).
The data analytics student(s) will be responsible for developing AI models that will analyse user data, including daily food intake, physical activity, mood estimation and lifestyle habits, to provide personalized recommendations for reducing the risk of cancer. The AI models will use machine learning algorithms to identify patterns and correlations in the data and suggest changes to the user’s routine accordingly. The Data Analytics student(s) should have expertise in programming languages such as Python, and be familiar with relevant libraries and tools such as NumPy, Pandas, Scikit-learn, TensorFlow, Keras, and Matplotlib. They should also have experience in machine learning algorithms, deep learning models, and data visualization techniques.
Overall, the Cancer Prevention AI tools project will provide a valuable tool for people to monitor their daily habits and receive personalized recommendations for reducing their risk of cancer. The project will also provide valuable experience for the student team members in data analytics and business development, as well as a tangible product to showcase their skills to potential employers (optional).
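Purely as an illustration of turning daily-habit logs into a recommendation, the sketch below computes a rule-of-thumb habit score; the features, weights, and logic are assumptions and would be replaced by the models developed in the project.

```python
# Illustrative-only sketch: weight daily habits and recommend a focus area.
import pandas as pd

daily_log = pd.DataFrame({
    "processed_meat_servings": [2, 0, 1],
    "active_minutes":          [10, 45, 30],
    "alcohol_units":           [3, 0, 1],
})

# Hypothetical weights: a higher contribution means more room for improvement.
weights = {"processed_meat_servings": 1.5, "active_minutes": -0.05, "alcohol_units": 1.0}
daily_log["habit_score"] = sum(daily_log[c] * w for c, w in weights.items())

# Recommend changing the habit with the largest average weighted contribution.
contributions = {c: (daily_log[c] * w).mean() for c, w in weights.items()}
print("Focus area:", max(contributions, key=contributions.get))
```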
Deliverables:
AI Models: The project should deliver appropriate AI models to analyse user data, including daily food intake, physical activity, and lifestyle habits, to provide personalized recommendations for reducing the risk of cancer. The model should use machine learning algorithms to identify patterns and correlations in the data and suggest changes to the user’s routine accordingly.
Business Plan: The project should include a business plan that outlines the target market for the AI tools, the competitive landscape, and the marketing and sales strategy. The plan should also include financial projections and revenue models.
Technical Documentation: The project should include technical documentation that describes the AI tools design, and functionality, as well as instructions for installing and running the AI tools.
Presentation: The project team should prepare a final presentation that showcases the AI models, and the business plan. The presentation should include a demonstration of the AI tools’ functionality, and the business plan’s revenue projections.
Project title: Use of ML Techniques to Enable Intrusion Detection in IoT Networks
Background:
Networks are vulnerable to costly attacks. Thus, the ability to detect these intrusions early on and minimize their impact is imperative to the financial security and reputation of an institution. There are two mainstream systems of intrusion detection (IDS), signature-based and anomaly-based IDS. Signature-based IDS identify intrusions by referencing a database of known identity, or signature, for each of the previous intrusion events. Anomaly-based IDS attempt to identify intrusions by referencing a baseline or learned patterns of normal behavior. Under this approach, deviations from the baseline are considered intrusions.
Project aims:
This MSc project will utilize existing work on Lightweight Intrusion Detection for Wireless Sensor Networks (using BLR, SVM, SOM, Isol. Trees) and extend it, considering the characteristics of IoT networks, new attacks, new topologies, and especially new classification algorithms.
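A minimal sketch of an anomaly-based IDS baseline in this spirit, assuming per-node traffic features extracted from a public dataset; the file and column names are illustrative.

```python
# Minimal sketch: fit anomaly detectors on (assumed) benign traffic only and
# treat deviations as intrusions.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

traffic = pd.read_csv("iot_traffic_features.csv")   # hypothetical feature table
X = traffic[["packets_per_s", "mean_packet_size", "dest_entropy", "retransmissions"]]

benign = X[traffic["label"] == "benign"]
for name, detector in [("IsolationForest", IsolationForest(random_state=0)),
                       ("One-Class SVM", OneClassSVM(nu=0.05))]:
    detector.fit(benign)
    flagged = (detector.predict(X) == -1).sum()
    print(name, "flagged", flagged, "of", len(X), "traffic windows")
```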
Reading / Datasets
Works by V. Vassiliou and C. Ioannou (University of Cyprus). Find suitable datasets from https://www.kaggle.com/
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Project title: Predicting YouTube Trending Video
Background:
A new trend in 5G and 6G networks is using the network edge (base station) for caching and processing, and for supporting the quality of service and the experience of end users consuming visual content and interactive media.
Project aims:
This project will provide the ability to group and predict users’ and/or content’s needs. One way is to predict which videos are trending on social media.
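As a first step, the sketch below groups videos from the linked Kaggle dataset by engagement profile, assuming its standard columns (views, likes, dislikes, comment_count); the choice of clustering as a starting point is an assumption.

```python
# Minimal sketch: cluster videos by (standardized) engagement metrics.
import pandas as pd
from sklearn.cluster import KMeans

videos = pd.read_csv("USvideos.csv")               # file name from the Kaggle dataset
X = videos[["views", "likes", "dislikes", "comment_count"]].apply(
    lambda s: (s - s.mean()) / s.std())

# Group videos by engagement profile as a first step towards trend prediction.
videos["cluster"] = KMeans(n_clusters=5, n_init=10).fit_predict(X)
print(videos.groupby("cluster")["views"].median())
```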
Readings / Datasets
YouTube Video Info: https://www.kaggle.com/datasets/datasnaek/youtube-new
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Project title: Predicting Motor-vehicle Accidents
Background:
Motor vehicle crashes cause loss of life, property and finances. Vehicle accidents are a focus of traffic safety research, uncovering useful information that can be directly applied to reduce these losses. Traditionally, modeling crash events has been done using machine learning techniques, considering crash-level variables such as roadway characteristics, lighting conditions, weather conditions and the prevalence of drugs or alcohol.
Project aims:
Incorporate crash report data, road data, and demographic data to better understand crash locations and the associated cost. Use a mixed linear modeling technique that enables data fusion in a principled way to build a better predictive model. Analyze the natural clustering of events in space by different geographic levels.
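A minimal sketch of the mixed-model idea, assuming crash records have been geocoded to districts; the column names are illustrative.

```python
# Minimal sketch: fixed effects for crash-level variables, random intercept per
# district, so that spatial clustering by geographic level is modelled explicitly.
import pandas as pd
import statsmodels.formula.api as smf

crashes = pd.read_csv("crash_reports.csv")         # hypothetical merged dataset

model = smf.mixedlm(
    "cost ~ lighting + weather + alcohol_involved + speed_limit",
    data=crashes,
    groups=crashes["district"],
)
result = model.fit()
print(result.summary())
```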
Readings / Datasets
Need to get relevant information from the Cyprus Police and the Association of Insurance Companies.
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Project title: Preventive to Predictive Maintenance
Background:
Maintenance is an integral component of operating manufacturing equipment. Preventive maintenance occurs on the same schedule every cycle, whether or not the upkeep is actually needed. Preventive maintenance is designed to keep parts in good repair but does not take the state of a component or process into account. Predictive maintenance occurs as needed, drawing on real-time collection and analysis of machine operation data to identify issues at the nascent stage before they can interrupt production. With predictive maintenance, repairs happen during machine operation and address an actual problem. If a shutdown is required, it will be shorter and more targeted. While the planned downtime in preventive maintenance may be inconvenient and represents a decrease in overall capacity availability, it is highly preferable to the unplanned downtime of reactive maintenance, where costs and duration may be unknown until the problem is diagnosed and addressed. Preventive to Predictive Maintenance is about the transition from a preventive maintenance strategy to a predictive maintenance strategy for a replaceable part.
Project aims:
The objective of the project is to use the associated detailed dataset to precisely predict the remaining useful life (RUL) of the element in question, so that a transition to predictive maintenance is made possible.
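A minimal sketch of an RUL baseline, assuming the dataset can be reshaped into one row per inspection with sensor readings and a known remaining-useful-life target; the file and column names are illustrative.

```python
# Minimal sketch: regress remaining useful life on sensor readings.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

readings = pd.read_csv("part_sensor_readings.csv")   # hypothetical reshaped file
features = ["vibration_rms", "temperature_c", "operating_hours", "load_pct"]

X_train, X_test, y_train, y_test = train_test_split(
    readings[features], readings["rul_hours"], test_size=0.2)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE (hours):", mean_absolute_error(y_test, model.predict(X_test)))
```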
Readings / Datasets |
https://www.kaggle.com/datasets/prognosticshse/preventive-to-predicitve-maintenance |
Contact Details
|
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus vasosv@cyens.org.cy |
Project title: Analysis and Visualization of Mobile Cellular Telephony Network Coverage and Electromagnetic Measurements
Background:
An interesting challenge in 5G and 6G Mobile Cellular Telephony is the need to have a large(r) number of small(er) base stations to achieve the data rates, delays and number of users expected. The Mobile Network Operators report to the Telecommunication Regulators of each country a number of parameters, including coverage, location of stations and radiation levels.
Project aims:
The objective of the project is to collate information available in different systems/platforms and generate an up-to-date map of coverage and mobile telephony stations, alongside the information gathered through periodic and ad-hoc measurements. These will be related to measurements of mobile telephony quality of service and shown on a common reference map, with the ability to explore the changes over the years.
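A minimal sketch of the mapping step, assuming station locations and measurement values have already been collated into one table; folium is one possible tool, and the file and column names are illustrative.

```python
# Minimal sketch: plot stations and EMF measurements on an interactive map.
import pandas as pd
import folium

stations = pd.read_csv("stations_and_measurements.csv")   # hypothetical merged table

m = folium.Map(location=[35.1, 33.4], zoom_start=9)       # centred on Cyprus
for _, row in stations.iterrows():
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=5,
        popup=f'{row["operator"]}: {row["emf_level"]} V/m ({row["year"]})',
        color="red" if row["emf_level"] > row["limit"] else "green",
    ).add_to(m)
m.save("coverage_map.html")
```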
Readings / Datasets
Open data from the Department of Electronic Communications. Information from the ICT market observatory of OCECPR and other public data.
http://www.emf.mcw.gov.cy/emf/?page=emfmeasurements
https://www.data.gov.cy/search/field_topic/%CE%B5%CF%80%CE%B9%CF%83%CF%84%CE%AE%CE%BC%CE%B7-%CE%BA%CE%B1%CE%B9-%CF%84%CE%B5%CF%87%CE%BD%CE%BF%CE%BB%CE%BF%CE%B3%CE%AF%CE%B1-40
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Title: Measuring consumer price inflation in Cyprus
We calculate Cyprus inflation in terms of the Consumer Price Index (CPI), which is compiled using the online prices (big data) for a pre-selected/fixed and representative basket of goods and services. In doing so, we employ ‘web scraping’ algorithms to visit the websites/pages of large retailers in Cyprus and collect and store the prices of goods and services available online. We then implement standard techniques and proprietary methodologies to calculate price statistics and indices. This work is inspired by the Billion Prices Project, an academic initiative at MIT and Harvard that used prices collected from hundreds of large online retailers around the world on a daily basis to conduct research in macro and international economics.
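As an illustration of the index-calculation step on already-scraped prices, the sketch below computes an unweighted Jevons (geometric-mean) elementary index between two dates; the actual methodology used here is proprietary, and the file layout is an assumption.

```python
# Minimal sketch: Jevons elementary index from matched scraped prices.
import numpy as np
import pandas as pd

prices = pd.read_csv("scraped_prices.csv")    # hypothetical: date, product_id, price

base = prices[prices["date"] == "2024-01-01"].set_index("product_id")["price"]
current = prices[prices["date"] == "2024-02-01"].set_index("product_id")["price"]

common = base.index.intersection(current.index)           # matched products only
relatives = current.loc[common] / base.loc[common]
jevons_index = 100 * np.exp(np.log(relatives).mean())
print(f"Price index (base=100): {jevons_index:.1f}")
```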
Title: Studying the crude oil price pass-through into fuel and/or consumer prices in Cyprus
We leverage a high-frequency (weekly) online price series dataset in an econometric framework for pass-through estimation, forecasting and policy making. We aim to quantify how (to what extent and how quickly) fuel and/or consumer prices respond to an oil price shock, and to provide policy makers with important implications and useful information regarding consumer welfare.
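One standard way to quantify this is a distributed-lag regression of fuel price changes on current and lagged oil price changes, sketched below under the assumption that the weekly series have already been aligned; all names are illustrative.

```python
# Minimal sketch: distributed-lag pass-through regression on weekly data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("weekly_prices.csv", parse_dates=["week"])   # hypothetical file
df["d_fuel"] = df["fuel_price"].diff()
for lag in range(5):                                          # 0..4 week lags
    df[f"d_oil_l{lag}"] = df["oil_price"].diff().shift(lag)

model = smf.ols("d_fuel ~ " + " + ".join(f"d_oil_l{l}" for l in range(5)),
                data=df.dropna()).fit()

# The cumulative pass-through after four weeks is the sum of the lag coefficients.
print(model.params.filter(like="d_oil").sum())
```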
Title: Create an end-to-end MLops pipeline for a specific model (“anti-scraping model”)
Duration: Hard deadline the end of August
Team: 3 Data Science MSc Students with backgrounds: 2 in CS, 1 in Statistics, plus one supervisor on the UCY side. The supervisor can offer weekly one-hour coaching and review sessions to the students.
Project Description:
The scope of this project is to create an end-to-end MLops pipeline for a specific model (detecting competitors trying to scrape/crawl prices from our website). The students will work under the strong guidance of our Data Science team to design, implement, and deploy an ML model to production.
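A minimal sketch of the train-and-serve portion of such a pipeline, assuming request-level features are extracted upstream; FastAPI and joblib are one possible stack, not a project requirement, and all names are illustrative.

```python
# Minimal sketch: train a scraper-detection model, persist it, and serve it.
import joblib
import pandas as pd
from fastapi import FastAPI
from sklearn.ensemble import RandomForestClassifier

# --- training step (would normally run in CI or a scheduled job) ---
data = pd.read_csv("labelled_requests.csv")   # hypothetical: request features + is_scraper
features = ["requests_per_min", "distinct_products_viewed", "has_headless_ua"]
model = RandomForestClassifier().fit(data[features], data["is_scraper"])
joblib.dump(model, "scraper_model.joblib")

# --- serving step ---
app = FastAPI()
model = joblib.load("scraper_model.joblib")

@app.post("/score")
def score(requests_per_min: float, distinct_products_viewed: int, has_headless_ua: int):
    X = [[requests_per_min, distinct_products_viewed, has_headless_ua]]
    return {"scraper_probability": float(model.predict_proba(X)[0, 1])}
```

In a full pipeline this would be wrapped with versioning, monitoring, and retraining triggers, which is where most of the MLops effort lies.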
Resources:
Our company will provide
Desired Deliverables:
Title: Loyalty Scheme – Client Churn Prediction
Duration: 2 months (June 2023 – July 2023)
Team Members:
Data Scientist with a background in Computer Science: Responsible for data preprocessing, feature engineering, and model development.
Statistician: Responsible for statistical analysis, model evaluation, and validation.
Business Analyst: Responsible for understanding the business requirements, interpreting the results, and providing insights for decision-making.
Supervisor: A subject matter expert who will provide guidance and support throughout the project.
Project Description:
This project aims to predict the churn of COMPANY’s loyalty customers by analyzing customer data and building predictive models. The team will work closely with COMPANY’s internal stakeholders to gather relevant data related to customer behavior, transactions, and engagement metrics. The collected data will be processed, cleaned, and transformed to create meaningful features. The team will then explore various machine learning algorithms to develop predictive models that can identify potential churners accurately.
The project will involve conducting an in-depth exploratory data analysis to uncover insights and patterns within the data. The team will perform feature engineering to derive additional relevant features from the existing dataset. These features will be used to train and evaluate predictive models using techniques such as logistic regression, decision trees, random forests, or neural networks. The models will be assessed based on metrics like accuracy, precision, and F1-score.
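A minimal sketch of that model-comparison step, assuming the engineered features and churn label are available in one table; the file and column names are illustrative.

```python
# Minimal sketch: compare two baseline classifiers on the stated metrics.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, f1_score

df = pd.read_csv("loyalty_features.csv")          # hypothetical export
X = df.drop(columns=["customer_id", "churned"])
X_train, X_test, y_train, y_test = train_test_split(
    X, df["churned"], test_size=0.2, stratify=df["churned"])

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("random forest", RandomForestClassifier())]:
    pred = clf.fit(X_train, y_train).predict(X_test)
    print(name,
          "accuracy", round(accuracy_score(y_test, pred), 3),
          "precision", round(precision_score(y_test, pred), 3),
          "F1", round(f1_score(y_test, pred), 3))
```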
Resources:
Our company will provide:
A designated contact person who will offer guidance, support, and domain expertise.
Access to relevant customer data, subject to a non-disclosure agreement (NDA).
Desired Deliverables:
Churn prediction models and evaluation report: The report should detail the methodology, model performance metrics, and provide insights into key features driving churn.
Final Project Report: A comprehensive document summarizing the entire project, including data preprocessing, model development, insights, and recommendations for COMPANY.
Presentation: A final presentation summarizing the project, highlighting the key findings, and providing actionable recommendations for COMPANY to reduce churn among loyalty customers.
Title: Quality control, cleaning, and analyses of wearable data in the context of clinical trials.
Wearables have enabled the non-intrusive monitoring of subjects in medical research studies during their normal daily lives. At Stremble we have already expanded our in-house analytics platform to automatically collect data from several of our clinical trials and studies in flat JSON files. However, cleaning, organizing, and analyzing these data is a challenge. Therefore, standardizing the quality control, storage, and access to the data would benefit many ongoing cancer research projects.
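A minimal sketch of the loading and quality-control steps, assuming one flat JSON file per subject with timestamped readings; the field names and thresholds are assumptions.

```python
# Minimal sketch: load flat JSON exports, apply basic QC, and resample.
import json
import pathlib
import pandas as pd

records = []
for path in pathlib.Path("wearable_exports").glob("*.json"):
    with open(path) as f:
        payload = json.load(f)
    for reading in payload["readings"]:
        records.append({"subject": payload["subject_id"], **reading})

df = pd.DataFrame(records)
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Basic quality control: drop physiologically implausible heart rates, then
# resample to a regular 1-minute grid per subject.
df = df[(df["heart_rate"] > 25) & (df["heart_rate"] < 230)]
clean = (df.set_index("timestamp")
           .groupby("subject")["heart_rate"]
           .resample("1min").mean())
print(clean.head())
```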
Project Aims:
Skills:
Students in Computer Science, Mathematics or Statistics would be eligible for this position.
Programming/scripting knowledge, preferably Python
R, and ideally Shiny, for the analyses.
Who we are:
Stremble Ventures is a company based in Limassol, Cyprus established in 2011. We offer contract research to companies and research institutions around the world with a focus on Bioinformatics and Computational Biology. We have several ongoing EU, RIF, and commercially funded projects.
Tickmill is a retail FX broker operating globally and employing a dedicated team of over 250 professionals. With an average monthly trading volume surpassing $150 billion, Tickmill stands as a major player in the financial industry. One of its key assets is its own quantitative research team that has developed proprietary trading systems. These systems leverage extensive tabular data stored across diverse databases. The team’s primary objective is to make informed investment decisions by analyzing unconventional data. For instance, is it possible to predict Starbucks’ stock price if we know the number of people visiting their stores? If so, would combining this information with the average amount spent by customers in Starbucks stores yield improved results?
During the capstone project, students who join Tickmill’s Quantitative Research team will actively participate in tasks such as data mining, data storage utilizing various database types based on their specific requirements, and, notably, data analysis. In the financial industry, data analysis poses significant challenges for two primary reasons. Firstly, unlike voice, image, or video data, financial data cannot be generated or created. Secondly, the signal-to-noise ratio is typically low, which significantly contributes to the complexity of this particular task in machine learning and data science.
In this project, participating students will undergo a two-week training program focused on the financial industry and trading. Prior knowledge in this field is not required. Following the training, students will delve into various types of data and explore the corresponding data analysis within that domain. They will have the flexibility to use their preferred data analysis tools (although we utilize Jupyter notebook, they are free to choose the tool they are most comfortable with) to conduct their analyses. By the conclusion of the project, students will be equipped with the ability to determine which types of data are valuable for predicting the price change of financial assets and which data contain excessive noise. This understanding encompasses the concepts of correlation and causation, although it is not limited to these factors alone. In addition to this phase, which we refer to as the first stage of data analysis, students will have the opportunity to create their own features and explore the freedom to combine different types of data in order to derive meaningful insights and conclusions.
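A purely illustrative sketch of combining two such signals and checking whether the combination is more informative than either alone; all names and files are hypothetical, since the real vendor data are confidential.

```python
# Minimal sketch: compare individual signals against a combined signal.
import pandas as pd

# Hypothetical weekly table: store_visits, avg_ticket, next_week_return.
df = pd.read_csv("weekly_signals.csv")

df["implied_revenue"] = df["store_visits"] * df["avg_ticket"]

for signal in ["store_visits", "avg_ticket", "implied_revenue"]:
    corr = df[signal].corr(df["next_week_return"])
    print(f"{signal:>16}: correlation with next-week return = {corr:+.2f}")

# A stronger correlation for the combined signal would motivate a proper
# backtest; correlation alone does not establish causation or tradability.
```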
Due to confidentiality agreements, students will not be allowed to disclose data vendors’ names in their capstone reports.