Capstone Projects
The capstone project has been designed to put knowledge into practice and to develop and improve critical skills such as problem-solving and collaboration. Students are matched with research labs within the UCY community and with industry partners to investigate pressing issues by applying data science methods. Capstone projects aim to give students professional experience in a real work environment and help enhance their soft skills. These projects involve groups of roughly 3-4 students working in partnership.
The process is the following:
Final assessment is carried out by the company and the supervisor.
Summer 2024
Internal Fraud Detection System
Description
The goal of this project is to develop an Internal Fraud Detection System. The system aims to identify and flag suspicious activities suggestive of internal fraud by detecting abnormal patterns in employee behaviour and actions. Using unsupervised learning techniques, students will develop models to identify anomalies. This project includes data collection and analysis, model development and model serving.
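As an illustration of the unsupervised approach, the sketch below fits an Isolation Forest to a few hypothetical employee-activity features and flags the outliers; the feature names and contamination level are assumptions, not part of the project brief.

```python
# Minimal sketch of the anomaly-detection step, assuming employee activity
# has already been aggregated into numeric features (names are illustrative).
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Hypothetical per-employee daily activity features.
activity = pd.DataFrame({
    "logins_after_hours": [0, 1, 0, 12, 2],
    "records_accessed":   [40, 55, 38, 900, 60],
    "failed_approvals":   [0, 0, 1, 7, 0],
})

X = StandardScaler().fit_transform(activity)

# Unsupervised model: flags observations that deviate from the bulk of the data.
model = IsolationForest(contamination=0.05, random_state=0)
activity["anomaly"] = model.fit_predict(X)   # -1 = flagged as anomalous
print(activity[activity["anomaly"] == -1])
```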
Milestones
Possible set of data sources
Curating the Emotional Journey: A Museum Music Recommendation System with Emotion Recognition
Project Goal:
This capstone project challenges you to design and develop a data science application that personalizes the museum visitor experience through music. The application will leverage emotion recognition and stress-level estimation to analyze the visitor's emotional state from video footage and heart-rate signals, and to generate background music that fits both their visiting experience and the thematic era of the exhibit.
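Purely as an illustrative starting point, the sketch below maps an already-detected emotion label and a heart-rate reading to a playlist for the current exhibit era; the emotion labels, thresholds, and playlist names are all assumptions.

```python
# Minimal sketch of the recommendation step, assuming an upstream emotion
# recognizer yields an emotion label per visitor and a heart-rate stream is
# available; playlist names and thresholds are illustrative.

PLAYLISTS = {
    ("calm", "low"):      "ambient_renaissance",
    ("calm", "high"):     "uplifting_baroque",
    ("stressed", "low"):  "soothing_impressionist",
    ("stressed", "high"): "slow_tempo_medieval",
}

def stress_level(heart_rate_bpm: float, resting_bpm: float = 70.0) -> str:
    """Very rough proxy: heart rate well above rest => 'high' stress."""
    return "high" if heart_rate_bpm > resting_bpm * 1.2 else "low"

def recommend(emotion: str, heart_rate_bpm: float, exhibit_era: str) -> str:
    state = "stressed" if emotion in {"fear", "anger", "sadness"} else "calm"
    playlist = PLAYLISTS[(state, stress_level(heart_rate_bpm))]
    return f"{exhibit_era}/{playlist}"

print(recommend("fear", 95, "renaissance"))  # -> renaissance/slow_tempo_medieval
```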
Problem Statement:
Museums strive to create engaging and impactful experiences for visitors. Music plays a significant role in setting the atmosphere and influencing visitor emotions. However, traditional, static background music may not effectively cater to the diverse emotional states and interests of visitors navigating various exhibits. This project aims to bridge this gap by developing a music generation system tailored to each visitor's individual emotional state.
Data Sources:
Technical Stack:
Project Deliverables:
Success Criteria:
A successful project will demonstrate the following:
Description:
As part of its Real Estate offering, Deloitte would like to build an end-to-end real estate asset monitoring and analysis infrastructure through capturing data on a recurring basis from various publicly and non-publicly available sources. These datasets will be subsequently used to populate Deloitte’s internally developed PowerBI tool, which the business is solely going to use for internal purposes.
Objective:
Data:
Deliverables:
Title: Telco Network Failure Prediction
Duration: 2 months
Project Description:
This project aims to leverage advanced data science techniques to anticipate and mitigate network failures in telecommunication systems. Predicting network failures in telecommunication systems is crucial for maintaining service quality, minimizing downtime, and optimizing resource allocation. By analyzing historical data such as network performance metrics, environmental conditions, equipment status, and maintenance logs, the data science team will develop predictive models to anticipate potential failures before they occur. This will allow the Telco provider to proactively address issues, improve network reliability, and enhance customer satisfaction.
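One possible baseline, sketched below, treats this as a supervised classification problem on joined historical records; the file and column names are assumptions for illustration only.

```python
# Minimal sketch of the predictive-modelling step, assuming historical records
# have been joined into one table with a binary "failure" label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("network_history.csv")          # hypothetical export
features = ["latency_ms", "packet_loss_pct", "temperature_c",
            "equipment_age_months", "maintenance_events_90d"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["failure"], test_size=0.2, stratify=df["failure"])

# Failures are typically rare, so the class weighting matters.
model = RandomForestClassifier(n_estimators=300, class_weight="balanced")
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```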
Resources:
Our company will provide:
Desired Deliverables:
Measuring the Impact of Wildfire Risk on Property Market Prices
Objective:
Develop a predictive model to measure the impact of wildfire risk on property market prices.
Data:
The project will utilize a combination of the following datasets:
Deliverables:
Resources:
Expected Outcome:
The project aims to provide a robust predictive model that accurately measures the impact of wildfire risk on property market prices. Results are expected to be presented as an interactive visualization web app (built with Shiny, Flask, etc.). The findings will help users make informed decisions regarding property investments and risk management.
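A minimal sketch of one possible modelling approach is shown below, assuming a merged dataset of property transactions with a wildfire-risk score per property; all file and column names are illustrative.

```python
# Minimal sketch: regression of market price on property features plus a
# wildfire-risk score (hypothetical merged dataset).
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("properties_with_wildfire_risk.csv")   # hypothetical file
X = df[["size_sqm", "rooms", "distance_to_city_km", "wildfire_risk_score"]]
y = df["market_price"]

model = GradientBoostingRegressor()
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Cross-validated R^2:", scores.mean())

# The marginal effect of wildfire risk can then be examined, e.g. with
# partial-dependence plots or by comparing predictions at different risk levels.
```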
Machine Learning for Fraud Detection in Insurance Claims
Background
Insurance claim fraud is a global issue that drains billions of dollars from the industry each year, escalating costs for both insurers and policyholders. In Greece, for instance, fraudulent claims tally up to €200 million annually, accounting for about 10% of all claim payouts. These challenges are not unique to Greece but are mirrored worldwide, necessitating a unified and innovative approach across the sector to tackle this pervasive problem.
Insurance claim fraud is defined as any deliberate and misleading act or omission by any individual or legal entity aimed at gaining an unlawful financial benefit for themselves or facilitating such gain for others within the context of a valid or invalid insurance contract.
The need for advanced data analytics and machine learning in fraud detection is becoming increasingly critical as fraudsters employ more sophisticated methods to circumvent traditional detection systems. Globally, insurers are turning to technology to gain an edge against these tactics, utilising big data and predictive analytics to spot irregularities and inconsistencies in claim submissions. This technological shift represents a move from reactive to proactive fraud management, where potential frauds are flagged and investigated before they result in financial loss.
Moreover, the ongoing digital transformation in the insurance industry highlights the importance of continuous innovation in fraud detection systems. The integration of AI and machine learning technologies enhances fraud detection accuracy and speeds up the process, allowing insurers to handle claims more efficiently while reducing the chances of fraudulent claims slipping through the net.
Engaging with this project offers an opportunity to contribute to significant advancements in this procedure and to apply cutting-edge data science techniques to real-world problems. It is an excellent opportunity for aspiring data scientists to make a tangible difference, helping to shape the future of an industry on which everyone relies.
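As a concrete illustration of such a predictive approach, the sketch below trains a simple supervised classifier on hypothetical claim features; the columns, the availability of labelled claims, and the choice of gradient boosting are assumptions rather than project requirements.

```python
# Minimal sketch of a supervised fraud-detection baseline; class imbalance is
# the usual difficulty in this setting, so precision/recall are reported.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

claims = pd.read_csv("claims.csv")               # hypothetical labelled export
features = ["claim_amount", "days_since_policy_start",
            "previous_claims", "late_reporting_days"]
X_train, X_test, y_train, y_test = train_test_split(
    claims[features], claims["is_fraud"], test_size=0.2,
    stratify=claims["is_fraud"])

clf = GradientBoostingClassifier().fit(X_train, y_train)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, clf.predict(X_test), average="binary")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```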
Objectives
Briefly, the core objectives of this project are defined as:
Methodology
The principal methodology pipeline will consist of:
Resources
All the necessary resources for the completion of the project will be delivered, such as:
Title: Customer Behavior prediction
Project Description:
The project focuses on analyzing customer data of a Telecommunication company, specifically related to their mobile plans. The data contains information related to customers such as monthly payments, mobile usage, contract information and competitive data from other telecom vendors. The objective is to develop a methodology to process the data, extract meaningful insights through analysis and create new features that will be used to build predictive models. The models will be trained to predict customer behavior that leads to switching to a different mobile plan. This project will provide an opportunity to gain practical experience in Data Analysis, Feature Engineering, and Machine Learning, while also gaining valuable insights into solving real-world problems in the telecommunications industry.
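A minimal sketch of the feature-engineering and modelling steps might look as follows, assuming monthly usage records and a label indicating whether the customer later switched plans; all file and column names are illustrative.

```python
# Minimal sketch: derive per-customer features from monthly records, then fit
# a baseline classifier for plan switching (hypothetical data layout).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

usage = pd.read_csv("monthly_usage.csv")          # hypothetical export

# Example engineered features: spend level, spend trend, usage volatility.
feats = usage.groupby("customer_id").agg(
    avg_monthly_payment=("payment", "mean"),
    payment_trend=("payment", lambda s: s.iloc[-1] - s.iloc[0]),
    data_usage_std=("data_gb", "std"),
    months_on_contract=("month", "count"),
).fillna(0).reset_index()

labels = pd.read_csv("plan_switch_labels.csv")    # hypothetical: customer_id, switched
df = feats.merge(labels, on="customer_id")

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["customer_id", "switched"]), df["switched"], test_size=0.2)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```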
Resources:
Our company will provide:
Desired Deliverables:
Health Data Analysis and/or Pricing model development
Project Description:
This dynamic and interactive project will provide students with hands-on experience applying their knowledge in data science to the actuarial field, particularly in data analysis and optimization, as well as Health Insurance Pricing model development. It features a unique collaboration between Milliman’s local and international teams, ETHNIKI, THE HELLENIC GENERAL INSURANCE CO. S.A., one of Greece’s largest insurance companies, and the University of Cyprus, which promotes the project.
To get a flavor of what the project will include, have a look at the tasks below (note that this is not an exhaustive list):
*The intention will be to anonymize all data utilized for this Capstone project.
Resources:
UCY students will have the opportunity to work with real life Insurance market data that will include information regarding historical claims with a wealth of parameters around each claim. Milliman will provide guidance and support throughout the duration of the project and will ensure that materials provided to the students are adequate to perform the tasks.
Desired Deliverables:
Scope
Today, accurate flight delay prediction is among the most critical and challenging problems when scheduling and managing flights for all involved aviation data value chain stakeholders. As part of the capstone project, the flight delay prediction problem will be addressed from the perspective of an airport and for a day-ahead time horizon in terms of:
The flight delay forecasts need to be accompanied by appropriate explanations in order to shed light on the provided forecasts, so Explainable AI (XAI) techniques (e.g., feature relevance and counterfactual explanations) need to be leveraged.
The aviation data that will be used as part of the capstone project are confidential flight data from an airport in Europe.
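Since the real flight data are confidential, the sketch below uses synthetic features to show how feature-relevance explanations could be produced with SHAP for a tree-based delay classifier; SHAP is one possible XAI choice, not a mandated one, and all names are illustrative.

```python
# Minimal sketch of the explainability step on synthetic stand-in data.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the confidential flight features (illustrative only),
# e.g. wind speed, traffic load, turnaround buffer, slot congestion.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Feature-relevance explanations for individual delay forecasts.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])
print(shap_values.shape)   # one contribution per flight and per feature
```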
Deliverables
D1. XAI Models for the Flight Delay Forecasting (Classification) & Problematic Flight Forecast Index – end of June 2024
D2. XAI Models for the Flight Delay Forecasting (Regression) & XAI Models Updates for the Flight Delay Forecasting (Classification) & Problematic Flight Forecast Index – end of July 2024
Summer 2023
As part of its risk assessment, the bank is required to predict the returns from the sales of its real estate collaterals. One of the parameters that determines those returns is the ‘recovery rate’, i.e., the % of the property market value that the Bank will recover by selling the property.
Objective: develop a predictive model for the recovery rate parameter of collaterals, using the Bank’s most recent internal database.
Data: contains information on collateral sales transactions (e.g., open market value, sale price, sale date) and property characteristics (e.g., location, property type, size), for historic collaterals onboarded from 2016 onwards. Publicly available data from CY Statistical Service related to property price indexes can be used to supplement the analysis.
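A minimal sketch of the target construction and a baseline model, assuming the internal table contains the columns mentioned above (names are illustrative):

```python
# Minimal sketch: compute the recovery rate and fit a baseline regressor.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

sales = pd.read_csv("collateral_sales.csv")       # hypothetical export

# Recovery rate = realized sale price as a fraction of open market value.
sales["recovery_rate"] = sales["sale_price"] / sales["open_market_value"]

X = pd.get_dummies(sales[["location", "property_type", "size_sqm"]])
model = RandomForestRegressor()
print(cross_val_score(model, X, sales["recovery_rate"], cv=5, scoring="r2").mean())
```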
Deliverables:
Short description: The project aims to develop and evaluate a solution to automate the processes of model validation and to identify variables that can challenge and enhance the performance of the Bank’s behavioural credit models.
Objective: The objective of the project is to improve the efficiency and effectiveness of the model validation function in the Bank, and to ensure compliance with validation unit’s internal procedures and methods.
Data: The project will use the following data sources:
Deliverables: The expected outcomes and deliverables of the project are:
Project Title: Cancer Prevention AI tools
Project Description: The Cancer Prevention AI tools project is a cross-functional capstone project that aims to develop a mobile application that monitors people’s daily living and suggests new food, activity, and lifestyle routines to reduce their risk of cancer. The application will be built by a team of two or three students with expertise in data analytics (visual, textual, multisensory signals analysis) and one in business development (optional).
The data analytics student(s) will be responsible for developing AI models that will analyse user data, including daily food intake, physical activity, mood estimation and lifestyle habits, to provide personalized recommendations for reducing the risk of cancer. The AI models will use machine learning algorithms to identify patterns and correlations in the data and suggest changes to the user’s routine accordingly. The Data Analytics student(s) should have expertise in programming languages such as Python, and be familiar with relevant libraries and tools such as NumPy, Pandas, Scikit-learn, TensorFlow, Keras, and Matplotlib. They should also have experience in machine learning algorithms, deep learning models, and data visualization techniques.
Overall, the Cancer Prevention AI tools project will provide a valuable tool for people to monitor their daily habits and receive personalized recommendations for reducing their risk of cancer. The project will also provide valuable experience for the student team members in data analytics and business development, as well as a tangible product to showcase their skills to potential employers (optional).
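Purely as an illustration of turning daily-habit logs into a recommendation, the sketch below computes a rule-of-thumb habit score; the features, weights, and logic are assumptions and would be replaced by the models developed in the project.

```python
# Illustrative-only sketch: weight daily habits and recommend a focus area.
import pandas as pd

daily_log = pd.DataFrame({
    "processed_meat_servings": [2, 0, 1],
    "active_minutes":          [10, 45, 30],
    "alcohol_units":           [3, 0, 1],
})

# Hypothetical weights: a higher contribution means more room for improvement.
weights = {"processed_meat_servings": 1.5, "active_minutes": -0.05, "alcohol_units": 1.0}
daily_log["habit_score"] = sum(daily_log[c] * w for c, w in weights.items())

# Recommend changing the habit with the largest average weighted contribution.
contributions = {c: (daily_log[c] * w).mean() for c, w in weights.items()}
print("Focus area:", max(contributions, key=contributions.get))
```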
Deliverables:
AI Models: The project should deliver appropriate AI models to analyse user data, including daily food intake, physical activity, and lifestyle habits, to provide personalized recommendations for reducing the risk of cancer. The model should use machine learning algorithms to identify patterns and correlations in the data and suggest changes to the user’s routine accordingly.
Business Plan: The project should include a business plan that outlines the target market for the AI tools, the competitive landscape, and the marketing and sales strategy. The plan should also include financial projections and revenue models.
Technical Documentation: The project should include technical documentation that describes the AI tools design, and functionality, as well as instructions for installing and running the AI tools.
Presentation: The project team should prepare a final presentation that showcases the AI models, and the business plan. The presentation should include a demonstration of the AI tools’ functionality, and the business plan’s revenue projections.
Project title: Use of ML Techniques to Enable Intrusion Detection in IoT Networks
Background:
Networks are vulnerable to costly attacks. Thus, the ability to detect these intrusions early on and minimize their impact is imperative to the financial security and reputation of an institution. There are two mainstream systems of intrusion detection (IDS), signature-based and anomaly-based IDS. Signature-based IDS identify intrusions by referencing a database of known identity, or signature, for each of the previous intrusion events. Anomaly-based IDS attempt to identify intrusions by referencing a baseline or learned patterns of normal behavior. Under this approach, deviations from the baseline are considered intrusions.
Project aims:
This MSc project will utilize existing work on Lightweight Intrusion Detection for Wireless Sensor Networks (using BLR, SVM, SOM, Isol. Trees) and extend it, considering the characteristics of IoT networks, new attacks, new topologies, and especially new classification algorithms.
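A minimal sketch of an anomaly-based IDS baseline in this spirit, assuming per-node traffic features extracted from a public dataset; the file and column names are illustrative.

```python
# Minimal sketch: fit anomaly detectors on (assumed) benign traffic only and
# treat deviations as intrusions.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

traffic = pd.read_csv("iot_traffic_features.csv")   # hypothetical feature table
X = traffic[["packets_per_s", "mean_packet_size", "dest_entropy", "retransmissions"]]

benign = X[traffic["label"] == "benign"]
for name, detector in [("IsolationForest", IsolationForest(random_state=0)),
                       ("One-Class SVM", OneClassSVM(nu=0.05))]:
    detector.fit(benign)
    flagged = (detector.predict(X) == -1).sum()
    print(name, "flagged", flagged, "of", len(X), "traffic windows")
```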
Reading / Datasets
Works by V. Vassiliou and C. Ioannou (University of Cyprus). Find suitable datasets from https://www.kaggle.com/
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Project title: Predicting YouTube Trending Video
Background:
A new trend in 5G and 6G networks is using the network edge (base station) for caching and processing, and for supporting the quality of service and the experience of end users consuming visual content and interactive media.
Project aims:
This project will provide the ability to group and predict users’ and/or content’s needs. One way is to predict which videos are trending on social media.
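As a first step, the sketch below groups videos from the linked Kaggle dataset by engagement profile, assuming its standard columns (views, likes, dislikes, comment_count); the choice of clustering as a starting point is an assumption.

```python
# Minimal sketch: cluster videos by (standardized) engagement metrics.
import pandas as pd
from sklearn.cluster import KMeans

videos = pd.read_csv("USvideos.csv")               # file name from the Kaggle dataset
X = videos[["views", "likes", "dislikes", "comment_count"]].apply(
    lambda s: (s - s.mean()) / s.std())

# Group videos by engagement profile as a first step towards trend prediction.
videos["cluster"] = KMeans(n_clusters=5, n_init=10).fit_predict(X)
print(videos.groupby("cluster")["views"].median())
```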
Readings / Datasets
YouTube Video Info: https://www.kaggle.com/datasets/datasnaek/youtube-new
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Project title: Predicting Motor-vehicle Accidents
Background:
Motor vehicle crashes cause loss of life, property and finances. Vehicle accidents are a focus of traffic safety research, uncovering useful information that can be directly applied to reduce these losses. Traditionally, modeling crash events has been done using machine learning techniques, considering crash-level variables such as roadway characteristics, lighting conditions, weather conditions and the prevalence of drugs or alcohol.
Project aims:
Incorporate crash report data, road data, and demographic data to better understand crash locations and the associated cost. Use a mixed linear modeling technique that enables data fusion in a principled way to build a better predictive model. Analyze the natural clustering of events in space by different geographic levels.
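A minimal sketch of the mixed-model idea, assuming crash records have been geocoded to districts; the column names are illustrative.

```python
# Minimal sketch: fixed effects for crash-level variables, random intercept per
# district, so that spatial clustering by geographic level is modelled explicitly.
import pandas as pd
import statsmodels.formula.api as smf

crashes = pd.read_csv("crash_reports.csv")         # hypothetical merged dataset

model = smf.mixedlm(
    "cost ~ lighting + weather + alcohol_involved + speed_limit",
    data=crashes,
    groups=crashes["district"],
)
result = model.fit()
print(result.summary())
```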
Readings / Datasets
Need to get relevant information from the Cyprus Police and the Association of Insurance Companies.
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Project title: Preventive to Predictive Maintenance
Background:
Maintenance is an integral component of operating manufacturing equipment. Preventive maintenance occurs on the same schedule every cycle, whether or not the upkeep is actually needed. Preventive maintenance is designed to keep parts in good repair but does not take the state of a component or process into account. Predictive maintenance occurs as needed, drawing on real-time collection and analysis of machine operation data to identify issues at the nascent stage before they can interrupt production. With predictive maintenance, repairs happen during machine operation and address an actual problem. If a shutdown is required, it will be shorter and more targeted. While the planned downtime in preventive maintenance may be inconvenient and represents a decrease in overall capacity availability, it is highly preferable to the unplanned downtime of reactive maintenance, where costs and duration may be unknown until the problem is diagnosed and addressed. Preventive to Predictive Maintenance is about the transition from a preventive maintenance strategy to a predictive maintenance strategy for a replaceable part.
Project aims:
The objective of the project is to use the associated detailed dataset to precisely predict the remaining useful life (RUL) of the element in question, so that a transition to predictive maintenance is made possible.
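A minimal sketch of an RUL baseline, assuming the dataset can be reshaped into one row per inspection with sensor readings and a known remaining-useful-life target; the file and column names are illustrative.

```python
# Minimal sketch: regress remaining useful life on sensor readings.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

readings = pd.read_csv("part_sensor_readings.csv")   # hypothetical reshaped file
features = ["vibration_rms", "temperature_c", "operating_hours", "load_pct"]

X_train, X_test, y_train, y_test = train_test_split(
    readings[features], readings["rul_hours"], test_size=0.2)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE (hours):", mean_absolute_error(y_test, model.predict(X_test)))
```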
Readings / Datasets |
https://www.kaggle.com/datasets/prognosticshse/preventive-to-predicitve-maintenance |
Contact Details
|
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus vasosv@cyens.org.cy |
Project title: Analysis and Visualization of Mobile Cellular Telephony Network Coverage and Electromagnetic Measurements
Background:
An interesting challenge in 5G and 6G Mobile Cellular Telephony is the need to have a large(r) number of small(er) base stations to achieve the data rates, delays and number of users expected. The Mobile Network Operators report to the Telecommunication Regulators of each country a number of parameters, including coverage, location of stations and radiation levels.
Project aims:
The objective of the project is to collate information available in different systems/platforms and generate an up-to-date map of coverage and mobile telephony stations, alongside the information gathered through periodic and ad-hoc measurements. These will be related to measurements of mobile telephony quality of service and shown on a common reference map, with the ability to explore the changes over the years.
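A minimal sketch of the mapping step, assuming station locations and measurement values have already been collated into one table; folium is one possible tool, and the file and column names are illustrative.

```python
# Minimal sketch: plot stations and EMF measurements on an interactive map.
import pandas as pd
import folium

stations = pd.read_csv("stations_and_measurements.csv")   # hypothetical merged table

m = folium.Map(location=[35.1, 33.4], zoom_start=9)       # centred on Cyprus
for _, row in stations.iterrows():
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=5,
        popup=f'{row["operator"]}: {row["emf_level"]} V/m ({row["year"]})',
        color="red" if row["emf_level"] > row["limit"] else "green",
    ).add_to(m)
m.save("coverage_map.html")
```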
Readings / Datasets
Open data from the Department of Electronic Communications. Information from the ICT market observatory of OCECPR and other public data.
http://www.emf.mcw.gov.cy/emf/?page=emfmeasurements
https://www.data.gov.cy/search/field_topic/%CE%B5%CF%80%CE%B9%CF%83%CF%84%CE%AE%CE%BC%CE%B7-%CE%BA%CE%B1%CE%B9-%CF%84%CE%B5%CF%87%CE%BD%CE%BF%CE%BB%CE%BF%CE%B3%CE%AF%CE%B1-40
Contact Details
Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus, vasosv@cyens.org.cy
Title: Measuring consumer price inflation in Cyprus
We calculate Cyprus inflation in terms of the Consumer Price Index (CPI), which is compiled using the online prices (big data) for a pre-selected/fixed and representative basket of goods and services. In doing so, we employ ‘web scraping’ algorithms to visit the websites/pages of large retailers in Cyprus and collect and store the prices of goods and services available online. We then implement standard techniques and proprietary methodologies to calculate price statistics and indices. This work is inspired by the Billion Prices Project, an academic initiative at MIT and Harvard that used prices collected from hundreds of large online retailers around the world on a daily basis to conduct research in macro and international economics.
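As an illustration of the index-calculation step on already-scraped prices, the sketch below computes an unweighted Jevons (geometric-mean) elementary index between two dates; the actual methodology used here is proprietary, and the file layout is an assumption.

```python
# Minimal sketch: Jevons elementary index from matched scraped prices.
import numpy as np
import pandas as pd

prices = pd.read_csv("scraped_prices.csv")    # hypothetical: date, product_id, price

base = prices[prices["date"] == "2024-01-01"].set_index("product_id")["price"]
current = prices[prices["date"] == "2024-02-01"].set_index("product_id")["price"]

common = base.index.intersection(current.index)           # matched products only
relatives = current.loc[common] / base.loc[common]
jevons_index = 100 * np.exp(np.log(relatives).mean())
print(f"Price index (base=100): {jevons_index:.1f}")
```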
Title: Studying the crude oil price pass-through into fuel and/or consumer prices in Cyprus
We leverage a high-frequency (weekly) online price series dataset in an econometric framework for pass-through estimation, forecasting and policy making. We aim to quantify how (to what extent and how quickly) fuel and/or consumer prices respond to an oil price shock, and to provide policy makers with important implications and useful information regarding consumer welfare.
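One standard way to quantify this is a distributed-lag regression of fuel price changes on current and lagged oil price changes, sketched below under the assumption that the weekly series have already been aligned; all names are illustrative.

```python
# Minimal sketch: distributed-lag pass-through regression on weekly data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("weekly_prices.csv", parse_dates=["week"])   # hypothetical file
df["d_fuel"] = df["fuel_price"].diff()
for lag in range(5):                                          # 0..4 week lags
    df[f"d_oil_l{lag}"] = df["oil_price"].diff().shift(lag)

model = smf.ols("d_fuel ~ " + " + ".join(f"d_oil_l{l}" for l in range(5)),
                data=df.dropna()).fit()

# The cumulative pass-through after four weeks is the sum of the lag coefficients.
print(model.params.filter(like="d_oil").sum())
```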
Title: Create an end-to-end MLops pipeline for a specific model (“anti-scraping model”)
Duration: Hard deadline the end of August
Team: 3 Data Science MSc Students with backgrounds: 2 in CS, 1 in Statistics, plus one supervisor on the UCY side. The supervisor can offer weekly one-hour coaching and review sessions to the students.
Project Description:
The scope of this project is to create an end-to-end MLops pipeline for a specific model (detecting competitors trying to scrape/crawl prices from our website). The students will work under the strong guidance of our Data Science team to design, implement, and deploy an ML model to production.
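A minimal sketch of the train-and-serve portion of such a pipeline, assuming request-level features are extracted upstream; FastAPI and joblib are one possible stack, not a project requirement, and all names are illustrative.

```python
# Minimal sketch: train a scraper-detection model, persist it, and serve it.
import joblib
import pandas as pd
from fastapi import FastAPI
from sklearn.ensemble import RandomForestClassifier

# --- training step (would normally run in CI or a scheduled job) ---
data = pd.read_csv("labelled_requests.csv")   # hypothetical: request features + is_scraper
features = ["requests_per_min", "distinct_products_viewed", "has_headless_ua"]
model = RandomForestClassifier().fit(data[features], data["is_scraper"])
joblib.dump(model, "scraper_model.joblib")

# --- serving step ---
app = FastAPI()
model = joblib.load("scraper_model.joblib")

@app.post("/score")
def score(requests_per_min: float, distinct_products_viewed: int, has_headless_ua: int):
    X = [[requests_per_min, distinct_products_viewed, has_headless_ua]]
    return {"scraper_probability": float(model.predict_proba(X)[0, 1])}
```

In a full pipeline this would be wrapped with versioning, monitoring, and retraining triggers, which is where most of the MLops effort lies.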
Resources:
Our company will provide
Desired Deliverables:
Title: Loyalty Scheme – Client Churn Prediction
Duration: 2 months (June 2023 – July 2023)
Team Members:
Data Scientist with a background in Computer Science: Responsible for data preprocessing, feature engineering, and model development.
Statistician: Responsible for statistical analysis, model evaluation, and validation.
Business Analyst: Responsible for understanding the business requirements, interpreting the results, and providing insights for decision-making.
Supervisor: A subject matter expert who will provide guidance and support throughout the project.
Project Description:
This project aims to predict the churn of COMPANY’s loyalty customers by analyzing customer data and building predictive models. The team will work closely with COMPANY’s internal stakeholders to gather relevant data related to customer behavior, transactions, and engagement metrics. The collected data will be processed, cleaned, and transformed to create meaningful features. The team will then explore various machine learning algorithms to develop predictive models that can identify potential churners accurately.
The project will involve conducting an in-depth exploratory data analysis to uncover insights and patterns within the data. The team will perform feature engineering to derive additional relevant features from the existing dataset. These features will be used to train and evaluate predictive models using techniques such as logistic regression, decision trees, random forests, or neural networks. The models will be assessed based on metrics like accuracy, precision, and F1-score.
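A minimal sketch of that model-comparison step, assuming the engineered features and churn label are available in one table; the file and column names are illustrative.

```python
# Minimal sketch: compare two baseline classifiers on the stated metrics.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, f1_score

df = pd.read_csv("loyalty_features.csv")          # hypothetical export
X = df.drop(columns=["customer_id", "churned"])
X_train, X_test, y_train, y_test = train_test_split(
    X, df["churned"], test_size=0.2, stratify=df["churned"])

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("random forest", RandomForestClassifier())]:
    pred = clf.fit(X_train, y_train).predict(X_test)
    print(name,
          "accuracy", round(accuracy_score(y_test, pred), 3),
          "precision", round(precision_score(y_test, pred), 3),
          "F1", round(f1_score(y_test, pred), 3))
```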
Resources:
Our company will provide:
A designated contact person who will offer guidance, support, and domain expertise.
Access to relevant customer data, subject to a non-disclosure agreement (NDA).
Desired Deliverables:
Churn prediction models and evaluation report: The report should detail the methodology, model performance metrics, and provide insights into key features driving churn.
Final Project Report: A comprehensive document summarizing the entire project, including data preprocessing, model development, insights, and recommendations for COMPANY.
Presentation: A final presentation summarizing the project, highlighting the key findings, and providing actionable recommendations for COMPANY to reduce churn among loyalty customers.
Title: Quality control, cleaning, and analyses of wearable data in the context of clinical trials.
Wearables have enabled the non-intrusive monitoring of subjects in medical research studies during their normal daily lives. At Stremble we have already expanded our in-house analytics platform to automatically collect data from several of our clinical trials and studies in flat JSON files. However, cleaning, organizing, and analyzing these data is a challenge. Therefore, standardizing the quality control, storage, and access to the data would benefit many ongoing cancer research projects.
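A minimal sketch of the loading and quality-control steps, assuming one flat JSON file per subject with timestamped readings; the field names and thresholds are assumptions.

```python
# Minimal sketch: load flat JSON exports, apply basic QC, and resample.
import json
import pathlib
import pandas as pd

records = []
for path in pathlib.Path("wearable_exports").glob("*.json"):
    with open(path) as f:
        payload = json.load(f)
    for reading in payload["readings"]:
        records.append({"subject": payload["subject_id"], **reading})

df = pd.DataFrame(records)
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Basic quality control: drop physiologically implausible heart rates, then
# resample to a regular 1-minute grid per subject.
df = df[(df["heart_rate"] > 25) & (df["heart_rate"] < 230)]
clean = (df.set_index("timestamp")
           .groupby("subject")["heart_rate"]
           .resample("1min").mean())
print(clean.head())
```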
Project Aims:
Skills:
Students in Computer Science, Mathematics or Statistics would be eligible for this position.
Programming/scripting knowledge, preferably Python
R, and ideally Shiny, for the analyses.
Who we are:
Stremble Ventures is a company based in Limassol, Cyprus established in 2011. We offer contract research to companies and research institutions around the world with a focus on Bioinformatics and Computational Biology. We have several ongoing EU, RIF, and commercially funded projects.
Tickmill is a retail FX broker operating globally and employing a dedicated team of over 250 professionals. With an average monthly trading volume surpassing $150 billion, Tickmill stands as a major player in the financial industry. One of its key assets is its own quantitative research team that has developed proprietary trading systems. These systems leverage extensive tabular data stored across diverse databases. The team’s primary objective is to make informed investment decisions by analyzing unconventional data. For instance, is it possible to predict Starbucks’ stock price if we know the number of people visiting their stores? If so, would combining this information with the average amount spent by customers in Starbucks stores yield improved results?
During the capstone project, students who join Tickmill’s Quantitative Research team will actively participate in tasks such as data mining, data storage utilizing various database types based on their specific requirements, and, notably, data analysis. In the financial industry, data analysis poses significant challenges for two primary reasons. Firstly, unlike voice, image, or video data, financial data cannot be generated or created. Secondly, the signal-to-noise ratio is typically low, which significantly contributes to the complexity of this particular task in machine learning and data science.
In this project, participating students will undergo a two-week training program focused on the financial industry and trading. Prior knowledge in this field is not required. Following the training, students will delve into various types of data and explore the corresponding data analysis within that domain. They will have the flexibility to use their preferred data analysis tools (although we utilize Jupyter notebook, they are free to choose the tool they are most comfortable with) to conduct their analyses. By the conclusion of the project, students will be equipped with the ability to determine which types of data are valuable for predicting the price change of financial assets and which data contain excessive noise. This understanding encompasses the concepts of correlation and causation, although it is not limited to these factors alone. In addition to this phase, which we refer to as the first stage of data analysis, students will have the opportunity to create their own features and explore the freedom to combine different types of data in order to derive meaningful insights and conclusions.
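A purely illustrative sketch of combining two such signals and checking whether the combination is more informative than either alone; all names and files are hypothetical, since the real vendor data are confidential.

```python
# Minimal sketch: compare individual signals against a combined signal.
import pandas as pd

# Hypothetical weekly table: store_visits, avg_ticket, next_week_return.
df = pd.read_csv("weekly_signals.csv")

df["implied_revenue"] = df["store_visits"] * df["avg_ticket"]

for signal in ["store_visits", "avg_ticket", "implied_revenue"]:
    corr = df[signal].corr(df["next_week_return"])
    print(f"{signal:>16}: correlation with next-week return = {corr:+.2f}")

# A stronger correlation for the combined signal would motivate a proper
# backtest; correlation alone does not establish causation or tradability.
```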
Due to confidentiality agreements, students will not be allowed to disclose data vendors’ names in their capstone reports.