>>>Master

... in Data Science

>>Capstone Project

The capstone project is designed to put knowledge into practice and to develop critical skills such as problem-solving and collaboration. Students are matched with research labs within the UCY community and with industry partners to investigate pressing issues by applying data science. Capstone projects aim to give students professional experience in a real work environment and help enhance their soft skills. These projects involve groups of roughly 3-4 students working in partnership.

The process is the following:

  • Short descriptions of the projects are announced to students.
  • Students bid on up to three projects, taking into account their fields of interest or research.
  • The data science directors make the final assignment of projects to students. The projects are under the supervision of a member of the Programme’s academic staff.
  • Specific learning outcomes are stipulated in a learning agreement between the student, the supervisor and the company.
  • The student keeps a log file of their work and, at the end, writes a progress report (6000 words).
  • The company is obliged to monitor the progress of the students and to provide relevant mentorship.

Final assessment is carried out by the company and the supervisor.

Available Capstone Projects 

_Summer 2023

Bank of Cyprus - Project 1

As part of its risk assessment, the bank is required to predict the returns from the sales of its real estate collaterals. One of the parameters that determines those returns is the ‘recovery rate’, i.e., the % of the property market value that the Bank will recover by selling the property.
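
As a minimal sketch of the definition above, the recovery rate of a single sale can be computed as the sale price divided by the open market value. The field names `open_market_value` and `sale_price` and the figures used are hypothetical placeholders, not the Bank's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class CollateralSale:
    """One collateral sale; field names are illustrative assumptions."""
    open_market_value: float  # appraised market value of the property
    sale_price: float         # price actually achieved at sale


def recovery_rate(sale: CollateralSale) -> float:
    """Return the share of market value recovered by the sale (0.8 = 80%)."""
    if sale.open_market_value <= 0:
        raise ValueError("open_market_value must be positive")
    return sale.sale_price / sale.open_market_value


# Made-up example values, for illustration only.
sales = [CollateralSale(200_000, 160_000), CollateralSale(150_000, 165_000)]
rates = [recovery_rate(s) for s in sales]
```

A predictive model would then treat this ratio as the target variable and the property characteristics as candidate features.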

Objective: develop a predictive model for the recovery rate parameter of collaterals, using the Bank’s most recent internal database.

Data: contains information on collateral sales transactions (e.g., open market value, sale price, sale date) and property characteristics (e.g., location, property type, size) for historic collaterals onboarded from 2016 onwards. Publicly available data from the CY Statistical Service related to property price indexes can be used to supplement the analysis.

Deliverables:

  • Data cleansing, exploratory analysis & feature construction
  • Model development & evaluation
  • Documentation

Bank of Cyprus - Project 2

Short description: The project aims to develop and evaluate a solution to automate the processes of model validation and identify variables that can challenge and enhance the performance of the Bank’s behavioural credit models.

Objective: The objective of the project is to improve the efficiency and effectiveness of the model validation function in the Bank, and to ensure compliance with the validation unit’s internal procedures and methods.

Data: The project will use the following data sources:

  • The Bank’s internal data on its behavioural credit models, such as model specifications, parameters, inputs, outputs, assumptions, limitations, and performance metrics.
  • The Bank’s internal data on its customers, such as credit scores, loan amounts, repayment history, default rates, and other relevant variables.
  • External data sources, such as market data, macroeconomic indicators, industry benchmarks, and peer comparisons.

Deliverables: The expected outcomes and deliverables of the project are:

  • Automation of the processes of model validation, such as data processing, model testing, back-testing, benchmarking, sensitivity analysis, and reporting.
  • Development of a machine learning algorithm that can identify variables that can enhance the performance of the Bank’s behavioural credit models, such as new features, interactions, transformations, or selection methods.
  • A report that documents the data analysis, machine learning pipeline development, machine learning pipeline evaluation, and machine learning pipeline deployment processes and results.
  • A presentation that showcases the project objectives, methods, findings, applications, and implications for the Bank’s model validation unit.

Catalink Ltd

Project Title: Cancer Prevention AI tools

Project Description: The Cancer Prevention AI tools project is a cross-functional capstone project that aims to develop a mobile application that monitors people’s daily living and suggests new food, activity, and lifestyle routines to reduce the risk of cancer. The application will be built by a team of two or three students with expertise in data analytics (visual, textual, multisensory signal analysis) and, optionally, one student in business development.

The data analytics student(s) will be responsible for developing AI models that will analyse user data, including daily food intake, physical activity, mood estimation and lifestyle habits, to provide personalized recommendations for reducing the risk of cancer. The AI models will use machine learning algorithms to identify patterns and correlations in the data and suggest changes to the user’s routine accordingly. The Data Analytics student(s) should have expertise in programming languages such as Python, and be familiar with relevant libraries and tools such as NumPy, Pandas, Scikit-learn, TensorFlow, Keras, and Matplotlib. They should also have experience in machine learning algorithms, deep learning models, and data visualization techniques.

Overall, the Cancer Prevention AI tools project will provide a valuable tool for people to monitor their daily habits and receive personalized recommendations for reducing their risk of cancer. The project will also provide valuable experience for the student team members in data analytics and (optionally) business development, as well as a tangible product to showcase their skills to potential employers.

Deliverables:

AI Models: The project should deliver appropriate AI models to analyse user data, including daily food intake, physical activity, and lifestyle habits, to provide personalized recommendations for reducing the risk of cancer. The model should use machine learning algorithms to identify patterns and correlations in the data and suggest changes to the user’s routine accordingly.

Business Plan: The project should include a business plan that outlines the target market for the AI tools, the competitive landscape, and the marketing and sales strategy. The plan should also include financial projections and revenue models.

Technical Documentation: The project should include technical documentation that describes the AI tools’ design and functionality, as well as instructions for installing and running the AI tools.

Presentation: The project team should prepare a final presentation that showcases the AI models and the business plan. The presentation should include a demonstration of the AI tools’ functionality and the business plan’s revenue projections.

CYENS Center of Excellence - Project 1

Project title:

Use of ML Techniques to Enable Intrusion Detection in IoT Networks

Background:

Networks are vulnerable to costly attacks. Thus, the ability to detect these intrusions early on and minimize their impact is imperative to the financial security and reputation of an institution.

There are two mainstream types of intrusion detection system (IDS): signature-based and anomaly-based. A signature-based IDS identifies intrusions by referencing a database of known signatures, one for each previously observed intrusion event. An anomaly-based IDS attempts to identify intrusions by referencing a baseline, or learned patterns, of normal behavior. Under this approach, deviations from the baseline are considered intrusions.
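
The anomaly-based idea can be illustrated with a deliberately simple sketch: learn a baseline (mean and standard deviation) from traffic assumed to be normal, then flag observations that deviate strongly from it. The feature (packets per second), the threshold of 3 standard deviations, and all values are assumptions made for illustration; the project itself targets richer classifiers.

```python
import statistics


def fit_baseline(normal_samples):
    """Learn a baseline (mean, standard deviation) from normal traffic."""
    return statistics.mean(normal_samples), statistics.stdev(normal_samples)


def is_intrusion(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mean, std = baseline
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold


# Hypothetical packets-per-second readings from a sensor node.
normal_traffic = [98, 102, 100, 97, 103, 101, 99, 100]
baseline = fit_baseline(normal_traffic)

# 101 is close to the baseline; 250 is far outside it.
flags = [is_intrusion(v, baseline) for v in (101, 250)]
```

A signature-based detector would instead look each event up in a table of known attack patterns, so it cannot flag attacks it has never seen.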

Project aims:

This MSc project will build on existing work on Lightweight Intrusion Detection for Wireless Sensor Networks (using BLR, SVM, SOM, and Isolation Trees) and extend it to account for the characteristics of IoT networks, new attacks, new topologies, and especially new classification algorithms.

Reading / Datasets

Works by V. Vassiliou and C. Ioannou (University of Cyprus)

Find suitable datasets from https://www.kaggle.com/

Contact Details

 

Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus vasosv@cyens.org.cy

CYENS Center of Excellence - Project 2

Project title:

Predicting YouTube Trending Video

Background:

 

A new trend in 5G and 6G networks is using the network edge (base station) for caching and processing and for supporting the quality of service and the experience of end users consuming visual content and interactive media.

Project aims:

This project will provide the ability to group and predict the needs of users and/or content. One way to do this is to predict which videos will trend on social media.

Readings / Datasets

YouTube Video Info: https://www.kaggle.com/datasets/datasnaek/youtube-new

Contact Details

 

Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus vasosv@cyens.org.cy

 

CYENS Center of Excellence - Project 3

Project title:

Predicting Motor-vehicle Accidents

Background:

 

Motor vehicle crashes cause loss of life, property and finances. Vehicle accidents are a focus of traffic safety research, uncovering useful information that can be directly applied to reduce these losses.

Traditionally, modeling crash events has been done using machine learning techniques, considering crash level variables, such as roadway characteristics, lighting conditions, weather conditions and the prevalence of drugs or alcohol.

 

Project aims:

 

Incorporate crash report data, road data, and demographic data to better understand crash locations and the associated cost.  Use a mixed linear modeling technique that enables data fusion in a principled way to build a better predictive model. Analyze the natural clustering of events in space by different geographic levels.

Readings / Datasets

Relevant information needs to be obtained from the Cyprus Police and the Association of Insurance Companies.

Contact Details

 

Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus vasosv@cyens.org.cy

CYENS Center of Excellence - Project 4

Project title:

Preventive to Predictive Maintenance

Background:

 

Maintenance is an integral component of operating manufacturing equipment.

Preventive maintenance occurs on the same schedule every cycle — whether or not the upkeep is actually needed. Preventive maintenance is designed to keep parts in good repair but does not take the state of a component or process into account.

Predictive maintenance occurs as needed, drawing on real-time collection and analysis of machine operation data to identify issues at the nascent stage before they can interrupt production. With predictive maintenance, repairs happen during machine operation and address an actual problem. If a shutdown is required, it will be shorter and more targeted.

While the planned downtime in preventive maintenance may be inconvenient and represents a decrease in overall capacity availability, it’s highly preferable to the unplanned downtime of reactive maintenance, where costs and duration may be unknown until the problem is diagnosed and addressed.

Preventive to Predictive Maintenance is about the transition from a preventive maintenance strategy to a predictive maintenance strategy for a replaceable part.

Project aims:

The objective of the project is to use the associated detailed dataset to precisely predict the remaining useful life (RUL) of the element in question, so that a transition to predictive maintenance is made possible.
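
One simple way to frame RUL estimation, shown here only as a sketch, is to fit a linear trend to a degradation signal and extrapolate it to a failure threshold. The wear signal, the threshold, and the linear-wear assumption are all hypothetical; the actual columns of the dataset are not assumed here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remaining_useful_life(cycles, wear, failure_threshold):
    """Extrapolate the fitted wear trend to the threshold; return cycles left."""
    slope, intercept = fit_line(cycles, wear)
    if slope <= 0:
        return float("inf")  # no measurable degradation trend
    cycle_at_failure = (failure_threshold - intercept) / slope
    return max(0.0, cycle_at_failure - cycles[-1])


# Made-up wear measurements: 0.5 units of wear per cycle.
cycles = [0, 1, 2, 3, 4]
wear = [0.0, 0.5, 1.0, 1.5, 2.0]
rul = remaining_useful_life(cycles, wear, failure_threshold=10.0)
```

With these invented numbers the trend reaches the threshold at cycle 20, so 16 cycles remain after the last observation at cycle 4. Real degradation is rarely linear, which is why the project calls for learned models.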

Readings / Datasets

https://www.kaggle.com/datasets/prognosticshse/preventive-to-predicitve-maintenance

Contact Details

 

Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus vasosv@cyens.org.cy

 

CYENS Center of Excellence - Project 5

Project title:

Analysis and Visualization of Mobile Cellular Telephony Network Coverage and Electromagnetic Measurements

Background:

 

An interesting challenge in 5G and 6G Mobile Cellular Telephony is the need to have a large(r) number of small(er) base stations to achieve the data rates, delays and number of users expected.  The Mobile Network Operators report to the Telecommunication Regulators of each country a number of parameters, including coverage, location of stations and radiation levels.

 

Project aims:

 

The objective of the project is to collate information available in different systems/platforms and generate an up-to-date map of coverage and mobile telephony stations, alongside the information gathered through periodic and ad-hoc measurements. These will be related to measurements of mobile telephony quality of service and shown on a common reference map with the ability to explore the changes over the years.

Readings / Datasets

 

Open data from the Department of Electronic Communications. Information from the ICT market observatory of OCECPR and other public data.

http://www.emf.mcw.gov.cy/emf/?page=emfmeasurements

https://www.data.gov.cy/search/field_topic/%CE%B5%CF%80%CE%B9%CF%83%CF%84%CE%AE%CE%BC%CE%B7-%CE%BA%CE%B1%CE%B9-%CF%84%CE%B5%CF%87%CE%BD%CE%BF%CE%BB%CE%BF%CE%B3%CE%AF%CE%B1-40

 

Contact Details

 

Dr. Vasos Vassiliou, Smart Networked Systems Research Group Leader, Associate Professor, Computer Science, University of Cyprus vasosv@cyens.org.cy

Economics Research Center (CypERC) - Project 1

Title: Measuring consumer price inflation in Cyprus

We calculate Cyprus inflation in terms of the Consumer Price Index (CPI), which is compiled using the online prices (big data) for a pre-selected/fixed and representative basket of goods and services. In doing so, we employ ‘web scraping’ algorithms to visit the websites/pages of large retailers in Cyprus and collect and store the prices of goods and services available online. We then implement standard techniques and proprietary methodologies to calculate price statistics and indices. This work is inspired by the Billion Prices Project, an academic initiative at MIT and Harvard that used prices collected from hundreds of online large retailers around the world on a daily basis to conduct research in macro and international economics.
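
As an illustration of how a fixed-basket index is computed, the sketch below uses a Laspeyres-type formula, which is one standard choice; it is not a description of the Center's proprietary methodology. Item names, quantities, and prices are invented.

```python
def laspeyres_index(base_prices, current_prices, quantities):
    """Fixed-basket price index: cost of the basket now vs. in the base period.

    Returns 100.0 when prices are unchanged.
    """
    base_cost = sum(base_prices[item] * qty for item, qty in quantities.items())
    current_cost = sum(current_prices[item] * qty for item, qty in quantities.items())
    return 100.0 * current_cost / base_cost


def inflation_rate(index_now, index_before):
    """Percentage change between two index values."""
    return 100.0 * (index_now / index_before - 1.0)


# Hypothetical basket: quantities are fixed, prices as if scraped on two dates.
quantities = {"bread": 10, "milk": 20, "fuel": 5}
base_prices = {"bread": 1.00, "milk": 1.50, "fuel": 12.00}
current_prices = {"bread": 1.10, "milk": 1.50, "fuel": 13.80}

cpi = laspeyres_index(base_prices, current_prices, quantities)
inflation = inflation_rate(cpi, 100.0)
```

Here the basket costs 100 in the base period and 110 at current prices, giving an index of 110 and inflation of 10%.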

Economics Research Center (CypERC) - Project 2

Title: Studying the crude oil price pass-through into fuel and/or consumer prices in Cyprus

We leverage a high-frequency (weekly) online price series dataset in an econometric framework for pass-through estimation, forecasting, and policy making. We aim to quantify how (to what degree and how quickly) fuel and/or consumer prices respond to an oil price shock, and to provide policy makers with important implications and useful information regarding consumer welfare.
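
A very reduced sketch of pass-through estimation is a regression of weekly fuel price changes on oil price changes from some weeks earlier. The single-lag specification, the function names, and the synthetic series below are assumptions for illustration; the project's econometric framework may differ substantially.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def pass_through(oil_changes, fuel_changes, lag):
    """Slope of fuel price changes on oil price changes `lag` weeks earlier."""
    if not 0 <= lag < len(oil_changes) - 1:
        raise ValueError("lag must leave at least two observations")
    xs = oil_changes[: len(oil_changes) - lag]  # oil change in week t - lag
    ys = fuel_changes[lag:]                     # fuel change in week t
    return ols_slope(xs, ys)


# Synthetic weekly changes: fuel follows oil one week later at a rate of 0.4.
oil = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
fuel = [0.0, 0.4, -0.8, 1.2, 0.2, -0.6]

beta_lag1 = pass_through(oil, fuel, lag=1)
```

Comparing the coefficient across lags gives a rough view of both the degree and the timing of the response.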

Hellas Direct

Title: Create an end-to-end MLOps pipeline for a specific model (“anti-scraping model”)

Duration: Hard deadline at the end of August

Team: 3 Data Science MSc Students with backgrounds: 2 in CS, 1 in Statistics, plus one supervisor on the UCY side. The supervisor can offer weekly one-hour coaching and review sessions to the students.

Project Description:

The scope of this project is to create an end-to-end MLOps pipeline for a specific model (detecting competitors trying to scrape/crawl prices from our website). The students will work under the strong guidance of our Data Science team to design, implement, and deploy an ML model to production.

Resources:

Our company will provide

  • Quotations and sales data from the last 6 months (all covered by NDA).
  • Strong interaction with our team.
  • A lot of positive energy.

 

Desired Deliverables:

  • A working end-to-end MLOps solution (in Python)
  • A final report of 3 slides

KPMG

Title: Loyalty Scheme – Client Churn Prediction

Duration: 2 months (June 2023 – July 2023)

Team Members:

Data Scientist with a background in Computer Science: Responsible for data preprocessing, feature engineering, and model development.

Statistician: Responsible for statistical analysis, model evaluation, and validation.

Business Analyst: Responsible for understanding the business requirements, interpreting the results, and providing insights for decision-making.

Supervisor: A subject matter expert who will provide guidance and support throughout the project.

Project Description:

This project aims to predict the churn of COMPANY’s loyalty customers by analyzing customer data and building predictive models. The team will work closely with COMPANY’s internal stakeholders to gather relevant data related to customer behavior, transactions, and engagement metrics. The collected data will be processed, cleaned, and transformed to create meaningful features. The team will then explore various machine learning algorithms to develop predictive models that can identify potential churners accurately.

The project will involve conducting an in-depth exploratory data analysis to uncover insights and patterns within the data. The team will perform feature engineering to derive additional relevant features from the existing dataset. These features will be used to train and evaluate predictive models using techniques such as logistic regression, decision trees, random forests, or neural networks. The models will be assessed based on metrics like accuracy, precision, and F1-score.
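
The evaluation metrics named above can be computed directly from the confusion-matrix counts, as in the following sketch. Recall is included because the F1-score is defined from precision and recall. The labels are invented, with 1 meaning "churned".

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and predictions for eight customers.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

When churners are a small minority, accuracy alone can look high for a model that never predicts churn, which is why precision and F1 are reported alongside it.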

Resources:

Our company will provide:

A designated contact person who will offer guidance, support, and domain expertise.

Access to relevant customer data, subject to a non-disclosure agreement (NDA).

Desired Deliverables:

Churn prediction models and evaluation report: The report should detail the methodology, model performance metrics, and provide insights into key features driving churn.

Final Project Report: A comprehensive document summarizing the entire project, including data preprocessing, model development, insights, and recommendations for COMPANY.

Presentation: A final presentation summarizing the project, highlighting the key findings, and providing actionable recommendations for COMPANY to reduce churn among loyalty customers.

Stremble Ventures

Title: Quality control, cleaning, and analyses of wearable data in the context of clinical trials.

Wearables have enabled the non-intrusive monitoring of subjects in medical research studies during their normal daily lives. At Stremble, we have already expanded our in-house analytics platform to automatically collect data from several of our clinical trials and studies in flat JSON files. However, cleaning, organizing, and analyzing these data is a challenge. Therefore, standardizing the quality control, storage, and access to the data would benefit many ongoing cancer research projects.

Project Aims:

  1. Establish a cleaning and quality control workflow that accumulates data ready for analyses in a database.
  2. Develop analytical routines that use mathematics, statistics, and computer science to test hypotheses, explore the data using AI approaches, and produce visualizations.
  3. Integrate the analyses workflow.
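
The first aim could be sketched as follows: parse flat JSON records, apply basic quality-control rules, and accumulate the clean rows in a database. The record fields (`subject_id`, `timestamp`, `heart_rate`), the plausibility range, and the SQLite table are all assumptions made for illustration, not Stremble's actual data format or platform.

```python
import json
import sqlite3


def is_valid(record):
    """Basic quality control: required fields present, heart rate plausible."""
    required = ("subject_id", "timestamp", "heart_rate")
    if any(record.get(key) is None for key in required):
        return False
    rate = record["heart_rate"]
    # Assumed plausibility range in beats per minute.
    return isinstance(rate, (int, float)) and 30 <= rate <= 220


def load_clean_records(json_text, connection):
    """Parse flat JSON records, store the valid ones, return (kept, rejected)."""
    records = json.loads(json_text)
    clean = [r for r in records if is_valid(r)]
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(subject_id TEXT, timestamp TEXT, heart_rate REAL)"
    )
    connection.executemany(
        "INSERT INTO readings VALUES (:subject_id, :timestamp, :heart_rate)",
        clean,
    )
    connection.commit()
    return len(clean), len(records) - len(clean)


# Invented sample payload: one good row, one implausible value, one missing field.
raw = json.dumps([
    {"subject_id": "S1", "timestamp": "2023-06-01T10:00:00", "heart_rate": 72},
    {"subject_id": "S1", "timestamp": "2023-06-01T10:01:00", "heart_rate": 0},
    {"subject_id": "S2", "timestamp": None, "heart_rate": 80},
])
conn = sqlite3.connect(":memory:")
kept, rejected = load_clean_records(raw, conn)
stored = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Returning the rejected count alongside the kept count makes it easy to track data quality per study over time.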

Skills:

  • Students in Computer Science, Mathematics, or Statistics are eligible for this position.
  • Programming/scripting knowledge, preferably Python.
  • R, and ideally Shiny, for the analyses.

Who we are:

Stremble Ventures is a company based in Limassol, Cyprus established in 2011. We offer contract research to companies and research institutions around the world with a focus on Bioinformatics and Computational Biology. We have several ongoing EU, RIF, and commercially funded projects.

Tickmill

Tickmill is a retail FX broker operating globally and employing a dedicated team of over 250 professionals. With an average monthly trading volume surpassing $150 billion, Tickmill stands as a major player in the financial industry. One of its key assets is its own quantitative research team that has developed proprietary trading systems. These systems leverage extensive tabular data stored across diverse databases. The team’s primary objective is to make informed investment decisions by analyzing unconventional data. For instance, is it possible to predict Starbucks’ stock price if we know the number of people visiting their stores? If so, would combining this information with the average amount spent by customers in Starbucks stores yield improved results?  

During the capstone project, students who join Tickmill’s Quantitative Research team will actively participate in tasks such as data mining, data storage utilizing various database types based on their specific requirements, and, notably, data analysis. In the financial industry, data analysis poses significant challenges for two primary reasons. Firstly, unlike voice, image, or video data, financial data cannot be generated or created. Secondly, the signal-to-noise ratio is typically low, which significantly contributes to the complexity of this particular task in machine learning and data science.

In this project, participating students will undergo a two-week training program focused on the financial industry and trading. Prior knowledge in this field is not required. Following the training, students will delve into various types of data and explore the corresponding data analysis within that domain. They will have the flexibility to use their preferred data analysis tools (although we utilize Jupyter notebook, they are free to choose the tool they are most comfortable with) to conduct their analyses. By the conclusion of the project, students will be equipped with the ability to determine which types of data are valuable for predicting the price change of financial assets and which data contain excessive noise. This understanding encompasses the concepts of correlation and causation, although it is not limited to these factors alone. In addition to this phase, which we refer to as the first stage of data analysis, students will have the opportunity to create their own features and explore the freedom to combine different types of data in order to derive meaningful insights and conclusions.

Due to confidentiality agreements, students will not be allowed to disclose data vendors’ names in their capstone reports.