Programme structure - Master in Data Science

>>>Master

... in Data Science

Programme_Structure

>>Overview

The Master in Data Science is a highly selective programme for students who want to begin or advance their careers in Data Science.

The programme lasts 1.5 years (90 ECTS) and the language of instruction is English. The programme offers 3 tracks (Computer Science Track / Statistics Track / Business Analytics Track). The first two semesters are dedicated to core courses; students select a track at the end of the second semester. Part of the programme is the Capstone Project in Data Science, where students tackle specific and practical problems of an interdisciplinary nature. In this course students engage in all aspects of the lifecycle of data-science projects, from process modelling, data extraction, cleaning and validation to data interpretation and visualization. The capstone project begins in the summer term, after the end of the second semester.

>>Course Schedule

Semester	Course	ECTS
First Semester	DSC 510: Introduction to Data Science and Analytics (offered by CS)	8
	DSC 530: Probability and Statistics for Data Science (offered by MAS)	8
	DSC 531: Statistical Simulation and Data Analysis (offered by MAS)	8
	One Free Elective Course (see below)	4
Second Semester	DSC 511: Big Data Analytics (offered by CS)	8
	DSC 550: Business Analytics Applications (offered by BUS)	8
	DSC 532: Statistical Learning (offered by MAS)	8
	One Free Elective Course (see below)	4
Summer Semester	Capstone Project in Data Science (1st Phase)	5
Third Semester	Computer Science Track / Statistics Track / Business Analytics Track Course	8
	Computer Science Track / Statistics Track / Business Analytics Track Course	8
	Computer Science Track / Statistics Track / Business Analytics Track Course	8
	Capstone Project in Data Science (2nd Phase)	5

>>Course Descriptions

Core Courses

DSC 510: Introduction to Data Science and Analytics

This course will examine how data analysis technologies can be used to improve decision-making. The aim is to study the fundamental principles and techniques of data science, and we will examine real-world examples and cases to place data science techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science. In addition, students will work hands-on with the Python programming language and its associated data analysis libraries.

DSC 530: Probability and Statistics for Data Science

This is a theoretical course covering fundamental topics of probability and statistics in the context of data science with its inherent challenges. The course will start with a review of fundamental probability, covering topics like random variables, their distribution functions, expected values, conditioning on certain events and independence. The students will be acquainted with certain families of probability distributions and then will learn how to estimate certain quantities of interest from observations. A range of properties of estimators will be studied, including sufficiency, unbiasedness and consistency, which enable the evaluation of their quality, with an emphasis on the framework of big datasets. The students will also learn how to formulate different types of hypotheses, how to construct tests for their hypotheses, as well as how to compare tests and how to construct confidence intervals for their estimators.

DSC 531: Statistical Simulations and Data Analysis

The students will be introduced to the R programming language, a programming language that was specifically developed for analyzing data, and is today widely used in most organizations that conduct data analysis. The students will learn how to explore datasets in R, using basic visualization tools and summary statistics, how to run different kinds of regressions and analyses, and how to perform statistical inference in practice, for example how to test certain hypotheses regarding the data or how to compute confidence intervals for quantities of interest. The students will also learn how to use R in order to conduct simulations, an extremely useful tool that can fulfill a wide range of analytical tasks. Simulation techniques covered will include Monte Carlo, importance sampling and rejection sampling. Finally, the students will learn how to estimate the precision of computed sample statistics using resampling methods. The course uses a hands-on approach, with nearly half the work done in the lab.

DSC 511: Big Data Analytics

This course seeks a balance between foundational but relatively basic material in algorithms, statistics, graph theory and related fields, and real-world applications inspired by the current practice of internet and cloud services. Specifically, this course will look at social and information networks, recommender systems, clustering and community detection, search/retrieval/topic models, dimensionality reduction, stream computing, and online ad auctions. Together, these provide good coverage of the main uses for data mining and analytics applications in social networking, e-commerce, social media, etc. The course is a combination of theoretical material and weekly laboratory sessions, where several large-scale datasets from the real world will be explored. For this, students will work with a dedicated infrastructure based on Hadoop and Apache Spark.

DSC 550: Business Analytics Applications

This course presents knowledge and skills for applying business analytics to managerial decision-making in modern organizations. Key topics include descriptive, predictive, and prescriptive analytics, measuring the economic value of information in analytics investments, and using data to improve decision making under risk and uncertainty. Specifically, students will learn how to use data and analysis to make better decisions across different functional areas of the organization.

DSC 532: Statistical Learning

Students will acquire the knowledge to conduct statistical analysis on a variety of data sets using a wide range of modern computerized methods. The students will learn how to recognize which tools are needed to analyze different types of datasets, how to apply these tools in each case, and how to employ diagnostics to assess the quality of their results. They will learn about statistical models, their complexity and their relative benefits depending on the available data. Some of the tools that will be discussed include simple and multiple linear regression, nearest neighbors methods, shrinkage methods (ridge, lasso), dimension reduction methods (principal components), logistic regression, linear discriminant analysis, tree-based methods, model selection algorithms and clustering. The focus of the course will be less on theory and more on providing the students with as much intuition as possible and acquainting them with as many methods as possible. The course will make substantial use of the R statistical programming language and its libraries.

Elective Courses of Specializations

_Computer Science Track

DSC 512: Information Retrieval and Search Engines

This course covers search engine technologies, which play an important role in any data mining application involving text data. Key topics include Boolean Retrieval; Text encoding: tokenisation, stemming, lemmatisation, stop words, phrases; Dictionaries and Tolerant retrieval; Index Construction and Compression; Scoring and Term Weighting; Vector Space Retrieval; Evaluation in information retrieval; Relevance feedback/query expansion; Text classification and Naive Bayes; Vector Space Classification; Data Clustering; Web crawling and indexes; Link analysis.

DSC 513: Advanced Topics in Data Management

This course covers the fundamentals of modern Database Management Systems (DBMSs). Key topics include storage, indexing, query optimization, transaction processing, concurrency and recovery. Fundamentals of Distributed DBMSs, Web Databases and Cloud Databases (NoSQL / NewSQL): Semi-structured data management (XML/JSON, XPath and XQuery), Document data-stores (e.g., CouchDB, MongoDB, RavenDB), Key-Value data-stores (e.g., BerkeleyDB, Memcached), Introduction to Cloud Computing (NFS, GFS/Hadoop HDFS, Replication/Consistency Principles), Big-data processing/analytic frameworks (Apache MapReduce/Pig, Spark/Shark), Column-stores (e.g., Google’s BigTable, Apache’s HBase, Apache’s Cassandra), Graph databases (e.g., Twitter’s FlockDB) and Overview of NewSQL (Google’s Spanner/F1). Spatio-temporal data management (trajectories, privacy, analytics) and index structures (e.g., R-Trees, Grid Files) as well as other selected and advanced topics, including: Embedded Databases (SQLite), Sensor / Smartphone / Crowd data management, Energy-aware data management, Flash storage, Stream Data Management, etc.

DSC 514: Natural Language Processing

This course covers topics and technologies related to Natural language processing (NLP). NLP is one of the most important technologies of the information age, and a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, medical reports, etc. In this course, several models and algorithms for automated textual data processing will be described: (1) morpho-lexical level: electronic lexica, spelling checkers; (2) syntactic level: regular, context-free, stochastic grammars, parsing algorithms; (3) semantic level: models and formalisms for the representation of meaning. Several application domains will be presented: Linguistic engineering, Information Retrieval, Text mining (automated knowledge extraction), Textual Data Analysis (automated document classification, visualization of textual data).

DSC 515: Deep Learning

This course covers the latest techniques in deep learning, focusing on regularization, optimization algorithms, convolutional networks, sequence modeling, and the internals of embedding methods, with applications to computer vision and natural language understanding. The course will offer students the conceptual background, deep learning techniques used in industry, and research perspectives.

DSC 516: Cloud Computing

This course covers topics and technologies related to Cloud Computing and their practical implementations. The course is organized in four parts focusing on: (i) Fundamental concepts and models of Cloud Computing; (ii) Cloud-enabling technologies: warehouse-scale machines, virtualization, and storage; (iii) Cloud application programming models and paradigms; and (iv) Cloud resource orchestration, monitoring, and DevOps. The student will explore different architectural and service models of cloud computing, the concepts of virtualization, containerization, and cloud orchestration. Through lectures, tutorials, and laboratory sessions, the student will gain hands-on experience with various features of popular cloud platforms, such as OpenStack, VMware, Docker, and Kubernetes, as well as commercial offerings like Google App Engine, Microsoft Azure and Amazon Web Services. Advanced cloud programming paradigms such as Hadoop’s MapReduce and Microservices are also included in the course. Students will also learn the concept of modern Big Data analysis on cloud platforms using various data mining tools and techniques. The lab sessions will cover cloud application development and deployment, use of cloud storage, creation and configuration of virtual machines, and data analysis on the cloud using data mining tools. Different application scenarios from popular domains that leverage cloud technologies, such as online social networks, will be explained. The theoretical knowledge, practical sessions and assignments aim to help students build the skills to develop large-scale, industry-standard applications using cloud platforms and tools.

DSC 517: Data Security

Data processing is often carried out by systems that operate under hostile conditions, where adversaries try to monetize access to sensitive data. In this course we provide a short introduction to data security, and we review the basic arsenal available for protection. We cover a large portion of the applied cryptographic primitives and protocols that facilitate secure transmission of data. We then review how systems that process data can be attacked and protected. Finally, we discuss advanced attacks, and potential defenses, for systems that are based on Machine Learning.

DSC 551: Data Visualization

Introduction to Data visualization, Web development, JavaScript, Data-driven documents (D3.js), Interaction, filtering, aggregation, Perception, cognition, Designing visualizations (UI/UX), Text visualization, Graphs, Tabular data viz, Music viz, Introduction to scientific visualization, Storytelling with data / data journalism, Creative coding.

_Statistics Track

DSC 533: Survey Sampling

This course studies the process of conducting surveys. Topics that will be discussed include survey design, sampling and nonsampling errors, simple random sampling, stratified sampling, systematic sampling, cluster sampling, ratio estimators, regression estimators, determination of optimal sample size, bias in survey sampling and modern techniques of survey sampling.

DSC 534: Time Series Analysis

This course studies the analysis of time series, that is, temporal stochastic processes. Topics covered include: stochastic processes, weak and strong stationarity. Autoregressive and moving average based models for stationary and non-stationary time series. Trend and seasonal behaviour, sample autocorrelation function and sample partial autocorrelation function. Parameter estimation, model identification, prediction. ARMA, ARIMA and SARIMA models. Properties, estimation and examples. ARCH and GARCH models for volatility.

DSC 535: Multivariate Analysis

This course studies topics from multivariate statistical analysis. Topics covered include: random vectors, measures of center and variation in multivariate analysis, independence, multivariate moments. Multivariate normal distribution. Tests for normality. Estimation of the mean vector and the variance-covariance matrix. Wishart and Hotelling distributions. Statistical inference. Union-Intersection Test. Confidence regions. Multivariate analysis of variance and multivariate regression analysis. Least squares method and Wilks distribution. Analysis of covariance. Principal components, Factor analysis, Discriminant analysis, Cluster analysis. The R statistical programming language will be used for applying the introduced methods in a range of Data Science problems.

DSC 536: Bayesian Statistics

This course introduces Bayesian Statistics, an intuitive approach to Statistics allowing for better accounting of uncertainty. Topics include: subjective probability, Bayes rule, prior and posterior distributions, conjugate and non-informative priors, point estimation and credible intervals, hypothesis testing, introduction to Bayesian decision analysis, introduction to empirical Bayes analysis, introduction to Markov chain Monte Carlo techniques. The course will make use of the R statistical programming language for the implementation of algorithms for extracting information from the posterior and for the application of the introduced methods in a range of Data Science problems.

DSC 537: Computational Statistics

This course studies the interplay between Statistics and Computation, an area of vital importance in computer-age Statistics. Topics covered include: multiple regression, Cholesky decomposition, diagnostics and collinearity, principal components and eigenvalue problems. Nonlinear statistical methods: Maximum likelihood estimation, Newton-Raphson and related methods, multivariate data and the Newton-Raphson method, optimization techniques (unconstrained and constrained), EM algorithm. Numerical Integration and Approximation: Newton-Cotes method, spline interpolation, Monte Carlo integration, general approximation methods. Probability Density Estimation: Histogram, linear and non-linear smoothing, splines. Bootstrap.

DSC 538: Special Topics in Statistics

This course explores advanced and emerging topics in the field of statistics, providing students with an in-depth understanding of specialized areas not typically covered in standard statistical curricula. Each offering of the course will focus on a specific theme, which may vary depending on current trends, faculty expertise, and student interest.

Possible topics, based on the current faculty, include but are not limited to: Machine Learning Methods, Advanced Time Series Methods, Network Analysis, Bayesian Nonparametric Statistics and Applications, Statistical Methods in Genomics and Bioinformatics, Nonparametric and Semiparametric Methods, Spatial Statistics, Causal Inference and Survey Sampling (with a focus on privacy).

_Business Analytics Track

DSC 551: Data Visualization

DSC 552: Complex Thinking and Reflective Judgment in Business Analytics

The purpose of this course is to introduce Business Analytics students to complex forms of thinking, which, ultimately, lead to reflective judgment, in the effort to cope effectively with real-world problems, especially ill-structured ones. The course will explore the features of complex thinking, provide the tools for developing it, and suggest ways in which complex thinking leads to reflective judgment.

Thinking becomes complex when it embraces uncertainty, accommodates paradox and allows for the ineffable; is dynamic and flexible; holistic, critical, and reflexive. Judgment becomes reflective when it is evidence-based and context-sensitive, while seeking unifying principles in the service of a good purpose. In this course, reflective judgment will be taken to be tantamount to practical wisdom (Aristotle’s phronesis). To develop reflective judgment, one needs to cope with paradoxes; reduce noise; grow a holistic and dynamic understanding of problems; get a proper grasp of intuition and insight, heuristics and biases; be in tune with one’s emotions; and sharpen critical thinking skills in ways that enable the disclosive framing of problems, namely, bringing to the surface unspoken assumptions, hidden values, and the ethical dimension. Against a background of big data and AI, the course will explore the critical role of embodied knowledge, perception, and ethical clarity for reflective judgment.

DSC 553: Data-driven Project Management

This course examines the project management process with a focus on business analytics techniques to overcome the pitfalls and obstacles that frequently occur during a typical project. It is designed for business leaders responsible for implementing projects, as well as beginning and intermediate project managers. Topics include planning and scheduling issues, costing and budgeting, staffing and organizing, project management methodologies, and the use of data to inform the project manager’s decisions throughout the project’s lifecycle. Project management software will also be presented during the course.

DSC 554: Information Networks

Topics include how to model the formation of social and economic networks; how to understand and measure patterns of real-world networks; and how to identify, quantify and model the way opinions, fads, political movements and diseases spread through interconnected systems, and measure the robustness and fragility of those systems. We will bring together models and techniques from economics, sociology, math, physics, statistics and computer science to answer these questions.

In more detail, the course will include: Review of Statistical Definitions, Background and Network Elements, Networking, Social Networking & Behavioral Contagion, Project Management Networks, Economic complexity, Visualization of Networks.

DSC 555: Prescriptive Analytics and Decision-Making

This course provides a framework for understanding and applying prescriptive analytics (primarily optimization modeling) in various practical settings, preparing students for advanced roles in data-driven and analytical decision-making. Prescriptive analytics is the branch of data science that examines methods and tools for direct support in quantitative decision-making problems. It builds on insights from descriptive and exploratory analysis, and utilizes forecasts from predictive analysis to build integrative models that generate actionable recommendations to achieve desired goals. Thus, its emphasis is on identifying, among the set of feasible solutions (those conforming to functional requirements and constraints), those that optimize specific decision objectives. It is widely used in many application domains such as production, supply chain management, finance, healthcare, etc., to enhance decision-making effectiveness and improve operational efficiency. This course provides a comprehensive exploration of different forms of optimization models and associated solution techniques, emphasizing their applications in practical cases.

DSC 556: Large Language Models (LLMs) in Business Analytics

This course aims to teach students how Large Language Models (LLMs) can improve data analytics and decision-making in different fields. The course covers the basics of LLMs, focusing on how Transformer-based LLMs work and their applications in real-world scenarios. Students will learn to use LLMs effectively, understand their benefits and challenges, and explore their ethical implications. Through case studies and discussions, the course helps students develop critical thinking skills to responsibly implement and evaluate LLMs in various business functions, including Marketing and Sales, Customer Service, and HR.

DSC 557: Special Topics in Business Analytics

Enterprises, organizations and individuals are creating, collecting, and using massive amounts of structured and unstructured data with the goal of converting information into knowledge, improving the quality and efficiency of their decision-making process, and better positioning themselves in the highly competitive marketplace. Data mining is the process of finding, extracting, visualizing and reporting useful information and insights from both small and large datasets with the help of sophisticated data analysis methods. It is part of business analytics, which refers to the process of leveraging different forms of analytical techniques to achieve desired business outcomes through business relevancy, actionable insight, performance management, and value management. The students in this course will study the fundamental principles and techniques of data mining. They will learn how to apply advanced models and software applications for data mining. Finally, students will learn how to examine the overall business process of an organization or a project with the goal of understanding (i) the business context where hidden internal and external value is to be identified and captured, and (ii) exactly what the selected data mining method does.

DSC 558: Financial Theory

The course presents the theory of financial decisions and corporate policy. It covers discounted cash flow and contemporary methods of capital budgeting (comparison of techniques, relevant cash flows, projects with different lives, optimal timing, constraints, inflation), risk and uncertainty, mean-variance portfolio choice, capital asset pricing models and arbitrage pricing theory, efficient markets, capital structure and dividend policy, basic option pricing, corporate restructuring and mergers and acquisitions.

DSC 559: Investments

Possible topics, based on the current faculty, include but are not limited to: Marketing and Web Analytics, A/B Testing, Data Governance, Data Management and Warehousing, Business Intelligence, Ethics and Data Privacy, Human Resources Analytics, and Soft Skills and Communication for Business Analysts.

Free Elective Courses

The programme offers 3 free elective courses, aimed at developing technical skills relevant to Data Science:

DSC 501 Python Crash Course (2 ECTS)
DSC 581 Data Manipulation (4 ECTS)
DSC 582 Data Science Toolbox (2 ECTS)

Additional free elective courses are offered by other entities of the University of Cyprus, e.g. the Department of Law, the Center of Entrepreneurship, etc.

For example, the courses offered by the Center of Entrepreneurship can be found at the following link:

https://www.c4e.org.cy/activities/education-and-training/free-elective-courses

>>Capstone Project

The capstone project has been designed to put knowledge into practice and to develop and improve critical skills such as problem-solving and collaboration. Students are matched with research labs within the UCY community and with industry partners to investigate pressing issues by applying data science. Capstone projects aim to give students some professional experience in a real work environment and help enhance their soft skills.

The process is the following:

Short descriptions of the projects are announced to students.
Students bid on up to three projects, taking into account their fields of interest or research.
The data science directors make the final assignment of projects to students. The projects are under the supervision of a member of the Programme’s academic staff.
Specific learning outcomes are stipulated in a learning agreement between the student, the supervisor and the company.
The student keeps a log file of his/her work and, at the end, writes a progress report (6000 words).
The company is obliged to monitor the progress of the students and to provide relevant mentorship.

Final assessment is carried out by the company and the supervisor.

_Optional Research Project

Students who would like to continue their studies and enrol in a PhD programme have the option of an additional research project. This option will be made available only to exceptional students who have clearly demonstrated research interest during their studies. The Interdepartmental Board of Directors has the discretion to offer this option to specific students, provided appropriate supervisors have been found. Specifically, the research project option will be available to students who want to pursue doctoral studies in any area covered by the programme. Such students need to satisfy the respective departmental PhD admission criteria, while the research project will replace a restricted elective course.
