>>>Master in Data Science

Programme Structure

>>Overview

The Master in Data Science is a highly selective programme for students who want to begin or advance their careers in Data Science.

The programme lasts 1.5 years (90 ECTS) and is taught in English. It offers three tracks: Computer Science, Statistics and Business Analytics. The first two semesters are dedicated to core courses, and students select a track at the end of the second semester. The programme includes the Capstone Project in Data Science, in which students tackle specific, practical problems of an interdisciplinary nature. In this course students engage in all aspects of the lifecycle of data science projects, from process modelling, data extraction, cleaning and validation to data interpretation and visualization. The capstone project begins in the summer term, after the end of the second semester.

>>Course Schedule

First Semester
DSC 510: Introduction to Data Science and Analytics (offered by CS), 8 ECTS
DSC 530: Probability and Statistics for Data Science (offered by MAS), 8 ECTS
DSC 531: Statistical Simulation and Data Analysis (offered by MAS), 8 ECTS
One Free Elective Course (offered by other entities of the University of Cyprus, e.g. Department of Law, Center for Entrepreneurship etc.), 4 ECTS

Second Semester
DSC 511: Big Data Analytics (offered by CS), 8 ECTS
DSC 550: Business Analytics Applications (offered by BUS), 8 ECTS
DSC 532: Statistical Learning (offered by MAS), 8 ECTS
One Free Elective Course (offered by other entities of the University of Cyprus, e.g. Department of Law, Center for Entrepreneurship etc.), 4 ECTS

Summer Semester
Capstone Project in Data Science (1st Phase), 5 ECTS

Third Semester
Computer Science Track / Statistics Track / Business Analytics Track Course, 8 ECTS
Computer Science Track / Statistics Track / Business Analytics Track Course, 8 ECTS
Computer Science Track / Statistics Track / Business Analytics Track Course, 8 ECTS
Capstone Project in Data Science (2nd Phase), 5 ECTS

>>Course Descriptions

Core Courses

DSC 510: Introduction to Data Science and Analytics

This course examines how data analysis technologies can be used to improve decision-making. It covers the fundamental principles and techniques of data science and uses real-world examples and cases to place these techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science. Students will also work hands-on with the Python programming language and its associated data analysis libraries.

DSC 530: Probability and Statistics for Data Science

This is a theoretical course covering fundamental topics of probability and statistics in the context of data science with its inherent challenges. The course starts with a review of fundamental probability, covering topics such as random variables, their distribution functions, expected values, conditioning on events and independence. Students will be acquainted with certain families of probability distributions and will then learn how to estimate quantities of interest from observations. A range of properties of estimators will be studied, including sufficiency, unbiasedness and consistency, which enable the evaluation of their quality, with an emphasis on the framework of big datasets. Students will also learn how to formulate different types of hypotheses, how to construct tests for these hypotheses, how to compare tests, and how to construct confidence intervals for their estimators.

DSC 531: Statistical Simulation and Data Analysis

The students will be introduced to the R programming language, a programming language that was specifically developed for analyzing data, and is today widely used in most organizations that conduct data analysis. The students will learn how to explore datasets in R, using basic visualization tools and summary statistics, how to run different kinds of regressions and analyses, and how to perform statistical inference in practice, for example how to test certain hypotheses regarding the data or how to compute confidence intervals for quantities of interest. The students will also learn how to use R in order to conduct simulations, an extremely useful tool that can fulfill a wide range of analytical tasks. Simulation techniques covered will include Monte Carlo, importance sampling and rejection sampling. Finally, the students will learn how to estimate the precision of computed sample statistics using resampling methods. The course uses a hands-on approach, with nearly half the work done in the lab.
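As an illustration of two of the techniques named above, Monte Carlo simulation and resampling, here is a minimal sketch. It is written in Python for illustration only (the course itself uses R), and the function names, the choice of estimating pi, the sample sizes and the seed are all assumptions made for this example, not course material.

```python
import random
import statistics


def monte_carlo_pi(n_draws, rng):
    """Estimate pi from the share of uniform points that fall inside
    the quarter of the unit circle lying in the unit square."""
    inside = sum(
        1 for _ in range(n_draws)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_draws


def bootstrap_se(sample, statistic, n_resamples, rng):
    """Estimate the standard error of `statistic` by recomputing it
    on resamples drawn with replacement from `sample`."""
    replicates = [
        statistic(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(replicates)


rng = random.Random(0)  # fixed seed (an assumption) for reproducibility

# Monte Carlo: the estimate approaches pi as the number of draws grows.
pi_hat = monte_carlo_pi(100_000, rng)

# Bootstrap: precision of the sample mean of 200 simulated observations
# drawn from a normal distribution with mean 10 and standard deviation 2.
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
se_hat = bootstrap_se(sample, statistics.mean, 500, rng)

print(f"pi estimate: {pi_hat:.4f}")
print(f"bootstrap SE of the mean: {se_hat:.4f}")
```

For this simulated sample the bootstrap standard error should be close to the theoretical value 2 / sqrt(200), roughly 0.14.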

DSC 511: Big Data Analytics

This course seeks a balance between foundational but relatively basic material in algorithms, statistics, graph theory and related fields, and real-world applications inspired by the current practice of internet and cloud services. Specifically, this course will look at social and information networks, recommender systems, clustering and community detection, search/retrieval/topic models, dimensionality reduction, stream computing, and online ad auctions. Together, these provide good coverage of the main uses of data mining and analytics applications in social networking, e-commerce, social media, etc. The course combines theoretical material with weekly laboratory sessions, in which several large-scale real-world datasets will be explored. For this, students will work with a dedicated infrastructure based on Hadoop and Apache Spark.

DSC 550: Business Analytics Applications

This course presents knowledge and skills for applying business analytics to managerial decision-making in modern organizations. Key topics include descriptive, predictive, and prescriptive analytics, measuring the economic value of information in analytics investments, and using data to improve decision making under risk and uncertainty. Specifically, students will learn how to use data and analysis to make better decisions across different functional areas of the organization.

DSC 532: Statistical Learning

Students will acquire the knowledge to conduct statistical analysis on a variety of datasets using a wide range of modern computerized methods. The students will learn how to recognize which tools are needed to analyze different types of datasets, how to apply these tools in each case, and how to employ diagnostics to assess the quality of their results. They will learn about statistical models, their complexity and their relative benefits depending on the available data. Some of the tools that will be discussed include simple and multiple linear regression, nearest-neighbor methods, shrinkage methods (ridge, lasso), dimension reduction methods (principal components), logistic regression, linear discriminant analysis, tree-based methods, model selection algorithms and clustering. The focus of the course will be less on theory and more on providing the students with as much intuition as possible and acquainting them with as many methods as possible. The course will make substantial use of the R statistical programming language and its libraries.

Elective Courses of Specializations

_Computer Science Track

DSC 512: Information Retrieval and Search Engines

This course covers search engine technologies, which play an important role in any data mining application involving text data. Key topics include Boolean retrieval; text encoding: tokenisation, stemming, lemmatisation, stop words, phrases; dictionaries and tolerant retrieval; index construction and compression; scoring and term weighting; vector space retrieval; evaluation in information retrieval; relevance feedback/query expansion; text classification and Naive Bayes; vector space classification; data clustering; web crawling and indexes; link analysis.
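To give a flavour of the first topics listed (tokenisation, stop words, index construction and Boolean retrieval), the sketch below builds a tiny inverted index and answers an AND query. It is an illustrative assumption rather than course material: the documents, the stop-word list and the function names are invented for this example, and the tokeniser only lowercases and splits on whitespace, with no stemming or lemmatisation.

```python
from collections import defaultdict

# A toy stop-word list (an assumption made for this example).
STOP_WORDS = {"the", "a", "of", "and", "in"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection.
docs = {
    1: "the theory of data mining",
    2: "mining text data in search engines",
    3: "web search and link analysis",
}

index = build_index(docs)
hits = boolean_and(index, "data", "mining")
print(sorted(hits))  # documents 1 and 2 contain both terms
```

Real search engines store postings as sorted, compressed lists rather than Python sets, which is what the index construction and compression topic addresses.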

DSC 513: Advanced Topics in Data Management

This course covers the fundamentals of modern Database Management Systems (DBMSs). Key topics include storage, indexing, query optimization, transaction processing, concurrency and recovery. The course also covers the fundamentals of Distributed DBMSs, Web Databases and Cloud Databases (NoSQL / NewSQL): semi-structured data management (XML/JSON, XPath and XQuery), document data-stores (e.g., CouchDB, MongoDB, RavenDB), key-value data-stores (e.g., BerkeleyDB, Memcached), an introduction to Cloud Computing (NFS, GFS/Hadoop HDFS, replication/consistency principles), big-data processing/analytic frameworks (Apache MapReduce/Pig, Spark/Shark), column-stores (e.g., Google’s BigTable, Apache’s HBase, Apache’s Cassandra), graph databases (e.g., Twitter’s FlockDB) and an overview of NewSQL (Google’s Spanner/F1). Further topics are spatio-temporal data management (trajectories, privacy, analytics) and index structures (e.g., R-Trees, Grid Files), as well as other selected and advanced topics, including embedded databases (SQLite), sensor / smartphone / crowd data management, energy-aware data management, flash storage and stream data management.

DSC 514: Natural Language Processing

This course covers topics and technologies related to Natural language processing (NLP). NLP is one of the most important technologies of the information age, and a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, medical reports, etc. In this course, several models and algorithms for automated textual data processing will be described: (1) morpho-lexical level: electronic lexica, spelling checkers; (2) syntactic level: regular, context-free, stochastic grammars, parsing algorithms; (3) semantic level: models and formalisms for the representation of meaning. Several application domains will be presented: Linguistic engineering, Information Retrieval, Text mining (automated knowledge extraction), Textual Data Analysis (automated document classification, visualization of textual data).

DSC 515: Deep Learning

This course covers the latest techniques in deep learning, focusing on regularization, optimization algorithms, convolutional networks, sequence modeling, and the internals of embedding methods, with applications to computer vision and natural language understanding. The course will offer students the conceptual background, deep learning techniques used in industry, and research perspectives.

DSC 516: Cloud Computing

This course covers topics and technologies related to Cloud Computing and their practical implementations. The course is organized in four parts focusing on: (i) fundamental concepts and models of Cloud Computing; (ii) cloud-enabling technologies: warehouse-scale machines, virtualization, and storage; (iii) cloud application programming models and paradigms; and (iv) cloud resource orchestration, monitoring, and DevOps. Students will explore different architectural and service models of cloud computing and the concepts of virtualization, containerization, and cloud orchestration. Through lectures, tutorials, and laboratory sessions, they will gain hands-on experience with various features of popular cloud platforms, such as OpenStack, VMware, Docker, and Kubernetes, as well as commercial offerings like Google App Engine, Microsoft Azure and Amazon Web Services. Advanced cloud programming paradigms such as Hadoop’s MapReduce and microservices are also included in the course. Students will also learn the concepts of modern Big Data analysis on cloud platforms using various data mining tools and techniques. The lab sessions will cover cloud application development and deployment, use of cloud storage, creation and configuration of virtual machines, and data analysis on the cloud using data mining tools. Different application scenarios from popular domains that leverage cloud technologies, such as online social networks, will be explained. The theoretical knowledge, practical sessions and assignments aim to help students build the skills to develop large-scale, industry-standard applications using cloud platforms and tools.

DSC 517: Data Security

Data is often processed by systems that operate under hostile conditions, where adversaries try to monetize access to sensitive data. In this course we provide a short introduction to data security and review the basic arsenal available for protection. We cover a large portion of the applied cryptographic primitives and protocols that facilitate secure transmission of data. We then review how systems that process data can be attacked and protected. Finally, we discuss advanced attacks on, and potential defenses for, systems that are based on Machine Learning.

DSC 551: Data Visualization

Introduction to data visualization, Web development, JavaScript, Data-driven documents (D3.js), Interaction, filtering, aggregation, Perception, cognition, Designing visualizations (UI/UX), Text visualization, Graphs, Tabular data visualization, Music visualization, Introduction to scientific visualization, Storytelling with data / data journalism, Creative coding.

_Statistics Track

DSC 533: Survey Sampling

This course studies the process of conducting surveys. Topics that will be discussed include survey design, sampling and nonsampling errors, simple random sampling, stratified sampling, systematic sampling, cluster sampling, ratio estimators, regression estimators, determination of optimal sample size, bias in survey sampling and modern techniques of survey sampling.

DSC 534: Time Series Analysis

This course studies the analysis of time series, that is, temporal stochastic processes. Topics covered include: stochastic processes, weak and strong stationarity. Autoregressive and moving-average based models for stationary and non-stationary time series. Trend and seasonal behaviour, sample autocorrelation function and sample partial autocorrelation function. Parameter estimation, model identification, prediction. ARMA, ARIMA and SARIMA models. Properties, estimation and examples. ARCH and GARCH models for volatility.

DSC 535: Multivariate Analysis

This course studies topics from multivariate statistical analysis. Topics covered include: random vectors, measures of center and variation in multivariate analysis, independence, multivariate moments. Multivariate normal distribution. Tests for normality. Estimation of the mean vector and the variance-covariance matrix. Wishart and Hotelling distributions. Statistical inference. Union-intersection test. Confidence regions. Multivariate analysis of variance and multivariate regression analysis. Least squares method and Wilks distribution. Analysis of covariance. Principal components, Factor analysis, Discriminant analysis, Cluster analysis. The R statistical programming language will be used for applying the introduced methods in a range of Data Science problems.

DSC 536: Bayesian Statistics

This course introduces Bayesian Statistics, an intuitive approach to Statistics allowing for better accounting of uncertainty. Topics include: subjective probability, Bayes' rule, prior and posterior distributions, conjugate and non-informative priors, point estimation and credible intervals, hypothesis testing, introduction to Bayesian decision analysis, introduction to empirical Bayes analysis, introduction to Markov chain Monte Carlo techniques. The course will make use of the R statistical programming language for the implementation of algorithms for extracting information from the posterior and for the application of the introduced methods in a range of Data Science problems.

DSC 537: Computational Statistics

This course studies the interplay between Statistics and Computation, an area of vital importance in computer-age Statistics. Topics covered include: multiple regression, Cholesky decomposition, diagnostics and collinearity, principal components and eigenvalue problems. Nonlinear statistical methods: maximum likelihood estimation, Newton-Raphson and related methods, multivariate data and the Newton-Raphson method, optimization techniques (unconstrained and constrained), EM algorithm. Numerical integration and approximation: Newton-Cotes methods, spline interpolation, Monte Carlo integration, general approximation methods. Probability density estimation: histogram, linear and non-linear smoothing, splines. Bootstrap.
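As a small illustration of the Newton-Raphson topic above, the sketch below maximizes a Poisson log-likelihood by iterating on its first derivative (the score) and second derivative. The data, the starting value and the function names are assumptions made for this example. The Poisson model was chosen because its maximum likelihood estimate has a closed form, the sample mean, so the iterative answer can be checked directly.

```python
def newton_raphson(score, hessian, start, tol=1e-10, max_iter=50):
    """Find a root of `score` by Newton-Raphson steps:
    theta <- theta - score(theta) / hessian(theta)."""
    theta = start
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical count data, assumed to follow a Poisson(lam) distribution.
counts = [2, 3, 1, 4, 2, 3, 2, 3]
n, total = len(counts), sum(counts)


def score(lam):
    """First derivative of the Poisson log-likelihood with respect to lam."""
    return total / lam - n


def hessian(lam):
    """Second derivative of the Poisson log-likelihood with respect to lam."""
    return -total / lam ** 2


lam_hat = newton_raphson(score, hessian, start=1.0)
print(lam_hat, total / n)  # the iterative estimate and the sample mean
```

The starting value matters: for this likelihood the iteration converges only from a start between 0 and twice the sample mean, which is one reason such courses also treat related, more robust methods.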

_Business Analytics Track

DSC 551: Data Visualization

Introduction to data visualization, Web development, JavaScript, Data-driven documents (D3.js), Interaction, filtering, aggregation, Perception, cognition, Designing visualizations (UI/UX), Text visualization, Graphs, Tabular data visualization, Music visualization, Introduction to scientific visualization, Storytelling with data / data journalism, Creative coding.

DSC 552: Managing Business Processes with Information Systems & Analytics

This course provides students with the key tools to analyze and improve business processes in organizations, with an emphasis on the service sector. This is achieved by bringing together key ideas from the fields of information systems, business analytics, and business process design and management. The course introduces the fundamental types of information systems, including enterprise-wide systems (ERP, SCM, CRM), and the basic principles of supporting business strategy with Information Systems. The students will learn how to use information systems to support their organization’s business processes, and how to use business analytics and business process modeling techniques to inform key decisions during Business Process Re-engineering. The students will be introduced to different business analytics systems in fields such as marketing, retail, supply-chain management and e-commerce, and will learn how to measure business process performance through appropriate metrics and frameworks (e.g. the Balanced Scorecard approach).

DSC 553: Project Management using Analytical Tools

This course examines the project management process with a focus on business analytics techniques to overcome the pitfalls and obstacles that frequently occur during a typical project. It is designed for business leaders responsible for implementing projects, as well as beginning and intermediate project managers. Topics include planning and scheduling issues, costing and budgeting, staffing and organizing, project management methodologies, and the use of data to inform the project manager’s decisions throughout the project’s lifecycle. Project management software will also be presented during the course.

DSC 554: Information Networks

Topics include: how to model the formation of social and economic networks; how to understand and measure patterns of real-world networks; how to identify, quantify and model the way opinions, fads, political movements and diseases spread through interconnected systems; and how to measure the robustness and fragility of such systems. We will bring together models and techniques from economics, sociology, mathematics, physics, statistics and computer science to answer these questions.

In more detail, the course will include: Review of Statistical Definitions, Background and Network Elements, Networking, Social Networking & Behavioral Contagion, Project Management Networks, Economic Complexity, Visualization of Networks.

DSC 555: Quantitative and Qualitative Decision-Making

This course explores decision making and policy formulation in organizations. Topics include goal setting and the planning process, rational models of decision making, effective combination of qualitative and quantitative data (e.g. triangulation, complementarity) with respect to the goal set, evaluation of alternatives, prediction of outcomes, cost-benefit analysis, decision trees, uncertainty and risk assessment, and procedures for evaluating outcomes.

DSC 556: Web Analytics for Business

The course explores web analytics, text mining, web mining, and practical application domains. The web analytics part of the course studies the metrics of websites, their content, user behavior, and reporting. The Google Analytics tool is used to collect and analyze website data. The text mining module covers the analysis of text, including content extraction, string matching, clustering, classification, and recommendation systems. The web mining module presents how web crawlers process and index the content of websites, how search works, and how results are ranked. Application areas such as mining the social web and game metrics will be extensively investigated.

DSC 557: Data Mining for Business Analytics

Enterprises, organizations and individuals are creating, collecting, and using massive amounts of structured and unstructured data with the goal of converting the information into knowledge, improving the quality and efficiency of their decision-making process, and better positioning themselves in the highly competitive marketplace. Data mining is the process of finding, extracting, visualizing and reporting useful information and insights from both small and large datasets with the help of sophisticated data analysis methods. It is part of business analytics, which refers to the process of leveraging different forms of analytical techniques to achieve desired business outcomes by requiring business relevancy, actionable insight, performance management, and value management. The students in this course will study the fundamental principles and techniques of data mining. They will learn how to apply advanced models and software applications for data mining. Finally, students will learn how to examine the overall business process of an organization or a project with the goal of understanding (i) the business context where hidden internal and external value is to be identified and captured, and (ii) exactly what the selected data mining method does.

DSC 558: Financial Theory

The course presents the theory of financial decisions and corporate policy. It covers discounted cash flow and contemporary methods of capital budgeting (comparison of techniques, relevant cash flows, projects with different lives, optimal timing, constraints, inflation), risk and uncertainty, mean-variance portfolio choice, capital asset pricing models and arbitrage pricing theory, efficient markets, capital structure and dividend policy, basic option pricing, corporate restructuring and mergers and acquisitions.

DSC 559: Investments

The course covers the basic principles of investment analysis and valuation, with emphasis on security analysis and portfolio management in a risk-return framework. Security analysis focuses on whether an individual security is correctly valued in the market (i.e., it is the search for mispriced securities). Portfolio management deals with efficiently combining securities into a portfolio tailored to the investor’s preferences and monitoring/evaluating the portfolio. The course covers both the theory and practical aspects of investments.

Free Elective Courses

The free elective courses are offered by other entities of the University of Cyprus, e.g. Department of Law, Center for Entrepreneurship etc.

For example, the courses offered by the Center for Entrepreneurship can be found at the following link:

https://www.c4e.org.cy/activities/education-and-training/free-elective-courses

>>Capstone Project

The capstone project has been designed to put knowledge into practice and to develop and improve critical skills such as problem-solving and collaboration. Students are matched with research labs within the UCY community and with industry partners to investigate pressing issues by applying data science methods. Capstone projects aim to give students professional experience in a real work environment and to help enhance their soft skills.

The process is the following:

  • Short descriptions of the projects are announced to students.
  • Students bid on up to three projects, taking into account their fields of interest or research.
  • The data science directors make the final assignment of projects to students. The projects are under the supervision of a member of the Programme’s academic staff.
  • Specific learning outcomes are stipulated in a learning agreement between the student, the supervisor and the company.
  • The student keeps a log file of his/her work and at the end writes a progress report (6000 words).
  • The company is obliged to monitor the progress of the students and to provide relevant mentorship.

Final assessment is carried out by the company and the supervisor.

_Optional Research Project

Students who would like to continue their studies and enroll in a PhD programme have the option of an additional research project. This option will be made available only to exceptional students who have clearly demonstrated research interest during their studies. The Interdepartmental Board of Directors has the discretion to offer this option to specific students, provided appropriate supervisors have been found. Specifically, the research project option will be available to students who want to pursue doctoral studies in any area covered by the programme. Such students need to satisfy the respective departmental PhD admission criteria, and the research project will replace a restricted elective course.