What is Data Science?
Data is at the core of every domain, from materials science to
healthcare. Mastering big data requires a set of skills spanning a
variety of disciplines, from distributed systems to statistics to
machine learning. This course will provide an overview of the wide
area of data science, with a particular focus on the tools
required to store, clean, manipulate, visualize, model, and
ultimately extract information from large amounts of data.
Overview
- Instructor: Lorenzo De Stefani
- Instructor Office Hours: TBD
- HTA Mailing List: cs1951aheadtas@lists.brown.edu
- Lecture Dates: Mondays & Wednesdays
- Lecture Time: 3:00 - 4:20 PM
- Lecture Location: 85 Waterman Street, Room 130
Topics Covered
- Database Design and SQL
- Web Scraping & Data Cleaning
- Hypothesis Testing
- Machine Learning
- MapReduce
- Differential Privacy
- Correlation vs Causation
Final Project
Throughout the entire course you will be working on a data science
project which seeks to answer an interesting and important
real-world question. You will be collecting your own data,
cleaning it, modeling it, visualizing it, and finally presenting
your results in a poster session at the end of the course. You
will work in groups of four, and will be assigned a mentor TA to
help you through the process.
Additionally, with just a few extra requirements, your project can
be used as a capstone: you will fully integrate what you have
learned in the course and build a fully functional data science
application.
Prerequisites
The formal prerequisite for this course is one of CSCI 0160, 0180, or 0190. Additional experience in software engineering, such as CSCI 0320 or 1320, is recommended. This course is taught in Python 3.7, but no prior experience with the language is necessary. We will provide several resources to help students get started with Python at the beginning of the course. We also suggest that students have experience in statistics (APMA 1650 or CSCI 1450) and linear algebra (MATH 0520, MATH 0540, or CSCI 0530) for the statistics and machine learning portions of this course.