CS 294-105: Empirical Analysis

Fall 2014

Instructor:

Vern Paxson, office hours Mon 3:30PM-4:30PM (737 Soda) and by appointment.

Class Meetings:

Tuesday 3:30-5PM in 320 Soda.

Backup slot: please also hold Thursdays 5-6:30PM. See the schedule below for the backup dates identified so far.

Addresses:

Web page: http://www.icir.org/vern/cs294-105/
Announcements, questions: the class Piazza site, which you sign up for here.
Feel free to email any questions/comments you want to make privately to the instructor at vern@berkeley.edu.

Course Description:

Summary: This class explores techniques and considerations for conducting computer science research rooted in empirical observation. Topics include measurement methodology, meta-data, use of external datasets, assessing data quality, calibration, sampling, statistical summaries, visualization techniques, goodness-of-fit, hypothesis testing, periodicities, non-stationarities, change-points, structuring the analysis process, and presentation of results. Ultimately, the goal is to foster analysis of empirical data that is both sound and illuminating.

Prerequisites: Basic probability/statistics; basic familiarity with networking helpful; active in research; complete the student survey.
Auditors and undergraduates require instructor approval.

Expectations/Grading: Students will "bring their own data" for exploration (via presentation) during class meetings. Ideally, this will come from their current or previous empirical research efforts, but if not, students can instead (or in addition) select empirical studies from the literature or publicly available datasets for presentation and analysis. The number of presentations will depend on the class size, but will not be more than 2 or 3. The course will also include occasional reading and/or data analysis assignments.

1.5 hours of lecture per week. 1-2 units. Grading for 1 unit of credit is based on class presentations, participation/engagement, and writeups for the occasional assignments. For 2 units, students also submit a substantial report (due 2PM Mon Dec 15) detailing a research effort that includes significant empirical analysis, which is graded on the correctness/thoroughness/clarity of the data presentation, and the soundness of the analysis.

Note: this is the first offering of the course, and as such its structure is experimental and subject to change. Feedback on what to improve (and/or what's going well) is appreciated and highly helpful!

Students potentially interested in this course might also want to consider the INFO 290: Exploratory Data Analysis seminar, INFO 271B: Quantitative Research Methods for Information Systems and Management, CS 294-103: Mathematical Foundations of Data Science, and/or CS 194-16: Introduction to Data Science.


Assignments:

Homework #1: due 9PM Wed Sep 3.

Homework #2: due 5PM Mon Sep 15.

Class Presentation: topic and preferred dates due Fri Sep 26.

Homework #3: due 4PM Wed Nov 19.


Schedule:


Tuesday meetings are in 320 Soda.
The locations for Thursday meetings are as noted in the schedule.


Date | Room | Topic | Notes
Thu 8/28 | 320 Soda | Organizational Meeting |
Thu 9/4 | 320 Soda | Data Characterization: Keystrokes | Slides
Tue 9/11 | | No lecture |
Tue 9/16 | | Data Characterization: Keystrokes, cont'd | Slides
Tue 9/23 | | Data Quality: Route Measurements | Slides
Tue 9/30 | | Data Quality: Route Measurements, cont'd | Slides
Tue 10/7 | | Shankari (bicycle usage); Pablo (sensor indications of stress) | Shankari's slides; Pablo's slides
Tue 10/14 | | Brad (malware assessments); Jethro (Heartbleed) | Brad's slides; Jethro's slides
Thu 10/23 | 380 Soda | Paul (click fraud) | (Contact Paul for slides.)
Tue 10/28 | | Joao (Amazon reviews); Frank (Twitter compromise) | Joao's slides; Frank's slides
Tue 11/4 | | No lecture |
Thu 11/13 | 606 Soda | Kristin (edX and CS169); Zack (edX and CS169) | Kristin's slides; Zack's slides
Thu 11/20 | 531 Cory | Neeraja (Hadoop workloads); Lecture on tweet automation analysis | (Contact Neeraja for her slides.); Lecture slides
Tue 11/25 | | Peter (loan performance); Lecture on tweet automation analysis, cont'd | Peter's slides; Corrected slides; Lecture slides
Tue 12/2 | | Allon (drug mechanisms); Sara (meta data exploration) | (Contact Allon for his slides.); Sara's slides
Tue 12/9 | | Sarah (large-scale scraping); Kaifei (Wifi localization) | Sarah's slides; Kaifei's slides
Thu 12/11 | 320 Soda | Mangpo (edX demographics); Grant (ad injection ecosystem) |
Thu 12/11 | | HKN evaluations (last 10 minutes) |