CS 294-105: Homework #3 - Due 4PM Wed Nov 19


Complete this assignment by 4PM Wednesday Nov 19. Turn in your answers via email to vern@cs.berkeley.edu with the term Homework in the Subject.

This assignment is meant to lay the groundwork for some class discussion about analysis techniques. It is fine for you to spend only a modest amount of time on it. Those of you either (1) presenting on Nov 20, or (2) putting together draft materials to post on Piazza on Nov 18 for your presentation the following week, can skip this assignment if you wish, given that I didn't provide much advance notice that it would be forthcoming. (Please send me a note if you're doing this so I can track it correctly.)

  1. Download the Tweet timings dataset. This dataset consists of the timestamps of tweets sent by about 500 Twitter users, organized per-user.

  2. Briefly characterize the dataset and assess quality issues.

  3. The analysis question we would like to go after with this dataset is: To what degree can we confidently state whether some of the accounts reflect automated posting, rather than a user tweeting their thoughts as they compose them? (This question arose in the broader context of how much of Twitter's population is in fact robots rather than humans.)

    Briefly write up your thoughts on what sort of analyses you could undertake to assess this question. (Feel free to scope the effort as you see fit.) Include discussion of potential difficulties you foresee, and what other data/meta-data you wish the dataset included, if any. Be prepared to talk about your approach at the Nov. 20 class meeting.

  4. Optional: Present results (as far as you wish to pursue them) and conclusions developed by your analyses.
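
As one possible starting point for item 3 (an illustrative sketch only, not a prescribed method), the spacing between a user's consecutive tweets can be summarized as below. The input format is an assumption: the handout does not specify the file layout, so the sketch takes a plain list of Unix timestamps in seconds for a single user. The function names are hypothetical. A coefficient of variation near zero means nearly constant gaps, which may hint at scheduled posting, but it is not conclusive on its own.

```python
import statistics


def interarrival_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive tweets.

    Assumes `timestamps` is a list of Unix times in seconds for one user;
    the real dataset may need parsing into this form first.
    """
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def regularity_summary(timestamps):
    """Summarize how regular one user's tweet spacing is.

    Returns None when there are too few tweets to compute a spread.
    """
    gaps = interarrival_gaps(timestamps)
    if len(gaps) < 2:
        return None
    mean_gap = statistics.mean(gaps)
    # Coefficient of variation: spread of the gaps relative to their mean.
    # Undefined when every gap is zero (all timestamps identical).
    cv = statistics.pstdev(gaps) / mean_gap if mean_gap > 0 else None
    return {
        "tweets": len(timestamps),
        "median_gap": statistics.median(gaps),
        "cv": cv,
    }


# Made-up example data, not taken from the dataset.
hourly = regularity_summary([0, 3600, 7200, 10800])
bursty = regularity_summary([0, 60, 4000, 4100, 90000])
```

Here `hourly` has identical gaps, while `bursty` mixes short and long gaps. Duplicate or out-of-order timestamps would distort these numbers, so any quality issues found in item 2 are worth handling first.
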

(As always, I welcome feedback on this assignment.)