CS294-28 Internet/Network Security Projects

General information

Your term project should address a research issue in network security, interpreted broadly (it need not be a topic discussed in class). The goal in terms of depth and quality is to develop the effort to a degree that at least would merit a workshop-caliber publication. Most projects will fall into one of the following general categories:

  1. Analyze. Undertake a substantive analysis/assessment of security issues for a given network system. For example, to what degree does Skype expose its users to remote compromise? Preserve their privacy? Admit misuse of the system to aid in denial-of-service attacks? What is its trust model? What steps could be taken to strengthen Skype in this regard? What can you say about the expected efficacy of those steps? (Note: it needn't be an application nor involve end systems; you can consider schemes relevant to other layers of the networking stack, or that concern infrastructure/internal components.)
  2. Measure. Empirically explore and characterize a network security issue. For example, under what circumstances and to what degree do nodes in the Tor anonymizing network alter the content that passes through them?
  3. Innovate. Devise and analyze (or possibly implement) a new mechanism or technique. For example, this could be a new way to protect servers from application-level denial-of-service attacks, or a new detector for some type of malicious activity.
  4. Test. Take a result in the literature and undertake a thoughtful and meaningful reproduction of it to assess to what degree you obtain the same results, and why.
  5. Attack. Develop a new threat. Assess its efficacy, countermeasures/defenses, and likely "arms race" evolution.
  6. Research. Conduct a deep, thoughtful literature survey of a particular area in network security ("research" as a verb). Assess the strengths and weaknesses of the published results in the area, delimit the boundaries of the state of the art, identify themes and abstractions, frame avenues for future work.

I encourage you to find a topic of interest to you; feel free to be creative in selecting a project topic. You're welcome to pick a topic that is connected to your current research, and I'm happy to discuss possible topics with you in advance. See below for a list of some possible ideas (just meant as grist). Often you can pursue the same project jointly for two different classes. If this would be the case, you need to discuss it first both with me and with the other instructor(s).

Preferably you should work in a team of two, though individual projects are okay too. Team projects will be held to a somewhat higher standard. If you want to work in a team larger than two, first talk with me about why this is appropriate and how the work will be divided.

The process

(Note that the following dates are at this point tentative.)

  1. Write a concise (approximately 1 page) project proposal that clearly states the problem you will be tackling, the key challenges for new research, and your plan of attack (including milestones and dates). If there are any special resources you might need, flag these. Mention any relevant papers of which you are already aware.

    The project proposal is due the evening of Friday Sep 17.

  2. As part of turning in the project proposal, schedule a meeting with me to discuss your idea.
  3. Put together a related work writeup. This writeup should reflect a solid grounding in the literature relevant for your project, written in a style similar to the related work sections in the papers we've been reading. For each item of previous related work, briefly discuss the contributions of the paper, its relevance to your undertaking, and (if appropriate) in what ways it differs from your effort.

    In general, you can tell whether your related work framing is possibly too narrow by looking at the citations in the papers you currently discuss. If you see that they cite many more works that, at least from their titles, sound like they could be germane, then it's your task as a researcher to track those down - ideally, all of the ones that sound like they could be relevant - and assess which ones you indeed need to read and absorb. Note that read-and-absorb here can range from reading in detail, similar to how you read papers for the class, to just reading selected sections, as you gauge relevance.

    You then recurse on the citations in those papers, repeating the process until you converge by not finding any new papers, and/or the ones you find become only lightly related.

    At this point, you've then mastered the full literature on the area you're working in (and usually gotten a bunch of new ideas about what to try or, often more important, not try).

    When gathering these related papers, you may run across some that require payment through portals such as those run by ACM or IEEE. Note that UCB has site licenses for most of these libraries, so you should be able to readily fetch them using a campus machine/address without needing to provide payment.

    The related work writeup is due the evening of Friday Oct 15.

  4. Write up a short status report explaining what work you have completed, what remains, and any open issues (such as problems you haven't figured out how to solve or additional resources you require). Begin your report with a sketch of your project so I'm reminded of the context while reading it.

    The status report writeup is due the evening of Monday Nov 8.

  5. As part of turning in the status report writeup, also schedule a meeting with me to discuss your report.
  6. Prepare a class presentation. These will be on Wed Dec 1, Fri Dec 3, and Mon Dec 6 (note special time). 24 hours prior to the class in which you'll be presenting, mail out a brief (~1-2 paragraphs) description of your project to the class mailing list.

    There's an art to scoping a presentation to effectively make use of the available time. You need to gauge what context your particular audience (here, this means your classmates) already has regarding the problem space your work addresses, and not spend time developing that broader context; at most, just remind them. However, it will (better!) be the case that your particular area has depth beyond what the average audience member knows about. You do need to frame this additional context, both in terms of what makes the problem interesting and significant, and how the problem space has been previously viewed in terms of prior work and the assumptions this work reflects.

    Note: depending on class size, it's possible that instead of presentations we will have a poster session. I will determine which of these well in advance of the presentation dates.

  7. Finally, your project report is due on Monday Dec 13, at 1PM. No extensions will be granted.
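
The related-work search described in step 3 (follow citations, then recurse until nothing new and relevant turns up) is in effect a bounded traversal of the citation graph. The following is only a minimal sketch of that idea, not a prescribed tool: `citations_of` and `is_relevant` are hypothetical callables standing in for looking up a paper's reference list and for your own judgment of its title or abstract, and `max_depth` is an assumed cutoff for when papers become only lightly related.

```python
from collections import deque


def related_work_closure(seeds, citations_of, is_relevant, max_depth=3):
    """Breadth-first walk of the citation graph starting from seed papers.

    citations_of: hypothetical callable mapping a paper id to the ids it cites.
    is_relevant:  hypothetical callable standing in for your own judgment
                  (for example, from reading the title or abstract).
    max_depth:    assumed cutoff on how many citation hops to follow.
    """
    seen = set(seeds)
    queue = deque((paper, 0) for paper in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # Too far from the seeds; do not expand further.
        for cited in citations_of(paper):
            if cited in seen or not is_relevant(cited):
                continue
            seen.add(cited)
            queue.append((cited, depth + 1))
    # The loop ends once no new relevant papers are found (convergence).
    return seen
```

In practice the relevance check is the part that takes real effort; the traversal itself only guarantees that you do not revisit papers and that you stop once the frontier is empty.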

The final report

You are expected to write a technical paper, in the style of a conference submission, on the research you have done. State the problem you're addressing, motivate why it is an important or interesting problem, present your research thoroughly and clearly, compare to any related work that may exist, summarize your research contributions, and draw whatever conclusions may be appropriate. There is no page limit (either minimum or maximum), but reports will be evaluated on technical content and not on length.

Here are some pointers regarding writing technical papers:

Please submit either HTML or PDF, via email attachment. I generally review papers from hardcopy, so it needs to print clearly and with sufficiently large text and figures. If you use color figures, mention that in your cover note so I can send it to an appropriate printer.

Some possible ideas

Here are a number of project ideas, some fairly specific and others more general. They are meant to stimulate your thinking and you don't have to select one. Some of them have particular considerations noted in italics.

  1. To what degree can you assess the accuracy of blacklist feeds (bad IP addresses, URLs, domains, or such)? How effective/evadable are they? To what degree are different feeds redundant?
  2. Spammers have been found to sometimes hijack BGP address blocks in order to briefly send from someone else's address space. How prevalent is this activity today?
  3. Work out an architecture for providing the Internet (or a future version of it) with solid attribution properties, while also preserving privacy when not in conflict with legal requirements.
  4. How well can you detect web-based attacks using network monitoring? Build detectors for attacks such as XSS, CSRF, or SSL stripping and implement them for the real-time Bro system developed by my research group. Assess detector efficacy in terms of false positives and false negatives. As the detectors mature, I can provide results of running them against large, live traffic streams.
  5. How serious is the problem of blog spam? What might be done to detect it?
  6. Is robots.txt actually honored? It seems it can't be, as it would otherwise provide a very easy way for malicious web sites to avoid inspection from folks like Google or Bing. If it isn't honored, can you still identify crawlers?
  7. To what extent can you fingerprint individual users by the timing of their typing/packets during interactive network sessions (such as logging in to a remote site via SSH)? This would be a continuation of a project begun by a previous student. I can provide an extensive dataset of hundreds of users typing over Telnet sessions, for which ground truth is available.
  8. Analyze the UCSD/CAIDA "backscatter" data to characterize how often a DDoS attack results in ISPs removing connectivity, based on a change from observing RSTs/SYN-ACKs to ICMP Unreachables. How long does it take ISPs to "pull the plug"? Along with the backscatter data, UCSD also has a trace of a DDoS attack as seen by the target, which could be analyzed instead or in addition.
  9. Flushing out illicit snooping: if you mention a URL in a supposedly private context (such as an anonymous Tor circuit, or in email sent via GMail, or an IRC chat), does one of the parties facilitating the communication (e.g., Tor exit node; Google; the IRC server operator) ever investigate the mention? This project has some risk of producing only negative results. However, a positive result would make a big splash. Thus, it would behoove one to make some up-front measurements to assess viability. Also, this project might be pursued working in collaboration with Prof. Stefan Savage of UCSD.
  10. When monitoring a site's access link, usually you expect to only see outgoing DNS requests from the site's internal name servers, which the site's hosts are supposed to use. If you see a lookup coming directly from an internal host, it may reflect malware that has reconfigured a system to use an external resolver ... or it might just reflect a misconfiguration. How can we determine if it reflects a problem? This would be a continuation of a project my group is pursuing, where the notion is to leverage a large list of open resolvers to determine whether results returned for such lookups likely reflect localization, or malice.
  11. To what degree can DNS registry information (e.g., "whois" records) be used to infer how dangerous a given address/domain is likely to be? This project would be in collaboration with a postdoc working in my research group, and a continuation of an existing effort.
  12. Study the phenomenon of email spam that attempts to recruit "mules" for laundering money and/or fraudulently purchased products. What can you determine about the different recruiters and the patterns of interest evinced in the recruiting messages? This would be working with a CS294-28 student, Albert Kim, who has already pursued groundwork for this project with my research group. One angle you could pursue here would be to use natural language technology to build a system that can construct numerous replies to spam emails such that they appear to be from different individuals.
  13. How has network scanning evolved over the past 15 years? What about use of services, and to what degree can service "flux" be used to spot malicious services (such as newly installed backdoors)? For this project I can provide mediated access to a very large longitudinal dataset of connections seen at the Lawrence Berkeley National Laboratory.
  14. Design a JavaScript rewriter - an in-path network element or a browser prefilter that modifies JavaScript transferred in Web items. Evaluate the degree of protection it can provide to browsers versus semantic distortions it introduces. Members of my research group are interested in collaborating on this project.
  15. Securing "mediated" trace analysis: a major problem in network security research is obtaining access to realistic traffic traces. One paradigm for enabling such access is via "mediation", i.e., the researcher sends their analysis program to a data-holder, who runs it on behalf of the researcher and returns the results. How can we secure this process so that the data-holder can be confident that the results do not leak sensitive information? This would be a continuation of a research project my group has pursued, which grew out of a previous class project. There's a HotNets 2009 paper outlining how far we got.
  16. Construct a web "backtrace": given a traffic trace in which a user arrives at a malicious URL, determine how they got there. Some of this is straightforward (recording Referer chains); it may get more difficult with stealthier redirection mechanisms.
  17. Explore the "traffic delivery business" where you can purchase "eyeballs" to visit your Web site. Do the sellers of this service actually deliver on increased visits? Where do the visitors come from? Are they humans or bots? How did the seller spur them to visit? This project has already been partially undertaken by a student at UCSD.
  18. Build a detector for traffic injection (e.g., DNS or ARP spoofing) and run it as widely as you can. What do you find? This project contains some risk, namely you may wind up with a completely negative result - no evidence of injection.
  19. A significant problem in network security monitoring is grappling with the large number of application protocols. Simply understanding their workings is currently a lengthy manual process. Employ forms of network protocol inference or binary execution analysis to automate elements of extracting the workings of unknown protocols. If done using network monitoring, this project would be in conjunction with members of my research group. If done using binary execution analysis, then with Prof. Dawn Song's group.