Course objective:
Build modeling and analysis skills for working with genomic data. Topics include
probability theory, parameter estimation, hypothesis testing, genomics, and gene
regulatory networks. The course covers the analysis of molecular and cellular
processes across a hierarchy of scales, including the genetic, molecular, and
cellular levels, and gives exposure to emerging research areas in systems biology.
Prerequisites: CS101, Math231, MCB252
Textbooks:
Reference:
Software:
Grading policies:
Homework 15%, Quizzes 15%, Midterm 30%, Final 40%
All homework is due at the beginning of class on the designated day. No late homework will be accepted.
You are permitted to discuss general aspects of the course materials and assignments with your classmates, but the homework must be your individual effort. You are encouraged to consult sources beyond the textbooks; any outside sources you use must be documented. Grading is based on sample assignments.
Requests to regrade homework or exams must be submitted within 48 hours after the work is returned in class. A written explanation is required with each request.
Exams
Class attendance is important. Those who attend class learn more. Quizzes will be given in class with advance notice.
Course content
Introduction to basic concepts that underlie important applications of probability and statistics to the analysis of genomic data and biomolecular networks:
· Introduction to statistics (Montgomery Chapter 1)
· Probability theory
· Introduction to probability (Montgomery Chapter 2)
· Discrete random variables (Montgomery Chapter 3)
· Continuous random variables (Montgomery Chapter 4)
· Two or more random variables (Montgomery Chapter 5)
· Descriptive statistics (Montgomery Chapter 6)
· Parameter estimation
· Point estimation (Montgomery Chapter 7)
· Hypothesis testing
· One-sample hypothesis testing (Montgomery Chapter 9)
· Two-sample hypothesis testing (Montgomery Chapter 10)
· Simple linear regression (Montgomery Chapter 11)
· Introduction to genome projects: Organization, objectives and technology (Gibson Chapter 1)
· Mapping genomes: genetic maps, physical maps, comparative genomics
· The Human Genome Project
· Animal Genome Projects
· Plant Genome Projects / food and bio-energy
· Gene Expression (Gibson Chapter 4)
· Parallel Analysis of Gene Expression: Microarrays
· Microarray Image Processing
· Visualization
· Cancer Transcriptomics (as an application of two-sample hypothesis testing)
· Integrative genomics and biomolecular networks (Gibson Chapter 6)
· Signal transduction
· Predicting transcription factor binding sites (reinforcing the concepts of the multinomial distribution, independence and likelihood)
· Gene regulatory network (as an application of regression)
Science Breakthrough of the year 2005-2008
Tentative schedule
01/20 Lecture. Notes for using R: Note1, Note2, Note3. HW assignment: explore the R platform. Read
01/22 Lecture.
01/27 Lecture. HW1: Montgomery book: 2-72, 2-74, 2-82, 2-85, 2-89, 2-94, 2-112, 2-119, 2-120. Execute the R scripts in notes 2.1 and 2.2 and attach the R output tables and figures from these runs.
01/29 Lecture. Notes for using R: 2.1, 2.2
02/03 Lecture
02/05 Lecture.
02/10 HW1 due. Lecture
02/12 Lecture
02/17 Quiz. Lecture
02/19 No class.
02/24 Lecture
02/26 Lecture. HW2: Montgomery book: 3-76, 3-100, 4-51, 5-14
03/03 Lecture
03/05 HW2 due. Lecture
03/10 No lecture. In-class Q&A session.
03/12 Midterm exam.
03/17 Lecture
03/19 Lecture
03/24 Spring break
03/26 Spring break
03/30 Lecture
04/02 Lecture
04/07 Lecture
04/09 Quiz. Lecture
04/14 Lecture
04/16 Lecture
04/21 Quiz. Lecture
04/23 Lecture.
HW3:
In one study, Lin et al. (Nature Biotechnology 24(12): 6-7) measured gene expression in five colorectal adenocarcinomas and matched normal colonic tissues. Download the dataset from http://genomics.bioen.uiuc.edu/bioe598/data/colon-cancer.xls.
Perform a t-test between the cancer and the normal samples and identify genes that are either up- or down-regulated in colon cancer. Specify your null hypothesis and alternative hypothesis. Write out the form of the test statistic. With a p-value cutoff of 0.0001, how many genes do you identify? Order the identified genes by p-value. Select one or two of the genes that you have identified. Give the p-value and the p-value rank of each. Search the related literature and comment on why each gene may have been up- or down-regulated in cancer tissues.
Tips:
1. t-tests can be performed with Microsoft Excel; see http://www.wfu.edu/~massd2/T_test.htm for details.
2. If you are tired of looking through PubMed, OMIM can be a good resource to help you interpret some genes.
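The per-gene testing and ranking steps in HW3 can also be scripted. The following is a minimal sketch, not the prescribed method for the assignment. It is written in Python using only the standard library (the course notes themselves use R). It assumes the equal-variance (pooled) two-sample t-test; because the tumor and normal tissues are matched, a paired t-test may be the more appropriate choice, so check which one the assignment intends. The gene names and expression values are made-up toy data, not values from the colon-cancer dataset, and the helper names (t_pdf, two_sided_p, pooled_t_test) are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), via Simpson's rule.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(x, y):
    """Equal-variance two-sample t-test; returns (t statistic, two-sided p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance from the two sample variances
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, two_sided_p(t, df)


# Toy data (assumption): gene -> (cancer values, normal values), five samples each
toy_data = {
    "GENE_A": ([8.1, 8.3, 7.9, 8.2, 8.0], [5.0, 5.2, 4.9, 5.1, 5.0]),
    "GENE_B": ([6.0, 6.3, 5.9, 6.1, 6.0], [6.1, 5.9, 6.0, 6.2, 6.0]),
    "GENE_C": ([3.0, 3.2, 2.9, 3.1, 3.0], [6.0, 6.1, 5.9, 6.2, 6.0]),
}

CUTOFF = 0.0001  # p-value cutoff stated in HW3

results = []
for gene, (cancer, normal) in toy_data.items():
    t, p = pooled_t_test(cancer, normal)
    results.append((gene, t, p))

# Keep genes below the cutoff and order them by p-value
hits = sorted((r for r in results if r[2] < CUTOFF), key=lambda r: r[2])

for rank, (gene, t, p) in enumerate(hits, start=1):
    direction = "up" if t > 0 else "down"
    print(f"{rank}. {gene}: t = {t:.2f}, p = {p:.3g}, {direction}-regulated in cancer")
```

With the real dataset, the toy dictionary would be replaced by the expression values read from the spreadsheet. The p-value here is computed by numerical integration only to keep the sketch dependency-free; a statistics package would normally supply it directly.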
04/28 Reserved for invited talk
04/30 No lecture. In-class Q&A session.
05/05 HW3 due. Final exam.