BIOE505: Computational Bioengineering

Course objective:

Presents mathematical and statistical models together with their accompanying computational techniques that are central to many aspects of systems biology and bioengineering research. Topics include: theory of supervised and unsupervised learning; linear models; dimension reduction; Monte Carlo computation; analysis of gene expression data and genome sequence data; modeling of gene transcription networks and signaling pathways.

Textbook:

Bioinformatics. Springer 2007

Schedule


Logistics:

Meeting time: Fall 2009, Tuesdays and Thursdays, 9:00am-10:50am

Meeting place: 3211 Digital Computer Lab

Credits: 4 graduate hours. Required for all bioengineering PhD students.

Course Reference number: CRN 54270 

Instructor: Sheng Zhong (szhong AT uiuc DOT edu)


Prerequisites: STAT400 or equivalent


Evaluation:

Course grade is based on homework (50%), an in-class presentation (25%), and a final project (25%).


Contents:

I. (4 hrs) Overview of recent technology developments & large-scale measurements of biological data

II. (9 hrs) Fundamentals of probability and statistics

          a) Set theory                                                                            

          b) Independence, conditional probability and Bayes' rule

          c) Random variables                                                                 

          d) Expectation and moments                                                      

          e) Discrete distributions: Binomial, Geometric, Multinomial 

          f) Continuous distributions: Normal, Exponential                         

          g) Case study: Modeling DNA motif with product-multinomial distribution

 

III. (9 hrs) Parameter estimation & Expectation-Maximization method

a) Likelihood maximization                                                      

b) EM algorithm: overview                                                                    

c) EM Recursions and error analysis                                         

d) Case study: Identification of protein-DNA interaction sites        

 

IV. (9 hrs) Clustering analysis              

a)     Hierarchical clustering                                                         

b)     K-means clustering                                                              

c)     Initialization and convergence                                     

d)     Visualization                                                                       

e)     Case study: Identification of co-expressed genes                     

 

V. (6 hrs) Statistical tests

          a) The idea: a coin example                                                       

          b) Parametric and non-parametric tests                                       

          c) Case study: Detecting differentially expressed genes

 

VI. (9 hrs) Markov chains

          a) Transition probability and state transition graph                  

          b) Time evolution of probability distributions of states                

          c) Classification of states: persistent, transient & periodic states

          d) Stationary distribution

          e) Case study: Modeling genome sequence with a Markov chain

 

VII. (4 hrs) Markov Chain Monte Carlo (MCMC) methods

        a) Metropolis-Hastings                                                    

        b) Simulated Annealing