Banazîr the Jedi Hobbit (banazir) wrote,

Bioinformatics summer course: 01 Jun - 15 Jul 2005

I haven't previously used this blog to announce courses per se, but seeing as folks are paying attention and time is getting short, I would like to bring to your attention the following:

The Kansas State University bioinformatics curriculum committee has been working on a couple of undergraduate-level courses on computational methods (particularly algorithms, numerical analysis, and visualization techniques) for biological applications. Some faculty in the College of Engineering have expressed interest in having such a course for graduate students. The following is a 6-week course on the basics of data mining for computational biology, to be offered in June and July. We would appreciate it if you would pass this along to any interested students. If you are a k_state student reading this and are interested in registering, please send me e-mail at hsuwh[AT]

CIS 690 - Data Mining in Bioinformatics

Semester hours: 3
Reference number: 07560
Dates: Wed 01 Jun 2005 - Fri 15 Jul 2005
Time: MTWUF, 14:30 - 16:00
Format: 50 minutes lecture, 35 minutes lab daily
Venue: 236 Nichols Hall

Course Description: This 6-week course covers fundamentals of data modeling and mining with an emphasis on applications in computational biology, and will be of interest to the undergraduate or graduate student in science, engineering, mathematics, or statistics who is seeking background in basic data mining techniques. Topics to be covered include fundamentals of machine learning, pattern recognition, Bayesian methods, development and application of relational databases, and visualization of data and clustering output. Programming background at the level of a first course in computer science is required; no other background in mathematics, molecular biology or genetics is assumed. This course will emphasize analysis of sequence data and gene expression data, but students with other interests in data mining are welcome to enroll and may select other project topics.


  • Required: first course in programming (CIS 200 or equivalent)

  • Recommended: first course in probability and statistics (STAT 510 or 410)

Grading: 20% midterm, 20% homework (2 assignments), 10% paper reviews, 50% project
Textbook: Data Mining (2000) by Witten and Frank


  • Data in bioinformatics (throughout course)

    • Microarrays

    • Sequence data: protein and nucleotide sequences

    • Expressed Sequence Tags (ESTs) and tag libraries

    • Sources of data: GenBank, PDB/SwissProt, Stanford Microarray Database; PubMed

  • Problems in computational biology (2 lectures intro; throughout course)

    • Modeling gene networks and pathways

    • Biochemical pathways and signal transduction

    • Protein-protein interactions

    • Protein secondary and tertiary fold prediction

    • Phylogenetic modeling

  • Fundamentals of machine learning (1 week)

    • Supervised inductive learning algorithms: Apriori (association rules), decision trees, Naive Bayes

    • Relevance determination and feature selection

  • Supervised machine learning algorithms for bioinformatics (1.5 weeks)

    • Sequence learning: hidden Markov models (HMMs)

    • Bayesian networks: structure learning and parameter estimation

    • Kernel methods: maximum margin and support vector machines

    • Minimum description length (MDL) methods

  • Clustering (1 week)

    • k-means clustering

    • Hierarchical agglomerative clustering

    • Biclustering approaches

    • Advanced clustering methods: PCA/ICA, Kohonen's SOM

    • Applications to bioinformatics: clustering microarray data

  • Relational data mining (1.5 weeks)

    • Fundamentals of relational databases

    • Structured Query Language: SELECT, JOIN, PROJECT

    • OLAP

    • Database organization: star and constellation schemas

    • Probabilistic relational models (PRMs)

    • Text mining fundamentals

    • Data modeling in bioinformatics

  • Visualization (3 lectures)

    • Data and information visualization: scatterplots, evidence visualization

    • Output: Naive Bayes, decision trees and graphs; clusters and clustering trees

    • Survey of 3-D modeling

If you are interested in taking this course yourself or know someone qualified and interested, please spread the word. In particular, let them know that the emphasis is on bioinformatics (a departure from previous offerings).

