I've been thinking - not just this morning, nor only this past week, but really this whole semester - about what we need to get some foundations of mathematics and theoretical CS percolating into the heads of our undergrads again. As usual, YMMV, this is only IMNSHO, and expect no mollycoddling.
What makes a computer scientist, anyway, as opposed to a computer engineer, or a software engineer? Are these things that belong in an arts and sciences CS curriculum and not an engineering one? I maintain that the math that CS students should have, in order to be properly equipped to comprehend a very large range of additional concepts, is generally a superset of the engineer curriculum. This is a transferrable, transmissible knowledge.
Here are some suggested topics for a partial list of "essentials". By this I don't mean that every topic has to be covered, but rather that if a CS curriculum is going to cover, say, algorithms, it would be reasonable to expect that people coming out of it would know something about graph algorithms or be able to pick it up. So, "sweepline algorithms for computational geometry" is certainly not "essential", but understanding how to do lexicographic sort and being able to grasp the use of priority queues for sweepline algorithms is essential.
- Relations and functions [***]: One should not be able to get a Bachelor of Science degree in Computer Science without being able to define a:
- relation - subset of the Cartesian product of two or more sets
- function - binary relation denoting a mapping from a preimage/domain to an image/range, such that no element of the domain maps to more than one element of the range
I took a straw poll recently in my data mining and machine learning courses, and the hit rate was about half and half even with some prompting. Articulating the most basic, fundamental concepts of math should not be so hard.
- Bijection [**]: The concepts of "one-to-one and onto mapping", correspondence, and perfect invertibility are essential to many things. Among these are proofs of the cardinality (especially of an infinite set), soundness and completeness of inferential systems, and derivations of mappings.
- Tail vs. linear vs. tree recursion [**]: The most mileage to come out of CS125 (Introduction to Computer Science), UIUC's counterpart to our CIS 200, was having students understand the foundational concepts of recursion before they even touched iteration, and then relating iteration to it. In 1996, when I was a graduate TA for that course, and it was still taught using Scheme, the 11 homeworks were: variables, recursion, environment and mutable state (assignment), lists, functionals (
foldr), iteration, arrays, abstract data types for data encapsulation, a two-part project on object-oriented programming (write a simple line editor), and OO design (iterator patterns). There were 13 labs covering the same material.
One week between homeworks 4 and 5, I taught a help session covering linear and tail recursion. I used the definition of n! and the linear and tail recursive implementations of
factorialas a starting example, and then showed the students how to calculate
fibonacci(n)using tree and tail recursion. And they got it. At least, some 250 of 375 first-years in the course got it. And yes, 125 dropped the course and the major. Harsh, but better than slow attrition.
Recursion is fundamental. Teach it early when students have the requisite neural plasticity, and when they are just getting inductive proof technique. Give them both barrels and they will fish for a lifetime. Or something like that.
- Basic number theory: base and radix arithmetic [**]: Radix sort as an extension of bucket sort is one of the most beautiful concepts in CS: simple yet elegant, like most good ideas in CS. It's a joy not only to learn about such things but to pass them on and see people's faces brighten as the light of knowledge enters their minds. Hence, it is a little disappointing when I mention radix sort in a class and get blank looks. "Dutch national flag? Anybody? No?" I think that whether elementary number theory is required of our students, they should be able to define "ceiling", "floor", "modulus", and the mathematical meaning and significance of some simple integer operations. A good student should be able to derive radix sort with a little help; an excellent one, with none, if he or she understands bucket sort.
- Diagonalization: Another elegant and useful concept in mathematics is the use of diagonalization in order to prove by contradiction that one set has greater cardinality than another. Cantor's diagonal argument can be generalized to prove Cantor's theorem (that |S| < | 2S |, i.e., any set S has cardinality smaller than its power set) and analogously demonstrate Russell's paradox, that the "set of all sets that do not contain themselves" is contradictory in definition under naive set theory. Diagonal arguments underlie fundamental theorems of decidability (Godel's Incompleteness Theorem) and computability. The latter is Turing's proof that Ld, the diagonal language, is not recursive enumerable) and that, by reduction, the complement of the halting language is also not RE. Thus, the Halting Problem is semidecidable (the language LH of universal Turing machines M and inputs x such that M halts and accepts x is recursive enumerable but not recursive), as is first-order logic (FOL) validity, aka Hilbert's 10th problem or the Entscheidungsproblem ("decision problem").
Diagonalization and Godel and Turing numberings: two great tastes that taste great together, and form a foundation for all of computability theory, not to mention the four families of mathematical solutions to Russell's paradox: the theory of types, formalism, intuitionism, and Zermelo-Franckel set theory.
- Unifying theory of state machines: Mealy & Moore models, automata, HMMs!
- Basic graph theory
- Pattern matching
Discrete Math and Set Theory
- LOGIC of sets!!!
- Venn diagrams and syllogism, quantification!
- Equivalence classes!
- Set difference
- Strong vs. weak induction, structural induction!!!
- Examples of well-founded orderings: number systems, structured orderings (lattices and semilattices)
Data Structures and Algorithms
- Divide and conquer: INTUITION behind recurrences and master theorem!!
- Clever constructive ideas: satellite data, bucket sort, lexicographic sort!!
- Sorting: array operations, iterator patterns!
- Dynamic programming!
- Proving properties of greedy algorithms
- "Advanced" topics in data structures: heaps, disjoint set (union-find) data structures, non-master theorem analyses (e.g., path compression in disjoint set data structres)
- Maximum-weight spanning tree
- Ratio bound
- Applied data structures: sweepline algorithms, basic computational geometry, etc.
- CONSTRUCTIVE PROOF!!!
- Closure properties!!
- Pumping lemma
- Decision problems!
- UTMs, decidability
Software and Programming Languages
- Metacircular evaluation!!
- Message-passing architecture!!
- Implementing OO interpretation!
- Generic polymorphism!
- Type variable polymorphism!
The $64 questions:
- How can course-based coverage of the fundamentals be enforced? Can we tighten our prerequisite structure?
- How much of this is psychological or cultural aversion to courses numbered 500 and higher, which could be self-reinforcing?
- What have I missed in the above lists as universal building blocks of a serious CS curriculum?
Edit, 11:10 CST Sun 23 Oct 2005 - altamira16 made a comment that reminds me that I should have mentioned that we have course numbers beginning with 2, 3/4, 5, and 6/7 for CS majors instead of 1-4. Anyone at K-State CIS: please correct me if I am wrong in my understanding of any course below.
- 100: strictly service courses for nonmajors
- 200: intro CS (currently there is only CIS 200, the CS1 course; intro programming courses have been lifted to 300 level)
- 300: foundational courses - 300 is Data Structures, 301 is Logic for CS (really rudiments of propositional logic, FOL, proof theory, and proof rules for imperative programs), 308 is an Intermediate Programming in C/C++ now required between 300 and 501, 350 is a Hardware/Logic course for nonmajors
- 400: more foundational courses - 450 is an Architecture course for majors, 462 is the Computers and Society course (Computing Ethics and some survey of the ramifications and impact of computing), and there is at least one course just for Information Systems (IS) majors.
- 500: "upper-division" undergrad courses, but really almost everything substantial beyond the freshman-sophomore level - 501 is Software Architecture, 505 is the first course in Programming Languages, 520 is Operating Systems, 525 is Computer Networks, 540/541 are Software Engineering 1 & 2, 570 is Automata Theory and Formal Languages, 575 is Algorithms I. Graduate students taking these courses to fulfill a "deficiency" (prerequisite or remedial) requirement do not receive graduate credit
- 600: technical electives for undergrads; grads are permitted to enroll for credit
- 700: technical electives for grads; undergrads are permitted to enroll
- Discrete Math and Set Theory: Discrete mathematics is offered by the Math department as MATH 510. We do not have a course on set theory and discrete structures at all, and though CIS 301 is a good course, the counting and set theory component falls by the wayside for lack of time. This is, IMO, the single aspect of our entire curriculum that most needs improvement - though as you'll see in a later post, I have an idea for a partial fix.
- "600" Considered Harmful: IMO, the 600/700 distinction is harmful. Undergraduates should not think that because they may or may not be bound for graduate school that they "don't need advanced concepts". Our graduate grades are inflated to Georgia Tech levels1, so that while grad students could kick undergrad ass all week long and twice on Sunday at my undergrad university (Hopkins) and my graduate university (Illinois), here they are comparable. The weakest grads are weaker than the average undergrads, and the strongest undergrads are as good as the strongest grads, better than the average grad. Why not challenge them? Those who have trouble in our 700-level classes, I have found, are almost always missing something fundamental from the "underclass" (300/400/500) level. Numbering a course 600 and mollifying the curve to ludicrous levels isn't going to fix this!
- Advanced Comp Models and Algorithms II, for the sweet love of God: The course in computability theory and advanced computational models (covering NP-completeness and decidability) is CIS 770. Like smaller university CS departments, we don't have an undergraduate version of Algorithms II: CIS 775.
- Whatever Happened to Compilers? There is no undergraduate-level compilers course. Few undergrads take CIS 706, Translator Design.
- Numerical Dead Ends: We require a lot of numerical computing (good), but it's spread out (bad), short on linear algebra (bad), and does not feed anything at the higher levels (bad). MATH 551 (Matrix Theory) and MATH 655 (Numerical Analysis) are required for our undergrads. I think this might have been a wonderful thing if we had undergrads that go on to do numerical computing but as it is, I think it's a dysfunction. We should require at least one of Linear Algebra, Calc III, or Differential Equations (all 200-level "non-gut/pud" courses), leave one elective slot for MATH 655, and prepare our students to understand MATH 551 material coming out of Linear Algebra (the only "500-level" things in that course that are not in Linear Algebra are some orthogonalization and triangulation methods that could be covered in about 3-4 hours of class; more advanced decompositions, cubic curve interpolation, and transformations are covered in MATH 655)
- Database Systems and Theory: CIS 560, Database Systems, is a required course that I will teach next spring. In principle, a DB theory course is not a bad thing, but given that we teach DBMS and a small amount of theory now that the DB theory professor (Maria Zamfir-Bleyberg) has retired, I am ambivalent about it. I think students would revolt if I went 100% theory with the course, and perhaps rightly so; but I also don't agree entirely with the current structure of the course, which IMO emphasizes practical aspects that could be done on a smaller scale in projects and homework rather than using all of the available class time.
1 GA Tech grads are very good. One of my undergrad classmates, Phyllis Schneck, went there for her Ph.D., and one of my colleagues here, Todd Easton in IMSE, got his Ph.D. there around the same time. That said, their undergrads are also excellent, and can compete on a level basis. In a few departments, the undergrads are actually consider better than the grads, or were when my dad was at Emory in the early 1970s and his friends were at GA Tech. This is not so at a lot of top 50, even top 100, "Research One" institutions.
Having just gone through accreditation, I was suprised how much the process now emphasises form (assessment and objective formalization) over function (transferrability of technical content or fundamental methodology of problem solving).