The History and Popularity of the C Programming Language
A short essay about the origins of the C programming language and why it became so popular so quickly.
The C programming language and its direct descendants are by far the most popular programming languages used in the world today. Most competent programmers know how to use C, and for many it is the language of choice. This structurally tiny language originated at Bell Labs during the development of the first Unix operating system, which began on a DEC PDP-7 computer with only 8K bytes of memory. In spite of its small beginnings, C has scaled up to run on powerful supercomputers with gigabytes of memory. The language is extremely portable; it is possible to write a program in C for just about every platform in existence. Yet, if one takes a closer look at C, one realizes that it is rather weak and has an extremely small vocabulary. How did C become so popular despite its deficiencies? This paper will explore the history of the C programming language and discuss the different aspects of the language in an attempt to determine why it has thrived in spite of more powerful and better-structured object-oriented languages. Portions of this paper get a little technical, but it is beyond the scope of this essay to teach the reader all the intricacies of C; the reader unfamiliar with C will still be able to follow the arguments, and thus come away with an understanding of how C was born and why it is so popular.
Before we can examine the history of C, we must first take a look at Unix's history. "The history of C and that of the Unix operating system itself are intertwined to such a degree that you might almost say that C was invented for the purpose of writing the Unix system."[McGilton, 11] The seeds of Unix were planted in 1965 with the completion of a project at MIT called Project MAC, which produced one of the first time-sharing computer systems. It allowed up to thirty people to use the same computer simultaneously through one of 160 typewriter terminals placed around campus and in the homes of faculty. The system became so overloaded that MIT decided to embark on a more ambitious time-sharing system as a joint venture with General Electric and Bell Labs. This project was known as MULTICS, which stands for Multiplexed Information and Computing Service [Campbell-Kelly, 214]. Bell Labs became the software contractor because the company had many talented programmers; as part of a government-regulated monopoly, however, it was not allowed to function as an independent company. At the start of the project, MIT wanted to use IBM's System/360 computers; however, IBM had not yet built time-sharing into its machines, which opened the door for GE to supply the hardware. The trio worked on the project for a few years, but in 1969 Bell Labs decided to withdraw because MULTICS was too expensive and would take too long to complete. At this point, an informal group of Bell Labs employees, led by Ken Thompson, started to investigate alternatives.
"When Bell Labs pulled out of the Multics project in 1969, [Ken] Thompson and [Dennis] Ritchie were left frustrated because working on Multics had provided them with a rather attractive programming environment."[Campbell-Kelly, 219] According to historians Jean Yates and Rebecca Thomas, Thompson and his group started designing an "operating system that would support coordinated teams of programmers in the development of products, and would simplify the dialog between human and machine."[Yates, 18] He began writing this skeleton system and he named it Unix, which is a rather bad pun on Multics.
At the time, developers enjoyed being able to write programs in high-level languages such as PL/I or BCPL. A high-level language is one that abstracts the hardware away from the programming environment: instead of worrying about bits in memory and esoteric assembly-language codes, a programmer can concentrate on other parts of a program and leave the language to handle the bits and the bytes. Thompson decided that Unix needed a high-level system language. However, he couldn't just pick one that already existed. He was restricted by the computer he was working on, the PDP-7, which had only 8K bytes of memory [Ritchie, 673]. This was such a problem that at the start of the project he could not even program on the PDP-7; instead, he had to compile his programs on the more powerful GE-635 machine and transfer them to the slower computer using paper tape [Ritchie, 672]. Thompson had had some previous experience with the BCPL system language, invented by Martin Richards in the mid-1960s; however, it was too slow and lacked the run-time support that Thompson thought was necessary [Pratt, 472]. So Thompson took his experience with BCPL and created his own language, which he called B. B was not really a new language; more correctly, it is "BCPL squeezed into 8K bytes and filtered through Thompson's brain. The name most probably represents a contraction of BCPL, though an alternate theory holds that it derives from Bon, an unrelated language created by Thompson during the Multics days."[Ritchie, 673] Bon in turn is probably named after his wife, Bonnie, but others claim that it is named after a religion whose rituals involve chanting magic formulas.
B had some problems, though, and the small Unix community felt limited by the restrictions in the language [Pratt, 472]. Its character-handling mechanism was awkward, its floating-point arithmetic was not implemented well, and there was too much overhead in dealing with pointers [Ritchie, 677]. For these reasons, B needed some enhancements to satisfy its developers. The main enhancement it needed was data types to properly handle characters and floating-point numbers. "A data type is defined by two properties: a set of values...and a set of operations."[Roberts, 33] This means that a language with strong typing would not allow the programmer to, for example, take the square root of a string of characters. A typeless language such as B or BCPL would perform the operation even though it makes no sense and is probably not what the programmer wanted. A typed language would inform the programmer that he is doing something that is not permitted.
One of Thompson's coworkers, Dennis Ritchie, decided to improve the B language. In 1971, he began extending it by first adding data types. He called this extended language NB, for "new B" [Ritchie, 677], and he continued to expand and rewrite parts of the language until it became clear that it had changed enough to warrant a new name. The name Ritchie graced it with is C, the letter directly after B in the alphabet. In 1973, Ritchie completed the essentials of C and then rewrote the entire Unix kernel in the new language. This time he wrote it for the PDP-11, with its huge 24K bytes of memory [Pratt, 472]. In 1978, Brian Kernighan and Ritchie published The C Programming Language, which has since become something of a bible to C programmers and is often called the "white book" or "K&R".
Now that C was the main system language for the Unix environment, its use quickly spread throughout the programming community. As the popularity of Unix grew, so did that of C. Unix was provided to universities at a huge discount, and many other programmers came to enjoy using the powerful Unix operating system. "During the 1980s the use of the C language spread widely, and compilers became available on nearly every machine architecture and operating system."[Ritchie, 681] As the language spread, it began to change slightly for each platform it was written for; at this point the first edition of K&R no longer fully described the language, and it became clear that a standard version of C was needed [Ritchie, 681]. Therefore, in the summer of 1983 the American National Standards Institute established the X3J11 committee to define the C language. The committee finally defined ANSI C in 1989. It took six years because the committee had taken a cautious, conservative view of the language [Schildt, 4]. They knew that ANSI C was going to be very popular, and they wanted to get it right.
C is now such a popular programming language that several of its descendants are in wide use. One such descendant, Objective-C, was adopted by the NeXT computer company for its OpenStep operating system. Apple Computer recently bought out NeXT, and Objective-C is now the main development language at Apple for its next-generation operating system, codenamed Rhapsody. Another descendant of C is Concurrent C, developed at Bell Labs by Dr. Narain Gehani and Dr. William D. Roome. This language is particularly well tailored for parallel programming and "is the result of an effort to enhance C so that it can be used to write concurrent programs that can run efficiently on single computers, on loosely-coupled distributed computer networks, or on tightly-coupled distributed multiprocessors."[AT&T, 1] And of course there is the extremely popular C++, an object-oriented extension of C that has become almost as popular as its parent in recent years. C++ addresses some of the issues that plague C, such as weak support for modularization and weak type checking. We will now examine some of these problems.
"C is not a 'very high level' language, nor a 'big' one, and is not specialized to any particular area of application. But its absence of restrictions and its generality make it more convenient and effective for many tasks than supposedly more powerful languages."[Kernighan, xi] C is a middle-level language; "This does not mean that C is less powerful, harder to use, or less developed..."[Schildt, 4] Instead, C combines the advantages of a high-level language with the functionality of assembly language. Like a high-level language, C provides block structures, stand-alone functions, and some small amount of data typing. Like assembly, it allows the manipulation of bits, bytes, words, and pointers, but it abstracts the hardware away from the code, so that something written in C is very portable, meaning that a program can be easily adapted to run on several different computers [Jamsa, vii]. This is a great thing for systems programmers: they can get the efficiency of assembly-language programming without all the fuss, and end up with a highly portable program. C is also a very lenient language; it allows the programmer to do many things that would probably be caught as errors in a high-level language. This is both an advantage and a disadvantage. For the programmer who knows what she is doing, it is very convenient to be able to do whatever she knows will work. However, for the inexperienced programmer, it may be confusing when a program does not behave correctly. A high-level language catches many more possible errors at compile time; C lacks the highly typed environment that characterizes such languages.
When Ritchie set out to improve B by adding data typing, he did manage to outfit C with six built-in data types (characters, short integers, long integers, floating-point numbers, double-precision floating-point numbers, and a void type); however, the language permits almost all type conversions, which makes it not nearly as strongly typed as high-level languages such as Pascal and Ada. Another important feature of C is that it has only 32 keywords to learn. BASIC, for comparison, has 159 keywords, making it harder to learn than C [Schildt, 5]. Brian Kernighan, co-author with Dennis Ritchie of The C Programming Language, says, "Although the absence of some of these features may seem like a grave deficiency...keeping the language down to modest size has real benefits. Since C is relatively small, it can be described in a small space and learned quickly. A programmer can reasonably expect to know and understand and indeed regularly use the entire language."[Kernighan, 2]
"The end result is that C gives the programmer what the programmer wants: few restrictions, few complaints, block structures, stand-alone functions, and a compact set of keywords. By using C, you can nearly achieve the efficiency of assembly code combined with the structure of ALGOL or Modula-2. It is no wonder that C is easily the most popular language among topflight professional programmers."[Schildt, 6]
Unfortunately, C is not perfect. It lacks strong typing, it has very weak support for modularization, its treatment of arrays and pointers is confusing to beginners, and its indirection operator (*) is described even by C's creator as "an accident of syntax" [Ritchie, 684]. Why, then, does it continue to be one of the most popular languages?
One reason for C's success is that "The Unix system is supplied to educational nonprofit organizations at a very low cost. Approximately 90% of university computer science departments license the Unix system, and many advanced programmers and computer science majors learn to use the Unix system and to program in the C language."[Yates, 19] When these Unix-experienced graduates go out into the work force and start programming, many of them choose to program in C on the Unix platform because it is familiar to them. This happened frequently in the 1980s, and as commercial versions of Unix began to appear, the exposure to C caused its popularity to grow [Pratt, 472]. Businesses such as Sun, Hewlett-Packard, IBM, Silicon Graphics, NeXT, and now Apple, which sell variants of the Unix operating system, spread the use of C. It may not be the most functional language, but because it is the system language for Unix, it gets around.
But it is not only Unix that has caused C to be so popular. C exists on many non-Unix systems as well; "...the language's invasion of other environments suggests more fundamental merits."[Ritchie, 685] C thrives in the consumer sector, where Unix does not come near the market share held by Microsoft's Windows and Apple's MacOS. The creators of Windows and the MacOS also turned to C to build these non-Unix operating systems. Now, newcomers to programming learn C even though they may not be at all familiar with the Unix environment.
Another reason for C's success is that programmers are willing to forgive C its bad points because its good points are so outstanding. In fact, most of the problems with C diminish with experience. For example, the "accident of syntax" that is the indirection operator becomes a natural oddity to the seasoned programmer, and the array and pointer confusion vanishes quickly with practice. The lack of strong typing may even be seen as an advantage by the experienced programmer: strong typing makes programming easier for beginners, but it can be a hindrance for those who know what they are doing. The problem with modularization is overcome by developing personal conventions for producing modules. Modules are chunks of code that can be wrapped up into a nice little bundle and interact with the rest of the program as a high-level object. "Object-oriented languages," which have recently become popular, are an attempt to make modules easy to make. One advantage of modularization is code reuse: once you write a module that does something, you never have to write it again. Using conventions is by no means a solution to weak modularization support, but it can at least make the problem less noticeable. Many of C's deficiencies, then, can be overcome with experience, which is perhaps why C has become the language of choice among seasoned programmers.
Finally, the fact that a programmer can use C instead of assembly and achieve nearly the same efficiency of code contributes a great deal to the popularity of the language. This is not as important nowadays as it was back when computers were still tight on memory. High-level languages are memory hogs; the internal structures that must be maintained within the language to handle data typing, modularization, and scope take up precious memory. On the PDP-7 that Thompson started working on, 8K bytes of memory meant that any program written for the computer had to be highly efficient. To deal with this restriction, Thompson programmed the first Unix kernel in assembly, which allowed him to manipulate each register and bit in memory as he needed. When C came along, it made everything easier: it produced code that was nearly as efficient as assembly, and it accelerated programming time considerably by providing many of the things high-level languages offer. Eric Roberts, a Stanford professor of computer science, says, "C was designed to unlock the power of the computer and offers programmers considerable control over the programs they write. It is precisely this power that has contributed to the widespread use of the language."[Roberts, 15]
"But power has a downside... programmers can misuse the power of C--and many do, perhaps because they are careless, or because they think it is somehow a badge of honor to have accomplished a task in the most intricate and obscure way..."[Roberts, 15] For the highly experienced programmer, there is an annual contest called the "Obfuscated C Code Contest" in which the goal is to write the most obscure, complex, poorly styled code that still actually does something, albeit not usually anything useful [Noll, 1]. Programmers can have fun with the language, which certainly contributes to the popularity of C.
Despite some aspects of the language that mystify beginners, C remains a simple and small language that can be translated with simple compilers. The data types it supports are well-suited to those provided by real machines and to the operations people commonly perform with them, which makes the language less difficult to learn. "At the same time the language is sufficiently abstracted from the machine details that program portability can be achieved."[Ritchie, 685] A program written in C can be readily ported to different platforms because C compilers are so ubiquitous. The success of Unix itself is probably the most important factor in C's success, but the fact that C exists on many non-Unix platforms also contributes to its popularity. "C covers the essential needs of many programmers, but does not try to supply too much."[Ritchie, 685] It provides just enough to make programmers happy, but not so much as to create unneeded overhead or to make learning the language difficult. "C is quirky, flawed, and an enormous success. Although accidents of history surely helped, it evidently satisfied a need for a system implementation language efficient enough to displace assembly language, yet sufficiently abstract and fluent to describe algorithms and interactions in a wide variety of environments."[Ritchie, 686]
- AT&T News Release, March 19, 1990, http://www.att.com/press/0390/900319.ula.html.
- Campbell-Kelly, Martin & Aspray, William, Computer, Basic Books, 1996.
- Jamsa, Kris A., The C Library, McGraw-Hill, 1985, QA76.73.C15 J35.
- Kernighan, Brian W. & Ritchie, Dennis M., The C Programming Language, Second Edition, Prentice Hall PTR, 1988, QA76.73.C15 K47.
- McGilton, Henry, Introducing the UNIX System, McGraw-Hill, 1983, QA76.8.U65 M38.
- Noll, Landon C., Bassel, Larry & Srinivasan, Sriram, Obfuscated C Code Contest, http://reality.sgi.com/csp/ioccc/, 1996.
- Pratt, Terrence W. & Zelkowitz, Marvin V., Programming Languages: Design and Implementation, Prentice-Hall, 1996, QA76.7.P7.
- Ritchie, Dennis M., "The Development of the C Programming Language," in Thomas J. Bergin & Richard G. Gibson, History of Programming Languages, ACM Press, 1996, QA76.7.H558.
- Roberts, Eric S., The Art and Science of C, Addison-Wesley, 1995.
- Schildt, Herbert, C: The Complete Reference, Second Edition, McGraw-Hill, 1990, QA76.73 .C15 S353.
- Yates, Jean & Thomas, Rebecca, "History of UNIX," The UNIX Encyclopedia, Yates Ventures, 1984, QA76.8.U65 U65.