It’s Monday morning. I’m sitting at my desk, using my fingers to hit keys in more or less the right order. I’m using 26 letters to construct words and sentences in order to convey my meaning. Well, 26 letters, and the spellcheck.
An alphabet is a collection of symbols that can be combined to form words. The words can then be combined to create phrases, sentences, paragraphs and pages. It’s a flexible design that allows comparatively few symbols to be used to capture and share a language, a considerable improvement on other writing systems that use a different symbol for each word. Still, if you want a machine to react to what you’re writing, 26 letters aren’t enough, or, if you squint at it another way, 26 letters are far too many.
If you look at the first two paragraphs of this section, or any of the rest of it, you’ll notice that the letters a through z aren’t the only symbols I’m using. I’ve got periods and commas, and the occasional apostrophe. For a machine to be able to follow a language, it needs every last symbol you’re going to use to be a part of its alphabet. That includes the empty space, the symbol used to separate symbols into groups. In fact, even lowercase letters (a, b, c) are distinct from uppercase letters (A, B, C), which doubles the number of letters to 52, plus all the other symbols I just mentioned, plus all the others that I didn’t.
The English alphabet started as a sort of shorthand way of writing. The Egyptians needed a way to write instructions for their workers. They came up with a phonetic system that was much quicker to learn and easier to use than their hieroglyphics. Later, the Phoenicians picked it up. Then, they passed it along to the Greeks.
The Greek language didn’t use every sound that the Phoenicians or Egyptians did. The Greeks took the symbols for sounds they didn’t use, and changed them into vowels. Combining vowels with the Phoenician consonants, they could represent every sound used in their language with fewer symbols, and it gave them a way to try to approximate sounds from other languages. Later still, the Romans added a couple of other letters to cover sounds they used that the Greeks did not. Still later, the Roman alphabet turned into the beginning of the English one, along with several others.
Alphabets, like the languages they represent, change over time. At any given moment, someone somewhere could draw some new character and toss it into the mix. Even if your basic alphabet doesn’t change, you and your folk might get into something like engineering, or mathematics, or logic, and suddenly there’s a mess of folk, inventing new symbols left and right. If you need to include every symbol you’ll use in the machine’s alphabet, and new symbols may be added, your poor confused computing contraption will need an alphabet that includes all possible symbols.
Let’s see… That’s 26 lowercase, 26 uppercase, all the numbers, all the punctuation marks, spaces, indents… and every other possible symbol. I think that’s infinite.
There is an easy way to represent all possible symbols.
Take the symbols, letters and so forth, and map them to numbers. A simple version is to take the letters of the English alphabet, and number them, 1 through 26.
a=1, b=2, c=3… z=26.
So, “8 5 12 12 15” is the same as “hello.”
We need spaces between the numbers to let you read them, but what if we want there to be a space between one word and the next? We’ll need a number to represent a space, say 27.
“8 5 12 12 15 27 23 15 18 12 4” means “hello world.”
We’ve got quotation marks, which could be mapped to 28.
28 8 5 12 12 15 27 23 15 18 12 4 28 means “hello world,” quotation marks and all.
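The mapping so far can be sketched in a few lines of Python. This is only an illustration, not a part of the contraption: the names `encode` and `decode` are made up here, the table follows the numbering above (a through z as 1 through 26, a space as 27, a quotation mark as 28), and it assumes a plain straight quotation mark for both the opening and the closing quote.

```python
import string

# Build the table from the text: a=1 ... z=26, space=27, quotation mark=28.
SYMBOLS = string.ascii_lowercase + ' "'
TO_NUMBER = {symbol: index for index, symbol in enumerate(SYMBOLS, start=1)}
TO_SYMBOL = {index: symbol for symbol, index in TO_NUMBER.items()}


def encode(text):
    """Turn text into a string of numbers separated by spaces."""
    return " ".join(str(TO_NUMBER[symbol]) for symbol in text)


def decode(numbers):
    """Turn a string of space-separated numbers back into text."""
    return "".join(TO_SYMBOL[int(number)] for number in numbers.split())


print(encode('"hello world"'))
# 28 8 5 12 12 15 27 23 15 18 12 4 28
print(decode("8 5 12 12 15"))
# hello
```

Any symbol missing from the table, an uppercase letter for instance, makes `encode` fail with a `KeyError`, which is the machine's way of saying that symbol isn't in its alphabet yet.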
Whenever we want or need a symbol that hasn’t been included, we can use the next number. Which number is mapped to which symbol doesn’t matter, so long as it’s consistent. To keep things simple, one can avoid fractions and the like. Using only whole numbers, all possible numbers can be written with an 11-symbol alphabet—0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and a space. The number of whole numbers is infinite, so we can use 11 symbols to represent an infinite number of different symbols.
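Here is one way the "use the next number" idea might look in Python. It is a sketch under assumptions: the class name `GrowingAlphabet` and its methods are invented for this example, and it numbers symbols in the order it first meets them rather than in alphabetical order, which is fine so long as the numbering stays consistent.

```python
class GrowingAlphabet:
    """Hands out the next unused whole number to each new symbol."""

    def __init__(self):
        self.numbers = {}

    def number_for(self, symbol):
        # A symbol seen before keeps its number; a new one gets the next number.
        if symbol not in self.numbers:
            self.numbers[symbol] = len(self.numbers) + 1
        return self.numbers[symbol]

    def encode(self, text):
        return " ".join(str(self.number_for(symbol)) for symbol in text)


alphabet = GrowingAlphabet()
print(alphabet.encode("abba"))  # 1 2 2 1
print(alphabet.encode("a+b"))   # 1 3 2
```

The plus sign had no number until it turned up, at which point it simply took the next one, 3. Notice that the output itself uses nothing but digits and spaces.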
In typical computer design, your system uses numbers to represent all the letters and such you see, along with various and sundry commands. If you want the computer to add two numbers, for example, it will send those numbers to a couple of specific spots in the machine, and another number that means, “Add those for me,” to another specific spot. Then, it can fetch the answer from yet another spot. To be sure the right number goes to the right place at the right time, the places where numbers can be sent have addresses, which are also numbers.
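A toy model of that arrangement, written in Python, might look like the following. Everything specific in it is an assumption made for illustration: the four addresses, the choice of 1 as the number meaning "add those for me," and the `step` function are all invented here and don't describe any real machine.

```python
# A toy model only: every address and the ADD code below are invented.
ADD = 1  # hypothetical number meaning "add those for me"

# The addresses of the spots in the machine. Addresses are numbers too.
FIRST_INPUT, SECOND_INPUT, COMMAND, ANSWER = 0, 1, 2, 3

# The spots themselves: each address holds a number.
memory = {FIRST_INPUT: 0, SECOND_INPUT: 0, COMMAND: 0, ANSWER: 0}


def step(memory):
    """Carry out whatever command number is sitting in the COMMAND spot."""
    if memory[COMMAND] == ADD:
        memory[ANSWER] = memory[FIRST_INPUT] + memory[SECOND_INPUT]


memory[FIRST_INPUT] = 8   # send the two numbers to their spots
memory[SECOND_INPUT] = 5
memory[COMMAND] = ADD     # send the number that means "add"
step(memory)
print(memory[ANSWER])     # fetch the answer: 13
```

The point of the sketch is that nothing in `memory` is anything but a number: the values to add, the command, the answer, and the addresses used to find them.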
The simplest possible alphabet consists of only two symbols. If you have one symbol represent a zero, and use the other to represent a one, you can represent all possible numbers with just two symbols. Because of the way computers store numbers, you don’t even need a special character to represent a space.
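One common way to get by without a space symbol, offered here as an assumption rather than as the explanation the later sections will give, is to write every number with the same number of digits. The Python sketch below uses five binary digits per number, which is enough for 0 through 31; the width and the function names are choices made for this example.

```python
WIDTH = 5  # assumed number of binary digits per number; covers 0 through 31


def to_bits(numbers):
    """Write each number as exactly WIDTH binary digits, with no separators."""
    return "".join(f"{number:0{WIDTH}b}" for number in numbers)


def from_bits(bits):
    """Cut the digit string into WIDTH-sized pieces and read each as a number."""
    return [int(bits[i:i + WIDTH], 2) for i in range(0, len(bits), WIDTH)]


hello = [8, 5, 12, 12, 15]
bits = to_bits(hello)
print(bits)             # 0100000101011000110001111
print(from_bits(bits))  # [8, 5, 12, 12, 15]
```

Because every number takes up exactly five digits, the reader always knows where one number ends and the next begins, and the zeros and ones can run together without a gap.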
It’s easier to design for a two-symbol alphabet, but the simplest alphabet isn’t always the simplest to follow. We’ll be using integers—positive and negative whole numbers. Keep in mind that our contraption will see these as zeros and ones, but we’ll get to how that two-symbol alphabet, called binary, actually works in later sections.
Before we start to design our contraption, let’s take some time to figure out how it will be programmed—how we can tell it what to do. Because, if we don’t give it a program to run, it will sit there, fires stoked, billows of steam drifting picturesquely through the air, and do nothing.