Character Set


A programming language is designed to help process certain kinds of data, consisting of numbers, characters, and strings, and to provide useful output known as information. Data processing is done by executing a sequence of precise instructions called a program. These instructions are formed using certain symbols and words according to strict rules known as syntax rules. Every instruction in a program must conform precisely to the syntax rules of the language.

Character Set

The characters that can be used to form words, numbers, and expressions depend upon the computer on which the program is run. However, a subset of characters is available on most personal, micro, mini, and mainframe computers. The characters in C are grouped into the following categories:

  1. Letters
  2. Digits
  3. Special characters
  4. White spaces

The compiler ignores white spaces unless they are a part of a string constant. White spaces may be used to separate words, but are prohibited between the characters of keywords and identifiers.

Trigraph Characters

Many non-English keyboards do not support all the characters listed in the C character set table. ANSI C introduces the concept of "trigraph" sequences to provide a way to enter certain characters that are not available on some keyboards. Each trigraph sequence consists of three characters (two question marks followed by a third character), as shown in the table.

C Character Set

Letters               Digits
Uppercase A....Z      All decimal digits 0....9
Lowercase a....z

Special Characters

Symbol    Meaning
,         comma
.         period
;         semicolon
:         colon
?         question mark
'         apostrophe
"         quotation mark
!         exclamation mark
|         vertical bar
/         slash
\         backslash
~         tilde
_         underscore
$         dollar sign
%         percent sign
&         ampersand
^         caret
*         asterisk
-         minus sign
+         plus sign
<         opening angle bracket
>         closing angle bracket
(         left parenthesis
)         right parenthesis
[         left bracket
]         right bracket
{         left brace
}         right brace
#         number sign

White space characters will be covered in later tutorials.

C Tokens

In a paragraph of text, individual words and punctuation marks are called tokens. Similarly, in a C program, the smallest individual units are known as C tokens. C has six types of tokens, as shown in the figure below. C programs are written using these tokens and the syntax of the language.

[Figure: C Tokens]