Version 1.1.
Copyright 2005-2006 Mac-arena the Bored Zo.
This work is licensed under a Creative Commons Attribution 2.5 License.
This document comes with a companion example program, available as one file or as multiple files (zipped).
A pointer is a memory address.
(Mmm, short paragraphs.)
Say you declare a variable named foo.
int foo;
This variable occupies some memory. On a PowerPC, it occupies four bytes of memory (because an int is four bytes wide).
Now let's declare another variable.
int *foo_ptr = &foo;
foo_ptr is declared as a pointer to int. We have initialised it to point to foo.
As I said, foo occupies some memory. Its location in memory is called its address. &foo is the address of foo (which is why & is called the 'address-of operator').
Think of every variable as a box. foo is a box that is sizeof(int) bytes in size. The location of this box is its address. When you access the address, you actually access the contents of the box it points to.
This is true of all variables, regardless of type. In fact, grammatically speaking, there is no such thing as a 'pointer variable': all variables are the same. There are, however, variables with different types. foo's type is int. foo_ptr's type is int *. (Thus, 'pointer variable' really means 'variable of a pointer type'.)
The point of that is that the pointer is not the variable! The pointer to foo is the contents of foo_ptr. You could put a different pointer in the foo_ptr box, and the box would still be foo_ptr. But it would no longer point to foo.
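Here is a minimal sketch of the box analogy in C. The function repoint_demo and the variable bar are hypothetical and exist only for illustration. The same foo_ptr box first holds a pointer to foo and then a pointer to bar. foo itself never changes.

```c
#include <stdbool.h>

/* Hypothetical demo: the pointer is the contents of foo_ptr, not foo_ptr itself. */
bool repoint_demo(void)
{
    int foo = 1;
    int bar = 2;                 /* hypothetical second variable */
    int *foo_ptr = &foo;         /* the foo_ptr box holds the address of foo */

    bool pointed_at_foo = (foo_ptr == &foo);

    foo_ptr = &bar;              /* put a different pointer in the same box */

    /* Still the same variable, foo_ptr, but it now points to bar, not foo.
       foo itself was never touched. */
    return pointed_at_foo && foo_ptr == &bar && foo_ptr != &foo && foo == 1;
}
```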
The pointer has a type, too, by the way. Its type is int. Thus it is an 'int pointer' (a pointer to int). An int **'s type is int * (it points to a pointer to int). The use of pointers to pointers is called multiple indirection. More on that in a bit.
The obvious way to declare two pointer variables in a single statement is:
int* ptr_a, ptr_b;
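After all, a single declaration can declare several variables of one type by listing their names separated by commas (ptr_a, ptr_b), and here that type appears to be int *. Given this, what is the type of ptr_b? int *, right?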
*bzzt* Wrong!
The type of ptr_b is int. It is not a pointer.
C's declaration syntax ignores the pointer asterisks when carrying a type over to multiple declarations. If you split the declaration of ptr_a and ptr_b into multiple statements, you get this:
int *ptr_a;
int ptr_b;
Think of it as assigning each variable a base type (int), plus a level of indirection, indicated by the number of asterisks (ptr_b's is zero; ptr_a's is one).
It's possible to do the single-line declaration in a clear way. This is the immediate improvement:
int *ptr_a, ptr_b;
Notice that the asterisk has moved. It is now right next to the word ptr_a, which is a subtle hint that it belongs to ptr_a alone.
It's even clearer to put the non-pointer variables first:
int ptr_b, *ptr_a;
The absolute clearest is to keep every declaration on its own line, but that can take up a lot of vertical space. Just use your own judgment.
Finally, I should point out that you can do this just fine:
int *ptr_a, *ptr_b;
There's nothing wrong with it.
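If your compiler supports C11, its _Generic selection gives you a way to check which declared names really received the pointer type. The sketch below assumes a C11 compiler. IS_INT_PTR and count_int_pointers are hypothetical names. Of the four variables, only three turn out to be int *.

```c
/* Assumes C11: _Generic chooses a branch from the static type of its
   operand and does not evaluate the operand. */
#define IS_INT_PTR(x) _Generic((x), int *: 1, default: 0)

/* Hypothetical helper: counts how many of these four variables are int *. */
int count_int_pointers(void)
{
    int* ptr_a, ptr_b;    /* despite the spacing, ptr_b is a plain int */
    int *ptr_c, *ptr_d;   /* each name carries its own asterisk: both int * */

    return IS_INT_PTR(ptr_a) + IS_INT_PTR(ptr_b)
         + IS_INT_PTR(ptr_c) + IS_INT_PTR(ptr_d);
}
```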
Now, how do you assign an int to this pointer? This solution might be obvious:
foo_ptr = 42;
It is also wrong.
Any direct assignment to a pointer variable will change the address in the variable, not the value at that address. In this example, the new value of foo_ptr (that is, the new 'pointer' in that variable) is 42. But we don't know that this points to anything, so it probably doesn't. Trying to access this address will probably result in a segmentation violation (read: crash).
(Incidentally, compilers usually warn when you try to assign an int to a pointer variable. gcc will say 'warning: assignment makes pointer from integer without a cast'.)
So how do you access the value at a pointer? You must dereference it.
int bar = *foo_ptr;
In this statement, the dereference operator (prefix *, not to be confused with the multiplication operator) looks up the value that exists at an address. (On the PowerPC, this is called a 'load' operation.)
It's also possible to write to a dereference expression (the C way of saying this: a dereference expression is an lvalue, meaning that it can appear on the left side of an assignment):
*foo_ptr = 42; /* Sets foo to 42 */
(On the PowerPC, this is called a 'store' operation.)
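The following small sketch puts the load and the store side by side. deref_demo and its out_foo parameter are hypothetical names used only for illustration.

```c
/* Hypothetical demo of reading and writing through a pointer.
   Returns the value read before the store; *out_foo receives foo afterwards. */
int deref_demo(int *out_foo)
{
    int foo = 7;
    int *foo_ptr = &foo;

    int bar = *foo_ptr;   /* load: read the int at the address in foo_ptr */
    *foo_ptr = 42;        /* store: write 42 to that address, i.e. into foo */

    *out_foo = foo;       /* foo itself is now 42 */
    return bar;           /* bar kept the value it read earlier, 7 */
}
```

Calling deref_demo(&f) returns 7 and leaves f equal to 42.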
Here's a declaration of a three-int array:
int array[] = { 45, 67, 89 };
Note that we use the [] notation because we are declaring an array. int *array would be illegal here; the compiler would not accept us assigning the { 45, 67, 89 } initialiser to it.
This variable, array, is an extra-big box: three ints' worth of storage.
But here's a little secret: you can never refer to this array again.
'What?' you say. 'But the compiler lets me do that! Watch!'
printf("%p\n", array); /* Prints some hexadecimal string like 0x12307734 */
Ah, but what does %p mean?
It means 'pointer'.
When you use the name of an array in your code, you actually use a pointer to its first element (in C terms, &array[0]). This is called 'decaying': the array 'decays' to a pointer. Almost any usage of array behaves as if array had been declared as a pointer, with two exceptions. First, array is not an lvalue: you can't assign to it or increment or decrement it, like you can with a real pointer variable. Second, the array does not decay when it is the operand of sizeof or of the unary & operator, so sizeof array is the size of all three ints.
So when you passed array to printf, you really passed a pointer to its first element, because the array decays to a pointer.
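Here is a quick way to observe the decay, sketched with hypothetical helper names (decay_demo, param_size): array compares equal to &array[0]. The sketch also checks two related rules of standard C. sizeof applied directly to the array measures all three ints, because sizeof is one of the few places where an array does not decay. A function that is passed array, however, receives only a pointer.

```c
#include <stddef.h>

/* Hypothetical helper: the parameter is just a pointer, so sizeof here
   measures an int *, not the caller's array. */
static size_t param_size(int *array)
{
    return sizeof array;
}

/* Hypothetical demo: returns 1 if all of the decay checks hold. */
int decay_demo(void)
{
    int array[] = { 45, 67, 89 };
    int *first = array;                 /* array decays to &array[0] */

    int same_address = (first == &array[0]);

    /* As the operand of sizeof, the array does not decay:
       it still sees three ints' worth of storage. */
    int whole_array = (sizeof array == 3 * sizeof(int));

    /* Passed to a function, only the pointer arrives. */
    int only_pointer = (param_size(array) == sizeof(int *));

    return same_address && whole_array && only_pointer;
}
```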
Say we want to print out all three elements of array.
int *array_ptr = array;
printf(" first element: %i\n", *(array_ptr++));
printf("second element: %i\n", *(array_ptr++));
printf(" third element: %i\n", *array_ptr);
 first element: 45
second element: 67
 third element: 89
In case you're not familiar with the ++ operator: it adds 1 to a variable, the same as variable += 1 (except that since we used the postfix version, array_ptr++ rather than ++array_ptr, the expression evaluated to the before-increment value of array_ptr).
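The difference between the postfix and prefix forms matters when the result is dereferenced. Here is a minimal sketch. postfix_demo is a hypothetical name.

```c
/* Hypothetical demo of postfix vs prefix ++ on a pointer.
   Returns 1 if both reads and the final position are as expected. */
int postfix_demo(void)
{
    int array[] = { 45, 67, 89 };
    int *array_ptr = array;

    int first = *(array_ptr++);   /* reads array[0], then advances to array[1] */
    int third = *(++array_ptr);   /* advances to array[2] first, then reads it */

    return first == 45 && third == 89 && array_ptr == &array[2];
}
```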
But what did we do with it here?
Well, the type of a pointer matters. The type of the pointer here is int. When you add to or subtract from a pointer, the amount by which you do that is multiplied by the size of the type of the pointer. In the case of our three increments, each 1 that you added was multiplied by sizeof(int).
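One way to watch the scaling happen, assuming nothing beyond standard C, is to convert both pointers to char * and subtract. Since sizeof(char) is 1 by definition, the result is the distance in bytes. bytes_per_increment is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes apart are p and p + 1 for an int *? */
ptrdiff_t bytes_per_increment(void)
{
    int array[] = { 45, 67, 89 };
    int *array_ptr = array;
    int *next = array_ptr + 1;   /* adds one element, i.e. sizeof(int) bytes */

    /* Measuring through char * counts bytes instead of ints. */
    return (char *)next - (char *)array_ptr;
}
```

On a PowerPC, where an int is four bytes, this returns 4.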
By the way, though sizeof(void) is illegal in standard C, and so is arithmetic on void pointers, gcc accepts both as an extension and increments or decrements void pointers by 1 byte.
In case you're wondering about 1 == 4: Remember that earlier, I mentioned that ints are four bytes on a PowerPC. So on a PowerPC, adding 1 to or subtracting 1 from an int pointer changes it by four bytes. Hence, 1 == 4. (Programmer humour.)
printf("%i\n", array[0]);
OK… what just happened?
This happened:
45
Well, you probably figured that. But what does this have to do with pointers?
This is another one of those secrets of C. The index operator (e.g. array[0]) has nothing to do with arrays.
Oh, sure, that's its most common usage. But remember that arrays decay to pointers. That's a pointer you passed to that operator, not an array.
As evidence, I submit:
int array[] = { 45, 67, 89 };
int *array_ptr = &array[1];
printf("%i\n", array_ptr[1]);
89
That one might bend the brain a little. Here's what is going on: array points to the first element of the array; array_ptr is set to &array[1], so it points to the second element of the array. So array_ptr[1] is equivalent to array[2] (array_ptr starts at the second element of the array, so the second element of array_ptr is the third element of the array).
Also, you might notice that because the first element is sizeof(int) bytes wide (being an int), the second element is sizeof(int) bytes forward of the start of the array. You are correct: array[1] is equivalent to *(array + 1). (Remember that the number added to or subtracted from a pointer is multiplied by the size of the pointer's type, so that '1' adds sizeof(int) bytes to the pointer value.)
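These equivalences can be checked directly. The sketch below uses the same array and array_ptr as above. index_demo is a hypothetical name.

```c
/* Hypothetical demo: indexing is pointer arithmetic plus a dereference.
   Returns 1 if every equivalence holds. */
int index_demo(void)
{
    int array[] = { 45, 67, 89 };
    int *array_ptr = &array[1];            /* points to the second element */

    int a = (array[1] == *(array + 1));    /* both are 67 */
    int b = (array_ptr[1] == array[2]);    /* both are the third element */
    int c = (&array_ptr[1] == array + 2);  /* literally the same address */
    int d = (array_ptr[1] == 89);          /* the value printed above */

    return a && b && c && d;
}
```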
Two of the more interesting kinds of types in C are structures and unions. You create a structure type with the struct keyword, and a union type with the union keyword.
The exact definitions of these types are beyond the scope of this article. Suffice it to say that a declaration of a struct or union looks like this:
struct foo {
    size_t size;
    char name[64];
    int answer_to_ultimate_question;
    unsigned shoe_size;
};
Each of those declarations inside the block is called a member. Unions have members too, but they're used differently. Accessing a member looks like this:
struct foo my_foo;
my_foo.size = sizeof(struct foo);
The expression my_foo.size accesses the member size of my_foo.
So what do you do if you have a pointer to a structure?
/* One way to do it */
(*foo_ptr).size = new_size;
But there is a better way, specifically for this purpose: the pointer-to-member operator, ->.
/* Yummy */
foo_ptr->size = new_size;
Unfortunately, it doesn't look as good with multiple indirection.
/* Icky */
(*foo_ptr_ptr)->size = new_size; /* One way */
(**foo_ptr_ptr).size = new_size; /* or another */
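Here is a compilable sketch of these forms. It reuses the struct foo declared earlier. The helper functions set_size_via_ptr and set_size_via_ptr_ptr are hypothetical names.

```c
#include <stddef.h>

struct foo {
    size_t size;
    char name[64];
    int answer_to_ultimate_question;
    unsigned shoe_size;
};

/* Hypothetical helper: one level of indirection. */
void set_size_via_ptr(struct foo *foo_ptr, size_t new_size)
{
    foo_ptr->size = new_size;           /* same as (*foo_ptr).size = new_size */
}

/* Hypothetical helper: two levels of indirection. */
void set_size_via_ptr_ptr(struct foo **foo_ptr_ptr, size_t new_size)
{
    (*foo_ptr_ptr)->size = new_size;    /* same as (**foo_ptr_ptr).size = new_size */
}
```

Both helpers end up storing into the same member. Call set_size_via_ptr with &my_foo, or call set_size_via_ptr_ptr with the address of a variable that holds &my_foo.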
Rant: Pascal does this much better. Its dereference operator is a postfix ^:
{ Yummy }
foo_ptr_ptr^^.size := new_size;
(But putting aside this complaint, C is a much better language.)
I want to explain multiple indirection a bit further.
Consider the following code:
int a = 3; int *b = &a; int **c = &b; int ***d = &c;
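Here is how the values of these pointers equate to each other:
a, *b, **c and ***d all have the value 3.
&a, b, *c and **d are all the same address (the address of a).
&b, c and *d are all the address of b.
&c and d are both the address of c.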
Thus, the & operator can be thought of as adding asterisks (increasing pointer level, as I call it), and the * and [] operators as removing asterisks (decreasing pointer level).
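The following sketch writes the same relationships as comparisons that the compiler will check for you. indirection_demo is a hypothetical name.

```c
/* Hypothetical check of the equalities between a, b, c and d.
   Returns 1 if they all hold. */
int indirection_demo(void)
{
    int a = 3;
    int *b = &a;
    int **c = &b;
    int ***d = &c;

    /* Each * removes one asterisk from the type; each & adds one. */
    int values      = (a == 3 && *b == a && **c == a && ***d == a);
    int addr_of_a   = (b == &a && *c == &a && **d == &a);
    int addr_of_b   = (c == &b && *d == &b);
    int addr_of_c   = (d == &c);

    return values && addr_of_a && addr_of_b && addr_of_c;
}
```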
There is no string type in C.
Now you have two questions:
The truth is, the concept of a 'C string' is imaginary (except for string literals). There is no string type. C strings are really just arrays of characters:
char str[] = "I am the Walrus";
This array is 16 bytes in length: 15 characters for "I am the Walrus", plus a NUL (byte value 0) terminator. In other words, str[15] (the last element) is 0. This is how the end of the 'string' is signalled.
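Here is a short sketch confirming that layout. walrus_demo is a hypothetical name.

```c
/* Hypothetical check: a string literal initialiser fills the array and
   appends a NUL terminator. Returns 1 if the layout is as described. */
int walrus_demo(void)
{
    char str[] = "I am the Walrus";

    int sixteen_bytes = (sizeof str == 16);   /* 15 characters + 1 NUL */
    int terminated    = (str[15] == '\0');    /* the last element is 0 */
    int last_char     = (str[14] == 's');     /* the final visible character */

    return sixteen_bytes && terminated && last_char;
}
```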
This idiom is the extent to which C has a string type. But that's all it is: an idiom. Except that it is supported by:
The functions in string.h are for string manipulation. But how can that be, if there is no string type?
Why, they work on pointers.
Here's one possible implementation of the simple function strlen, which returns the length of a string (not including the NUL terminator):
size_t strlen(const char *str) { /* Note the pointer syntax here */
    size_t len = 0U;
    while(*(str++)) ++len;
    return len;
}
Note the use of pointer arithmetic and dereferencing. That's because, despite the function's name, there is no 'string' here; there is merely a pointer to at least one character, the last one being 0.
Here's another possible implementation:
size_t strlen(const char *str) {
    size_t i;
    for(i = 0U; str[i]; ++i);
    /* When the loop exits, i is the length of the string */
    return i;
}
That one uses indexing. Which, as we found out earlier, uses a pointer (not an array, and definitely not a string).
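Because the string.h functions only ever see a pointer, nothing stops you from handing them a pointer into the middle of an array. The sketch below uses the standard strlen from string.h. tail_length is a hypothetical name.

```c
#include <string.h>

/* Hypothetical demo: strlen receives only a pointer, so a pointer into the
   middle of the array works just as well as one to its start. */
size_t tail_length(void)
{
    char str[] = "I am the Walrus";
    const char *tail = str + 5;   /* points at the 't' of "the" */

    return strlen(tail);          /* counts "the Walrus": 10 characters */
}
```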
1 == 4 expression in title, and use of ++, in 'Pointer arithmetic'.
2006-01-01 http://geocities.com/iamtheboredzo/pointers