If you are reading this, you want to know more about C pointers. That’s a good thing. Even if you don’t program in C very often, understanding pointers gives you a deeper understanding of how programs and memory work “under the hood”. Learning pointers will make you a better programmer. In this post we will start with variables and memory and look at how they relate to pointers. We will talk about the “why” behind pointers, discuss pointer operations, and finish with the different types of pointers you will encounter.
What is a variable?
Let’s start simple. What is a variable? Most programmers will say a variable is a name for a piece of data that can change in a program. That’s true but it’s also just scratching the surface.
When a variable is declared, memory to hold a value of that type is allocated at an unused memory location. That location is the variable’s memory address. For a compiler, a variable is a symbol for a starting memory address. The compiler knows two things about any variable: its name and its type. The name is a symbol that gets translated to a memory address. The type tells the compiler how many bytes to store starting at that address.
A C compiler converts C source code to assembly source code. During that conversion variable names are converted to relative memory addresses. Here is an example in assembly of the code above. Don’t worry, you don’t need to know assembly to know pointers. This is just an example to show what happens.
Three things to notice: the rbp register, the offsets from it, and the sizes. The rbp is a base pointer. For our discussion, think of it as a starting point, a starting memory address. The offsets, minus 20 and minus 4, are relative to that starting point. The sizes are the number of bytes to store. On my machine, an int is 4 bytes (32 bits) and a char is 1 byte (8 bits).
Put this all together: one instruction says store 4 bytes with the value 1 starting at a relative offset from rbp, and the other says store 1 byte with the value 97, the ASCII value for ‘a’, starting at another offset. When the program runs, those offsets are resolved to actual memory addresses. The key takeaway is this: to a compiler, all variables are just memory addresses and sizes.
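To see the same idea from the C side rather than in assembly, here is a minimal sketch that reports where an int and a char live and how many bytes each takes. The names show_variables, num, and letter are made up for illustration, and the addresses printed will differ on every machine and every run.

```c
#include <stdio.h>

/* Print where two local variables live and how many bytes each occupies.
   The function and variable names are illustrative only. */
void show_variables(void)
{
    int num = 1;
    char letter = 'a';

    /* %p expects a void pointer, so the addresses are cast */
    printf("num:    address=%p, size=%zu, value=%d\n",
           (void *)&num, sizeof num, num);
    printf("letter: address=%p, size=%zu, value=%d\n",
           (void *)&letter, sizeof letter, letter);
}
```

Call show_variables() from main to try it. On a typical desktop machine the int reports a size of 4 and the char a size of 1, which lines up with the sizes discussed above.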
What is a pointer?
C programs have different types of variables, including ints, floats, arrays, chars, structs, and pointers. An int holds an integer number, and a float holds a floating point decimal number. Arrays hold multiple values. A pointer is a variable that holds the memory address of another variable. It’s that simple. The int variable above holds the number 1, which the compiler stores as 4 bytes starting at a relative offset. When the program runs, that offset becomes a real memory address. A pointer to that int variable would hold that address as its value.
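That definition can be checked directly in code. The sketch below, with an assumed function name and assumed variable names, confirms that the value stored in a pointer is the address of the variable it points to.

```c
#include <stdbool.h>

/* Returns true when the value stored in the pointer equals the address of
   the variable it was pointed at. All names here are illustrative. */
bool pointer_holds_address(void)
{
    int number = 1;
    int *pointer = &number;   /* pointer now stores the address of number */

    /* the pointer's value is an address; the int lives at that address */
    return pointer == &number && *pointer == 1;
}
```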
The why behind pointers
Why do pointers exist? Why do we need them? The simple answer is efficiency. Back when C was created, computers were much slower. Most programs were written in assembly. Programmers needed to be much more efficient at solving problems.
The more detailed answer has to do with call semantics. The C language is call-by-value. When you call a function in C, the values of its arguments are literally copied into the function’s stack frame. Pass an int and 4 bytes are copied. Pass a char and 1 byte is copied. What happens when you need to give a function a 100k element int array? Copying all 400,000 bytes on every call would be really inefficient, so the function receives a pointer to the array instead. The pointer, all 4 or 8 bytes of it, is copied into the function, where it can be dereferenced and the array accessed. The same goes for large structs. Don’t pass a copy of the large struct in; pass in a pointer to the struct.
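Here is a rough sketch of the difference, assuming a made-up struct big that wraps 100,000 ints. The type and function names are not from any library; they only illustrate what gets copied on each call.

```c
#include <assert.h>
#include <stddef.h>

/* A deliberately large struct: 100,000 ints. Illustrative type only. */
struct big {
    int values[100000];
};

/* By value: the whole struct is copied for the call, and the change below
   only touches that copy. */
void set_first_by_value(struct big copy, int new_value)
{
    copy.values[0] = new_value;
}

/* By pointer: only the address is copied, and the change is visible to the
   caller. */
void set_first_by_pointer(struct big *target, int new_value)
{
    assert(target != NULL);
    (*target).values[0] = new_value;
}
```

Calling set_first_by_value leaves the caller’s struct untouched, while set_first_by_pointer(&data, 5) changes it. That second effect is the other reason pointers get passed around: a function can modify the caller’s data.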
There are two main operators for working with pointers: the * operator and the & operator. There is also the -> operator, but we will get to that later.
The * operator is used when declaring a pointer and when dereferencing a pointer. Declaring a pointer is like declaring any other variable. The compiler allocates space for the pointer. The size of a pointer, the number of bytes used to store it, depends on the architecture of the machine. On 32-bit systems, pointers are typically 4 bytes (32 bits). On 64-bit systems, like most are these days, pointers are typically 8 bytes (64 bits).
The & operator is used to get the address of another variable. It is used to assign a value to a pointer. Putting the & operator in front of a variable returns a pointer to that variable, typed according to the variable’s type.
Take the following code, which shows some simple usage of the * and & operators.
- Line 2 we use the * operator to declare an int pointer. In other words, we declare a variable that holds a memory address, where the value at that memory address is an int.
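If you want to check the pointer size on your own machine, a small sketch like this will do. The function name is an assumption; sizeof and the %zu format are standard C.

```c
#include <stdio.h>

/* Print the size in bytes of a few pointer types on the machine running
   the code. */
void print_pointer_sizes(void)
{
    printf("sizeof(int *)    = %zu\n", sizeof(int *));
    printf("sizeof(char *)   = %zu\n", sizeof(char *));
    printf("sizeof(double *) = %zu\n", sizeof(double *));
}
```

On most current systems all three lines report the same number, because the size of a pointer comes from the address width and not from the size of the thing it points to.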
- Line 5 we declare an int variable and assign it the literal value 1.
- Line 8 we use the & operator to get the address-of the variable val and assign that address value to the ptr variable. We store the memory address of val in the variable ptr.
- Line 11 we dereference the ptr variable retrieving the value at the address stored in the pointer.
- Line 14 we dereference the ptr variable to set a new value to the address stored in the pointer.
Declaring a pointer is easy. It is the same as declaring a variable, the only difference being the * operator in front of the variable name, which indicates a pointer. Assigning a value to the pointer is easy too: we use the & operator to get the address-of a variable of the correct type. Dereferencing is often where confusion lies.
Dereferencing is just indirection. It is telling the compiler, “I have the address of a variable in the pointer. I want to access that pointed-to address either to get a value or set a value”. A pointer holds a reference to a variable; the reference being the memory address stored in the pointer. When we access the value at that reference, we de-reference the pointer.
Dereferencing can be used to either indirectly get a value from the pointer address or to assign a value to the pointer address.
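A classic place where both uses show up together is a swap function. This is a minimal sketch with an assumed name, swap_ints; it reads through one pointer and writes through the other.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange the two ints that a and b point to. */
void swap_ints(int *a, int *b)
{
    assert(a != NULL && b != NULL);

    int saved = *a;   /* dereference to get the value a points to */
    *a = *b;          /* get through b, set through a */
    *b = saved;       /* dereference to set the value b points to */
}
```

Calling swap_ints(&x, &y) exchanges the caller’s two variables, which a call-by-value function taking plain ints could not do.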
Let’s look at an example.
- Line 6 we declare an int value named ival and assign it the value 1.
- Line 7 we declare an int pointer iptr and assign the address of ival to iptr.
- Line 10 we dereference the iptr variable to get the value pointed to by iptr and assign it to the int variable named get.
- Line 11 we print out the get variable.
- Line 14 we dereference the iptr variable to set a new value, changing the value to 2. Literally we are assigning the value 2 to the address pointed to by iptr.
- Line 15 we dereference the iptr variable again to get its value and assign that value to the int set variable.
- Line 16 we print out the value of the set variable. It is now 2.
- Line 17 we print out the value of the ival variable. It is also now 2.
If we run that code, we get:
*iptr = 1
*iptr = 2
ival = 2
In this example we have used dereferencing to both get and set values. Some people get confused and think dereference means getting a value. It doesn’t. Dereference means to indirectly access the address stored in the pointer. You can get a value, like we do in line 10 above, or you can set a value, like we do in line 14 above.
Pointers and types
Take a look at the following code.
When we declare an int pointer, we are saying that the variable is a pointer, that it holds the address of another variable, and that the value at that address is an int. The same goes for a float pointer, a char pointer, or a pointer to any other type. Declaring a pointer to be a specific type tells the compiler that when the pointer is dereferenced, the value pointed to will be of that type.
You will notice in the example above we declare a pointer type and then assign it the address of a value of the same type. If you were to uncomment the last few lines and try to compile that code, the compiler would complain with “assignment from incompatible pointer type”. Depending on the compiler and its settings that is a warning or a hard error, but either way the assignment is wrong. Without a cast, you can only assign addresses to pointers of the same type.
The & operator returns a pointer of the type it is in front of. In the code above, &ival returns an int pointer, &fval returns a float pointer, and &cval returns a char pointer. Anywhere a pointer can be used, an equivalent & expression can also be used.
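As a small illustration of that last point, the sketch below passes address-of expressions straight to functions that expect pointers, with no named pointer variable in between. The function names and the values 21 and 1.5 are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Double the int that value points to. */
void double_int(int *value)
{
    assert(value != NULL);
    *value *= 2;
}

/* Double the float that value points to. */
void double_float(float *value)
{
    assert(value != NULL);
    *value *= 2.0f;
}

/* Passes &ival and &fval directly as arguments.
   Returns 1 when both values were doubled. */
int address_of_demo(void)
{
    int ival = 21;
    float fval = 1.5f;

    double_int(&ival);     /* &ival has type int *   */
    double_float(&fval);   /* &fval has type float * */
    /* double_int(&fval);     would not match: float * is not int * */

    return ival == 42 && fval == 3.0f;
}
```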
Pointers to arrays
Just like you can have a pointer to an int or a float, you can point at an array, as long as the pointer’s type matches the type of the array’s elements.
Pretty simple. In fact, it looks exactly like an int pointer, and that’s because it is. When an array such as myarray is created, the compiler allocates memory for the entire array. When the array name is then used as a value, it is converted to a pointer holding the address of the first element in the array.
Some people get confused and start thinking you can interchange pointers and arrays. You cannot. You can assign an array variable to a pointer of the same element type, but not the opposite. Once an array is created, the array variable cannot be reassigned.
Here is an example
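A minimal sketch of those rules follows. It assumes a four-element array with made-up contents, plus a hypothetical helper sum_ints that receives the array as a pointer to its first element.

```c
#include <assert.h>
#include <stddef.h>

/* Sum count ints starting at the address in first. The caller passes an
   array name, which converts to a pointer to its first element. */
int sum_ints(const int *first, size_t count)
{
    assert(first != NULL || count == 0);

    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += first[i];   /* indexing works through a pointer too */
    }
    return total;
}

/* Returns 1 if pointing ptr at the array gives the address of element 0. */
int array_to_pointer_demo(void)
{
    int myarray[4] = {1, 2, 3, 4};   /* size and contents are assumed */
    int *ptr = myarray;              /* allowed: same as &myarray[0] */

    /* myarray = ptr;   not allowed: an array name cannot be reassigned */

    return ptr == &myarray[0] && sum_ints(myarray, 4) == 10;
}
```

Note that sum_ints needs the element count as a separate argument. Once the array has been converted to a pointer, the function has no way to recover how many elements it has.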
Pointers to structs
Like a pointer to an array, a pointer to a struct holds the memory address where the struct begins, which is also the address of its first member. Here is some example code for declaring and using a struct pointer.
- Lines 5-10 we declare the struct person, a variable to hold a person struct, and a pointer to a person struct. The declaration for a pointer to a struct is similar to a pointer to any other type: struct person *ptr.
- Lines 12-13 we fill the struct with age and name values.
- Line 14 we assign the address of the first variable to the struct pointer ptr.
- Line 16 we print out values from the struct.
If we run that code, we get:
age=21, name=full name
On line 16 we have a new operator, ->. The -> operator is used to access a value from a struct pointer. This would be the same as doing (*ptr).field, where we first dereference the struct pointer and then access the field using the standard dot (.) notation. Accessing a field from a struct pointer is so common that the -> operator exists to make it easier.
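The two spellings can be compared side by side. This sketch assumes a struct person with an int age and a string name; the exact field types in the original listing are not known, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Field types are assumptions made for this sketch. */
struct person {
    int age;
    const char *name;
};

/* Read a field with the arrow operator. */
int age_with_arrow(const struct person *ptr)
{
    assert(ptr != NULL);
    return ptr->age;
}

/* Read the same field by dereferencing first, then using dot notation. */
int age_with_dot(const struct person *ptr)
{
    assert(ptr != NULL);
    return (*ptr).age;
}
```

Both functions return the same value for the same pointer. The parentheses in (*ptr).age are required because the dot binds tighter than the unary *.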
Pointers to pointers
A pointer can point to another pointer variable. You can have a pointer to a pointer, a pointer to a pointer to a pointer, and so on down the rabbit hole. In practice it is rare to see more than a pointer to a pointer. Usually two levels of indirection are enough.
Take the following code:
If you run this code, you should get output similar to this but with different memory addresses.
&ptr=0x7fff390fa6f8, &val=0x7fff390fa70c
ptr2ptr=0x7fff390fa6f8, *ptr2ptr=0x7fff390fa70c, **ptr2ptr=1
- Lines 1-2 declare an int variable val and an int pointer variable ptr.
- Line 5 is new. Here we are saying that we have a variable ptr2ptr that holds the address of another int pointer.
- Line 6 we assign the ptr variable the address-of the val variable. We have seen this before.
- Line 7 we assign the ptr2ptr variable the address-of the ptr variable. Double indirection. The ptr2ptr variable stores the address-of ptr which in turn stores the address-of val.
- Line 8 we print out the address-of the ptr and val variables.
- Line 9 we print out the value stored in ptr2ptr which is the same as &ptr. When we dereference that we get the address of val. When that is dereferenced we get the value 1.
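One common reason to reach for a pointer to a pointer is letting a function change where the caller’s pointer points. The sketch below uses a hypothetical function, point_to_larger, to show a single dereference of an int ** setting the caller’s int pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Make the caller's int pointer point at whichever of a or b holds the
   larger value. Taking int ** lets the function change the pointer itself,
   not just the int it points to. */
void point_to_larger(int **ptr2ptr, int *a, int *b)
{
    assert(ptr2ptr != NULL && a != NULL && b != NULL);

    /* one dereference reaches the caller's pointer and reassigns it */
    *ptr2ptr = (*a >= *b) ? a : b;
}
```

After point_to_larger(&ptr, &x, &y), ptr holds the address of whichever variable is larger, and *ptr reads that variable’s value.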
I hope this (somewhat) brief overview helps with some of the different types of pointers you will see. If you found this useful, check out some of my other posts on function pointers in C and pointers and arrays in C.
// some variables
// declare an int pointer named ptr
int *ptr;

// declare an int with the value of 1
int val = 1;

// get the address of the val variable and store it in ptr
ptr = &val;

// dereference the ptr variable to get the int value at the address stored (deref is a placeholder name)
int deref = *ptr;

// dereference the ptr variable to set the int value at the address stored
*ptr = 2;
#include <stdio.h>

int main(void)
{
    // declare int ival and int pointer iptr. Assign address of ival to iptr.
    int ival = 1;
    int *iptr = &ival;

    // dereference iptr to get value pointed to, ival, which is 1
    int get = *iptr;
    printf("*iptr = %d\n", get);

    // dereference iptr to set value pointed to, changes ival to 2
    *iptr = 2;
    int set = *iptr;
    printf("*iptr = %d\n", set);
    printf("ival = %d\n", ival);
    return 0;
}
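// declare an int value and an int pointer
int ival;
int *iptr = &ival;
// declare a float value and a float pointer
float fval;
float *fptr = &fval;
// declare a char value and a char pointer (cptr is an assumed name)
char cval;
char *cptr = &cval;
// can't do this, doesn't make sense
// iptr = &fval;
// fptr = &ival;
// iptr = &cval;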
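// two int arrays and an int pointer (the array sizes are assumed for illustration)
int myarray[4];
int myarray2[4];
int *ptr;
// you can do this, myarray converts to an int pointer to the first element of myarray
ptr = myarray;
// you cannot do this, array variables cannot be reassigned
// myarray = ptr;
// myarray = myarray2;
// myarray = &myarray2;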
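int val = 1;
int *ptr;
// declare a variable ptr2ptr which holds the value-at-address of
// an *int type which in turn holds the value-at-address of an int type
int **ptr2ptr;
ptr = &val;
ptr2ptr = &ptr;
printf("&ptr=%p, &val=%p\n", (void*)&ptr, (void*)&val);
printf("ptr2ptr=%p, *ptr2ptr=%p, **ptr2ptr=%d\n",(void*)ptr2ptr,(void*)*ptr2ptr,**ptr2ptr);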
Suppose we have an integer object (of type int) and a pointer object (of type int *, or pointer-to-int) that holds the integer's address.
Unary & is the address operator. Applying it to an object of type int gives you the address of that object (or, equivalently, a pointer to that object); that address/pointer value is of type int *, or pointer-to-int. The operand of unary & must be the name of an object, not just a value; taking the address of a literal such as 42 is illegal nonsense. (The & symbol is also used for the binary bitwise and operator, which is completely unrelated to the address operator.)
Unary * is the dereference operator, the inverse of &. Its operand must be a value of some pointer type. Dereferencing a pointer refers to the object to which the pointer points.
Given such declarations, and assuming the value of the pointer hasn't been changed, the name of the integer and the dereferenced pointer mean the same thing; they both refer to the same object (whose value happens to be 42). Similarly, the address-of expression applied to the integer and the pointer itself mean the same thing; they both yield the address of the integer, an address that has been stored in the pointer object.
It's important to note that the dereferenced pointer doesn't just refer to the current value of the integer, it refers to the object itself -- just like the integer's name does. If you use it in a value context, this doesn't matter; you'll just get the value of the integer. But if you use it on the left side of an assignment, for example, it doesn't evaluate to 42. It evaluates to the object itself, and lets you modify that object. (The distinction here is whether the expression is used as an lvalue.)
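The value-versus-object distinction can be sketched in a few lines. The names x and p and the function name are placeholders chosen for this example; the value 42 comes from the discussion above.

```c
#include <stdbool.h>

/* Returns true when *p behaves as the object x itself, not just as a copy
   of its value. */
bool deref_is_the_object(void)
{
    int x = 42;
    int *p = &x;

    bool same_value = (*p == x);       /* value context: both yield 42 */
    bool same_address = (&*p == &x);   /* & undoes *: both are the address of x */

    *p = 7;                            /* lvalue context: this modifies x */

    return same_value && same_address && x == 7;
}
```

The assignment through *p changes x itself, which is why the final check compares x against 7 rather than 42.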