Advanced Notes On C Language: (For Advanced System Programmers/Instructors/Trainers)
Authored By:
Peter Chacko
Netdiox Computing Systems
peter@nediox.com
www.netdiox.com
Keywords/index terms: C interview questions FAQ, C programming, volatile, pointers, GNU C,
compiler, kernel, training, Linux.
1. INTRODUCTION
Please understand that this article is not meant for those who
don’t have a system programming background.
2. DATA TYPE, CONVERSIONS AND EXPRESSIONS
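Consider the following declarations:
int a;
int ar[10];
The first one follows the simple pattern
datatype identifier;
In the second one, however, the identifier ar sits in the middle of its data type:
int and [10] together make up the type (array of 10 int).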
But instead of writing int[10] ar; we write int ar[10]: that is how the parts of
the data type and the identifier are bound together in the declaration of an
array (or a function). Such types are called "derived types", because they are
derived from another type (here, int).
Similarly, to typedef an array type we write
typedef int ARR[10];
and then declare
ARR ar;
Here we simply applied the same derived-type rule to typedef: the new type name
ARR takes the place of the identifier. Many experienced C programmers get
confused when they have to typedef an array type, because they lack this
understanding of how array types are written.
The same holds for function declarations and pointer declarations: the
identifier comes in between the different parts of the data type.
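So there are two forms of declaration. The first is
data-type identifier;
eg: int a;
The second is
datatype-part-I identifier datatype-part-II;
eg:
int ar[10];             (datatype-part-I: int, identifier: ar, datatype-part-II: [10])
int fun(int a,int b);   (datatype-part-I: int, identifier: fun, datatype-part-II: (int a,int b))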
You can use this information when you work with complex
declarations in your projects. Please think over these topics and
understand the concepts well.
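As a small sketch of the rule above (the names ARR, FUNPTR, table, get_table, add and demo_declarations are hypothetical, chosen for illustration), the same "identifier in the middle" pattern governs typedefs, function pointers and a function that returns a pointer to an array:

```c
/* "int ar[10];" : part I = int, identifier = ar, part II = [10].
 * In a typedef, the new type name takes the identifier's place. */
typedef int ARR[10];                 /* ARR means "array of 10 int" */

/* "int fun(int a, int b);" : part I = int, identifier = fun, part II = (int, int). */
typedef int (*FUNPTR)(int, int);     /* pointer to function (int, int) returning int */

static int table[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

/* Same rule, nested: "get_table(void)" sits inside "int (* ... )[10]",
 * so get_table returns a pointer to an array of 10 int. */
static int (*get_table(void))[10]
{
    return &table;
}

static int add(int a, int b)
{
    return a + b;
}

/* Uses all three declarations: 10 elements + add(3, 4) + table[9]. */
static int demo_declarations(void)
{
    ARR ar = {0};                    /* exactly like: int ar[10] = {0}; */
    FUNPTR fp = add;                 /* exactly like: int (*fp)(int, int) = add; */
    int (*tp)[10] = get_table();

    return (int)(sizeof ar / sizeof ar[0]) + fp(3, 4) + (*tp)[9];
}
```

Reading any complex declaration then comes down to finding the identifier and working outward through the type parts around it.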
2.1 CONVERSION/DATA TRANSFORMATION
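char A=128;
int B=A;
If plain char is signed (as it is with most compilers on x86), 128 does not fit
in it: A typically ends up holding -128, and the conversion to int sign-extends
that value, so B becomes -128 instead of 128. If plain char is unsigned, B is 128.
These are the kinds of issues that cause many obscure bugs in the form of data
corruption.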
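Because the signedness of plain char is implementation-defined, this sketch spells it out with signed char and unsigned char. It assumes GCC or Clang on a two's-complement target, where converting the out-of-range value 128 to signed char wraps to -128 (implementation-defined behavior); the function names are hypothetical.

```c
/* Widening conversions from a one-byte type to int. */
static int widen_signed(signed char c)
{
    return c;        /* sign-extended: bit 7 is copied into the upper bits */
}

static int widen_unsigned(unsigned char c)
{
    return c;        /* zero-extended: the upper bits are always 0 */
}

static int demo_char_conversion(void)
{
    signed char   s = (signed char)128;  /* does not fit: -128 on GCC/Clang */
    unsigned char u = 128;               /* fits: stays 128 */
    return widen_signed(s) + widen_unsigned(u);   /* -128 + 128 */
}
```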
int B=-1;
if (A>B)
{………}
else
{ …….}
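The declaration of A is not shown in the comparison above. Assuming, hypothetically, that A is an unsigned int, the usual arithmetic conversions turn B into an unsigned int before the comparison, so -1 becomes UINT_MAX and A>B is almost always false. A minimal sketch (naive_greater and safe_greater are illustrative names):

```c
#include <stdbool.h>

/* Assumption: A is unsigned int. In "A > B" the int B is converted to
 * unsigned int, so B == -1 becomes UINT_MAX. */
static bool naive_greater(unsigned int A, int B)
{
    return A > B;               /* compilers warn with -Wsign-compare */
}

/* One way to compare correctly: handle a negative B before converting. */
static bool safe_greater(unsigned int A, int B)
{
    if (B < 0)
        return true;            /* any unsigned value exceeds a negative int */
    return A > (unsigned int)B;
}
```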
3. POINTERS
[Figure: layout of a process address space, with regions labeled HEAP, BSS, DATA, .rodata and TEXT along an address axis.]
When a program is built (compilation, assembly and linking), it is stored in a
file with a special format. One such format is ELF (Executable and Linkable
Format), used on Unix-like systems. It has the information (in the ELF header
and the program header table) that the program loader needs to load the program
and create the process image in memory at run time, and also to perform run-time
linking (for dynamically linked shared libraries).
When the program is built, no file storage is allocated for uninitialized (or
zero-initialized) static and global data; only its total size is recorded in the
program header table of the ELF file. Later, when the program is loaded, the
loader reads this metadata from the ELF file and allocates one large, page-aligned,
zero-filled chunk of memory to hold all such variables. This section is called
BSS (Block Started by Symbol).
The DATA section (storage for all initialized static and global variables) and
the STACK section should be familiar to you. The HEAP grows dynamically as your
program calls memory-allocation routines such as malloc(). In addition to these
there are shared memory mappings: if you map a file with the mmap() system call,
that region is also mapped into your program.
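The following sketch labels where each kind of object typically ends up in an ELF process image built with GCC on Linux; the placement is the toolchain's choice, so the comments describe the common case rather than a guarantee. All names are hypothetical.

```c
#include <stdlib.h>

int bss_counter;                       /* uninitialized global   -> BSS     */
static int zero_init = 0;              /* zero-initialized       -> BSS     */
int data_value = 42;                   /* initialized global     -> DATA    */
static const char banner[] = "hello";  /* const data             -> .rodata */

int text_function(void)                /* machine code           -> TEXT    */
{
    int local = 1;                     /* automatic variable     -> STACK   */
    int *heap = malloc(sizeof *heap);  /* dynamic allocation     -> HEAP    */
    if (heap == NULL)
        return -1;
    /* 1 + 42 + 0 + 0 + 6 (sizeof banner includes the '\0') */
    *heap = local + data_value + bss_counter + zero_init + (int)sizeof banner;
    int result = *heap;
    free(heap);
    return result;
}
```

On Linux, a tool such as nm or objdump -t on the compiled object shows which section each symbol was placed in.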
The sum total of all such memory is called the address space of your program.
You can only access these regions of memory, and some of them (such as TEXT and
.rodata) must not be written to. Hence all sorts of pointer problems come down to
two primary causes:
1. You access memory that is not in your program's address space.
2. You access memory in your address space wrongly (for example, writing to a
write-protected region) or corrupt other variables.
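For instance, consider the following declarations:
int ar[4]; int b=100;
ar[4]=200;
The assignment can corrupt the variable b, because it exceeds the array bound:
the valid indices are 0 to 3, so ar[4] is one past the end. Whether b is actually
the victim depends on how the compiler lays out the variables; formally this is
undefined behavior. This is case 2: memory in your own region manipulated wrongly.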
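int * a;
*a=100;
If a is a global or static pointer, it is initialized to NULL, and this writes
100 to memory address 0. If a is a local (automatic) variable, it holds an
indeterminate value and the write goes to an unpredictable address. Writing to
address 0 manipulates a memory region outside your address space: most operating
systems deliberately leave the page at address 0 unmapped, so accessing a null
pointer is treated as accessing a region outside the address space of any program.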
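A minimal sketch of how both failure cases can be avoided: check an index against the array length before writing, and give a pointer valid storage before dereferencing it. set_element and make_int are hypothetical helpers, not library functions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Case 2 avoided: reject indices outside 0 .. len-1 before writing. */
static int set_element(int *ar, size_t len, size_t index, int value)
{
    if (ar == NULL || index >= len)
        return -1;                 /* ar[len] would be one past the end */
    ar[index] = value;
    return 0;
}

/* Case 1 avoided: the pointer gets storage inside our address space first. */
static int *make_int(int value)
{
    int *a = malloc(sizeof *a);
    if (a != NULL)
        *a = value;
    return a;                      /* caller must free() it */
}
```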
Now, what is an address? Is it a virtual address, a physical address or a
logical address?
On x86, the value you see when you apply the & operator to a C variable is a
logical address (an offset within a segment). The segmentation unit combines it
with the base of the relevant segment (selected by DS for data, CS for code) to
create a linear address, which the paging unit then translates into a physical
address in RAM. Operating systems such as Linux use a flat segment model whose
bases are 0, so in practice the logical address equals the linear (virtual)
address. These details are not under the control of C, but of the memory
management subsystem of the operating system. Still, knowing them helps you
debug your pointer issues faster, like someone who knows the game.
Pointers to pointers are best understood as another layer of this indirection.
The compelling reason to have pointers to pointers is primarily to modify a
pointer itself across a function call (by passing the address of the pointer).
The function in question needs a pointer to pointer as its formal parameter (if
you are a doctor, you need another doctor to cure you). Otherwise the same
concepts apply.
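A minimal sketch of that reason: a hypothetical helper alloc_ints hands a freshly allocated buffer back through a pointer-to-pointer parameter. With a plain int * parameter, the assignment would only change the function's local copy of the pointer.

```c
#include <stdlib.h>

/* Allocates n zeroed ints and stores the new pointer in the caller's variable. */
static int alloc_ints(int **out, size_t n)
{
    int *p = calloc(n, sizeof *p);
    if (p == NULL)
        return -1;
    *out = p;                 /* modifies the caller's pointer itself */
    return 0;
}
```

Usage: `int *buf = NULL; alloc_ints(&buf, 8);` leaves buf pointing at the new block.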
4. STRINGS IN C
compiler option, -writable-strings in some compilers. If the string is stored in
.rodata, you get a segmentation fault when you try to modify that region through
the pointer. Always remember that a string literal such as "any string" is an
expression whose value is the address where the string is stored in the process
image.
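The difference between a pointer to a literal and an array initialized from a literal can be sketched as follows, assuming GCC on Linux, where literals go to .rodata. The names p, buf and capitalize_first are hypothetical.

```c
#include <string.h>

/* p holds the address of a literal (typically in .rodata): writing through it
 * is undefined behavior and usually a segmentation fault.
 * buf is an array that receives a writable copy of the literal. */
static const char *p = "any string";
static char buf[] = "any string";

static void capitalize_first(char *s)
{
    if (s[0] >= 'a' && s[0] <= 'z')
        s[0] = (char)(s[0] - 'a' + 'A');   /* assumes ASCII letter ordering */
}
```

capitalize_first(buf) is fine, while casting away const and calling capitalize_first((char *)p) would try to write into read-only memory. Since the literal is itself an address-valued expression, "any string"[0] is simply 'a'.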
5. COMPOSITE DATA TYPES
int ar[4];
int * p=malloc(4*sizeof(int));
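2. When you apply the & operator to ar, it gives the same address value as ar
itself, but with a different type: &ar is a pointer to an array of 4 ints
(int (*)[4]), while ar decays to a pointer to its first int (int *). When you
apply the & operator to p, it gives the address of the pointer variable p, not
the address stored in it. The reason is that ar is not a pointer variable: no
separate storage holds the address of the array. The name ar denotes the array
object itself, and its address is fixed by the compiler and linker (or relative
to the stack frame, for a local array).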
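These claims can be checked in a small sketch; check_array_address is a hypothetical function that returns 1 when &ar and ar share an address but differ in type, and &p differs from p.

```c
#include <stddef.h>
#include <stdlib.h>

static int check_array_address(void)
{
    int ar[4];
    int *p = malloc(4 * sizeof(int));
    if (p == NULL)
        return 0;

    int same_value  = ((void *)&ar == (void *)ar);     /* same address         */
    int one_int     = (sizeof *ar == sizeof(int));     /* ar decays to int *   */
    int whole_array = (sizeof *&ar == 4 * sizeof(int)); /* &ar is int (*)[4]   */
    /* &ar + 1 skips the whole array, not just one int */
    int step = ((char *)(&ar + 1) - (char *)ar) == (ptrdiff_t)sizeof ar;
    int p_differs   = ((void *)&p != (void *)p);       /* p lives elsewhere    */

    free(p);
    return same_value && one_int && whole_array && step && p_differs;
}
```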
When you apply the & operator to a structure or union, you get the address of its
first byte, which is the address of the first member (or of the shared object, in
the case of a union). When you just use the variable name of the structure, you
mean the entire object. All other variables have the same semantics. But when you
mention an array by name, you do not refer to the entire array but to the address
of its first element (except when it is the operand of sizeof or &). This means
that in C, the expression ar is equivalent to &ar[0].
[Figure: an array with two elements, [0] and [1].]
As you can see, each element of this array is itself an array of 4 bytes. Hence we
can say that this is an array of 2 arrays of 4 bytes, which is nothing but
char ar [2][4];
int ar[100][100];
for(int i=0;i<100;i++)
    for(int j=0;j<100;j++)
        ar[i][j]=i*j;
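Because ar is an array of 100 arrays of 100 ints stored row after row, ar[i][j] lies (i * 100 + j) ints after ar[0][0]. The sketch below checks that layout; fill_table and offset_of are hypothetical names. Keeping j in the inner loop walks memory contiguously, which is usually the cache-friendly order.

```c
#include <stddef.h>

#define ROWS 100
#define COLS 100

static int ar[ROWS][COLS];

static void fill_table(void)
{
    for (int i = 0; i < ROWS; i++)        /* outer loop: one row at a time  */
        for (int j = 0; j < COLS; j++)    /* inner loop: contiguous ints    */
            ar[i][j] = i * j;
}

/* Distance of ar[i][j] from ar[0][0], in ints. Byte arithmetic on char *
 * stays within the single object ar. */
static ptrdiff_t offset_of(int i, int j)
{
    return ((const char *)&ar[i][j] - (const char *)&ar[0][0])
           / (ptrdiff_t)sizeof(int);
}
```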
We can see that arrays and pointers are closely related. There are pointers to
arrays as well.
int (*ar)[10];
int b [2][10];
ar=b;
Or, equivalently,
ar=&b[0];
Now **ar means the first integer stored in the first element array (b[0][0]).
When you do ++ar, you jump forward by one whole array (10 ints). You have never
declared a pointer to a pointer, yet you dereferenced ar twice (using **ar): *ar
is an array of 10 ints, which in turn decays to a pointer to its first int. This
is the beauty (or wildness?) of a pointer to an array.
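A short sketch of the double dereference and the one-array step; demo_pointer_to_array is a hypothetical name.

```c
static int demo_pointer_to_array(void)
{
    int b[2][10] = {{1, 2, 3}, {40, 50}};   /* remaining elements are 0 */
    int (*ar)[10] = b;                       /* same as ar = &b[0];     */

    int first = **ar;   /* *ar is b[0], an int[10] that decays to &b[0][0] */
    ++ar;               /* advances by one whole int[10] (10 * sizeof(int) bytes) */
    int second = **ar;  /* now b[1][0] */

    return first + second;                   /* 1 + 40 */
}
```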
int ** p;
int ar[2][4];
p=ar;
Here the compiler complains (GCC reports incompatible pointer types), and she is
right. You are storing an object of type pointer to an array of 4 integers into a
plain pointer to pointer to integer. You should declare a pointer to an array of
4 integers, int (*p)[4], and store it there. You should do the same in function
parameter declarations.
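For the function-parameter case, a minimal sketch (sum_rows is a hypothetical name): the parameter is declared as a pointer to an array of 4 ints, which is exactly the type ar decays to. Writing the parameter as int m[][4] means the same thing; int ** would not match.

```c
/* m points to the first row; each row is an int[4]. */
static int sum_rows(int (*m)[4], int rows)
{
    int total = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < 4; j++)
            total += m[i][j];
    return total;
}
```

Usage: `int ar[2][4] = {{1,2,3,4},{5,6,7,8}}; sum_rows(ar, 2);` compiles without casts.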
5.1 STRUCTURES/UNIONS
Structure tags, member names and ordinary identifiers (such as the name of a
structure variable) all fall into different name spaces. That is, you can have
struct NewStruct{
int NewStruct;
}NewStruct;
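The declaration above compiles as-is; this sketch (set_and_get is a hypothetical name) uses the one spelling in all three roles: tag, member and variable.

```c
struct NewStruct {          /* tag name space            */
    int NewStruct;          /* member name space         */
} NewStruct;                /* ordinary identifier space */

static int set_and_get(int value)
{
    NewStruct.NewStruct = value;          /* variable . member */
    struct NewStruct copy = NewStruct;    /* tag used as a type, variable as value */
    return copy.NewStruct;
}
```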
7. IMPLEMENTATION OF A FUNCTION CALL & RECURSION
On x86 (32-bit), the CPU has the registers EBP (frame pointer), ESP (stack
pointer) and EIP (instruction pointer), plus the stack segment, at its disposal
to implement control transfer.
First, all arguments of the call are passed on the stack (or in registers,
depending on the calling convention). In the traditional C calling convention,
arguments are pushed right to left: the rightmost argument is pushed first, after
it has been evaluated. The call instruction then saves the return address (the
current EIP) on the stack and jumps to the function, whose prologue pushes the
caller's frame pointer (EBP).
Now the current ESP points at that saved EBP, and it becomes the new frame
pointer: the prologue copies ESP into EBP, so the old ESP becomes the new EBP.
From then on ESP moves as the new function reserves space for its locals and
pushes values (the old value is safe, because it is kept in EBP), while EIP walks
through the instruction stream of the new function and we do our job there. When
the function is done, EBP is copied back to ESP, which discards the locals, and
then the previous EBP is popped. What is left on top of the stack frame is the
saved EIP, which the ret instruction pops into EIP, so the CPU goes back to the
instruction right after the function call.
#include <stdio.h>

void strReverse(const char *st)
{ if(!st) return;
  if(!*st) return;
  strReverse(st + 1 );
  printf("%c", *st);
}
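And assume also that we call this function with the string "ABC".
[Figure: the stack during the recursive calls. Each frame holds a saved EIP, a saved EBP and its own argument: "ABC" in the first call, "BC" in the second, "C" in the third and '\0' in the fourth.]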
On the fourth call the function gets the null character and returns, and the
stack starts unwinding. Each stack frame sees a different argument, so as each
frame resumes after its recursive call, its printf prints that frame's own first
character: C, then B, then A. Hence the characters are printed in reverse.
The important point to understand is that all code after the recursive call is
executed only after the stack-unwinding process begins.
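To make the unwinding order observable without printf, this sketch keeps the structure of strReverse but stores each character into a buffer; reverse_into and reverse_string are hypothetical names.

```c
#include <stddef.h>

/* The store after the recursive call runs only while the stack unwinds,
 * so the last character of the string is stored first. */
static void reverse_into(const char *st, char **out)
{
    if (st == NULL || *st == '\0')
        return;                    /* base case: the 4th call for "ABC" */
    reverse_into(st + 1, out);     /* descend first...                  */
    *(*out)++ = *st;               /* ...then store on the way back up  */
}

static void reverse_string(const char *src, char *dst)
{
    char *cursor = dst;
    reverse_into(src, &cursor);
    *cursor = '\0';
}
```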
8. VOLATILE VARIABLES IN C
9. APPENDICES
9.1 GNU C COMPILER EXTENSIONS
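A. Zero-length arrays. GNU C allows a variable-length object by declaring the last
member of a structure as an array of size zero. ISO C99 standardizes the same idea
as a flexible array member, declared with empty brackets ([]); the older portable
"struct hack" declared that member with size [1].
struct message {
    int length;
    char msg_buf[0];
};
struct message *NewMsg = malloc(sizeof(struct message) + CurrentMsgLength);
NewMsg->length = CurrentMsgLength;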
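The same pattern written with the C99 flexible array member, as a sketch: one allocation holds the header and the payload. new_message is a hypothetical helper, and size_t is used for the length instead of int.

```c
#include <stdlib.h>
#include <string.h>

struct message {
    size_t length;
    char msg_buf[];        /* C99 form; GNU C also accepts msg_buf[0] */
};

static struct message *new_message(const char *text)
{
    size_t len = strlen(text);
    struct message *m = malloc(sizeof *m + len + 1);   /* +1 for '\0' */
    if (m == NULL)
        return NULL;
    m->length = len;
    memcpy(m->msg_buf, text, len + 1);
    return m;                                          /* free() when done */
}
```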
B. Case ranges
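GNU C lets a single case label cover a range of values with the syntax case low ... high:. The GCC manual advises putting spaces around the ..., since 1...5 can be mis-parsed with integer values. This sketch assumes GCC or Clang (it is not ISO C) and ASCII letter ordering; classify is a hypothetical name.

```c
static const char *classify(char c)
{
    switch (c) {
    case '0' ... '9':          /* every value from '0' to '9' inclusive */
        return "digit";
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return "letter";
    default:
        return "other";
    }
}
```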
C. Attributes of a function
inside double parentheses. The following are some useful attributes (please refer
to the GNU manual for a full list): noreturn (the function never returns), pure
(the function has no side effects, and its return value depends only on its
arguments and global memory), always_inline (instructs the compiler to inline the
function), noinline (prevents inlining), deprecated (causes a warning if the
function is used), nonnull (causes a compiler warning if a null pointer is passed
for the listed pointer arguments).
Eg: The following declaration,
extern void *
will cause a compiler warning (with -Wnonnull, which -Wall enables) if you invoke
the function with a null pointer for the first or the second argument. If nonnull
is used without an argument list, all pointer arguments are checked against NULL.
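A sketch of these attributes on hypothetical functions (copy_bytes, count_char, fatal), assuming GCC or Clang. With -Wall, a call such as copy_bytes(NULL, "x", 1) draws a -Wnonnull warning at compile time; the attributes are hints and checks for the compiler, not run-time guards.

```c
#include <stddef.h>
#include <stdlib.h>

/* Warn if a literal NULL is passed as argument 1 or 2. */
static __attribute__((nonnull(1, 2)))
void copy_bytes(char *dest, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dest[i] = src[i];
}

/* pure: no side effects; the result depends only on the arguments
 * and the memory they point to. */
static __attribute__((pure))
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* noreturn: control never comes back to the caller. */
__attribute__((noreturn))
void fatal(void)
{
    abort();
}
```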
struct tag {
char a;
}
H. Built-in function to hide cache-miss latency: __builtin_prefetch(addr, rw, locality).
This function asks the CPU to fetch the memory object pointed to by addr into the
cache ahead of its use. It is only a hint; it does not guarantee that the data is
in a cache line afterwards. Two optional arguments can be passed. The first one
specifies whether the access will be a read (value 0, the default) or a write
(value 1). The second one specifies the degree of temporal locality, from 0 to 3:
0 means the data has no temporal locality and need not be left in the cache after
the access, 3 (the default) means it should be kept in all levels of cache
possible, and 1 and 2 mean low and moderate locality.
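A minimal sketch of prefetching inside a loop, assuming GCC or Clang. PREFETCH_DISTANCE is an assumed tuning value for illustration, not a recommendation; the right distance depends on the hardware and should be measured.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16   /* hypothetical: how many elements to look ahead */

static long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* The guard keeps the address computation inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);  /* read, keep cached */
        total += a[i];
    }
    return total;
}
```

The prefetch never changes the result; it can only change how long the loop waits on memory.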
The likely() and unlikely() macros are used heavily by kernel code to pass
branch-prediction hints to the compiler. They are built on __builtin_expect();
with unlikely(), for example, the compiler can move the rarely executed code out
of line, so that the likely path falls through and stays compact in the
instruction cache. You can refer to the kernel sources for examples. Inline
functions are also heavily used by kernel code. The C99 designated-initializer
syntax for structures, which GCC also accepts as an extension in older language
modes, is another feature you can find in many kernel source files.
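The Linux kernel defines these macros roughly as below (in include/linux/compiler.h); the sketch also fills a hypothetical operations table with C99 designated initializers, the style kernel code uses for structures of function pointers. struct file_ops, my_open and my_close are illustrative names.

```c
/* !!(x) turns any non-zero value into 1 so it matches the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct file_ops {                          /* hypothetical ops table */
    int (*open)(int);
    int (*close)(int);
};

static int my_open(int fd)  { return likely(fd >= 0) ? 0 : -1; }
static int my_close(int fd) { return unlikely(fd < 0) ? -1 : 0; }

/* Designated initializers: members named explicitly, in any order. */
static const struct file_ops my_ops = {
    .close = my_close,
    .open  = my_open,
};
```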
9.2 INLINE ASSEMBLY
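asm ( assembler template
      : output operands                  /* optional */
      : input operands                   /* optional */
      : list of clobbered registers      /* optional */
    );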
Now let's take a look at how to specify individual registers as constraints for
the operands. In the following example, the cpuid instruction takes its input in
the %eax register and gives output in four registers: %eax, %ebx, %ecx and %edx.
CPUID gets its input from option in the eax register, as cpuid expects it to. The
a, b, c and d constraints are used to collect the results.
int main() {
    int var1,var2,var3,var4,option=0;   /* leaf 0: vendor ID */
    asm ("cpuid"
         : "=a" (var1),
           "=b" (var2),
           "=c" (var3),
           "=d" (var4)
         : "a" (option));
    return 0;
}
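Building on the example, this sketch reads the 12-character vendor string that CPUID leaf 0 returns in EBX, EDX and ECX (in that order). It assumes GCC or Clang; the matching constraints "0" and "2" feed the leaf and sub-leaf into EAX and ECX. On non-x86 targets the hypothetical helper cpu_vendor simply returns 0.

```c
#include <string.h>

/* Fills vendor[13] (e.g. "GenuineIntel") and returns 1 on x86; 0 elsewhere. */
static int cpu_vendor(char vendor[13])
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                         : "0" (0u), "2" (0u));   /* leaf 0, sub-leaf 0 */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    (void)eax;                 /* eax holds the highest supported leaf */
    return 1;
#else
    vendor[0] = '\0';
    return 0;
#endif
}
```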
B. memory constraints
Typically the kernel's atomic_inc() and atomic_dec() functions use this, because
here we want the instruction to operate directly on the variable in memory rather
than on a copy in a register. Example follows.
int myLock;
asm __volatile__("lock; incl %0"
                 :"=m" (myLock)
                 :"m" (myLock));
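A hedged sketch in the style of an x86 atomic increment: "+m" marks the memory operand as both read and written, which expresses the "=m" output and "m" input pair in one constraint, and the "memory" clobber stops the compiler from caching memory values across the asm. my_atomic_inc is a hypothetical name; non-x86 targets fall back to the GCC/Clang __atomic_fetch_add builtin.

```c
static void my_atomic_inc(int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; incl %0"
                         : "+m" (*counter)     /* read-modify-write in memory */
                         :
                         : "memory");
#else
    __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
#endif
}
```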
9.3 SEMI-ILLEGAL C INTERVIEW QUESTIONS / C FAQ
2. int i=1;
printf("%d%d",i++,++i);
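In ISO C this call is undefined behavior: i is modified twice with no sequence point between the evaluations of the two arguments, so no output is "the" correct answer. Splitting the updates into separate statements makes the order explicit; the left-to-right order chosen in this sketch (well_defined_version is a hypothetical name) is only one possibility.

```c
#include <stdio.h>

static void well_defined_version(char *out, size_t size)
{
    int i = 1;
    int first = i++;      /* first = 1, then i = 2 */
    int second = ++i;     /* i = 3, second = 3     */
    snprintf(out, size, "%d%d", first, second);
}
```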
points to a location containing multiple values, what you return is a single
value: a pointer.
ABOUT THE AUTHOR