Lecture02 C Basics
Notes include some materials provided by Andrew Case, Jinyang Li, Mohamed Zahran, and the textbooks.
Reading materials: Chapters 1-6 in The C Programming Language, by B.W. Kernighan and Dennis M. Ritchie
Section 1.2 and Aside on page 4 in Computer Systems, A Programmer’s Perspective by R.E. Bryant and D.R. O’Hallaron
Contents
1 Intro to C and Unix/Linux
1.1 Why C?
1.2 C vs. Java
1.3 Software Development Process (Not Only in C)
1.4 Basic Unix Commands
3 Basics of C
3.1 Data types (primitive types)
3.2 Using printf to print different data types
3.3 Control flow
3.4 Functions
3.5 Variable scope
3.6 Header files
4 Pointers
4.1 Pointer Basics
4.2 Pointers and functions
4.3 Pointers and static arrays
4.4 Casting pointers
4.5 Pointers, functions and arrays
5 Strings in C
5.1 Length of a string
5.2 Copying a string
5.3 Concatenating/Appending strings
5.4 string.h
CSCI-UA 201 Joanna Klukowska
Lecture 2: Introduction to C Programming Language joannakl@cs.nyu.edu
6 Multidimensional Arrays
7 Structures
7.1 struct
7.2 Pointers to structures
7.3 Structures and functions
7.4 Creating simple data structures
7.5 typedef - not really a structure
8 Memory Management
8.1 malloc
8.2 free
8.3 Revised linked list
10 Summary
In 1972 Dennis Ritchie at Bell Labs wrote C, and in 1978 the publication of The C Programming Language by Kernighan & Ritchie
caused a revolution in the computing world.
1.1 Why C?
Mainly because it is a high-level programming language, yet it produces code that runs nearly as fast as code written in assembly language.
Some examples of the use of C:
About C
• Hardware independent
• Programs portable to most computers (in source code format, not executable format)
• Case-sensitive
When you log in to a computer system remotely, the command line is very often the only way of interacting with the computer.
Whenever you are logged into a Unix/Linux system, you have a unique, special directory called your current or present working
directory (PWD for short). The present working directory is where you "are" in the file system at the moment, i.e., the directory that
you feel like you are currently "in". This is more intuitive when you are working with the command line, but it carries over to the
graphical user interface (GUI) as well.
Many commands operate on the PWD if you do not give them an argument. We say that the PWD is their default argument. (Defaults
are fall-backs: what happens when you don’t do something.) For example, when you enter the "ls" command (list files) without an
argument, it displays the contents of your PWD. The dot "." is the name of the PWD:
ls .
and
ls
both display the files in the PWD: the first because the name of the directory is given explicitly, the second because listing the PWD is
the default behavior of ls.
When you first log in, your present working directory is set to your home directory. Your home directory is a directory created for you
by the system administrator. It is the top level of the part of the file system that belongs to you. You can create files and directories
within your home directory, but usually nowhere else. Usually your home directory’s name is the same as your username.
In Unix/Linux, files are referred to by pathnames. A pathname is like a generalization of the file’s name. A pathname specifies the
location of the file in the file system by its position in the hierarchy. In Unix/Linux, a forward slash "/" separates successive directory
man command: displays a manual page (or simply help) for the command (this is the easiest way to learn about the options of the
commands that you know and about new commands)
pwd: prints the name of the present working directory
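The pwd command has a counterpart in C: the POSIX function getcwd(), declared in <unistd.h>, fills a buffer with the absolute path of the present working directory. The sketch below assumes a Unix/Linux system; the name print_pwd and the 4096-byte buffer are our own choices, not part of any standard command.

```c
#define _POSIX_C_SOURCE 200809L /* make POSIX declarations such as getcwd() visible */
#include <stdio.h>
#include <unistd.h>

/* Prints the present working directory, much like the pwd command.
   Returns 0 on success, or -1 if getcwd() fails (for example, when the
   path is longer than the buffer). */
int print_pwd(void) {
    char buf[4096]; /* assumed to be large enough for typical paths */
    if (getcwd(buf, sizeof buf) == NULL) {
        perror("getcwd");
        return -1;
    }
    printf("%s\n", buf);
    return 0;
}
```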
1. Write the code in your favorite text editor (make sure it is a plain text editor, not a word processor or any other program that
saves rich text or other formatted documents). For example: Emacs, gedit, geany, vi, sublime, .... Save and name your file. C source
code files usually have the .c extension. The file name can be anything, but it should be related to what the program does.
2. Run the "compiler" to convert the program to executable code (or binary):
gcc -Wall -g -o programName programSource.c
3. If any errors occur during step 2, go back to step 1, fix the errors and recompile. If any warnings occur during step 2, read through
them and decide if they need fixing.
4. Run the program: this is done using the following syntax
./programName
(Given your environment setup, you may be able to run the program by typing only the name of the program without ./ )
5. If errors occur during running of your program, go back to step 1 and fix them.
produces another text file called hello.i. This file contains over 800 lines. That’s because the entire file stdio.h is "pasted" at the
beginning. If we scroll all the way down, we find our own program at the bottom. It does not include the comments, though.
During the compilation, the compiler reads the code and makes sure that it is syntactically correct. The compilation itself can be viewed
as two steps: 1) create the assembly code and then 2) create the executable code. We can tell gcc to stop right after the first stage and
look at the assembly code by using the -S option:
gcc -S hello.c
By default, the output is written to a file with the .s extension. The hello.s file is fairly short. It contains the assembly instructions. It may
look like this:
.file "hello.c"
.section .rodata
.LC0:
.string "hello world"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $.LC0, %edi
call puts
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-3)"
.section .note.GNU-stack,"",@progbits
(NOTE: In just a few weeks you will be able to read and understand the above code.)
The second part of compilation converts the above assembly code into binary code. To tell gcc to stop after that stage, we can use
the -c option. Running
gcc -c hello.c
produces hello.o file (this is called an object file). It is no longer a text file, so trying to look at it will produce gibberish (and many
editors will refuse to display it). It contains the binary version of our hello.c program.
The final step is linking. At this point the binary code for our hello.c program exists, but it needs to be linked together with the
code from C libraries (this is where printf lives, for example). Our original gcc command runs all of the steps above (including the
linking) and produces a runnable/executable binary file.
3 Basics of C
If you know any other programming language, you should be able to read simple C programs and understand what they do. In this section
we will quickly review the basics and look at a few code examples.
char c = 'a';
short s = 32767;
int i = 2147483647;
long int l = 2147483647;
long long int ll = 9223372036854775806;
float f = 1.0;
double d = 1.0;
int *iptr = NULL;
char* string = "test string";
See types.c.
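Each of these types needs a matching printf conversion specifier. The sketch below pairs the declarations above with the usual specifiers (%c, %hd, %d, %ld, %lld, %f, %s); it writes into a buffer with snprintf instead of the screen so the result is easy to check. The function name format_types is hypothetical. A float argument is promoted to double when passed to printf, which is why %f works for both.

```c
#include <stdio.h>
#include <string.h>

/* Writes one value of each type into buf, using the printf conversion
   that matches each type. Returns the value returned by snprintf. */
int format_types(char *buf, size_t n) {
    char c = 'a';
    short s = 32767;
    int i = 2147483647;
    long int l = 2147483647;
    long long int ll = 9223372036854775806LL;
    float f = 1.0f;
    double d = 1.0;
    const char *str = "test string";
    /* %c char, %hd short, %d int, %ld long, %lld long long,
       %f float (promoted to double) and double, %s string */
    return snprintf(buf, n, "%c %hd %d %ld %lld %f %f %s",
                    c, s, i, l, ll, f, d, str);
}
```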
The syntax of the if, if ... else ... and switch statements should be familiar from other programming languages. In C the
expression in the switch statement has to have an integral value (int, char, or anything else that evaluates to an integer).
if (...) {
...
}
------------------------
if (...)
...
else if (...)
...
else
...
------------------------
switch (c) {
case 'a' :
x = 'A';
break;
case 'b' :
x = 'B';
break;
...
default:
break;
}
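As a small sketch of the switch outline above, the hypothetical function upcase_ab converts 'a' and 'b' to upper case and returns every other character unchanged; using a char in the switch expression is allowed because char is an integral type.

```c
/* Returns 'A' for 'a', 'B' for 'b', and any other character unchanged. */
char upcase_ab(char c) {
    char x;
    switch (c) {     /* c is a char, which is an integral type */
    case 'a':
        x = 'A';
        break;       /* without break, execution falls into the next case */
    case 'b':
        x = 'B';
        break;
    default:
        x = c;
        break;
    }
    return x;
}
```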
Repetition/loops
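C has three different loops: the for loop, the while loop, and the do ... while loop. In the original C standard (C89/C90), the
control variable of a for loop has to be declared before the loop, as below; since C99 it may also be declared in the loop header itself,
as in for (int i = 0; i < 10; i++).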
int i;
for (i = 0; i < 10; i++) {
printf("i=%d\n",i);
}
------------------------
int i = 10;
while (i > 0) {
printf("i=%d\n",i);
i--;
}
------------------------
int i = 10;
do {
printf("i=%d\n",i);
i--;
} while (i > 0);
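The three loop forms can compute the same thing. In this sketch (the function names are hypothetical) each one adds the numbers 1 through n. The body of a do ... while loop always runs at least once, so sum_do_while gives a different answer from the other two when n is not positive.

```c
/* Adds 1..n with a for loop. */
int sum_for(int n) {
    int i, sum = 0;
    for (i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/* Adds n, n-1, ..., 1 with a while loop; the condition is tested first. */
int sum_while(int n) {
    int sum = 0;
    while (n > 0) {
        sum += n;
        n--;
    }
    return sum;
}

/* Same as sum_while, but the body runs once before the condition is tested. */
int sum_do_while(int n) {
    int sum = 0;
    do {
        sum += n;
        n--;
    } while (n > 0);
    return sum;
}
```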
Jump statements
C provides break and continue statements. Both of them can be used in loops; break can also be used in a switch statement.
break: immediately jumps to the next statement after the loop or after the switch statement
continue: immediately jumps to the next iteration of the loop (in a for loop, the update expression runs first)
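A minimal sketch using both jump statements, assuming an array in which the value 0 marks the end of the useful data (the function name is hypothetical): continue skips negative values, and break stops the loop at the first 0.

```c
/* Adds the positive values of a, skipping negative ones and stopping
   at the first 0. */
int sum_positive_until_zero(const int a[], int size) {
    int i, sum = 0;
    for (i = 0; i < size; i++) {
        if (a[i] == 0)
            break;      /* leave the loop entirely */
        if (a[i] < 0)
            continue;   /* skip the rest of this iteration */
        sum += a[i];
    }
    return sum;
}
```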
C also has another way of "jumping" in the code: the goto statement. One can specify labels in the code, as in the example below, and
the goto statement immediately transfers control to the statement following the label.
for (...) {
for (...) {
...
if (disaster)
goto LABEL_1;
}
}
LABEL_1:
...
Do not use goto statements in your code. They are considered to be bad programming style at this point and result in code that
is hard to debug and trace.
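One common alternative to goto for leaving nested loops is to put the loops in their own function and return from it as soon as the work is done. A sketch, assuming a small fixed-size 2D grid (the names find_position, GRID_ROWS and GRID_COLS are hypothetical):

```c
#define GRID_ROWS 2
#define GRID_COLS 3

/* Returns the position (row * GRID_COLS + col) of the first element equal
   to target, or -1 if there is none. The return statement leaves both
   loops at once, with no label needed. */
int find_position(int grid[GRID_ROWS][GRID_COLS], int target) {
    int i, j;
    for (i = 0; i < GRID_ROWS; i++) {
        for (j = 0; j < GRID_COLS; j++) {
            if (grid[i][j] == target) {
                return i * GRID_COLS + j;
            }
        }
    }
    return -1; /* searched everything without finding target */
}
```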
3.4 Functions
A file that contains the main function can also contain other functions.
In C, functions cannot be defined within other functions. A function has to be declared before it is used.
#include <stdio.h>

int modulo(int x, int divisor, int rec); // function declaration

int main()
{
    int x = 10;
    int divisor = 2;
    printf("%d mod %d = %d\n", x, divisor,
           modulo(x, divisor, 0));
    divisor = 3;
    printf("%d mod %d = %d\n", x, divisor,
           modulo(x, divisor, 1));
    return 0;
}

/* computes the remainder from dividing x by divisor
   (assumes x >= 0 and divisor > 0);
   rec == 0 selects the iterative version, any other value the recursive one */
int modulo(int x, int divisor, int rec)
{
    if ( !rec ) {
        /* iterative version */
        while (x >= divisor) {
            x -= divisor;
        }
        return x;
    }
    else {
        /* recursive version */
        if (x < divisor)
            return x;
        else
            return modulo(x - divisor, divisor, rec);
    }
}
We can also create functions in multiple files. If functionA uses functionB which is defined in another file, then one of the
following has to be true:
Scope rules:
• The scope of a variable/function name is the part of the program within which the name can be used
• A global (external) variable or function’s scope lasts from where it is declared to the end of the file
• A local (automatic) variable’s scope is within the function or block
int x;
void foo(int y) {
    y++;
    x++; /* x is accessible because it is in global scope */
}
void bar() {
    y = 1; /* wrong - there is no y in this scope */
    x = 1; /* x is accessible because it is in global scope */
}
------------------------
void foo(int y) {
    if (y > 10) {
        int i;
        for (i = 0; i < y; i++) {
            ...
        }
    }
    else {
        y++;
        // 'i' is out of scope here
    }
}
The following example uses an external/global variable and a function, both defined in a file different from the one that contains the main function.
add.c
--------------------
int counter = 0;
void add_one() {
    counter++;
}
program.c
--------------------
#include <stdio.h>
extern int counter; /* declared here, defined in add.c */
void add_one();     /* declared here, defined in add.c */
int main() {
    printf("counter is %d\n", counter);
    add_one();
    add_one();
    printf("counter is %d\n", counter);
    return 0;
}
Both files have to be compiled together, for example: gcc -Wall -g -o program program.c add.c
WARNING: Avoid global/external variables (constants are fine)! They are considered bad programming style. Use of globals results in
programs that are hard to debug and trace: just imagine how many different functions can possibly be modifying the value of a global
variable.
static keyword The keyword static limits the scope of a global variable to the rest of the source file in which it is declared.
If we add the keyword static to the declaration of counter in add.c above, the counter variable will no longer be accessible
from program.c (the program will fail to link).
The other use of the keyword static is with local variables: a static local variable preserves its value between function calls, and its
initialization is performed only once.
add_2.c
--------------------
#include <stdio.h>
void add_one() {
static int counter = 0;
counter++;
printf("counter is %d\n", counter);
}
program_2.c
--------------------
void add_one();
int main() {
add_one();
add_one();
return 0;
}
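The same idea in a form whose result can be checked: a sketch of a function (hypothetical name count_calls) that returns how many times it has been called, using a static local variable that is initialized only once.

```c
/* Returns how many times it has been called so far. */
int count_calls(void) {
    static int counter = 0; /* initialized once; keeps its value between calls */
    counter++;
    return counter;
}
```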
add.c
--------------------
int counter = 0;
void add_one() {
counter++;
}
add.h
--------------------
extern int counter;
void add_one();
program.c
--------------------
#include <stdio.h>
#include "add.h"
int main() {
printf("counter is %d\n", counter);
add_one();
add_one();
printf("counter is %d\n", counter);
return 0;
}
Notice that the file program.c no longer needs to repeat all the declarations; it simply includes the file add.h. The name of that file
is enclosed in quotes, which tells gcc to look for it first in the directory of the source file, rather than only in the locations of the
standard header files on your machine (which is what the angle brackets in <stdio.h> do).
4 Pointers
The program’s memory can be viewed as an array of bytes:
We can use pointers to store memory addresses and access this memory array.
type2 * varName2;
Note that the spacing around the star does not matter, but there has to be one star per name, if more than one pointer variable is declared
in the same statement.
The value of a pointer variable has to be the memory address of another variable (which could itself be a pointer) or NULL to indicate
that the pointer does not point to anything. To obtain the memory address of a variable we use the & operator: given a variable x, &x is
the memory address of x.
Example:
int x = 5; // regular variable
int y = 17; // regular variable
We can use a pointer variable to access the value stored at the address that the pointer itself stores (I know, this sounds complicated). To
do so we need to use a dereference/indirection operator, which is the * again.
Example continued:
*p = 7; // changes the value stored in x to 7
* has two uses: 1) in pointer declarations, 2) as a dereference operator to access the value of the variable that the pointer points to
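Putting the pieces together, here is a sketch of these pointer operations in a single hypothetical function. The declaration int *p = &x; is our assumption (the statement *p = 7 implies that p points to x), and the results are handed back through two pointer parameters, the technique used for swap() below. The line int *q, z; also illustrates the one-star-per-name rule: only q is a pointer.

```c
/* Walks through the pointer operations described above and hands back
   the final values of x and y through x_out and y_out. */
void pointer_basics(int *x_out, int *y_out) {
    int x = 5;   /* regular variable */
    int y = 17;  /* regular variable */
    int *p = &x; /* p stores the address of x (assumed declaration) */
    int *q, z;   /* q is a pointer to int, but z is a plain int */

    *p = 7;      /* changes the value stored in x to 7 */
    p = &y;      /* p now points to y instead */
    *p = *p + 1; /* reads y (17) through p and stores 18 back into y */
    q = p;       /* q holds the same address as p */
    z = *q;      /* z gets a copy of the value of y, 18 */

    *x_out = x;
    *y_out = z;
}
```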
Attempt #2: Use pointers as parameters to the function. This way main() and swap() deal with the same memory locations, even though
they have their own copies of the pointers.
1
2 #include <stdio.h>
3
4 void swap(int *x, int *y);
5
6 int main()
7 {
8 int x = 1;
9 int y = 2;
10 printf("x=%d, y=%d\n", x, y);
11 swap(&x, &y);
12 printf("x=%d, y=%d\n", x, y);
13 return 0;
14 }
15
16 // swaps values of x and y
17 void swap(int *x, int *y)
18 {
19 int tmp;
20 tmp = *x;
21 *x = *y;
22 *y = tmp;
23 }
24
This version of the function swap() takes as parameters the memory addresses of x and y that are defined in main(). This way it can
manipulate the x and y variables even from within the function: there is only one copy of x and one copy of y in the program. On line
11 of the program, when the function is called, we pass it &x (the address of x) and &y (the address of y).
Something to Think About:
DNHI: Would this alternative version of the swap function work? Why? or Why not? Draw what happens in memory when this
swap function is used.
void swap(int *x, int *y)
{
int *tmp;
tmp = x;
x = y;
y = tmp;
}
Note: you can run the code to see its output, but you should also be able to figure this out without running the code, just by analyzing
what happens in memory.
creates an array a of 10 integer elements. (We will talk about an alternative way of creating arrays later in the course.)
To populate the array with values, we can use a loop that assigns a value to each location:
for (i = 0; i < size; i++) {
a[i] = i;
}
We can then print the array using its name and subscript notation as follows
for (i = 0; i < size; i++) {
printf("%d ", a[i]);
}
An alternative approach to accessing array locations is to set a pointer to point to the first element in the array
int * pa = &a[0];
and then print the array using that pointer and subscript notation (!!! you can use a pointer with the subscript
operator)
for (i = 0; i < size; i++) {
printf("%d ", pa[i]);
}
The array name itself behaves like a constant pointer to the first element in the array: in most expressions it is converted to the address
of a[0], and that address cannot be changed (one exception: sizeof a gives the size of the whole array, not the size of a pointer). So we
can set a pointer pa1 to be equal to a instead.
int * pa1 = a;
for (i = 0; i < size; i++) {
printf("%d ", pa1[i]);
}
Yet another way of traversing the array is to advance a pointer through the array by incrementing the pointer itself (!!! you can
add/subtract values to/from pointers)
int * pa2 = a;
for (i = 0; i < size; i++) {
printf("%d ", *pa2);
pa2 = pa2+1; // this adds enough bytes to pa2 to advance it to the next array location
}
And if we wish to traverse an array backwards, we set the pointer to the last location in the array and keep on moving it towards the front
int * pa3 = a + size-1;
for (i = 0; i < size; i++) {
printf("%d ", *pa3);
pa3 = pa3-1; // this subtracts enough bytes from pa3 to move it to the
// previous array location
}
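The two pointer walks above can be sketched as reusable functions (the names sum_forward and copy_reversed are hypothetical). One detail differs from the backward loop above: here the pointer starts one past the last element and steps back before each read, so it never moves in front of the first element, a position for which C leaves pointer arithmetic undefined.

```c
/* Sums a[0..size-1] by advancing a pointer forward. */
int sum_forward(const int *a, int size) {
    const int *p = a;
    int i, sum = 0;
    for (i = 0; i < size; i++) {
        sum += *p;
        p = p + 1; /* moves to the next element (sizeof(int) bytes further) */
    }
    return sum;
}

/* Copies a into out in reverse order by walking a pointer backwards. */
void copy_reversed(const int *a, int *out, int size) {
    const int *p = a + size; /* one past the end: legal to form, not to dereference */
    int i = 0;
    while (p > a) {
        p = p - 1;           /* step back first, so p never goes below a */
        out[i] = *p;
        i++;
    }
}
```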
• In most expressions, an array name is converted to a constant pointer to the first element in the array. The following two statements
are equivalent:
pa = & a[0];
pa = a;
• An array name is a constant pointer so it cannot be modified. The following statements are all illegal:
a++;
a = a+5;
a = pa;
• We can perform arithmetic on pointers. Adding a value K to a pointer advances the memory address stored in the pointer by
K times the number of bytes used for storing a variable of the type the pointer points to. For example, if type int uses 4 bytes, then
pa=pa+3;
advances pa by 12 bytes.
Subtracting a value from a pointer works in the same way.
• The following expressions are all equivalent (assuming pa was assigned value of a and i is an integer between 0 and a declared
size of array a) and point to (contain the address of) the i’th element of array a:
& a[i]
a + i
& pa[i]
pa + i
• The following expressions are all equivalent (assuming pa was assigned the value of a and i is an integer between 0 and one less than
the declared size of array a) and are values of the i’th element of array a:
a[i]
*(a+i)
pa[i]
*(pa+i)
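The 12-byte claim can be checked directly by measuring the distance between two int pointers in bytes through char pointers (a char is exactly one byte). A sketch with hypothetical function names; the exact byte count depends on sizeof(int) on your machine.

```c
#include <stddef.h>

/* Number of bytes between p and p + k. p + k must stay within the array
   that p points into. */
ptrdiff_t bytes_advanced(int *p, int k) {
    return (char *)(p + k) - (char *)p;
}

/* Returns 1 if a[i], *(a+i), pa[i] and *(pa+i) agree for every valid i. */
int element_forms_agree(int *a, int size) {
    int *pa = a;
    int i;
    for (i = 0; i < size; i++) {
        if (a[i] != *(a + i) || pa[i] != *(pa + i) || a[i] != pa[i])
            return 0;
    }
    return 1;
}
```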
casting.c:
-------------------------
#include <stdio.h>

int main() {
    int x[] = {65, 66, 67, 68, 69};
    char *p = (char *) x;

    int i;
    for (i = 0; i < 5; i++) {
        printf("%c ", *(p+i));
    }
    printf("\n");

    return 0;
}
Can you figure out what the code will print right now?
The function can be used to compute the sum of all of the elements of the array, or any subsequence of the elements within the array.
#include <stdio.h>

int sumElements(int *a, int size);

int main() {
    const int size = 10;
    int a[size];
    int i;
    for (i = 0; i < size; i++) {
        a[i] = i;
    }
    printf("sum of all elements in a is %d\n",
           sumElements(a, size));
    printf("sum of first 5 elements in a is %d\n",
           sumElements(a, 5));
    printf("sum of last 5 elements in a is %d\n",
           sumElements(a+5, 5));
    return 0;
}

// adds size many elements of array a
int sumElements(int *a, int size) {
    int sum = 0;
    int i;
    for (i = 0; i < size; i++) {
        sum = sum + a[i];
    }
    return sum;
}
See sum.c and sum_2.c for this code and its version that passes int a[] as a parameter to the function.
Something to Think About:
DNHI: Modify the code above to check your answers to the following questions:
1) What happens if the sumElements() function modifies the content of array a? Do the changes affect the array declared in main()?
2) What happens if the size parameter passed to the sumElements() function exceeds the size of the declared array a? Try it by
exceeding the size by small numbers (1, 2, 5, 10) and large numbers (1000).
Make sure you understand what you observed.
Generic swap
We looked at a swap() function that exchanged the values of two integer variables.
How about swapping values of variables of any type: one function that can swap values of type int, long, char, double, float, etc.? Some
of you may think of generics, but C does not have generics (or even templates). But we can still write a generic swap() function.
The following function takes its two items as void * pointers (void * is a pointer to anything, and any object pointer converts to it
automatically). It then swaps item1 and item2 one byte at a time, completely ignoring what those bytes represent.
swap_generic.c (partial):
--------------------------
/**
 * swaps values of item1 and item2
 * item1 - pointer to a variable/data structure
 * item2 - pointer to a variable/another data structure
 *         of the same type as item1
 * size  - number of bytes used for storing item1/item2
 *
 * preconditions: item1 and item2 use the same number
 * of bytes, otherwise the results are unspecified
 */
void swap(void *item1, void *item2, int size) {
    char *x = item1; /* view both items as sequences of bytes */
    char *y = item2;
    char tmp;
    int i;
    for (i = 0; i < size; i++) {
        // for every byte: swap the byte
        tmp = x[i];
        x[i] = y[i];
        y[i] = tmp;
    }
}
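To see swap() work on values of different types, here is a self-contained copy of it together with a hypothetical structure type, struct pair. The caller supplies the number of bytes with sizeof, which is how a single function can handle an int, a double, or a whole struct.

```c
/* Byte-by-byte swap, as in swap_generic.c. */
void swap(void *item1, void *item2, int size) {
    char *x = item1; /* view both items as sequences of bytes */
    char *y = item2;
    char tmp;
    int i;
    for (i = 0; i < size; i++) {
        tmp = x[i];
        x[i] = y[i];
        y[i] = tmp;
    }
}

/* Hypothetical structure type, used only to show that swap() works for
   any type whose size is known. */
struct pair {
    int a;
    double b;
};
```

For example, swap(&d1, &d2, sizeof(double)) exchanges two doubles. Passing a size that does not match the actual type reads or writes the wrong bytes, which is why the precondition in the comment matters.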
5 Strings in C
Two strings walk into a bar and sit down. The bartender says, "So what’ll it be?"
The first string says, "I think I’ll have a beer quag fulk boorg jdk^CjfdLk jk3s d#f67howe~owmc63^Dz x.xvcu"
"Please excuse my friend," the second string says, "He isn’t null-terminated."
A string in C is an array of characters terminated by a null character ('\0'). Every function that performs operations on strings relies
on that null character. A missing null character in a string is also the source of most errors (and jokes) related to strings in C.
Whenever we have a double quoted string in a program it is stored as a string constant terminated by a null character. For example:
printf("Hello World!\n");
Writing
char h[] = "hello";
creates an array of 6 characters (yes, 5 letters + a null character). It is equivalent to writing
char h[6]= "hello";
Note that the following line will compile as well and sometimes even run without obvious problems (at first), but the array has no room
for the null character, so h does not hold a proper string:
char h[5] = "hello";
On the other hand, writing
char *h = "hello";
does not create a new array that h points to. The string constant "hello" is stored in a static area of memory (the same general area
where global variables and constants are stored, not on the stack), and h is just a pointer to that string constant. Attempting to modify
such a string constant produces undefined results. (An undefined result means that the result is unpredictable and depends on the
compiler and system used.) For example,
char *s = "hello";
s[0] = 'H';
does not produce a compiler error, but results in a segmentation fault error during runtime (at least on our system).
The following code has two different strings: h1 is an array of characters, h2 is a pointer to a string constant. It also has a global
variable y and a local variable x, both of type int. The memory locations of all four of them are printed out.
#include <stdio.h>

const int y = 5;

int main() {
    int x = 17;
    char h1[] = "hello";
    char *h2 = "hello";

    /* %p expects a void * argument */
    printf("h1 is at %p\n", (void *) h1);
    printf("h2 is at %p\n", (void *) h2);
    printf("\n");
    printf(" x is at %p\n", (void *) &x);
    printf(" y is at %p\n", (void *) &y);

    return 0;
}
h2 is at 0x4006ac
x is at 0x7fffeb30725c
y is at 0x4006a8
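This clearly shows that h2 and y are in a different area of memory from h1 and x: h1 and x are local variables stored on the stack (note
the very large address of x), while the string constant that h2 points to and the global constant y are stored in a static area of memory.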
This function looks at every character in string str until it encounters null character.
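One possible version of such a length function, as a sketch (the name length is hypothetical; the standard library function strlen from <string.h> does the same job):

```c
#include <stddef.h>

/* Counts the characters in str before the terminating '\0'
   (the '\0' itself is not counted). */
size_t length(const char *str) {
    size_t n = 0;
    while (str[n] != '\0') {
        n++;
    }
    return n;
}
```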
char *w = h;
Well, we can access the string "hello" through both h and w. But there is only one string in memory. In order to create a true
copy/duplicate, we need to do some more work.
void copy(char *src, char *dst) {
    while (*src != '\0') {
        *dst = *src; // copy one character
        dst++;
        src++;
    }
    *dst = '\0'; // terminate the copy
}
This code assumes that the string passed as destination, dst, has enough memory associated with it to store all the characters stored in
src. To use this copy function we have to create an array in memory for the second string:
char *h = "hello";
char w[100];
copy(h, w);
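The following, on the other hand, is an error: w is an uninitialized pointer, so copy() writes to an unpredictable memory location
(undefined behavior):
char *w;
copy(h, w); // WRONG: no memory has been set aside for the copy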
2) a function that appends characters from the src string to the end of the dst string, stopping after n characters or when the null
character is encountered, whichever comes first
void appendN(char *src, char *dst, int n);
5.4 string.h
The string.h header file contains declarations of many useful functions for working with null-terminated strings (all of the ones
mentioned above, among others). You can learn about their names by using the man pages:
man string.h
and then read about specific functions by using man pages for those specific functions, for example:
man strlen
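A short sketch using a few of the functions declared in string.h: strcpy copies a string, strcat appends one, strncat appends at most n characters and always adds the null character, and strlen counts characters. The function name build_greeting is hypothetical, and dst is assumed to have room for at least 12 characters.

```c
#include <string.h>

/* Builds "hello world" in dst and returns its length. */
size_t build_greeting(char *dst) {
    strcpy(dst, "hello");         /* copies "hello" including its '\0' */
    strcat(dst, " ");             /* appends starting at the existing '\0' */
    strncat(dst, "world!!!", 5);  /* appends at most 5 characters, then '\0' */
    return strlen(dst);
}
```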
6 Multidimensional Arrays
In C, multidimensional arrays are stored in consecutive memory locations, one row after another (row-major order).
We can create a 2D array using the following syntax:
#define ROWS 3
#define COLS 2
We can now traverse this matrix using a traditional for loop. The following code adds all the entries in the above 2D matrix:
for (i = 0; i < ROWS; i ++) {
for (j = 0; j < COLS; j++) {
sum += matrix[i][j];
}
}
But we can also use a pointer to int and treat this matrix as a one dimensional array that is traversed using a single for loop
or a doubly nested for loop
for (i = 0; i < ROWS; i ++) {
for (j = 0; j < COLS; j++) {
sum += p[i*COLS+j]; // matrix[i][j] is i*COLS+j elements after matrix[0][0]
}
}
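The row-major layout can be checked in code. This sketch (function names hypothetical) sums a 3 x 2 matrix with [i][j] subscripts and again with a single loop over an int pointer to its first element, and verifies that matrix[i][j] sits i*COLS + j elements after matrix[0][0].

```c
#define ROWS 3
#define COLS 2

/* Adds all entries using the usual [i][j] subscripts. */
int sum_nested(int matrix[ROWS][COLS]) {
    int i, j, sum = 0;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            sum += matrix[i][j];
    return sum;
}

/* Adds all entries with one loop over an int pointer to the first element;
   this relies on the rows being stored one after another. */
int sum_flat(int matrix[ROWS][COLS]) {
    int *p = &matrix[0][0];
    int i, sum = 0;
    for (i = 0; i < ROWS * COLS; i++)
        sum += p[i];
    return sum;
}

/* Returns 1 if every matrix[i][j] is at offset i*COLS + j from the start. */
int is_row_major(int matrix[ROWS][COLS]) {
    int *p = &matrix[0][0];
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if (&matrix[i][j] != p + i * COLS + j)
                return 0;
    return 1;
}
```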
Creating a variable that stores a memory address of another variable that stores a memory address does not seem significant in itself.
But in view of the relationship between pointers, arrays and strings the pointers to pointers are important. They are a way of creating a
different type of multidimensional arrays.
The following array consists of three pointers to strings and is populated by three strings:
char* names[3] = { "alice", "bob", "clark" };
Each element of this one dimensional array of pointers contains a separate string. The following definition may seem to be equivalent to
the one above, but it is not!
char** names = { "alice", "bob", "clark" }; // INCORRECT
Using [] declares an array, so memory is set aside for three pointers, one per string. The definition above reserves memory for only a
single pointer, and there is no array of pointers for it to point to (the compiler will complain about the extra initializers).
But we can use a double pointer to char to access the elements of the array of pointers:
char ** p_names = names;
for (i = 0; i < 3; i++ ) {
printf ("%s \t %s \n", names[i], p_names[i] );
}
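Because an array of pointers converts to a pointer to its first element, the same function can accept either names or p_names. A sketch with a hypothetical function name, total_length:

```c
#include <string.h>

/* Adds up the lengths of n strings stored in an array of pointers. */
size_t total_length(char **names, int n) {
    size_t total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += strlen(names[i]); /* names[i] is a char *, one string */
    return total;
}
```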
An example of another multidimensional array is the argv array that can be passed to main() function:
int main(int argc, char *argv[])
// alternatively, char ** argv
{
int i;
// prints the number of command line arguments
7 Structures
7.1 struct
Java and C++ have classes and objects. C has structures - we can think of structures as classes without methods, just a collection of
data items grouped into a single thing. (Warning: the structures in C++ are not exactly the same as structures in C.)
Example of a structure definition (notice the keyword struct and the semicolon after the closing brace):
struct point {
float x;
float y;
};
This gives us a definition for a new type called struct point with two members, x and y. In order to declare a variable of this type we
write:
struct point p1;
The keyword struct is needed in the declaration of the variable; it cannot be omitted.
To access individual coordinates of the point p1 we use the dot operator:
p1.x = 3.5;
p1.y = 8.9;
This is allowed as long as the structure is simple and there is no ambiguity in which value is supposed to be assigned to which structure
member.
Structure definitions can contain any number of members (variables) of any type. This includes pointers. For example:
struct student {
char* id;
char* name;
float gpa;
int num_of_credits;
};
And, of course, there can be a structure whose members are other types of structures. For example, a rectangle defined by its two
diagonally opposite corners:
struct rectangle {
struct point c1;
struct point c2;
};
If ppoint is a pointer to a structure point p2 (for example, struct point *ppoint = &p2;), then using the standard pointer dereference operator we can access the values of the members of p2 through ppoint:
(*ppoint).x
(*ppoint).y
The parentheses are needed since the . operator has higher precedence than the * operator: without them, *ppoint.x would be parsed as
*(ppoint.x), which is an error because ppoint is a pointer, not a structure.
Because pointers to structures are used so frequently, there is a shorthand notation for the above (well, you be the judge of how short it is):
ppoint->x
ppoint->y
The arrow operator, ->, is a minus sign followed by a greater-than sign.
For example, assume the rectangle structure defined in the previous section. To access the coordinates of the two corners using a pointer
to a rectangle, we may write:
struct rectangle r = { {0,0}, {3,4} };
struct rectangle *pr = &r;
printf("two corners are (%f,%f) and (%f,%f)\n",
       (pr->c1).x, (pr->c1).y, (pr->c2).x, (pr->c2).y);
We first use the pointer pr to access a specific point/corner and then use that to access its x and y coordinates.
When you use multiple operators, make sure that it is clear what happens in what order. C has its precedence rules and has no trouble
parsing things like
*p->str++
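but most humans reading it will have a hard time deciding what is happening. (p is a pointer to a structure, so -> accesses its member
str, which is itself a pointer. Postfix ++ has higher precedence than unary *, so ++ applies to p->str, not to the character it points to:
the expression yields the character that p->str points to, and then p->str is advanced to the next character.)
It would have been much clearer if the above had been written as
*((p->str)++)
Note that (*(p->str))++ means something different: it increments the character that p->str points to and leaves the pointer unchanged.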
A slightly modified version of the function adds the two points and stores the result in p1 instead of returning it:
void addpoints( struct point *p1, struct point *p2 ) {
p1->x += p2->x;
p1->y += p2->y;
}
The node can store a word (a string) and a pointer to the next node in the list. We want to make sure that the next pointer is initialized to
NULL (0), unless it is set to point to a specific node.
A function that adds a node to the front of an existing list needs to take as parameters a pointer to the node and a pointer to the head
pointer of the list (a struct node **). The extra level of indirection is needed because the function changes which node the head points to.
void addFront(struct node * n, struct node ** head) {
n->next = (*head);
(*head) = n;
}
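#include <stdio.h>
struct node {
    char * word;
    struct node * next;
};

void addFront(struct node * n, struct node ** head);

int main() {
    struct node * head = 0; // head pointer for the list

    struct node n1 = { "hello", 0};
    struct node n2 = { "cso201", 0};
    struct node n3 = { "students", 0};

    addFront(&n1, &head);
    addFront(&n2, &head);
    addFront(&n3, &head);

    // print the words, starting from the front of the list
    for (struct node * cur = head; cur != 0; cur = cur->next)
        printf("%s\n", cur->word);
    return 0;
}

void addFront(struct node * n, struct node ** head) {
    n->next = (*head);
    (*head) = n;
}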
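The declaration typedef int * int_pointer; creates a new type name, int_pointer, that is equivalent to int * (useful for those who
cannot or do not like to keep track of the stars). Note that typedef does not create a new type, only a new name for an existing one.
typedef is often used with structures.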
Rewriting the node structure definition as follows:
typedef struct node {
    char * word;
    struct node * next;   // the tag is required here: the name node is not defined yet at this point
} node;
allows us to use node as the type name, rather than struct node:
node n;
rather than
struct node n;
8 Memory Management
So far we have been using only memory whose size is fixed when the program is written (global and local variables). This limits the
programs that we can write. For example, we needed to decide up front (at the time of writing the program) how many nodes there are going
to be in our linked list. We can write much more interesting programs once we learn how to manage memory ourselves: request it when we
need it and release it when we no longer need it (the second part is crucial in a language like C).
8.1 malloc
Memory allocated dynamically (while the program is running) comes from an area of memory referred to as the heap. In Java/C++ we use
the operator new to allocate memory dynamically. In C we use a function called malloc (and its other flavors, such as calloc and realloc),
declared in <stdlib.h>. The function takes as a parameter the number of bytes of memory we want (of type size_t) and returns a pointer
(of type void *) to a newly allocated block of memory of that size. Occasionally (for example, when not enough memory is available) malloc
is not able to give us what we want. In that case, the call to malloc returns NULL, a null pointer that compares equal to 0.
Here are some examples of using malloc:
// allocate memory big enough for an int
int * num = (int *) malloc (sizeof(int) );
Before using memory obtained from malloc, we should always check that it was actually allocated:
if (num == 0) { // or (num == NULL)
    // ERROR: the memory was not allocated
}
8.2 free
C does not provide garbage collection. This means that if we allocate a chunk of memory dynamically (i.e., using malloc), we need to
release it when it is no longer needed. If memory is not released diligently, the program has memory leaks: its memory use keeps growing,
which degrades its own performance and, eventually, the performance of the entire machine on which it runs.
Here are some examples of using free:
free(num);
free(words);
free(n);
In all of the above cases, the pointer passed to free must point to a memory block that was previously allocated with malloc (or one of its
flavors). If that is not the case, or if that memory block has already been freed, the behavior of free is undefined. You may get a
segmentation fault, but this is not guaranteed. The one exception is a null pointer: free(NULL) is allowed and does nothing.
The above code reads the input one character at a time and prints it back to standard output. The loop ends when end of file (EOF)
is encountered. In a Unix terminal, end of file can be signaled by pressing Ctrl+D at the beginning of a line; EOF is a special value
returned by the input function, not a character stored in the input.
When the input is formatted, the scanf() function is much more useful, because it can parse the input and convert it to the appropriate
types.
double sum = 0, v;   // sum must start at 0
while (scanf("%lf", &v) == 1)
    printf("%.2f\n", sum += v);
The above loop keeps reading floating-point numbers until scanf() fails to convert one (for example, at end of file). It accumulates their
sum and prints the running total to standard output after each number.
int day, year;
char month[100];
10 Summary
This concludes our discussion about C specifically as a programming language. We will use it for the rest of the semester and you will
learn other features of the language along the way. The Unix/Linux manual pages are a great resource for documentation. Running, for
example, man malloc displays the documentation for the malloc function that is specific to the system on which the program is developed.