Files Notes
https://www.cse.unsw.edu.au/~cs1521/223T12T3/
The virtual machine interface can stay the same across different hardware.
much easier for user to write portable code which works on different hardware
with memory access restrictions so user code can only access memory allocated to it
user code can make requests to operating system called system calls
a system call transfers execution to operating system code in privileged mode
at completion of request operating system (usually) returns execution back to user code in non-privileged mode
mipsy system calls are designed for students writing tiny MIPS programs without library functions
e.g. system call 1 prints an integer, system call 5 reads an integer
like mipsy, every Linux system call has a number, e.g. on x86_64 writing bytes to a file is system call 1
Linux provides 400+ system calls
$ cat /usr/include/x86_64-linux-gnu/asm/unistd_64.h
...
#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
#define __NR_close 3
#define __NR_stat 4
...
#define __NR_pidfd_getfd 438
#define __NR_faccessat2 439
#define __NR_process_madvise 440
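As a rough illustration of how these numbers are used, the sketch below makes the write system call by number through the Linux syscall() function. The helper name write_via_syscall is hypothetical, and the code assumes a Linux system with glibc, where SYS_write is defined as __NR_write.

```c
#define _GNU_SOURCE          // assumption: glibc, which declares syscall() under this macro
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: write msg to standard output (file descriptor 1)
// using the raw system call number instead of the write() library function.
// Returns the number of bytes written, or -1 on error.
long write_via_syscall(const char *msg) {
    // SYS_write comes from <sys/syscall.h> and equals __NR_write
    return syscall(SYS_write, 1, msg, strlen(msg));
}
```

Calling write_via_syscall("hello\n") should behave like write(1, "hello\n", 6); normal code would call write(), or a stdio function, instead.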
On Unix-like systems there are C library functions corresponding to each system call,
e.g. open, read, write, close
the syscall function is not used in normal coding
These functions are not portable
C used on many non-Unix operating systems with different system calls
POSIX standardizes a few of these functions
some non-Unix systems provide implementations of these functions
but better to use functions from standard C library, available everywhere
e.g fopen, fgets, fputc from stdio.h
on Unix-like systems these will call open, read, write
on other platforms, will call other low-level functions
but sometimes we need to use lower level non-portable functions
e.g. a database implementation needs more control over I/O operations
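For comparison with the stdio functions, here is a minimal sketch of the lower-level, non-portable route: copying a file with open, read, write and close. The function name copy_file_lowlevel, the 4096-byte buffer and the 0644 permissions are illustrative choices, not taken from the course code.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: copy src to dst using Unix-like system call wrappers.
// Returns 0 on success, -1 on any error.
int copy_file_lowlevel(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    char buffer[4096];
    ssize_t n_read;
    int status = 0;
    while (status == 0 && (n_read = read(in, buffer, sizeof buffer)) > 0) {
        // write() may accept fewer bytes than requested, so loop until done
        ssize_t done = 0;
        while (done < n_read) {
            ssize_t n = write(out, buffer + done, n_read - done);
            if (n < 0) {
                status = -1;
                break;
            }
            done += n;
        }
    }
    if (status == 0 && n_read < 0) status = -1;   // read() itself failed

    close(in);
    if (close(out) != 0) status = -1;
    return status;
}
```

The same job written with fopen, fgetc and fputc would be portable to non-Unix systems, at the cost of less direct control over when the I/O happens.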
Unix-like (POSIX) systems add some extra file-system-related C types in these include files:
#include <sys/types.h>
#include <sys/stat.h>
convenient function perror() looks at errno and prints message with reason
or strerror() converts errno integer value to string describing reason for error
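A small sketch of both reporting styles, assuming a failed fopen as the source of the error. The helper name report_open_error is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: try to open path for reading.
// On failure, report the reason two ways and return the errno value;
// on success, close the file and return 0.
int report_open_error(const char *path) {
    FILE *stream = fopen(path, "r");
    if (stream == NULL) {
        int saved = errno;   // save errno, later library calls may change it
        perror(path);        // prints path, a colon, and the reason to stderr
        fprintf(stderr, "fopen failed: %s\n", strerror(saved));
        return saved;
    }
    fclose(stream);
    return 0;
}
```

For a path that does not exist, the returned value should be ENOENT and both messages should give the same reason.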
after this, the operating system reclaims the space used by the files
stdio.h - fclose()
fputs/fgets and fscanf/fprintf can not be used for binary data because it may contain zero bytes, which C strings treat as the end of the string
they can be used for text (ASCII/Unicode) but not to e.g. read a jpg
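For binary data, stdio.h also provides fwrite and fread, which take an explicit byte count and so are not confused by zero bytes. The sketch below writes a few bytes and reads them back; the name binary_roundtrip and the 256-byte limit are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: write n bytes (which may include zero bytes) to path
// with fwrite, read them back with fread, and return 1 if they match.
// Returns 0 on any error or mismatch.
int binary_roundtrip(const char *path, const unsigned char *bytes, size_t n) {
    unsigned char copy[256];
    if (n > sizeof copy) return 0;      // sketch only handles small inputs

    FILE *out = fopen(path, "wb");
    if (out == NULL) return 0;
    size_t written = fwrite(bytes, 1, n, out);
    if (fclose(out) != 0 || written != n) return 0;

    FILE *in = fopen(path, "rb");
    if (in == NULL) return 0;
    size_t got = fread(copy, 1, sizeof copy, in);
    fclose(in);

    return got == n && memcmp(copy, bytes, n) == 0;
}
```

Passing an array such as {0xFF, 0xD8, 0x00, 0x10} should round-trip intact, whereas fputs would stop at the first zero byte.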
scanf/fscanf/sscanf often avoided in serious code
but fine while learning to code
much slower than previous version which copies 4096 bytes at a time
$ clang -O3 cp_libc.c -o cp_libc
$ time ./cp_libc random_file random_file_copy
real 0m0.008s
user 0m0.001s
sys 0m0.007s
how?
next 4095 fgetc() calls return a byte from the array (the input buffer) and do not need to call read()
and so on
first 4095 fputc() calls put bytes in an array (the output buffer)
4096th fputc() calls write() for all 4096 bytes in the output buffer
and so on
the output buffer is emptied by exit or main returning
a program can explicitly force the output buffer to be emptied with an fflush() call
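The effect of the output buffer can be observed with a sketch like the following, which checks the size of a file before and after fflush(). The helper names are hypothetical, and the "before" size depends on the C library's buffering, so it is reported rather than assumed.

```c
#include <stdio.h>
#include <sys/stat.h>

// Hypothetical helper: return the size of path in bytes, or -1 on error.
static long size_on_disk(const char *path) {
    struct stat s;
    return stat(path, &s) == 0 ? (long)s.st_size : -1;
}

// Write 10 bytes with fputc, recording the file's size before and after
// fflush(). Returns 0 on success, -1 on error.
int show_buffering(const char *path, long *before, long *after) {
    FILE *stream = fopen(path, "w");
    if (stream == NULL) return -1;

    for (int i = 0; i < 10; i++) {
        fputc('a' + i, stream);        // bytes go into the output buffer
    }
    *before = size_on_disk(path);      // often 0: write() not yet called

    if (fflush(stream) != 0) {         // force the buffer to be written
        fclose(stream);
        return -1;
    }
    *after = size_on_disk(path);       // all 10 bytes are now in the file

    return fclose(stream) == 0 ? 0 : -1;
}
```

On a typical Linux system the size before fflush() is expected to be 0 and the size after to be 10.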
for example:
fseek(stream, 42, SEEK_SET); // move to after 42nd byte in file
fseek(stream, 58, SEEK_CUR); // 58 bytes forward from current position
fseek(stream, -7, SEEK_CUR); // 7 bytes backward from current position
fseek(stream, -1, SEEK_END); // move to before last byte in file
Using fseek to read the last byte then the first byte of a file
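A minimal sketch of this, assuming a non-empty file. The function name last_then_first is hypothetical and is not the course's own example.

```c
#include <stdio.h>

// Hypothetical helper: read the last byte and then the first byte of a file.
// Returns 0 on success, -1 on error (including an empty file).
int last_then_first(const char *path, int *last, int *first) {
    FILE *stream = fopen(path, "rb");
    if (stream == NULL) return -1;

    int status = -1;
    if (fseek(stream, -1, SEEK_END) == 0 &&      // move to before last byte
        (*last = fgetc(stream)) != EOF &&
        fseek(stream, 0, SEEK_SET) == 0 &&       // move back to the start
        (*first = fgetc(stream)) != EOF) {
        status = 0;
    }

    fclose(stream);
    return status;
}
```

For a file containing hello, this should store 'o' in *last and 'h' in *first.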
almost all the 16TB are zeros which the file system doesn’t actually store
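A file like this can be produced by seeking far past the end of a new file and writing a single byte. The sketch below does that at a caller-chosen offset and reports both the apparent size and the space allocated. The name make_sparse_file is hypothetical, and whether the gap is really left unstored depends on the file system.

```c
#include <stdio.h>
#include <sys/stat.h>

// Hypothetical helper: create a file whose only written byte is at `offset`.
// On success stores the apparent size and the bytes actually allocated
// (st_blocks counts 512-byte blocks) and returns 0; returns -1 on error.
int make_sparse_file(const char *path, long offset,
                     long long *size, long long *allocated) {
    FILE *stream = fopen(path, "wb");
    if (stream == NULL) return -1;

    // seek past the end of the empty file, then write one byte there
    int ok = fseek(stream, offset, SEEK_SET) == 0 &&
             fputc('!', stream) != EOF;
    if (fclose(stream) != 0 || !ok) return -1;

    struct stat s;
    if (stat(path, &s) != 0) return -1;
    *size = (long long)s.st_size;
    *allocated = (long long)s.st_blocks * 512;
    return 0;
}
```

With an offset of 1L << 30 the size should be one byte more than 1GiB, while on file systems that support sparse files the allocated figure is expected to be far smaller.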
file systems manage persistent stored data e.g. on magnetic disk or SSD
On Unix-like systems:
a file is a sequence (array) of zero or more bytes.
no meaning for bytes associated with file
file metadata doesn’t record that it is e.g. ASCII, MP4, JPG, …
Unix-like files are just bytes
a directory is an object containing zero or more files or directories.
file systems maintain metadata for files & directories, e.g. permissions
. current directory
.. parent directory
Unix/Linux Pathnames
absolute pathnames start with a leading / and give full path from root
e.g. /usr/include/stdio.h, /cs1521/public_html/
files
directories (folders)
system information
inter-process communication
network
File Metadata
Unix-like file systems effectively have a large array of inodes containing metadata
ls -i prints inode-numbers
$ ls -i file.c
109988273 file.c
$
note there is usually more than one file system mounted on a Unix-like system
each file system has a separate set of inode-numbers
files on different file-systems could have the same inode-number
File system links allow multiple paths to access the same file
Hard links
multiple names referencing the same file (inode)
the two entries must be on the same filesystem
all hard links to a file have equal status
file destroyed when last hard link removed
can not create an (extra) hard link to a directory
Symbolic links (symlinks)
point to another path name
accessing the symlink (by default) accesses the file being pointed to
symbolic link can point to a directory
symbolic link can point to a pathname on another filesystem
symbolic links don’t have permissions (just a pointer)
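The difference between the two kinds of link can be checked from C with link(), symlink(), stat() and lstat(). The sketch below uses hypothetical helper names and assumes all paths are on one file system, since link() requires that.

```c
#define _POSIX_C_SOURCE 200809L   // ask for POSIX declarations (lstat, symlink)
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: give the existing file `target` a second name `hard`
// and a symbolic link `soft`. Returns 0 on success, -1 on error.
int make_links(const char *target, const char *hard, const char *soft) {
    if (link(target, hard) != 0) return -1;      // new name, same inode
    if (symlink(target, soft) != 0) return -1;   // separate object holding a path
    return 0;
}

// Returns 1 if both paths lead to the same inode on the same file system
// (symbolic links are followed), 0 if they differ, -1 on error.
int same_file(const char *a, const char *b) {
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

// Returns 1 if path itself is a symbolic link, 0 if not, -1 on error.
int is_symlink(const char *path) {
    struct stat s;
    if (lstat(path, &s) != 0) return -1;         // lstat does not follow symlinks
    return S_ISLNK(s.st_mode) ? 1 : 0;
}
```

After make_links succeeds, same_file should report 1 for both the hard link and the symlink, but only the symlink should be reported by is_symlink.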
inode number
type (file, directory, symbolic link, device)
size of file in bytes (if it is a file)
permissions (read, write, execute)
times of last access/modification/status-change
struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* Inode number */
mode_t st_mode; /* File type and mode */
nlink_t st_nlink; /* Number of hard links */
uid_t st_uid; /* User ID of owner */
gid_t st_gid; /* Group ID of owner */
dev_t st_rdev; /* Device ID (if special file) */
off_t st_size; /* Total size, in bytes */
blksize_t st_blksize; /* Block size for filesystem I/O */
blkcnt_t st_blocks; /* Number of 512B blocks allocated */
struct timespec st_atim; /* Time of last access */
struct timespec st_mtim; /* Time of last modification */
struct timespec st_ctim; /* Time of last status change */
};
Using stat
struct stat s;
if (stat(pathname, &s) != 0) {
perror(pathname);
exit(1);
}
printf("ino = %10ld # Inode number\n", (long)s.st_ino);
printf("mode = %10o # File mode \n", s.st_mode);
printf("nlink =%10ld # Link count \n", (long)s.st_nlink);
printf("uid = %10u # Owner uid\n", s.st_uid);
printf("gid = %10u # Group gid\n", s.st_gid);
printf("size = %10ld # File size (bytes)\n", (long)s.st_size);
printf("mtime =%10ld # Modification time (seconds since 1/1/70)\n",
(long)s.st_mtime);
source code for stat.c
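The type stored in st_mode can be tested with the S_IS... macros from sys/stat.h. The following is a sketch only; the function name file_type and the returned strings are illustrative.

```c
#define _POSIX_C_SOURCE 200809L   // ask for POSIX declarations (lstat, S_ISLNK)
#include <sys/stat.h>

// Hypothetical helper: describe what kind of object pathname names.
// lstat is used so that a symbolic link is reported rather than followed.
const char *file_type(const char *pathname) {
    struct stat s;
    if (lstat(pathname, &s) != 0) return "unknown";
    if (S_ISREG(s.st_mode)) return "regular file";
    if (S_ISDIR(s.st_mode)) return "directory";
    if (S_ISLNK(s.st_mode)) return "symbolic link";
    if (S_ISCHR(s.st_mode) || S_ISBLK(s.st_mode)) return "device";
    return "other";
}
```

The permission bits are in the same field and can be extracted with s.st_mode & 0777.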
mkdir
int mkdir(const char *pathname, mode_t mode)
. is a reference to itself
for example:
mkdir("newDir", 0755);
#include <stdio.h>
#include <sys/stat.h>
// create the directories specified as command-line arguments
int main(int argc, char *argv[]) {
for (int arg = 1; arg < argc; arg++) {
if (mkdir(argv[arg], 0755) != 0) {
perror(argv[arg]); // prints why the mkdir failed
return 1;
}
}
return 0;
}
source code for mkdir.c
file permissions
removing files
$ dcc rm.c
$ ./a.out rm.c
$ ls -l rm.c
ls: cannot access 'rm.c': No such file or directory
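The session above could be produced by a program built around unlink(). The sketch below is an assumption about its general shape, not the course's rm.c, and the name remove_files is hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: remove each of the n pathnames with unlink().
// Stops at the first failure, prints the reason, and returns 1;
// returns 0 if every pathname was removed.
int remove_files(int n, char *pathnames[]) {
    for (int i = 0; i < n; i++) {
        if (unlink(pathnames[i]) != 0) {
            perror(pathnames[i]);   // prints why the unlink failed
            return 1;
        }
    }
    return 0;
}
```

A main function could simply return remove_files(argc - 1, argv + 1). Note that unlink() removes one name; the file itself is destroyed only when its last hard link is removed.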
renaming a file
#include <stdio.h>
// rename the specified file
int main(int argc, char *argv[]) {
if (argc != 3) {
fprintf(stderr, "Usage: %s <old-filename> <new-filename>\n",
argv[0]);
return 1;
}
char *old_filename = argv[1];
char *new_filename = argv[2];
if (rename(old_filename, new_filename) != 0) {
fprintf(stderr, "%s rename %s %s:", argv[0], old_filename,
new_filename);
perror("");
return 1;
}
return 0;
}
source code for rename.c
char new_pathname[256];
snprintf(new_pathname, sizeof new_pathname,
"hello_%d.txt", i);
printf("Creating a link %s -> %s\n",
new_pathname, pathname);
if (link(pathname, new_pathname) != 0) {
perror(pathname);
return 1;
}
}
return 0;
}
source code for many_links.c
#include <sys/types.h>
#include <dirent.h>