Underlying Technologies II - Semaphores, Sockets, Forks
From csi702
Contents |
1 Race Conditions
A race condition is one where the output of a system depends on the order of a set of events. This term is usually used to describe unexpected and critical problems.
Example code from lecture slides
#include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define NUM _THREADS 5 int counter = 0; void *test_mutex(void *tid) { int thread_id; int rnd; thread_id = (int)tid; rnd = rand() % 5; sleep(rnd); counter = counter + thread_id; printf("Hello World! It's me, thread #%d counter = %d!\n", thread_id, counter); pthread_exit(NULL); } int main(int argc, char *argv[]) { pthread_t thread1[NUM_THREADS]; int t; int ec; // loop over thenumber of threads for(t=0;t<NUM_THREADS;t++) { printf("In main: creating thread %d\n", t); ec = pthread_create(&thread1[t], NULL, test_mutex, (void *)t ); } printf( "the results is %d \n",counter); pthread_exit(NULL); }
Possible Result:
In main: creating thread 0 In main: creating thread 1 In main: creating thread 2 In main: creating thread 3 In main: creating thread 4 the results is 0 Hello World! It's me, thread #0 counter = 0! Hello World! It's me, thread #1 counter = 1! Hello World! It's me, thread #4 counter = 5! Hello World! It's me, thread #2 counter = 7! Hello World! It's me, thread #3 counter = 10!
2 Reading/Writing From Shared Memory
Memory from the main program may be changed during the execution of a thread. This could cause a simple program to produce unpredictable results if you don't take care to fix this issue.
[insert code example]
3 Atomic Operations
Atomic operations are designed so that only one processor can change a value at any given time. This is similar to using a Mutex value, but less complex and more computationally efficient. Finding the glib library and required flags is a bit tricky, but the on-line documentation recommends using: gcc 'pkg-config --cflags --libs glib-2.0' lpthread atomic
NOTE: We will not be using glib atomic operations on any homework, and sample code was not included.
4 Deadlocks
Deadlocks occur when several actions are waiting for the other to finish. Since both are waiting for the other to complete, neither can finish themselves. For example, codes stop if all threads are trying to lock the same Mutex at the same time.
NASA Sojourner Mars Rover was a famous example of deadlock condition.[1]
Example code
#include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define NUM_THREADS 2 pthread_mutex_t my_mutex; void *test_mutex(void *tid) { int *tt; int thread_id, rnd; tt = (int *) tid; thread_id = *tt; // print the initial message printf("Hello World! It's me, thread #%d \n", thread_id); if (thread_id == 0) { pthread_mutex_lock(&my_mutex); } else { pthread_mutex_unlock(&my_mutex); } printf("Thread #%d has completed \n", thread_id); pthread_exit(NULL); } int main(int argc, char *argv[]) { pthread_t thread1[NUM_THREADS]; int t; int ec; pthread_mutex_init(&my_mutex, NULL); pthread_mutex_lock(&my_mutex); for(t=0;t<NUM_THREADS;t++) { ec = pthread_create(&thread1[t], NULL, test_mutex, (void *) &t ); // join the thread - this statement waits // until the thread has exited pthread_join(thread1[t], NULL); } }
Runtime:
In main: creating thread 0 Hello World! It's me, thread #0 The program never completes execution.
References
[1] http://www.cs.ucsb.edu/~kemm/courses/cs172/Intro.pdf
5 Semaphores
Semaphores are counters with non-negative integer values. Generally they are available at the system level. If you try to decrement them below zero, the command will go into a wait state until another process brings the number back to one.
[insert sample semaphore commands]
Excellent description of semaphores: http://www.csc.villanova.edu/~mdamian/threads/posixsem.html
6 Barriers
Barriers are sections of code that require all threads or processors to check in before any of them can proceed. Barriers are essential for most non-trivial parallel codes. Implementation of barriers is found across virtually all high-level parallel langiages.
[insert code barrier.c]
Barriers are relatively simple to implement using any code that can count. Semaphores are only one way this is done, and they are only used on systems with shared memory. Barriers are an important, recurring concept in parallel computing. The use of barriers can cause deadlocks however, if the last process doesn't check in.
7 Networking and Sockets
History
Computer networking began simply as defining methods for timesharing large computers between multiple users[1]In 1963 through the U.S. Department of Defenses's Defense Advanced Research Projects Agency, Advanced Research Projects Agency Network or ARPANET was developed. ARPANET was an advanced switching network that shared data between a collection of government organizations, private companies, and universities.[1]
The development of UUCP in 1974 allowed for more robust email and file transfers between Unix Machines. Eventually UUCP provided support for other operating systems and became a common method for downloading email, news, and files through modems for computes without direct internet access. However, as internet access gained prevalence the utility of UUCP decreased as TCP/IP based protocols became more poplular.[1]
TCP/IP was introduced in 1982 as a replacement for ARPANET's NCP (Network Control Program). TCP/IP was designed to be more flexible and place the burden of data integrity on the hosts as opposed to the network, therefore reducing the responsibility of the network to primarily routing.[1]
Protocols
TCP/IP is the current prevalent networking protocol for the internet, and most private networks. Historically it evolved from the ARPANET 1822 protocol, and actually consists of two seperate protocols IP, and TCP.
IP or Internet Protocol simply tries to get a package of data from a source to a destination through defined routing rules and a set of addresses. However, it provides no assurance or guarantee that the data makes it there.
IP-addresses belong the the Internet Protocol and are defined under version 4 as a set of four 8-bit fields. Certain addresses are reserved for particular routing needs, while others may be used for addressing private and not public networks. Here are some examples of IP-addresses.
10.0.1.15 The local address of my laptop
10.0.1.255 A broadcast address that will send an ip package to every address from 10.0.1.1-10.0.1.254
10.0.1.0 Generally Meaningless by itself but in conjunction with what is known as a subnet mask can be used to specify how to route packets.
TCP or Transmission Control Protocol, is primarily designed to exist at the higher end of the network stack, primarily to manage IP packets and make IP easier to deal with. This includes retransmision, timeouts, end-to-end acknowledgements, resizing IP packets, and ensuring IP packets are in the correct order.[1]
Layers
ISO has defined eight standard layers for network protocols known as the OSI model. These layers include:[1]
Physical - This is the lowest layer and describes how the physical mediums interact to send and receive single bits of information. This could be a wire, or through waves, or optical through fiber optic.
Data Link - Typically this layer is best described as the lowest level ability to send data between your computer and the next network aware device, e.g. your computer and a switch, or your computer and another computer directly linked or through a hub. Data at this level is not routed. For Ethernet addressing is done through MAC addresses.
Network Layer - Since the data link layer allowed us to communicate between network aware devices, the network layer then uses this communication medium to send between any two addressible devices. This is akin to passing a note in a class, you write on the note who you want to pass the note too, but no student can pass the note except to an adjacent student. So the addressing on the note is at the network level and the passing is at the data link level. IP exists at this level, and much like the classroom note there is no guarantee that the data will arrive at it's destination.
Transport - The transport layer ensures that the data makes it there. Generally it's a partner in crime with the Network Layer, this can be seen in TCP/IP, UDP/IP, IPX/SPX etc... So to continue the metaphor this is a bit like passing a note in a class full of close friends that if they loose the note or get caught, will let you know you need to try it again.
Session - This layer exists to ensure quality of service, allows for connecting and disconnecting communication to local and remote applications, and can be used for bandwidth reservations. This layer also includes remote procedure calls, BSD sockets, and SSH.[1]
Presentation - This layer serves as a translator between the lower layers of the stack and the application level. Some encryption is also handled at this level.[1]
Application - The highest level of the OSI layers, this is where network applications both write and read data to and from the network.
Sockets
By the BSD System Calls Manual SOCKET(2), "...creates an end point for communication and returns a descriptor."[1] In Unix a descriptor is an identifier for a particular file and is passed to the kernel to provide access. This seems consistent with the Unix philosophy of treating all resources as files, therefore communication with sockets is similar to other streams where you can both read and write to it for communication.
Sockets can exist both through network protocols and through a typical file system (default for mysql.sock). Since we are concerned about network communication we will consider TCP/IP based sockets.
Creating a socket under C requires a bit more work than a typical fopen() though. Socket communication follows a client server model where a server has a known address and port number that it listens to, a client, and only one client, may communicate with this port at this address one at a time.
To illustrate this description we have the following C examples below:
Server Example:
// Server Example by J. Wallin #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #define MY_PORT 12346 main(int argc, char *argv[]) { int sock; // socket indentifier int br; // error code for bind int fd; // file handle for open socket int cl; // length of returned client sockaddr_in int sl; // length of server sockaddr_in char input_buf[1024]; // character string for input buffer int n; // length of input buffer char message[1024]; // character string for output buffer int ml; // length of output buffer struct sockaddr_in server, client; // socket data types for outgoing // and incoming sockets // create a socket sock = socket(AF_INET, SOCK_STREAM, 0) ; if (sock <0) { printf(" socket creation fails \n"); exit(1); } // set up the values in the data structure to receive messages // from any port over the internet server.sin_family = AF_INET; server.sin_addr.s_addr = htonl(INADDR_ANY); server.sin_port = htons(MY_PORT); sl = sizeof(server); // bind the server parameters to the socket br = bind(sock, (struct sockaddr *) &server, sl); if (br < 0) { printf(" binding failed \n"); exit(1); } // listen for a message listen(sock, 5); // verify that the system is listening on the right port getsockname(sock, (struct sockaddr *) &server, &sl); printf("listening on port %d \n", ntohs(server.sin_port)); // enter an infinite loop to wait for a message while(1) { // accept an incoming communication fd = accept(sock, (struct sockaddr *) &client, &cl); if (fd < 0) { printf ("error in accepting client \n"); exit(1); } // read the data being sent from the client and print it out locally n = read(fd, input_buf, 1024); printf("%s\n",input_buf); // load a message into the output buffer, and then write it strcpy(message,"our finest hour \n"); ml = strlen(message); write(fd, message, ml); // close the socket - hang up on the client close(fd); } }
Client Example:
// Client Example by J. Wallin #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <stdio.h> #include <netdb.h> #include <stdlib.h> #include <string.h> #define MY_PORT 12346 main(int argc, char *argv[]) { int sock; // socket indentifier int br; // error code for bind int fd; // file handle for open socket int cl; // length of returned client sockaddr_in int sl; // length of server sockaddr_in char input_buf[1024]; // character string for input buffer int n; // length of input buffer char message[1024]; // character string for output buffer int ml; // length of output buffer struct sockaddr_in server; // socket data types for outgoing // and incoming sockets struct hostent *theserver; char host[1024]; // create a socket sock = socket(AF_INET, SOCK_STREAM, 0) ; if (sock <0) { printf(" socket creation fails \n"); exit(1); } // set up the server name strcpy(host, "localhost"); theserver = (struct hostent *) gethostbyname(host); if (theserver == NULL) { printf("no such host \n"); exit(1); } // initialize the server attributes // clean out the memory bzero((char *) &server, sizeof(server)); // set the connection type for internet server.sin_family = AF_INET; // copy the host information from above into the data structure bcopy((void *)theserver->h_addr, (void *)&server.sin_addr, theserver->h_length); // set the port server.sin_port = htons(MY_PORT); sl = sizeof(server); // connect to the socket ml = connect(sock, (struct sockaddr *) &server, sl); if (ml < 0) { printf("error in connect \n"); exit(1); } // write an output message to the server strcpy(message, "now is the time for all good men to come to the aid of their party\n"); n = write(sock, message, strlen(message)); // read the returning message input_buf[0] = '\0'; n = read(sock, input_buf, 1024); printf("read %s from server \n",input_buf); }
...
8 Forks and Processes
A fork is a method to copy one instance of a program to a new instance. This new instance, called the child, retains the current state of memory as its parents. However, unlike a thread, the child does not share any resources with the parent.
Therefore threading can be used for trivial parallel applications that simply require multiple instances of the same exact program.
The following is an example of a fork in C.
#include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <semaphore.h> #define NUM_THREADS 5 /* The problem with forks version 1 this program shows the problems you have when you fork a process and use the same random number generator Written Sept 2007 - J. Wallin */ int counter = 0; main() { int rnd, wait_time; pid_t pid; // start the timer with a random value srand(time(0)); pid = fork(); if (pid == 0) { printf("child pid %d \n",pid ); } else { printf("parent pid %d \n",pid ); } // sleep the thread for a random time interval rnd = rand(); wait_time = rnd % 5; sleep(wait_time); counter = counter + 1; if (pid == 0) { printf("child counter=%d rnd=%d time delay=%d \n", counter, rnd, wait_time); } else { printf("parent counter=%d rnd=%d time delay=%d \n", counter, rnd, wait_time); } }
With output:
child pid 0 parent pid 6872 child counter=1 rnd=591527118 time delay=3 parent counter=1 rnd=591527118 time delay=3
As can be seen, both the parent and child provided the same random number, this isn't an accident, since the fork call was after the srand function, the values of all variables were copied over. To produce random numbers independently the program should fork before the srand call.
As a note of caution, as specified by IEEE Std 1003.1-2008 for Unix, a fork may not be an exact copy. This may become obvious when working with files, since a fork will retain the descriptors but not the locks. Therefore a child process cannot have access to a file until a parent unlocks it.[1]
9 Process-Based Behavior
Process based behavior is a programming method that allows a programmer to assign different tasks to different processes where the task is determined by the processes's id. The following code snippet is a trivial example of such behaviour where the printed message is determined by the process's id.
// fork the program pid = fork(); if (pid != 0) { printf("I am the original program"); } else { printf("I am a copy!"); }
