Binary Tree - (Self-referential Structures) example program in C

By: Daniel Malcolm Emailed: 1705 times Printed: 2207 times

 By: rohit kumar - how this program is work By: Kirti - Hi..thx for the hadoop in By: Spijker - I have altered the code a By: ali mohammed - why we use the java in ne By: ali mohammed - why we use the java in ne By: mizhelle - when I exported the data By: raul - no output as well, i'm ge By: Rajesh - thanx very much... By: Suindu De - Suppose we are executing

Suppose we want to handle the more general problem of counting the occurrences of all the words in some input. Since the list of words isn't known in advance, we can't conveniently sort it and use a binary search. Yet we can't do a linear search for each word as it arrives, to see if it's already been seen; the program would take too long. (More precisely, its running time is likely to grow quadratically with the number of input words.) How can we organize the data to copy efficiently with a list or arbitrary words?

One solution is to keep the set of words seen so far sorted at all times, by placing each word into its proper position in the order as it arrives. This shouldn't be done by shifting words in a linear array, though - that also takes too long. Instead we will use a data structure called a binary tree.

The tree contains one ``node'' per distinct word; each node contains

• A pointer to the text of the word,
• A count of the number of occurrences,
• A pointer to the left child node,
• A pointer to the right child node.
No node may have more than two children; it might have only zero or one.

The nodes are maintained so that at any node the left subtree contains only words that are lexicographically less than the word at the node, and the right subtree contains only words that are greater. This is the tree for the sentence ``now is the time for all good men to come to the aid of their party'', as built by inserting each word as it is encountered:

To find out whether a new word is already in the tree, start at the root and compare the new word to the word stored at that node. If they match, the question is answered affirmatively. If the new record is less than the tree word, continue searching at the left child, otherwise at the right child. If there is no child in the required direction, the new word is not in the tree, and in fact the empty slot is the proper place to add the new word. This process is recursive, since the search from any node uses a search from one of its children. Accordingly, recursive routines for insertion and printing will be most natural.

Going back to the description of a node, it is most conveniently represented as a structure with four components:

```   struct tnode {     /* the tree node: */
char *word;           /* points to the text */
int count;            /* number of occurrences */
struct tnode *left;   /* left child */
struct tnode *right;  /* right child */
};
```
This recursive declaration of a node might look chancy, but it's correct. It is illegal for a structure to contain an instance of itself, but
```    struct tnode *left;
```
declares left to be a pointer to a tnode, not a tnode itself.

Occasionally, one needs a variation of self-referential structures: two structures that refer to each other. The way to handle this is:

```   struct t {
...
struct s *p;   /* p points to an s */
};
struct s {
...
struct t *q;   /* q points to a t */
};
```
The code for the whole program is surprisingly small, given a handful of supporting routines like getword that we have already written. The main routine reads words with getword and installs them in the tree with addtree.
```   #include <stdio.h>
#include <ctype.h>
#include <string.h>

#define MAXWORD 100
struct tnode *addtree(struct tnode *, char *);
void treeprint(struct tnode *);
int getword(char *, int);

/* word frequency count */
main()
{
struct tnode *root;
char word[MAXWORD];

root = NULL;
while (getword(word, MAXWORD) != EOF)
if (isalpha(word[0]))
treeprint(root);
return 0;
}
```
The function addtree is recursive. A word is presented by main to the top level (the root) of the tree. At each stage, that word is compared to the word already stored at the node, and is percolated down to either the left or right subtree by a recursive call to adtree. Eventually, the word either matches something already in the tree (in which case the count is incremented), or a null pointer is encountered, indicating that a node must be created and added to the tree. If a new node is created, addtree returns a pointer to it, which is installed in the parent node.
```   struct tnode *talloc(void);
char *strdup(char *);

/* addtree:  add a node with w, at or below p */
struct treenode *addtree(struct tnode *p, char *w)
{
int cond;

if (p == NULL) {     /* a new word has arrived */
p = talloc();    /* make a new node */
p->word = strdup(w);
p->count = 1;
p->left = p->right = NULL;
} else if ((cond = strcmp(w, p->word)) == 0)
p->count++;      /* repeated word */
else if (cond < 0)   /* less than into left subtree */
else             /* greater than into right subtree */
return p;
}
```
Storage for the new node is fetched by a routine talloc, which returns a pointer to a free space suitable for holding a tree node, and the new word is copied into a hidden space by strdup. (We will discuss these routines in a moment.) The count is initialized, and the two children are made null. This part of the code is executed only at the leaves of the tree, when a new node is being added. We have (unwisely) omitted error checking on the values returned by strdup and talloc.

treeprint prints the tree in sorted order; at each node, it prints the left subtree (all the words less than this word), then the word itself, then the right subtree (all the words greater). If you feel shaky about how recursion works, simulate treeprint as it operates on the tree shown above.

```   /* treeprint:  in-order print of tree p */
void treeprint(struct tnode *p)
{
if (p != NULL) {
treeprint(p->left);
printf("%4d %s\n", p->count, p->word);
treeprint(p->right);
}
}
```
A practical note: if the tree becomes ``unbalanced'' because the words don't arrive in random order, the running time of the program can grow too much. As a worst case, if the words are already in order, this program does an expensive simulation of linear search. There are generalizations of the binary tree that do not suffer from this worst-case behavior, but we will not describe them here.

Before leaving this example, it is also worth a brief digression on a problem related to storage allocators. Clearly it's desirable that there be only one storage allocator in a program, even though it allocates different kinds of objects. But if one allocator is to process requests for, say, pointers to chars and pointers to struct tnodes, two questions arise. First, how does it meet the requirement of most real machines that objects of certain types must satisfy alignment restrictions (for example, integers often must be located at even addresses)? Second, what declarations can cope with the fact that an allocator must necessarily return different kinds of pointers?

Alignment requirements can generally be satisfied easily, at the cost of some wasted space, by ensuring that the allocator always returns a pointer that meets all alignment restrictions. The alloc  does not guarantee any particular alignment, so we will use the standard library function malloc, which does.

The question of the type declaration for a function like malloc is a vexing one for any language that takes its type-checking seriously. In C, the proper method is to declare that malloc returns a pointer to void, then explicitly coerce the pointer into the desired type with a cast. malloc and related routines are declared in the standard header <stdlib.h>. Thus talloc can be written as

```   #include <stdlib.h>

/* talloc:  make a tnode */
struct tnode *talloc(void)
{
return (struct tnode *) malloc(sizeof(struct tnode));
}
```
strdup merely copies the string given by its argument into a safe place, obtained by a call on malloc:
```   char *strdup(char *s)   /* make a duplicate of s */
{
char *p;

p = (char *) malloc(strlen(s)+1); /* +1 for '\0' */
if (p != NULL)
strcpy(p, s);
return p;
}
```
malloc returns NULL if no space is available; strdup passes that value on, leaving error-handling to its caller.

Storage obtained by calling malloc may be freed for re-use by calling free;

 1 View Comment`i can't get getword function.Where is that written?` View Tutorial          By: withu at 2010-01-26 23:25:12 2 View Comment```/* getword: get next word or character from input */ int getword(char *word, int lim) { int c, getch(void); void ungetch(int); char *w = word; while (isspace(c = getch())) ; if (c != EOF) *w++ = c; if (!isalpha(c)) { *w = '\0'; return c; } for ( ; --lim > 0; w++) if (!isalnum(*w = getch())) { ungetch(*w); break; } *w = '\0'; return word[0]; } int getch(void) { /* get a (possibly pushed-back) character */ return (bufp > 0) ? buf[--bufp] : getchar(); } void ungetch(int c) /* push character back on input */ { if (bufp >= BUFSIZE) printf("ungetch: too many characters\n"); else buf[bufp++] = c; }``` View Tutorial          By: Vinod S at 2011-12-02 07:53:52

Your email(required, will not be shown to the public):

More Tutorials by Daniel Malcolm
 javac options in Java Operator Precedence in Java Calling Multiple Listeners in JSF Using free() Function in C ForwardAction in Struts Listing Files and Directories sample program in C Binary Tree - (Self-referential Structures) example program in C A simple program using EL in JSP Command-line Arguments in C Example Calculator program in C - describing use of External Variables in C Assignment Operators and Expressions in C The for statement in C JSF Basics assert() Versus Exceptions in C++ RMS Basics in J2ME

More Tutorials in C
 Sum of the elements of an array in C Printing a simple histogram in C Sorting an integer array in C Find square and square root for a given number in C Simple arithmetic calculations in C Command-line arguments in C Calculator in C Passing double value to a function in C Passing pointer to a function in C Infix to Prefix And Postfix in C while, do while and for loops in C Unicode and UTF-8 in C Formatting with printf in C if, if...else and switch statements in C with samples Statements in C

More Latest News
Most Viewed Articles (in C )
 goto and labels in C Listing Files and Directories sample program in C Using memset(), memcpy(), and memmove() in C Find square and square root for a given number in C Using free() Function in C Functions returning non-integer values in C lseek() sample program in C Formatting with printf in C Character Arrays in C Type Conversions in C (String to Integer, isdigit() etc) Precedence and Order of Evaluation in C Using printf function in C assert() Function Example program in C perror() Function - example program in C Passing double value to a function in C
Most Emailed Articles (in C)
 Word Counting sample program in C Do while Loops in C Bitwise Operators in C Functions returning non-integer values in C getch and ungetch in C File Inclusion in C Pointer Arrays and Pointers to Pointers in C Multi-dimensional Arrays in C (Explained using date conversion program) Using memset(), memcpy(), and memmove() in C Using qsort() and bsearch() with strings - example program in C Statements in C Formatting with printf in C while, do while and for loops in C Infix to Prefix And Postfix in C Printing a simple histogram in C