Parsing data with strtok in C

Some programs can just process an entire file at once, and other programs need to examine the file line by line. In the latter case, you likely need to parse the data in each line. Fortunately, the C programming language has a standard C library function to do just that.

The strtok function breaks up a line of data according to "delimiters" that divide each field. It provides a streamlined way to parse data from an input string.

Reading the first token

Suppose your program needs to read a data file, where each line is separated into different fields with a semicolon. For example, one line from the data file might look like this:

102*103;K1.2;K0.5

In this example, store that line in a string variable. You might have read this string into memory using any number of methods. Here's the line of code:

char string[] = "102*103;K1.2;K0.5";
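
The article leaves open how the line gets into memory. One common approach, sketched here under the assumption that no line is longer than 255 characters, is to read the file with fgets and cut off the trailing newline before tokenizing. The helper count_tokens_in_file is hypothetical; it only counts the semicolon-separated tokens on each line, using the strtok loop pattern covered later in this article:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads each line from fp, strips the trailing
   newline, and counts the semicolon-separated tokens on every line.
   Returns the total number of tokens seen across all lines. */
int count_tokens_in_file(FILE *fp)
{
  char line[256];   /* assumption: no line is longer than 255 characters */
  int total = 0;

  while (fgets(line, sizeof line, fp) != NULL) {
    /* fgets keeps the '\n'; strcspn finds it (or the end) so we can cut it off */
    line[strcspn(line, "\n")] = '\0';

    for (char *tok = strtok(line, ";"); tok != NULL; tok = strtok(NULL, ";")) {
      total++;
    }
  }

  return total;
}
```

Stripping the newline matters when you use the tokens rather than just count them: without it, the last token on each line would end in '\n'.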

Once you have the line in a string, you can use strtok to pull out "tokens." Each token is part of the string, up to the next delimiter. The basic call to strtok looks like this:

#include <string.h>
char *strtok(char *string, const char *delim);

The first call to strtok reads the string, adds a null ('\0') character at the first delimiter, then returns a pointer to the first token. If the string contains no tokens (for example, if it is empty or holds only delimiters), strtok returns NULL.
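
To see the null character that strtok writes, here is a minimal sketch. The hypothetical helper first_token_offset reports where the first token starts; after the call, the delimiter that ended the first token holds '\0', while the rest of the buffer is unchanged. Note that strtok also skips any leading delimiters, so the offset is not always zero:

```c
#include <string.h>

/* Hypothetical helper: returns the offset (in bytes) from the start of buf
   to the first token, or -1 if strtok finds no token. After the call, buf
   has been modified: the delimiter that ends the first token is now '\0'. */
long first_token_offset(char *buf, const char *delim)
{
  char *token = strtok(buf, delim);

  if (token == NULL) {
    return -1;   /* empty string, or nothing but delimiters */
  }
  return (long)(token - buf);
}
```

For example, with the buffer "102*103;K1.2;K0.5" the helper returns 0 and the semicolon at index 7 becomes '\0'; with ";;ab;c" it returns 2, because the two leading semicolons are skipped.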

#include <stdio.h>
#include <string.h>

int
main()
{
  char string[] = "102*103;K1.2;K0.5";
  char *token;

  token = strtok(string, ";");

  if (token == NULL) {
    puts("empty string!");
    return 1;
  }

  puts(token);

  return 0;
}

This sample program pulls off the first token in the string, prints it, and exits. If you compile this program and run it, you should see this output:

102*103

102*103 is the first part of the input string, up to the first semicolon. That's the first token in the string.

Note that calling strtok modifies the string you are examining. If you want the original string preserved, make a copy before using strtok.
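
One way to follow this advice is to tokenize a heap-allocated copy instead. The sketch below is an illustration rather than the only option; count_tokens_copy is a hypothetical name, and it uses malloc and memcpy instead of strdup, which is POSIX and only joined ISO C in C23:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in line without modifying it,
   because strtok works on a copy. Returns -1 if the copy cannot be made. */
int count_tokens_copy(const char *line, const char *delim)
{
  size_t len = strlen(line);
  char *copy = malloc(len + 1);   /* +1 for the terminating '\0' */
  int count = 0;

  if (copy == NULL) {
    return -1;
  }
  memcpy(copy, line, len + 1);    /* copies the '\0' too */

  for (char *tok = strtok(copy, delim); tok != NULL; tok = strtok(NULL, delim)) {
    count++;
  }

  free(copy);                     /* the tokens pointed into copy, so they are gone now */
  return count;
}
```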

Reading the rest of the string as tokens

Separating the rest of the string into tokens requires calling strtok multiple times until all tokens are read. After parsing the first token with strtok, any further calls to strtok must pass NULL in place of the string variable. The NULL tells strtok to use an internal pointer to the next position in the string.
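
Because strtok keeps that internal pointer in hidden state, it can tokenize only one string at a time; nesting one strtok loop inside another would overwrite the outer loop's position. POSIX (not ISO C) provides strtok_r, which takes the saved position as an explicit third argument. The sketch below assumes a POSIX system and uses a hypothetical key=value format, with a hypothetical helper sum_pairs, to show two scans running at once:

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sums the values in a string like "a=1;b=2;c=3".
   The outer loop splits on ';' and the inner calls split each field on '='.
   Each scan has its own save pointer, so they do not disturb each other.
   Fields without an '=' are skipped. Modifies s. */
long sum_pairs(char *s)
{
  char *outer_save;
  char *inner_save;
  long sum = 0;

  for (char *field = strtok_r(s, ";", &outer_save); field != NULL;
       field = strtok_r(NULL, ";", &outer_save)) {
    char *key = strtok_r(field, "=", &inner_save);
    char *value = strtok_r(NULL, "=", &inner_save);

    if (key != NULL && value != NULL) {
      sum += strtol(value, NULL, 10);
    }
  }
  return sum;
}
```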

Modify the sample program to read the rest of the string as tokens. Use a while loop to call strtok multiple times until you get NULL.

#include <stdio.h>
#include <string.h>

int
main()
{
  char string[] = "102*103;K1.2;K0.5";
  char *token;

  token = strtok(string, ";");

  if (token == NULL) {
    puts("empty string!");
    return 1;
  }

  while (token) {
    /* print the token */
    puts(token);

    /* continue parsing the same string */
    token = strtok(NULL, ";");
  }

  return 0;
}

By adding the while loop, you can parse the rest of the string, one token at a time. If you compile and run this sample program, you should see each token printed on a separate line, like this:

102*103
K1.2
K0.5
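
Printing is only one thing to do inside the loop. A sketch that instead stores the tokens for later use might look like the following; split_fields is a hypothetical helper, and the stored pointers refer into the original buffer, so that buffer must stay alive while the fields are in use:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on the characters in delim and
   stores up to max token pointers in fields. Returns how many were stored.
   The pointers point into line, so line must outlive fields. */
size_t split_fields(char *line, const char *delim, char *fields[], size_t max)
{
  size_t n = 0;
  char *token = strtok(line, delim);

  while (token != NULL && n < max) {
    fields[n++] = token;
    token = strtok(NULL, delim);   /* NULL: keep scanning the same line */
  }
  return n;
}
```

Capping the count at max keeps the loop from writing past the end of the fields array when a line has more tokens than expected.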


Multiple delimiters in the input string

Using strtok provides a quick and easy way to break up a string into just the parts you're looking for. You can use strtok to parse all kinds of data, from plain text files to complex data. However, be careful: strtok treats multiple delimiters next to each other the same as a single delimiter.

For example, if you were reading CSV data (comma-separated values, such as data from a spreadsheet), you might expect a list of four numbers to look like this:

1,2,3,4

But if the third "column" in the data was empty, the CSV might instead look like this:

1,2,,4

This is where you need to be careful with strtok. You can see the problem by modifying the sample program to call strtok with a comma delimiter:

#include <stdio.h>
#include <string.h>

int
main()
{
  char string[] = "1,2,,4";
  char *token;

  token = strtok(string, ",");

  if (token == NULL) {
    puts("empty string!");
    return 1;
  }

  while (token) {
    puts(token);
    token = strtok(NULL, ",");
  }

  return 0;
}

If you compile and run this new program, you'll see that strtok interprets the ,, as a single comma and parses the data as three numbers:

1
2
4


Knowing this limitation in strtok can save you hours of debugging.
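
If empty fields matter, strtok is the wrong tool. One alternative, swapped in here in place of strtok, is to scan with strcspn, which returns the length of the leading run of characters that are not in the delimiter set. This minimal sketch splits at every single delimiter; it does not handle quoted CSV fields, and split_keep_empty is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place at every single delimiter, so
   "1,2,,4" yields four fields, the third one empty. Unlike strtok, adjacent
   delimiters are NOT merged. Returns the field count (at most max). */
size_t split_keep_empty(char *line, const char *delim, char *fields[], size_t max)
{
  size_t n = 0;
  char *p = line;

  while (n < max) {
    size_t len = strcspn(p, delim);   /* length up to the next delimiter */
    fields[n++] = p;
    if (p[len] == '\0') {
      break;                          /* last field reached */
    }
    p[len] = '\0';                    /* end this field at the delimiter */
    p += len + 1;                     /* next field starts right after it */
  }
  return n;
}
```

With this approach a trailing delimiter also produces a final empty field, which matches how spreadsheets usually export an empty last column.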

Using multiple delimiters in strtok

You might wonder why the strtok function uses a string for the delimiter instead of a single character. That's because strtok can look for different delimiters in the string. For example, a string of text might have spaces and tabs between each word. In this case, you would use each of those "whitespace" characters as delimiters:

#include <stdio.h>
#include <string.h>

int
main()
{
  char string[] = "  hello \t world";
  char *token;

  token = strtok(string, " \t");

  if (token == NULL) {
    puts("empty string");
    return 1;
  }

  while (token) {
    puts(token);
    token = strtok(NULL, " \t");
  }

  return 0;
}

Each call to strtok uses both a space and a tab character in the delimiter string, allowing strtok to parse the line correctly into two tokens.
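
The same idea extends to every kind of whitespace. This sketch assumes you want the six characters that isspace treats as whitespace in the "C" locale (space, tab, newline, vertical tab, form feed, and carriage return); count_words is a hypothetical helper:

```c
#include <string.h>

/* The six characters isspace() treats as whitespace in the "C" locale. */
static const char WHITESPACE[] = " \t\n\v\f\r";

/* Hypothetical helper: counts whitespace-separated words in s (modifies s).
   Runs of mixed spaces, tabs, and newlines count as a single separator. */
int count_words(char *s)
{
  int words = 0;

  for (char *tok = strtok(s, WHITESPACE); tok != NULL; tok = strtok(NULL, WHITESPACE)) {
    words++;
  }
  return words;
}
```

Because runs of delimiters collapse, the leading spaces and the space-tab-space sequence in "  hello \t world" do not create empty tokens, so the count is 2. Here, merging adjacent delimiters is exactly the behavior you want.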

Wrap up

The strtok function is a handy way to read and interpret data from strings. Use it in your next project to simplify how you read data into your program.
