
strtok segfaults in CLI but not in GDB


Hello,
here I have a strange problem with a real simple strtok example.

The program is as follows:

### BEGIN STRTOK ###

#include <string.h>
#include <stdio.h>

int main()
{
   char *input1 = "Hello, World!";

   char *tok;

   tok = strtok(input1, " ");
   if(tok) printf("%s\n", tok);

   tok = strtok(NULL, " ");
   if(tok) printf("%s\n", tok);

   return(0);

}

### END STRTOK ###

Now, when I run it from the command line, I get a bus error:

### BEGIN COMMAND LINE OUTPUT ###

> gcc -ggdb -Wall -o strtok strtok.c
> ./strtok

Bus error (core dumped)
Exit 138

### END COMMAND LINE OUTPUT ###

When I run it step by step in GDB, the program terminates normally:

### BEGIN DEBUGGER OUTPUT ###

> gdb ./strtok

GNU gdb 6.1.1 [FreeBSD]
[snip]GDB copyright and bla bla[/snip]
(gdb) break main
Breakpoint 1 at 0x8048570: file strtok.c, line 6.
(gdb) run
Starting program: /home/piter/strtok

Breakpoint 1, main () at strtok.c:6
6          char *input1 = "Hello, World!";
(gdb) next
10         tok = strtok(input1, " ");
(gdb)
11         if(tok) printf("%s\n", tok);
(gdb)
Hello,
13         tok = strtok(NULL, " ");
(gdb)
14         if(tok) printf("%s\n", tok);
(gdb)
World!
16         return(0);
(gdb)
18      }
(gdb)
0x08048485 in _start ()
(gdb)
Single stepping until exit from function _start,
which has no line number information.

Program exited normally.
(gdb)

### END DEBUGGER OUTPUT ###

Is there something I'm missing wrt C and/or strtok, or is it rather a
problem related to my environment (in which case I'll be happy to post
in the right newsgroup)?

Thanx in advance

--
Pietro Cerutti

PGP Public Key ID:
http://gahr.ch/pgp

strtok alters its input.  You are passing it a string literal, and
modifying a string literal invokes the demons of undefined behavior.  Don't.

--
Ian Collins.

Pietro Cerutti said:

strtok modifies the string you pass it. You pass it a string literal.
You're not allowed to modify string literals.

Change

  char *input1 = "Hello, World!";

to

  char input1[] = "Hello, World!";

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Pietro Cerutti wrote:
>    char *input1 = "Hello, World!";

just in case, I know that the string to be tokenized shouldn't be a
constant, but rather an array of chars.
So, it should be declared as

char input1[14] = "Hello, World!";

The thing I don't understand is: why does it work in GDB?

--
Pietro Cerutti

PGP Public Key:
http://gahr.ch/pgp

Pietro Cerutti wrote:
> Pietro Cerutti wrote:

>>    char *input1 = "Hello, World!";

> just in case, I know that the string to be tokenized shouldn't be a
> constant, but rather an array of chars.
> So, it should be declared as

> char input1[14] = "Hello, World!";

> The thing I don't understand is: why does it work in GDB?

Luck?

--
Ian Collins.

Ian Collins wrote:
> Pietro Cerutti wrote:
>> Pietro Cerutti wrote:

>>>    char *input1 = "Hello, World!";
>> just in case, I know that the string to be tokenized shouldn't be a
>> constant, but rather an array of chars.
>> So, it should be declared as

>> char input1[14] = "Hello, World!";

>> The thing I don't understand is: why does it work in GDB?

> Luck?

Ya, maybe.

The point is:
I understand what UB means, so WW3 could start now and I'd know why...

But if a string literal is - by definition - not modifiable, then how
can it happen that GDB actually modifies it using strtok?

--
Pietro Cerutti

PGP Public Key:
http://gahr.ch/pgp

Pietro Cerutti wrote:
> here I have a strange problem with a real simple strtok example.

Guess: you're trying to use it on a literal string.

(fx:dancing) Yes!

`strtok` writes to its argument -- it sticks nuls in there to make
the strings it returns.

You're not allowed to write into a string literal: that gets you
undefined behaviour.

An implementation may just write into the string. Or it may abort in
some way. Or it may ignore the write. Or it may write somewhere else
entirely. Or it may mail a report to your co-coders, or start a game
of rogue, or book you a holiday in the Lake District, or set fire to
your keyboard, or arrange a date with your Most Preferred Person.

[That last one never seems to happen, though.]

--
"You've spotted a flaw in my thinking, Trev" Big Al,/The Beiderbeck Connection/

Hewlett-Packard Limited registered office:                Cain Road, Bracknell,
registered no: 690597 England                                    Berks RG12 1HN

Chris Dollin wrote:
> You're not allowed to write into a string literal: that gets you
> undefined behaviour.

> An implementation may just write into the string.

Uh? So you mean that a string literal isn't unmodifiable by definition?

--
Pietro Cerutti

PGP Public Key:
http://gahr.ch/pgp

Pietro Cerutti <g@gahr.ch> writes:
> Pietro Cerutti wrote:

>>    char *input1 = "Hello, World!";

> just in case, I know that the string to be tokenized shouldn't be a
> constant, but rather an array of chars.
> So, it should be declared as

> char input1[14] = "Hello, World!";

> The thing I don't understand is: why does it work in GDB?

Because it invokes undefined behavior.  There are no rules about what
happens.  It can crash, it can "work", it can make demons fly out of
your nose.

(I suppose string literals are stored in write-protected memory when
your program runs normally, but not when it runs under gdb -- which
seems odd.)

--
Keith Thompson (The_Other_Keith) k@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson wrote:
> (I suppose string literals are stored in write-protected memory when
> your program runs normally, but not when it runs under gdb -- which
> seems odd.)

Yes it's weird, but it's a logical explanation.
I'll investigate with the freebsd people..
Thank you.

--
Pietro Cerutti

PGP Public Key:
http://gahr.ch/pgp

In article <40054$464acdb7$50dabbcd$14@news.hispeed.ch>,
Pietro Cerutti  <g@gahr.ch> wrote:

>But if a string literal is - by definition - not modifiable, then how
>can it happen that GDB actually modifies it using strtok?

It's not modifiable in that you're not allowed to modify it.  It's not
required that the implementation signal an error when you do it.  It's
a constraint on you, not on the system.

My guess as to why you don't see an error with GDB is that the
debugger needs the text segment to be writable, so that it can set
breakpoints.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

I think you don't *quite* understand what UB means.

The actual definition (C99 3.4.3) is:

    behavior, upon use of a nonportable or erroneous program construct
    or of erroneous data, for which this International Standard
    imposes no requirements

and C99 6.4.5p6 says:

    [...]  If the program attempts to modify such an array, the
    behavior is undefined.

For example, consider this program:

#include <stdio.h>
int main(void)
{
    char *s = "Hello, world";
    s[0] = 'J'; /* attempt to modify a string literal */
    puts(s);
    return 0;

}

One of the infinitely many possible results is that the string literal
is actually modified, and the program prints "Jello, world".

The standard doesn't say that string literals are not modifiable.  It
says that attempting to modify a string literal invokes undefined
behavior.

--
Keith Thompson (The_Other_Keith) k@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson wrote:
> The standard doesn't say that string literals are not modifiable.  It
> says that attempting to modify a string literal invokes undefined
> behavior.

Got it. Thanks!

--
Pietro Cerutti

PGP Public Key:
http://gahr.ch/pgp

Pietro Cerutti wrote:
> Chris Dollin wrote:

>> You're not allowed to write into a string literal: that gets you
>> undefined behaviour.

>> An implementation may just write into the string.

> Uh? So you mean that a string literal isn't unmodifiable by definition?

Yes, that's what I'm saying (well, what the C standard says).

Specifically, it says that if you attempt to write into a string literal,
/the effect is undefined/. Anything can happen. C washes its hands of
your code. It cares not. Mind the gap. Do as you will.

An implementation may implement this freedom by changing the content of
the literal, if that's convenient.

Hence: don't go writing into string literals. Even though it /might/
get you a date, it probably won't, and I am assured that nasal demons
are not fun to have.

--
"I'm still here and I'm holding the answers"  - Karnataka, /Love and Affection/

Hewlett-Packard Limited registered office:                Cain Road, Bracknell,
registered no: 690597 England                                    Berks RG12 1HN

Richard Tobin wrote:
> My guess as to why you don't see an error with GDB is that the
> debugger needs the text segment to be writable, so that it can set
> breakpoints.

GDB on Debian/GNU Linux gives an error when I try to modify it.
On FreeBSD it doesn't, that's why I'm asking right now the FreeBSD
people whether the behavior is wanted or erroneous.

Thanx

> -- Richard

--
Pietro Cerutti

PGP Public Key:
http://gahr.ch/pgp

Clear. Thanks to you too.

--
Pietro Cerutti

PGP Public Key:
http://gahr.ch/pgp

Pietro Cerutti said:

> Richard Tobin wrote:

>> My guess as to why you don't see an error with GDB is that the
>> debugger needs the text segment to be writable, so that it can set
>> breakpoints.

> GDB on Debian/GNU Linux gives an error when I try to modify it.

That's an acceptable outcome of undefined behaviour.

> On FreeBSD it doesn't,

So's that.

> that's why I'm asking right now the FreeBSD
> people whether the behavior is wanted or erroneous.

It is neither Debian nor FreeBSD, but rather your program, that is
erroneous.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

In article <QqGdnW1XddD-ZdfbnZ2dnUVZ8tSdn@bt.com>,
Richard Heathfield  <r@see.sig.invalid> wrote:

>> GDB on Debian/GNU Linux gives an error when I try to modify it.

>That's an acceptable outcome of undefined behaviour.

>> On FreeBSD it doesn't,

>So's that.

>> that's why I'm asking right now the FreeBSD
>> people whether the behavior is wanted or erroneous.

>It is neither Debian nor FreeBSD, but rather your program, that is
>erroneous.

I think he meant "erroneous" in the sense of a mistake, rather than
a violation of the C standard.

It certainly seems desirable to have programs behave the same way
under the debugger as without it, so it would be good if the FreeBSD
version could be changed.  Meanwhile, we at least have a clue that if
a segmentation fault goes away in the debugger then the cause may well
be attempted modification of literal strings.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

<OT>
Yes, _but_:  from the point of view of gdb users and maintainers, they
may still consider it a gdb bug if, on a single platform, _any_ program
executes differently under gdb than it does when run normally.  After all, the
underlying problem -- writing into r/o storage -- could be triggered from
an assembler program.  And gdb doesn't have the same standards-contract
relationship with anything that a C implementation does.

It is, however, a separate issue from the fact that the program invokes UB.
</OT>

Richard Tobin wrote:
> Pietro Cerutti  <g@gahr.ch> wrote:

>> But if a string literal is - by definition - not modifiable, then
>> how can it happen that GDB actually modifies it using strtok?

> It's not modifiable in that you're not allowed to modify it.  It's
> not required that the implementation signal an error when you do
> it.  It's a constraint on you, not on the system.

> My guess as to why you don't see an error with GDB is that the
> debugger needs the text segment to be writable, so that it can set
> breakpoints.

To get a diagnostic from gcc, add "-Wwrite-strings" to the command
line, without the quote characters.

--
 <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
 <http://www.securityfocus.com/columnists/423>
 <http://www.aaxnet.com/editor/edit043.html>
 <http://kadaitcha.cx/vista/dogsbreakfast/index.html>
                        cbfalconer at maineline dot net

--
Posted via a free Usenet account from http://www.teranews.com

That will cause gcc to emit a warning message if it can determine at
compilation time that you've attempted to modify a string literal.

Actually, it will generate warnings even in some cases where you
*don't* attempt to modify a string literal.  It works by internally
applying a "const" qualifier to the array type.  So, for example:

% cat c.c
char *s = "Hello, world";
% gcc -c c.c
% gcc -c -Wwrite-strings c.c
c.c:1: warning: initialization discards qualifiers from pointer target type

I haven't attempted to modify the string literal, but by assigning its
address to a (non-const) char*, I've created the potential to do so.
It would be nice if gcc were a bit smarter about this, at least
marking the array type as some kind of "pseudo-const" so it can give
more sensible warning messages.  But since an implementation can warn
about anything it likes, I don't believe the "-Wwrite-strings" option
causes gcc to be non-conforming (unless you also add "-Werror").

Consider the following program:

#include <stdio.h>
int main(void)
{
    const char *s = "Hello, world";
    char *bogus = (char*)s;
    bogus[0] = 'J';
    puts(s);
    return 0;

}

It attempts to modify a string literal, and gcc doesn't complain about
it (during compilation) even with "-Wwrite-strings", because I hid the
evil part behind a pointer cast that dropped the "const" qualifier.
On the system I'm using, it dies with a segmentation fault at run time
-- *unless* I specify "-fwritable-strings", in which case it happily
prints "Jello, world".

Most of this is gcc-specific, of course.  The topical point is that,
apart from the fact that the "-Wwrite-strings -Werror" combination
causes some valid programs to be rejected, all this behavior conforms
to the standard.

--
Keith Thompson (The_Other_Keith) k@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

[long explanation snipped]

To be fair, though, it will cause a warning to be generated for the
original code in question.

--
Keith Thompson (The_Other_Keith) k@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Nothing strange here.  strtok tries to modify its input string.
You initialized the char * with a string literal, which leaves it
pointing at chars you must not modify.  Boom.

--
 <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
 <http://www.securityfocus.com/columnists/423>
 <http://www.aaxnet.com/editor/edit043.html>
 <http://kadaitcha.cx/vista/dogsbreakfast/index.html>
                        cbfalconer at maineline dot net


Pietro Cerutti wrote:
> Chris Dollin wrote:

>> You're not allowed to write into a string literal: that gets you
>> undefined behaviour.

>> An implementation may just write into the string.

> Uh? So you mean that a string literal isn't unmodifiable by
> definition?

Undefined behaviour includes working the way you think it should.

--
 <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
 <http://www.securityfocus.com/columnists/423>
 <http://www.aaxnet.com/editor/edit043.html>
 <http://kadaitcha.cx/vista/dogsbreakfast/index.html>
                        cbfalconer at maineline dot net


Pietro Cerutti wrote:

... snip ...

> But if a string literal is - by definition - not modifiable, then
> how can it happen that GDB actually modifies it using strtok?

Because that is one satisfactory implementation of "undefined".

--
 <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
 <http://www.securityfocus.com/columnists/423>
 <http://www.aaxnet.com/editor/edit043.html>
 <http://kadaitcha.cx/vista/dogsbreakfast/index.html>
                        cbfalconer at maineline dot net

