
C Programming Language

Converting strings to int


I want to convert a string representation of a number ("1234") to an
int, with overflow and underflow checking. Essentially, I'm looking
for a strtol() that converts int instead of long. The problem with
strtol() is that a number that fits into a long might be too big for
an int. sscanf() doesn't seem to do the over/underflow checking.
atoi(), of course, doesn't do any checking. I've long thought it odd
that there aren't strtoi() and friends for int and short types in the
standard.

Any suggestions?

allthecoolkidshave@gmail.com wrote:
> I want to convert a string representation of a number ("1234") to an
> int, with overflow and underflow checking. Essentially, I'm looking
> for a strtol() that converts int instead of long. The problem with
> strtol() is that a number that fits into a long might be too big for
> an int. sscanf() doesn't seem to do the over/underflow checking.
> atoi(), of course, doesn't do any checking. I've long thought it odd
> that there aren't strtoi() and friends for int and short types in the
> standard.

> Any suggestions?

Use strtol() and check the result to see if it fits in an int.

--
Ian Collins.

On May 24, 12:22 am, allthecoolkidshave@gmail.com wrote:

> I want to convert a string representation of a number ("1234") to an
> int, with overflow and underflow checking.

Of course, 30 seconds later, I think to myself "Why not convert to a
long and see if it's between INT_MIN and INT_MAX and if so  return
that value casted to an int?"

allthecoolkidshave@gmail.com wrote:
> I want to convert a string representation of a number ("1234") to an
> int, with overflow and underflow checking. Essentially, I'm looking
> for a strtol() that converts int instead of long. The problem with
> strtol() is that a number that fits into a long might be too big for
> an int. sscanf() doesn't seem to do the over/underflow checking.
> atoi(), of course, doesn't do any checking. I've long thought it odd
> that there aren't strtoi() and friends for int and short types in the
> standard.

Check the long value against INT_MAX and INT_MIN.
Then you have the value (if the conversion to long worked), even if out
of range for an int, and the error checking you want.
allthecoolkidshave@gmail.com said:

> On May 24, 12:22 am, allthecoolkidshave@gmail.com wrote:
>> I want to convert a string representation of a number ("1234") to an
>> int, with overflow and underflow checking.

> Of course, 30 seconds later, I think to myself "Why not convert to a
> long and see if it's between INT_MIN and INT_MAX and if so  return
> that value casted to an int?"

If it is between those values, you don't need a cast. And if it isn't, a
cast won't do any good anyway.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Richard Heathfield <r@see.sig.invalid> writes:
> allthecoolkidshave@gmail.com said:
>> On May 24, 12:22 am, allthecoolkidshave@gmail.com wrote:
>>> I want to convert a string representation of a number ("1234") to an
>>> int, with overflow and underflow checking.

>> Of course, 30 seconds later, I think to myself "Why not convert to a
>> long and see if it's between INT_MIN and INT_MAX and if so  return
>> that value casted to an int?"

> If it is between those values, you don't need a cast. And if it isn't, a
> cast won't do any good anyway.

But if you want to store the result in an int, you *will* need a
conversion.  This conversion will be done implicitly when you assign
the value.

A lot of people aren't aware that the term "cast" refers *only* to the
explicit cast operator, using a type name in parentheses.

--
Keith Thompson (The_Other_Keith) k@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
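
To make the cast/conversion distinction above concrete, here is a minimal sketch. The function names narrow_checked and narrow_with_cast are hypothetical and not from the thread. Both perform the same long-to-int conversion once the range check has passed; only the second one uses a cast.

```c
#include <limits.h>

/* Hypothetical helper: stores x in *out if it fits in an int.
 * The assignment converts long to int implicitly; no cast is written. */
int narrow_checked(long x, int *out)
{
    if (x < INT_MIN || x > INT_MAX)
        return 0;          /* out of range: a cast would not help either */
    *out = x;              /* implicit conversion */
    return 1;
}

/* Same conversion, spelled out with the explicit cast operator.
 * Some compilers warn about the version above; the cast silences that. */
int narrow_with_cast(long x, int *out)
{
    if (x < INT_MIN || x > INT_MAX)
        return 0;
    *out = (int)x;         /* explicit cast, identical result */
    return 1;
}
```

Either form yields the same value for an in-range argument, so the choice comes down to style and compiler warnings.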

Keith Thompson said:

> Richard Heathfield <r@see.sig.invalid> writes:
>> allthecoolkidshave@gmail.com said:

>>> [...] "Why not convert to a
>>> long and see if it's between INT_MIN and INT_MAX and if so  return
>>> that value casted to an int?"

>> If it is between those values, you don't need a cast. And if it
>> isn't, a cast won't do any good anyway.

> But if you want to store the result in an int, you *will* need a
> conversion.  This conversion will be done implicitly when you assign
> the value.

Or you can simply return it:

int foo(const char *s)
{
  long int x = whatever(s);
  validate_or_die(x);
  return x;

}

> A lot of people aren't aware that the term "cast" refers *only* to the
> explicit cast operator, using a type name in parentheses.

Yes, sure, but do we really need to include a full chapter of
explanation in every single reply we post?

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

No, but it seemed reasonable in this case.  The OP incorrectly thought
he needed a cast; the common confusion between "cast" and "conversion"
is a likely explanation of his confusion.

--
Keith Thompson (The_Other_Keith) k@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

On 24 May 2007, allthecoolkidshave@gmail.com wrote:

> I want to convert a string representation of a number ("1234") to an
> int, with overflow and underflow checking. Essentially, I'm looking
> for a strtol() that converts int instead of long. The problem with
> strtol() is that a number that fits into a long might be too big for
> an int. sscanf() doesn't seem to do the over/underflow checking.
> atoi(), of course, doesn't do any checking. I've long thought it odd
> that there aren't strtoi() and friends for int and short types in the
> standard.

> Any suggestions?

It's actually harder than it looks to use strtol() properly.  Here's
the guts of a wrapper function I wrote for ints.  The wrapper returns 1
if the conversion was OK, 0 otherwise and outputs the value through a
parameter:

[code]

char *  end = NULL;
long    value;

errno = 0;
value = strtol(str, &end, base);

/*
     end == NULL if the base is invalid.
     end == str  if no conversion was done.
    *end == '\0' or *end is whitespace if the number was
            whitespace delimited (a reasonable assumption).
    errno is 0 if no overflow or underflow occurred.
*/
if (end != NULL && end != str && errno == 0 &&
     (*end == '\0' || isspace(*end)))
{
    if (INT_MIN <= value && value <= INT_MAX)
    {
        *integer = (int) value;

        return 1;
    }

}

return 0;

[/code]

I wonder if anyone would care to comment on whether this method is
adequate.

Dave

--
D.a.v.i.d  T.i.k.t.i.n
t.i.k.t.i.n [at] a.d.v.a.n.c.e.d.r.e.l.a.y [dot] c.o.m

David Tiktin said:

<snip>

> if (end != NULL && end != str && errno == 0 &&
>      (*end == '\0' || isspace(*end)))

<snip>

> I wonder if anyone would care to comment on whether this method is
> adequate.

A cursory glance reveals to me only that you are perhaps a little
optimistic in passing *end to isspace(), which requires that its
parameter be representable as an unsigned char. If, for example, *end
were -1, this would not qualify, and the behaviour would be undefined.

This is one of those very rare and bizarre cases where it is actually a
*good* idea to use a cast - isspace((unsigned char)*end) - and the
normal promotion rules will of course take care of the conversion to
int for you.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
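
For reference, here is a sketch of how the wrapper discussed above might look as a complete function with the (unsigned char) cast applied. The function name str_to_int and its parameter list are assumptions; the thread only shows the body. The end == NULL test is kept from the original post, which treats it as the invalid-base case; whether strtol leaves end untouched for a bad base depends on the implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns 1 and stores the result in *integer
 * if str starts with a number that fits in an int and is followed by
 * end of string or whitespace; returns 0 otherwise. */
int str_to_int(const char *str, int base, int *integer)
{
    char *end = NULL;
    long value;

    errno = 0;
    value = strtol(str, &end, base);

    /* No conversion done, or strtol reported a range error. */
    if (end == NULL || end == str || errno != 0)
        return 0;

    /* Cast so isspace() gets a value representable as unsigned char. */
    if (*end != '\0' && !isspace((unsigned char)*end))
        return 0;

    /* Fits in a long, but may still be too big for an int. */
    if (value < INT_MIN || value > INT_MAX)
        return 0;

    *integer = (int)value;
    return 1;
}
```

A call such as str_to_int("1234", 10, &n) would return 1 and set n, while "12x" or an out-of-range string would return 0 and leave n unchanged.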

On 24 May 2007, Richard Heathfield <r@see.sig.invalid> wrote:

Good catch!  I actually "knew" that ;-)  I have a bunch of macros
like:

  #define TO_LOWER(c)  ((char) tolower((unsigned char)(c)))

But not for isspace().  I can't figure out why.  Fixed now, though.  

Thanks!

Dave

--
D.a.v.i.d  T.i.k.t.i.n
t.i.k.t.i.n [at] a.d.v.a.n.c.e.d.r.e.l.a.y [dot] c.o.m

allthecoolkidshave@gmail.com wrote:

> I want to convert a string representation of a number ("1234") to
> an int, with overflow and underflow checking. Essentially, I'm
> looking for a strtol() that converts int instead of long. The
> problem with strtol() is that a number that fits into a long
> might be too big for an int. sscanf() doesn't seem to do the
> over/underflow checking. atoi(), of course, doesn't do any
> checking. I've long thought it odd that there aren't strtoi()
> and friends for int and short types in the standard.

> Any suggestions?

Yup.  Try this.  If you want longs, do some type changes.  You can
also modify to read from strings if really needed.

/* ------------------------------------------------- *
 * File txtinput.c                                   *
 * ------------------------------------------------- */

#include <limits.h>   /* xxxx_MAX, xxxx_MIN */
#include <ctype.h>    /* isdigit, isblank, isspace */
#include <stdio.h>    /* FILE, getc, ungetc */
#include "txtinput.h"

/* For licensing restrictions (GPL) see readme.txt in:
 *    <http://cbfalconer.home.att.net/download/txtio.zip>
 *
 * These stream input routines are written so that simple
 * conditionals can be used:
 *
 *      if (readxint(&myint, stdin)) {
 *         do_error_recovery; normally_abort_to_somewhere;
 *      }
 *      else {
 *         do_normal_things; usually_much_longer_than_bad_case;
 *      }
 *
 * They allow overflow detection, and permit other routines to
 * detect the character that terminated a numerical field. No
 * string storage is required, thus there is no limitation on
 * the length of input fields.  For example, a number entered
 * with a string of 1000 leading zeroes will not annoy these.
 *
 * The numerical input routines *NEVER* absorb a terminating
 * char (including '\n').  Thus a sequence such as:
 *
 *      err = readxint(&myint, stdin);
 *      flushln(stdin);
 *
 * will always consume complete lines, and after execution of
 * readxint a further getc (or fgetc) will return the character
 * that terminated the numeric field.
 *
 * They are also re-entrant, subject to the limitations of file
 * systems.  e.g interrupting readxint(v, stdin) operation with
 * a call to readxwd(wd, stdin) would not be well defined, if
 * the same stdin is being used for both calls.  If ungetc is
 * interruptible the run-time system is broken.
 *
 * Originally issued 2002-10-07
 *
 * Revised 2006-01-15 so that unsigned entry overflow (readxwd)
   uses the normal C modulo (UINT_MAX + 1) operation.  readxwd
   still rejects an initial sign as an error.
 */

/* -------------------------------------------------------------
 * Skip to non-blank on f, and return that char or EOF.  The next
 * char that getc(f) will return is unknown.  Local use only.
 */
static int ignoreblks(FILE *f)
{
   int ch;

   do {
      ch = getc(f);
   } while ((' ' == ch) || ('\t' == ch));
   /* while (isblank(ch)); */                   /* for C99 */
   return ch;

} /* ignoreblks */

/*--------------------------------------------------------------
 * Skip all blanks on f.  At completion getc(f) will return
 * a non-blank character, which may be \n or EOF
 *
 * Skipblks returns the char that getc will next return, or EOF.
 */
int skipblks(FILE *f)
{
   return ungetc(ignoreblks(f), f);

} /* skipblks */

/*--------------------------------------------------------------
 * Skip all whitespace on f, including \n, \f, \v, \r.  At
 * completion getc(f) will return a non-blank character, which
 * may be EOF
 *
 * Skipwhite returns the char that getc will next return, or EOF.
 */
int skipwhite(FILE *f)
{
   int ch;

   do {
      ch = getc(f);
   } while (isspace(ch));
   return ungetc(ch, f);

} /* skipwhite */

/*--------------------------------------------------------------
 * Read an unsigned value.  Signal error for overflow or no
 * valid number found. Returns 1 for error, 0 for noerror, EOF
 * for EOF encountered before parsing a value.
 *
 * Skip all leading blanks on f.  At completion getc(f) will
 * return the character terminating the number, which may be \n
 * or EOF among others. Barring EOF it will NOT be a digit.  The
 * combination of error, 0 result, and the next getc returning
 * \n indicates that no numerical value was found on the line.
 *
 * If the user wants to skip all leading white space including
 * \n, \f, \v, \r, he should first call "skipwhite(f);"
 *
 * Peculiarity: This specifically forbids a leading '+' or '-'.
 */
int readxwd(unsigned int *wd, FILE *f)
{
   unsigned int value, digit;
   int          status;
   int          ch;

   #define UWARNLVL (UINT_MAX / 10U)
   #define UWARNDIG (UINT_MAX - UWARNLVL * 10U)

   value = 0;                           /* default */
   status = 1;                          /* default error */

   ch = ignoreblks(f);

   if (EOF == ch) status = EOF;
   else if (isdigit(ch)) status = 0;    /* digit, no error */

   while (isdigit(ch)) {
      digit = ch - '0';
      if ((value > UWARNLVL) ||
          ((UWARNLVL == value) && (digit > UWARNDIG))) {
         status = 1;             /* overflow */
         value -= UWARNLVL;
      }
      value = 10 * value + digit;
      ch = getc(f);
   } /* while (ch is a digit) */

   *wd = value;
   ungetc(ch, f);
   return status;

} /* readxwd */

/*--------------------------------------------------------------
 * Read a signed value.  Signal error for overflow or no valid
 * number found.  Returns true for error, false for noerror.  On
 * overflow either INT_MAX or INT_MIN is returned in *val.
 *
 * Skip all leading blanks on f.  At completion getc(f) will
 * return the character terminating the number, which may be \n
 * or EOF among others. Barring EOF it will NOT be a digit.  The
 * combination of error, 0 result, and the next getc returning
 * \n indicates that no numerical value was found on the line.
 *
 * If the user wants to skip all leading white space including
 * \n, \f, \v, \r, he should first call "skipwhite(f);"
 *
 * Peculiarity: an isolated leading '+' or '-' NOT immediately
 * followed by a digit will return error and a value of 0, when
 * the next getc will return that following non-digit.  This is
 * caused by the single level ungetc available.
 */
int readxint(int *val, FILE *f)
{
   unsigned int value;
   int          status, negative;
   int          ch;

   *val = value = 0;                    /* default */
   status = 1;                          /* default error */
   negative = 0;

   ch = ignoreblks(f);

   if (EOF != ch) {
      if (('+' == ch) || ('-' == ch)) {
         negative = ('-' == ch);
         ch = ignoreblks(f);             /* absorb any sign */
      }

      if (isdigit(ch)) {                 /* digit, no error */
         ungetc(ch, f);
         status = readxwd(&value, f);
         ch = getc(f);           /* This terminated readxwd */
      }

      if (0 == status) {
         /* got initial digit and no readxwd overflow */
         if (!negative && (value <= INT_MAX))
            *val = value;
         else if (negative && (value < UINT_MAX) &&
                 ((value - 1) <= -(1 + INT_MIN)))
            *val = -value;
         else {                       /* overflow */
            status = 1;  /* do whatever the native system does */
            if (negative) *val = -value;
            else          *val = value;
         }
      }
      else if (negative) *val = -value;
      else               *val = value;
   }
   ungetc(ch, f);
   return status;

} /* readxint */

/*-----------------------------------------------------
 * Flush input through an end-of-line marker inclusive.
 */
void flushln(FILE *f)
{
   int ch;

   do {
      ch = getc(f);
   } while (('\n' != ch)  && (EOF != ch));

} /* flushln */

/* End of txtinput.c */

and this:

#ifndef H_txtinput_h
#define H_txtinput_h
#  ifdef __cplusplus
      extern "C" {
#  endif

#include <stdio.h>

/* For licensing restrictions (GPL) see readme.txt in:
 *    <http://cbfalconer.home.att.net/download/txtio.zip>
 *
 * These stream input routines are written so that simple
 * conditionals can be used:
 *
 *      if (readxint(&myint, stdin)) {
 *         do_error_recovery; normally_abort_to_somewhere;
 *      }
 *      else {
 *         do_normal_things; usually_much_longer_than_bad_case;
 *      }
 *
 * They allow overflow detection, and permit other routines to
 * detect the character that terminated a numerical field. No
 * string storage is required, thus there is no limitation on
 * the length of input fields.  For example, a number entered
 * with a string of 1000 leading zeroes will not annoy these.
 *
 * The numerical input routines *NEVER* absorb a terminating
 * char (including '\n').  Thus a sequence such as:
 *
 *      err = readxint(&myint, stdin);
 *      flushln(stdin);
 *
 * will always consume complete lines, and after execution of
 * readxint a further getc (or fgetc) will return the character
 * that terminated the numeric field.
 *
 * They are also re-entrant, subject to the limitations of file
 * systems.  e.g interrupting readxint(v, stdin) operation with
 * a call to readxwd(wd, stdin) would not be well defined, if
 * the same stdin is being used for both calls.  If ungetc is
 * interruptible the run-time system is broken.

 * Revised 2006-01-15 so that unsigned entry overflow (readxwd)
   uses the normal C modulo (UINT_MAX + 1) operation.  readxwd
   still rejects an initial sign as an error.
 */

/*--------------------------------------------------------------
 * Skip all blanks on f.  At completion getc(f) will return
 * a non-blank character, which may be \n or EOF
 *
 * Skipblks returns the char that getc will next return, or EOF.
 */
int skipblks(FILE *f);

/*--------------------------------------------------------------
 * Skip all whitespace on f, including \n, \f, \v, \r.  At
 * completion getc(f) will return a non-blank character, which
 * may be EOF
 *
 * Skipwhite returns the char that getc will next return, or EOF.
 */
...


David Tiktin <dtik@nospam.totally-bogus.com> wrote:
> Richard Heathfield <r@see.sig.invalid> wrote:
> > ...
> > This is one of those very rare and bizarre cases where
> > it is actually a *good* idea to use a cast - isspace(
> > (unsigned char)*end) - and the normal promotion rules
> > will of course take care of the conversion to int for
> > you.

> Good catch!  I actually "knew" that ;-)  I have a bunch
> of macros like:

>   #define TO_LOWER(c)  ((char) tolower((unsigned char)(c)))

How is the (char) cast useful?

P.S. I find the (unsigned char) application above
contentious in that it assumes that 1c and sm
implementations will make plain char unsigned.

--
Peter

On 24 May 2007, Peter Nilsson <a@acay.com.au> wrote:

> David Tiktin <dtik@nospam.totally-bogus.com> wrote:
>> Richard Heathfield <r@see.sig.invalid> wrote:
>> > ...
>> > This is one of those very rare and bizarre cases where
>> > it is actually a *good* idea to use a cast - isspace(
>> > (unsigned char)*end) - and the normal promotion rules
>> > will of course take care of the conversion to int for
>> > you.

>> Good catch!  I actually "knew" that ;-)  I have a bunch
>> of macros like:

>>   #define TO_LOWER(c)  ((char) tolower((unsigned char)(c)))

> How is the (char) cast useful?

In its typical use:

  char * ptr = str;

  while (*ptr)
  {
     *ptr = TO_LOWER(*ptr);
     ptr++;
  }

some compilers I've used over the years complain about the assignment
of an int to a char due to loss of precision.  I generally run with
the highest warning levels I can get, so the cast silences a warning
I've investigated and found not to be a problem in this situation.

> P.S. I find the (unsigned char) application above
> contentious in that it assumes that 1c and sm
> implementations will make plain char unsigned.

Sorry, I don't understand your point here or where that assumption is
made.

Is there a problem that the code should be:

  *ptr = tolower((int)(*ptr) & 0xFF);

to assure the passed value and result are in the range 0-255 even if
CHAR_BIT is greater than 8?

Dave

--
D.a.v.i.d  T.i.k.t.i.n
t.i.k.t.i.n [at] a.d.v.a.n.c.e.d.r.e.l.a.y [dot] c.o.m

There is no semantic difference.

> some compilers I've used over the years complain about the
> assignment of an int to a char due to loss of precision.

Assignment of int values to a char is probably the most
fundamental of useful constructs that C has. Putting a
warning on that is to me like putting a warning on every
#include asking if that's the file you actually meant to
include.

> I generally run with the highest warning levels I can get,

A good move, but you shouldn't change code to silence one
compiler's warnings unnecessarily. Different compilers will
issue warnings for different reasons and two different
compilers can even issue warnings for opposing reasons.

> so the cast silences a warning I've investigated and found
> not to be a problem in this situation.

The simpler option is to acknowledge that no action is
required as a consequence of the warning.

It's easy to fall into the belief that the absence of
warnings is a strong measure of correctness. 'Clean'
compiles give a sense of confidence. But it's a small
step away from introducing bugs, just to silence a
compiler.

> > P.S. I find the (unsigned char) application above
> > contentious in that it assumes that 1c and sm
> > implementations will make plain char unsigned.

> Sorry, I don't understand your point here or where that
> assumption is made.

Depending on how you use them, input routines often read
and store bytes, not (plain) chars. On an sm machine
interpreting an input byte as a char representation and
converting it to an unsigned char can potentially yield
a different character code to the original for some
characters outside the basic character set.

It's a highly unlikely scenario, and it's dismissed with
a little handwaving about QoI guaranteeing that 1c and sm
machines will always make plain char unsigned.

> Is there a problem that the code should be:

>   *ptr = tolower((int)(*ptr) & 0xFF);

> to assure the passed value and result are in the range
> 0-255 even if CHAR_BIT is greater than 8?

No. I'm suggesting, in some cases, it should be...

  *ptr = tolower(* (unsigned char *) ptr);

Obviously that's not as aesthetic as the direct conversion
(unsigned char) *ptr, but it does have the advantage of
working on the hypothetical machines (contrived if you
like) as well as the vanilla ones.

--
Peter
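
As an illustration of the pointer-based form suggested above, here is a minimal sketch of an in-place lowercase loop that reads each byte through an unsigned char pointer instead of converting a plain char value. The function name lower_in_place is hypothetical and not from the thread.

```c
#include <ctype.h>

/* Hypothetical helper: lowercases a NUL-terminated string in place.
 * Each byte is read as an unsigned char, so tolower() always receives
 * a value in its valid range regardless of the signedness of plain char. */
void lower_in_place(char *str)
{
    unsigned char *p = (unsigned char *)str;

    while (*p != '\0') {
        *p = (unsigned char)tolower(*p);
        p++;
    }
}
```

On the usual two's complement machines this behaves the same as tolower((unsigned char)*ptr); the difference only matters on the hypothetical implementations described in the post.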

On 25 May 2007, Peter Nilsson <a@acay.com.au> wrote:

Semantic difference between what:

  *ptr = (char) tolower(c);

and

  *ptr = tolower(c);

?

>> some compilers I've used over the years complain about the
>> assignment of an int to a char due to loss of precision.

> Assignment of int values to a char is probably the most
> fundamental of useful constructs that C has. Putting a
> warning on that is to me like putting a warning on every
> #include asking if that's the file you actually meant to
> include.

Sorry, I just don't agree.  How many times have we seen code in this
group that goes:

[bad code]

char c;

while ((c = getc(stdin)) != EOF)
{
  /* infinite loop */

}

[/bad code]

I suspect that the int -> char warnings are there to prevent things
like this.
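
For contrast with the bad code above, here is a minimal sketch of the usual correct form, with the result of getc() kept in an int so that EOF stays distinguishable from every valid character. With a char variable the loop never ends if plain char is unsigned, and stops early on a 0xFF byte if it is signed. The function name count_chars is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining on stream f. */
long count_chars(FILE *f)
{
    int ch;                /* int, not char: getc() returns int */
    long n = 0;

    while ((ch = getc(f)) != EOF)
        n++;
    return n;
}
```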

I *never* fall into that belief ;-)  What I do assume is that code
that compiles *with* warnings is likely *not* correct.  I routinely
compile with at least 4 different compilers on 4 different platforms
(2 of them big-endian).  I expect my code to compile without warnings
on all of them (and to be correct on all of them ;-)  Yes, that
sometimes means adding a cast for a "picky" compiler.  It also
sometimes means changing the code to something simpler, clearer and
better.  But if I don't fix the code to silence the warnings, even if
they don't signal a real problem, I'll continue to get the warnings
and waste time looking at things I've already thought about, tested
and fixed.  I don't do this in a cavalier manner, but when I rebuild
a 50 file project, I need to be able to *see* it builds warning free.

OK, thanks for the warning ;-)  I've never had to code for a platform
that's 1s complement or sign-magnitude, but if I did, I imagine I'd
have more to worry about than int -> char casts.  I know of at least
one piece of code I have that explicitly assumes 2s complement, and
I'm sure *none* of my networking code would work!

Dave

--
D.a.v.i.d  T.i.k.t.i.n
t.i.k.t.i.n [at] a.d.v.a.n.c.e.d.r.e.l.a.y [dot] c.o.m
