
Fortran Programming Language

Easy formatting questions :-)


Hi group, Happy Easter! :-)

I've been trying to read a line in an input file that has a mix of
characters & integers. The format can be assumed to be characters for
15 columns and a binary string of unknown/variable length.

The format of the input file 'test.txt':
|-CHARAC15-------|-----------INTEGER-----------.....
   Binary string = 010101010 ....

The following test program fails:
Program test_format
Implicit None

Integer:: buffer
Integer, allocatable:: vector(:)
Character:: a15*15
buffer=1000
Allocate (vector(buffer))
Open (1,file='test.txt')
Read (1,*) a15, vector(1:10)
Print *, a15
print *, vector(1:10)

End program test_format

Some questions...
- If I change the Read statement to Read (1,'(a15,10000i1)') it works.
However I want to read it in as free format if possible, so that if
the binary string is longer than 10000 bits, my program won't cause
problems.
- Similarly for writing out large arrays...I don't want to have to
specify "Write (1,'(a15,10000i1)')" as arrays larger than 10000 would
get truncated.
- Is there an easy way to get the program to initially parse test.txt,
look for the widest binary string length, and automatically allocate
"buffer" to match that size?

Thanks everyone,
skate xx

On Apr 8, 1:30 pm, "sk8terg1rl" <sk8terg1rl_2@yahoo.co.uk> wrote:

> Hi group, Happy Easter! :-)

> I've been trying to read a line in an input file that has a mix of
> characters & integers. The format can be assumed to be characters for
> 15 columns and a binary string of unknown/variable length.

> The format of the input file 'test.txt':
> |-CHARAC15-------|-----------INTEGER-----------.....
>    Binary string = 010101010 ....

> The following test program fails:

You cleared the first hurdle of asking a good programming question by
posting what looks like complete code. But you didn't clear the second:
never say merely that a program "fails" or "does not work"; explain HOW
it fails. What output does it give, and how does that differ from what
you want?

sk8terg1rl <sk8terg1rl_2@yahoo.co.uk> wrote:
> The format of the input file 'test.txt':
> |-CHARAC15-------|-----------INTEGER-----------.....
>    Binary string = 010101010 ....

> The following test program fails:
...
> Read (1,*) a15, vector(1:10)

Not surprising. There are no delimiters (blanks or commas) between the
integers. I can't imagine how one would expect the compiler to read your
mind to guess that you might mean to read each digit into a separate
integer.  Anyway...

> - If I change the Read statement to Read (1,'(a15,10000i1)') it works.
> However I want to read it in as free format if possible, so that if
> the binary string is longer than 10000 bits, my program won't cause
> problems.
> - Similarly for writing out large arrays...I don't want to have to
> specify "Write (1,'(a15,10000i1)')" as arrays larger than 10000 would
> get truncated.
> - Is there an easy way to get the program to initially parse test.txt,
> look for the widest binary string length, and automatically allocate
> "buffer" to match that size?

Well, let's answer the part about writing first, as that is simplest.
You are already using what I usually recommend as the simplest technique
for such things. It is adequate for most applications, but if your data
really might be arbitrarily long, then you need something better. First
I'd wonder if you couldn't just bump up the 10000 to some larger value
that would always be enough. I do recommend reconsidering that
possibility (probably protected by a test to give an error message if
you detect the limit being hit). But assuming you can't do that...

For output, either

1. Generate the format using an internal write such as
    character :: fmt*32
    ...
    write (fmt,'(a,i10,a)') '(a15,i', size(vector), 'i1)'

or

2. Use non-advancing I/O to write one character of the record at a time.
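
A minimal sketch of both output options, under some assumptions. The module and procedure names (runtime_format_demo, bit_format, write_bits_nonadvancing) are hypothetical, the count is assumed to be at least 1, and the title is assumed to be 15 characters. Note that the format literal as posted above has a stray "i" after the comma; the sketch uses '(a15,' so that the generated string reads like "(a15,24i1)".

```fortran
module runtime_format_demo
  implicit none
contains

  ! Option 1: build a format such as "(a15,24i1)" with an internal write.
  ! Assumes n >= 1; i0 writes the count without padding.
  function bit_format(n) result(fmt)
    integer, intent(in) :: n
    character(len=32)   :: fmt
    write (fmt, '(a,i0,a)') '(a15,', n, 'i1)'
  end function bit_format

  ! Option 2: write the record piece by piece with non-advancing output.
  ! The unit must be an external formatted sequential file.
  subroutine write_bits_nonadvancing(unit, title, vector)
    integer, intent(in)          :: unit
    character(len=*), intent(in) :: title
    integer, intent(in)          :: vector(:)
    integer :: i
    write (unit, '(a15)', advance='no') title
    do i = 1, size(vector)
      write (unit, '(i1)', advance='no') vector(i)
    end do
    write (unit, '(a)') ''          ! advancing write ends the record
  end subroutine write_bits_nonadvancing

end module runtime_format_demo
```

With option 1, store the result in a character variable first (fmt = bit_format(size(vector))) and then use write (1, fmt) a15, vector, so that no function doing I/O is referenced inside another I/O statement.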

For input, you'll pretty much have to read the file using non-advancing
I/O. There aren't any other good options. Probably the simplest thing is
to read the file twice. The first read scans for the largest record
using non-advancing I/O. Then allocate the needed buffers and reread the
file using either non-advancing I/O or a format string generated as
described above.
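
The first pass described above might look roughly like this. It is only a sketch: the names record_scan and longest_record are hypothetical, the 256-character chunk size is arbitrary, and it relies on the Fortran 2003 intrinsic is_iostat_eor. The unit is assumed to be already open for formatted sequential access.

```fortran
module record_scan
  implicit none
contains

  ! Return the length of the longest record on the unit, then rewind it
  ! so that the second pass can start from the top.
  function longest_record(unit) result(longest)
    integer, intent(in) :: unit
    integer             :: longest
    character(len=256)  :: chunk      ! arbitrary chunk size
    integer             :: nread, reclen, ios

    longest = 0
    reclen  = 0
    rewind (unit)
    do
      read (unit, '(a)', advance='no', size=nread, iostat=ios) chunk
      if (ios == 0) then
        reclen = reclen + nread        ! chunk filled; record continues
      else if (is_iostat_eor(ios)) then
        reclen  = reclen + nread       ! partial chunk closes the record
        longest = max(longest, reclen)
        reclen  = 0
      else
        exit                           ! end of file (or an error)
      end if
    end do
    rewind (unit)
  end function longest_record

end module record_scan
```

For the file in the question, something like buffer = longest_record(1) - 15 followed by allocate (vector(buffer)) would size the array before the second read.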

You can avoid reading the file twice if needed, but it adds
complication. You'd probably need to temporarily store the data in a
dynamic structure such as a linked list. It depends on the application
whether that is worth doing. In some applications you really can't read
the file twice for any of several reasons; or you might not want the
performance penalty of doing so.
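
As one possible single-pass variant, the sketch below swaps the linked list for a simpler technique: a deferred-length allocatable string (Fortran 2003) that grows chunk by chunk until the end of the record. The names long_lines and read_record and the 256-character chunk size are assumptions made for illustration.

```fortran
module long_lines
  implicit none
contains

  ! Read one whole record of any length into line.
  ! ios is 0 on success and nonzero at end of file or on error.
  subroutine read_record(unit, line, ios)
    integer, intent(in)                        :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out)                       :: ios
    character(len=256) :: chunk               ! arbitrary chunk size
    integer            :: nread

    line = ''
    do
      read (unit, '(a)', advance='no', size=nread, iostat=ios) chunk
      if (ios /= 0 .and. .not. is_iostat_eor(ios)) return
      line = line // chunk(1:nread)            ! grow the string
      if (is_iostat_eor(ios)) then
        ios = 0                                ! end of record is not an error here
        return
      end if
    end do
  end subroutine read_record

end module long_lines
```

After a successful call, line(1:15) would hold the character field and len(line) - 15 the number of digits, so the integer array can be allocated per record without a separate scanning pass.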

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain

sk8terg1rl wrote:

...
> Integer:: buffer
> Integer, allocatable:: vector(:)
> Character:: a15*15
> buffer=1000
> Allocate (vector(buffer))
> Open (1,file='test.txt')
> Read (1,*) a15, vector(1:10)
> Print *, a15
> print *, vector(1:10)

...
> - If I change the Read statement to Read (1,'(a15,10000i1)') it works.

> However I want to read it in as free format if possible, so that if
> the binary string is longer than 10000 bits, my program won't cause
> problems.
> - Similarly for writing out large arrays...I don't want to have to
> specify "Write (1,'(a15,10000i1)')" as arrays larger than 10000 would
> get truncated.
> - Is there an easy way to get the program to initially parse test.txt,
> look for the widest binary string length, and automatically allocate
> "buffer" to match that size?

Well, you are insisting on list-directed I/O for something that's pretty
much required to use an explicit format.  Consider the input record:

ABCDEFGHIJKLMNOP011010101010101010100

By the rules of list directed I/O, that whole object will be read
as the character string.  You need to delimit the part that's intended
to be the string from the rest of the record:

ABCDEFGHIJKLMNOP,011010101010101010100

But, now the whole sequence of digits will be interpreted as the first
of the integer values.  You need to delimit each digit from the next:

ABCDEFGHIJKLMNOP,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0

This sounds like a hardship to me.  Instead of changing the data on
your files, I'd look into using non-advancing I/O and explicit formats:

   read(1, '(a15)', advance='no') a15
   do i=1, buffer
      read(1, '(i1)', advance='no', eor=100) vector(i)
   end do
   ! at this point a buffer full has been read, there may still be more
   ...
100 continue
   ! at this point an end of record has been read and i-1 elements of
   ! data have been read

You'll have to read up on non-advancing I/O to fill in the details.
This will work for strings of digits as long as your system's record
limit permits.

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies."   --  C. A. R. Hoare

It's not directly possible in free format because something like
010101  is an integer and there is no way for the processor to know
that you mean 6 "1 bit" numbers, rather than a 6 digit decimal
number.  There are two general approaches to consider.
1)  declare a huge character string, bigger than you will ever need
and "read" through it character by character.
   something like
     character (len=1000000000000000000) :: x  !or maybe smaller ;)
     x=' '
     read(...) x
     do  i = 1,10000000000000000000000
       if(x(I:I) == '1') vector(i) = 1
       if(x(I:I) == '0') vector(i) = 0
       if(x(I:I) == ' ') exit
     enddo
here, I-1 will be the actual length (I stops at the first blank)

2)  use non advancing I/O and read each digit with an I1 format.
You'll need to add ADVANCE='NO' to the read.  If you can't find
out about non-advancing I/O, ask here and several people will
explain it.

With either one, you'll need to make VECTOR be big enough before you start.
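
Approach 1 could be wrapped up along these lines. This is a sketch with hypothetical names (bit_parse, parse_bits); it assumes the caller has already read the line into a character variable and passes only the digit portion, for example line(16:).

```fortran
module bit_parse
  implicit none
contains

  ! Convert the leading run of '0'/'1' characters in text to integers.
  ! nbits returns how many digits were stored; elements of vector beyond
  ! nbits are left unchanged. Stops at the first other character or when
  ! vector is full.
  subroutine parse_bits(text, vector, nbits)
    character(len=*), intent(in) :: text
    integer, intent(inout)       :: vector(:)
    integer, intent(out)         :: nbits
    integer :: i

    nbits = 0
    do i = 1, min(len_trim(text), size(vector))
      select case (text(i:i))
      case ('0')
        vector(i) = 0
      case ('1')
        vector(i) = 1
      case default
        exit                      ! leaves the do loop
      end select
      nbits = i
    end do
  end subroutine parse_bits

end module bit_parse
```

A typical call would be read (1, '(a)') x followed by call parse_bits(x(16:), vector, n), where x is a character variable long enough for the widest expected line.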

> - Similarly for writing out large arrays...I don't want to have to
> specify "Write (1,'(a15,10000i1)')" as arrays larger than 10000 would
> get truncated.

Formats revert when they come to their end and there are still
items in the I/O list.  You could do something like
        write (1, '(a15, 100i1, /, (15x,100i1))') title, vector
that will write out the title and first 100 digits on the first
line and then 15 blanks and the next 100 digits on the second and
third and ... nth line.  It will use as many lines as necessary
and write out the odd number on the last line.  (The slash starts
the second line; reversion to the parenthesized group starts each
later one.)
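
A small sketch of format reversion in use, with hypothetical names (wrapped_output, write_wrapped). One addition that is not in the post: the colon edit descriptor, which stops format processing once the list is exhausted, so that a vector of exactly 100 elements does not produce an extra empty line.

```fortran
module wrapped_output
  implicit none
contains

  ! Write title plus the digits, 100 per line. When the format is
  ! exhausted and items remain, control reverts to the last parenthesized
  ! group, (15x, 100i1), and a new record is started each time.
  subroutine write_wrapped(unit, title, vector)
    integer, intent(in)          :: unit
    character(len=*), intent(in) :: title
    integer, intent(in)          :: vector(:)
    write (unit, '(a15, 100i1, :, /, (15x, 100i1))') title, vector
  end subroutine write_wrapped

end module wrapped_output
```

For a 250-element vector this gives three lines: the title with 100 digits, then 15 blanks with 100 digits, then 15 blanks with the remaining 50.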

> - Is there an easy way to get the program to initially parse test.txt,
> look for the widest binary string length, and automatically allocate
> "buffer" to match that size?

Not easily. If there are many lines, you need to read them in one at a
time into a huge string and look for the longest line.  You can probably
do something like
       lentrim(x)-15
which will tell you how many digits there are on one line and then
find the max of this over all of the lines.
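
The scan described here might be sketched as follows. The names (widest_line, widest_bits, max_line) are hypothetical, and the 100000-character buffer is an assumed upper bound: any longer record would be silently truncated, which is the main weakness of this approach.

```fortran
module widest_line
  implicit none
  integer, parameter :: max_line = 100000   ! assumed upper bound on line length
contains

  ! Return the largest number of characters found after the header
  ! columns on any line of the unit, then rewind the unit.
  function widest_bits(unit, header_width) result(widest)
    integer, intent(in) :: unit, header_width
    integer             :: widest
    character(len=max_line) :: x
    integer :: ios

    widest = 0
    rewind (unit)
    do
      read (unit, '(a)', iostat=ios) x       ! short lines are blank padded
      if (ios /= 0) exit
      widest = max(widest, len_trim(x) - header_width)
    end do
    rewind (unit)
  end function widest_bits

end module widest_line
```

With the layout from the question, buffer = widest_bits(1, 15) would give the size to allocate.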

Hope this helps

Dick Hendrickson

On 8 Apr, 18:48, "Beliavsky" <beliav@aol.com> wrote:

Hi Beliavsky, thanks for replying.

Here's the error:
sk8terg1rl@home:~/test>./test
forrtl: severe (24): end-of-file during read, unit 1, file /home/sk8terg1rl/test/test.txt
Image              PC                Routine            Line        Source
test               000000000043BD2B  Unknown               Unknown  Unknown
test               000000000043A356  Unknown               Unknown  Unknown
test               000000000043A2DE  Unknown               Unknown  Unknown
test               000000000041C76A  Unknown               Unknown  Unknown
test               000000000041C3CB  Unknown               Unknown  Unknown
test               000000000040F161  Unknown               Unknown  Unknown
test               0000000000402763  Unknown               Unknown  Unknown
test               000000000040266A  Unknown               Unknown  Unknown
libc.so.6          00002AAAAAD355AA  Unknown               Unknown  Unknown
test               00000000004025AA  Unknown               Unknown  Unknown

The expected output is a character string and an integer string.

skate xx

On 8 Apr, 19:07, nos@see.signature (Richard Maine) wrote:

> sk8terg1rl <sk8terg1rl_2@yahoo.co.uk> wrote:
> > The format of the input file 'test.txt':
> > |-CHARAC15-------|-----------INTEGER-----------.....
> >    Binary string = 010101010 ....

> > The following test program fails:
> ...
> > Read (1,*) a15, vector(1:10)

Hi Richard, thanks for replying.

> Not surprising. There are no delimiters (blanks or commas) between the
> integers. I can't imagine how one would expect the compiler to read your
> mind to guess that you might mean to read each digit into a separate
> integer.  Anyway...

My answer to Beliavsky is with an input file which has delimiters. So
each binary bit occupies 3 columns (space/bit/space).

The question was originally pitched for reading in this "space-bit-
space..." format, but I decided to take a more visually appealing case
of "bit-bit-bit...."  format as vi produces a confusing output with
very wide columns. So yes, the delimiters were there initially :-)

Because sometimes the output gets messed up. E.g.
Open (1,file='temp.txt')
character15 = 'abcd...'
Write (1,'(a10000)') character15

Try to vi this file...simple text editors mess up with extremely wide
files.

Also, I wasn't sure if there was a limit on the number you could put
there.

>I do recommend reconsidering that
> possibility (probably protected by a test to give an error message if
> you detect the limit being hit). But assuming you can't do that...

Good idea.

> For output, either

> 1. Generate the format using an internal write such as
>     character :: fmt*32
>     ...
>     write (fmt,'(a,i10,a)') '(a15,i', size(vector), 'i1)'

You mean:
write (fmt,'(a,i10,a)') '(a15,',size,'i1)'

Actually I just had an idea: Linux's wc command + scrupulous column
accounting.
So some code like this would work:

result = systemqq('wc test.txt > wcoutput')
Open (2,file='wcoutput')
Read (2,*) rows, characters, columns
buffer = columns - 15

Thanks for the remark about a non-advancing input too. I've bookmarked
a few Google hits and will look it up. We weren't taught this in our
classes.

skate xx

Hi James, thanks for replying,

On 8 Apr, 19:20, "James Giles" <jamesgi@worldnet.att.net> wrote:

Actually I would have mine written out as:
Write (1,'(a15,10000i1)') 'ABCDEFGHIJKLMNO '
(Note the space after O)

> By the rules of list directed I/O, that whole object will be read
> as the character string.  You need to delimit the part that's intended
> to be the string from the rest of the record:

> ABCDEFGHIJKLMNOP,011010101010101010100

> But, now the whole sequence of digits will be interpreted as the first
> of the integer values.  You need to delimit each digit from the next:

> ABCDEFGHIJKLMNOP,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0

> This sounds like a hardship to me.

Quite.

Okay, I will read up on this. For now I'll use explicit formats but I
hate having to dig through mysteriously crashing code looking for
"flexibility problems" like above after being too explicit with array
sizes, formats, etc.

> "I conclude that there are two ways of constructing a software
> design: One way is to make it so simple that there are obviously
> no deficiencies and the other way is to make it so complicated
> that there are no obvious deficiencies."   --  C. A. R. Hoare

About your sig - would you suggest that the natural conclusion from
this is to use bootstrapped programs (Linux philosophy) rather than
writing huge programs (Windows Vista philosophy)?

skate xx

Hi Dick, thanks for replying,

On 8 Apr, 19:28, Dick Hendrickson <dick.hendrick@att.net> wrote:

Yes. I was having an "engineering moment" ;-)

I originally posed the question as delimited binary bits (space-bit-
space format). I would prefer a concatenated binary string for
compactness with simple text editors like vi, hence me rephrasing it
to bit-bit-bit format :-)

> 1)  declare a huge character string, bigger than you will ever need
> and "read" through it character by character.
>    something like
>      character (len=1000000000000000000) :: x  !or maybe smaller ;)
>      x=' '
>      read(...) x
>      do  i = 1,10000000000000000000000
>        if(x(I:I) == '1') vector(i) = 1
>        if(x(I:I) == '0') vector(i) = 0
>        if(x(I:I) == ' ') exit
>      enddo
> here, I-1 will be the actual length (I stops at the first blank)

Why did you do x(I:I) instead of x(I)?

> 2)  use non advancing I/O and read each digit with an I1 format.
> You'll need to add ADVANCE='NO' to the read.  If you can't find
> out about non-advancing I/O, ask here and several people will
> explain it.

I will Google it up as something potentially useful to keep in mind. I
need to code and (grr...inevitably!) debug a working program quite
soon so I will go for the easy explicit format for now. As I hope to
expand its flexibility in the future, this will become handy to
know...

Linux's wc would work too. My code is written for Linux systems, and
makes use of some Linux-specific commands. I program it in a Windows
box for aesthetic reasons and scp it over - I'm quite comfortable with
Compaq Visual Fortran's editor (I haven't found an editor that is very
Fortran-friendly in Linux yet).

In my Windows compiler's manual, it is len_trim(x). Still, another
handy intrinsic function to know.

> Hope this helps

It does, thanks again :-)

skate xx

sk8terg1rl wrote:

...

>>      do  i = 1,10000000000000000000000
>>        if(x(I:I) == '1') vector(i) = 1
>>        if(x(I:I) == '0') vector(i) = 0
>>        if(x(I:I) == ' ') exit
>>      enddo
>> here, I-1 will be the actual length (I stops at the first blank)

> Why did you do x(I:I) instead of x(I)?

It's a design flaw of F77 that persists into the later versions
of the standard because of backward compatibility.  I doubt
that it was actually deliberate, but you would need detailed
minutes of the meetings to really determine when the problem
was discovered and why they chose as they did.  The short
answer is that X(I) is implicitly interpreted as a function
call if X is not declared to be an array.  Since X is declared
to be a string (which are considered to be scalars, not arrays)
the syntax X(I) is a function call.  X(I:I) is not a function
call since function arguments can't use the colon syntax.
Hence all character substrings *must* be extracted with
the colon form.  :-(

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies."   --  C. A. R. Hoare

Think of this problem as being similar to reading Excel data which may
be tab, space, comma, semicolon delimited (and mixed comma-semicolon
and non separating spaces and the occasional quotes
and double quotes).
My response is based on how I wrote my own program to process this
data.

The file is easily processed using unformatted binary reads.
The delimiter in this problem case is the 15-character uninterrupted
string, which is then followed by an arbitrary count of space-separated
digit values of only one or zero, until either end-of-file or another
15-character string is found.
You work by reading blocks of data into a work area and getting bytes
from it until exhausted, (when you get another block and reset the
pointer).

And the process is by a state engine.
 state 1 = found 15 characters with no blanks
 state 2 = in binary digit array where there is one digit only before
a blank.
And you extract the appropriate state "x" values till the state
switches.

sk8terg1rl <sk8terg1rl_2@yahoo.co.uk> wrote:
> On 8 Apr, 19:07, nos@see.signature (Richard Maine) wrote:
> So yes, the delimiters were there initially :-)

Oh. Well, if you are asking a question about how to read a specific
input form, the answers aren't likely to be very good if the input form
you show isn't the one you mean.

> > I'd wonder if you couldn't just bump up the 10000 to some larger value
> > that would always be enough.

> Because sometimes the output gets messed up. E.g.
...
> Try to vi this file...simple text editors mess up with extremely wide
> files.

Um. Now you've got me confused about your requirement. It seemed like
you were specifically asking about how to make such a wide file. Yes, I
know they are awkward. Now I don't understand what the question is. If
you are just trying to put out the values and don't care how many
records they take, then it is pretty trivial. Just

  write(lun,*) vector_or_whatever

will do fine. Or you can use explicit formats. If you do, for example,
something like

  write (lun,'(a16,20i3/(16x,20i3))') 'stuff = ', vector

It will write stuff= and the first 20 values on the first line, followed
by 20 values per line for as many lines as it takes. Is that all you
were asking? If not, you need to clarify the question.

For input, I now don't know what you are asking. You'll need to actually
say what the input is - not some example of what it isn't. Including
such details as whether it is all on one line or not.

> Also, I wasn't sure if there was a limit on the number you could put
> there.

Not specified by the standard. It is compiler dependent. Some old
compilers used to have smallish limits on record sizes and thus formats
that they would accept, but I haven't seen that in a long time. I'd
avoid implied record lengths of over 2 billion. But then from other
comments above, maybe you don't want such a single long record anyway.

> >     write (fmt,'(a,i10,a)') '(a15,i', size(vector), 'i1)'

> You mean:
> write (fmt,'(a,i10,a)') '(a15,',size,'i1)'

No. I meant size(vector). Unless my eyes are overlooking something, that
(and irrelevant spacing) is the only difference between what I wrote and
what you did. Size is an intrinsic function that returns the size of an
array. If you happen to have the appropriate value handy in a variable,
that's also fine. (Though I recommend against using size as a variable
name because of the conflict with the intrinsic name; it is allowed, but
I recommend against it).

> Actually I just had an idea: Linux's wc command + scrupulous column
> accounting...

Yes. I even thought of mentioning something like that. But it's just a
variant of reading through the file twice. The wc command reads through
the file once, and then your Fortran program does it again. I'd
personally think it simpler and more robust to just read the file twice
in Fortran, since you are doing the second read in Fortran anyway. That
way you can easily make sure you follow the same formatting rules both
times. And you don't have to fuss with other trivia like making sure
that your temporary file doesn't conflict with anything else or have
other problems. (Not to speak of the system function being compiler
specific, and systemqq being a particularly non-portable variant.)

> Thanks for the remark about a non-advancing input too. I've bookmarked
> a few Google hits and will look it up. We weren't taught this in our
> classes.

Oh. It is really useful for some kinds of more complicated input and
output issues. Perhaps doesn't come up so much in classroom examples,
but does regularly in practice.

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain

Terence <tbwri@cantv.net> wrote:
> The file is easily processed using unformatted binary reads....

Wow. I could imagine more complicated and system dependent ways, but I'd
have to work a bit at it. Perhaps bypassing the file system and talking
directly to the disk controller; yes, that would do it. :-(

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain

sk8terg1rl wrote:

<snip>

> Actually I just had an idea: Linux's wc command + scrupulous column
> accounting.
> So some code like this would work:

> result = systemqq('wc test.txt > wcoutput')
> Open (2,file='wcoutput')
> Read (2,*) rows, characters, columns
> buffer = columns - 15

That's not quite what wc does.  It counts newlines, words, and bytes,
and prints the results.  Your code would use the total byte count as the
maximum column count, which would be effective but probably excessive.

My free advice?  Sure, you could do this in Fortran, but it might be
easier to write a script in something like Perl to read the file and
rewrite it in a format which a very simple Fortran program could read.
You could change a record with 24 bits like this:

abcdefghijklmno101010101111000010100000

to a pair of records like this:

abcdefghijklmno 24
101010101111000010100000

Then you could read the 15-character header and the bit count,
reallocate your array if necessary, and then read the bits all at once.
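
Reading the two-record layout suggested above could look roughly like this. It is a sketch only: the names counted_records and read_counted are hypothetical, the bit count is assumed to sit in the ten columns after the 15-character header, and a count below 1 is reported through an arbitrary nonzero status.

```fortran
module counted_records
  implicit none
contains

  ! Read one header/count record and the record of digits that follows.
  ! vector is (re)allocated to the count; ios is 0 on success.
  subroutine read_counted(unit, header, vector, ios)
    integer, intent(in)                 :: unit
    character(len=15), intent(out)      :: header
    integer, allocatable, intent(inout) :: vector(:)
    integer, intent(out)                :: ios
    integer           :: nbits
    character(len=32) :: fmt

    read (unit, '(a15, i10)', iostat=ios) header, nbits
    if (ios /= 0) return
    if (nbits < 1) then
      ios = 1                      ! arbitrary code for a bad count
      return
    end if

    ! Reallocate only when the size actually changes.
    if (allocated(vector)) then
      if (size(vector) /= nbits) deallocate (vector)
    end if
    if (.not. allocated(vector)) allocate (vector(nbits))

    ! Build "(<nbits>i1)" and read all the digits at once.
    write (fmt, '(a, i0, a)') '(', nbits, 'i1)'
    read (unit, fmt, iostat=ios) vector
  end subroutine read_counted

end module counted_records
```

Calling read_counted in a loop until ios is nonzero would process a whole file of such record pairs.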

Louis

"James Giles" <jamesgi@worldnet.att.net> wrote in message

news:LfbSh.26477$VU4.7070@bgtnsc05-news.ops.worldnet.att.net...

>   read(1, '(a15)', advance='no') a15
>   do i=1, buffer
>      read(1, '(i1)', advance='no', eor=100) vector(i)
>   end do
>   ! at this point a buffer full has been read, there may still be more
>   ...
> 100 continue
>   ! at this point an end of record has been read and i-1 elements of
>   ! data have been read

> You'll have to read up on non-advancing I/O to fill in the details.
> This will work for strings of digits as long as your system's record
> limit permits.

If instead one didn't need to read in a character string but were
simply going after a number, one would clearly not need the statement:

>   read(1, '(a15)', advance = 'no') a15

If the number you were expecting were of selected real kind (13, 37), how
would you read it in?  It would seem to me that if you set buffer to 64 or
greater, you would be all right (?)
--
WW

Because I choose to make X be a single character string with a
length of 100000000000000000 characters.  Fortran character
variables are very different from C character strings.  Each
Fortran character variable can have its own length, and it doesn't
have to be 1.  It's not an array of 100000000000000000000 elements.
(I should have used a length that is easier to type, ;()  The
X(I:I) notation means the Ith character.  In general, you can
use X(I:J) to mean all of the characters from position I through
J and that is treated as a single thing (with a length of J-I+1).

You could also declare an array of elements of any length.  For this
problem, a character length of 1 would be the best choice.
Something like
       character(len=1), dimension(10000000)  ::  X
[there's other syntax for this that is somewhat terser, but
spelling out all of the words makes it easier to understand.]
Then, you would refer to an element of the array as X(I),
which is a character thing of length 1.  With the array
notation there is no direct way to make a longer thing than a
single character.  The major drawback to using an array for this
problem (as I originally understood it) is that when you try to
read an array with something like
        read(...)  x
the processor will attempt to read in the entire array and generate
an error when it comes to the end of the actual data line (unless
it happens to be 10000000 "bits" long).  If you try to read in
a single character variable (the len=1000000000000000 case) the
processor will blank fill the variable if it comes to the end of
the data line.
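
To make the distinction concrete, here is a small sketch with hypothetical names (char_forms, to_char_array) that copies a scalar character string into an array of length-1 elements. It uses s(i:i) for the i-th character of the scalar and a(i) for the i-th element of the array.

```fortran
module char_forms
  implicit none
contains

  ! Copy a scalar string into an array with one character per element.
  function to_char_array(s) result(a)
    character(len=*), intent(in) :: s
    character(len=1)             :: a(len(s))
    integer :: i
    do i = 1, len(s)
      a(i) = s(i:i)        ! substring of length 1 -> array element
    end do
  end function to_char_array

end module char_forms
```

For s = '0101', the substring s(2:4) is the single value '101' of length 3, while the array has four separate elements and a(2) is '1'.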

Oops, I'm off playing with my granddaughter, so I guess I had
a senior moment.  It should be len_trim.

Dick Hendrickson
