>Actually, using "random" values is not really a good idea. The problem is
>that the data distribution won't be patterned after the "real" data.
That depends. If you use random values in the right places, you can
create some very interesting data. No two clients will use your
systems the same way, and some will even abuse fields in ways you
never thought of.
By creating several databases with different weights (heavier in some
fields than others), you will get a better feel for your performance needs.
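To make the weighting idea concrete, here is a minimal sketch (plain Python, standard library only; the counts and column semantics are made up for illustration) contrasting uniformly random keys with a skewed, Zipf-like weighting where a few "hot" values dominate, which is closer to what a live table usually looks like:

```python
# Sketch: uniformly random test keys vs. skewed (Zipf-like) keys.
# Uniform keys spread evenly over an index; skewed keys concentrate on
# a few hot values, which changes how useful that index really is.
import random
from collections import Counter

random.seed(42)
N = 100_000        # rows to generate (illustrative)
CUSTOMERS = 1_000  # distinct key values (illustrative)

# Uniform: every customer id equally likely.
uniform = [random.randrange(CUSTOMERS) for _ in range(N)]

# Skewed: rough Zipf weighting -- rank k gets weight 1/(k+1).
weights = [1 / (rank + 1) for rank in range(CUSTOMERS)]
skewed = random.choices(range(CUSTOMERS), weights=weights, k=N)

def top_share(ids, top_n=10):
    """Fraction of all rows owned by the top_n most frequent keys."""
    counts = Counter(ids)
    return sum(c for _, c in counts.most_common(top_n)) / len(ids)

print(f"uniform: top 10 keys own {top_share(uniform):.1%} of rows")
print(f"skewed:  top 10 keys own {top_share(skewed):.1%} of rows")
```

With the uniform data the top ten keys own roughly 1% of the rows; with the skewed weighting they own a large fraction, so a query plan tuned against one data set can be badly wrong for the other.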
>means the data would be spread over the scope of the indexes fairly
>evenly--a characteristic that's not typical of a live database. Because of
>this, the way you tune your queries, build your indexes and manage how the
>QO works would be generally worthless and would have to be repeated once the
>live data is used.
With a series of databases, and an understanding of how users might
use your systems, you can build a basis for how to design your tables.
Most people like to think you can create one master database model for
all your client needs, but this is unrealistic. You can come close, but
you can't really meet their needs until you have their data and you
know what they might want.
Because going to these lengths with every client is often too
time-consuming, you need to have thought about how you can model
differently, or use your indexes and cache differently, for different
situations. Some clients have small databases and small hardware. Some
clients have all the hardware in the world, and you can really load
down the system with indexes and cache. So creating a model is a good
idea, but it's not a magic bullet.
>If you really want test data in the same proportions and distribution as the
>production database, it's best to make a copy of the production database and
>"desensitize" it--scramble the SSAN or other private information in ways
>that don't affect the distribution. Depending on how you write your queries,
>you might also want to replace the text with greeked text to protect the
>innocent and the privacy of the data.
And if you only had one production database, then this would be a good
approach.
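The desensitizing approach quoted above can be sketched roughly like this (an assumption of mine, not the poster's actual method): a keyed HMAC maps each real value to a stable fake one, so duplicate rows stay duplicates and the value-frequency histogram the optimizer's statistics are built from is unchanged. The key name and helper are illustrative only.

```python
# Sketch: scramble sensitive values while keeping the distribution intact.
# Deterministic keyed hashing means equal inputs map to equal outputs,
# so joins, duplicate counts, and frequency statistics survive masking.
import hmac
import hashlib

SECRET = b"rotate-me-before-use"  # illustrative; keep out of source control

def scramble_ssn(ssn: str) -> str:
    """Deterministically replace an SSN-like string with a fake 9-digit one."""
    digest = hmac.new(SECRET, ssn.encode(), hashlib.sha256).hexdigest()
    fake = int(digest, 16) % 1_000_000_000
    return f"{fake:09d}"

rows = ["123-45-6789", "987-65-4321", "123-45-6789"]
masked = [scramble_ssn(s) for s in rows]
# Same input -> same output, so the duplicate row is still a duplicate.
assert masked[0] == masked[2] and masked[0] != masked[1]
```

Note this preserves the *shape* of the distribution but not ordering, so range scans and histograms over ordered values would still need care.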
>It's all about statistics.
And statistics always lie when it comes to the real world. Be prepared
and code with heavy defences.
-----------------------------------------------Reply-----------------------------------------------
On May 30, 8:28 pm, "William (Bill) Vaughn" wrote:
This is interesting...
I guess there is really no free tool for this.
The only one I could find is DBMonster: http://dbmonster.kernelpanic.pl/
I didn't check it out yet... but I've got not much choice.
I don't want to waste the time to develop something myself, even if it's