Debugging death



In episode 144, I oh so casually mentioned that I was getting some runtime errors. They don’t happen in every run, but the fact that they happen at all is a problem. My artificial life system will eventually be running for days at a time as I do different experiments, and I can’t have these errors causing my system to halt.

What does one do when one sees output like the following?


Random seed=1519820768283.
The size of the realm=3
Before:
ready
start
Longest=0.
null
null
null
Elapsed time=0 hours, 0 minutes and 0 seconds

Exception in thread "Thread-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.remove(Unknown Source)
at DishDeath.check(DishDeath.java:88)
at Realm.cycle(Realm.java:73)
at CyclerThread.run(CyclerThread.java:41)
q
After:
longest=514

I can remember seeing errors like this in other projects and being completely flummoxed. This time, the error message was enough to tell me what the problem was.

The death object removes a figure when it grows too large. It then checks whether too much time has passed, that is, whether the death clock value is too large. If it is, the oldest figure is removed. The trouble comes when the last figure in the population was too large and the death clock was also too large. In that case, the first removal empties the population, and the program then tries to remove a figure that isn’t there.
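A minimal sketch of the kind of guard that avoids this, not the actual DishDeath code: the class name, the two limits, and the use of a plain list of figure sizes are all made up for illustration. The only point is the isEmpty() test before the second removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the death check. Names and limits are assumptions.
class DeathCheckSketch {
    static final int MAX_SIZE = 15;    // assumed size limit for a figure
    static final int MAX_CLOCK = 100;  // assumed death-clock limit

    // Each list entry is the size of one figure, oldest first.
    // Returns how many figures were removed by this check.
    static int check(List<Integer> sizes, int deathClock) {
        // Step 1: remove figures that have grown too large.
        int before = sizes.size();
        sizes.removeIf(size -> size > MAX_SIZE);
        int removed = before - sizes.size();

        // Step 2: if the death clock has run out, remove the oldest figure,
        // but only if step 1 left anything to remove.
        if (deathClock > MAX_CLOCK && !sizes.isEmpty()) {
            sizes.remove(0); // remove(int index): drops the oldest entry
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        // The failing case: one oversized figure and an expired death clock.
        List<Integer> population = new ArrayList<>(Arrays.asList(99));
        System.out.println(check(population, 1000)); // prints 1, no exception
    }
}
```

Without the isEmpty() test, the second removal in that last case would throw the same IndexOutOfBoundsException shown in the stack trace.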

Though I was confident about what was causing the error and how to fix it, I still wanted to be able to check. I can and did diddle my code to force that error to occur, but the only way to be certain that my fix will in fact fix it is to catch one of these errors in the wild, so to speak, and then test the fix against that run. But how does one repeat results when there is so much randomness in any given run?

Fortunately, the system uses a pseudo-random number generator.

From the Java documentation for the java.util.Random class:


An instance of this class is used to generate a stream of pseudorandom numbers. The class uses a 48-bit seed, which is modified using a linear congruential formula. (See Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1.)
If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers.

All I need to do is hard-code a number to act as the seed for the random number generator, and then every time I run the system it should act exactly the same way. I can try different seeds until one generates the error I want to fix, and then rerun that same seed with the fix in place to be certain that the fix is working.
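As a small illustration of that guarantee, here is a sketch that builds two generators from the same seed and compares what they produce. The class and method names are invented for the example, and the seed is simply the one from the output above.

```java
import java.util.Arrays;
import java.util.Random;

// Illustration only: SeedDemo and draw() are hypothetical names.
class SeedDemo {
    // Draws `count` numbers in [0, 100) from a generator built with `seed`.
    static int[] draw(long seed, int count) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(100);
        }
        return values;
    }

    public static void main(String[] args) {
        long seed = 1519820768283L; // the seed shown in the output above
        int[] first = draw(seed, 5);
        int[] second = draw(seed, 5);
        // Same seed and same sequence of calls give identical numbers.
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```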

Only, it didn’t work!

I put in the seed you can see in the first line of the output, but running it more than once caused different results. I was completely flummoxed.

I diddled the code, and made and tested a fix that way. It took less than ten minutes, but the inconsistent behavior from my system, when it should have been behaving consistently, bothered me. There will be plenty of times when I’ll want to be able to repeat results. Thank all the gods of bits and bytes that the answer hit me this morning.

I just needed one more line of code that I had forgotten, “baby.rand=realm.random;”, which makes each new figure use the realm’s random number generator. With that in place, the system started behaving consistently, just as it should when given the same seed. Then I lucked out, and the first seed I tried produced the error in less than 2 minutes.
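To show why one stray generator is enough to spoil repeatability, here is a sketch under heavy assumptions. The Figure class, its replicate() method, and the idea that the baby previously got its own unseeded generator are guesses for illustration, not the real code. Only the idea of handing the baby the shared generator mirrors the line above.

```java
import java.util.Random;

// Hypothetical sketch: Figure, replicate() and run() are invented names.
class SharedRandomSketch {
    static class Figure {
        Random rand;

        Figure(Random rand) {
            this.rand = rand;
        }

        // Creates a baby figure. When shareGenerator is false, the baby gets
        // a fresh unseeded Random, which differs from run to run (assumed bug).
        Figure replicate(boolean shareGenerator) {
            Random babyRand = shareGenerator ? this.rand : new Random();
            return new Figure(babyRand);
        }
    }

    // Returns the first number the baby draws in a run started from `seed`.
    static int run(long seed, boolean shareGenerator) {
        Figure parent = new Figure(new Random(seed));
        Figure baby = parent.replicate(shareGenerator);
        return baby.rand.nextInt(1_000_000);
    }

    public static void main(String[] args) {
        long seed = 1519820768283L;
        // Shared generator: two runs with the same seed agree.
        System.out.println(run(seed, true) == run(seed, true));
        // Unshared generator: two runs will almost certainly disagree.
        System.out.println(run(seed, false) == run(seed, false));
    }
}
```

Every random choice in a run has to trace back to the one seeded generator. A single object drawing from its own generator is enough to make two runs diverge.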

I’m about to put my fix back in and give it a try. Let’s see what happens.


Random seed=1519820768283.
The size of the realm=3
Before:
ready
start
Longest=0.
null
null
null
Elapsed time=0 hours, 0 minutes and 0 seconds

report
Longest=514206425.
null
Figure 514206424 at 1 Size=15 true command=15 port=-33
Figure 514206425 at 2 Size=15 true command=15 port=-33
Elapsed time=0 hours, 13 minutes and 30 seconds

A friend happened to call while the experiment was running. After I got off the phone, I got a report. Already you can see that the error has been fixed. Otherwise it would have happened already.

For fun, I let it keep running. Here’s the last bit of the output, which shows that more than 2 billion replications took place, and that all is well, even if it took about twice as long as it has in other runs.

report
Longest=2140101568.
Figure 2140101567 at 0 Size=15 true command=-50 port=-37
Figure 2140101568 at 1 Size=15 true command=-15 port=-13
null
Elapsed time=0 hours, 38 minutes and 52 seconds

It’s alive… ALIVE!
interrupting cycles
cycles interrupted
Longest=2147483647.

It’s alive… ALIVE!
Snap shot:
Longest=905.
Figure 0 at 0 Size=15 false command=0 port=0
Figure 1 at 1 Size=15 false command=0 port=0
null
Elapsed time=0 hours, 5 minutes and 29 seconds

Current:
null
Figure 2147483647 at 1 Size=15 true command=-50 port=-37
Figure -2147483648 at 2 Size=15 true command=-15 port=-13
Elapsed time=0 hours, 38 minutes and 59 seconds

q
After:
longest=2147483647

That’s enough for now.

