The Two Spirals Problem Solved
Using a simple four-layer perceptron and standard backprop
(You're going to say I cheated.)
While the Two-Spirals Problem may be (or actually, may not be) a valuable
benchmark for neural network researchers working on new architectures, I believe
it is much more valuable, and to a much larger number of people (those working
on applied NN solutions), as an example of THE WRONG WAY to present data to a
neural network. This paper is intended as a primer on how to evaluate and
manipulate your dataset before building a neural network, to improve or
even guarantee your results.
Two researchers, Jiancheng Jia and Hock-Chuan Chua, found that
"Using a proper input encoding approach, the
two-spiral problem can be solved with a standard backpropagation
neural network." Well, Duh!
You
can go download their white paper for $35, or read mine for free.
First of all, let me say that I did not cheat.
The whole question of whether I cheated or not boils down to your
interpretation of the assignment. If you are a student in a classroom, and a
teacher tells you to solve the two-spirals problem (which usually uses x and y
as the inputs), then what I did would be considered "not following directions"
and I'd get a "failing grade" (again, but that's beside the point).
If, however, you are a professional in any kind of a business environment
working on applied NN solutions, and you’re given the two-spirals problem, then
you have to assume that x and y simply represent the available data, and as a
result, passing x and y through an SQL query in order to turn that data into
inputs that the NN can actually understand isn't cheating; it's simply doing your
job.
In short, you really don’t have to design some ground-breaking neural network
architecture to solve this problem; you just have to understand how an NN
functions, and how the inputs relate (or should relate) to the outputs. (And then
how to modify the inputs accordingly.)
Worst Dataset Ever!
This problem is perhaps the best example of bad data for a neural network that I
have ever seen. The authors, I'm sure, were trying to create a challenging
problem, but I don't think they meant to do it this way, because the real
problem here is that it's simply the wrong way to feed data to an NN. There are
several issues with this dataset, all of which are easily correctable.
First, when we (people) look at an x-y plot of this dataset, we see a simple
pattern (the two-spirals). However, when we are looking at that plot, a) we’re
looking at the entire dataset at once (as one fact) and b) we’re using a much
more advanced neural system than any artificial neural network ever invented.
We "input" this one image/fact through a highly specialized neural input device
called a retina. Which is itself an amazing complex grid of neurons responsive
to light and dark (and color), and then we pass that data to the most advanced
pattern recognizer on Earth, the human brain.
Now let’s compare that lovely picture to the actual dataset:
We have chopped up that one image-fact into 194 individual number facts
(pixel-facts?), two inputs each, x and y coordinates (which don't come close to
occupying the entire x, y plane), coupled with one output: a color (black or
white). In other words, we have turned the act of looking at a picture into
the equivalent of playing the game of Battleship. Think about that for a minute, this dataset does
not represent a picture anymore. To the NN, it’s just, “B-15,” “Miss!” and “A-16,”
“Hit!” over and
over again.
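If you want to see exactly what the NN sees, here is a sketch of the generator in Python, as I understand it from the commonly circulated CMU benchmark version (reprints differ on which axis gets the sine, but any consistent choice draws the same interlocking spirals):

    import math

    def two_spirals():
        facts = []
        for i in range(97):                    # 97 points per spiral
            angle = i * math.pi / 16.0         # three full revolutions over 96 steps
            radius = 6.5 * (104 - i) / 104.0   # winding inward from 6.5 toward the center
            x = radius * math.sin(angle)
            y = radius * math.cos(angle)
            facts.append((x, y, 1))            # one spiral: "white"
            facts.append((-x, -y, -1))         # its point-reflection: "black"
        return facts

    print(len(two_spirals()))                  # 194 number-facts, not one picture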
Worse yet, we’re not even giving the computer the benefit of a board and pegs
(because we don’t allow it to memorize the facts). As I have said before, if
you’re going to expect any kind of AI to understand something, you better have
at least one person (i.e. a human brain) who understands it first. I challenge
you to find a single human savant to whom you can call out these x, y
coordinates and “black” or “white” 194 times and have him say at the end, “Oh!
You mean two interlocking spirals!”
So while a summary x, y plot of the dataset has meaning to a person, the
individual facts do not. We can see this a little more clearly when we look at a
line chart of the facts.
(Note: it is my experience that a line chart is a much more reliable method of
showing direct relationships of inputs and outputs. If your facts cannot be
shown clearly on a line chart, it is doubtful you will be able to create an MLP that understands them.)
This line chart looks like the sound wave created by a
guitar string, but that's the only pattern I see. More importantly, there seems
to be no relation between the height of the purple and blue lines (x and y) and
the zig-zagging yellow line beneath (the output).
The other issue with using x, y coordinates as inputs is zero. On the x, y plane, 0,
0 ("the origin") is just another point. But to a neural network, zero, as an
input, is nominal. It's not a number, so to speak (yes, I know zero is a number);
to a neural net it represents the absence of input, numbness if you will. Neural
activity decreases as the inputs approach zero (literally, the output of that
neuron becomes zero). But in the two-spirals problem, the frequency of the
black/white shift increases as the inputs approach zero, which is completely
contrary to any kind of bipolar sigmoid transfer function (which outputs 0 for
an input of 0).
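To put numbers on that, here is the bipolar sigmoid (tanh) going numb near the origin:

    import math

    # A tanh neuron goes quiet as its input goes to zero,
    # exactly where the spirals do their fastest black/white flipping.
    for x in [0.5, 0.1, 0.01, 0.0]:
        print(f"tanh({x}) = {math.tanh(x):+.4f}")
    # tanh(0.5)  = +0.4621
    # tanh(0.1)  = +0.0997
    # tanh(0.01) = +0.0100
    # tanh(0.0)  = +0.0000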
But now I hear you say: other people have already solved this problem using
more advanced NNs. True. But all those who have done so ended up using some form
of grid of neurons (i.e. a retina), except they put it in the hidden layers.
Cascade-Correlation (Fahlman and Lebiere, 1990) adds neurons one by one, linking
each neuron to all of the inputs and outputs and to all the other neurons
(forming a grid). Alexis Wieland of MITRE, who invented this problem, solved it
using quickprop (Fahlman, 1988) and a network with five hidden layers of five
neurons each, but all the hidden layers were connected to both inputs; again,
this forms a grid.
So I figured that instead of using a hidden-layer grid, the problem could be
more directly solved by using an input grid and a regular multilayer perceptron.
My theory being that this would more closely resemble the retina-brain
relationship, and just might work. The question was how to arrange this grid,
and how many (or actually, how few) neurons I needed. The next step was to take
a really good look at two spirals.
I generated a two-spirals image in a photo-editing program by taking black and
white halves and applying a "spiral distortion" of two and a half revolutions.
I then used a pixelization filter to reduce the number of inputs. While a spiral
image begins to appear in as few as 100 pixels (i.e. inputs), it takes over 1000
pixels to clearly show the center.
While our eyes are certainly more complex, I figured that 1000 input neurons
were simply too many, and there had to be a simpler solution.
Then I noticed that the photo-editing program's pixelization filter had another
setting, something it called "radial pixelization" (which sounded to me like an
oxymoron), so I gave it a try. See the next animation. No matter how few radial
segments I used, the spiral was still clear.
It was now obvious that the relevant data for this problem was not the x, y
coordinates, but rather the distance from the center point and the radial
location (i.e. bearing) from the center point. It now seemed almost silly that
we were trying x, y inputs in the first place; we were literally trying to
describe a spiral using nothing but squares (Euclid, please!).
Converting Data to Inputs
I now decided that my input-retina would consist of four neurons, one for each
quadrant, and the value of the input would be the range from the center point.
This means that if our x, y coordinates are +.3, +.4, then our point is in
Quadrant 1 (the q1 neuron), the value is .5 (the distance from the center
point), and the other three inputs would be zero.
This was the same as simply asking the NN to remember that Quadrant 1 is
“black-white-black” and Quadrant 2 is “white-black-white.” I could now say to
you, “Quadrant 1, range 1” and you would respond “black,” and I’d say, “Quadrant
2, range 3,” and you’d say “white.” Even I can remember this!
I also thought the NN needed to know where the point was, within the quadrant,
along the arc at that range. So I designed a fifth input I called “pan” (as in
to pan left and right). This was another simple derivative of the x, y
coordinates, calculated by subtracting the absolute value of y from the absolute
value of x. This number becomes -1 as your point approaches the y axis, and +1
as your point approaches the x axis. It is zero if your point lies on a
45-degree line from the center point (i.e. |x| = |y|).
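If it helps to see that as code, here is the pan calculation in Python (the function name is mine):

    def pan(x, y):
        # -1 hugging the y axis, +1 hugging the x axis, 0 on the 45-degree lines
        return abs(x) - abs(y)

    print(round(pan(0.45, 0.35), 3))   # 0.1  (point A in the table below)
    print(round(pan(0.3, -0.3), 3))    # 0.0  (point D, on a 45-degree line)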
A simplified spiral drawn in an illustration program, showing how the data was
converted; see the table below for the resulting inputs from these points.
(X, Y, and Color are the original data; Q1 through Q4, Pan, and Output are the derived inputs and output.)

Point        X        Y   Color        Q1        Q2        Q3        Q4     Pan   Output
  A       0.45     0.35      -1  0.570088         0         0         0     0.1    -0.95
  B       -0.4    0.275       1         0  0.485412         0         0   0.125   0.9375
  C     -0.275   -0.775      -1         0         0  0.822344         0    -0.5    -0.75
  D        0.3     -0.3       1         0         0         0  0.424264       0        1

*Note: Points shown are accurate to the simplified illustration above and are
not actual facts from the two-spirals dataset, but you get the idea.
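For those who would rather read Python than SQL, here is the whole conversion as I would sketch it, mirroring my reading of the query in Appendix 2 (the function name is mine, and the output adjustment in the last two lines is explained in the next section):

    import math

    def convert(x, y, pattern):
        # x and y are assumed already normalized to roughly -1..+1
        # (the Appendix 2 query divides the raw coordinates by 6.5 first)
        rng = math.hypot(x, y)            # distance from the center point
        pan = abs(x) - abs(y)             # -1 near the y axis, +1 near the x axis
        xq, yq = x + 0.0001, y + 0.0001   # goose on-axis points into a quadrant
        if xq > 0 and yq > 0:
            quad = 1
        elif xq < 0 and yq > 0:
            quad = 2
        elif xq < 0 and yq < 0:
            quad = 3
        else:
            quad = 4
        q = [0.0, 0.0, 0.0, 0.0]
        q[quad - 1] = rng                 # one "retina" input per quadrant
        prox = abs(pan) / 2.0             # confidence penalty near the axes
        output = pattern - prox if pattern > 0 else pattern + prox
        return (*q, pan, output)

    print(convert(0.45, 0.35, -1))   # point A: q1 = 0.570088..., pan = 0.1, output = -0.95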
I imagined this would be like putting four optical sensors (or, as neurons, rods
and cones) behind a plastic Fresnel lens. If I shined a light at one point, an
entire ring of the lens would light up, but mostly in one quadrant.
Now, in my thought experiments, I would have a set of four input neurons, each
of which should respond something like the waveform above. I also figured each
of these neurons would be most accurate in the middle of its "optical area," at
the 45-degree line, where pan = 0 and |x| = |y|.
And so I guessed that the total output of the NN would look something like
this illustration (if you squint at it, it's a spiral).
It would turn out that I grossly overestimated the amount of generalization I
could get out of this dataset.
It's not all Black and White
It was now time to begin experimentation. I wrote an SQL query to convert the
two-spirals dataset (or any other set of x, y coordinates) into the inputs I
just described, and I fired up BrainMaker.
My first experiment was just to see if I could train an NN to understand Quadrant
1. My inputs were the q1 neuron and the pan value, the dataset was reduced to
just the points within Quadrant 1. This NN did not train.
So I took a good look at the dataset in Excel (why do I always wait until
after the dataset doesn’t train to do this?).
The dataset for the first experiment: Q1 in red, Pan in green, Output in purple.
As you can clearly see, the pan value was entirely moot. As a matter of fact,
it's quite distracting even to look at. While you and I can easily ignore
irrelevant input, it would
take an NN a million or so training epochs to finally grind all those weights to
zero, and mathematically ignore the input. So I deleted the pan value and tried
again.
This NN almost immediately got about half of the facts right, but then bottomed
out. Back to Excel, where I finally noticed the third and final problem with the
two-spirals dataset: the outputs.
The instructions say to use sigmoid units and that the network should
generalize. But there is absolutely no indication of how
it should generalize.
The outputs are all either black or white, and now that I've grouped them by
quadrant, they come in large groups of black and white with almost no space
between them. This is a square waveform, not the nice curvy sine wave I had
imagined.
The example outputs of the dataset (-1 or +1) are all at the extremes, at the
maximum and the minimum, without a single example of the mean. How do you expect
the NN to generalize when you don’t give it any hint as to where “gray” is, or
for that matter, that gray even exists? This also creates an incredible
step-size problem (which is common in MLPs with bad data).
I tripped over the Step-Size Problem
The step-size problem is basically this: as you adjust the output of any given
neuron (i.e. correcting it because it was "wrong"), if you adjust it too much,
you may make the neuron become wrong on similar inputs that used to be right.
That is to say, if .2 is black, .3 is black, .4 is black, .5 is black, and
you're asking the NN about .6, it's going to say "black." But the backprop
algorithm is going to say, "No, white," and adjust all the neurons accordingly.
But if the learning rate is too high, the NN may now think that .5 is white too,
or at least gray, and most likely screw up .4 as well.
On the other hand, if the
learning rate is too low, it won't make a large enough adjustment and it won't
be able to adjust the NN far enough to make .6 white, not in one epoch anyway.
And so it will still take forever to train the NN, which is just as bad.
Given
the square wave we are asking for as output in this problem, the learning rate
would have to be almost zero in order for .33333335 to be black and .33333336 to
be white. And with that kind of learning rate it would take millions
of years to train that NN with standard backprop. (That's a vast
oversimplification that doesn't take into account the "local minimum" issue, but that
doesn't apply here.)
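Here is a toy demonstration with a single tanh neuron, y = tanh(w*x + b); the numbers are invented for the demo, not taken from the spirals dataset:

    import numpy as np

    # The neuron currently draws its black/white boundary at about x = 0.7.
    xs = np.array([0.2, 0.3, 0.4, 0.5, 0.6])   # .2-.5 should be black (-1), .6 white (+1)
    w, b = 10.0, -7.0

    def predict(w, b):
        return np.tanh(w * xs + b)

    print(predict(w, b).round(2))   # [-1.   -1.   -1.   -0.96 -0.76]  only .6 is wrong

    # One backprop step on the single wrong fact (x = 0.6, target +1), squared error:
    x, t = 0.6, 1.0
    y = np.tanh(w * x + b)
    grad_w = (y - t) * (1 - y**2) * x
    grad_b = (y - t) * (1 - y**2)

    big = 20.0    # learning rate far too high
    print(predict(w - big * grad_w, b - big * grad_b).round(2))
    # [1. 1. 1. 1. 1.]  .6 is fixed, but every fact that used to be right is now wrong

    tiny = 0.01   # learning rate far too low
    print(predict(w - tiny * grad_w, b - tiny * grad_b).round(2))
    # [-1.   -1.   -1.   -0.96 -0.76]  nothing has moved; .6 will take ages to fix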
Anyway, the point is that your dataset MUST include at least some "gray."
When we look at the x, y plot of the dataset, we have ADDED gray to it; indeed,
the vast majority of the image is gray. In my thought experiments, I had
realized that my quadrant-based neurons would be most accurate along the
45-degree lines and less accurate as you approach the x or y axis. I had assumed
those areas would be gray.
Now I decided that I had to
demonstrate to the NN that we are less confident about the points
closer to the axes. That would mean reducing the output values,
bringing them closer to zero (i.e. grayer). I
realized that my (then unused) "pan" value was almost perfect for this function. I
subtracted one-half the absolute value of pan from the output if the output was
positive, or added the same amount if the output was negative.
This means that if a given
point lies directly on an axis, its output was now either -.5 or +.5, instead of
-1 or +1. It also means that almost none of the points in the dataset are now
black or white (only the ones lying directly along the 45-degree lines), and
almost all of the points are some shade of gray. Also, by moving all the output
points off the max and min, we've shown the NN that there is indeed a mean in
there somewhere. And we've created some much-needed distance between max and
min, thereby avoiding the step-size problem.
It is also worth noting that while almost all the training outputs are now "gray," they're still
all well within the "40-20-40 Rule" for determining binary output from a sigmoid NN, that is to say: They are all still "correct."
The Little Dataset that Could
All 194 actual derived inputs: Q1 in red, Q2 in green, Q3 in blue, and Q4 in
yellow, with the final derived Output in purple.
You can see the input "range" value increasing from 0 to 1 in each quadrant. You can see a clear direct relationship
between the input and the output, and you can see that the output has a curve to
it that is relatively gentle and very predictable.
Armed with this new dataset, I
went back to BrainMaker. I tried several three-layer perceptrons with between 10
and 100 neurons in the hidden layer. In these experiments I learned that using a
decaying learning rate based on the mean squared error was best, that the
minimum learning rate should be very low (around .0001), and that I had been
having the best results with about 25 neurons in the hidden layer. Unfortunately,
those "best results" still bottomed out at about 75% accuracy.
I had been wrestling with this for a few hours and was about to give up, at
least for the day, when mainly out of boredom, I clicked the button in
BrainMaker that adds another hidden layer.
I usually don't like extra hidden layers in my perceptrons. I've read, and I had
believed, that there is nothing a four-layer perceptron can do that you can't do
in three. And from my understanding of the math, it seemed to me that all you
were doing with a four-layer perceptron was "spreading out" the calculation
between the layers, spreading out the error, and spreading out the
backpropagation function. It really didn't seem to me that there was an
advantage to doing it. But I clicked the button anyway.
The damn thing trained in under 5300 epochs. I couldn't believe it. I had walked
away from the computer, so I really didn't see it happen (I was playing Crazy
Taxi at the time), but the auto-training stop had gone off: 100% accurate.
I clicked test. 100% accurate. I clicked it again; it did it again. I poured
myself a drink.
The next thing I did was to
generate a set of 10,000 x,y coordinates, and run them through the same SQL
query I used on the original dataset, then I ran that through the NN I had just
trained as a verification set. The results are shown here.
I should point out that the jaggedness you see is simply the result of my using
only 10,000 points in my verification set; had I used a million or ten million,
the curves would have been much smoother, but Excel has problems with files that
large.
I saved the NN, and then
randomized all the weights and tried again. I was able to train several
different 4-25-25-1 networks, all to 100% accuracy, although each of the later
ones took between 10,000 and 20,000 epochs to train. (So I may have been a
little lucky with the random weights on that first try.)
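For the curious, here is roughly what one of those 4-25-25-1 networks boils down to, as a plain numpy sketch. BrainMaker's exact update rule isn't something I can reproduce, so the initialization and the MSE-tied learning-rate decay here are my guesses at the spirit of the setup:

    import numpy as np

    rng = np.random.default_rng()

    def init(sizes):
        # small random weights, zero biases; one (W, b) pair per layer
        return [(rng.uniform(-0.5, 0.5, (m, n)), np.zeros(n))
                for m, n in zip(sizes[:-1], sizes[1:])]

    def forward(layers, x):
        acts = [x]
        for W, b in layers:
            acts.append(np.tanh(acts[-1] @ W + b))   # bipolar sigmoid throughout
        return acts

    def train_step(layers, x, t, lr):
        acts = forward(layers, x)
        delta = (acts[-1] - t) * (1 - acts[-1] ** 2)   # error signal at the output
        for i in range(len(layers) - 1, -1, -1):
            W, b = layers[i]
            grad_W = acts[i].T @ delta / len(x)
            grad_b = delta.mean(axis=0)
            if i > 0:                                  # pass the error back a layer
                delta = (delta @ W.T) * (1 - acts[i] ** 2)
            layers[i] = (W - lr * grad_W, b - lr * grad_b)
        return float(np.mean((acts[-1] - t) ** 2))

    # X: a 194 x 4 matrix of the derived q1..q4 inputs, T: a 194 x 1 matrix of
    # the "grayed" outputs (e.g. from convert() above), then something like:
    #     layers = init([4, 25, 25, 1])
    #     lr = 1.0
    #     for epoch in range(20000):
    #         mse = train_step(layers, X, T, lr)
    #         lr = max(0.0001, mse)   # crude decaying rate tied to MSE, floor .0001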
The one anomaly that proves it
As you look at the output plot of the verification set, you may notice one
anomaly: the upper right-hand corner (i.e. Quadrant 1, where the range is
greater than 1) is black, but I'm not sure precisely why.
There were no training facts that indicated the area
should be black. Indeed, there are no training facts at all with a range >1.
I believe that this is a side-effect of the calculations of the other neurons
combined with a complete absence of training facts that contradict it.
In short, it's a guess, a digital guess, and a damn good guess at that. I think
this is proof that the NN understands the two spirals. The thickness of the
white line (all right, the white arc) is correct, and it certainly fits the
pattern. After all, what comes next in the pattern "black, white, black, white,
black, white"? Obviously black. It's perfectly logical. And yet, it has nothing
to do with the dataset.
Conclusions
The good old MLP trained with standard backprop is a lot more powerful
than most people believe.
You must evaluate the dataset on a fact-by-fact basis, not on some summary
graphical basis that requires higher brain function to understand.
You (as a person) must be able to understand the relationship between the
inputs and the outputs (on a fact-by-fact basis) and try to find the most
efficient and logical way to present that data to the NN before you begin
training.
There must be "gray" in the output of the training set.
A four-layer MLP can indeed learn a pattern that a
three-layer MLP cannot.
APPENDIX 1
Why this wasn't cheating
Each "derived fact" was a direct mathematical function of exactly one
"available data fact" and contained no other information.
Each "derived fact" was simply the result of well established mathematic
principles (i.e. Pythagorean Thorium, or simple addition)
The derivation method (i.e. SQL query) was quickly and easily applicable
to any other "available data" input the NN may encounter at runtime, i.e. any other point in
the x, y plane.
There were exactly 194 facts in my training dataset.
APPENDIX 2
The SQL Query
The Two-Spirals dataset I
downloaded (which I called "fctTwoSprials") used an X,Y grid of plus or minus 6.5 for
some reason. This SQL query takes those X,Y coordinates (divides them by 6.5) and
converts them into the input I describe here.
The inputs required for this query are X, Y, the output
labeled "Pattern," and a random number field, "rnd" for random sorting.
The query then calculates the proximity to the X and Y axes, the range from the
centerpoint, and which quadrant. Then it displays the four derived neural inputs I
used (Q1 through Q4) and my derived output "PatternB."
SELECT
    fctTwoSprials.x,
    fctTwoSprials.y,
    (Abs([x]/6.5) - Abs([y]/6.5)) / 2 AS Prox,
    IIf([Pattern] > 0, [pattern] - Abs([prox]), [pattern] + Abs([prox])) AS patternB,
    IIf([x] + 0.0001 > 0 And [y] + 0.0001 > 0, 1,
        IIf([x] + 0.0001 < 0 And [y] + 0.0001 > 0, 2,
            IIf([x] + 0.0001 < 0 And [y] + 0.0001 < 0, 3, 4))) AS Quad,
    Sqr(([x]^2) + ([y]^2)) / 6.5 AS range,
    Abs([x]/6.5) - Abs([y]/6.5) AS ProxY,
    IIf([quad] = 1, [range], 0) AS q1,
    IIf([quad] = 2, [range], 0) AS q2,
    IIf([quad] = 3, [range], 0) AS q3,
    IIf([quad] = 4, [range], 0) AS q4,
    fctTwoSprials.[rnd]
FROM fctTwoSprials
ORDER BY fctTwoSprials.[rnd];
(It's not exactly an elegant solution; I should have been able to figure out
which quadrant without a series of nested "IIf" statements, but it works.)
Note: There are a few points that lie directly on the axes. So the query above
adds 0.0001 to each coordinate to "goose" those points into one quadrant or
another. That's the [x]+0.0001 stuff you see. I ONLY do that when determining
which quadrant. I don't actually change the x, y coordinates used in the range
or prox calculations.
I notice from my web stats that many of
you are looking for the actual "Two Spiral Dataset"
when you find my site, so here it is:
One Last Note:
(In my research I've noticed that there seems to be a large number of apparently
non-native-English-speaking mathematicians who call this "The Two Spiral
Problem." So if you've been Googling "Two Spirals Problem" you may want to drop
the plural and try again.)
Lastly, it has taken me far longer to write this webpage than it did to actually
solve the Two Spirals Problem, by about an order of magnitude. So I hope you
appreciate it, and that you learned something about the process of applied
neural network design.