Breadth- and Depth-First Thinking

Breadth-first vs. Depth-first

Breadth-first and depth-first are two opposing ways to explore. Breadth-first is kind of like questioning everyone at the crime scene before moving on. Depth-first is like immediately heading for the detective's car whenever you hear a lead, skipping the testimony of other witnesses. Usually we expect a thorough investigation at the crime scene. But what if the culprit fled, and the best way to catch him is to immediately continue the chase?

You can see how these two methods apply to other activities besides detective work. For example, if I were looking for a house, I might try to visit all the open houses in the same neighborhood on the same day. In that scenario I should investigate the current area before moving on. This is breadth-first. Let's take a look at another example. Imagine I'm in a band, and I'm planning our first tour across the US. I'd book venues so that, for certain legs, the intermediate stops were as far apart as possible. I wouldn't research every small-town venue on the way from Minneapolis to Boise. The big-city shows are what will make us famous, so it doesn't make sense to spend time comparing dive bars.

I think of these two search methods as opposing forces in the action of a piston. Both are necessary for proper function. Neither will get you all the way there. Alternating between the two to propel yourself through the search space can be an effective heuristic.

Some of the most interesting applications of this model are game-theoretic. Take an Easter egg hunt for instance. If you start alone in the middle of the field then it doesn't matter what search method you choose. Given time, you will always cover the entire field. The only thing you can affect is the order in which you cover it. If other kids start with you, however, then the method you choose matters a lot. In particular, because of the inverse-square law, a beeline away from the start will give you the best chance of covering territory no other kid has covered. If you spiral from the start, you're going to be re-covering territory. The more kids there are, the worse it is to spiral. The other kids don't even have to be copying your spiral strategy. Random behavior by your opponents will clear your turf before you can.

Now suppose instead that this egg hunt is cooperative instead of competitive. A reasonable goal in a cooperative setting is to minimize the territory each kid covers. And a good way to do this is to minimize the duplicated coverage. Under most models, this means separating kids into zones and performing some kind of breadth-first search. Breadth-first won't maximize any individual kid's egg take, but it should minimize the time it takes to clear the field, and how tired the kids are afterward.

Now we have an idea for characterizing the notions of breadth-first and depth-first. For a given notion of search distance, BFS is the strategy that minimizes local distance, while DFS is the strategy that maximizes it.

Like any model, this one has limitations. It does not fully capture the intricacies of search strategies. Discussions of search strategies, however, are even more limited in their descriptive ability. A high-level way to partition the space would be a useful addition.

In multiplayer games, often one of these strategies is cooperative and the other is competitive. In many instances depth-first search represents the selfish strategy, and without other clues it is reasonable to start with this assumption. However, it would be a mistake to assume that this assumption holds in every instance, or that the depth-first strategy always represents the personally-optimal solution. In the competitive egg hunt, maybe the equilibrium strategy is for the kids to oscillate radially from the center, so that the zones are pie slices. This is a complex search strategy with elements from both columns, so it is not accurately described as either breadth-first or depth-first.

The graph of ideas is one territory where description by cooperative or competitive search is particularly apt. If your peers consider discovery a collaborative effort, then you may be viewed as a defector in a zero-sum game for flitting into other people's zones to collect high-value targets. Here's an excerpt from Peter Lax's otherwise favorable eulogy for von Neumann:

...von Neumann's great versatility led some pure and puritanical mathematicians to charge that he had merely skimmed the cream off subjects, and left others to the arduous task of getting to the bottom. The story is told of the time when the mathematics faculty of the Institute for Advanced Study was discussing the appointment of Enrico Bombieri. His great contributions to function theory, calculus of variations, minimal surfaces, and algebraic number theory were described; everyone was favorably impressed, except one distinguished professor who commented: "This fellow dabbles in this, dabbles in that; he will end up like von Neumann."
from "Remembering John von Neumann"

The story may be apocryphal. Still, Lax knew this and included the story anyway. Bill Thurston was known for clearing fields of mathematics; he was so effective in clearing fields that he soon found himself doing it alone. Isolation can be devastating. Thurston was humbled into saving some of the work for others. Grothendieck resented Deligne for finishing the final Weil conjecture, which Deligne accomplished by sprinting away from the assembled pack. Grothendieck's understanding was that everyone should build the more generally useful theory of the entire field, and that no one should shortcut the process by jumping at the quicker and less generally useful solution to the specific problem. Grothendieck believed in a slow-moving program, Deligne decided to get the answer now—and the prizes that came with it. Perelman's solution to the Poincaré conjecture was another conclusion of a large program. Though his association with the program was less intimate, and the program less controlled, his behavior was different. Perelman declined the prizes and demurs on the value of his own contributions. He credits the program's creators. Without judgment on its appropriateness, the social convention is clear: stay in your zone.

Given the reactions, you might think of depth-first as the easy way out, but that isn't quite it. Depth-first strategies are in general harder. What's really annoying to other people about the depth-first strategy is the lopsided payoff ratio. The work is still hard, you just get outsized rewards. Sometimes rewards that look like they should go to other people.

In general it tends to be harder for individuals to pursue depth-first strategies. The reason is that the mental structures you assemble are weaker. You put the same investment into a platform that goes a longer way. So understanding along the way tends to be shakier. If your platform were concentrated about a center, that would be a more stable arrangement than one that travels away from a single point. A lily-pad is a stronger platform than a frond, for instance. Depth-first strategies may get you there faster; the tradeoff is that there's less guarantee the path holds. Breadth-first is the way to go if you need a reliable foundation.

If you have no idea where the path should go, then depth-first won't be much help. Moving faster in the wrong direction is only going to take you farther away. You have to stop and figure out where you're going. Usually that means examining your surroundings for clues. That will give you a sense of where to go, and then you can start taking informed leaps.

These kinds of leaps are required for problem solving. Mathematics puzzles of the sort that get passed around usually have an evolved resistance to breadth-first search. The point of the puzzle is that it requires some kind of logical leap, and the puzzle itself was chosen to give little indication of what kind of leap or what direction that leap should take. The solution doesn't have to be long. It may only consist of a few hops. The challenge there is in picking the right solution sequence from the explosion of possible paths.

So perhaps we can characterize depth-first search in a different way. If your goal is to exhaust the graph, then it's accurate to speak in the broad terms of breadth-first and depth-first. If you only intend to visit a small part of the graph, a few nodes, then it's more accurate to speak about building a directed path. In a detailed model of the Easter egg hunt, you could follow a directed path to each of the interesting features: bushes, benches, overturned wheelbarrows, anything that looks like a good hiding place for an egg. Repeatedly applying this technique will form an approximation of a spanning tree on interesting objects. If you were looking for the opposite tack, it could be to comb the grass like a volunteer search-and-rescue team in a hunt for missing persons.

At the zoom level of individual problems, combing the surrounding area may be ineffective as a strategy. Possibly the space of working solutions surrounding the problem is sparse, and so thinking breadth-first will waste time on examining the inactive nodes. Maybe this is why negative results are omitted in the literature. As Hamming said, there are a lot of ways to do it wrong, and only a few ways of doing it right. Hamming instructed young people to think about copying somebody who was already successful.

Solving Problems

In his "The Science of Problem Solving", Robb Seaton gives a powerful image, recreated below. The first time a problem is solved, the trial and error that accomplishes it takes on the form of a high-effort search. This search looks like a busy random graph. Once the problem has been solved, however, the effort required to solve it the next time becomes proportional to a fast hash lookup, or to taking a direct path from the problem to the solution.

When nobody knows the answer to a problem, nobody knows what the fast lookup looks like. When the answer is known, then the lookup can be replicated. It's a lot faster to replicate an answer, and so it isn't surprising that most instruction is based on copying work others have done. Students mimic prior work in a fabricated way. It wouldn't make sense, for instance, for a high school class to rediscover all of chemistry without the benefit of past experiments. If the students can copy the work faster, they can get to the frontier faster. In a way, education is an imitative latticework that students cross.

One way to solve an unsolved problem fast is to make a large leap. You don't have to know everything in the middle works. You merely have to have some idea the direction is correct. When you land on the other side, look at the steps you skipped, and see if they work as a path. Sometimes the path doesn't work and you start again. Sometimes the path works, and you made out: verification is an easier problem than exploring all those other paths. This kind of leap is riskier, of course, for the same reasons it's riskier for a retiarius to throw his trident than to cast his net.

If the terrain is a logical subject like math, then figuring out whether the path works means trying to prove the propositions forming the intermediate steps. If the terrain is something like, say, a strategic game tree, then the intermediate steps may represent potential choices of an opponent and your potential reactions. These cannot be known in advance, though you can assign them probabilities. In business the unknown steps may represent consumer or competitor reactions to product developments. It is interesting to note that senior level business managers are frequently generalists, and that the leaders of armies are expected to be generalists, not specialists.

There are some who, from some physical or moral peculiarity of character, "form a picture" of everything. No matter what knowledge, intellect, courage, or good qualities they may have, these men are unfit to command.
ascribed to Napolean by General Burnod, 1827

In high-risk one-off games, it's probably not a good idea to hold onto a picture. Better to evolve your strategy as information becomes available. In less risky multi-round games, people use the picture strategy to great effect. Poker players call it "putting a player on a hand". You assign probabilities to the opponent's possible positions and bet to maximize your probable winnings. To the extent that you visualize your opponent's hand correctly, you maximize your expected value. Confidently visualizing the wrong hand will lose your chips.

This visualization strategy is related to another trick from depth-first thinking, which is to get rid of probabilities altogether, and to reason only about boolean values. Only 55% sure about a given fact? Now it's 100% true. Switch-hitter bats left 70% of the time? Left-handed batter. This is a digital sampling of an analog world. Such a world may be false, but it is easy to reason about, and the results your reasoning produces can be astonishing. Even when they're wrong the results can be dangerously persuasive.

For the winning of assent, indeed, anticipations are far more powerful than interpretations, because being collected from a few instances, and those for the most part of familiar occurrence, they straightway touch the understanding and fill the imagination; whereas interpretations, on the other hand, being gathered here and there from very various and widely dispersed facts, cannot suddenly strike the understanding; and therefore they must needs, in respect of the opinions of the time, seem harsh and out of tune, much as the mysteries of faith do.
Francis Bacon, Novum Organum Scientiarum, 1620

Communication is always a sampling. If you want to communicate honestly, the best you can do is a good faith model. The alternative is a full state-transfer of your world—not feasible. You can hem and haw your way through your message by adding qualifications and still you drop state. If you're talking about populations at least as large as voting blocs, then counterexamples to any statement you make will exist in the state you dropped. If you're communicating to a population as large as the internet, then somebody will seize on one of those counterexamples. That's bad cooperation, for the simple reason that communication on the internet is not a cooperative game.

Sometimes communication that seems like it should be cooperative isn't. Probably it's always been this way, though propagandists like Bernays, who turned it into a profession, didn't help things. Public discourse can still be a way to arrive at the truth, only you have to view it as an adversarial advocacy system, like two lawyers arguing in a courtroom, and not as a cooperation between two parties in good faith.

Theory Building

As to rigor, all the members of Bourbaki cared about it: the Bourbaki movement was started essentially because rigor was lacking among French mathematicians, by comparison with the Germans, that is the Hilbertians. Rigor consisted in getting rid of an accretion of superfluous details. Conversely, lack of rigor gave my father an impression of a proof where one was walking in mud, where one had to pick up some sort of filth in order to get ahead. Once that filth was taken away, one could get at the mathematical object, a sort of crystallized body whose essence is its structure.
Catherine Chevalley

The crystallized bodies Chevalley refers to, theories, have a special property. They exist on their own, in a Platonic sense. They're there, waiting to be discovered by anyone. Sometimes they're very valuable. You can claim it for your own, or leave it for the next person.

Attribution has a special property as well. In business we call it the first-mover advantage, which is one of the side-effects of private ownership. It's what happens in competitions when the person to get there first stays there and claims a disproportionate amount of the reward. First-mover advantage is an incentive to move on ideas fast. It is a disincentive to prepare.

So people gamble. They risk ideas which won't work, or which they don't understand. If an idea has a glaring structural deficiency, it gets abandoned. It's the idea that makes it past this stage that sticks around. People build on it and build on it, and the idea looks like it works. The pieces fit together in a pleasingly harmonious way. It may even look magnificent. Everything goes smoothly up to where the builders try to connect the idea to reality.

A common misconception is that the stock market is a black box for selling. At any time stock can be sold at the market price. The price may go up, the price may go down, but that's the value of a share. Taken too far this model runs into trouble. It fails to account for the quantity of shares that you're selling. If a mutual fund sells millions of shares of one stock all at once, the price of the shares will begin an artificial free-fall. The stock will go to near worthless, and the company will forfeit much of the money they intended to receive. The mutual fund has to mete out shares from the lot over time, so that the demand in the market has a chance to refresh. It's a really bad idea to sell everything all at once. In this way the black box model of selling is imperfectly connected to reality.

Poincaré disliked some aspects of the reality of mathematics. He wanted to ignore legitimate but pathological functions that did nothing other than provide counterexamples to his theorems. Poincaré avoided mathematical rigor, which is to say he glossed over details. Why spend time on details that slowed his discoveries? In part, the Bourbaki group was a reaction to the freewheeling of Poincaré. The Bourbaki group ignored no detail. They emphasized rigor. Their aim was to strengthen the foundation for the next generation of mathematicians.

Strengthening foundations comes at the expense of getting less far along in mathematics, and sometimes, as I learned in my own education, at the expense of mathematical playfulness. The presumption that Bourbaki held is that if you ignore the foundation, then progress will slow until finally you are net behind. We know that the ideas Poincaré discovered would still be there even if he did not discover them. So what was Poincaré, a genius or a spendthrift? Possibly, a scout.

The reason to expand the foundations is to reduce the chance that new, internally consistent ideas are detached from the whole. A better foundation provides a basecamp for longer excursions. It is true that a water lily is stronger for being concentrated about a center, but really it's strong because it's buoyed by the water.

When a model is internally consistent, and invokes correct logic throughout, unless it rests on an accurate description of reality, then ultimately what it says is meaningless. Quantum computing has begun to look like such a model; I hope that it isn't. String theory may be such a model, we don't know. Neal Koblitz and Alfred Menezes say that the field of provable security in cryptography is a faulty model. They take a rigorous approach to mathematics and make a habit of smashing flawed theories.

Measure vs. Model

But the gradations of Turner differ from those of the modern French school by being gently and almost imperceptibly evolved one from another instead of being bodily and even roughly separated; and the brush of Turner followed the form of the objects he depicted, while our French friends often seem to take a pride in directly opposing it. For instance, they would prefer to paint a sea with up and down strokes rather than with horizontal; or a tree-trunk from right to left rather than up and down. This, I expect, is due to falling in love with one's theories, and making sacrifices of truth to them in order to demonstrate fidelity and admiration.
Winston Churchill, from Painting as a Pasttime, 1948

Our path has taken us into the middle of an old controversy. The controversy is over "empiricism versus rationalism" in the sciences and which is the proper approach. The empirical approach takes nothing on faith, it believes observation of the natural world continually upends expectation. The rational approach believes, basically, that we can anticipate the properties of the natural world through human reasoning. These anticipations are necessarily a chain of reasoning away from what has been observed. To summarize it a bit unfairly, the empiricists believe in measurement while the rationalists believe in modeling.

This is an ancient debate. It started at least 2,000 years ago and has continued since. The part of the debate we're concerned with started in the 1600s. If you think you have evidence which side is right, and feel like staking one position or the other, I would caution you to check that urge. You will sooner find a definitive argument for endianness. This subject is deep, and we are only going to draw a few final connections to it.

Fred Brooks gives his side in the 2010 book The Design of Design. He is an empiricist who praises scholars of both schools: Watt, Faraday, and Heaviside on the empirical and British side. On the rational and French; Fourier, Carnot, and Bourbaki.

Bourbaki? It is odd for Bourbaki to be listed in this way, given their introduction here as foundational thinkers. In The Design of Design, the Bourbaki are categorized as modelers. Possibly they are categorized like this because of Grothendieck and his association with algebraic geometry, a pairing which outside of mathematics represents the quintessential abstraction. Inside of mathematics Grothendieck and the other members of Bourbaki pursued a grounded, empirical approach. Still, their work is purely deductive. In the opposing camp are engineers like Heaviside, an empiricist-mathematician, who arrived at formulas via experimentation, and not via deductive reasoning from axioms. Brooks views the split from the camp of engineers and physicists. Pure mathematicians are modelers.

Brooks's industry experience is in computer architecture and software engineering. The way he thinks of a program is as a piece of engineering. It has real-life constraints, and fallible people write it. They have to interact with the in-progress program, then make changes to it based on what failed. Building a program is an iterative process which moves in the direction of "better" and terminates. Shell output is the experiment by which programming is measured.

There is another view of programs. The source code that makes up programs is the same as the logical symbols that started to underpin math in the early 1900s. Both link written characters to "pure thought-stuff" in an uncannily effective way. It cannot be coincidence that the von Neumann architecture and Hilbertian formalism appeared around the same time. In this view, source code is the same as formalized mathematics, which is a real-world representation of models. The real-world symbols correspond to theories in the platonic world of mathematics. Programs can be deduced and perfected by reasoning.

This rationalist view of programming works in the other direction as well. Since formal math is also source code, we can compile math to make sure it does not contain errors. What's magic about this is that in all the math we've looked at, if you compile one statement successfully, you'll get an error whenever you try to compile its negation, no matter how far away that negation is or however deeply it's concealed. How can the compiler know? Consistency of the compiler is so reliable that you can use inconsistency as a guide to know what's false. The omnipresence of consistency is fortunate, since if it were ever the case that that two contradictory statements were true, the effect would be wide-reaching. Deterministic reasoning would be rendered void. Math would be a flawed theory.

That we can check math like this at all is a new idea, speaking in terms of the full timeline. It has historically been a mathematical concession that proof is not a rational certainty but a matter of social convention: what is proved is proved insofar as I can convince others. At the boundary where new mathematics is produced this is still the rule. Mathematicians wait for consensus after the announcement of a major result. What's changed is the addition of the, admittedly reasonable, assumption that all such social proofs can be converted into source code, which in turn can be checked. Now when you prove a result, you have the implied force of the crystalline tower of rationality behind you. Finally, a stronghold to settle the ancient question.

The possibility of verifying mathematics is as old as the possibility of computers. As foundational work, verification of mathematics has been slower in coming. Mathematicians dislike this work because it slows down discovery. At any rate, the process is underway now, and a handful of major proofs have been checked in Coq. The bulk of known math remains unchecked.

The likely outcome is that the continued computer verification of math won't tell us much we don't already know. A few small patches of mathematics will be shown to be erroneous. Possibly computer searches departing from the corpus will deliver new useful theorems. Computers are good at checking the area surrounding problems. They are not yet as good as people at choosing the correct logical leap. Social mathematics will maintain its lead on computer-verified mathematics. There is a gap separating people from computers, and it's interesting to think about the future of this gap. Eventually verification will catch up to where we are today. This newly formalized body of mathematics will be a platform for the human mathematicians of the future, who will discover theorems with the aid of computer assistance.

As an explanation of the present and future, the preceding may a flawed model. This history of mathematics is, contrary to what formalism would lead you to believe, full of examples where the way we did math was later shown to be incorrect. Not incorrect entirely, exactly, just built on the wrong foundations. The work of the ancient Greeks has had to be rebuilt several times. Geometry was too big to understand all of it at once, and so it wasn't until we had traveled far from the beginning that we could see what needed to be rebuilt. Emprically, our formal, computer-verified model of math will need to be rebuilt as well.

One particular worry is whether all our reasoning has been accomplished in a tide pool. We have built a small ecosystem that works well. In terms of the larger ocean, we have no idea. Nobody can say what happens out there. We have been lucky that deduction has not exposed true but contradictory statements. If a contradiction is found, then repair work will have to be done and math will have to be resituated. Mathematicians take it on faith that such contradictions do not exist. The question has always been there, but it is usually dismissed. Skepticism is increasing about whether things work out so simply.

Grothendieck made a point of saying he wanted to dissolve problems by raising the water line of understanding. He was not interested in direct answers. As the sea-level rose, understanding would, in time, erode the problems from the outsides of the solutions. Perhaps, though, his thoughts were elsewhere. Serre gave an aside on the subject, which is that, as a talented problem-solver, Grothendieck saw the direct solutions in advance.