Wednesday, February 06, 2019

Assessing the nuclear winter hypothesis

After falling into disrepute for some years, the nuclear winter hypothesis has enjoyed something of a renaissance over the past decade. In the January 2010 edition of Scientific American, two of the principal proponents of the hypothesis, Alan Robock and Owen Brian Toon, published an article summarising recent work. This article focused on the hypothetical case of a regional nuclear war between India and Pakistan, in which each side dropped 50 nuclear warheads, each with a 15-kiloton yield, on the highest-population-density targets in the opponent's territory.

Robock and his colleagues assumed that this would result in at least 5 teragrams of sooty smoke reaching the upper troposphere over India and Pakistan. A climate model was developed to calculate the effects, as Robock and Toon report:

"The model calculated how winds would blow the smoke around the world and how the smoke particles would settle out from the atmosphere. The smoke covered all the continents within two weeks. The black, sooty smoke absorbed sunlight, warmed and rose into the stratosphere. Rain never falls there, so the air is never cleansed by precipitation; particles very slowly settle out by falling, with air resisting them...

"The climatic response to the smoke was surprising. Sunlight was immediately reduced, cooling the planet to temperatures lower than any experienced for the past 1,000 years. The global average cooling, of about 1.25 degrees Celsius (2.3 degrees Fahrenheit), lasted for several years, and even after 10 years the temperature was still 0.5 degree C colder than normal. The models also showed a 10 percent reduction in precipitation worldwide...Less sunlight and precipitation, cold spells, shorter growing seasons and more ultraviolet radiation would all reduce or eliminate agricultural production.," (Scientific American, January 2010, p78-79).

These claims seem to have been widely believed within the scientific community. For example, in 2017 NewScientist magazine wrote a Leader article on the North Korean nuclear problem, which asserted that: "those who study nuclear war scenarios say millions of tonnes of smoke would gush into the stratosphere, resulting in a nuclear winter that would lower global temperatures for years. The ensuing global crisis in agriculture – dubbed a “nuclear famine” – would be devastating," (NewScientist, 22nd April 2017).

But is there any way of empirically testing the predictions made by Robock and his colleagues? Well, perhaps there is. In 1945, the Americans inflicted an incendiary bombing campaign on Japan prior to the use of nuclear weapons. Between March and June of 1945, Japan's six largest industrial centres, Tokyo, Nagoya, Kobe, Osaka, Yokohama and Kawasaki, were devastated. As military historian John Keegan wrote, “Japan's flimsy wood-and-paper cities burned far more easily than European stone and brick...by mid-June...260,000 people had been killed, 2 million buildings destroyed and between 9 and 13 million people made homeless...by July 60 per cent of the ground area of the country's sixty larger cities and towns had been burnt out,” (The Second World War, 1989, p481).

This devastation created a huge amount of smoke, so what effect did it have on the world's climate? Well, Robock and Brian Zambri have recently published a paper, 'Did smoke from city fires in World War II cause global cooling?' (Journal of Geophysical Research: Atmospheres, 2018, 123), which addresses this very question.

Robock and Zambri use the following equation to estimate the total mass of soot $M$ injected into the lower stratosphere:
$$
M = A*F*E*R*L \;.
$$ $A$ is the total area burned, $F$ is the mass of fuel per unit area, $E$ is the percentage of fuel emitted as soot into the upper troposphere, $R$ is the fraction that is not rained out, and $L$ is the fraction lofted from the upper troposphere into the lower stratosphere. Robock and Zambri then make the following statements:

"Because the city fires were at nighttime and did not always persist until daylight, and because some of the city fires were in the spring, with less intense sunlight, we estimate that L is about 0.5, so based on the values above, M for Japan for the summer of 1945 was about 0.5 Tg of soot. However, this estimate is extremely uncertain."

But something strange happens at this point: the authors make no attempt to quantify the uncertainty, or to place confidence intervals around their estimate of 0.5 teragrams.

I'll come back to this shortly, but for the moment simply note that 0.5 teragrams is one-tenth of the amount of soot which is assumed to result from a nuclear exchange between India and Pakistan, a quantity of soot which Robock and his colleagues claim is sufficient to cause a worldwide nuclear winter.

Having obtained their estimate that 0.5 teragrams of soot reached the lower stratosphere in 1945, Robock and Zambri examine the climate record to see if there was any evidence of global cooling. What they find is a reduction in temperatures at the beginning of 1945, before the bombing of Japan, but no evidence of cooling thereafter: "The injection of 0.5–1 Tg of soot into the upper troposphere from city fires during World War II would be expected to produce 0.1–0.2 K global average cooling...when examining the observed signal further and comparing them to natural variability, it is not possible to detect a statistically significant signal."

Despite this negative result, Robock and Zambri defiantly conclude that "Nevertheless, these results do not provide observational support to counter nuclear winter theory." However, the proponents of the nuclear winter hypothesis now seem to have put themselves in the position of making the following joint claim:

'5 teragrams of soot would cause a global nuclear winter, but the 0.5 teragrams injected into the atmosphere in 1945 didn't make a mark in the climatological record.'

Unfortunately, their analysis doesn't even entitle them to make this assertion, precisely because they failed to quantify the uncertainty in that estimate of 0.5 teragrams. The omission rather stands out like a sore thumb, because there are well-known, routine methods for calculating such uncertainties.

Let's go through these methods, starting with the formula $M = A*F*E*R*L$. The uncertainty in the input variables propagates through to the uncertainty in the output variable, the mass $M$. It seems reasonable to assume that the input variables are mutually independent, so for a product like this the relative (percentage) uncertainty $U_M$ in the output variable can be obtained by combining the relative uncertainties of the input variables in quadrature:
$$
U_M = \sqrt{(U_A^2 + U_F^2+U_E^2+U_R^2+U_L^2)} \;.
$$ Here each $U$ is a relative (percentage) uncertainty: $U_A$ for the total area burned, $U_F$ for the mass of fuel per unit area, $U_E$ for the percentage of fuel emitted as soot into the upper troposphere, $U_R$ for the fraction that is not rained out, and $U_L$ for the fraction lofted from the upper troposphere into the lower stratosphere.
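
As a minimal sketch of how such a propagation could be carried out (the component half-ranges below are illustrative placeholders of my own, not values taken from Robock and Zambri's paper):

```python
import math

# Hypothetical relative (percentage) uncertainties for each input variable.
# These are illustrative placeholders, not values from Robock and Zambri.
U_A = 30.0   # total area burned
U_F = 100.0  # mass of fuel per unit area
U_E = 100.0  # fraction of fuel emitted as soot into the upper troposphere
U_R = 100.0  # fraction not rained out
U_L = 100.0  # fraction lofted into the lower stratosphere

# For a product of mutually independent variables, the relative
# uncertainties combine in quadrature.
U_M = math.sqrt(U_A**2 + U_F**2 + U_E**2 + U_R**2 + U_L**2)
print(f"Combined uncertainty half-range U_M = {U_M:.0f}%")
```

With placeholder half-ranges of 100% on four of the five factors, the combined half-range already comes out at roughly 200%, which motivates the figure used below.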

Next, to infer confidence intervals, we can follow the prescriptions of the IPCC, the Intergovernmental Panel on Climate Change. The 2010 Scientific American article boasts that Robock is a participant in the IPCC, so he will surely be familiar with this methodology.

First we note that because $M$ is the product of several variables, its distribution will tend towards a lognormal distribution, or at least a positively skewed distribution resembling the lognormal. The IPCC figure below depicts how the upper and lower 95% confidence limits can be inferred from the uncertainty in a lognormally distributed quantity. The uncertainty $U_M$ corresponds to the 'uncertainty half-range' in IPCC terms.  
   

The IPCC figure "illustrates the sensitivity of the lower and upper bounds of the 95 percent probability range, which are the 2.5th and 97.5th percentiles, respectively, calculated assuming a lognormal distribution based upon an estimated uncertainty half-range from an error propagation approach. The uncertainty range is approximately symmetric relative to the mean up to an uncertainty half-range of approximately 10 to 20 percent. As the uncertainty half-range, U, becomes large, the 95 percent uncertainty range shown [in the Figure above] becomes large and asymmetric, "(IPCC Guidelines for National Greenhouse Gas Inventories - Uncertainties, 3.62).

So, for example, given the large uncertainties in the input variables, the uncertainty half-range $U_M$ for the soot injected into the lower stratosphere in 1945 might well reach 200% or more. In this event, the upper limit of the 95% confidence interval would be of the order of +300%. That's +300% relative to the best estimate of 0.5 Tg. Hence, at the 95% confidence level, the upper range might well extend to the same order of magnitude as the hypothetical quantity of soot injected into the stratosphere by a nuclear exchange between India and Pakistan. 
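
To make the arithmetic explicit, here is a sketch of the lognormal calculation under one reading of the IPCC procedure, namely that the half-range is interpreted as 1.96 standard deviations expressed as a percentage of the arithmetic mean; the 200% figure is just the illustrative value from the paragraph above:

```python
import math

M_best = 0.5    # best estimate of the soot mass (Tg)
U_M = 200.0     # uncertainty half-range, in percent (illustrative value)

# Interpret the half-range as 1.96 standard deviations, expressed as a
# fraction of the arithmetic mean.
rel_sd = (U_M / 100.0) / 1.96

# Convert the arithmetic mean and relative standard deviation into the
# parameters of the underlying normal distribution of ln(M).
sigma_ln = math.sqrt(math.log(1.0 + rel_sd**2))
mu_ln = math.log(M_best) - 0.5 * sigma_ln**2

lower = math.exp(mu_ln - 1.96 * sigma_ln)   # 2.5th percentile
upper = math.exp(mu_ln + 1.96 * sigma_ln)   # 97.5th percentile

print(f"95% range: {lower:.2f} Tg to {upper:.2f} Tg")
print(f"Upper limit relative to the best estimate: +{100 * (upper / M_best - 1):.0f}%")
```

On these assumptions the upper limit comes out at roughly 1.8 Tg, around +270% relative to the best estimate, consistent with the order-of-magnitude figure quoted above.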

Thus, the research conducted by Robock and Zambri fails to exclude the possibility that the empirical data from 1945 falsifies the nuclear winter hypothesis for the case of a regional nuclear exchange. 

In a sense, then, it's clear why Robock and Zambri refrained from including confidence limits in their paper. What's more perplexing is how and why this got past the referees at the Journal of Geophysical Research...

Sunday, January 13, 2019

Thruxton British F3 1989

The thickness of the atmospheric thermal boundary layer falls to a global minimum over Thruxton. Hence Thruxton is very cold. So much so, in fact, that the British Antarctic Survey have a station there, built into the noise-attenuation banking at the exit of The Complex, (much like a Hobbit-hole), where the younger scientists train to work in a frozen environment before travelling to the Halley Research Station on the Brunt ice shelf.

The late Paul Warwick in the Intersport Reynard. Puzzlingly, in the background there appear to be no takers for the shelter provided by the parasols.
In 1869, John Tyndall discovered why the sky is blue. If he'd lived in Thruxton, the question wouldn't even have occurred to him. Note the characteristic Wiltshire combination of distant mist, a stand of lifeless trees, and flat wind-swept expanses.
Marshals assist a driver who has entered a turnip field. The Wiltshire economy is entirely dependent upon (i) the annual turnip yield, and (ii) government subsidies into the thousand-year consultation process for a Stonehenge bypass/tunnel. The buildings in the background are what people from Wiltshire refer to as a 'collection of modern luxury flats and town-houses.' 
One of the drivers is distracted by an ancient ley line running tangential to the Brooklands kink.

Friday, January 11, 2019

Silverstone Tyre Test 1990

Generally speaking, it was impossible to see a car with the naked eye at Silverstone. However, with the assistance of the world's best astronomical optics, I was occasionally able to pluck an image out of the infinitesimally small strip separating the cold, grey sky from the wooden fence posts and metal railings.

Satoru Nakajima in the pioneering raised-nose Tyrrell 019. This image was obtained with the Wide Field and Planetary Camera on the Hubble Space Telescope.
Alessandro Nannini in the Benetton. This shot was taken with the 100-inch reflector on Mount Wilson.
Nigel Mansell, lighting up the front brake discs as he prepares for the turn-in to Copse. Nigel set the fastest lap on the day I attended the test; a mid-season pattern of performance which led Ferrari to reward Nigel by handing his chassis over to Prost.
Ayrton Senna in characteristic pose, head dipped forward and tilted towards the rapidly approaching apex of Copse corner.

Thursday, January 03, 2019

Formula One and Electro-Aerodynamics

Most travelling Formula One engineers probably think that an 'ionic wind' is the result of over-indulgence at the end-of-season curry night. On the contrary, in late 2018 a group of researchers from MIT published a paper in Nature detailing how an ionic wind was used for the first-ever flight of a heavier-than-air, self-propelled device with no mechanical moving parts.

The ionic wind was created by generating an ionic cascade between the paired elements in an array of high-voltage electrodes. Each positive electrode was a 0.2mm stainless-steel wire supported in front of a wing-section. The corresponding negative electrode was a thin layer of aluminium foil on the downstream wing-section. The ions are accelerated in the electric field, and impart some of their momentum to the ambient air-flow, thereby generating a forward thrust, (and in this case, presumably, some lift).

Ultra-light power-sources were used: a custom-made 600W battery, and a custom-made High-Voltage Power Converter (HVPC), yielding a DC voltage of ~40kV. The battery weighed only 230g, and the HVPC weighed 510g.

The thrust generated by the experimental device was ~3N, from a wing-span of 5.14m, so the thrust itself isn't about to grab the attention of the Formula One community. However, one might instead be tempted to re-task such ionic wind devices with accelerating the boundary layer flow in certain areas, enabling one to avoid separation in extremis.
Ionic wind accelerating the flow at the bottom of the boundary layer. (From 'Ionic winds for locally enhanced cooling', Go, Garimella, Fisher & Mongia, Journal of Applied Physics 102, 2007). This retards separation by delaying the point at which the slope of the velocity profile, at the wall, becomes zero.
Plasma-actuators for boundary layer control have been under aeronautical development for some years, and unfortunately their use in Formula One seems to have already been proscribed. The Technical Working Group notes for December 2006 contain a request for clarification on the issue from James Allison, and in response Charlie Whiting declares that he had "already given a negative opinion, based on moving parts influencing the car's aerodynamics."

This is a slightly puzzling response, because the whole point about plasma actuators and ionic winds is that they involve no moving mechanical parts. The objects in motion are electrical currents, and the ambient airflow itself, both of which are considered to be consistent with the regulations, and indeed necessary for the function of a Formula One car.

So perhaps there's a future here for electro-aerodynamics in Formula One. It would be an exciting line of research, and one which might also be considered beneficial to Formula One's environmental credentials.  

Monday, December 31, 2018

Andrew Ridgeley crashes at Brands

It's the 1986 Cellnet Superprix, the rain is relentless, and it's dark, very dark. Too dark, in fact, for those amateur photographers foolish enough to have arrived with nothing other than 100 ASA film...

The field is strong, containing future luminaries such as Martin Donnelly, Perry McCarthy, Damon Hill, Gary Brabham, Andy Wallace, David Leslie and Julian Bailey. Also entered is Andrew Ridgeley, George Michael's partner in contemporary pop-duo Wham!

Failing to compensate for the reduction in the effective power-spectrum of the road-surface, Ridgeley Go-Goes into the gravel at Druids...
His momentum is somewhat checked by the Cellnet cars of Ross Cheever and David Hunt, which have already found the tyre-wall.
Nothing for it but to head back to the Kentagon for a warming cup of tea.

Friday, December 28, 2018

Brands Hatch F3000 1987

Second installment of the McCabism photographic archive. The place once again is Brands Hatch, this time for a round of the European F3000 championship. By 1987 the Grand Prix had moved to the numbing wind-swept expanses of Silverstone, so this was very much Brands' biggest single-seater race of the year.

F3000 cars of the era tended to be sans engine cover. This is the Madgwick Motorsport Lola of Andy Wallace.
The absent engine cover facilitated tall intake trumpets, which maximise torque at lower revs, F3000 engines being restricted to 9,000rpm. (When the intake valve opens, it creates a rarefaction wave, which propagates to the top of the open-ended trumpet, and reflects as a compression wave. The lower the revs, the longer the necessary trumpet length so that the compression wave returns just as the intake valve is closing, pushing more mass into the combustion chamber). This is the Pavesi Racing Ralt of Pierluigi Martini.
This is a motor racing circuit. It possesses gradient and contour, exists in a natural setting, and is distinguishable from a go-kart track.
First corner of the race, and Julian Bailey has immediately overtaken the front-row sitters, Gugelmin and Moreno in their Ralts.
Andy Wallace moves past Moreno into 3rd, but Bailey looks comfortable in the lead.
Stefano Modena ahead of Yannick Dalmas, the latter displaying a black tyre-mark on his nose-cone as evidence of prior contact with Modena. Dalmas had tapped Modena into a spin at Druids, but courteously waited for Modena to resume ahead of him.
Michel Trolle has a territorial dispute at Druids with A Yorkshireman. This inevitably results in a large accident, Trolle's car flipping upside down onto the top of the tyre-wall.
A slightly shocked Trolle stumbles to his feet after being hauled from the upturned car.
The race is red-flagged and Bailey awarded victory.

Sunday, December 23, 2018

Brands Hatch Tyre Test 1986

Time to dip into the McCabism personal photographic archive. The occasion here is the 1986 pre-Grand Prix Tyre Test at Brands Hatch.

Proper drivers, proper cars, and a proper circuit.

Nigel Mansell zipping down the pit-straight in a car weighing less than 700kg.
Stefan Johansson's Ferrari, in the days before people from Northern Europe and South Africa introduced the Scuderia to the exciting world of stable balanced downforce.
Nelson Piquet, plotting and scheming as he brakes for the hairpin at Druids.

Ayrton Senna negotiating the curved flow at Hawthorns.
Poking my camera over the bridge parapet to get a shot of Keke Rosberg accelerating down the hill to Clearways.
Nigel Mansell in the pits, complaining of an excessive surface/bulk tyre temperature delta, and some inconsistent behaviour from the Y250 vortex.
Ayrton Senna cresting the rise after the sylvan delights of Dingle Dell, and preparing to turn into Dingle Dell corner.
Keke Rosberg in the McLaren, somehow living to tell the tale without a halo for protection.

Thursday, July 19, 2018

Understanding tyre compound deltas

Pirelli revealed at the beginning of the 2018 F1 season that it was using new software to help it choose the three tyre compounds available at each race. Pirelli's Racing Manager Mario Isola commented:

"It's important that we collect the delta lap times between compounds to decide the selection. If we confirm the numbers that we have seen in Abu Dhabi [testing in November] - between soft and supersoft we had 0.6s, and supersoft to ultrasoft was 0.4s - depending on that, we can fine tune the selection and try to choose the best combination."

Getting the tyre compound deltas correct is indeed a crucial part of F1 race strategy, so let's review some of the fundamental facts about these numbers. The first point to note is that tyres are a performance multiplier, rather than a performance additive.

To understand this in the simplest possible terms, consider the following equation:
$$F_y = \mu F_z $$ This states that the lateral force $F_y$ generated by a tyre is a product of the coefficient of friction $\mu$, and the vertical load $F_z$. All other things being equal, the greater the lateral force generated by a car in the corners, the faster the laptime. (Note, however, that in many circumstances one would wish to work with lateral acceleration rather than lateral force, given the influence of car-mass on lateral acceleration).

Now, suppose we have a base compound. Let's call it the Prime, and let's denote its coefficient of friction as $\mu_P$. Let's consider a fixed car running the Prime tyre with: (i) a light fuel-load, and (ii) a heavy fuel-load. 

Let's really simplify things by supposing that the performance of the car, and its laptime, can be reduced to a single vertical load due to downforce alone, and a single lateral force number. When the car is running a heavy fuel load, it will generate a downforce $F_z$, but when it's running a light fuel load it will be cornering faster, so the vertical load due to downforce will be greater, $F_z + \delta F_z$. (Recall that the contribution of greater fuel weight to vertical load results in a net loss of lateral acceleration due to weight transfer). The lateral forces will be as follows:

Prime tyre. High fuel-load

$\mu_P  F_z $

Prime tyre. Low fuel-load

$\mu_P (F_z + \delta F_z) = \mu_P F_z + \mu_P\delta F_z$

Now, let's suppose that there is a softer tyre compound available. Call it the Option. Its coefficient of friction $\mu_O$ will be greater than that of the Prime, $\mu_O = \mu_P + \delta \mu$. 

Consider the performance of the same car on the softer compound, again running a light fuel-load and a heavy fuel-load:

Option tyre. High fuel-load

$\mu_O  F_z = ( \mu_P +\delta \mu )  F_z $

Option tyre. Low fuel-load

$\mu_O (F_z + \delta F_z) = ( \mu_P +\delta \mu )(F_z + \delta F_z) $

So far, so good. Now let's consider the performance deltas between the Option and the Prime, once again using lateral force as our proxy for laptime. 

High-fuel Option-Prime delta

$( \mu_P +\delta \mu )  F_z-\mu_P  F_z = \delta \mu F_z$

Low-fuel Option-Prime delta

$( \mu_P +\delta \mu )(F_z + \delta F_z)-\mu_P (F_z + \delta F_z)=\delta \mu (F_z + \delta F_z)$

Notice that sneaky extra term, $\delta \mu \delta F_z$, in the expression for the low-fuel compound delta? As a consequence of that extra term, the Option-Prime delta is greater on a low fuel load than a heavy fuel-load. As promised, tyre-grip is a performance multiplier.
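
To put toy numbers on this (every value below is hypothetical, chosen only to exhibit the extra term):

```python
# Toy illustration of tyre grip as a performance multiplier.
# All numbers are hypothetical.
mu_prime = 1.70          # coefficient of friction, Prime compound
d_mu = 0.05              # increment in friction for the softer Option
Fz_high_fuel = 5000.0    # vertical load proxy on a heavy fuel load (N)
d_Fz = 500.0             # extra vertical load available on a light fuel load (N)

def lateral_force(mu, Fz):
    """F_y = mu * F_z, the laptime proxy used in the text."""
    return mu * Fz

delta_high = (lateral_force(mu_prime + d_mu, Fz_high_fuel)
              - lateral_force(mu_prime, Fz_high_fuel))
delta_low = (lateral_force(mu_prime + d_mu, Fz_high_fuel + d_Fz)
             - lateral_force(mu_prime, Fz_high_fuel + d_Fz))

print(f"Option-Prime delta, high fuel: {delta_high:.0f} N")
print(f"Option-Prime delta, low fuel:  {delta_low:.0f} N")
print(f"Extra term d_mu * d_Fz:        {d_mu * d_Fz:.0f} N")
```

The low-fuel delta exceeds the high-fuel delta by exactly the $\delta \mu \, \delta F_z$ term.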

If you scrutinise the compound deltas in each FP2 session, you'll see that the low-fuel compound deltas from the beginning of the session are indeed greater than those from the high-fuel running later in the session. 

Given that the compound deltas fed into race-strategy software should generally be high-fuel deltas, one could make quite a mistake by using the low-fuel deltas instead. In fact, parties using low-fuel deltas might be surprised to see more 1-stop races than they were expecting.

There is another important consequence of the fact that tyres are performance multipliers: the pace gap between faster cars and slower cars increases when softer tyres are supplied. The faster cars have more downforce, and therefore more vertical load $F_z$ than the slower cars, at any equivalent fuel-weight. The delta in vertical load is multiplied by the delta in the coefficient of friction, and all things being equal, the faster cars duly benefit from that extra $\delta \mu \delta F_z$.  

Of course, that qualification about 'all things being equal', hides some complex issues. For example, softer tyres have a lower 'cornering stiffness', (i.e., the gradient of lateral force against slip-angle). A softer tyre therefore generates peak grip at a higher slip-angle than a harder tyre. If the aerodynamics of a car are particularly susceptible to the steering angle of the front wheels, then such a car might struggle to gain proportionately from the greater grip theoretically afforded by a softer tyre. Such a car would also appear to gain, relative to its opposition, towards the end of a stint, when the tyres are worn and their cornering stiffness increases.

Notwithstanding such qualifications, the following problem presents itself: the softer the tyres supplied to the teams in an attempt to enhance the level of strategic variety, the greater the pace-gaps become, and the less effect that strategic variety has...

Tuesday, May 15, 2018

Front-wing in yaw

Armchair aerodynamicists might be interested in a 2015 paper, 'Aerodynamic characteristics of a wing-and-flap in ground effect and yaw'. 

The quartet of authors from Cranfield University analyse a simple raised-nose and front-wing assembly, consisting of a main-plane and a pair of flaps, equipped with rectangular endplates. On each side of the wing, three vortices are created: an upper endplate vortex, a lower endplate vortex, and a vortex at the inboard edge of the flap. (The latter is essentially a weaker version of the Y250 which plays such an important role in contemporary F1 aerodynamics). 


The authors assess their front-wing in yaw, using both CFD and the wind-tunnel, and make the following observations:

1) In yaw, vortices generated by a lateral movement of air in the same direction as the free-stream increase in strength, whereas those which form due to air moving in the opposite direction are weakened.

2) The leeward side of the wing generates more downforce than the windward side. This is due to an increase in pressure on the leeward pressure surface and a decrease in suction on the windward suction surface. The stagnation pressure is increased on the inner side of the leeward endplate, and the windward endplate partially blocks the flow from entering the region below the wing.

3) A region of flow separation occurs on the windward flap suction surface.

4) Trailing edge separation occurs in the central region of the wing. This is explained by the following: (i) The aluminium wing surface was milled in the longitudinal direction, hence there is increased surface roughness, due to the material grain, for air flowing spanwise across the surface; (ii) There is a reduction in the mass flow-rate underneath the wing; (iii) The effective chord-length has increased in yaw.

5) The vortices follow the free-stream direction. Hence, for example, the windward flap-edge vortex is drawn further towards the centreline when the wing is in yaw.


One comment of my own concerns the following statement:

"The yaw rate for a racing car can be high, up to 50°/sec, but is only significant aerodynamically during quick change of direction events, such as initial turn-in to the corner. The yaw angle, however, is felt throughout the corner and is usually in the vicinity of 3-5°. Although the yaw angle changes throughout the corner the yaw rate is not sufficiently high, other than for the initial turn-in event, to warrant any more than quasi-static analysis."

This is true, but it's vital to point out that the stability of a car in the dynamic corner-entry condition determines how much speed a driver can take into a corner. If the car is unstable at the point of corner-entry, the downforce available in a quasi-static state of yaw will not be consistently accessible.

Aerodynamicists have an understandable tendency to weight conditions by their 'residency time', i.e., the fraction of the grip-limited portion of a lap occupied by that condition. The fact that the high yaw-rate corner-entry condition lasts for only a fraction of a second is deceptive. Minimum corner speed depends not only on the downforce available in a quasi-static state of yaw, but on whether the driver can control the transition from the straight-ahead condition to the quasi-static state of yaw.

Sunday, April 29, 2018

Local cosmography and the Wiener filter

The local cosmic neighbourhood has recently been mapped in spectacular fashion by the Cosmicflows research programme, yielding papers in Nature, and an article in Scientific American. The images generated are stunning.
Perspective view of the X-Y equatorial plane of our cosmic neighbourhood in supergalactic coordinates. The density of matter is represented by colour contours, deep blue regions indicating voids, red indicating zones of high density. The velocity streams are represented in white, with individual galaxies as spheres.

There's also an interesting mathematical back-story here because the work has been underpinned by techniques developed over 20 years ago by Yehuda Hoffman and colleagues. Drawing upon the exposition provided by Hoffman, let's take a look at these methods, beginning with Gaussian fields.

Gaussian fields

A random field is a field in which the value at each point is sampled from a univariate probability distribution, and the values at multiple points are sampled from a multivariate distribution. Typically, the sampled values at different points are spatially correlated, with a degree of correlation that declines with distance.

A Gaussian field is a type of random field in which the distribution at each point is Gaussian, and the relationship between the values at different points is given by a multivariate Gaussian distribution. The properties of a Gaussian field are specified by its covariance matrix. Assuming a Gaussian field $\mathbf{s}$ of zero mean, this is denoted as:
$$
\mathbf{S} = \langle \mathbf{s} \mathbf{s}^T\rangle
$$ Treating $\mathbf{s}$ as a vector of random variables, this expression is understood as the 'outer-product' of the column vector $\mathbf{s}$ with its transpose row vector $\mathbf{s}^T$:
$$
\mathbf{S} = {\langle s_i s_j \rangle}
$$

Convolution

Given the Gaussian field $\mathbf{s}$, the generation of the measured data-set $\mathbf{d}$  is represented as follows:
$$
\mathbf{d} = \mathbf{Rs}  + \mathbf{\epsilon}
$$ $\mathbf{R}$ represents the characteristics of the measuring instrument, and $\epsilon$ represents measurement noise. The covariance matrix of the noise is denoted as follows:
$$
 \mathbf{N} = \langle \mathbf{\epsilon} \mathbf{\epsilon}^T\rangle
$$ The transformation $\mathbf{R}$ is typically a convolution. In astronomical measurements of the sky it is often referred to as the Point Spread Function. It specifies how the energy detected in a particular pixel of the sky is actually a weighted sum of the energy belonging to a range of adjacent pixels. Similarly, in spectroscopy it specifies how the energy detected at each particular wavelength is actually a weighted sum of the energy belonging to adjacent wavelengths, thereby smearing out the energy spectrum.

Given the true spectrum $f(\lambda)$, and the convolution kernel $g(\lambda - \lambda_1)$, the measured spectrum $I(\lambda)$ is defined by the convolution:
$$
I(\lambda) = f * g = \int f(\lambda_1) g(\lambda - \lambda_1) d\lambda_1
$$
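
As a small numerical illustration of this smearing (the spectrum and the instrumental kernel below are invented purely for the example):

```python
import numpy as np

# Invented 'true' spectrum: two narrow emission lines on a flat continuum.
wavelengths = np.linspace(400.0, 700.0, 601)                     # nm
true_spectrum = np.ones_like(wavelengths)
true_spectrum += 5.0 * np.exp(-0.5 * ((wavelengths - 500.0) / 1.0) ** 2)
true_spectrum += 3.0 * np.exp(-0.5 * ((wavelengths - 600.0) / 1.0) ** 2)

# Gaussian instrumental kernel g(lambda - lambda_1), with a 5 nm width.
offsets = np.arange(-25.0, 25.5, 0.5)                            # nm
kernel = np.exp(-0.5 * (offsets / 5.0) ** 2)
kernel /= kernel.sum()                                           # normalise

# Discrete version of I = f * g: each measured value is a weighted sum
# of the true values at adjacent wavelengths, smearing out the lines.
measured_spectrum = np.convolve(true_spectrum, kernel, mode="same")
```
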
Wiener filter

Stochastic filtering provides a collection of techniques for constructing estimates of the true values of a quantity or field from sparse and noisy measurement data. The Wiener filter is one such technique. It provides a transformation $\mathbf{F}$ that maps a measured dataset $\mathbf{d}$ into an estimate of the true field:
$$
\mathbf{s}^{WF} = \mathbf{F} \mathbf{d}
$$ The discrepancy between the true field and the estimated field is called the residual:
$$
\mathbf{r}  = \mathbf{s} - \mathbf{s}^{WF}
$$ The residual possesses its own covariance matrix $\langle \mathbf{r}\mathbf{r}^T \rangle$. The Wiener filter is defined so that it minimizes the covariance of the residual.

The Wiener filter possesses another property which makes it the natural choice of filter for cosmography: Given the prior distribution of the true field $\mathbf{s}$; given the relationship between the true field and the measured values; and given the measured values, a posterior Bayesian distribution $p(\mathbf{s}|\mathbf{d})$ can be obtained over the true field. In the case where the true field is a Gaussian field, and the noise is also Gaussian, the Wiener filter picks out the mean value of the posterior distribution.

The Wiener filter is given by the expression:
$$
\mathbf{F} = \mathbf{SR}^T (\mathbf{RSR}^T + \mathbf{N})^{-1}
$$
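
To make the expression concrete, here is a minimal numerical sketch on a toy one-dimensional field, with a squared-exponential prior covariance, an identity response and white noise; these modelling choices are arbitrary illustrations, not anything taken from the Cosmicflows pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)

# Prior covariance S of the true field: squared-exponential correlations.
corr_length = 0.05
S = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / corr_length ** 2)

# Draw a 'true' field s from the prior (small jitter for numerical stability).
L_chol = np.linalg.cholesky(S + 1e-8 * np.eye(n))
s = L_chol @ rng.standard_normal(n)

# Measurement model: identity response R plus white noise of variance sigma^2.
R = np.eye(n)
sigma = 0.5
N = sigma ** 2 * np.eye(n)
d = R @ s + sigma * rng.standard_normal(n)

# Wiener filter F = S R^T (R S R^T + N)^{-1}, and the estimate s^WF = F d.
F = S @ R.T @ np.linalg.inv(R @ S @ R.T + N)
s_wf = F @ d
```

Because the filter reduces to $\mathbf{S}(\mathbf{S}+\mathbf{N})^{-1}$ in this case, the estimate is visibly smoother than the true field, which is precisely the motivation for the constrained realizations discussed next.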
Constrained realizations

The method of constrained realizations goes back to a paper by Yehuda Hoffman and Erez Ribak in 1991 ('Constrained realizations of Gaussian fields - A simple algorithm', Ap. J. Lett., vol 380, pL5-L8). It's based upon a simple but crucial point: the application of the Wiener filter to a measured dataset will produce an estimated field which includes the covariance of the residual: $$\mathbf{s}^{WF} = \mathbf{s} - \mathbf{r}$$
Hence, the estimated field will be smoother than the true field. The idea proposed by Hoffman and Ribak is simple: to generate a realistic realization of the true field, you need to add a sampled realization from the residual field.

Their method works as follows:

(i) Generate a random realization $\tilde{\mathbf{s}}$ of the true field. (The tilde here indicates a realization).

(ii) Generate a realization of the measurement noise, and apply the measurement transformation to $\tilde{\mathbf{s}}$ to yield a random dataset realization: $$\mathbf{\tilde{d}} = \mathbf{R\tilde{s}}  + \mathbf{\tilde{\epsilon}}$$
(iii) Apply the Wiener filter to $\mathbf{\tilde{d}}$ to create an estimate of the true field realization: $\mathbf{F} \mathbf{\tilde{d}} $.

(iv) Generate a realization of the residual:
$$\mathbf{\tilde{r}} = \tilde{\mathbf{s}} - \mathbf{F} \mathbf{\tilde{d}} $$
(v) Add the realization of the residual to the estimated field which has been obtained by applying the Wiener filter to the actual measured dataset: 
$$\eqalign{\mathbf{s}^{CR} &=  \mathbf{\tilde{r}} + \mathbf{Fd} \cr &=\tilde{\mathbf{s}}+ \mathbf{F} (\mathbf{d-\tilde{d}})}$$
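
Continuing the toy sketch from the Wiener filter section (it reuses the arrays L_chol, R, F, d, sigma, n and rng defined there), the Hoffman-Ribak recipe takes only a few lines:

```python
# Continuing the Wiener-filter sketch above:
# reuses L_chol, R, F, d, sigma, n and rng.

# (i) A random realization of the true field, drawn from the prior.
s_tilde = L_chol @ rng.standard_normal(n)

# (ii) A realization of the noisy measurement process applied to it.
d_tilde = R @ s_tilde + sigma * rng.standard_normal(n)

# (iii)-(v) The constrained realization: add the Wiener-filtered difference
# between the real and mock data to the mock field,
#   s^CR = s_tilde + F (d - d_tilde).
s_cr = s_tilde + F @ (d - d_tilde)
```
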
Reconstruction of continuous fields from finite data

Here the Gaussian field $\mathbf{s}$ is assumed to be a function $f(\mathbf{r})$, so that the covariance matrix becomes the auto-correlation function $\xi$:

$$
\mathbf{S} = \langle f(\mathbf{r}_i) f(\mathbf{r}_j)\rangle =  \xi(\mathbf{r}_i,\mathbf{r}_j)
$$ Assume that measurements are made of this field at a finite number of points $\mathbf{r}_i$, yielding values $\{F_i\}$. If there is no convolution operation, so that $\mathbf{R} = \mathbf{I}$, then the combined application of the Wiener filter and constrained realization yields the following expression:
$$
 f(\mathbf{r})^{CR} = \tilde{f}(\mathbf{r})+ \xi(\mathbf{r}_i,\mathbf{r}) ( \xi(\mathbf{r}_i,\mathbf{r}_j) + \mathbf{N}_{ij} )^{-1}(F_j-\tilde{f}(\mathbf{r}_j))
$$ Repeated indices are summed over here.
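
A sketch of this formula in the same toy spirit, for a handful of invented point measurements and an assumed squared-exponential auto-correlation function:

```python
import numpy as np

rng = np.random.default_rng(1)

def xi(r1, r2, corr_length=0.1):
    """Assumed auto-correlation function of the field (squared-exponential)."""
    return np.exp(-0.5 * (r1[:, None] - r2[None, :]) ** 2 / corr_length ** 2)

# Sparse, noisy measurements F_i of the field at a handful of points r_i.
r_meas = np.array([0.1, 0.35, 0.5, 0.8])
F_meas = np.array([1.2, -0.4, 0.3, 0.9])        # invented data values
N = 0.05 * np.eye(len(r_meas))                  # noise covariance N_ij

# Grid of positions r at which the reconstructed field is wanted.
r_grid = np.linspace(0.0, 1.0, 200)

# A random realization f~ of the field, sampled jointly on the grid and at
# the measurement points so that the two sets of values are consistent.
r_all = np.concatenate([r_grid, r_meas])
C_all = xi(r_all, r_all) + 1e-8 * np.eye(len(r_all))
f_tilde_all = np.linalg.cholesky(C_all) @ rng.standard_normal(len(r_all))
f_tilde_grid = f_tilde_all[:len(r_grid)]
f_tilde_meas = f_tilde_all[len(r_grid):]

# f^CR(r) = f~(r) + xi(r, r_i) (xi(r_i, r_j) + N_ij)^{-1} (F_j - f~(r_j))
weights = np.linalg.solve(xi(r_meas, r_meas) + N, F_meas - f_tilde_meas)
f_cr = f_tilde_grid + xi(r_grid, r_meas) @ weights
```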

Cosmicflows

The Cosmicflows research programme applied these techniques to a finite collection of sparse and noisy galaxy recession velocities to reconstruct the full 3-dimensional peculiar velocity field in our local cosmic neighbourhood. (The latter is defined to be a length-scale of the order 100 Mpc). 'Peculiar' velocities in this context are those which remain after the recession velocity due to cosmic expansion has been subtracted.

Our cosmic neighbourhood, within a cube 200 Mpc on each side. The Milky Way is located at the origin of coordinates. There are three isodensity colour contours. The velocity flows are depicted as black lines.
Assuming that cosmic structure formation has been seeded by perturbations sampled from a Gaussian random field, and assuming that the expansion of the universe followed the Lambda-Cold Dark Matter ($\Lambda$CDM) model, the combination of the Wiener filter and Constrained Realization was applied to a sample of galaxy recession velocities along the line-of-sight. The density field in our neighbourhood was then inferred from the reconstructed velocity field.

Some of the most striking images produced by the Cosmicflows team are those which depict the streamlines of the peculiar velocity field in our cosmic neighbourhood. These reveal regions in which the streamlines converge, due to concentrations in the density of matter such as the Great Attractor, and regions in which the streamlines diverge from cosmic voids. In particular, the Cosmicflows team have attempted to identify the surface of divergent points that surrounds us. They define the volume within that watershed as our local galaxy Supercluster, which they name 'Laniakea'.

The Cosmicflows programme assumes only linear deviations from homogeneity and isotropy. This assumption preserves the Gaussian nature of the peculiar velocity field and the density perturbation field. But it does have the consequence that the velocity field is irrotational, i.e., it possesses zero vorticity. Hence, the images of cosmic streamlines contain watersheds, but no whirlpools. Whilst the linear theory should prevail on large scales, non-linearity should dominate on smaller scales. The Cosmicflows team claim that the non-linearity should only dominate on the Mpc length-scale, and that their approach is therefore valid.

A slice through the equatorial plane in the supergalactic coordinate system. The boundary of the Laniakea supercluster is depicted as an orange line. The diagram is shaded with density contours, deep blue regions indicating voids, and red regions indicating zones of high density. The velocity streams within the Laniakea basin of attraction are represented in white.

Sunday, December 03, 2017

Neural networks and the neutron transport equation


The Monte Carlo simulation technique was conceived at Los Alamos in the late 1940s by Stanislaw Ulam and John von Neumann with a specific application in mind: neutron transport and nuclear fission chain-reactions. Perhaps, however, other simulation techniques are now available. I'd like to propose one such alternative below, but first we need to understand what the neutron transport problem is.

The neutron transport equation is a special case of the Boltzmann transport equation, the primary equation of non-equilibrium statistical mechanics. In the general case of the Boltzmann equation, the quantity of interest is a distribution function on phase-space, which specifies the expected number of particles per unit of phase-space volume. (At low particle-number densities, one might refer to the probability of particle occupancy per unit of phase-space volume).

In the case of the neutron transport equation, the quantity of interest can be taken to be $n(\mathbf{r},\mathbf{v},t)$, the expected number of neutrons per unit of physical space about $\mathbf{r}$, per unit of velocity space about $\mathbf{v}$, at time $t$. (Whilst, strictly speaking, phase space deals with position and momentum, it is often interchangeable with position and velocity).

The Boltzmann equation is of some philosophical interest because it can be used to describe the approach of a system towards equilibrium. Certainly, a population of high-energy neutrons in a region of space occupied by non-fissile atomic nuclei at room temperature would transfer energy to the atomic nuclei by means of elastic and inelastic scattering interactions, and approach a state of equilibrium. The neutrons would 'thermalise', their energy spectrum approaching that of a Maxwell-Boltzmann distribution, with a temperature equal to that of the background nuclei.

However, the neutron transport equation can be deployed to describe the evolution of the neutron population in a nuclear reactor, and such a system remains, for a reasonable period of time, and with the appropriate control systems, in a stable state far-from-equilibrium.

A fissile chain reaction occurs when neutrons are absorbed by heavy atomic nuclei, which fission into smaller, more stable nuclei, and emit high-energy neutrons in the process. By means of this fissile chain reaction, the neutron temperature, flux and number density can be maintained at a constant level, despite intensive interactions with a background population of atomic nuclei at a much lower temperature. 

There are two populations of particles here, which remain at different temperatures despite being in contact with each other. The system therefore remains out of thermal equilibrium. Nevertheless, the entropy of the system is still increasing because the fission reactions release the free energy of the heavy fissile isotopes, and transform it into heat. A nuclear reactor is, of course, far from being a closed system, and stability far-from-equilibrium is only possible in this case if the heat generated by the fissile reactions is transported away, typically by conduction into a liquid or gas, which convectively transports the heat to a location outside the reactor vessel.

Before we write down the neutron transport equation, let's establish some definitions, starting with the neutron flux, $\phi(\mathbf{r},\mathbf{v},t) =  v \;n(\mathbf{r},\mathbf{v},t)$. For each neutron energy level $E = 1/2 m_n v^2$, (where $m_n$ is the rest-mass of a neutron), this is the number of neutrons at a point $\mathbf{r}$ passing through a unit surface area, per unit time. For each energy level $E$, it is equivalent to the total path length travelled by the energy-$E$ neutrons per unit volume at $\mathbf{r}$, in a unit of time.

To calculate the rate at which neutron reactions take place, the flux is multiplied by a 'macroscopic cross-section' $\Sigma_j(\mathbf{r},\mathbf{v},t)$. The macroscopic cross-sections are the product of the microscopic cross-sections with the number density of the target nuclei $\Sigma_j(\mathbf{r},\mathbf{v},t) = \Sigma_i \sigma_{ij}(\mathbf{r},\mathbf{v},t) N_i(\mathbf{r},t)$. 

The microscopic cross-sections $\sigma_{ij}(\mathbf{r},\mathbf{v},t)$ express the probability of a reaction of type $j$ between a neutron and a target nucleus of species $i$. There are microscopic cross-sections for neutron absorption, fission, elastic scattering and inelastic scattering. The microscopic cross-sections have units of area per atom, and depend upon the energy (velocity) of the incoming neutron, as well as the temperature of the target nuclei.

The macroscopic cross-sections have units of inverse distance. As such, they define the probability of a reaction per unit of neutron path-length. When multiplied with a neutron flux (equivalent to the total path-length travelled by the neutrons per unit time per unit volume), this yields the reaction rate per unit volume.

One other definition to note is the distinction between 'prompt' and 'delayed' neutrons. The prompt neutrons are released in the fission event itself, whilst the delayed neutrons are emitted after the beta decay of certain fission product fragments, and typically occur some time after the fission event. 

With those definitions in place, let's proceed to the neutron transport equation itself. (The equations which follow are adapted from Mathematical Methods in Nuclear Reactor Dynamics, Z.Akcasu, G.S.Lellouche, and L.M.Shotkin, Academic Press, 1971). We will assume for simplicity that there is a single type of fissile isotope. 

The time evolution of the neutron distribution function is governed by the following equation:

$$\eqalign{\partial n(\mathbf{r},&\mathbf{v},t)/\partial t = - \mathbf{v} \cdot \nabla n(\mathbf{r},\mathbf{v},t) -  \Sigma(\mathbf{r},\mathbf{v},t) \;v \;n(\mathbf{r},\mathbf{v},t) \cr &+ f_0(\mathbf{v}) (1-\beta)\int \nu(\mathbf{r},\mathbf{v}',t) \Sigma_f(\mathbf{r},\mathbf{v}',t) \;v' n(\mathbf{r},\mathbf{v}',t) d^3 \mathbf{v}' \cr &+ \Sigma_{i=1}^{6} f_i(\mathbf{v}) \lambda_i C_i(\mathbf{r},t)\cr &+ \int \Sigma_s(\mathbf{r},\mathbf{v}' \rightarrow \mathbf{v},t) \;v' n(\mathbf{r},\mathbf{v}',t)d^3 \mathbf{v}'}$$ Let's consider the various terms and factors in this equation one-by-one.

$\mathbf{v} \cdot \nabla n(\mathbf{r},\mathbf{v},t)$ is the loss of neutrons of velocity $\mathbf{v}$ from the volume about a point $\mathbf{r}$ due to the flow of neutrons down a spatial concentration gradient $\nabla n(\mathbf{r},\mathbf{v},t)$. The loss is only non-zero if the concentration gradient has a non-zero component in the direction of the velocity vector $\mathbf{v}$.

$\Sigma(\mathbf{r},\mathbf{v},t) \;v \;n(\mathbf{r},\mathbf{v},t)$ is the loss of neutrons of velocity $\mathbf{v}$ from the volume about a point $\mathbf{r}$ due to any reaction with the atomic nuclei in that volume. $\Sigma$ is the sum of the macroscopic cross-sections for neutron capture $\Sigma_c$, fission $\Sigma_f$, and scattering $\Sigma_s$.

$\Sigma_f(\mathbf{r},\mathbf{v}',t) \;v' n(\mathbf{r},\mathbf{v}',t)$ is the rate at which fission events are triggered by incoming neutrons of velocity $\mathbf{v}'$. 

$\nu(\mathbf{r},\mathbf{v}',t)$ is the mean number of neutrons output from a fission event triggered by an incoming neutron of velocity $\mathbf{v}'$. 

$\beta$ is the fraction of fission neutrons which are delayed, hence $1-\beta$ is the fraction of fission neutrons which are prompt.

$f_0(v)$ is the probability density function for prompt fission neutron speeds, i.e., it specifies the probability that a prompt fission neutron will have a speed $v$ (and a kinetic energy $E = 1/2 m_n v^2$). It is equivalent to the energy spectrum of prompt fission neutrons.

Assuming the outgoing prompt fission neutrons are emitted isotropically, the probability of a prompt fission neutron having a velocity $\mathbf{v}$ is $f_0(\mathbf{v}) = f_0(v)/4 \pi$.

Hence $f_0(\mathbf{v})(1-\beta) \int \nu(\mathbf{r},\mathbf{v}',t) \Sigma_f(\mathbf{r},\mathbf{v}',t) \;v' n(\mathbf{r},\mathbf{v}',t) d^3 \mathbf{v}'$ is the rate at which prompt neutrons of velocity $\mathbf{v}$ are created at position $\mathbf{r}$ in the reactor, by incoming neutrons of any velocity $\mathbf{v}'$.

$C_i(\mathbf{r},t)$ is the concentration of species $i$ delayed neutron precursors, and $\lambda_i$ is the decay constant of that species, hence $\lambda_i C_i(\mathbf{r},t)$ is the decay-rate of the $i$-th delayed neutron precursor. This is equivalent to the production-rate of delayed neutrons from the $i$-th precursor. 

$f_i(\mathbf{v})$ is the probability density function over velocity for delayed neutrons produced by the $i$-th precursor species, hence $\Sigma_{i=1}^{6} f_i(\mathbf{v}) \lambda_i C_i(\mathbf{r},t)$ specifies the rate at which delayed neutrons of velocity $\mathbf{v}$ are created at position $\mathbf{r}$ in the reactor.

$\Sigma_s(\mathbf{r},\mathbf{v}' \rightarrow \mathbf{v},t)$ is the macroscopic cross-section for elastic or inelastic scattering events in which an incoming neutron of velocity $\mathbf{v}'$ transitions to an outgoing neutron of velocity $\mathbf{v}$ by colliding with a target nucleus. Hence $\int \Sigma_s(\mathbf{r},\mathbf{v}' \rightarrow \mathbf{v},t) \;v' n(\mathbf{r},\mathbf{v}',t) d^3 \mathbf{v}'$ specifies the rate at which neutrons of velocity $\mathbf{v}$ are created at position $\mathbf{r}$ in the reactor by incoming neutrons of any velocity $\mathbf{v}'$ scattering with target nuclei.

The concentration of each delayed neutron precursor satisfies the following equation:
$$\eqalign{\partial C_i(\mathbf{r},t)/\partial t &= \beta_i \int \nu(\mathbf{r},\mathbf{v}',t) \Sigma_f(\mathbf{r},\mathbf{v}',t) \;v' n(\mathbf{r},\mathbf{v}',t) d^3 \mathbf{v}' \cr &- \lambda_i C_i(\mathbf{r},t)} $$ $\beta_i$ is the fraction of fission neutrons produced by the delayed neutron precursor of species $i$. Hence, $\beta = \Sigma_i \beta_i$.

The rate at which delayed neutron precursors of species $i$ are produced by fission events is given by $\beta_i \int \nu(\mathbf{r},\mathbf{v}',t) \Sigma_f(\mathbf{r},\mathbf{v}',t) \;v' n(\mathbf{r},\mathbf{v}',t) d^3 \mathbf{v}'$, and the rate at which they decay is given by $\lambda_i C_i(\mathbf{r},t)$.  

Note that, unlike the general Boltzmann equation, there is no term for neutron-neutron interactions. Neutron densities in a reactor are only of the order of $10^9/cm^3$, compared to the number density of atomic nuclei, which is of the order $10^{22}-10^{23}/cm^3$. Hence, neutron-neutron interactions can be neglected.

Now, the interesting point about the neutron transport equation is that the most important terms have the form of integral transforms:
$$(Tf)(x) = \int K(x,y)f(y) dy \, $$ where $K(x,y)$ is the 'kernel' of the transform. In discrete form, this becomes:
$$ (Tf)(x_i) = \Sigma_{j} K(x_i,y_j)f(y_j)  \,.$$
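
To make the discrete form concrete, here is a schematic sketch in which the kernel is a group-to-group scattering matrix; the group structure, cross-sections and flux values are invented numbers for illustration, not real nuclear data:

```python
import numpy as np

# Invented five-group discretization of the neutron flux (n / cm^2 / s).
phi = np.array([2.0e13, 1.5e13, 1.0e13, 0.8e13, 2.5e13])

# Invented group-to-group scattering matrix: Sigma_s[i, j] is the macroscopic
# cross-section (1/cm) for a neutron in group j to be scattered into group i.
# It plays the role of the discretized kernel K(x_i, y_j).
Sigma_s = np.array([
    [0.05, 0.00, 0.00, 0.00, 0.00],
    [0.10, 0.06, 0.00, 0.00, 0.00],
    [0.02, 0.12, 0.07, 0.00, 0.00],
    [0.00, 0.02, 0.15, 0.08, 0.00],
    [0.00, 0.00, 0.03, 0.20, 0.10],
])

# Discrete integral transform: (T phi)_i = sum_j K_ij phi_j, here the rate
# at which neutrons are scattered into each group, per unit volume.
scatter_source = Sigma_s @ phi                  # reactions / cm^3 / s

# Integrating the flux against a fission cross-section instead collapses the
# energy index entirely, leaving a single reaction rate per unit volume.
sigma_f = np.array([0.002, 0.004, 0.010, 0.030, 0.090])
fission_rate = sigma_f @ phi                    # fissions / cm^3 / s
```
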
The neutron transport equation takes neutron fluxes as the input, and calculates reaction rates, and thence heat production, by integrating those fluxes with macroscopic cross-sections. The macroscopic cross-sections provide the kernels of the integral transforms. In general schematic terms:
$$\text{Output}(\mathbf{r},t) = g\left(\int \Sigma(\mathbf{r},\mathbf{v'},t) \;\text{Input}(\mathbf{r},\mathbf{v}',t)d^3 \mathbf{v}' \right)\, ,$$ where $g$ is some function.

For example, given the heat produced per fission event, $W_f$, the rate of fission-product heat-production throughout the reactor is given by: $$H_f(\mathbf{r},t) = W_f \int \Sigma_f(\mathbf{r},\mathbf{v}',t) \;v' n(\mathbf{r},\mathbf{v}',t) d^3 \mathbf{v}'\, .$$

Neural networks which implement convolutions have become immensely powerful tools for pattern recognition in recent years. Convolutions are a particular type of integral transform, so let's briefly recall how such neural networks are defined.

On an abstract level, a neural network consists of a set of nodes, and a set of connections between the nodes. The nodes possess activation levels; the connections between nodes possess weights; and the nodes have numerical rules for calculating their next activation level from a combination of the previous activation level, and the weighted inputs from other nodes.

The nodes are generally divided into three classes: input nodes, hidden/intermediate nodes, and output nodes. There is a directionality to a neural network in the sense that patterns of activation propagate through it from the input nodes to the output nodes, and in a feedforward network there is a partial ordering relationship defined on the nodes, which prevents downstream nodes from signalling those upstream.

For the implementation of integral transforms, feedforward neural networks with strictly defined layers are used. The activation levels of the input layer of nodes represents the values of the input function at discrete positions; the weights represent the values of the discretized kernel; and the values of the nodes in the layer beneath represent the discretized version of the transformed function.

Thus, the activation levels $x^l_i$ of the neurons in layer $l$ are given by weighted sums over the activation levels of the neurons in layer $l-1$:
$$x^l_i =  f \left(\sum_{j=1}^{n}W^l_{ij}x^{l-1}_j \right), \,  \text{for } \, i = 1,\ldots,n$$ $W^l$ is the matrix of weights connecting layer $l-1$ to layer $l$, with $W^l_{ij}$ representing the strength of the connection from the $j$-th neuron in layer $l-1$ to the $i$-th neuron in layer $l$. $f$ is a non-linear threshold function.
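
A minimal sketch of that layer rule (random weights and activations, with tanh as an arbitrary choice of threshold function):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                   # neurons per layer

x_prev = rng.standard_normal(n)         # activations x^{l-1} of layer l-1
W = rng.standard_normal((n, n))         # W^l_ij: weight from neuron j to neuron i

def f(z):
    """Non-linear threshold function (tanh chosen arbitrarily here)."""
    return np.tanh(z)

# x^l_i = f( sum_j W^l_ij x^{l-1}_j )
x_next = f(W @ x_prev)
```
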

Now, the empirical cross-section data required for nuclear reactor kinetics is far from complete. The cross-sections are functions of neutron energy, and can oscillate wildly in some energy ranges, (see diagram below). The cross-section curves are therefore not defined to arbitrary levels of resolution.


Perhaps, however, neural networks can offer a solution to this problem. If the input layer of a neural network is used to represent the neutron flux at various positions throughout a reactor, and the output layer is used to represent, say, the temperature levels measured throughout the reactor, such a network could be trained to predict the correct temperature distribution for any given pattern of neutron flux. It would do so by adjusting its weights, and because the weights would represent the nuclear reaction cross-sections, this would provide a means of filling in the gaps in the nuclear physics datasets.
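
As a toy sanity check of the idea, rather than anything resembling a real reactor model, the sketch below fits a single linear layer to synthetic flux/temperature pairs generated by a known kernel, and confirms that gradient descent on the weights recovers that kernel. Everything here is synthetic, and the threshold non-linearity is omitted because the map being learned is itself linear:

```python
import numpy as np

rng = np.random.default_rng(42)
n_flux, n_temp, n_samples = 20, 10, 500

# Synthetic 'true' kernel mapping a discretized flux pattern to a temperature
# distribution; it stands in for the unknown cross-section data.
K_true = np.abs(rng.standard_normal((n_temp, n_flux)))

# Synthetic training data: flux patterns and the temperatures they produce,
# with a little measurement noise added.
flux = np.abs(rng.standard_normal((n_samples, n_flux)))
temp = flux @ K_true.T + 0.01 * rng.standard_normal((n_samples, n_temp))

# A single linear layer, trained by gradient descent on the squared error.
K_est = np.zeros((n_temp, n_flux))
learning_rate = 0.05
for _ in range(2000):
    pred = flux @ K_est.T
    grad = (pred - temp).T @ flux / n_samples
    K_est -= learning_rate * grad

print("Maximum error in the recovered kernel:", np.max(np.abs(K_est - K_true)))
```

With a real network and real data the weights would not be directly interpretable as cross-sections in quite such a straightforward way, but the exercise illustrates the sense in which the trained weights encode the kernel.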

Questions one might pose immediately include the following: (i) are the neutron flux measurements in a reactor sufficiently accurate to facilitate this technique; and (ii) how many neurons would one need in a layer of the neural network to provide the necessary degree of spectral resolution?

These are questions I don't yet know the answer to...