Wednesday, January 30, 2008

Florida Votes Against McCain Too!

We are now being told that McCain has 'momentum'. Why, just look at the difference between South Carolina and Florida! Big improvement for McCain, huh? 33% to 36%. Wow.
Why, he's the proverbial steamroller!

Seriously, the only recent developments I am really alarmed about are the push to marginalize McCain's detractors in the Conservative base as the 'fringe', and the absolute gullibility of the Huckabee supporters and what they might do in the coming days and weeks.

Saturday, January 26, 2008

Loose Lips (You Know the Rest)

From the AP's Eileen Sullivan Today:

"The officials spoke on condition of anonymity because the information is classified as secret."

Geez. I remember when 'officials' WOULDN'T even speak if the information was classified. Of course, once upon a time most 'officials' were concerned about actually being traitors, not just about being known as one.

Hey AP! Can Eileen do a waterboarding story next so we can find out who's got the loose lips?

Thursday, January 24, 2008

The Jack Cafferty, George Soros, and Moonbat Brigade

They're the Axis of Idiocy!

I saw the so-called War Card 'study' announcement earlier this week and of course recognized it immediately for the hogwash it was. I figured at the time anyone with at least two brain cells could do the same and didn't pay much more attention to it. Unfortunately I forgot about the people with their two brain cells sitting so far apart in their punkin' heads that they have to tilt their heads side to side to get them to roll close enough for a spark to jump.

Yes, I’m talking about people like know-nothing Jack Cafferty!

Now, I haven't really watched CNN since Desert Storm, but I accidentally clicked on Cafferty's (Warning: BDS ALERT!) vile little rant, with its readings of moonbat-dominated feedback, on my internet provider's 'content' site.

Contrast the content-free snarkfest at CNN with Bob Owens’ exposure and analysis at PJM.

Facts are stubborn things, Mr. Cafferty! So start clunking those two grey cells together and maybe, just maybe, you will be able to recognize one someday.

Sunday, January 20, 2008

GOP Votes AGAINST McCain in South Carolina

Geez. Some ‘winner’.
(Up front: I'm with the Discerning Texan and FOR Thompson.) South Carolina was really the first test of Republican candidates with a Conservative 'factor', and 67% of Republicans voted for someone other than John McCain in yesterday's primary.

How many of those 67% had McCain as their SECOND choice? How many had him at or near their LAST choice? Well, let's think about it. Here's the breakdown:

[South Carolina primary results breakdown omitted]

Almost as many votes went to Huckabee, Thompson and Romney split an almost equal amount, and Giuliani and Paul picked up the change. What would drive most of the other Republicans to McCain in future contests? His stand on illegal aliens? His support for so-called 'Campaign Finance' reform?

McCain is a classic RINO, and I see nothing to distinguish him from a pro-War On Terror Democrat. I see nothing-- NOTHING that would make me want to vote for McCain, and a lot that would make me vote against him.

McCain got only 33% of the vote in a state where he had an established organization? Except for the 'bandwagon' types, McCain is at his peak. That peak will only be good enough if the opposition remains fractured. If McCain wins the Republican nomination via the fragmentation of his opposition, a small majority of Republicans will hold their noses and vote for him in the General Election. The rest will stay away in a funk, and the next President will be a Democrat in a landslide.

Huckabee will fade like a Marine haircut. Thompson and Romney will pick up parts of Huckabee’s support. Giuliani will hang in there collecting ‘moderates’ and Ron Paul will remain a sideshow.

For all the hoopla the press creates, you'd swear McCain has some kind of momentum. Two points:

1. Only about 150 of the more than 1,900 delegates to the National Convention have been decided so far.

2. If Fred Thompson keeps doing better every time out, the media can't continue ignoring him, and he's the only candidate who actually looks better the more you look at him.

I’m with Fred as long as he's in the fight.

Update 01/22/08: Well, it didn't take long. Fred is no longer in the fight. Crap. Looks like it will be between Romney and Giuliani for me. Romney is ahead.

Saturday, January 19, 2008

We'll try this for a while

There is now an e-mail address in the right-hand column for anyone who is interested in contacting me but does not want to create a Blogger ID first.

Anyone who reads this blog at all will notice that sometimes (OK, often) I have a hard time finding the time to post even on simple things, and my real life is about to get even more interesting. Hopefully everyone will understand if I don't answer all, or even most, e-mails.

I'll keep the mailbox as long as things don't get out of control.

Regards to all

Sunday, January 13, 2008

Horton is the Who Part III: More Lancet Dirt

Gee... this is getting kind of monotonous. At this rate I'll need a separate category just for Lancet exposés.

The story so far:

Horton Is the Who Part I (updated w/direct YouTube link): Meet the Lancet Editor Richard Horton!

Has Lancet Fired Horton Yet?: Meet 'study' author Les Roberts

Horton Is the Who Part II: See serious inconsistencies in the study exposed!

Now we find out Anti-American George Soros is the enabler behind the "Gagillion Iraq Deaths" Study.

What next? Are we going to find out the survey crews doing the canvassing were Al Qaeda?

And still I ask: Has the once prestigious Lancet fired Horton YET?!

Saturday, January 12, 2008

Krugman Warns...Again (Yawn)

Megan McArdle at Atlantic.com's Asymmetrical Information calls out Paul Krugman over his latest prediction of "recession" -- noting he has been worrying about one for quite some time. She lists a series of Krugman quotes:

--"[R]ight now it looks as if the economy is stalling..." — Paul Krugman, September 2002

--"We have a sluggish economy, which is, for all practical purposes, in recession..." — Paul Krugman,
May 2003

--"An oil-driven recession does not look at all far-fetched." — Paul Krugman,
May 2004

--"[A] mild form of stagflation - rising inflation in an economy still well short of full employment - has already arrived." — Paul Krugman,
April 2005

--"If housing prices actually started falling, we'd be looking at [an economy pushed] right back into recession. That's why it's so ominous to see signs that America's housing market...is approaching the final, feverish stages of a speculative bubble." — Paul Krugman,
May 2005

--"In fact, a growing number of economists are using the "R" word [i.e.,"recession"] for 2006." - Paul Krugman,
August 2005

--"But based on what we know now, there’s an economic slowdown coming." - Paul Krugman,
August 2006

--"this kind of confusion about what’s going on is what typically happens when the economy is at a turning point, when an economic expansion is about to turn into a recession" - Paul Krugman,
December 2006

--"Right now, statistical models ... give roughly even odds that we’re about to experience a formal recession. ... [T]he odds are very good — maybe 2 to 1 — that 2007 will be a very tough year." - Paul Krugman,
December 2006

So, how does all this doom compare to the economic record?

Like this:

[Chart comparing Krugman's predictions with the actual economic record omitted]
Eventually, of course, we will have a recession and Krugman will get lucky... just like the proverbial 'blind pig'. The real question is: how deep and how long?

At McArdle’s place I commented with an extract of a G.B. Shaw quote:
"If all the economists were laid end to end, they'd never reach a conclusion."
Which I enjoy only marginally more than:

Q: Why did God make economists?
A: To make the Weathermen look good.

I loved Econ in college (sick, I know), but I've always marveled at how economists who live by the phrase "all other factors held constant" never seem to fully appreciate how fantastic that assumption really is.

How 'Fraidy Cats' Do Public Relations

The 'Lifeboat Foundation' has a little 'poll' going on at their blog.

H/T Instapundit

Here's how I responded (w/typos and grammar cleaned up a titch) to the question of allocating their hypothetical $100M budget:

$1M Biological viruses [By improving early warning and mitigating/preventative actions]
$1M Environmental global warming [To study ways to exploit its benefits since we can't do anything about climate change anyway]
$0 Extraterrestrial invasion [If "they" can get here, what are we going to do to stop it?]
$34M Governments abusive power [3/4 to subvert totalitarian regimes and promote free markets abroad, and 1/4 to teach American History, Civics, and the Constitution in the U.S.]
$0 Nanotechnology gray goo [Free market will take care of this]
$1M Nuclear holocaust [By Adding 1% to a baseline 6+% GNP DEFENSE and Intelligence covert action budget]
$0 Simulation Shut Down if we live in one [E.T. Quote: "This is REALITY, Greg"]
$0.05M Space Threats asteroids [High Risk (Low Probability & High Consequence) easily mitigated through current technology and development pace]
$0 Superintelligent AI un-friendly [Free market will take care of this too]
$62.95M "Other" To be allocated as needed to educate the American public on the nefarious ways in which Non State Actors (Including the United Nations) attempt to subvert the American Republic on behalf of despots, tyrants and utopian fantasists, with special attention to: 1. self-important celebrity 'activists' ,
2. discredited political and social movements such as socialism, fascism, and communism
3. adaquate mental health care counseling for the paranoiacs and mentally deficient among 1 & 2(likely to be the biggest slice of the budget pie).

$100 million total

Easy.

Sunday, January 06, 2008

It’s not the figures that lie, it’s the liars that figure

Or: “The press sabotages the economy news in an election year”

I saw this in my local paper, and since it is also of particular interest to an associate of mine, I thought I would 'blog' it.

Our Man in Academia, Dr. Paul, is particularly attuned to how the press seems to bury ANY positive news about the economy and always tries to emphasize the 'bad'. (He's not the only one, by the way.) I knew he would love this one.

Into the mix of economic news comes a little hit-piece by R.A. Dyer (January 2, 2008, page 4A) via our local paper, the Fort Worth Star-Telegram, once again reminding us why many locals refer to it as the 'Startlegram'. The article is titled "Surprise, your paycheck doesn't go as far as it used to. Here's why" with the tickler "Area Hit Hard by Energy Costs". The online version can be found here.

The major problem with the story is that it claims the impact of energy costs on incomes has increased significantly since 2000… WITHOUT telling us exactly WHAT the impact is. The author seems more intent on presenting some rather soft data in as dramatic a fashion as possible than on providing any real and useful insight into what changing energy costs mean to the public. As a further indicator of some sleight-of-hand going on, the author uses about 20% of the column space presenting purely anecdotal information! The article is, frankly, 'all heat and no light'.

The article paints a bleak picture indeed:

“Combined energy costs – gasoline, home heating and electricity – for a typical local household [possibly the author’s household?] have gone up 72 percent.”

Rousing 'Average Reader'
It is easy to see how someone who is trusting and/or not well versed in mathematics, let's call them 'Average Reader', might get some wrong ideas from this article. Average Reader would look at the number 72 percent, divide it by 7, and think energy costs were rising 10.3 percent per year. This is the first incorrect assumption: because growth compounds, you cannot simply divide 72 by 7 and get the correct annual rate. Average Reader would then try to put that number in perspective and think, "Holy smokes, my raise was only 4.8 percent and energy costs went up more than twice (10.3 percent) as much!"

Now, 72 percent does sound like a lot, but it compounds out to 8.1 percent per year (not 10.3) over 7 years; the worked math is below. 8.1 percent is not nearly as dramatic a number as 72 percent, and somewhat less dramatic than 10.3 percent. This is not to deny it is still a significant increase when compared to the annual pay increase number given, but again, what is the real impact?
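
For the record, the correct way to annualize a cumulative increase is with the compound (geometric) rate, not simple division:

annual rate = (1 + total increase)^(1/years) - 1

For the article's headline number: (1 + 0.72)^(1/7) - 1 ≈ 0.081, or about 8.1 percent per year.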

More directly put: HOW MUCH OF OUR INCOME HAS BEEN AFFECTED BY THIS INCREASE? The author doesn’t say, and in fact appears to NOT want to divulge this all-important parameter by carefully arranging his argument around it.

The author dances only as close as he dares to the heart of the question with: "The findings show a 55 percent increase between 2000 and 2006 in the proportion of monthly income that goes to energy." OK, we got it. Now what does it mean? A 55% increase of 'what' exactly? How much real-dollar difference is involved?

In a nutshell, Dr. Paul would point out that the author is just giving us the change in the data, but not what the data IS. For the mathematically inclined, this is akin to giving the derivative of a function, but not the function. Again, Average Reader would incorrectly come up with around 9.2 percent (if Average Reader assumed the rather vaguely-stated "between 2000 and 2006" meant 6 years), but in reality this works out to only about 7.6 percent per year for 6 years. Not nearly as dramatic as 55 percent, is it? Also, note that this is the PERCENT CHANGE in the PROPORTION of income spent on 'energy' and is NOT the INCREASE in the AMOUNT THAT IS SPENT on energy.

This almost seems like intentional obfuscation, doesn’t it?

Next, the author states that incomes rose only 29 percent between 2000 and 2006. Average Reader would again (and incorrectly) just divide the percentage by the number of years to get 4.8 percent, but the real answer is a somewhat smaller 4.3 percent per year.
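
A quick script (mine, not the article's) confirms all three annualized rates:

```python
# Compound annual growth rates implied by the article's cumulative figures.
def annual_rate(total_increase, years):
    """Annualize a cumulative fractional increase via the geometric mean."""
    return (1 + total_increase) ** (1 / years) - 1

figures = [
    ("energy costs, 2000-2007", 0.72, 7),
    ("energy share of income, 2000-2006", 0.55, 6),
    ("incomes, 2000-2006", 0.29, 6),
]
for label, total, years in figures:
    print(f"{label}: {annual_rate(total, years):.1%} per year")
# energy costs, 2000-2007: 8.1% per year
# energy share of income, 2000-2006: 7.6% per year
# incomes, 2000-2006: 4.3% per year
```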

Dr. Paul offers an exercise using two examples. This is made somewhat more difficult than it should be because of what the author leaves out: the percentage of income spent on energy, either at the beginning or at the end of the timeframe. Like the word problems the author undoubtedly despised as a child, if you have 'n' unknowns you need enough known quantities to pin down all but the one you are solving for. The author only presents the CHANGE in the percentage spent on energy and the AVERAGE amount spent on energy.


Nevertheless, if we assume an annual salary of $23,000 in 2000:

[Table: household example starting at $23,000 in 2000 omitted]

The example above shows how income could increase by 29 percent, the percentage spent on energy could increase by 55 percent, and what that means in dollars and cents to the household. As a percentage of household income, energy goes from 8.4 percent (2000) to 12.9 percent (2006). Again, note that we do not know whether the data in the third column is representative or not -- since the author did not provide it.

To illustrate the major shortcoming of the article, consider the following equally valid yet very different example. This time let's assume that in 2000 the income is $70,000 instead of $23,000.

Again, the second example shows how income could increase by 29 percent, the percentage spent on energy could increase by 55 percent, and what that means in dollars and cents to the household. However, in this example the change is relatively small as a percentage of household income, going from 2.9 percent (2000) to 4.5 percent (2006). Average Reader STILL does not know whether the data in the third column is representative (or not), since the author did not provide it.
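
Here is a minimal sketch of the arithmetic behind both of Dr. Paul's examples. The starting energy shares (8.4% and 2.9%) are the assumed inputs that the article never provides -- which is the whole point; small rounding differences from the figures quoted above are expected:

```python
# Reconstruct the two example households from the article's two published
# growth figures. The starting energy shares are ASSUMED, not from the article.
INCOME_GROWTH = 1.29  # incomes up 29%, 2000 -> 2006
SHARE_GROWTH = 1.55   # energy share of income up 55%, 2000 -> 2006

def energy_dollars(income_2000, share_2000):
    income_2006 = income_2000 * INCOME_GROWTH
    share_2006 = share_2000 * SHARE_GROWTH
    return income_2000 * share_2000, income_2006 * share_2006, share_2006

for income, share in [(23_000, 0.084), (70_000, 0.029)]:
    e2000, e2006, s2006 = energy_dollars(income, share)
    print(f"${income:,} household: ${e2000:,.0f} -> ${e2006:,.0f} on energy "
          f"({share:.1%} -> {s2006:.1%} of income)")
```

Same published percentages, wildly different dollar impacts: that is exactly the parameter the author withheld.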

To make matters worse, the author then presents anecdotes as authoritative information. The author (apparently) interviewed two 'people on the street,' so to speak: a retired insurance salesman, Cyrus Francis, and a telephone company employee, Pamela Bradford.

Mr. Francis, who has a “7 mile-per-gallon motor home,” is quoted as follows:
“I used to not consider energy a major expense. Before, the major expenses were the mortgage or the car payment. But now, it’s the energy costs that are the major expense coming out of our budget.”

Could it be that Mr. Francis has no car payment and no house payment and lives in a motor home in his retirement? We do not know and cannot tell from the article. By presenting Mr. Francis’ case, the author has committed the Appeal to Emotion fallacy.

Ms. Bradford appealed to the Texas Railroad Commission (TRC) in 2005 over her high natural gas bills. Note that the TRC regulates natural gas prices in Texas. The striking part of this portion of the article is the following quote: '"Everybody is feeling the crunch," Bradford said.'

How can Pamela Bradford, who works for the phone company, possibly know what 'everybody is feeling'? This is a good example of the Appeal to Belief fallacy.

Adding yet another odd dimension to the article is the rather bizarre attempt to 'plug in' some gravitas via commentary from a UT Austin economics professor:

The article quotes Mike Brandl, a University of Texas at Austin economics professor: "What people don't realize is that the cost of living -- and especially energy costs -- are increasing faster than their pay."

While an interesting phrasing if the quote is accurate, IMHO it might have been better stated as: "What people don't realize is that the cost of living is increasing faster than their pay -- and that is largely due to energy costs."

Without stating at this time whether Dr. Paul or I agree with or doubt the assertion that the cost of living is increasing faster than pay, mine is the better phrasing. As shown here, and in the graph below (downloaded 01/04/08 -- it will change over time), lately overall inflation is more often than not driven largely BY energy costs.

[Graph: overall inflation with and without the 'food and fuel' numbers, downloaded 01/04/08]

We can ALMOST get some idea of the real net impact of rising energy costs by observing how low overall inflation is, both with and without the intrinsically linked 'food and fuel' numbers. This graph shows a lot of interesting data that perhaps we could discuss another time, but what it tells me first and foremost is two things:

1. The author 'cherry picked' and truncated his time-span (shades of the global warming alarmists!) to cover the timeframe where energy costs were moving from a DEFLATIONARY influence in early 2002 to a peak inflationary influence around 2005-06. While the end point may have been determined by the cutoff of available statistics, the starting point was clearly selected as convenient (perhaps subconsciously?) to suit the author's 'message'.

2. Clearly, if rising energy costs are driving inflation, then based on the data the author presented, total inflation is being held in check by factors other than energy, because the data the author offers as representative of energy cost growth far exceeds the net impact of all factors in the inflation rates shown.

Would it also be too simple to point out that if Average Reader merely overlaid the author's numbers for income growth on the same timeframe, it would become apparent that income growth probably approximated, and at times exceeded, the inflation rate? If, going by the numbers offered by the author, the cost of living did rise faster than pay overall, it couldn't have been by much.

Back to how Professor Brandl's inputs are further manipulated. I think the author, by either clever device or pure ignorance, crafts a remarkable series of three paragraphs here:

Brandl said the cost of other basic needs such as groceries has also continued to creep up. In mid-December, for instance, the government reported a 0.8 percent increase in consumer prices -- the highest rate of increase since November 2005.

Brandl said extra expenses can lead to increased dependence on consumer debt, which then becomes part of a vicious cycle and takes a toll on disposable income.

"We're seeing increasing consumer debt, with the idea that consumers are making up the difference between income and spending by putting more on their credit cards, or accessing the home equity of their homes," he said. "This is very dangerous."

They are seemingly intended to read "Wham!", "Bam!", "Thank-You Ma'am!", but in reality they are "Blip", "So", and "Oh, By the Way".

First the 'Blip': "In mid-December, for instance, the government reported a 0.8 percent increase in consumer prices -- the highest rate of increase since November 2005." By itself this statement means exactly "zip". Viewed against our inflation plot above, we see that the 'big increase' started from a LOW POINT. This could mean a lot in the long run, but it could also mean 'squat' in the long run. Right now, it is as meaningful as talking about the large decrease that happened a little while beforehand. It is a little snippet of data that is meant to imply a harbinger of bad things to come. I would characterize it, as employed, as a Hasty Generalization.

Which brings us to the 'So': "…extra expenses can lead to increased dependence on consumer debt…" So? Not changing your engine oil CAN lead to excessive wear. Yes, the logic rings true, but the article never demonstrates it is actually happening -- so what? As written (without data supporting the assertion), this could very well be considered a form of False Dilemma.

Now the OBTW: "increasing consumer debt, with the idea that consumers are making up the difference between income and spending by putting more on their credit cards"……"This is very dangerous."
The IDEA? If it is, tell us; if it isn't, don't bring it up! As to "This is very dangerous" -- we agree. Thanks for the tip. Another potential False Dilemma!

Please note, on my end there is no real contention with the Professor's sentiments and observations, beyond perhaps whether or not pay is rising apace with the cost of living. My complaint with this portion of the article is that, whether through an author's sloth or an editor's overreach, all meaningful relevance of what Dr. Brandl had to say is missing from the article.

Summary
This article was a pure 'puff piece' designed to evoke a particular emotional response. From the comments at the Star-Telegram's website, I'd say they've gotten Average Reader's number.

Dr. Paul thinks Math Teacher (in the comments) has a good idea for using the article and also plans to use it to teach statistics next semester.

In reviewing his textbook (Elementary Statistics - A Step by Step Approach, Bluman, pp. 16-19), Dr. Paul notes the following uses and misuses of statistics are cited:

1. Suspect samples - not specifying the size of the sample set; convenience samples; phone-in or mail-in surveys

2. Ambiguous averages - mean, median, mode, midrange?

3. Changing the subject - different values used to represent the same data. For example, if 6 million dollars is 3%, use the number which generates the desired effect. ("rose a whopping 6 million dollars" versus "rose only 3 percent")

4. Detached statistics - a statistic in which no comparison is made - "aspirin works 4 times faster" - faster than what?

5. Implied connections - implying connections between variables that may not actually exist

6. Misleading graphs

Dr. Paul finds this article guilty of at least numbers 2, 3 and 5. [He also informs me that Saturday's (01/05/08) 'Startlegram' got around to violating #6 on page 6C with some 'employment' graphs (note the scales).]

Mr. Dyer's article is just one contemporary illustration of how statistics are used, or rather 'misused', by the media... and how such articles can mislead readers and woefully misrepresent the truth.

Friday, January 04, 2008

Horton is the Who...Part II



As in: "Who is STILL the ranting Anti-globalist, Anti-capitalist, Anti-Western, Useful Idiot Lancet Editor with his panties in a knot?"

After this revelation, NOW will the Lancet finally get around to firing Horton?

Wednesday, January 02, 2008

Still Waiting For The Reunion in Hell

I can update this longer than you can hang around, 'El Comandante' -- and my Tito's is still 'action ready'.

Tuesday, January 01, 2008

Extreme Dust Test: M4 and Others

Hat Tip: Christian Lowe @ DefenseTech.org

Note: I would have published earlier but I had sent this out for a ‘peer review’ of sorts. Special thanks to Don Meaker at Pater’s Place for taking the time to review this over his holiday. Of course, any errors that remain are my own. --Thanks again Don!

DefenseTech was way out in front of the pack of media sources when it posted two pieces (see here and here) about a recent "Extreme Dust Test" that the Army conducted on the M4 and three other ostensibly 'competitor' rifles. The summary of results provided in the articles was 'interesting', to say the least, and I was particularly struck by the near-instantaneous eruption of comments on both posts calling for radical action and remedies. At the time, I believed the calls were clearly unwarranted given how much was unknown about the testing. I commented on the second article that I would defer forming an opinion on the results of the test until I had more data in hand. I wrote:

Frankly, having been a reliability engineer, and without verifiable complaints from the users in the field, I would not form an opinion on this until I studied the supporting data. For starters, I'd need to know the failure distribution, the specific conditions under which the failures occurred, and the failure modes ('jam' is a failure; a mode is a 'how' that has a 'why') to even begin to understand if there is a problem, and if there is, whether it lies with the weapon or the way it is employed.

If there is a problem, is there a fix that is easier and cheaper than buying new weapons? While history is rife with examples of Army 'Not Invented Here' syndrome, unless there is good evidence that Army weapon evaluators WANT to field problem weapons, I see no reason to doubt the testers at this time.

Well, from the subsequent response to my cautionary note, one would think I had called for the dissolution of the infantry! The M16 (and derivatives like the M4) has brought out more personal opinions and controversy than perhaps anything else in weapons acquisition (for any service, of any scale) except the 9mm vs. .45ACP pistol arguments. I think the M16/M4 is actually the more controversial issue of the two, because it usually inspires rhetoric on two fronts: reliability AND stopping power. Both controversies are rooted, I believe, in the fact that there is nothing more personal to the warrior than the weapon the warrior wields -- and there are a lot more warriors with rifles than with tanks, aircraft, or ships.

I decided to cast about for more information, but there really isn't a lot of publicly available information attributable to an authoritative or verifiable source. Among a lot of rather alarmist and inflammatory articles and postings (just Google "M4 Dust Test"), I found little objective reporting and only a few tidbits not already covered by DefenseTech that were 'seemingly' credible (if unverifiable), such as this piece from David Crane at Defensereview.com.

So, you want some (unconfirmed/unverified) inside skinny i.e. rumor on the latest test, something you most likely won’t find anywhere else, even when everyone else starts reporting about this test? Here ya’ go, direct from one of our U.S. military contacts—and we're quoting:

"1. Because the HK416 and M4 were the only production weapons, the ten HK416 and M4 carbines were all borrowed 'sight unseen' and the manufacturers had no idea that they were for a test. The 10 SCARs and 10 XM-8s were all 'handmade' and delivered to Aberdeen with pretty much full knowledge of a test. (The SCAR even got some addition help with 'extra' lubrication)

2. With the HK416, 117 of the 233 malfunctions were from just one of the 10 weapons.

Interesting stuff! ...and credible, given that only the HK416 and M4 are in 'full production'. But like I said: 'unverifiable' by me at this time. (Later on we'll see some things that tend to support Mr. Crane's 'contact'.)

None of the information 'out there' was of use in determining answers to any of the questions I had posed in my original comment, so I had reluctantly set aside the idea of further analysis and moved on. That is, I was moving on until Christian Lowe, the author of the original DefenseTech articles, generously asked if I was interested in a copy of a 'PEO Soldier' briefing he had been given. Of course, I said "yes please!".

After dissecting the briefing, I still have more questions than answers -- and some of those answers may never be released. The answers the briefing does provide are more philosophical than technical (but, all things considered, that is all right with me).

As it is, the briefing provides some insight into what the Army was doing and how much importance we should place on the results -- given 1) where the Army is in its test efforts and 2) what ends it hopes to satisfy in performing these tests. I think it also points to some of the things the Army is going to have to do in the future to get the answers it needs.

I have decided to present and parse the briefing, with my analysis of the data contained therein, slide by slide. I will limit my discussion and questions to ONLY those details surrounding the test articles, test conduct, test design, and test results that can be determined with certainty. I will speculate as little as possible, and when I do, it will be stated as an opinion supported by the facts in hand, or posed as a question that the briefing or testing raises in my own mind, and not given as an assertion of fact.

Overall Briefing Impressions
Before getting into the details of the briefing, let me provide my initial observations on the brief as a whole. First, from experience in preparing this kind of presentation, I can tell it was originally tailored for Executive/General Officer review. If this briefing wasn't intended for that purpose, I'll bet the person who put it together usually builds them for that purpose. The 'tell' is found in both the organization and the level of detail provided: much more detail and the briefer would be chewed out for wasting time; much less and the briefee would see too many open issues and the briefer would be thrown out for not being prepared. There are elements of the briefing that make it clear it was intended for someone not familiar with the nitty-gritty of testing or data analysis. I think it is a tossup whether those details were provided exclusively for public consumption or to provide perspective for a potentially nervous 'operator'. The brief is organized to tell the audience five things in a fairly logical flow:

1) what they were trying to accomplish,
2) what they did (1 and 2 are somewhat intermixed),
3) what was found,
4) what it means, and
5) what comes next.
The brief intermixes the first two 'a bit', and I would have arranged the information slightly differently. I suspect the briefing is slightly pared down for public consumption from its original format (you will see a revision number in the footer of the slides). There is only one slide that I would have composed very differently, and I will go over why when we see it later in the post. I hereby acknowledge that my preference may be due as much to Air Force-Army service differences as anything else. The slides are data-heavy, with a lot less gee-whiz PowerPoint than you'd find in an Air Force brief. It has, in other words, typical Army style and content.

The presentation totaled 17 slides and was created on 12 December 2007. Slide 1 is simply the title slide: “Extreme Dust Test” and Slide 17 is a call for questions. The meat of the briefing in between these slides will now be discussed.

What They Were Trying to Accomplish

Slide 2:

The first thing that strikes me about the 'Purpose' slide is that there is no mention whatsoever that, as has been reported, this particular test was performed to appease any 'outside' concern. Whether this relationship is omitted out of convenience, or is perhaps even not true, we cannot determine from the briefing. What IS clearly stated is that the Army is collecting information to help generate 'future requirements'. So perhaps this effort to develop new requirements is the first step in response to a certain Senator's call for a competition prior to acquisition of new rifles?

Most interesting is the point that this test is an adaptation of an earlier 'lubricant test', and that it is an ENGINEERING test and NOT an OPERATIONAL test. In subsequent slides we will see that this is clearly the case, and one wonders what useful data the Army hoped to gain from performing this test beyond using it as a starting point: the beginning of designing a meaningful dust test. It must be noted that both the reuse of a deterministic test design already in hand and the purpose of 'seeing what we can see' are completely within the analytical character of the Army, as described by the late Carl Builder (who in many ways admired the Army above all the other services) in his 1989 book Masks of War (Chapter 10).

Under 'Applicability' on this slide is a list of what this test did NOT address. In only a roundabout way does the slide state that the only real information the Army expected to acquire was related to 'reliability performance in extreme dust conditions'. And nowhere in the brief is it stated or implied that the Army expected definitive answers with direct implications in the operational arena. As we will see later, this was not so much a 'functional use' test as an 'abuse' test.

To my ‘Air Force’ mind, this test and analytical approach doesn’t really make a lot of sense unless the results are specifically for use in designing a meaningful test later. So I again turn to Builder who summarizes in Masks of War (at the end of Chapter 10) the differences between the questions the different services ‘pursue through analyses’:

Air Force: How can we better understand the problem – its dimensions and range of solutions?

Army: What do we need to plan more precisely – to get better requirements numbers?

And thus it does appear that the test objectives are wholly within a normal Army analytical approach, so I'll take the reasons given for the test at face value.

My Interpretation of the Army Objectives: “We intended to reuse a test we had already developed for another purpose to gain insight into only one facet (dust exposure) of weapon reliability by testing weapons in conditions well beyond what would ever be experienced in the field and if we learn something we will feed that knowledge into future requirements”.

What They Did


Slide 3

There are a couple of things on this slide that leap out at the viewer. First and foremost, while the test has been described as a "60,000 round test", that is a somewhat misleading and imperfect description. More accurately, it should be described as 10 trials of a 6000-round test performed using 10 different weapons of each type (later on we will find reason to alter the definition further). I assume that when the Army calls it 'statistically significant' they have the data to support that firing 6000 rounds through a weapon (apparently to a system's end-of-life) is a meaningful benchmark, and that performing the test 10 times on different weapons is enough to meet some standard (and unstated) statistical confidence interval; a back-of-envelope sketch of what such an interval looks like follows below. The second thing that leaps out is the simplified list of 'controls', knowing there are a host of potential variables in any test involving human input, as well as a lot of other material variables (like using common lots of ammunition and lubricants) to be controlled as possible confounding factors. The human variable in any experiment is difficult to control, which is why test engineers strive to automate as much as possible. I suspect the large number of rounds fired per test is designed to 'average out' the human variability as much as anything else.
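
As an aside, here is a minimal sketch (mine, not the Army's) of the kind of confidence interval a 60,000-round sample can support. The failure count is a placeholder, and the math assumes every round is an independent trial -- an assumption that progressive dust caking almost certainly violates:

```python
# Hypothetical example: 95% Wilson confidence interval for a per-round
# stoppage rate observed over 60,000 rounds. NOT the actual test data.
import math

def wilson_interval(failures, trials, z=1.96):
    """Approximate 95% confidence interval for a binomial proportion."""
    p = failures / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

low, high = wilson_interval(failures=600, trials=60_000)  # placeholder 1% rate
print(f"observed rate 1.00%, 95% CI: {low:.2%} to {high:.2%}")
```

Even with a sample that big, two weapons a few tenths of a percent apart are hard to separate once the rounds stop being independent trials.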

Slide 4

I found this slide highly illuminating as to the nuts and bolts of the test. First, it clearly shows the level of dust buildup on the weapons: a solid coating that would be impossible to accumulate on a weapon actually being carried. I guess you could hypothetically find one like this in garrison, until the CSM came around. I don't believe it would be a leap of faith to assert that just carrying the weapon would tend to clean it up and make it cleaner than what you see here. Second, the slide shows a technician/tester firing a weapon from a bench setup. Can you imagine the tedious, repeated firing of the weapons using selective fire in this environment? Now I am also wondering: how did they control the timing/gap of the 'resqueeze' sequence within one magazine and between magazines? How sensitive is each weapon design to continuous cycling, and how does that relate to the operational need? Is the human operator more adept at clearing some malfunctions than others? How many operators fired and reloaded each type of weapon? Did they rotate responsibilities among the different weapon types to remove any operator variables? (I told you there would be more questions than answers.)

Slide 5
Slide 5 is kind of an intermediate summary chart that ties together all the information already given to the briefee in slides 2, 3 & 4 and places it in front of them one more time before showing the results. This is not a bad idea in any briefing, but it is especially sound if you want to keep misunderstanding out of expectations and reactions.

What Was Found

Slide 6
This slide is the first indication to me that there was possibly a slide or two removed for this particular audience, because it contains the first reference to the "Summer of '07" test in the briefing.

I believe the difference between the M4 results in the Fall and Summer '07 tests is the most significant piece of information in the brief, because that disparity calls into question the entire test design as well as its execution. Consider: on the second go-around for the M4, C1 & C2 weapon stoppages were 4+ times greater, C1 & C2 magazine stoppages were 60+% higher, and the total of the two, as well as Class 3 stoppages, was nearly 2 times higher. If dissection of the test conduct (for either try) identifies no 'smoking gun' errors that would explain the difference, then I would suspect that the test design itself is flawed, OR that the variability of the units under test is far greater than anticipated. The only way to decide between the two without other data is to perform repeated testing, preferably again on all the weapon types, to determine if there is an identifiable pattern/distribution of outcomes. I would be surprised if the Army didn't have reams of other tests and test data to use in evaluating this test and eventually determining this point, but just from the data presented here, I don't see how the Army could reach ANY firm conclusions on the relative or absolute performance results, and from the 'barf box' at the bottom of the slide it looks like they are scratching their heads at this time.

I'm still bothered by not knowing the relative ratios of Class 1 and Class 2 malfunctions, and by the 'open-ended' time limit of the Class 2 definition: you could have (for example) 100 Class 1s at 3 seconds of downtime apiece, and in the raw counts that looks as bad as 100 Class 2s at 30 seconds of downtime apiece. The number of malfunctions is important as a single facet of performance, but the number of malfunctions times the downtime for each one is the TRUE measure.
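
A toy example of the point, with made-up numbers:

```python
# Toy illustration (my numbers, not the Army's): identical malfunction
# counts can hide very different impacts once the time to clear each
# class of stoppage is factored in.
cases = {
    "100 Class 1s": {"count": 100, "seconds_each": 3},   # quick immediate-action clears
    "100 Class 2s": {"count": 100, "seconds_each": 30},  # slower clears
}
for label, c in cases.items():
    total_downtime = c["count"] * c["seconds_each"]
    print(f"{label}: {total_downtime} seconds of total downtime")
# Same raw count; ten times the downtime.
```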

Quantitatively, because the test is suspect due to the non-repeatability of the M4 data, all you can say about the total malfunctions so far is that the M4 had many times more failures than the other carbines THIS TIME (again, in conditions well beyond what should ever be experienced in the field).

Slide 7
In the previous slide we saw the first breakdown of failure modes, in the discrete identification of 'magazine failures'. Slide 7 is the breakdown of the 'weapon' failure modes. Immediately we can tell the M4s did much worse than the other systems in 2 of the 8 modes: Failure to Feed and Failure to Extract. The M4 experienced a slight relative deficit in the Failure to Chamber category as well (perhaps that 'forward assist' feature is still there for a reason, eh?). The lack of information concerning the distribution of failures among the weapons of each type is a crucial piece of the puzzle, and it is missing throughout the brief. If the failures are caused by uneven quality control in the manufacturing, handling, or storage process, rather than by a design problem, it would probably show up in the distribution of failures within the weapon-type specimens (one weapon failing x times instead of x weapons failing once, for example). Also, so far in the brief we do not know whether the failures occurred late or early in the process, or late or early in each test cycle (we're getting closer, though).

Before moving on, we should note the main point of characterizing the data as ‘Raw Data’, so this information is clearly a first cut and not the final word as to what actually happened.

Slide 8
This is the one slide I would have presented differently. I would have first shown the data using a true linear scale with a zero baseline, to show the TRUE relative failure impact compared to the total number of rounds fired. This slide is good as a 'closeup' (other than the rounding error for the SCAR), using a hand-built pseudo-logarithmic scale that makes the relative failures between weapon types distinguishable. But on its own it makes ALL the systems look like they performed worse than they really did. Here's what I would have shown just before the Army's Slide 8:

[Author's chart: stoppages on a linear, zero-baseline scale, omitted]

Compare that with what it looks like when you provide a chart with absolutely no perspective on the failure numbers. Scary, huh?
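
If you want to reproduce the zero-baseline version yourself, a sketch along these lines would do it (matplotlib assumed; the stoppage counts are placeholders for illustration, NOT the test results):

```python
# Plot stoppages against the full 60,000-round denominator on a linear,
# zero-baseline axis. The counts below are hypothetical placeholders.
import matplotlib.pyplot as plt

weapons = ["M4", "HK416", "SCAR", "XM8"]
stoppages = [900, 250, 230, 130]  # NOT the actual test numbers
rounds_fired = 60_000

fig, ax = plt.subplots()
ax.bar(weapons, stoppages)
ax.set_ylim(0, rounds_fired)  # zero baseline, full-scale denominator
ax.set_ylabel("C1 & C2 stoppages (out of 60,000 rounds)")
ax.set_title("Failures in perspective: linear scale, zero baseline")
plt.show()
```

On that scale, every bar hugs the floor; that is the perspective the pseudo-logarithmic closeup throws away.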

Slide 9
At last! We have some distributions to look at. At first glance, one cannot determine whether the number of failures includes magazine failures (for a 'total' impact point of view) or covers just the 'weapon' failures. This is an interesting slide that got my hopes up at first, but I had to pare back my expectations a bit as I really looked at it. First off, it represents only the first 30K rounds of the 60K fired, so it is the 'early' data. I would like to see the numbers for the Summer '07 test overlaid as well, because I suspect (only) that the real difference in the M4's performance between then and now would be found in the last two cycles before every cleaning, rather than as random scatter or a uniform increase. And again, I would love to know the distribution of failures among the 10 M4s; now I would be particularly interested in firing cycles 15 and 23-25. The slide's conclusion (barf box) is, I think, about the only thing one can conclude definitively about this testing from the information given.

After I ran the numbers of failures shown and calculated the failure rates for the first 30K rounds, it became apparent that the only way to get the final success/failure rates in Slide 8 to jibe with the failure rates extracted from Slide 9 was if Slide 9 used the 'total' C1 & C2 failures and not just the 'weapon' failures.

If we didn’t already know about the wide disparity between the first and second M4 testing, we would probably conclude that all the other designs well outperformed the M4. But since we know the M4 did better once, how can we be sure the other designs won’t do worse in the future? As they say in the stock market, “past performance is no guarantee of future results”.

A "What if" Excursion: Ignoring the very real possibility that the test design itself and/or its execution may have been flawed, I would conclude that the gas-piston designs were not even stressed in this test and the direct-gas design (M4) was heavily stressed. I would conclude that the M4 is far more susceptible to failure when not cleaned properly. From the data, I would hypothesize that if another test sequence were run with a full cleaning every 600 rounds for the M4, its overall performance would improve dramatically, and that the other systems would not see any real improvement, largely because they aren't stressed by the test in the first place. Then, IF the M4's performance were radically improved, we would still be stuck with the question: what does absolute performance in the test mean in the 'real world'?

How much performance does one need, versus how much performance does one get for the dollars spent? That should be determined by experience in the field combined with expert judgment in estimating future use. We are talking about using 'systems engineering' to field what is needed. The process isn't perfect, but as has long been demonstrated: 'perfect' is the enemy of 'good enough'.

As I sit here typing, it occurs to me that for future requirements, the Army will also have to take into consideration changes to the size of the unit action: fewer soldiers in an action means a single failure has a larger impact on the engagement's outcome. Historically this has been a concern of the special operators, and now perhaps it is a more 'mainstream' concern?


In looking at patterns, I thought it would be helpful to look at this data in several different ways. I first backed out the data from this chart and put it in a spreadsheet.

[Spreadsheet of extracted failure data omitted]
NOTE: I may have made some off-by-one errors here and there in decomposing the plot provided, due to the coarseness of the plot lines and markers, but I think I've got good enough accuracy for this little exercise.

The first thing I did with the data was to see what portion of the total performance the results for the first 30K rounds truly represented. By my 'calibrated eyeball' extraction of the data, we find the following:

[Table: share of each weapon's total failures occurring in the first 30K rounds, omitted]

As shown, most of the failures (C1 and C2) experienced by the XM8 and SCAR occurred in the first half of testing. According to the data, only the HK416's performance significantly degraded in the second half of testing, with more than two-thirds of its failures there. These distributions may be another indication of a test process problem, because ALL weapons were tested to 'end of life', and that is when one would usually expect marginally MORE problems, not fewer. In any case, we see that there is a significant amount of data that is NOT represented in the plot on Slide 9 and unfortunately cannot be part of our detailed analysis.

Detailed Examination of Data
I first wondered what the failure patterns in Slide 9 would look like expressed in relation to the numbers of rounds fired over time (again, to keep proper perspective on the numbers):

[Chart: cumulative failures vs. total rounds fired, omitted]

Not very illuminating, is it? Looks like everybody did well, but what are the details? So I then decided to 'zoom in' and look only at the failures as a percentage of rounds fired (expressed in cycles) over time:

[Chart: failures as a percentage of rounds fired, by firing cycle, omitted]

Now this is more interesting. For the first 30K rounds, the SCAR was for a time the worst 'performer'; then it settled down, and its total performance approached that of the HK416 and XM8. This suggests there is merit to the quote mentioned earlier indicating the SCAR had a change to its lubrication regimen in mid-test (in my world this would trigger a 'retest', by the way). If true, these numbers suggest that the SCAR would have done as well as or better than the other two top performers in this test had the later lubrication schedule been implemented from the start.

The M4 patterns point to something interesting as well. Cycle 15 and Cycles 23-25 appeared odd to me earlier because of the spike in the number of failures. While I would not rule out the Cycle 15 behavior as part of possible normal statistical variation (again: "need more data!"), Cycles 23-25 appear out of sorts because of the pattern of failure. Keep this pattern in mind for later observations. If we knew the number or rate of failures increased after the last cycle shown, I might conclude it was part of a normal trend; but since at the 30K-rounds-fired point the failure rate is within 2/10ths of 1 percent of the final failure rate, we know the failures did not 'skyrocket' as the second half of testing progressed. We are somewhat stymied, again, by how much we do not know from the data provided.

Dust and Lube (Five Firing Cycles Per Dust and Lube Cycle)
Next, I thought it might be helpful to overlay each weapon's performance across the 5 (minor) 'Dust and Lube' (DL) cycle-series to see how repeatable (or variable) the cycle performance was. Keep in mind that at the end of Cycles 2 and 4, a 'full cleaning' occurred. Each DL series comprises 5 firing cycles of 120 rounds.

[Charts: failures per Dust-and-Lube cycle series, by weapon type, omitted]
Before we look at the other results, note the 'outlier' pattern of the M4's Dust and Lube Cycle 5 (DL5), particularly firing cycles 23-25 (the last three nodes of DL5). If the results of firing cycles 23 & 24 of the M4 testing were found to be invalid, it would lower the overall failure rate through the first 30K rounds by about 20%. The M4's sensitivity to cleaning also makes me wonder how the variability of grit size and lubrication, and any interrelationship between them, would have contributed to failure rates. Again, I would be very interested in knowing the failure modes and distribution for those particular firing cycles, as well as for Firing Cycle 15.
The M4's sensitivity (within the bounds of this test) is clearly indicated to be far greater than the other systems'. In fact, because the numbers of failures for the other three weapons are so small (versus saying the M4's are so large), I suspect that, given the number of potential confounding variables we have mentioned so far, the failures of the SCAR, HK416, and XM8 in the first half of testing approach the level of statistical 'noise' (allowing that there may still have been some interesting failure patterns for the HK416 in the second half).

Full Clean & Lube (10 Firing Cycles per Cleaning Cycle)
Looking at the same data by the major 'full clean and lube' cycles only reinforces, I think, the shorter-interval observations. Because the 30K-rounds data limit truncates the 3rd major 'clean and lube' cycle, it shows up as 2 1/2 cycles in the plots:

[Charts: failures per full clean-and-lube cycle, by weapon type, omitted]
A different view of this data will be seen again later in the post. The briefing itself moves on to address "Other Observations".

Slide 10
So, well into the briefing, we now learn that all the weapons were essentially worn out (as units) by the 6000 rounds-fired mark. Since this test was not for the purpose of determining the maximum operating life of each weapon type, the test should now be described as an "X-number-of-rounds Extreme Dust Test with 10 trials using 10 different weapons", with "X" being the number of rounds fired before the first bolt had to be replaced. Once one weapon is treated differently from the rest, further testing cannot reasonably be considered part of the same test. One wonders how each ruptured case was cleared, and whether each was considered a major or minor malfunction. The disparity in the number of occurrences also makes me wonder about the distribution of these failures leading up to 6000 rounds/weapon, and whether they have anything to do with only showing the failure events of the first half of the test in Slide 9.

I would be hesitant to write that, based on this data (in this test), the HK416 was three times worse than the M4, because the absolute number of events is so small. I would, however, be more interested in the meaning of such a disparity the closer it comes to an order of magnitude; so (again, within the context of the test only) I find the difference between the SCAR and the M4 of 'likely' interest, and the difference between the M4 and the XM8 'definitely' of interest.

Slide 11
The only thing I found really interesting in this slide was that while all the weapons had pretty much the same dispersion pattern at the end of the test, the XM8 was quite a bit 'looser' at the start. What this means, I have no idea, other than that they all wore in about the same. From the XM8's dispersion performance, my 'inner engineer' wonders if perhaps there were some 'tolerance management' or other novel aspects of the XM8 design that contributed to its reliability performance.

What it Means

Slide 12
Nearly 5000 words into this analysis, and the Army pretty well sums it up in one slide.

Slide 13
The Army begins to assert here (I think) that they recognize they will have to construct a more operationally realistic test in the future, and they are starting to identify and quantify what the operational environment looks like.

Since the slide now couches the need in terms of the individual Soldier, here's what the same data we've been looking at looks like when expressed as an AVERAGE per rifle, by weapon type (clarification: the X-axis label is cumulative rounds fired, broken down by cycle):

[Chart: average C1 & C2 stoppages per rifle, by weapon type, omitted]
Using Slide 13 for perspective, we can view this data and say that IF the Extreme Dust Test data is valid and representative of the real world (and we have every reason to believe the real world is a more benign environment), then the largest average disparity we might find in C1 & C2 stoppages between any two weapons, for an engagement that consumed one basic load, would be less than 1 stoppage difference for every TWO engagements. If for some reason the soldiers started shooting 2 basic loads on average, the greatest average difference in the number of stoppages between different weapon types for one engagement would be about 1 1/2 stoppages. (A small script illustrating this arithmetic follows below.)

Because of the absence of detailed failure data by specific weapon, failure, and failure mode, we cannot determine whether this information is 'good' or 'bad', even if the data were representative of the 'real world'. If, for instance, the M4 (or, as has been noted, possibly the HK416) had one 'bad actor' in the bunch, it would have completely skewed the test results. If we cannot even tell whether THIS difference is significant, we STILL cannot assert that any one weapon is 'better' than another, even within the confines of this test. All we still KNOW is that the M4 experienced more failures. The good news is, the Army will have a better idea of what they need to do to perform a better test next time.
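
Here is the arithmetic as a small script. The two per-round stoppage rates are hypothetical placeholders chosen only to show the mechanics, since per-weapon rates can't be firmly asserted from the data we have:

```python
# Convert a difference in per-round stoppage rates into expected stoppages
# per engagement. The rates below are hypothetical, NOT the test results.
BASIC_LOAD = 210  # rounds: 7 magazines x 30 rounds

def expected_stoppages(rate_per_round, basic_loads=1):
    """Expected C1 & C2 stoppages for an engagement consuming `basic_loads` loads."""
    return rate_per_round * BASIC_LOAD * basic_loads

worst, best = 0.0045, 0.0020  # placeholder per-round stoppage rates
for loads in (1, 2):
    diff = expected_stoppages(worst, loads) - expected_stoppages(best, loads)
    print(f"{loads} basic load(s): expected difference = {diff:.2f} stoppages")
```

With those placeholder rates, the gap works out to about half a stoppage per basic load, which is the flavor of the 'less than 1 per TWO engagements' statement above.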

Slides 14 & 15
Here's more "real world" perspective to think about when we view the test data. If someone has reason to doubt the CSMs, that is their business. I see nothing in the test design or test data that would invalidate their observations.
What Comes Next

Slide 16
(At Last!)
There's something here for everyone: 'figure out what the test meant' (if anything), 'use the info' to build a better test, and 'improve the breed or buy new' if needed. Not mentioned in the slide, but just as important, is the obvious: 'Don't forget to clean your weapon!'

Works for me.

The only thing I fear coming out of these test results is that, out of the emotion behind the concern, this test's importance will be blown out of proportion within the total context of what a Soldier needs a weapon to do. I can see us very easily buying the best-darn-dust-proof-rifle-that-ever-t'was… and then spending the next twenty years worrying about it corroding in a jungle someplace.

Postscript
I know this type of analysis always brings out the 'don't talk statistics to me -- this is life and death!' response. But the hard truth is that, as in war itself, ALL weapons design and acquisition boils down to some cost-benefit equation that expresses in statistical terms 1) what contributes the most damage to the enemy, 2) in the most (numbers and types of) situations, while 3) getting as many of our people home safe as possible, 4) within a finite dollar amount. Everyone does the math in their own way, and everyone disagrees with how it should be done. Just be glad you're not the one responsible for making those cost-benefit decisions. I know I am.