TRANTECH Statistics

TRANTECH has undergone rapid advancements since 1999. As such, as of May 2002, TRANTECH statistics are being updated. Further upgrades are still being tested and, thus, more updates to this page may be required.

The most recent upgrades to TRANTECH have further improved the forecast track errors, which now exhibit minor skill at most time steps. While this is the most noteworthy advance, the most noticeable change is the removal of the multi-scheme technique. This was actually done in a previous upgrade but, in comparison to the pre-1999 version, it is the most obvious change. The multi-scheme technique caused confusion and yielded only marginal improvements with the Ensemble method. Because the Ensemble method did show some improvement, this approach may be revisited in the future. For the time being, however, that improvement was reduced to within the noise, making it negligible, and it does not warrant the added confusion. The latest TRANTECH version also implements a whole host of other, more technical upgrades. These are noticeable only in that they have helped reduce the TRANTECH model errors. That is, the gains are not largely systematic, with certain forecasts exhibiting marked accuracy increases; rather, minor across-the-board track forecast improvements are seen.

Still, these track errors are not "good": the longer-range track errors still fail to outperform the baseline, CLIPER, and show only the most subtle improvement over the previous TRANTECH version. The track errors can be called "reasonable" at this point, but not "good". On the other hand, the improvements are enough that one might consider the shorter-term track forecast errors "good". Even these are not fabulous, only nearing the 10% skill mark in the 12-hour forecast, but the short-term track errors do show considerable improvement over the old TRANTECH version.

As for the intensity forecasts, they have held about the same since the previous statistical analysis, especially in the longer-range forecasts, while the shorter-range forecasts have shown a small improvement. Besides, "staying the same" is fine, as the intensity forecast already exhibited excellent skill at all time periods against the baseline, SHIFOR. So, let's take a look at the good and bad points/statistics one by one, starting with the worst of the statistics:

1) Though improved, the worst statistic remains the overall track forecast error. CLIPER is the baseline against which all other models are measured, and an inability to beat CLIPER indicates no skill. Be aware that CLIPER is a fairly high-quality baseline; in some basins around the world the dynamic models still cannot beat it (a study as recent as the mid-1990s showed CLIPER outperforming the best dynamic model in the South Pacific). In the Atlantic Basin, however, the dynamic models do beat CLIPER. Unfortunately, in its current state, TRANTECH overall shows only minor track skill, and only in the shorter-range forecast periods. Still, this is an improvement over the previous statistical analysis, in which TRANTECH showed absolutely no skill. The following table shows the errors of TRANTECH compared to CLIPER in both the old and new statistical analyses. Note that the new analysis was performed on a much larger sample. Because the sample is different, the CLIPER statistics are different as well and are therefore relisted for the new study. Errors are in nautical miles. The old statistics cover all storms from 1989 to 1996 (except the Halloween storm of 1991, for which there was no initialization data); the new statistics cover all storms from 1940 to 2000. Note that the new runs were performed from "best track" data rather than "real-time" data; as such, the initialization is presumed to be error free:
 
 

Forecast Time  Old TRANTECH Ensemble  Old CLIPER  New TRANTECH  New CLIPER  Old skill  New skill
00             16.0                   16.0        0.0           0.0         0.0        0.0
12             59.6                   59.9        43.8          48.4        0.5%       9.6%
24             122.6                  121.7       107.7         114.5       -0.7%      5.9%
36             193.7                  190.0       181.0         187.4       -1.9%      3.4%
48             266.5                  259.7       256.6         260.9       -2.6%      1.6%
72             402.4                  398.3       401.4         400.1       -1.0%      -0.3%
96             526.4                  xxxxx       528.0         xxxxx       xxxxx      xxxxx
120            634.1                  xxxxx       635.6         xxxxx       xxxxx      xxxxx

As you can see, the most important result in this table is minor skill through all of the early time periods. This is in contrast to the earlier statistics, which showed positive skill only in the 12-hour forecast, and at 0.5% that skill is within the noise. "Noise" is about 4 nautical miles. (This was incorrectly represented in the original post of the statistics, where "noise" was given as 8 nm. Originally, "noise" was defined as a measurement error of 0.05 degrees in both latitude and longitude for both the CLIPER and TRANTECH points, which summed to 8 nm. However, this is the MAXIMUM measurement error; the standard error, or "noise", is half of this, or 4 nm.) The new TRANTECH statistics maintain skill outside the bounds of noise right up to 48 hours. By the same measure, the old TRANTECH methodology was not truly "bad", as its negative skill exceeded the noise only at H+48 and, just barely, at H+72. So the old scheme is not as bad as it looks, yet it is still not good, showing no improvement over CLIPER. The new scheme, on the other hand, has skill exceeding the noise most of the time, with its only non-skill forecast period, H+72, falling within the noise. Thus, officially, the new TRANTECH scheme shows small but significant skill in the track forecast in the early periods.
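The skill and noise comparisons above can be checked directly against the table. The Python sketch below assumes that skill is the percentage reduction in error relative to the baseline, (CLIPER - TRANTECH) / CLIPER x 100; this definition is not stated on this page, but it reproduces the tabulated skill values to within about 0.1 percentage point. The function names and the three labels are illustrative only and are not part of TRANTECH.

```python
# Track errors in nautical miles, copied from the table above.
# Each entry: forecast hour -> (TRANTECH error, CLIPER error).
OLD_TRACK = {12: (59.6, 59.9), 24: (122.6, 121.7), 36: (193.7, 190.0),
             48: (266.5, 259.7), 72: (402.4, 398.3)}
NEW_TRACK = {12: (43.8, 48.4), 24: (107.7, 114.5), 36: (181.0, 187.4),
             48: (256.6, 260.9), 72: (401.4, 400.1)}

NOISE_NM = 4.0  # standard measurement error ("noise") quoted in the text


def skill_percent(model_error, baseline_error):
    """Assumed definition: percent error reduction relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


def classify(model_error, baseline_error, noise=NOISE_NM):
    """Label a forecast hour as 'skill', 'no skill', or 'within noise'."""
    margin = baseline_error - model_error  # positive: model beats baseline
    if abs(margin) <= noise:
        return "within noise"
    return "skill" if margin > 0 else "no skill"


def summarize(errors):
    """Return {hour: (skill in percent, rounded to 0.1, label)}."""
    return {hour: (round(skill_percent(m, b), 1), classify(m, b))
            for hour, (m, b) in errors.items()}


if __name__ == "__main__":
    for name, errors in (("old", OLD_TRACK), ("new", NEW_TRACK)):
        for hour, (skill, label) in summarize(errors).items():
            print(f"{name} H+{hour}: {skill:+.1f}% ({label})")
```

Run as a script, this prints one line per forecast hour for each scheme. Comparing the error difference in nautical miles against the 4 nm noise, rather than comparing skill percentages, follows the way the noise is described in the paragraph above.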

2) Though not a separate and specific item like the track discussion above and the intensity discussion below, it is also necessary to note that ongoing work on TRANTECH may further improve these track forecasts. Readers should keep in mind that these statistics are subject to change and represent a worst-case scenario for the average statistics. Keep in mind the nature of TRANTECH... It has been proven via TRANTECH, and it makes dynamical sense, that intensity and track are linked. Because CLIPER does not perform an intensity prediction (and vice versa... SHIFOR does not produce its own track forecast), it is theoretically impossible for TRANTECH to NOT outperform CLIPER in its average error at all time steps. As such, the track statistics above reflect the development work still remaining on TRANTECH more than a true failure of TRANTECH to produce reliable longer-range forecasts. The problem is in the research... that is, determining what exactly is being done incorrectly or, more likely, simply not in the highest-quality manner. It is suspected that the problem lies in the speed and direction calculations (especially direction). For a variety of reasons, it is difficult to apply directional changes from analog storms to the actual storm being predicted. This is true, to a much lesser degree, for the motion speed as well. We have attempted to apply this analog data in a variety of ways, but it is likely that the best method simply has not yet been determined. There are also other, smaller adjustments to TRANTECH that could lead to minor improvements. For example, a strict statistical analysis to determine proper weighting functions for the various parameters has never been performed; the weights have merely been set empirically. Because this aspect has been developed over many years, it is likely rather close to what is statistically correct, but it may not yet be perfect. So, some further adjustments may yield minor improvements. All in all, the point is simply this... the statistics presented above are a snapshot in the development life cycle of TRANTECH. When reliable, updated statistics are ready, this page will be updated. And, in fact, improvements in TRANTECH's statistics should be expected.

3) As for intensity, frankly, TRANTECH is dominating the field. Due to differences in our databases and some subtleties in the statistics, the TRANTECH results cannot be directly compared to the results of the SHIPS and GFDI models shown on TPC's web site. For intensity forecasting, SHIFOR is the baseline. SHIPS and GFDI beat SHIFOR only marginally in some of the periods. TRANTECH does not beat SHIFOR by vast margins, but it does win by more than SHIPS and GFDI and with greater consistency. The table below lists, for each forecast hour, the intensity errors in knots of the old TRANTECH scheme, the old SHIFOR, the new TRANTECH scheme, and the new SHIFOR, followed by the old skill and the new skill. Note that the old TRANTECH scheme actually appears to be slightly better in this table. However, the sample sets are quite different, which may contribute to this and, regardless, beyond H+24 the new TRANTECH scheme shows significant skill.
 
Forecast Time  Old TRANTECH  Old SHIFOR  New TRANTECH  New SHIFOR  Old skill  New skill
00             3.6           3.6         0.0           0.0         0.0        0.0
12             7.8           8.1         6.1           6.3         3.7%       3.2%
24             11.1          12.2        10.7          11.4        9.0%       6.1%
36             13.6          15.1        14.0          15.2        9.9%       7.9%
48             15.5          17.4        16.4          18.1        10.9%      9.4%
72             18.2          20.2        19.5          21.7        9.9%       10.1%
96             19.6          xxxx        21.1          xxxx        xxxx       xxxx
120            19.6          xxxx        22.4          xxxx        xxxx       xxxx

Now, look at the next table which shows skill, in percentage, for the old and new TRANTECH scheme as well as SHIPS and GFDI. Use caution with this table as huge differences in the sample datasets, as noted above, make this somewhat unreliable. It is simply being presented as the best available comparison between the models. Also, note that the old TRANTECH versus SHIPS and GFDI samples only have small differences. As such, the comparison here is not totally invalid, merely somewhat imperfect.
 
Forecast Time  Old TRANTECH  New TRANTECH  SHIPS  GFDI
00             0%            0%            0%     0%
12             3.7%          3.2%          1.2%   -13.4%
24             9.0%          6.1%          3.5%   -1.8%
36             9.9%          7.9%          7.1%   0.7%
48             10.9%         9.4%          7.1%   1.8%
72             9.9%          10.1%         2.8%   10.0%

While there are some questions over the validity of this comparison, it is clear that TRANTECH, in both its old and new schemes, is the dominant model. From H+12 to H+48 neither SHIPS nor GFDI challenges either TRANTECH forecast. Only at H+72 does GFDI surpass the old TRANTECH, and only barely, by 0.1%, while NEVER surpassing the new TRANTECH scheme!
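The comparison drawn from the skill table can be verified mechanically. The short Python sketch below copies the skill percentages from the table and reports which model leads at each forecast hour and at which hours one model is beaten by another. The dictionary layout and function names are illustrative only, and the caution about the differing sample datasets applies equally to anything computed this way.

```python
# Intensity skill (percent versus SHIFOR), copied from the table above.
SKILL = {
    "Old TRANTECH": {12: 3.7, 24: 9.0, 36: 9.9, 48: 10.9, 72: 9.9},
    "New TRANTECH": {12: 3.2, 24: 6.1, 36: 7.9, 48: 9.4, 72: 10.1},
    "SHIPS": {12: 1.2, 24: 3.5, 36: 7.1, 48: 7.1, 72: 2.8},
    "GFDI": {12: -13.4, 24: -1.8, 36: 0.7, 48: 1.8, 72: 10.0},
}


def leader_by_hour(skill):
    """Return {hour: name of the model with the highest skill at that hour}."""
    hours = sorted(next(iter(skill.values())))
    return {h: max(skill, key=lambda model: skill[model][h]) for h in hours}


def hours_where_beaten(skill, model, rival):
    """Return the forecast hours at which `rival` has higher skill than `model`."""
    return [h for h in sorted(skill[model]) if skill[rival][h] > skill[model][h]]


if __name__ == "__main__":
    for hour, name in leader_by_hour(SKILL).items():
        print(f"H+{hour}: {name}")
    print("GFDI beats old TRANTECH at:",
          hours_where_beaten(SKILL, "Old TRANTECH", "GFDI"))
```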

4) *As with point #2, the numbers herein may change as upgrades are ongoing. An important point to address in the intensity forecast is land interaction. It might be argued that forecasts containing land interaction erroneously favor TRANTECH, because SHIFOR does not handle land interaction well. This argument is incorrect for several reasons. First, suppose the argument were true; is TRANTECH's ability to handle land interaction better than SHIFOR an "unfair advantage"? These forecasts are important, especially in cases of landfall and re-emergence over water, and any forecasting superiority is legitimate. Second, it is not clear that TRANTECH does indeed handle land interaction better. Anyone closely following TRANTECH forecasts can attest that TRANTECH has some difficulties with its intensity forecasts when there is land interaction. Some improvements have been made in this area, so it is likely that TRANTECH does now outperform SHIFOR with land interaction, but TRANTECH is clearly still imperfect in this respect. Finally, the old statistics (the new statistics have not been examined in this respect) indicate that land interaction is NOT the basis for TRANTECH's success. Recall that storms were previously stratified for the sake of determining any pattern in track errors. The groupings with the LEAST likely land interaction were the cases where TRANTECH performed the best!! The statistics show this pattern superbly. At H+72 the old TRANTECH scheme shows 20% skill in intensity forecasts for storms hitting Canada (i.e., storms with considerable land-free tracks).

5) All these statistics, with their somewhat mixed results (though more unequivocally favoring TRANTECH than in the earlier statistics), raise some questions. Clearly, TRANTECH is a valuable tool, especially in intensity forecasting. But when it comes to track forecasting, one must know when to trust TRANTECH and when to discard its forecast outright. With this in mind, it seems helpful to examine how TRANTECH performed in some of the critical cases from 1940 to 2001. The following table shows the track and intensity errors of the new TRANTECH scheme versus CLIPER and SHIFOR for various storms:

***THIS TABLE, AND THE INFORMATION DISCUSSING IT THEREAFTER IS NOT YET UPDATED. IT WILL BE UPDATED SOON***
 

STORM TRANTECH 48hr Track Error TRANTECH 48hr Intensity Error CLIPER 48hr Track Error SHIFOR 48hr Intensity Error TRANTECH 72hr Track Error TRANTECH 72hr Intensity Error CLIPER 72hr Track Error SHIFOR 72hr Intensity Error
Hurricane Hugo - 1989 354 23 352 37
Hurricane Bob - 1991 615 21 608 27
Hurricane Andrew - 1992 400 33 405 42
Hurricane Emily - 1993 423 17 458 18
T.S. Alberto - 1994 365 22 378 27
Hurricane Allison - 1995 501 10 449 12
Hurricane Erin - 1995 373 27 385 39
Hurricane Luis - 1995 361 17 380 25
Hurricane Marilyn - 1995 383 15 458 17
Hurricane Opal - 1995 470 31 460 27
Hurricane Roxanne - 1995 324 21 330 19
Hurricane Bertha - 1996 337 10 368 24
Hurricane Fran - 1996 331 19 308 27

These results are fairly impressive.  Out of 13 critical storms, TRANTECH showed skill on the track forecasts 8 times.  It lost by more than the margin of error (about 8nm) in only Allison, Opal, and Fran; and only one of these was inexplicably poor.  The Allison and Opal tracks clearly have similarities, and the Allison track showed a major degradation with the latest upgrade (indicating simple instability), while the Opal error barely exceeded the noise.  It is difficult to conjecture why the Fran forecasts were so poor.  The two "big time" storms in this crew, Hugo and Andrew, both, unfortunately, degraded with the recent upgrade.  Still Hugo shows little/no degradation, while Andrew continues to show skill, just to a lesser degree.  As for intensity, TRANTECH wins 11 of 13 times (an improvement since the recent upgrade).  The only losses were Opal and Roxanne.  Opal's errors are a bit disturbing at 31kts, 4kts worse than SHIFOR, but Roxanne's are within reason, only 2kts worse than SHIFOR.  The two "big time" storms exhibited significant skill with TRANTECH; 38% skill over SHIFOR on Hugo, and 21% skill over SHIFOR on Andrew (this has improved with the latest upgrade).
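The counts quoted above can be recomputed from the storm table. The Python sketch below assumes that the four values in each row are, in order, the TRANTECH track error (nm), the TRANTECH intensity error (kt), the CLIPER track error (nm), and the SHIFOR intensity error (kt). That ordering is inferred from the discussion (for example, Opal at 31 kt versus 27 kt for SHIFOR); the forecast hour the values belong to is not indicated in the rows and is left unspecified here. The field and function names are illustrative only.

```python
from collections import namedtuple

# Assumed column order; see the note above. Track errors in nautical miles,
# intensity errors in knots.
Storm = namedtuple("Storm", "name tt_track tt_int cliper_track shifor_int")

STORMS = [
    Storm("Hugo", 354, 23, 352, 37),
    Storm("Bob", 615, 21, 608, 27),
    Storm("Andrew", 400, 33, 405, 42),
    Storm("Emily", 423, 17, 458, 18),
    Storm("Alberto", 365, 22, 378, 27),
    Storm("Allison", 501, 10, 449, 12),
    Storm("Erin", 373, 27, 385, 39),
    Storm("Luis", 361, 17, 380, 25),
    Storm("Marilyn", 383, 15, 458, 17),
    Storm("Opal", 470, 31, 460, 27),
    Storm("Roxanne", 324, 21, 330, 19),
    Storm("Bertha", 337, 10, 368, 24),
    Storm("Fran", 331, 19, 308, 27),
]

MARGIN_NM = 8  # margin of error quoted in the discussion of this table


def track_wins(storms):
    """Storms where the TRANTECH track error is below the CLIPER error."""
    return [s.name for s in storms if s.tt_track < s.cliper_track]


def clear_track_losses(storms, margin=MARGIN_NM):
    """Storms where TRANTECH trails CLIPER by more than the margin of error."""
    return [s.name for s in storms if s.tt_track - s.cliper_track > margin]


def intensity_losses(storms):
    """Storms where the TRANTECH intensity error exceeds the SHIFOR error."""
    return [s.name for s in storms if s.tt_int > s.shifor_int]


def intensity_skill(storm):
    """Percent intensity skill over SHIFOR, rounded to a whole percent."""
    return round(100.0 * (storm.shifor_int - storm.tt_int) / storm.shifor_int)


if __name__ == "__main__":
    print("track wins:", len(track_wins(STORMS)), "of", len(STORMS))
    print("clear track losses:", clear_track_losses(STORMS))
    print("intensity losses:", intensity_losses(STORMS))
```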

SUMMARY: All in all, the latest TRANTECH upgrade has resulted in some fairly significant track improvements. The most significant issue remaining unresolved is the forecast degradation in time. Because of divergence, track errors in the short term should expand in time. Thus, larger short-term errors should lead to larger long-term errors. This argument makes it difficult to understand how TRANTECH's skill can decrease and go negative at H+72. The decrease itself is not so disturbing, as skill is a percentage. That is, the forecast with greater short-term errors could indeed diverge further from a superior forecast, and yet the skill of the superior forecast could still decrease if the gap between the two does not grow as fast as the errors themselves. For example, say that Forecast A has a 12-hour error of 45 miles while Forecast B has an error of 50 miles; then, at 24 hours, Forecast A has an error of 80 miles while Forecast B has an error of 87 miles. The divergence principle holds in that Forecast B was worse by 5 miles at 12 hours and worse by 7 miles at 24 hours. However, in terms of skill, Forecast A sees its skill relative to Forecast B decrease from 10.0% to 8.0% from 12 to 24 hours. Thus, in this example one can see how skill can decrease even as the forecasts diverge. BUT, what we see with TRANTECH is NOT this. The error of TRANTECH increases more rapidly than that of CLIPER, even though its initial forecast period is superior. So, TRANTECH not only loses its skill, but shows negative skill (albeit negligible) by 72 hours. The big question is, why? This goes against the common sense of diverging forecasts. Thus, while disappointing in one respect, it is exciting in another: because the divergence principle should hold, there is reason to believe that TRANTECH is simply doing something improperly and that future upgrades will lead to major improvements in the long-range forecast. At present, though, these improvements have not been realized. Presently, TRANTECH looks excellent in the very near-term forecast, decent in the mid-range (36-48 hours), and only "reasonable" (in line with CLIPER) at 72 hours and beyond.
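The Forecast A / Forecast B example, and its contrast with the actual TRANTECH numbers, can be worked through in a few lines of Python. This is only a sketch: it assumes skill is the percent error reduction relative to the reference forecast, and it treats "divergence" simply as an error gap that never shrinks from one forecast hour to the next. The function names are illustrative only; the TRANTECH and CLIPER values are the new track errors from the track table earlier on this page.

```python
def skill_percent(model_error, baseline_error):
    """Assumed definition: percent error reduction relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


def gaps(model, baseline):
    """Baseline error minus model error at each hour (positive: model is better)."""
    return [round(baseline[h] - model[h], 1) for h in sorted(model)]


def skills(model, baseline):
    """Skill of the model against the baseline at each hour, rounded to 0.1%."""
    return [round(skill_percent(model[h], baseline[h]), 1) for h in sorted(model)]


def is_diverging(gap_list):
    """True if the gap never shrinks from one forecast hour to the next."""
    return all(later >= earlier for earlier, later in zip(gap_list, gap_list[1:]))


# Hypothetical forecasts from the example in the text (errors in miles).
FORECAST_A = {12: 45.0, 24: 80.0}
FORECAST_B = {12: 50.0, 24: 87.0}

# New TRANTECH and CLIPER track errors (nm) from the track table.
NEW_TRANTECH = {12: 43.8, 24: 107.7, 36: 181.0, 48: 256.6, 72: 401.4}
NEW_CLIPER = {12: 48.4, 24: 114.5, 36: 187.4, 48: 260.9, 72: 400.1}

if __name__ == "__main__":
    print("A vs B gaps:", gaps(FORECAST_A, FORECAST_B))
    print("A vs B skill:", skills(FORECAST_A, FORECAST_B))
    print("TRANTECH vs CLIPER gaps:", gaps(NEW_TRANTECH, NEW_CLIPER))
    print("diverging:", is_diverging(gaps(NEW_TRANTECH, NEW_CLIPER)))
```

In the hypothetical example the gap widens while the skill falls. In the TRANTECH numbers the gap itself narrows after 24 hours and changes sign by 72 hours, which is the behavior the summary calls into question.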

That all deals with track. As for intensity, TRANTECH is unequivocally a success. It challenges (and, based on the only available comparison, beats) all other available intensity forecast models, including those that are strictly dynamically based. In the later time periods its skill runs roughly 10%, or just under. Amazingly, from D+3 to D+5 it shows little increase in error. One could argue that this is simply because the error is so high that it is random noise beyond that point. However, this is clearly not the case, since the D+5 error is still lower than the D+3 SHIFOR error! Therefore, there is skill and value in the TRANTECH forecast all the way out to five days. It is especially superb in storms making landfall in the United States, showing a skill level near 25%. And in those same 13 notable storms TRANTECH was superior to SHIFOR 11 of 13 times, or about 85% of the time.

Perhaps the most important thing to note, besides the skillful intensity forecasts, is the room for improvement. Some could look at these statistics and chuckle at the poor quality of the track forecasts in this simplistic analog model. However, the ability to stratify the results and isolate the problem areas opens the door for significant improvements in TRANTECH in the future. One example, storms remaining far to the south, was already given. Another example is when storms begin moving NE and ENE over the Atlantic, which has been seen to pose particular problems for TRANTECH as well. Note that the errors in Hurricane Bob (1991) were predominantly caused by this effect. These problems are under investigation, and some success has been had in working them out. In fact, the statistics shown herein are NOT the best TRANTECH statistics to date and so will obviously not be the last word. These track errors will certainly improve pending further upgrades. The only question is, will they improve enough to bring consistent skill to TRANTECH? Only time will tell. One thing to note, however, is that increased track forecast accuracy will lead to increased intensity forecast accuracy. Any intensity forecast improvement will simply make TRANTECH unbeatable by the current suite of statistical and dynamical forecast models in use by TPC.

In short, there have been some important successes and there is reason for hope for improvement, but there is clearly work still to be done.  TRANTECH will never challenge today's best dynamical models (at least not on a consistent basis) in their track forecasts.  The hope is simply that TRANTECH will put out reasonably successful track forecasts with mild skill, in tandem with superior intensity models.  The work continues...

* A more complete update of these statistics will be issued following another round of testing.