17 Comments
Julia Lucas

I don't even care whether taking estrogen and suppressing T in males makes them closer to, or on par with, women. Even if it did, you have changed the nature of women's and girls' sports by introducing mediocre males, turning female sports into a competition not against the best athletes but against whoever is able to compete - while male sports remain a competition among the best. You have unfairly diluted female sports.

GBM

This analysis should be taught in medical school. As a retired professor, I can tell you that our medical students are not taught how to approach and analyze manuscripts. This particular article is a great example of a paper trying to be more important than its actual data and findings would support, AND it appears to have a politically correct bias.

James Smoliga, DVM, PhD

Thank you for the kind remarks.

We have heard the phrase “knowing enough to be dangerous” in terms of medicine, but we also need to think about it in terms of research.

I am seeing all sorts of people saying “it’s a meta-analysis, the very highest form of evidence” and such. But many do not understand how to examine the data critically. We know the buzzwords, but not the deeper meaning of research methods (including appropriate statistical approaches) and interpretation.

This is not for the purpose of being a contrarian, or to find an interpretation that agrees with one’s philosophy. Rather, it is to ensure that we are not misinterpreting scientific findings in general.

As I have written previously, a peer-reviewed paper is the beginning of a conversation, not the end. No study is perfect, and we must interpret the evidence in light of the shortcomings. Sometimes the shortcomings are minor, sometimes they are major, and sometimes they make a study so flawed that it is essentially meaningless.

In this case, I am not saying the study itself is terrible, but rather the results answer one question (effects on recreation-level fitness), and many are inappropriately extrapolating them to answer a completely separate question (about elite competitive performance).

Snarling Fifi

Yes. Formally speaking, the level of evidence for a meta-analysis can be no higher than the lowest level of evidence among its included studies. An exaggeration illustrates this: if you tried to sneak a case study into an SRMA otherwise full of RCTs, the level of evidence for that meta-analysis would be a 6 or 7.

Leslie MacMilla

As you imply, it's a myth that meta-analyses or systematic reviews sit at the top of the evidence pyramid. Randomized controlled trials are at the top. The reasons we do systematic reviews and meta-analyses at all, ever, are:

1) There are no RCTs.

2) RCTs that have been done are "negative" but are known to have been too small to be confident that a clinically important difference wasn't missed. Studies of negative trials have found them to be surprisingly weak in their negative conclusions. They can miss quite large effects. So by combining a whole bunch of them together you can get them to add up to a trial that is big enough to detect the important effect. Well, sometimes....

But the biases involved in trying to do a meta-analysis may outweigh the larger "sample" size you get from combining small trials, making the juice often not worth the squeeze. My own contrarian suspicion is that meta-analyses are a way for "armchair" researchers to get publications by sitting in front of a computer instead of going out into the field or into the lab and doing real original research.
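The pooling idea in point 2 can be sketched numerically. This is a toy fixed-effect (inverse-variance) meta-analysis with made-up effect sizes, not data from any real trial: each small trial's 95% CI crosses zero on its own, but the pooled estimate's does not.

```python
import math

# Hypothetical effect estimates (mean difference) and standard errors from
# three small "negative" trials: each 95% CI crosses zero on its own.
trials = [(0.30, 0.20), (0.25, 0.22), (0.35, 0.25)]

# Fixed-effect (inverse-variance) pooling, the standard meta-analytic combination.
weights = [1 / se**2 for _, se in trials]
pooled = sum(w * est for (est, _), w in zip(trials, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(f"pooled estimate = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With these invented numbers, every individual trial is "negative" (its CI includes zero), yet the pooled CI excludes zero - the extra precision comes from the combined sample, which is exactly the rationale (and, as noted above, the combination can also pool the trials' biases).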

As your analysis shows, no RCT or meta-analysis is going to be big enough to pick up the very small differences between medaling and being an also-ran. At best you are going to find differences in the recreational middle, as you point out, and be able to conclude nothing about elite performance.

Mark Christenson

Really helpful, especially for me as I got the conclusion correct but missed the details of why the study was wrong about its claimed conclusion.

James Smoliga, DVM, PhD

Thanks. Mine is not the only critique of the study. Many have expressed concern over the statistical analyses. While I often find issues with inappropriate statistics in the peer-reviewed literature, I didn't get that far with this study, since its validity was very weak (i.e., it extrapolates recreational-level fitness data to elite athlete performance).

I would be interested in your initial thoughts on why the study was flawed.

Mark Christenson

I misread another commenter and thought the focus was on lean muscle mass, and that if you adjusted for lean muscle mass the results would be the same. But, at least to me, that kind of seemed like saying, “if you discount the advantages that males have, then they don’t have any advantages.”

Snarling Fifi

Great analysis. I'm surprised to see VO2max as part of routine testing for recreational athletes, though I've certainly met my share of wannabes with lots of money. Most of the max tests in a non-elite population probably wouldn't even be true max tests but would be symptom-limited; that is, the average person would blow up from the pain of exertion before reaching an RER >1, making it practically a functional test.

James Smoliga, DVM, PhD

I don’t think too many recreational athletes are getting VO2max tests on their own initiative, but they may be included in research study protocols.

I’ve led hundreds of VO2max tests in my day (both cycling and running), mostly in competitive athletes, but also in sedentary individuals. Your idea is consistent with my experience, but with some caveats. When competitive athletes stop their test, the first thing they say when they take off their mask is “I stopped too early… I could have gone more.” They couldn’t have… But in the 10 seconds it takes to get the breathing apparatus out of their mouth, their pain levels slightly decrease, and they think they didn’t quite hit max. But they did (and we can objectively show this using RER values of ~1.10).

For sedentary individuals, it was the opposite. They would be doing a cycling test and speak through the mask, saying they were done and couldn’t go anymore. I would keep confirming with them for around 30 seconds, asking “okay, you’re sure?” and “okay, so you need me to stop the test?” and so on, and the whole time they would confirm they had reached exhaustion… while pedaling the whole time.

I think the individuals in these tests were in the middle, and could probably push themselves reasonably hard, but not to the same 100% all-out, leave-nothing-in-the-tank level that elite endurance athletes routinely reach. So, technically, maybe VO2peak rather than VO2max.
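The VO2peak-vs-VO2max distinction above can be sketched as a check of objective end-of-test criteria. This is a minimal illustration, not any lab's actual protocol: the RER ≥ 1.10 threshold matches the value mentioned above, while the heart-rate criterion (within 10 bpm of age-predicted max) is a common secondary convention I'm adding as an assumption.

```python
def reached_true_max(rer: float, hr: int, age: int) -> bool:
    """Return True if objective end-of-test criteria suggest a true VO2max
    rather than a symptom-limited VO2peak. Thresholds are illustrative
    conventions (RER >= 1.10; HR within 10 bpm of age-predicted max)."""
    hr_max_pred = 220 - age  # crude age-predicted HRmax
    return rer >= 1.10 and hr >= hr_max_pred - 10

# Athlete who genuinely hit max despite feeling they "stopped too early":
print(reached_true_max(rer=1.12, hr=188, age=25))  # True
# Participant who stopped from discomfort well short of max (VO2peak):
print(reached_true_max(rer=1.02, hr=165, age=25))  # False
```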

So, yes, that’s a limitation of the studies. But I didn’t criticize that component (or other statistical issues), because it’s a small distraction from the bigger issue of overall validity.

Thanks for your comment and insight!

Theresa Gee

As if the term 'transwoman' wasn't Orwellian enough what possessed the author to use the even more misleading 'transgender women'?

Such deceptive terms lose the battle for sanity before the sentence is terminated... so, for the sake of clarity and accuracy, I suggest using 'trans-identifying men' instead. Likewise, mutatis mutandis, for women.

We sex realists will thank you.

Gary S McCaleb

To expand on MacMilla's observation a bit, it seems to me there is an uncontrolled psychological variable that should be considered before extrapolating a fitness test to any competition context: a person taking a fitness test typically aims to pass the test, not to be the best of the test-takers.

Consider my good old days, when before fire season we'd take the "step test" to determine cardiovascular fitness before our red cards could be issued. So all winter long I'd put in a few miles jogging, knowing from experience what it would take to pass the step test. But I also knew that the step test was a bare-minimum, lowest-common-denominator sort of screening test, and between passing the test and the onset of active wildfire season, I'd ramp my jog up to a 7-mile run up a ridge that topped 8,500 feet. But even then, I didn't care whether I was the fittest firefighter in the government, but whether I was ready to give my 90% on a nasty fire in miserable conditions for 21 days. I was measuring myself against myself, not others, and that is a very different motivation than competing against others.

A S
Feb 15 (edited)

I read most of this piece, and it seems to be missing the most important point: people also engage in social behavior. Someone who is running recreationally with women is likely to try to match their speed. When I was running with a group, even if we separated, we definitely influenced each other.

Leslie MacMilla

That was very well done. Thank you.

You give the authors of the studies more credit than they deserve in referring to "objective" measures of performance. Running 1.5 miles is, obviously, dependent on motivation and effort, not just training. Every trans-identified athlete participating in studies like these knows (or can suspect) that the study's null hypothesis is that there is no difference in performance between a trained trans-feminine athlete like himself (or his more talented elite-level colleagues) and the female athletes he and they want to compete against. So he has every incentive to run less hard during his time on estrogen, so that he will "prove" that null hypothesis by, lo and behold, slowing down. Other supposedly objective tests, like grip strength, vertical jump height, and maximal oxygen uptake, are also rather obviously effort-dependent. (There are sub-maximal tests that use nomograms to predict VO2max from heart rate and O2 uptake at an imposed exercise load. But nothing beats driving the subject until his O2 uptake stops increasing, deep into anaerobic territory. That is agonizing. I get imaginary angina just thinking about it.)
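The sub-maximal prediction idea can be sketched as a linear extrapolation. This is an illustrative sketch with made-up numbers, not any specific published nomogram: fit the roughly linear heart-rate-to-VO2 relationship at sub-maximal stages and project it out to an age-predicted maximal heart rate.

```python
# All numbers below are invented for illustration.
age = 30
hr_max_pred = 220 - age  # crude age-predicted HRmax (an assumption)

# (heart rate in bpm, measured VO2 in mL/kg/min) at two sub-maximal stages
stages = [(120, 18.0), (150, 27.0)]

# Fit the HR-VO2 line through the two stages and extrapolate to HRmax.
(hr1, vo2_1), (hr2, vo2_2) = stages
slope = (vo2_2 - vo2_1) / (hr2 - hr1)  # mL/kg/min per bpm
vo2max_est = vo2_1 + slope * (hr_max_pred - hr1)
print(f"estimated VO2max = {vo2max_est:.1f} mL/kg/min")
```

The appeal is that no maximal effort is required; the cost, as noted above, is that the estimate inherits all the error in the assumed HRmax and in the linearity assumption, which is why a true maximal test remains the gold standard.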

The only truly objective test that motivates maximum performance that I'm aware of is forced swimming. A rat is put into a tub of water that he can't climb out of and you measure how long he lasts before he drowns. Next best is a varsity athlete trying to stay ahead of a coach yelling at him from his car and threatening to run him over if he flags. I don't think coaches are allowed to do that anymore....

ShawnPG

Can we also assume that the participants knew the purpose of the study? It would be helpful - and telling - to see the informed consent.

James Smoliga, DVM, PhD

The BJSM study is a meta-analysis of more than 50 different studies. So there is no single consent form; consent came from the various individual studies.

I'm not too concerned about the research participants behaving differently and biasing the results of the underlying studies. I do believe that the GAHT they undertook will impair their physical performance; again, that is biological.

But, my bigger point is that these studies represent basic physical fitness in the general population (or mildly trained individuals). These were not elite athletes, and the results from this study cannot be meaningfully applied to elite athletes, let alone be used to recommend policies related to competitive sport.

There is one study that did look at athletes; I will break that down separately.

ShawnPG

Yes, I get your primary point; I just find it hard to believe that the participants don’t see the bigger picture.