How a Flawed Meta-Analysis Misled the Public on Transgender Athletes

A deeply flawed meta-analysis on transgender fitness reveals a troubling breakdown in the British Journal of Sports Medicine’s peer-review process.

May 11, 2026

"Check out my latest article at Reality's Last Stand! In the article, I dissect a recent meta-analysis that claimed to show that no difference in muscle strength exists between women and transgender women. I explain how the meta-analysis was highly flawed and misled the public about transgender fitness."

- James L. Nuzzo

About the Author

James L. Nuzzo, PhD, is an exercise scientist and men’s health researcher. Dr. Nuzzo has published over 80 research articles in peer-reviewed journals. He writes regularly about exercise, men’s health, and academia at The Nuzzo Letter, with additional writings appearing at Reality’s Last Stand and Australians for Science & Freedom. In 2025, Dr. Nuzzo documented his academic cancellation using a freedom of information request. He is active on X @JamesLNuzzo.

Earlier this year, a meta-analysis on physical fitness in transgender individuals was published in the British Journal of Sports Medicine, one of the most historically important journals in sports medicine and exercise science. The paper’s eight authors, led by Sofia Mendes Sieczkowska, concluded that “transgender women do not exhibit significant differences in upper-body strength, lower-body strength, or maximal oxygen consumption relative to cisgender women after 1–3 years of [gender-affirming hormone therapy].” Based on this conclusion, the authors expressed support for “nuanced, sport-specific policies rather than blanket bans” regarding the inclusion of transgender individuals in sports.

The paper caused an initial stir online—and rightly so. Below, I show how the authors misled the public by concluding that their meta-analysis provided evidence of no difference in muscle strength between women and transgender women (i.e., biological males) after “1–3 years” of cross-sex hormones. More specifically, the authors did not even follow their own inclusion criteria. In some cases, they included data that should have been excluded; in others, they excluded data that should have been included. Underlying all of this is the fact that the muscle-strength meta-analysis was underpowered from the start, with only a few studies contributing to the analysis.

Let us now take a closer look at the data and the claims.

Figure 2 in the paper presents the data comparing muscle strength in cisgender and transgender women. Panel A is the forest plot for the meta-analysis of upper-body strength, while Panel B is the forest plot for the meta-analysis of lower-body strength.

On the left side of each plot are the labels for the individual studies included in the meta-analysis. These labels contain the surnames of the researchers who conducted the original studies, the year of publication, and the studies’ reference numbers in the paper’s reference list.

In total, six studies are shown in Panels A and B. The first is the Alvares 2025 study, which appears in both panels because it included data on both upper-body and lower-body strength. The others are Ceolin 2024, Hamilton 2024, Andrade 2022, Jenkins 2020, and Saitong 2025. Saitong 2025 appears twice in Panel B for reasons I will explain later.

In the plots, each study has a blue square associated with it. The blue square represents that study’s effect size, which reflects the magnitude of the difference in muscle strength between cisgender and transgender women. The upper-body strength analysis included four effect sizes from four studies, while the lower-body strength analysis also included four effect sizes, but from only three studies.

The thin black lines extending to the left and right of each blue square are the 95% confidence intervals (CIs). The narrower the CI around the blue square, the more confident we can be that the blue square accurately reflects the true difference in muscle strength between cisgender and transgender women. The wider the CI, the less confident we can be.

The solid black vertical line in the middle of the plot aligns with zero on the x-axis and indicates no difference in muscle strength between cisgender and transgender women. The farther a blue square appears to the left of this zero line, the stronger the cisgender women were relative to the transgender women. The farther it appears to the right, the stronger the transgender women were relative to the cisgender women.

A large, light-blue diamond is displayed at the bottom of each plot. This diamond represents the overall effect size when all the studies in that plot are pooled together. Researchers use this pooled estimate to determine whether there is a statistically significant difference in a given fitness outcome between two groups. If the diamond crosses the vertical zero line, researchers generally conclude that there is no statistically significant difference between the groups. If the diamond does not cross the zero line, they conclude that a statistically significant difference does exist.
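To make the logic of the diamond concrete, here is a minimal Python sketch of inverse-variance pooling. The four effect sizes and standard errors are hypothetical placeholders, not values read from Figure 2, and the fixed-effect model is an assumption chosen for simplicity (the paper may well have used a random-effects model). The function names are illustrative only.

```python
import math


def pool_fixed_effect(effects, std_errors, z=1.96):
    """Return (pooled estimate, CI low, CI high) using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


def crosses_zero(ci_low, ci_high):
    """True when the confidence interval (the 'diamond') spans zero."""
    return ci_low <= 0.0 <= ci_high


# HYPOTHETICAL inputs: positive values mean transgender women were stronger.
effects = [-1.0, 0.5, 0.6, 0.7]
std_errors = [0.25, 0.30, 0.30, 0.35]

pooled, low, high = pool_fixed_effect(effects, std_errors)
print(f"pooled = {pooled:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print("diamond crosses zero:", crosses_zero(low, high))
```

With these made-up inputs, one large negative effect offsets three positive ones, so the pooled interval spans zero even though most of the individual studies point the same way.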

As seen in both Panels A and B, the diamonds cross the vertical zero line. This is why the authors concluded that there was no difference in upper-body or lower-body muscle strength between cisgender and transgender women.

Story over, right?

Too Few Studies

The overall effect size (the diamond) is only as meaningful as the inputs that go into it. If researchers include studies that are irrelevant to the question, or exclude studies that are relevant to it, then the diamond, and the conclusions drawn from it, become meaningless.

More fundamentally, the number of individual studies that make up the overall effect size is critical. A meaningful meta-analysis typically includes a dozen or several dozen effect sizes. Meta-analyses based on only three or four effect sizes are inherently fragile. They can be easily distorted by a single outlier study, and that is precisely what happened in Figure 2.
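The fragility of a four-study pool can be illustrated with a leave-one-out check. The sketch below is a hedged illustration, not a re-analysis: the study labels, effect sizes, and standard errors are hypothetical, and the fixed-effect pooling is an assumption made to keep the example short.

```python
import math


def pooled_with_ci(effects, std_errors, z=1.96):
    """Inverse-variance pooled estimate and its 95% confidence interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - z * se, pooled + z * se


def leave_one_out(labels, effects, std_errors):
    """Re-pool the data once per study, each time omitting that study."""
    results = {}
    for i, label in enumerate(labels):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_errors = std_errors[:i] + std_errors[i + 1:]
        results[label] = pooled_with_ci(kept_effects, kept_errors)
    return results


# HYPOTHETICAL data: "Study A" plays the role of a single outlier.
labels = ["Study A", "Study B", "Study C", "Study D"]
effects = [-1.0, 0.5, 0.6, 0.7]
std_errors = [0.25, 0.30, 0.30, 0.35]

full = pooled_with_ci(effects, std_errors)
loo = leave_one_out(labels, effects, std_errors)

print(f"all studies: {full[0]:.3f} ({full[1]:.3f}, {full[2]:.3f})")
for label, (est, low, high) in loo.items():
    print(f"without {label}: {est:.3f} ({low:.3f}, {high:.3f})")
```

In this toy data set, the pooled interval spans zero when all four studies are included but sits entirely above zero once the outlier is dropped. With a dozen or more effect sizes, removing any single study would usually move the estimate far less.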

Sieczkowska and her co-authors are not at fault for the limited number of available effect sizes. But they are responsible for choosing to conduct a meta-analysis on a topic for which so little data were available. Reaching such a bold conclusion from such sparse evidence was a further scientific failure. Put bluntly, Sieczkowska and her co-authors should never have conducted the muscle-strength meta-analyses shown in Figure 2.

And low statistical power is not the only problem with the muscle-strength meta-analyses in Figure 2. Incompetence and misleading statements were also on display.

Andrade 2022 Study: The Wrong Sex!

Let us start with the most shocking error in Figure 2: the inclusion of Andrade 2022. This study should not have been included in the muscle-strength meta-analysis at all.

According to Sieczkowska and her co-authors, Figure 2 presents results from studies comparing muscle strength between cisgender women and transgender women. But Andrade 2022 did not compare muscle strength between cisgender women and transgender women. It compared muscle strength between cisgender men and transgender men!

This is stated plainly in the title of the paper itself: “Bone mineral density, trabecular bone score and muscle strength in transgender men receiving testosterone therapy versus cisgender men” [italics added].

Ceolin 2024 and the Duration of Gender-Affirming Hormone Therapy

Removing Andrade 2022 reduces the number of relevant muscle-strength studies in Figure 2 to five, and the number in Panel A, on upper-body strength, to three. One of these three remaining studies is Ceolin 2024.

Sieczkowska and her co-authors stated that, to be eligible for their meta-analysis, a study needed to include a group of transgender individuals who had undergone gender-affirming hormone therapy, apparently for any duration. With that inclusion criterion in mind, the next most shocking error in Figure 2 is the inclusion of Ceolin 2024.

Ceolin 2024 did not include transgender women who had undergone gender-affirming hormone therapy. In fact, this is stated explicitly in the paper’s title: “Bone health and body composition in transgender adults before gender-affirming hormonal therapy: data from the COMET study” [italics added].

Ceolin 2024 reported baseline data on upper-body strength in transgender women before they underwent gender-affirming hormone therapy. For this reason, the study should not have been included in the meta-analysis of upper-body strength shown in Panel A of Figure 2. Removing it reduces the number of eligible studies to four overall, and to only two for upper-body strength.

But the problems concerning the duration of gender-affirming hormone therapy do not end there.

In their conclusion, Sieczkowska and her co-authors stated that the transgender women in the meta-analysis had undergone “1–3 years” of gender-affirming hormone therapy. A closer look at the four remaining studies suggests otherwise.

The Alvares 2025 study did not explicitly state the average duration of the participants’ gender-affirming hormone therapy. So how could Sieczkowska and her co-authors possibly know that all the transgender women in Alvares 2025 had undergone 1–3 years of gender-affirming hormone therapy? If anything, one can infer that the average therapy duration was well over 3 years, and closer to 7 years (30.3 − 23.0 = 7.3), based on the average age of the transgender women in the study (30.3 years) and the average age at which they started gender-affirming hormone therapy (23.0 years).

The Saitong 2025 study reported that the transgender women had “been undergoing feminizing gender-affirming therapy for 8-10 years.” Thus, Saitong 2025 clearly falls outside the 1–3 year range mentioned in Sieczkowska and her co-authors’ concluding remarks.

The Hamilton 2024 study reported that the transgender women had undergone one or more years of gender-affirming hormone therapy, with an average therapy duration of four years. Thus, several of the participants in the Hamilton 2024 study would have been undergoing gender-affirming hormone therapy for more than 3 years.

The Jenkins 2020 study reported that the transgender women had undergone a minimum of 2 years of gender-affirming hormone therapy, but no group average or range was provided. Thus, some participants in the Jenkins 2020 study would likely also have been undergoing gender-affirming hormone therapy for more than 3 years.

To their credit, Sieczkowska and her co-authors did mention elsewhere in their paper that “[t]herapy duration varied widely, ranging from 3 months to 14 years, with most studies reporting the following participants for 1–3 years of therapy.” Their statement confirms large heterogeneity in therapy duration, and this heterogeneity amplifies the existing problems in the already underpowered meta-analysis. Their statement also reveals that the “1–3 years” remark, which was used in their concluding statements, was a rough range based on all 52 studies in their paper, not just the studies that compared muscle strength in cisgender and transgender women.

So, Sieczkowska and her co-authors misled readers when they said there was no difference in muscle strength between cisgender women and transgender women who had undergone “1–3 years” of gender-affirming hormone therapy. In fact, one could argue that none of the studies included in Panels A and B of Figure 2 is accurately described by that remark.

Remarkably, the errors do not end there.

Alvares 2025 Study: The Small Men

The Alvares 2025 study, published in the British Journal of Sports Medicine, examined body composition and physical fitness in cisgender women, transgender women, and cisgender men who had “engaged in regular volleyball training for at least 1 year.”

Alvares 2025 is a clear outlier in Panels A and B. It “pulls” the diamonds in both panels to the left and is the main reason the diamond in Panel A crosses the zero line. It also contributed substantially to the diamond in Panel B crossing the zero line.

Given that Alvares 2025 was an obvious outlier that strongly influenced the conclusions of the meta-analysis, one might have expected Sieczkowska and her co-authors to explain why this study was an outlier. Yet no such explanation was provided.

Here are some key details about Alvares 2025. First, the transgender women in the study were, on average, 5 cm shorter than the cisgender women and 13 cm shorter than the cisgender men. Second, the transgender women had an average body mass 7.8 kg lower than the cisgender women and 17.8 kg lower than the cisgender men. Third, the transgender women practiced volleyball 10 hours per week less than both the cisgender women and the cisgender men.

So what explains these bizarre results?

The cisgender women in Alvares 2025 were not ordinary recreational volleyball players or weekend warriors. They played in “second division national level championships.”

Importantly, Alvares and colleagues themselves addressed this issue in a section of their paper titled “Confounding factors.” There, they explicitly identified training experience, body height, and body mass as confounding factors that help explain why the average grip strength of the cisgender women in their study was greater than that of the transgender women.

Yet Sieczkowska and her co-authors did not inform readers about these confounding factors, which help explain why Alvares 2025 is an outlier in Figure 2. Moreover, because the cisgender and transgender women in Alvares 2025 were so oddly mismatched, the results of that study, and any meta-analysis that relies heavily on it, are of little scientific or practical value.

Jenkins 2020: Missing Data

One of the other problems with the muscle-strength meta-analysis in Figure 2 is that it omitted data that were eligible for inclusion under the authors’ own stated criteria.

The first example is Jenkins 2020, which examined body composition and physical fitness in cisgender women, transgender women, and cisgender men. One of the fitness tests Jenkins conducted was the vertical jump. Results from this test appear to be what Sieczkowska and her co-authors displayed in Panel B.

However, Jenkins and colleagues also reported grip-strength data for cisgender and transgender women. Yet Sieczkowska and her co-authors did not include these data in their meta-analysis of upper-body strength in Panel A.

This omission matters because the grip-strength data run counter to Sieczkowska and her co-authors’ conclusion. The average grip strength of the transgender women was 93.0 kg, compared with 63.9 kg for the cisgender women and 112.7 kg for the cisgender men. The 29.1 kg difference between transgender women and cisgender women was statistically significant, whereas the 19.7 kg difference between transgender women and cisgender men was not.
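The arithmetic behind these comparisons can be checked with a short Python sketch. The three group means are the values quoted above from Jenkins 2020; the helper name `percent_higher` is illustrative, and the percentages are a derived convenience rather than figures reported in the study.

```python
def percent_higher(value, reference):
    """Percent by which `value` exceeds `reference`."""
    return 100.0 * (value - reference) / reference


# Mean grip strength (kg) as quoted above from Jenkins 2020.
grip_kg = {
    "cisgender women": 63.9,
    "transgender women": 93.0,
    "cisgender men": 112.7,
}

# Absolute gaps between adjacent groups.
tw_minus_cw = grip_kg["transgender women"] - grip_kg["cisgender women"]
cm_minus_tw = grip_kg["cisgender men"] - grip_kg["transgender women"]

# The same gaps expressed relative to the weaker group.
tw_vs_cw_pct = percent_higher(grip_kg["transgender women"], grip_kg["cisgender women"])
cm_vs_tw_pct = percent_higher(grip_kg["cisgender men"], grip_kg["transgender women"])

print(f"transgender vs cisgender women: {tw_minus_cw:.1f} kg ({tw_vs_cw_pct:.1f}% higher)")
print(f"cisgender men vs transgender women: {cm_minus_tw:.1f} kg ({cm_vs_tw_pct:.1f}% higher)")
```

Expressed this way, the gap between transgender and cisgender women is larger than the gap between cisgender men and transgender women in both absolute and relative terms.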

Saitong 2025: Missing Data

Sieczkowska and her co-authors categorized the vertical jump test as a measure of lower-body “strength.” But today, the vertical jump is more commonly, though still somewhat debatably, described as a test of “power.” In fact, this is how Alvares, Jenkins, and Saitong referred to it. Strength and power are correlated, and strength does correlate with vertical jump height. However, velocity and power are stronger correlates of vertical jump height than strength.

I highlight this issue of nomenclature for two reasons. First, inconsistent use of terminology can confuse readers about which fitness attributes are, and are not, affected by sex and gender-affirming hormone therapy. Second, in this meta-analysis, the inclusion of only the vertical jump as a lower-body strength test is odd, especially because at least one of the included studies—Saitong 2025—reported results from a more direct test of lower-body muscle strength: isokinetic peak torque of the knee extensor and knee flexor muscles.

Saitong 2025 examined these attributes in cisgender women, cisgender men, transgender women with orchiectomy (the surgical removal of the testes), and transgender women without orchiectomy. The group comparisons for the vertical jump and isokinetic peak torque tests were similar, with cisgender women tending to score lowest among all groups, though not necessarily at statistically significant levels. Nevertheless, Sieczkowska and her co-authors’ decision to include the vertical jump results, but not the isokinetic peak torque results, is odd.

Even stranger, Saitong 2025 also included two tests of upper-body strength that Sieczkowska and her co-authors did not include in their meta-analysis in Panel A: isokinetic peak torque of the elbow extensor muscles and isokinetic peak torque of the elbow flexor muscles. The results from these two tests are presented below. They show that cisgender women had the lowest average strength values, although their averages were not statistically different from those of transgender women who had undergone approximately 8–10 years of gender-affirming hormone therapy.

Elbow extensor isokinetic peak torque:

Cisgender women: 25.9 ± 4.9 Nm
Transgender women with orchiectomy and ~8–10 years of gender-affirming therapy: 27.2 ± 5.4 Nm
Transgender women without orchiectomy and ~8–10 years of gender-affirming therapy: 29.1 ± 5.2 Nm
Cisgender men: 47.3 ± 14.7 Nm

Elbow flexor isokinetic peak torque:

Cisgender women: 19.2 ± 5.1 Nm
Transgender women with orchiectomy and ~8–10 years of gender-affirming therapy: 21.5 ± 5.7 Nm
Transgender women without orchiectomy and ~8–10 years of gender-affirming therapy: 21.3 ± 5.0 Nm
Cisgender men: 36.3 ± 9.0 Nm
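For readers who want to see how means and standard deviations like these translate into the effect sizes plotted in a forest plot, here is a rough Python sketch that computes a standardized mean difference (Cohen's d) for each group relative to cisgender women. The group sample sizes are not given in this article, so the sketch assumes equal group sizes when pooling the standard deviations; that assumption, and the function and variable names, are mine and not Saitong's or Sieczkowska's.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference (a minus b), ASSUMING equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# (mean, SD) in Nm, copied from the lists above.
peak_torque = {
    "elbow extensor": {
        "cisgender women": (25.9, 4.9),
        "transgender women with orchiectomy": (27.2, 5.4),
        "transgender women without orchiectomy": (29.1, 5.2),
        "cisgender men": (47.3, 14.7),
    },
    "elbow flexor": {
        "cisgender women": (19.2, 5.1),
        "transgender women with orchiectomy": (21.5, 5.7),
        "transgender women without orchiectomy": (21.3, 5.0),
        "cisgender men": (36.3, 9.0),
    },
}

effect_sizes = {}
for test_name, groups in peak_torque.items():
    ref_mean, ref_sd = groups["cisgender women"]
    for group, (mean, sd) in groups.items():
        if group == "cisgender women":
            continue  # the reference group is not compared with itself
        d = cohens_d_equal_n(mean, sd, ref_mean, ref_sd)
        effect_sizes[(test_name, group)] = d
        print(f"{test_name} | {group} vs cisgender women: d = {d:.2f}")
```

Under the equal-group-size assumption, every standardized difference is positive, meaning each comparison group averaged higher than cisgender women. A real meta-analysis would use the reported sample sizes and typically a small-sample correction such as Hedges' g.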

Another important point about Saitong 2025 is that the results from both transgender women with orchiectomy and transgender women without orchiectomy were compared against the same reference group of cisgender women. From the standpoint of study feasibility, this is understandable: it avoids the need to recruit a second reference group of cisgender women. But from a statistical standpoint, the double inclusion of the same reference group in a low-powered meta-analysis is problematic. If that reference group of cisgender women is unusual compared with the broader population of cisgender women, then its uniqueness will disproportionately influence an already flawed analysis.
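One commonly suggested workaround for a shared reference group is to split its sample size across the comparisons, so that the same women are not counted twice at full weight. The Python sketch below shows the effect on the variance, and therefore the weight, of a standardized mean difference. It uses the standard large-sample variance approximation for a standardized mean difference; the effect size and all sample sizes are hypothetical, since the article does not report them.

```python
def smd_variance(d, n_treat, n_control):
    """Large-sample variance approximation for a standardized mean difference."""
    return (n_treat + n_control) / (n_treat * n_control) + d ** 2 / (2.0 * (n_treat + n_control))


# HYPOTHETICAL numbers, chosen only to illustrate the mechanics.
d = 0.4            # standardized mean difference for one comparison
n_transgender = 15  # size of one transgender group
n_control = 16      # size of the shared cisgender reference group

# Reference group counted in full for each of the two comparisons.
var_full = smd_variance(d, n_transgender, n_control)

# Reference group split evenly between the two comparisons.
var_split = smd_variance(d, n_transgender, n_control // 2)

weight_full = 1.0 / var_full
weight_split = 1.0 / var_split

print(f"variance, control counted in full: {var_full:.3f} (weight {weight_full:.2f})")
print(f"variance, control split in half:   {var_split:.3f} (weight {weight_split:.2f})")
```

Counting the reference group in full for both comparisons gives each effect size a smaller variance, and so a larger weight, than it would have if the shared group were split. That is one way the double inclusion can inflate a single reference group's influence on the pooled estimate.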

The Non-Included Alvares Study: More Missing Data

Another bizarre aspect of the meta-analysis in Figure 2 is that at least one other study referenced elsewhere in the paper appears to qualify for inclusion in the figure but is absent from it. This was a second study by Alvares, listed as reference #38 in the paper and included in Figures 1 and 3.

This second Alvares study was published in the British Journal of Sports Medicine in 2022. It compared cisgender men, cisgender women, and transgender women who had undergone gender-affirming hormone therapy for 14 years, on average. One of the outcomes was grip strength. Grip strength was about 19 percent higher in transgender women than in cisgender women, 35.3 kg versus 29.7 kg, though this difference was not statistically significant. The grip strength of the cisgender men was 48.4 kg.

Including the grip-strength data from the transgender and cisgender women in Panel A would have “pulled” the diamond to the right, away from Sieczkowska and her co-authors’ conclusion.

Conclusion

On March 26, 2026, about a month and a half after Sieczkowska and her co-authors’ meta-analysis was published, the International Olympic Committee announced its new policy restricting eligibility for the female category at the Olympic Games and other IOC events to biological females. Thankfully, the IOC does not appear to have been influenced at the last minute by Sieczkowska’s paper. That said, findings published in papers like this can still influence sports policies developed by organizations other than the IOC, and they can still impact law.

As I have detailed, the meta-analysis of muscle-strength data in Figure 2 contains many flaws. It included multiple studies that do not meet the authors’ own stated eligibility criteria, while excluding multiple studies that do. Moreover, the meta-analysis was unsound from the start, because too few studies were available to support meaningful conclusions.

Sieczkowska and her co-authors should either issue a correction to their work or retract it altogether, particularly if similar problems exist elsewhere in the paper. Their conclusion that transgender women do not exhibit significant differences in upper-body or lower-body strength compared with cisgender women after 1–3 years of gender-affirming hormone therapy is inappropriate in light of the flaws highlighted above.

At this point, the more appropriate format for synthesizing this literature would be a narrative review, which would allow researchers to take a deeper look at the few available studies in this emerging area. Unfortunately, exercise scientists are often overly eager to publish meta-analyses because they are easier to complete than laboratory experiments, frequently cited by other researchers, and carry a patina of prestige because they are quantitative and require researchers to check a long list of procedural boxes that supposedly improve research quality.

Yet, as we have seen, these supposedly gold-standard procedures are no cure for good old-fashioned human incompetence.

This meta-analysis had eight authors, presumably two to four peer reviewers, and at least one editor. Yet apparently none of them caught the problems I have highlighted here. That is terrifying!

Finally, it is important to note that three of the papers discussed above were published in the British Journal of Sports Medicine. This meta-analysis makes four. Two of those papers, the Hamilton 2024 study and the second Alvares study, required formal corrections. If Sieczkowska and her co-authors’ meta-analysis is corrected, it would become the third paper on this topic in the journal to require correction. That is a remarkably high proportion, and it reflects the British Journal of Sports Medicine’s failing peer-review process.

Over the past few years, the British Journal of Sports Medicine has become increasingly political. It has placed itself at the center of diversity, equity, and inclusion (DEI) in sports medicine and exercise science publishing. One of the boxes Sieczkowska and her co-authors were required to check before publishing in the journal was the author DEI statement. These statements take up one or two paragraphs of valuable space in papers already limited to 5,000 words. Sieczkowska and her co-authors’ DEI statement took up 80 words.

Think about that for a minute.

Consider the one or two important points of scientific clarification that could have been made with those 80 words. For example, they could have been used to explain why Alvares 2025 was such an outlier and to caution readers against overinterpreting the results of the muscle-strength meta-analysis. Instead, readers learned about the authors’ sexes, gender identities, sexual orientations, career stages, geographic locations, and marginalization statuses—none of which help policymakers understand whether there are meaningful differences in muscle strength between cisgender and transgender women.

This is how DEI undermines science.
