LD matrix documentation #3353

lkirk · 2025-12-16T17:44:14Z

This PR provides some documentation for the ld_matrix method so that we can consider it a finalized, public API method.

I've tried to provide a sufficient level of detail without going too deep into the method. I also provide some simple demos with the provided example tree sequences. I attempt to provide everything a user would need in the docstring for the `ld_matrix` method, though I'm not sure whether it's too much detail.

I didn't touch the existing Todo note in the Multi site statistics section of the documentation, because I wasn't exactly sure how we wanted to handle that. To my knowledge, the LdCalculator still exists, and it would be useful to frame this method in context with that one.

cc: @petrelharp @apragsdale

codecov · 2025-12-16T17:49:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.76%. Comparing base (25a26a7) to head (c1fc5c7).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3353   +/-   ##
=======================================
  Coverage   89.76%   89.76%           
=======================================
  Files          29       29           
  Lines       31292    31292           
  Branches     5738     5738           
=======================================
  Hits        28089    28089           
  Misses       1794     1794           
  Partials     1409     1409

| Flag | Coverage Δ |
|---|---|
| c-tests | `86.77% <ø> (ø)` |
| lwt-tests | `80.38% <ø> (ø)` |
| python-c-tests | `87.12% <ø> (ø)` |
| python-tests | `98.85% <ø> (ø)` |
| python-tests-no-jit | `33.51% <ø> (ø)` |
| python-tests-numpy1 | `50.28% <ø> (ø)` |
python-tests-numpy1	`50.28% <ø> (ø)`

| Files with missing lines | Coverage Δ |
|---|---|
| python/tskit/trees.py | `98.89% <ø> (ø)` |

gregorgorjanc · 2025-12-16T18:22:18Z

docs/stats.md

+subset of sites, either by specifying a single vector for both rows and columns
+or a pair of vectors for the row sites and column sites separately.
+
+The following example produces a matrix containing {math}`r^2` computed pairwise


Suggested change

-The following example produces a matrix containing {math}`r^2` computed pairwise
+The following computes a matrix of {math}`r^2` measure of linkage-disequilibrium (LD) pairwise

gregorgorjanc · 2025-12-16T18:30:45Z

docs/stats.md

+The two-locus API provides a mechanism by which to subset the samples under
+consideration, providing the ability to compute a separate LD matrix for each
+sample set or an LD matrix between sample sets. Output dimensions are handled in
+the same manner as the rest of the stats api (see


Suggested change

-the same manner as the rest of the stats api (see
+the same manner as the rest of the stats API (see
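
To make the "separate LD matrix for each sample set" behaviour in the quoted passage concrete, here is a minimal plain-Python sketch. It is not tskit code: `d_stat`, `ld_per_sample_set`, and the toy genotypes are hypothetical, sites are assumed biallelic with 0 as ancestral and 1 as derived, and placing the sample-set index as the leading output dimension is an assumption about the layout, not something confirmed in this thread.

```python
def d_stat(x, y):
    """D = p_AB - p_A * p_B for two biallelic sites coded as 0/1 lists."""
    n = len(x)
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    return p_ab - (sum(x) / n) * (sum(y) / n)


def ld_per_sample_set(genotypes, sample_sets):
    """Return one site-by-site matrix per sample set.

    genotypes[s][i] is the allele carried by sample i at site s.
    The result is indexed as result[set_index][row_site][col_site]
    (assumed layout, for illustration only).
    """
    result = []
    for samples in sample_sets:
        # Restrict every site to the samples in this set.
        sub = [[site[i] for i in samples] for site in genotypes]
        result.append([[d_stat(x, y) for y in sub] for x in sub])
    return result


# Toy data: 2 sites, 6 samples, two overlapping sample sets.
genotypes = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 0],
]
sample_sets = [[0, 1, 2, 3], [2, 3, 4, 5]]
out = ld_per_sample_set(genotypes, sample_sets)
```

With two sample sets and two sites, `out` holds two 2×2 matrices, and the same pair of sites can show quite different LD depending on which samples are included.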

gregorgorjanc · 2025-12-16T18:41:13Z

docs/stats.md

+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{ab} - p_{a}p_{b}`
+
+  This statistic is inherently polarised, as the unpolarised result of this
+  statistic is expected to be zero. Uses the `total` normalisation method.


Expectation is zero when unpolarised, but actual values can differ from zero, particularly in small populations or under intense selection, no?
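
On the question above, one point can be checked directly: if "unpolarised" means summing {math}`D` over every pair of alleles (ancestral ones included), the sum is exactly zero in any sample, not only in expectation, because the haplotype frequencies sum to one and so do the products of the marginal frequencies. The sketch below verifies this with exact fractions. That reading of "unpolarised" is an assumption about the implementation, and `allele_pair_D` and the toy data are hypothetical names used only for illustration.

```python
from fractions import Fraction
from itertools import product


def allele_pair_D(x, y):
    """Return {(a, b): D_ab} with D_ab = p_ab - p_a * p_b for every allele pair."""
    n = len(x)
    out = {}
    for a, b in product(sorted(set(x)), sorted(set(y))):
        p_a = Fraction(x.count(a), n)
        p_b = Fraction(y.count(b), n)
        p_ab = Fraction(sum(1 for u, v in zip(x, y) if (u, v) == (a, b)), n)
        out[(a, b)] = p_ab - p_a * p_b
    return out


# Toy data: alleles of 6 samples at two sites (0 = ancestral, 1 = derived).
site_x = [0, 0, 1, 1, 1, 0]
site_y = [0, 1, 1, 1, 0, 0]

d = allele_pair_D(site_x, site_y)
polarised = d[(1, 1)]        # derived/derived pair only
unpolarised = sum(d.values())  # all allele pairs
```

Here the polarised value is nonzero while the sum over all allele pairs cancels, which is consistent with describing {math}`D` as inherently polarised.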

gregorgorjanc · 2025-12-16T18:47:47Z

python/tskit/trees.py

+
+        :param list sample_sets: A list of lists of Node IDs, specifying the
+            groups of nodes to compute the statistic with.
+            Defaults to all samples grouped by population. TODO: does it?


Lingering TODO

gregorgorjanc · 2025-12-16T18:49:27Z

I spotted minor bits for you to check/consider. Someone more versed in the tskit code should also have a look;)

apragsdale · 2025-12-16T19:06:36Z

Thank you @lkirk! I’ll give detailed comments on it later this evening. Great to see your methods getting documentation

apragsdale

Hi Lloyd - many small, minor comments here that will hopefully help with documentation clarity in places. Thanks again!

apragsdale · 2025-12-17T00:44:16Z

docs/stats.md

+The {meth}`~TreeSequence.ld_matrix` method provides an interface to a collection
+of two-locus statistics with predefined summary functions (see
+{ref}`sec_stats_two_locus_summary_functions`) and `site` and `branch` {ref}`modes
+<sec_stats_mode>`. This method differs from the other stats methods because it


"The API for this method differs..."

apragsdale · 2025-12-17T00:46:54Z

docs/stats.md

+of two-locus statistics with predefined summary functions (see
+{ref}`sec_stats_two_locus_summary_functions`) and `site` and `branch` {ref}`modes
+<sec_stats_mode>`. This method differs from the other stats methods because it
+provides a collection of statistics, instead of the usual one method per


Maybe, "The LD matrix method differs from other statistics methods in that it provides a unified API with an argument to specify different two-locus summaries of the data." Or something along those lines.

apragsdale · 2025-12-17T00:52:49Z

docs/stats.md

+<sec_stats_mode>`. This method differs from the other stats methods because it
+provides a collection of statistics, instead of the usual one method per
+stat. Otherwise, it behaves similarly to most other functions with respect to
+`sample_sets` and `indexes`. Site statistics can be computed from multi-allelic


Maybe, "Two-locus statistics can be computed using two modes, either site or branch, and these should be interpreted in the same way as single-site statistics. Site statistics allow for multi-allelic data, while branch statistics assume an infinite sites model."

apragsdale · 2025-12-17T00:56:27Z

docs/stats.md

+
+#### Site
+
+The `site` mode computes two-locus statistics from pairs of alleles on sites. By


"from pairs of alleles on sites" - it's not entirely clear what this means. Maybe, "summarized over all pairs of alleles between specified sites."?

apragsdale · 2025-12-17T00:58:00Z

docs/stats.md

+
+The `site` mode computes two-locus statistics from pairs of alleles on sites. By
+default, this method will compute a matrix for all pairs of sites, with rows and
+columns representing each site in the tree sequence (ie an n×n matrix where n is


"i.e.,".

Usually use "n" for sample size, maybe "m" for number of sites.

apragsdale · 2025-12-17T01:31:47Z

python/tskit/trees.py

+
+        In the site mode, the sites under consideration can be restricted using the
+        ``sites`` argument. Sites can be passed as a list of lists, specifying the
+        ``[[row_sites], [col_sites]]`` or by specifying ``[all_sites]``, where a square


Though note that it doesn't have to be "all sites". It can be a single subset of sites on the tree, in which case we get a square matrix.
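
The distinction raised here (a single subset of sites gives a square matrix, a pair of lists gives a rows-by-columns matrix) can be sketched in plain Python. This is an illustration only, not the tskit implementation: `r2` and `r2_matrix` are hypothetical helpers, sites are assumed biallelic and coded 0/1, and the nesting convention (`[[subset]]` versus `[[rows], [cols]]`) is one reading of the docstring text quoted above.

```python
def r2(x, y):
    """r^2 between two biallelic sites given as equal-length 0/1 lists."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when either site is monomorphic
    d = p_ab - p_a * p_b
    return d * d / denom


def r2_matrix(genotypes, sites=None):
    """Pairwise r^2, optionally restricted to a subset of sites.

    sites=None             -> all sites, square matrix
    sites=[[subset]]       -> same subset for rows and columns, square matrix
    sites=[[rows], [cols]] -> separate row and column sites
    """
    if sites is None:
        rows = cols = range(len(genotypes))
    elif len(sites) == 1:
        rows = cols = sites[0]
    else:
        rows, cols = sites
    return [[r2(genotypes[i], genotypes[j]) for j in cols] for i in rows]


# Toy data: 4 sites observed in 4 samples.
G = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
full = r2_matrix(G)                         # 4 x 4
square = r2_matrix(G, sites=[[0, 3]])       # 2 x 2, subset of sites
rect = r2_matrix(G, sites=[[0, 1], [2, 3]]) # rows 0,1 against columns 2,3
```

Note that `square` is built from a subset of the sites rather than all of them, which is the case the comment asks the docstring to cover.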

apragsdale · 2025-12-17T01:32:40Z

python/tskit/trees.py

+
+        We can also compute two-way LD statistics between two sample sets. If the
+        ``indexes`` argument is specified, at least two sample sets must also be
+        specified. ``indexes`` specifies the sample sets indexes between which to


"sample set indexes" ... or "sample sets' indexes"?

apragsdale · 2025-12-17T01:33:54Z

python/tskit/trees.py

+         :math:`\widehat{\pi_2}`   n            n                 "pi2_unbiased"
+        ========================= ============ ================= =================
+
+        :param list sample_sets: A list of lists of Node IDs, specifying the


of sample node IDs, maybe?

apragsdale · 2025-12-17T01:35:20Z

python/tskit/trees.py

+            mode. [[row_sites], [col_sites]] or [all_sites].
+        :param list positions: A list of genomic positions to restrict. Can be
+            specified as a list of lists to control the row and column sites.
+            [[row_sites], [col_sites]] or [all_sites].


Is this only applicable in branch mode?

apragsdale · 2025-12-17T01:35:57Z

python/tskit/trees.py

+            specified as a list of lists to control the row and column sites.
+            [[row_sites], [col_sites]] or [all_sites].
+        :param list indexes: A list of 2-tuples or a single 2-tuple, specifying
+            the indexes of two populations on which to compute two-way LD


of two sample sets over which

lkirk added 4 commits December 16, 2025 11:35

Scaffolding for two-locus docs (1004275)

Final draft of documentation for ld_matrix (cbadbb4)

remove commented out draft writing (05fef13)

get rid of one more commented bit (c1fc5c7)
