Feature/py residue #26

jcharkow · 2025-12-16T21:05:07Z

Add a wrapper around the Residue class that is exposed when iterating over the amino acids of a Py_AASequence.

Summary by CodeRabbit

Release Notes

New Features
- Added residue wrapper type providing access to amino acid properties including molecular weight, modifications, and physicochemical characteristics.
- Sequences now support concatenation with residue objects.
Behavior Changes
- Single-element sequence indexing now returns residue objects instead of sub-sequences.
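The behavior change can be illustrated with a minimal sketch. `ResidueStub` and `SequenceStub` are hypothetical stand-ins backed by plain strings; the real `Py_Residue` and `Py_AASequence` wrap pyOpenMS objects and are not reproduced here. The sketch only mirrors the contract stated in the release notes: single indices yield a residue, slices yield a sequence, and a residue can be appended to a sequence.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueStub:
    """Hypothetical stand-in for Py_Residue; holds only a one-letter code."""

    one_letter_code: str


class SequenceStub:
    """Hypothetical stand-in for Py_AASequence backed by a plain string."""

    def __init__(self, sequence: str) -> None:
        self.sequence = sequence

    def __getitem__(self, index: int | slice) -> ResidueStub | SequenceStub:
        if isinstance(index, slice):
            # Slices still return a sub-sequence.
            return SequenceStub(self.sequence[index])
        # A single index returns a residue object.
        return ResidueStub(self.sequence[index])

    def __add__(self, other: SequenceStub | str | ResidueStub) -> SequenceStub:
        if isinstance(other, ResidueStub):
            # Residues are appended by their one-letter code.
            return SequenceStub(self.sequence + other.one_letter_code)
        if isinstance(other, SequenceStub):
            return SequenceStub(self.sequence + other.sequence)
        if isinstance(other, str):
            return SequenceStub(self.sequence + other)
        return NotImplemented


seq = SequenceStub("PEPTIDE")
first = seq[0]           # a residue, not a one-element sequence
middle = seq[1:4]        # still a sequence
extended = seq + seq[0]  # concatenation with a residue
```

Under these assumptions, a call site that previously read `seq[i].sequence` would read `seq[i].one_letter_code` instead, which matches how the existing tests were updated in this PR.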

This class represents a single amino acid and wraps the Residue object.

coderabbitai · 2025-12-16T21:05:17Z

Walkthrough

A new Py_Residue wrapper class is introduced for OpenMS amino acid residues with comprehensive accessors and utility methods. The Py_AASequence class is updated to return Py_Residue objects for single-item indexing and to support concatenation with Py_Residue instances. Tests cover both the new class and modified behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New Py_Residue wrapper class `openms_python/py_residue.py` | Introduces immutable `Py_Residue` wrapper around OpenMS Residue with properties for name, codes, weight, formula, modifications, neutral losses, pKa values, basicity metrics, and residue-set membership. Includes factory methods `from_native()` and `from_string()`, and standard Python special methods (`__str__`, `__repr__`, `__eq__`, `__ne__`, `__hash__`). |
| Public exports `openms_python/__init__.py` | Imports and exports `Py_Residue` in `__all__`. |
| Py_AASequence operator updates `openms_python/py_aasequence.py` | Adds import of `Py_Residue`; modifies `__getitem__` to return `Py_Residue` for single indices (via `Py_Residue.from_native`) instead of `Py_AASequence`; extends `__add__` signature to accept `Py_AASequence \| str \| Py_Residue` and appends residue's one-letter code; extends `__radd__` to support `Py_Residue` on left-hand side. |
| Integration and comprehensive tests `tests/test_py_aasequence.py`, `tests/test_py_residue.py` | Updates existing Py_AASequence tests to use `one_letter_code` on residue objects and test concatenation with Py_Residue. Adds new test suite for Py_Residue covering construction, properties, modifications, weights, formulas, neutral losses, pKa values, basicity, residue-set membership, equality, and hashing. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Breaking API change: __getitem__ with an integer index now returns Py_Residue instead of Py_AASequence; verify backward-compatibility considerations and all call sites.
- Py_Residue.from_native(): Ensure proper wrapping of all Residue states, including modified residues and edge cases.
- Concatenation operators: Validate __add__ and __radd__ logic for all type combinations (Py_AASequence, str, Py_Residue) and ensure consistent behavior.
- Test coverage: Confirm comprehensive coverage of the Py_Residue API surface and the updated Py_AASequence behavior.
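One way to approach the concatenation check is to enumerate every operand type on both sides. The sketch below does this with hypothetical stand-ins (`ResidueStub`, `SequenceStub`) and a hypothetical helper `check_concatenation`; none of these names come from the PR, and the real classes wrap pyOpenMS objects rather than strings.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ResidueStub:
    """Hypothetical stand-in for Py_Residue."""

    one_letter_code: str


class SequenceStub:
    """Hypothetical stand-in for Py_AASequence backed by a plain string."""

    def __init__(self, sequence: str) -> None:
        self.sequence = sequence

    @staticmethod
    def _as_text(other: object) -> str | None:
        # Map each supported operand type to the text it contributes.
        if isinstance(other, SequenceStub):
            return other.sequence
        if isinstance(other, ResidueStub):
            return other.one_letter_code
        if isinstance(other, str):
            return other
        return None

    def __add__(self, other: object) -> SequenceStub:
        text = self._as_text(other)
        return NotImplemented if text is None else SequenceStub(self.sequence + text)

    def __radd__(self, other: object) -> SequenceStub:
        text = self._as_text(other)
        return NotImplemented if text is None else SequenceStub(text + self.sequence)


def check_concatenation() -> list[str]:
    """Concatenate every supported operand on both sides of a base sequence."""
    base = SequenceStub("TID")
    operands = [SequenceStub("PE"), "PE", ResidueStub("E")]
    results = []
    for operand, on_left in product(operands, (False, True)):
        combined = operand + base if on_left else base + operand
        # Every supported combination should produce a sequence.
        assert isinstance(combined, SequenceStub)
        results.append(combined.sequence)
    return results


results = check_concatenation()
```

Routing both operators through one conversion helper is a design choice of this sketch; it keeps `__add__` and `__radd__` from drifting apart when a new operand type is added.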

Possibly related PRs

Add more Py_AASequence Convenience functions #25: Modifies Py_AASequence operator and indexing behavior; this PR extends those changes by introducing Py_Residue return type and adding Py_Residue concatenation support.

Suggested reviewers

timosachsenberg

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Feature/py residue' is vague and generic, using non-descriptive terms that don't clearly convey what the changeset accomplishes. | Use a more descriptive title like 'Add Py_Residue wrapper class for amino acid residues' that clearly indicates the main change being introduced. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.53% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

openms_python/py_aasequence.py (2)

223-253: Docstring and return type annotation need updating for __getitem__.

The docstring (line 233) still states "Returns: Py_AASequence" and the example shows seq[1] # Returns Py_AASequence("E"), but single-index access now returns Py_Residue. The method signature also lacks a return type annotation.

-    def __getitem__(self, index):
+    def __getitem__(self, index: int | slice) -> Py_Residue | Py_AASequence:
         """
         Get residue(s) at position(s).
 
-        Supports both single indexing and slicing, returning Py_AASequence objects.
+        Supports both single indexing and slicing. Single index returns Py_Residue,
+        slicing returns Py_AASequence.
 
         Args:
             index: Integer for single residue, or slice object for subsequence.
 
         Returns:
-            Py_AASequence: Wrapped residue or subsequence.
+            Py_Residue | Py_AASequence: Py_Residue for single index, Py_AASequence for slice.
 
         Example:
             >>> seq = Py_AASequence.from_string("PEPTIDE")
-            >>> seq[1]  # Returns Py_AASequence("E")
+            >>> seq[1]  # Returns Py_Residue for "E"
             >>> seq[1:4]  # Returns Py_AASequence("EPT")
-            >>> seq[-1]  # Returns Py_AASequence("E")
+            >>> seq[-1]  # Returns Py_Residue for "E"
         """

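If more precise typing is wanted than the union return type suggested for __getitem__, `typing.overload` lets a type checker infer the result from the index type. This is a sketch with hypothetical stand-ins (`ResidueStub`, `SequenceStub`), not the code in `py_aasequence.py`.

```python
from __future__ import annotations

from typing import overload


class ResidueStub:
    """Hypothetical stand-in for Py_Residue."""

    def __init__(self, code: str) -> None:
        self.one_letter_code = code


class SequenceStub:
    """Hypothetical stand-in for Py_AASequence backed by a plain string."""

    def __init__(self, sequence: str) -> None:
        self.sequence = sequence

    # The overloads exist only for type checkers; the last definition runs.
    @overload
    def __getitem__(self, index: int) -> ResidueStub: ...

    @overload
    def __getitem__(self, index: slice) -> SequenceStub: ...

    def __getitem__(self, index: int | slice) -> ResidueStub | SequenceStub:
        if isinstance(index, slice):
            return SequenceStub(self.sequence[index])
        return ResidueStub(self.sequence[index])


seq = SequenceStub("PEPTIDE")
single = seq[1]     # inferred as ResidueStub
window = seq[1:4]   # inferred as SequenceStub
last = seq[-1]      # inferred as ResidueStub
```

With the overloads in place, callers do not need an `isinstance` check or a cast before reading `one_letter_code` from `seq[1]`.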
289-305: Update __radd__ type annotation to include Py_Residue.

The method now handles Py_Residue on the left-hand side but the type hint still only shows str.

-    def __radd__(self, other: str) -> Py_AASequence:
+    def __radd__(self, other: str | Py_Residue) -> Py_AASequence:
         """
-        Support string + Py_AASequence.
+        Support string + Py_AASequence and Py_Residue + Py_AASequence.
 
         Example:
             >>> seq = Py_AASequence.from_string("TIDE")
             >>> combined = "PEP" + seq
             >>> print(combined.sequence)
             PEPTIDE
         """

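For context on why __radd__ is the method that handles a residue on the left-hand side: Python tries the left operand's `__add__` first and falls back to the right operand's `__radd__` when that method is missing or returns `NotImplemented`. The sketch below uses hypothetical stand-ins and assumes the residue type defines no `__add__` of its own, which the PR does not state.

```python
from __future__ import annotations


class ResidueStub:
    """Hypothetical stand-in for Py_Residue; assumed to define no __add__."""

    def __init__(self, code: str) -> None:
        self.one_letter_code = code


class SequenceStub:
    """Hypothetical stand-in for Py_AASequence backed by a plain string."""

    def __init__(self, sequence: str) -> None:
        self.sequence = sequence

    def __radd__(self, other: str | ResidueStub) -> SequenceStub:
        # Reached only after the left operand's __add__ is missing
        # or has returned NotImplemented.
        if isinstance(other, ResidueStub):
            return SequenceStub(other.one_letter_code + self.sequence)
        if isinstance(other, str):
            return SequenceStub(other + self.sequence)
        # Let Python raise TypeError for unsupported left operands.
        return NotImplemented


seq = SequenceStub("EPTIDE")
from_str = "P" + seq                   # str.__add__ declines, __radd__ runs
from_residue = ResidueStub("P") + seq  # no __add__ on the stub, __radd__ runs

try:
    1 + seq
    unsupported_raises = False
except TypeError:
    unsupported_raises = True
```

Returning `NotImplemented` rather than raising inside `__radd__` leaves the final `TypeError` to the interpreter, so the error message names both operand types.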
🧹 Nitpick comments (5)

openms_python/py_residue.py (4)

9-38: Well-structured wrapper class with clear documentation.

The class follows the Py_ naming convention and provides a clean docstring with usage examples. The constructor correctly handles the optional native_residue parameter.

However, the docstring claims the wrapper is "immutable" (line 11), but set_modification and set_modification_by_diff_mass methods mutate the underlying residue. Consider updating the docstring to reflect the actual behavior.

 class Py_Residue:
     """
-    A Pythonic, immutable wrapper around pyOpenMS Residue.
+    A Pythonic wrapper around pyOpenMS Residue.
 
     This class provides intuitive properties and methods for working with
     amino acid residues, including access to modifications, formulas,
     weights, and physicochemical properties.

78-89: Improve exception handling with proper chaining.

Static analysis flags two issues here: the handler catches a broad Exception, and the re-raised ValueError does not chain the original error. Both make debugging harder when the lookup fails.

         # Try direct lookup (works for all formats)
         try:
             residue = db.getResidue(code)
             return cls(residue)
-        except Exception:
-            raise ValueError(f"Residue '{code}' not found in ResidueDB.")
+        except Exception as err:
+            raise ValueError(f"Residue '{code}' not found in ResidueDB.") from err
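The effect of `from err` can be seen in a small self-contained sketch. A plain dictionary stands in for ResidueDB here, and the names `_RESIDUES` and `lookup` are hypothetical. The stub narrows the handler to `KeyError` because that is what a dictionary raises; which exception the real `getResidue` call raises is not stated in this review, so the suggestion above keeps `Exception`.

```python
from __future__ import annotations

# Hypothetical stand-in for ResidueDB: maps codes to residue names.
_RESIDUES = {"A": "Alanine", "G": "Glycine"}


def lookup(code: str) -> str:
    """Return the residue name, or raise ValueError chained to the cause."""
    try:
        return _RESIDUES[code]
    except KeyError as err:
        # 'from err' stores the original exception on __cause__, so the
        # traceback shows both the lookup failure and the ValueError.
        raise ValueError(f"Residue '{code}' not found in ResidueDB.") from err


cause = None
try:
    lookup("?")
except ValueError as exc:
    cause = exc.__cause__
```

Without `from err`, Python still records the original exception as implicit context, but the traceback reports it as an error that occurred during handling rather than as the direct cause.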
120-128: Misleading docstring for mono_weight property.

The docstring mentions "Args can be passed" but this is a property with no parameters. The parameterized version is get_mono_weight(). Consider clarifying the docstring.
     @property
     def mono_weight(self) -> float:
         """
-        Get monoisotopic weight.
-
-        Args can be passed to get weight for different residue types
-        (Full, Internal, NTerminal, CTerminal, etc.).
+        Get monoisotopic weight (full residue type).
+
+        For weights with different residue types, use get_mono_weight().
         """
         return self._residue.getMonoWeight()
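The split between the argument-free property and the parameterized getter might look like the following sketch. `ResidueType` and `ResidueStub` are hypothetical stand-ins, and the weights are approximate monoisotopic values for alanine used purely for illustration; the real class delegates to the wrapped pyOpenMS residue.

```python
from __future__ import annotations

from enum import Enum


class ResidueType(Enum):
    """Hypothetical stand-in for the residue-type enumeration."""

    FULL = "full"
    INTERNAL = "internal"


class ResidueStub:
    """Hypothetical stand-in for Py_Residue holding precomputed weights."""

    def __init__(self, weights: dict[ResidueType, float]) -> None:
        self._weights = weights

    @property
    def mono_weight(self) -> float:
        """Monoisotopic weight of the full residue; a property takes no arguments."""
        return self.get_mono_weight()

    def get_mono_weight(self, residue_type: ResidueType = ResidueType.FULL) -> float:
        """Monoisotopic weight for the requested residue type."""
        return self._weights[residue_type]


# Approximate alanine values, for illustration only.
ala = ResidueStub({ResidueType.FULL: 89.04768, ResidueType.INTERNAL: 71.03711})
full = ala.mono_weight
internal = ala.get_mono_weight(ResidueType.INTERNAL)
```

Having the property call the getter with its default keeps the two entry points consistent if the default residue type ever changes.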
220-233: Mutation methods don't return self for method chaining.

Per the coding guidelines, mutation methods should return self to enable fluent interfaces. This applies to both set_modification and set_modification_by_diff_mass; consider returning self from each:
-    def set_modification(self, mod_name: str) -> None:
+    def set_modification(self, mod_name: str) -> Py_Residue:
         """
         Set modification by name.
 
-        Note: This modifies the underlying residue object.
-
         Args:
             mod_name: Name of modification (must exist in ModificationsDB).
 
+        Returns:
+            Py_Residue: Self for method chaining.
+
         Example:
             >>> res = Py_Residue.from_one_letter_code("M")
             >>> res.set_modification("Oxidation")
         """
         self._residue.setModification(mod_name)
+        return self
 
-    def set_modification_by_diff_mass(self, diff_mono_mass: float) -> None:
+    def set_modification_by_diff_mass(self, diff_mono_mass: float) -> Py_Residue:
         """
         Set modification by mass difference.
         ...
         """
         self._residue.setModificationByDiffMonoMass(diff_mono_mass)
+        return self
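What returning self buys at the call site can be shown with a hypothetical stand-in (`ResidueStub`), which stores the modification name on the object instead of forwarding it to a pyOpenMS residue.

```python
from __future__ import annotations


class ResidueStub:
    """Hypothetical stand-in for Py_Residue that records a modification name."""

    def __init__(self, code: str) -> None:
        self.one_letter_code = code
        self.modification: str | None = None

    def set_modification(self, mod_name: str) -> ResidueStub:
        # Mutates this object in place, then returns it for chaining.
        self.modification = mod_name
        return self


original = ResidueStub("M")
returned = original.set_modification("Oxidation")

# Construction and mutation can also be combined in one expression.
res = ResidueStub("M").set_modification("Oxidation")
```

The trade-off is that a returned object can be mistaken for a fresh copy. Python's built-in mutators such as `list.sort()` return `None` for that reason, so following the project guideline here works best when the docstring states that the same object is returned.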

tests/test_py_residue.py (1)

11-30: Test function contains multiple inline docstrings instead of test separation.

The function test_py_residue_from_string contains three docstrings for what should be separate test cases. This pattern hides test intent and makes it harder to identify which assertion failed.

Consider splitting into separate test functions or using pytest parametrize:

-def test_py_residue_from_string():
-
-    """Test creating residue from one-letter code."""
-    ala = Py_Residue.from_string("A")
-    assert ala.one_letter_code == "A"
-    assert ala.three_letter_code == "Ala"
-    assert ala.name == "Alanine"
-
-    """Test creating residue from three-letter code."""
-    ala = Py_Residue.from_string("Ala")
-    ...
+@pytest.mark.parametrize("code,expected_one,expected_three,expected_name", [
+    ("A", "A", "Ala", "Alanine"),
+    ("Ala", "A", "Ala", "Alanine"),
+    ("Alanine", "A", "Ala", "Alanine"),
+])
+def test_py_residue_from_string(code, expected_one, expected_three, expected_name):
+    """Test creating residue from various code formats."""
+    residue = Py_Residue.from_string(code)
+    assert residue.one_letter_code == expected_one
+    assert residue.three_letter_code == expected_three
+    assert residue.name == expected_name

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 204e8b6 and b3a397e.

📒 Files selected for processing (5)

openms_python/__init__.py (2 hunks)
openms_python/py_aasequence.py (5 hunks)
openms_python/py_residue.py (1 hunks)
tests/test_py_aasequence.py (4 hunks)
tests/test_py_residue.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (6)

**/*.py