
Conversation


@pavanimanchala53 pavanimanchala53 commented Dec 11, 2025

PR: Add NL Parser With Typo Tolerance, Confidence Scoring, Clarifications & Unit Tests

This PR adds a standalone natural language parser for the cortex install flow, improving demo reliability and addressing all acceptance criteria in the bounty request.

✔ Features Implemented

Typo tolerance (e.g., kubernets → kubernetes, pyhton → python)

Rule-based + fuzzy semantic intent detection

Confidence scoring (0.0–1.0)

Ambiguity handling with clarification prompts

Short explanation showing AI reasoning

Slot extraction:

Python version

Platform

Packages mentioned

12+ unit tests validating NL behavior

✔ Files Added

nl_parser.py

tests/test_nl_parser.py

✔ How to Test

Run:

python -m pytest tests/test_nl_parser.py -q
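
Or, for a quick manual check, call the parser directly. This is a minimal usage sketch; the result keys (intent, confidence, explanation, slots, corrections, clarifications) follow the shape described in this PR, and the values in the comments are illustrative:

from nl_parser import parse_request

result = parse_request("set up pyhton 3.10 development environment")
print(result["intent"])          # expected: "setup_python_env" (typo "pyhton" corrected)
print(result["confidence"])      # aggregate score in 0.0-1.0
print(result["slots"])           # e.g. {"python_version": "3.10", ...}
print(result["corrections"])     # spell corrections that were applied
print(result["clarifications"])  # follow-up prompts when the request is ambiguous
print(result["explanation"])     # short reasoning string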

✔ Bounty Requirement Status

All acceptance criteria met:

10+ NL test cases — ✔

Ambiguous handling — ✔

Typo tolerance — ✔

Explanation output — ✔

Confidence scoring — ✔

Unit tests — ✔

Closes: #248

Summary by CodeRabbit

  • New Features

    • Natural language parsing for installation requests with spell correction, slot extraction, and intent detection.
    • Clarification prompts when intent or details are ambiguous.
    • Optional LLM-assisted intent enhancement and optimization suggestions.
    • Installation planning engine that produces GPU-aware, step-by-step plans.
    • Session context to remember GPU, previous intents, installed packages, and clarifications.
  • Tests

    • Added comprehensive tests for parsing, intent detection, clarification, planning, and session context.
  • Chores

    • Added PyYAML dependency.



coderabbitai bot commented Dec 11, 2025

Walkthrough

Adds a natural-language install parsing subsystem: a new nl_parser, an intent system (detector, clarifier, context, planner), an LLM-assisted agent, tests for these components, plus a small dependency addition and a .gitignore comment tweak.

Changes

Core NLP Parser (nl_parser.py)
New module implementing normalization, tokenization, lightweight spell correction, semantic and rule-based intent scoring, slot extraction (python version, platform, packages), confidence aggregation, clarification suggestions, and parse_request orchestration.

Intent Detection & Clarification (src/intent/detector.py, src/intent/clarifier.py)
New rule-based Intent dataclass and IntentDetector for common package/GPU/verify intents; new Clarifier class that returns clarifying questions based on detected intents and input text.

Session Context (src/intent/context.py)
New SessionContext class to store detected GPU, previous intents, installed packages, and clarifications, with getters/setters and reset.

Installation Planner (src/intent/planner.py)
New InstallationPlanner class with GPU-aware build_plan that composes installation/configuration/verification steps from intents.

LLM-Assisted Agent (src/intent/llm_agent.py)
New LLMIntentAgent coordinating detection, optional clarification, optional Anthropic LLM enhancement/suggestions, context updates, and plan generation with guarded fallbacks.

Unit Tests, intent system & planner (src/test_clarifier.py, src/test_context.py, src/test_intent_detection.py, src/test_llm_agent.py, src/test_planner.py)
New tests covering Clarifier, SessionContext, IntentDetector, LLMIntentAgent (mocked), and InstallationPlanner behaviors and integrations.

NL Parser Tests (tests/test_nl_parser.py)
New pytest suite for nl_parser.parse_request validating intent mapping, typo corrections, slot extraction, confidences, and clarifications across multiple scenarios.

Requirements & Misc (requirements.txt, .gitignore, test/test_installation_history.py)
Added explicit PyYAML==6.0.3 entry (duplicate/overlap with pyyaml>=6.0.0 noted); removed a PyInstaller comment line in .gitignore; trivial formatting change in test/test_installation_history.py.
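
For orientation, here is a rough sketch of the intent surface summarized above. The Intent fields and the COMMON_PACKAGES table are inferred from the review comments below and are illustrative only; the actual code in src/intent/detector.py may differ:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Intent:
    target: str                     # e.g. "pytorch", "gpu", "verify"
    details: Optional[dict] = None  # optional extra data for the intent

class IntentDetector:
    # Illustrative keyword table; the real COMMON_PACKAGES mapping is richer
    COMMON_PACKAGES = ("pytorch", "tensorflow", "cuda", "docker")

    def detect(self, text: str) -> List[Intent]:
        lowered = text.lower()
        intents = [Intent(target=pkg) for pkg in self.COMMON_PACKAGES if pkg in lowered]
        if "gpu" in lowered:
            intents.append(Intent(target="gpu"))
        return intents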

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Agent as LLMIntentAgent
    participant Detector as IntentDetector
    participant Clarifier
    participant Context as SessionContext
    participant LLM as Anthropic LLM
    participant Planner as InstallationPlanner

    User->>Agent: process(text)
    Agent->>Detector: detect(text)
    Detector-->>Agent: intents
    Agent->>Clarifier: needs_clarification(intents, text)
    Clarifier-->>Agent: clarification? 

    alt clarification needed
        Agent-->>User: return clarification question
    else no clarification
        alt LLM available & API key
            Agent->>LLM: enhance_intents_with_llm(text, intents)
            LLM-->>Agent: enhanced_intents
            Agent->>Context: add_intents(enhanced_intents)
            Agent->>Planner: build_plan(enhanced_intents)
            Planner-->>Agent: plan
            Agent->>LLM: suggest_optimizations(text)
            LLM-->>Agent: suggestions
        else LLM unavailable
            Agent->>Planner: build_plan(intents)
            Planner-->>Agent: plan
        end
        Agent-->>User: return {intents, plan, suggestions?, gpu_info, confidence}
    end
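
In code terms, the flow above roughly corresponds to the following. This is a condensed sketch based only on the method names in the diagram; the suggestions step, prompt construction, and the Anthropic client wiring in the real llm_agent.py are omitted, and the collaborator objects are passed in as plain parameters for illustration:

from typing import Any, Dict

def process_request(text: str, detector: Any, clarifier: Any, context: Any,
                    planner: Any, llm_enhance=None) -> Dict[str, Any]:
    """Condensed sketch of the diagrammed control flow (not the real method)."""
    intents = detector.detect(text)

    question = clarifier.needs_clarification(intents, text)
    if question:
        return {"clarification_needed": question}

    if llm_enhance is not None:
        try:
            intents = llm_enhance(text, intents)  # enhance_intents_with_llm
        except Exception:
            pass  # guarded fallback: keep rule-based intents

    context.add_intents(intents)
    plan = planner.build_plan(intents)
    return {"intents": intents, "plan": plan}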

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Areas needing extra attention:
    • nl_parser.py (scoring, spell-correction heuristics, slot regexes)
    • src/intent/llm_agent.py (LLM prompt construction, response parsing, timeouts, graceful fallback)
    • Integration between detector → clarifier → context → planner (data shapes and de-dup logic)
    • Tests that rely on mocked LLM behavior and edge-case coverage for ambiguous inputs

Suggested labels

enhancement

Suggested reviewers

  • dhvll

Poem

🐰 I hopped through text and fixed each typo,
Pieced intents from crumbs where meanings lie,
Planned installs, checked GPUs with a twitch,
A tiny rabbit, making setup less glitch. 🥕

Pre-merge checks

❌ Failed checks (2 warnings, 2 inconclusive)

Description check (⚠️ Warning)
  Explanation: The description is comprehensive and covers features, test instructions, and bounty criteria, but does not follow the required template structure with Related Issue section, Summary section, and Checklist.
  Resolution: Reformat the description to match the required template: add a 'Related Issue' section, restructure content into a 'Summary' section, and include the checkbox-based 'Checklist' with test pass status.

Docstring Coverage (⚠️ Warning)
  Explanation: Docstring coverage is 12.82%, which is below the required threshold of 80.00%.
  Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.

Title check (❓ Inconclusive)
  Explanation: The PR title is vague and generic, using 'improvements' without specifying the main change, making it unclear to reviewers scanning history.
  Resolution: Revise the title to be more specific, e.g., 'Add NL parser with typo tolerance and confidence scoring', or follow the template format '[#248] NL parser improvements'.

Out of Scope Changes check (❓ Inconclusive)
  Explanation: Several file changes appear unrelated to the core NL parser: the .gitignore comment removal, the requirements.txt PyYAML addition, and multiple new modules (clarifier, context, detector, llm_agent, planner) with tests not mentioned in the #248 objectives.
  Resolution: Clarify whether all added modules (detector, clarifier, context, llm_agent, planner) and modified files (requirements.txt, .gitignore) are integral to the NL parser feature or should be separated into a distinct PR.

✅ Passed checks (1 passed)

Linked Issues check (✅ Passed)
  Explanation: The PR implements all acceptance criteria from #248: 12+ NL test cases, ambiguity handling with clarifications, typo tolerance via spell correction, explanation output, and confidence scoring with unit tests.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac7fbae and 5dabcd6.

📒 Files selected for processing (2)
  • requirements.txt (1 hunks)
  • test/test_installation_history.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • test/test_installation_history.py
**/*install*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*install*.py: Dry-run by default for all installations in command execution
No silent sudo execution - require explicit user confirmation
Implement audit logging to ~/.cortex/history.db for all package operations

Files:

  • test/test_installation_history.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : Implement audit logging to ~/.cortex/history.db for all package operations

Applied to files:

  • test/test_installation_history.py
🔇 Additional comments (1)
test/test_installation_history.py (1)

10-10: PEP 8 compliance improvement.

The blank line correctly separates standard library imports from local package imports, following PEP 8 import grouping guidelines.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (13)
test/test_installation_history.py (1)

10-16: Confirm capitalized module name and consider PEP 8 alignment

Switching to "from Installation_history import ..." is fine if the module is actually named with a capital I. If the module is under your control, consider renaming it to a lowercase, underscore-separated name and adjusting imports for PEP 8 compliance; otherwise, just ensure this import matches the real filename in the package layout.

src/intent/detector.py (1)

6-10: Tighten public API typing/docstrings for Intent and detect()

The detection logic itself looks solid and matches the intended flows. To better align with the project’s typing/docstring guidelines, consider:

  • Making details more explicit than Optional[dict], e.g. Optional[dict[str, object]] or a typed alias if you expect a fixed structure.
  • Adding a short docstring to Intent (or at least to detect) describing expected input text and the shape/ordering of returned intents.

This will make the new intent surface clearer to downstream callers and type checkers. As per coding guidelines, docstrings and strong type hints are expected on public APIs.

Also applies to: 26-49

src/intent/context.py (1)

13-69: Add explicit -> None return types on mutating SessionContext methods

SessionContext behavior looks fine, but several public methods (__init__, set_gpu, add_intents, add_installed, add_clarification, reset) currently omit their -> None return annotations. Adding these will fully satisfy the “type hints required” guideline and make intent clearer to type checkers:

-    def set_gpu(self, gpu_name: str):
+    def set_gpu(self, gpu_name: str) -> None:
@@
-    def add_intents(self, intents: List[Intent]):
+    def add_intents(self, intents: List[Intent]) -> None:
@@
-    def add_installed(self, pkg: str):
+    def add_installed(self, pkg: str) -> None:
@@
-    def add_clarification(self, question: str):
+    def add_clarification(self, question: str) -> None:
@@
-    def reset(self):
+    def reset(self) -> None:

(You can similarly annotate __init__ if you want full consistency.)

As per coding guidelines, type hints are required.

src/intent/planner.py (1)

6-49: Use GPU_PACKAGES in has_gpu and annotate it as ClassVar

Right now GPU_PACKAGES is never used, and has_gpu only checks for a gpu target, so a plan with only cuda/pytorch intents won’t trigger GPU detection/config steps. You can both fix this and satisfy the Ruff ClassVar warning by:

-from typing import List
+from typing import List, ClassVar
@@
-    GPU_PACKAGES = ["cuda", "cudnn", "pytorch", "tensorflow"]
+    GPU_PACKAGES: ClassVar[List[str]] = ["cuda", "cudnn", "pytorch", "tensorflow"]
@@
-        # 1. If GPU-related intents exist → add GPU detection
-        has_gpu = any(i.target == "gpu" for i in intents)
+        # 1. If GPU-related intents exist → add GPU detection
+        has_gpu = any(
+            i.target == "gpu" or i.target in self.GPU_PACKAGES
+            for i in intents
+        )

This keeps the behavior aligned with the intent of “GPU-related intents” while cleaning up the static-analysis complaint.

src/test_intent_detection.py (1)

1-15: Tests cover core detector behavior; type hints optional for consistency

These tests nicely exercise the happy path and an empty-input case for IntentDetector. If you want full alignment with the “type hints required” policy, you could add return annotations like def test_detector_basic() -> None, but that’s purely consistency polish on test code.

src/test_clarifier.py (1)

1-12: Clarifier integration test looks good; consider optional type annotations

The test correctly validates that a clarification question is produced for an underspecified ML request. As with the other tests, you could optionally add -> None on the test function to mirror the project’s typing convention, but the current test is functionally solid.

src/test_llm_agent.py (2)

3-9: Consider using a factory method or instance attribute for mutable content.

The content class attribute on line 8 is a mutable list. While this works for the current test, it could cause unexpected behavior if multiple Response instances share state. Since this is a simple mock, it's low risk, but using an instance attribute or property would be safer.

 class MockMessages:
     def create(self, **kwargs):
         class Response:
             class Content:
                 text = "install: tensorflow\ninstall: jupyter"
-            content = [Content()]
+            @property
+            def content(self):
+                return [self.Content()]
         return Response()

15-28: Test file location inconsistent with project structure.

This test file is in src/ while other NL parser tests are in tests/test_nl_parser.py. Consider moving to tests/test_llm_agent.py for consistency. Also, the test could benefit from additional assertions to verify the structure of the plan (e.g., that plan items are strings) and that suggestions is a list.

     assert "plan" in result
     assert len(result["plan"]) > 0
     assert "suggestions" in result
+    assert isinstance(result["plan"], list)
+    assert isinstance(result["suggestions"], list)
src/test_planner.py (1)

4-16: Test validates planner correctly; consider moving to tests/ and using constants.

The test properly validates the CUDA/PyTorch/GPU pipeline. Assertions against hardcoded strings (e.g., "Install CUDA 12.3 + drivers") may become brittle if planner output changes. Consider either:

  1. Moving to tests/test_planner.py for consistency
  2. Optionally defining expected plan steps as constants shared with the planner

Current test logic is correct.

src/intent/llm_agent.py (2)

67-70: Consider logging swallowed exceptions and using narrower exception types.

Bare except Exception catches can mask unexpected errors. While the fallback behavior is appropriate for resilience, logging the exception would aid debugging in production.

+import logging
+
+logger = logging.getLogger(__name__)
+
         # 4. Improve intents using LLM (safe)
         try:
             improved_intents = self.enhance_intents_with_llm(text, intents)
-        except Exception:
+        except Exception as e:
+            logger.warning("LLM enhancement failed, using rule-based intents: %s", e)
             improved_intents = intents

Also applies to: 79-82


100-106: LLM prompt may produce inconsistent output format.

The prompt at lines 96-106 asks for "install: package" format but doesn't strongly constrain the LLM output. Consider adding examples or using a more structured prompt to improve reliability of parsing.

 Return improvements or extra intents.
-Format: "install: package" or "configure: component"
+Format each intent on its own line using exactly one of:
+- install: <package_name>
+- configure: <component>
+- verify: <target>
+
+Example:
+install: tensorflow
+configure: gpu
 """
nl_parser.py (2)

142-142: Clarify deduplication idiom with a comment.

The dict.fromkeys() pattern for deduplication is clever but not immediately obvious to all readers.

Apply this diff:

-        slots["packages"] = list(dict.fromkeys(pkgs))  # unique preserve order
+        # Deduplicate while preserving order (dict.fromkeys maintains insertion order in Python 3.7+)
+        slots["packages"] = list(dict.fromkeys(pkgs))

173-220: Consider adding input length validation.

The function processes arbitrary-length text without validation. While not likely to be a practical issue for typical use cases, extremely long inputs could cause performance degradation.

Consider adding a length check at the start of the function:

 def parse_request(text: str) -> Dict[str, Any]:
     """Main function used by tests and demo."""
+    # Prevent potential DoS from extremely long inputs
+    if len(text) > 10000:
+        return {
+            "intent": "unknown",
+            "confidence": 0.0,
+            "explanation": "Input too long",
+            "slots": {},
+            "corrections": [],
+            "clarifications": ["Please provide a shorter request (max 10000 characters)"],
+        }
+    
     norm = normalize(text)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1755c and ac7fbae.

📒 Files selected for processing (15)
  • .gitignore (0 hunks)
  • nl_parser.py (1 hunks)
  • requirements.txt (1 hunks)
  • src/intent/clarifier.py (1 hunks)
  • src/intent/context.py (1 hunks)
  • src/intent/detector.py (1 hunks)
  • src/intent/llm_agent.py (1 hunks)
  • src/intent/planner.py (1 hunks)
  • src/test_clarifier.py (1 hunks)
  • src/test_context.py (1 hunks)
  • src/test_intent_detection.py (1 hunks)
  • src/test_llm_agent.py (1 hunks)
  • src/test_planner.py (1 hunks)
  • test/test_installation_history.py (1 hunks)
  • tests/test_nl_parser.py (1 hunks)
💤 Files with no reviewable changes (1)
  • .gitignore
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • src/intent/detector.py
  • src/intent/context.py
  • src/test_clarifier.py
  • tests/test_nl_parser.py
  • src/intent/planner.py
  • src/intent/llm_agent.py
  • src/test_llm_agent.py
  • src/test_intent_detection.py
  • src/intent/clarifier.py
  • test/test_installation_history.py
  • src/test_context.py
  • src/test_planner.py
  • nl_parser.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_nl_parser.py
**/*install*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*install*.py: Dry-run by default for all installations in command execution
No silent sudo execution - require explicit user confirmation
Implement audit logging to ~/.cortex/history.db for all package operations

Files:

  • test/test_installation_history.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Type hints required in Python code

Applied to files:

  • requirements.txt
🧬 Code graph analysis (2)
tests/test_nl_parser.py (1)
nl_parser.py (1)
  • parse_request (173-220)
src/intent/llm_agent.py (1)
src/intent/detector.py (2)
  • Intent (7-10)
  • detect (26-49)
🪛 Ruff (0.14.8)
src/intent/planner.py

8-8: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

src/intent/llm_agent.py

69-69: Do not catch blind exception: Exception

(BLE001)


81-81: Do not catch blind exception: Exception

(BLE001)

src/test_llm_agent.py

4-4: Unused method argument: kwargs

(ARG002)


8-8: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


22-22: Unused lambda argument: a

(ARG005)


22-22: Unused lambda argument: k

(ARG005)

🔇 Additional comments (8)
requirements.txt (1)

12-12: Validate PyYAML pin against supported Python versions and security policy

Adding PyYAML==6.0.3 looks fine; just confirm this exact pin is required (vs >=) for your nl parsing/config use cases and that it’s compatible with your minimum Python version and current vulnerability guidance.

src/intent/clarifier.py (1)

6-33: Clarification logic is coherent and matches intended ambiguity handling

The branching in needs_clarification cleanly captures the key ambiguous cases (GPU presence, generic ML tools, CUDA without GPU mention, PyTorch version choice) with clear questions, and types/docstrings are in good shape. No changes needed here.

src/test_context.py (1)

4-13: Test logic is sound; consider moving to tests/ directory.

The test correctly validates SessionContext's core functionality: GPU setting, intent addition, and installed package tracking. For consistency with tests/test_nl_parser.py, consider relocating this file to tests/test_context.py.

tests/test_nl_parser.py (1)

1-37: Comprehensive test coverage aligns with PR requirements.

The test suite covers:

  • 12 parametrized intent cases (exceeds 10+ requirement)
  • Typo tolerance ("kubernets", "pyhton")
  • Slot extraction (python_version, platform)
  • Ambiguity handling with clarification checks
  • Confidence scoring (>= 0.5 threshold)

This satisfies the acceptance criteria from issue #248.
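
For reference, the parametrized pattern looks roughly like this. It is an illustrative sketch using two phrases taken verbatim from the INTENT_EXAMPLES table; the actual case list and fixtures live in tests/test_nl_parser.py:

import pytest
from nl_parser import parse_request

@pytest.mark.parametrize("text,expected_intent", [
    ("install pytorch", "install_ml"),
    ("i need a web server", "install_web_server"),
])
def test_intent_detection(text: str, expected_intent: str) -> None:
    result = parse_request(text)
    assert result["intent"] == expected_intent
    assert result["confidence"] >= 0.5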

src/intent/llm_agent.py (1)

119-119: The .lower() call is appropriate and correctly designed for this codebase.

The downstream systems (detector.py and planner.py) consistently use lowercase for all internal package matching. The COMMON_PACKAGES dictionary in detector.py (line 17) and GPU_PACKAGES list in planner.py (line 8) define packages as lowercase: "pytorch", "tensorflow", "cuda", etc. All package comparisons in planner.py (lines 29, 32) check against lowercase strings. The lowercasing at line 119 is intentional and necessary—user-facing display strings with proper casing (e.g., "Install PyTorch") are generated separately in the output messages.

nl_parser.py (3)

60-67: LGTM!

The spell correction logic is sound, with a reasonable fuzzy match cutoff of 0.75 to avoid over-correction.
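
As a rough illustration of how that cutoff behaves with difflib (vocabulary shortened for the example):

import difflib

vocab = ["kubernetes", "python", "docker"]

print(difflib.get_close_matches("kubernets", vocab, n=1, cutoff=0.75))  # ['kubernetes']
print(difflib.get_close_matches("pyhton", vocab, n=1, cutoff=0.75))     # ['python']
print(difflib.get_close_matches("xyz", vocab, n=1, cutoff=0.75))        # [] - no correction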


85-97: LGTM!

The semantic matching logic correctly finds the best intent by comparing against all examples. The O(n*m) complexity is acceptable given the small, fixed set of intents and examples.
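
A compact way to picture that best-example scan (a simplified sketch; the real weights and thresholds in nl_parser.py may differ):

from difflib import SequenceMatcher

def best_intent(text: str, examples: dict) -> tuple:
    """Return (intent, score) for the example phrase closest to the input."""
    best = ("unknown", 0.0)
    for intent, phrases in examples.items():
        for phrase in phrases:
            score = SequenceMatcher(None, text, phrase).ratio()
            if score > best[1]:
                best = (intent, score)
    return best

print(best_intent("i need a web server", {"install_web_server": ["i need a web server"]}))
# ('install_web_server', 1.0)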


173-220: LGTM on the overall parsing logic!

The main parsing workflow is well-structured: normalization → spell correction → dual intent detection (rule + semantic) → slot extraction → confidence aggregation → clarification decisions. The cascading intent selection logic (lines 184-195) appropriately prioritizes rule-based matches, validates with semantic agreement, and falls back gracefully to semantic-only or unknown.

Comment on lines +1 to +5
import difflib
import re
from difflib import SequenceMatcher
from typing import Dict, Any, List, Tuple


🛠️ Refactor suggestion | 🟠 Major

Add module-level docstring.

Per PEP 257 and coding guidelines, public modules should have a docstring explaining their purpose.

Add a module docstring at the top:

+"""Natural language parser for installation requests.
+
+Provides intent detection, spell correction, semantic matching, and slot extraction
+for parsing user installation requests with confidence scoring and clarification logic.
+"""
+
 import difflib
 import re
 from difflib import SequenceMatcher
🤖 Prompt for AI Agents
In nl_parser.py around lines 1 to 5, there is no module-level docstring; add a
top-of-file docstring (PEP 257) that briefly states the module's purpose (what
the natural-language parser does), summarizes public classes/functions/expected
inputs and outputs, and any important usage notes or references; keep it concise
(one to a few paragraphs) and place it as the first statement in the file.

Comment on lines +1 to +3
import difflib
import re
from difflib import SequenceMatcher

🛠️ Refactor suggestion | 🟠 Major

Remove redundant import.

difflib is imported on line 1 and SequenceMatcher is imported from difflib on line 3. Both statements pull from the same module, so one of the two imports is redundant; pick one style and drop the other.

Apply this diff:

-import difflib
 import re
 from difflib import SequenceMatcher

Then update line 64 to import and call get_close_matches from difflib directly:

-    close = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.75)
+    from difflib import get_close_matches
+    close = get_close_matches(token, VOCAB, n=1, cutoff=0.75)

Or keep the difflib import and remove the SequenceMatcher import:

 import difflib
 import re
-from difflib import SequenceMatcher

Then update line 82:

-    return SequenceMatcher(None, a, b).ratio()
+    return difflib.SequenceMatcher(None, a, b).ratio()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
import difflib
import re
from difflib import SequenceMatcher

After:
import difflib
import re
🤖 Prompt for AI Agents
In nl_parser.py (lines 1-3, and affecting lines 64 and 82): the module currently
imports both difflib and from difflib import SequenceMatcher which is redundant;
remove the top-level "import difflib" (leave "from difflib import
SequenceMatcher"), then update any calls that use difflib.SequenceMatcher on
line 64 and line 82 to call SequenceMatcher(...) directly; ensure no other
difflib.* usages remain or re-add the full difflib import if other functions
from difflib are needed.

Comment on lines +7 to +13
VOCAB = {
    "python", "pip", "venv", "virtualenv", "conda", "anaconda",
    "docker", "kubernetes", "k8s", "kubectl",
    "nginx", "apache", "httpd", "web", "server",
    "flask", "django", "tensorflow", "pytorch", "torch",
    "install", "setup", "development", "env", "environment",
}

🛠️ Refactor suggestion | 🟠 Major

Add type annotation for VOCAB constant.

As per coding guidelines, type hints are required. Module-level constants should be annotated.

Apply this diff:

 # Vocabulary for typo correction
-VOCAB = {
+VOCAB: set[str] = {
     "python", "pip", "venv", "virtualenv", "conda", "anaconda",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
VOCAB = {
    "python", "pip", "venv", "virtualenv", "conda", "anaconda",
    "docker", "kubernetes", "k8s", "kubectl",
    "nginx", "apache", "httpd", "web", "server",
    "flask", "django", "tensorflow", "pytorch", "torch",
    "install", "setup", "development", "env", "environment",
}

After:
VOCAB: set[str] = {
    "python", "pip", "venv", "virtualenv", "conda", "anaconda",
    "docker", "kubernetes", "k8s", "kubectl",
    "nginx", "apache", "httpd", "web", "server",
    "flask", "django", "tensorflow", "pytorch", "torch",
    "install", "setup", "development", "env", "environment",
}
🤖 Prompt for AI Agents
In nl_parser.py around lines 7 to 13 the module-level VOCAB constant is missing
a type annotation; add an explicit annotation like VOCAB: Set[str] = {...} and
ensure typing.Set is imported (from typing import Set) at the top of the file if
not already present so the constant is properly typed as a set of strings.

Comment on lines +16 to +45
INTENT_EXAMPLES = {
    "install_ml": [
        "install something for machine learning",
        "install pytorch",
        "install tensorflow",
        "i want to run pytorch",
    ],
    "install_web_server": [
        "i need a web server",
        "install nginx",
        "install apache",
        "set up a web server",
    ],
    "setup_python_env": [
        "set up python development environment",
        "install python 3.10",
        "create python venv",
        "setup dev env",
    ],
    "install_docker": [
        "install docker",
        "add docker",
        "deploy containers - docker",
    ],
    "install_docker_k8s": [
        "install docker and kubernetes",
        "docker and k8s",
        "k8s and docker on my mac",
    ],
}

🛠️ Refactor suggestion | 🟠 Major

Add type annotation for INTENT_EXAMPLES constant.

As per coding guidelines, type hints are required. Module-level constants should be annotated.

Apply this diff:

 # Canonical examples for lightweight semantic matching
-INTENT_EXAMPLES = {
+INTENT_EXAMPLES: Dict[str, List[str]] = {
     "install_ml": [

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
nl_parser.py around lines 16 to 45: INTENT_EXAMPLES is a module-level constant
without a type annotation; add a typing hint to the declaration (e.g., import
Dict and List from typing and annotate as Dict[str, List[str]]), or use
Final[Dict[str, List[str]]] if your project marks constants as Final; update the
import section if needed to include the typing names and keep the existing
dictionary value unchanged.

Comment on lines +48 to +53
def normalize(text: str) -> str:
    text = text.lower()
    text = text.replace("-", " ")
    text = re.sub(r"[^a-z0-9.\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

🛠️ Refactor suggestion | 🟠 Major

Add docstring to public function.

Per coding guidelines, docstrings are required for all public APIs.

Apply this diff:

 def normalize(text: str) -> str:
+    """Normalize text for parsing: lowercase, remove special chars, collapse whitespace.
+    
+    Preserves dots and digits for version numbers (e.g., 'Python 3.10').
+    """
     text = text.lower()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
def normalize(text: str) -> str:
    text = text.lower()
    text = text.replace("-", " ")
    text = re.sub(r"[^a-z0-9.\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

After:
def normalize(text: str) -> str:
    """Normalize text for parsing: lowercase, remove special chars, collapse whitespace.

    Preserves dots and digits for version numbers (e.g., 'Python 3.10').
    """
    text = text.lower()
    text = text.replace("-", " ")
    text = re.sub(r"[^a-z0-9.\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text
🤖 Prompt for AI Agents
In nl_parser.py around lines 48 to 53, the public function normalize is missing
a docstring; add a concise triple-quoted docstring immediately below the def
line that summarizes what the function does, documents the text parameter (type:
str) and the return value (type: str), and briefly describes the normalization
steps (lowercasing, hyphen to space, removal of non-alphanumeric/period/space
chars, collapsing whitespace). Follow project docstring style (one-line summary
plus short details or Args/Returns sections).

Comment on lines +129 to +144
def extract_slots(text: str) -> Dict[str, Any]:
    slots = {}

    v = VERSION_RE.search(text)
    if v:
        slots["python_version"] = v.group(1)

    p = PLATFORM_RE.search(text)
    if p:
        slots["platform"] = p.group(1)

    pkgs = PACKAGE_RE.findall(text)
    if pkgs:
        slots["packages"] = list(dict.fromkeys(pkgs))  # unique preserve order

    return slots

🛠️ Refactor suggestion | 🟠 Major

Add docstring to public function.

Per coding guidelines, docstrings are required for all public APIs.

Apply this diff:

 def extract_slots(text: str) -> Dict[str, Any]:
+    """Extract structured slots from text: python_version, platform, packages.
+    
+    Returns:
+        Dictionary with extracted slots (only present if found in text).
+    """
     slots = {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
def extract_slots(text: str) -> Dict[str, Any]:
    slots = {}

    v = VERSION_RE.search(text)
    if v:
        slots["python_version"] = v.group(1)

    p = PLATFORM_RE.search(text)
    if p:
        slots["platform"] = p.group(1)

    pkgs = PACKAGE_RE.findall(text)
    if pkgs:
        slots["packages"] = list(dict.fromkeys(pkgs))  # unique preserve order

    return slots

After:
def extract_slots(text: str) -> Dict[str, Any]:
    """Extract structured slots from text: python_version, platform, packages.

    Returns:
        Dictionary with extracted slots (only present if found in text).
    """
    slots = {}

    v = VERSION_RE.search(text)
    if v:
        slots["python_version"] = v.group(1)

    p = PLATFORM_RE.search(text)
    if p:
        slots["platform"] = p.group(1)

    pkgs = PACKAGE_RE.findall(text)
    if pkgs:
        slots["packages"] = list(dict.fromkeys(pkgs))  # unique preserve order

    return slots
🤖 Prompt for AI Agents
In nl_parser.py around lines 129 to 144, the public function extract_slots is
missing a docstring; add a concise docstring immediately below the def line that
describes the function’s purpose, documents the parameters (text: str), the
return value (Dict[str, Any]) including keys that may appear (python_version,
platform, packages) and their types, and any notable behavior (uses regexes
VERSION_RE, PLATFORM_RE, PACKAGE_RE; packages are de-duplicated preserving
order). Follow the project docstring style (brief summary, param and return
descriptions) and keep it concise.

Comment on lines +147 to +157
def aggregate_confidence(c_rule, c_sem, num_corrections, c_classifier=0.0):
    penalty = 1 - (num_corrections * 0.1)
    penalty = max(0.0, penalty)

    final = (
        0.4 * c_rule +
        0.4 * c_sem +
        0.2 * c_classifier
    ) * penalty

    return round(max(0.0, min(1.0, final)), 2)

🛠️ Refactor suggestion | 🟠 Major

Add type hints and docstring to public function.

Per coding guidelines, type hints and docstrings are required for all public APIs. The parameters are missing type annotations.

Apply this diff:

-def aggregate_confidence(c_rule, c_sem, num_corrections, c_classifier=0.0):
+def aggregate_confidence(c_rule: float, c_sem: float, num_corrections: int, c_classifier: float = 0.0) -> float:
+    """Aggregate confidence from multiple sources with spell-correction penalty.
+    
+    Args:
+        c_rule: Confidence from rule-based detection (0.0-1.0)
+        c_sem: Confidence from semantic matching (0.0-1.0)
+        num_corrections: Number of spell corrections applied
+        c_classifier: Confidence from classifier (0.0-1.0), default 0.0
+        
+    Returns:
+        Aggregated confidence score (0.0-1.0), penalized by 0.1 per correction.
+    """
     penalty = 1 - (num_corrections * 0.1)
🤖 Prompt for AI Agents
In nl_parser.py around lines 147 to 157, the public function
aggregate_confidence lacks type annotations and a docstring; add type hints
(c_rule: float, c_sem: float, num_corrections: int, c_classifier: float = 0.0)
and a return type of float, and add a concise docstring that explains the
function purpose (combines rule, semantic and classifier confidences with a
penalty based on num_corrections), documents each parameter and the return
value, and notes that the result is clamped to [0.0, 1.0] and rounded to two
decimals.
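
For a concrete sense of the weighting in the function as quoted above, take c_rule=0.9, c_sem=0.8, one spell correction, and no classifier signal:

aggregate_confidence(0.9, 0.8, num_corrections=1)
# penalty  = 1 - 1 * 0.1                      = 0.9
# weighted = 0.4*0.9 + 0.4*0.8 + 0.2*0.0      = 0.68
# final    = round(0.68 * 0.9, 2)             = 0.61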

Comment on lines +160 to +170
def decide_clarifications(intent, confidence):
    if intent == "unknown" or confidence < 0.6:
        return [
            "Install Docker and Kubernetes",
            "Set up Python development environment",
            "Install a web server (nginx/apache)",
            "Install ML libraries (tensorflow/pytorch)",
        ]
    if intent == "setup_python_env" and confidence < 0.75:
        return ["Use venv", "Use conda", "Install a specific Python version"]
    return []

🛠️ Refactor suggestion | 🟠 Major

Add type hints and docstring to public function.

Per coding guidelines, type hints and docstrings are required for all public APIs.

Apply this diff:

-def decide_clarifications(intent, confidence):
+def decide_clarifications(intent: str, confidence: float) -> List[str]:
+    """Determine if clarification prompts are needed based on intent and confidence.
+    
+    Args:
+        intent: Detected intent string
+        confidence: Confidence score (0.0-1.0)
+        
+    Returns:
+        List of clarification prompt strings (empty if no clarification needed).
+    """
     if intent == "unknown" or confidence < 0.6:

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In nl_parser.py around lines 160 to 170, the public function
decide_clarifications lacks type hints and a docstring; update its signature to
include parameter and return types (intent: str, confidence: float) and a return
type of List[str], add an appropriate docstring describing parameters, return
value and behavior, and ensure typing.List (or from typing import List) is
imported if not already present; keep implementation logic unchanged.

# ----------------------------------------------
# Main request handler
# ----------------------------------------------
def process(self, text: str):

🛠️ Refactor suggestion | 🟠 Major

Add docstrings to public methods.

Per coding guidelines, docstrings are required for all public APIs. The methods process, enhance_intents_with_llm, and suggest_optimizations lack docstrings.

     def process(self, text: str):
+        """
+        Process user text to detect intents, optionally enhance with LLM, and build a plan.
+        
+        Returns a dict with 'plan', 'intents', 'suggestions', and 'gpu' keys,
+        or 'clarification_needed' if clarification is required.
+        """
         # 1. Rule-based intent detection

Also applies to: 94-94, 142-142

🤖 Prompt for AI Agents
In src/intent/llm_agent.py around lines 46, 94 and 142, the public methods
process, enhance_intents_with_llm, and suggest_optimizations are missing
docstrings; add concise Google- or NumPy-style docstrings to each method that
briefly describe purpose, list parameters with types (e.g., text: str or
intents: List[...]), describe the return value and its type, and note any
exceptions raised or important side effects; keep them short (one summary line +
parameters/returns) to comply with project docstring guidelines.

("deploy containers - docker", "install_docker"),
("k8s and docker on my mac", "install_docker_k8s"),
("i want to run pytorch", "install_ml"),
("setup dev env", "ambiguous"),

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find nl_parser.py and examine intent handling
find . -type f -name "nl_parser.py" | head -20

Repository: cortexlinux/cortex

Length of output: 76


🏁 Script executed:

#!/bin/bash
# Search for ambiguous/unknown intent patterns
rg -n "ambiguous|unknown" --type py | head -30

Repository: cortexlinux/cortex

Length of output: 2238


🏁 Script executed:

#!/bin/bash
# Get the full test file to understand context
wc -l tests/test_nl_parser.py

Repository: cortexlinux/cortex

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Read nl_parser.py around the critical lines
sed -n '180,200p' nl_parser.py

Repository: cortexlinux/cortex

Length of output: 669


🏁 Script executed:

#!/bin/bash
# Read the complete test file
cat -n tests/test_nl_parser.py

Repository: cortexlinux/cortex

Length of output: 1808


🏁 Script executed:

#!/bin/bash
# Check if "setup dev env" would trigger clarifications
# Look for clarifications handling in nl_parser.py
rg -n "clarifications" nl_parser.py -A 3 -B 3

Repository: cortexlinux/cortex

Length of output: 687


🏁 Script executed:

#!/bin/bash
# Check the return statement structure in parse_request
rg -n "return {" nl_parser.py -A 10 | head -40

Repository: cortexlinux/cortex

Length of output: 141


🏁 Script executed:

#!/bin/bash
# Let me check the decide_clarifications function more carefully
sed -n '160,175p' nl_parser.py

Repository: cortexlinux/cortex

Length of output: 665


Correct the expected intent value for ambiguous cases from "ambiguous" to "unknown".

Line 14 expects "ambiguous", but parse_request never returns this value. When confidence is low or intent is unclear (e.g., "setup dev env"), the function returns "unknown" instead (see nl_parser.py line 194). The test currently passes because it checks for clarifications rather than intent equality when expected is "ambiguous" (line 23), masking the incorrect expectation. Change line 14 to:

("setup dev env", "unknown"),

The clarifications assertion at line 24 will still work correctly since decide_clarifications populates clarifications for unknown intents (line 161 in nl_parser.py).

🤖 Prompt for AI Agents
In tests/test_nl_parser.py around line 14, the expected intent for the input
"setup dev env" is incorrectly set to "ambiguous"; update it to "unknown"
because parse_request never returns "ambiguous" (it uses "unknown" for
low-confidence/unclear intents). Replace the tuple ("setup dev env",
"ambiguous") with ("setup dev env", "unknown") so the test aligns with
nl_parser.py behavior; leave the clarifications assertions unchanged since
decide_clarifications still populates clarifications for unknown intents.


Development

Successfully merging this pull request may close these issues.

Natural Language Install: Polish and Edge Cases
