1 change: 0 additions & 1 deletion .gitignore
@@ -42,7 +42,6 @@ MANIFEST
 # ==============================
 # PyInstaller
 # ==============================
-# Usually contains temporary files from pyinstaller builds
 *.manifest
 *.spec

222 changes: 222 additions & 0 deletions nl_parser.py
@@ -0,0 +1,222 @@
import difflib
import re
from difflib import SequenceMatcher
Comment on lines +1 to +3
🛠️ Refactor suggestion | 🟠 Major

Remove redundant import.

difflib is imported on line 1 and SequenceMatcher is imported from difflib on line 3. Only two names from difflib are used (get_close_matches on line 64 and SequenceMatcher on line 82), so the bare import on line 1 is redundant once both are imported directly.

Apply this diff:

-import difflib
 import re
-from difflib import SequenceMatcher
+from difflib import SequenceMatcher, get_close_matches

Then update line 64 to call get_close_matches directly:

-    close = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.75)
+    close = get_close_matches(token, VOCAB, n=1, cutoff=0.75)

Or keep the difflib import and remove the SequenceMatcher import:

 import difflib
 import re
-from difflib import SequenceMatcher

Then update line 82:

-    return SequenceMatcher(None, a, b).ratio()
+    return difflib.SequenceMatcher(None, a, b).ratio()

from typing import Dict, Any, List, Tuple

Comment on lines +1 to +5
🛠️ Refactor suggestion | 🟠 Major

Add module-level docstring.

Per PEP 257 and coding guidelines, public modules should have a docstring explaining their purpose.

Add a module docstring at the top:

+"""Natural language parser for installation requests.
+
+Provides intent detection, spell correction, semantic matching, and slot extraction
+for parsing user installation requests with confidence scoring and clarification logic.
+"""
+
 import difflib
 import re
 from difflib import SequenceMatcher

# Vocabulary for typo correction
VOCAB = {
    "python", "pip", "venv", "virtualenv", "conda", "anaconda",
    "docker", "kubernetes", "k8s", "kubectl",
    "nginx", "apache", "httpd", "web", "server",
    "flask", "django", "tensorflow", "pytorch", "torch",
    "install", "setup", "development", "env", "environment",
}
Comment on lines +7 to +13
🛠️ Refactor suggestion | 🟠 Major

Add type annotation for VOCAB constant.

As per coding guidelines, type hints are required. Module-level constants should be annotated.

Apply this diff:

 # Vocabulary for typo correction
-VOCAB = {
+VOCAB: set[str] = {
     "python", "pip", "venv", "virtualenv", "conda", "anaconda",


# Canonical examples for lightweight semantic matching
INTENT_EXAMPLES = {
    "install_ml": [
        "install something for machine learning",
        "install pytorch",
        "install tensorflow",
        "i want to run pytorch",
    ],
    "install_web_server": [
        "i need a web server",
        "install nginx",
        "install apache",
        "set up a web server",
    ],
    "setup_python_env": [
        "set up python development environment",
        "install python 3.10",
        "create python venv",
        "setup dev env",
    ],
    "install_docker": [
        "install docker",
        "add docker",
        "deploy containers - docker",
    ],
    "install_docker_k8s": [
        "install docker and kubernetes",
        "docker and k8s",
        "k8s and docker on my mac",
    ],
}
Comment on lines +16 to +45
🛠️ Refactor suggestion | 🟠 Major

Add type annotation for INTENT_EXAMPLES constant.

As per coding guidelines, type hints are required. Module-level constants should be annotated.

Apply this diff:

 # Canonical examples for lightweight semantic matching
-INTENT_EXAMPLES = {
+INTENT_EXAMPLES: Dict[str, List[str]] = {
     "install_ml": [





def normalize(text: str) -> str:
    text = text.lower()
    text = text.replace("-", " ")
    text = re.sub(r"[^a-z0-9.\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text
Comment on lines +48 to +53
🛠️ Refactor suggestion | 🟠 Major

Add docstring to public function.

Per coding guidelines, docstrings are required for all public APIs.

Apply this diff:

 def normalize(text: str) -> str:
+    """Normalize text for parsing: lowercase, remove special chars, collapse whitespace.
+    
+    Preserves dots and digits for version numbers (e.g., 'Python 3.10').
+    """
     text = text.lower()



def tokenize(text: str) -> List[str]:
    return text.split()
Comment on lines +56 to +57
🛠️ Refactor suggestion | 🟠 Major

Add docstring to public function.

Per coding guidelines, docstrings are required for all public APIs.

Apply this diff:

 def tokenize(text: str) -> List[str]:
+    """Split text into tokens by whitespace."""
     return text.split()



def spell_correct_token(token: str) -> Tuple[str, bool]:
    """Return corrected_token, was_corrected"""
    if token in VOCAB:
        return token, False
    close = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.75)
    if close:
        return close[0], True
    return token, False


def apply_spell_correction(tokens: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    corrections = []
    new_tokens = []
    for t in tokens:
        new, fixed = spell_correct_token(t)
        if fixed:
            corrections.append((t, new))
        new_tokens.append(new)
    return new_tokens, corrections
Comment on lines +70 to +78
🛠️ Refactor suggestion | 🟠 Major

Add docstring to public function.

Per coding guidelines, docstrings are required for all public APIs.

Apply this diff:

 def apply_spell_correction(tokens: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
+    """Apply spell correction to all tokens.
+    
+    Returns:
+        Tuple of (corrected_tokens, corrections) where corrections is a list of (original, corrected) pairs.
+    """
     corrections = []
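To make the two helpers concrete, a quick interpreter check (doctest-style sketch; the tokens are chosen to hit both the corrected and the untouched paths against the VOCAB set above):

>>> from nl_parser import apply_spell_correction
>>> apply_spell_correction(["instal", "dockr", "on", "ubuntu"])
(['install', 'docker', 'on', 'ubuntu'], [('instal', 'install'), ('dockr', 'docker')])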



def fuzzy_phrase_score(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()
Comment on lines +81 to +82
🛠️ Refactor suggestion | 🟠 Major

Add docstring to public function.

Per coding guidelines, docstrings are required for all public APIs.

Apply this diff:

 def fuzzy_phrase_score(a: str, b: str) -> float:
+    """Compute similarity ratio between two strings (0.0 to 1.0)."""
     return SequenceMatcher(None, a, b).ratio()



def semantic_intent_score(text: str) -> Tuple[str, float]:
    """Compare text with intent examples."""
    best_intent = "unknown"
    best_score = 0.0

    for intent, examples in INTENT_EXAMPLES.items():
        for ex in examples:
            score = fuzzy_phrase_score(text, ex)
            if score > best_score:
                best_score = score
                best_intent = intent

    return best_intent, best_score


def rule_intent(text: str) -> Tuple[str, float]:
    """Simple keyword/rule-based detection."""
    t = text

    if "docker" in t:
        if "kubernetes" in t or "k8s" in t or "kubectl" in t:
            return "install_docker_k8s", 0.95
        return "install_docker", 0.9

    if "kubernetes" in t or "k8s" in t or "kubectl" in t:
        return "install_docker_k8s", 0.9
Comment on lines +109 to +110
⚠️ Potential issue | 🟠 Major


Add test coverage and clarify kubernetes-only intent handling.

Lines 109–110 return install_docker_k8s when the user mentions only kubernetes/k8s keywords (without docker). However:

  • The INTENT_EXAMPLES for install_docker_k8s (lines 40–44) only show examples with both docker AND kubernetes.
  • No test case covers kubernetes-only input (e.g., "install kubernetes").
  • No separate install_kubernetes intent exists.

Either: (1) add an install_kubernetes intent with corresponding examples and test cases, or (2) if kubernetes always requires docker in the cortex install flow, add a test case for kubernetes-only input and a comment explaining this design choice.
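A minimal pytest sketch for the kubernetes-only path (hypothetical test file; the expected intent assumes option (2), i.e. that kubernetes-only input is meant to land on install_docker_k8s, which is what the code currently does):

# test_nl_parser.py (hypothetical)
from nl_parser import parse_request

def test_kubernetes_only_maps_to_docker_k8s():
    # rule_intent() returns ("install_docker_k8s", 0.9) for k8s-only text today;
    # if a separate install_kubernetes intent is added, update this expectation.
    result = parse_request("install kubernetes")
    assert result["intent"] == "install_docker_k8s"
    assert "kubernetes" in result["slots"].get("packages", [])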


if "nginx" in t or "apache" in t or "httpd" in t or "web server" in t:
return "install_web_server", 0.9

if "python" in t or "venv" in t or "conda" in t or "anaconda" in t:
return "setup_python_env", 0.9

if any(word in t for word in ("tensorflow", "pytorch", "torch", "machine learning", "ml")):
return "install_ml", 0.9

return "unknown", 0.0


VERSION_RE = re.compile(r"python\s*([0-9]+(?:\.[0-9]+)?)")

Check warning on line 124 in nl_parser.py (SonarCloud Code Analysis):
Use concise character class syntax '\d' instead of '[0-9]'.
See more: https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZsOK8Kz3zSehC9xcj3a&open=AZsOK8Kz3zSehC9xcj3a&pullRequest=293

Check warning on line 124 in nl_parser.py (SonarCloud Code Analysis):
Use concise character class syntax '\d' instead of '[0-9]'.
See more: https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZsOK8Kz3zSehC9xcj3b&open=AZsOK8Kz3zSehC9xcj3b&pullRequest=293
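If the team takes SonarCloud's hint, the concise form would be the sketch below. Behavior is unchanged in this pipeline, since normalize() has already stripped everything outside a-z, 0-9, dots, and spaces before the pattern runs (otherwise \d would also match non-ASCII digits):

VERSION_RE = re.compile(r"python\s*(\d+(?:\.\d+)?)")  # \d replaces [0-9]; same matches after normalize()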
PLATFORM_RE = re.compile(r"\b(mac|macos|windows|linux|ubuntu|debian)\b")
PACKAGE_RE = re.compile(r"\b(nginx|apache|docker|kubernetes|k8s|kubectl|python|pip|venv|conda|tensorflow|pytorch)\b")
Comment on lines +124 to +126
🛠️ Refactor suggestion | 🟠 Major

Add type annotations for regex constants.

As per coding guidelines, type hints are required for module-level constants.

Apply this diff (re is already imported at the top of the module, so only Pattern needs importing):

+from typing import Pattern
+
-VERSION_RE = re.compile(r"python\s*([0-9]+(?:\.[0-9]+)?)")
-PLATFORM_RE = re.compile(r"\b(mac|macos|windows|linux|ubuntu|debian)\b")
-PACKAGE_RE = re.compile(r"\b(nginx|apache|docker|kubernetes|k8s|kubectl|python|pip|venv|conda|tensorflow|pytorch)\b")
+VERSION_RE: Pattern[str] = re.compile(r"python\s*([0-9]+(?:\.[0-9]+)?)")
+PLATFORM_RE: Pattern[str] = re.compile(r"\b(mac|macos|windows|linux|ubuntu|debian)\b")
+PACKAGE_RE: Pattern[str] = re.compile(r"\b(nginx|apache|docker|kubernetes|k8s|kubectl|python|pip|venv|conda|tensorflow|pytorch)\b")



def extract_slots(text: str) -> Dict[str, Any]:
    slots = {}

    v = VERSION_RE.search(text)
    if v:
        slots["python_version"] = v.group(1)

    p = PLATFORM_RE.search(text)
    if p:
        slots["platform"] = p.group(1)

    pkgs = PACKAGE_RE.findall(text)
    if pkgs:
        slots["packages"] = list(dict.fromkeys(pkgs))  # unique preserve order

    return slots
Comment on lines +129 to +144
🛠️ Refactor suggestion | 🟠 Major

Add docstring to public function.

Per coding guidelines, docstrings are required for all public APIs.

Apply this diff:

 def extract_slots(text: str) -> Dict[str, Any]:
+    """Extract structured slots from text: python_version, platform, packages.
+    
+    Returns:
+        Dictionary with extracted slots (only present if found in text).
+    """
     slots = {}



def aggregate_confidence(c_rule, c_sem, num_corrections, c_classifier=0.0):
    penalty = 1 - (num_corrections * 0.1)
    penalty = max(0.0, penalty)

    final = (
        0.4 * c_rule +
        0.4 * c_sem +
        0.2 * c_classifier
    ) * penalty

    return round(max(0.0, min(1.0, final)), 2)
Comment on lines +147 to +157
🛠️ Refactor suggestion | 🟠 Major

Add type hints and docstring to public function.

Per coding guidelines, type hints and docstrings are required for all public APIs. The parameters are missing type annotations.

Apply this diff:

-def aggregate_confidence(c_rule, c_sem, num_corrections, c_classifier=0.0):
+def aggregate_confidence(c_rule: float, c_sem: float, num_corrections: int, c_classifier: float = 0.0) -> float:
+    """Aggregate confidence from multiple sources with spell-correction penalty.
+    
+    Args:
+        c_rule: Confidence from rule-based detection (0.0-1.0)
+        c_sem: Confidence from semantic matching (0.0-1.0)
+        num_corrections: Number of spell corrections applied
+        c_classifier: Confidence from classifier (0.0-1.0), default 0.0
+        
+    Returns:
+        Aggregated confidence score (0.0-1.0), penalized by 0.1 per correction.
+    """
     penalty = 1 - (num_corrections * 0.1)
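As a quick sanity check of the weighting above (illustrative values; the arithmetic follows the code as written):

from nl_parser import aggregate_confidence

# (0.4*0.9 + 0.4*0.7 + 0.2*0.95) = 0.83, one correction gives penalty 0.9:
# 0.83 * 0.9 = 0.747, rounded to 0.75
assert aggregate_confidence(0.9, 0.7, 1, 0.95) == 0.75

# three corrections drop the penalty to 0.7: 0.83 * 0.7 = 0.581 -> 0.58
assert aggregate_confidence(0.9, 0.7, 3, 0.95) == 0.58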



def decide_clarifications(intent, confidence):
    if intent == "unknown" or confidence < 0.6:
        return [
            "Install Docker and Kubernetes",
            "Set up Python development environment",
            "Install a web server (nginx/apache)",
            "Install ML libraries (tensorflow/pytorch)",
        ]
    if intent == "setup_python_env" and confidence < 0.75:
        return ["Use venv", "Use conda", "Install a specific Python version"]
    return []
Comment on lines +160 to +170
🛠️ Refactor suggestion | 🟠 Major

Add type hints and docstring to public function.

Per coding guidelines, type hints and docstrings are required for all public APIs.

Apply this diff:

-def decide_clarifications(intent, confidence):
+def decide_clarifications(intent: str, confidence: float) -> List[str]:
+    """Determine if clarification prompts are needed based on intent and confidence.
+    
+    Args:
+        intent: Detected intent string
+        confidence: Confidence score (0.0-1.0)
+        
+    Returns:
+        List of clarification prompt strings (empty if no clarification needed).
+    """
     if intent == "unknown" or confidence < 0.6:





def parse_request(text: str) -> Dict[str, Any]:
    """Main function used by tests and demo."""
    norm = normalize(text)
    tokens = tokenize(norm)

    tokens_corr, corrections = apply_spell_correction(tokens)
    corrected_text = " ".join(tokens_corr)

    rule_int, c_rule = rule_intent(corrected_text)
    sem_int, c_sem = semantic_intent_score(corrected_text)

    if rule_int != "unknown" and rule_int == sem_int:
        chosen_intent = rule_int
        c_classifier = 0.95
    elif rule_int != "unknown":
        chosen_intent = rule_int
        c_classifier = 0.0
    elif c_sem > 0.6:
        chosen_intent = sem_int
        c_classifier = 0.0
    else:
        chosen_intent = "unknown"
        c_classifier = 0.0

    slots = extract_slots(corrected_text)

    confidence = aggregate_confidence(
        c_rule, c_sem, len(corrections), c_classifier
    )

    clarifications = decide_clarifications(chosen_intent, confidence)

    explanation = []
    if corrections:
        explanation.append(
            "corrected: " + ", ".join(f"{a}->{b}" for a, b in corrections)
        )
    explanation.append(f"rule_intent={rule_int} ({c_rule:.2f})")
    explanation.append(f"semantic_match={sem_int} ({c_sem:.2f})")

    return {
        "intent": chosen_intent,
        "confidence": confidence,
        "explanation": "; ".join(explanation),
        "slots": slots,
        "corrections": corrections,
        "clarifications": clarifications,
    }
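For reviewers who want to poke at the module locally, a minimal usage sketch (the misspellings are deliberate; the outputs shown as comments follow from the rules and regexes above, while the exact confidence depends on the weighting):

from nl_parser import parse_request

result = parse_request("instal pythn 3.10 on ubuntu")
print(result["intent"])       # setup_python_env
print(result["slots"])        # {'python_version': '3.10', 'platform': 'ubuntu', 'packages': ['python']}
print(result["corrections"])  # [('instal', 'install'), ('pythn', 'python')]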


1 change: 1 addition & 0 deletions requirements.txt
@@ -12,3 +12,4 @@ pyyaml>=6.0.0
 
 # Type hints for older Python versions
 typing-extensions>=4.0.0
+PyYAML==6.0.3
Empty file added src/intent/__init__.py
Empty file.
33 changes: 33 additions & 0 deletions src/intent/clarifier.py
@@ -0,0 +1,33 @@
# clarifier.py

from typing import List, Optional
from intent.detector import Intent

class Clarifier:
    """
    Checks if the detected intents have missing information.
    Returns a clarifying question if needed.
    """

    def needs_clarification(self, intents: List[Intent], text: str) -> Optional[str]:
        text = text.lower()

        # 1. If user mentions "gpu" but has not specified which GPU → ask
        if "gpu" in text and not any(i.target in ["cuda", "pytorch", "tensorflow"] for i in intents):
            return "Do you have an NVIDIA GPU? (Needed for CUDA/PyTorch/TensorFlow installation)"

        # 2. If user says "machine learning tools" but nothing specific
        generic_terms = ["ml", "machine learning", "deep learning", "ai tools"]
        if any(term in text for term in generic_terms) and len(intents) == 0:
            return "Which ML frameworks do you need? (PyTorch, TensorFlow, JupyterLab...)"

        # 3. If user asks to install CUDA but no GPU exists in context
        if any(i.target == "cuda" for i in intents) and "gpu" not in text:
            return "Installing CUDA requires an NVIDIA GPU. Do you have one?"

        # 4. If package versions are missing (later we can add real version logic)
        if "torch" in text and "version" not in text:
            return "Do you need the GPU version or CPU version of PyTorch?"

        # 5. Otherwise no clarification needed
        return None
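A short interaction sketch for the new Clarifier (assumes src/ is on PYTHONPATH so the intent package resolves, as the imports above expect):

from intent.clarifier import Clarifier
from intent.detector import IntentDetector

text = "install cuda for my new build"
intents = IntentDetector().detect(text)

# Rule 3 fires: a CUDA intent with no GPU mentioned in the text.
print(Clarifier().needs_clarification(intents, text))
# Installing CUDA requires an NVIDIA GPU. Do you have one?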
69 changes: 69 additions & 0 deletions src/intent/context.py
@@ -0,0 +1,69 @@
# context.py

from typing import List, Optional
from intent.detector import Intent

class SessionContext:
    """
    Stores context from previous user interactions.
    This is needed for Issue #53:
    'Uses context from previous commands'
    """

    def __init__(self):
        self.detected_gpu: Optional[str] = None
        self.previous_intents: List[Intent] = []
        self.installed_packages: List[str] = []
        self.clarifications: List[str] = []

    # -------------------
    # GPU CONTEXT
    # -------------------

    def set_gpu(self, gpu_name: str):
        self.detected_gpu = gpu_name

    def get_gpu(self) -> Optional[str]:
        return self.detected_gpu

    # -------------------
    # INTENT CONTEXT
    # -------------------

    def add_intents(self, intents: List[Intent]):
        self.previous_intents.extend(intents)

    def get_previous_intents(self) -> List[Intent]:
        return self.previous_intents

    # -------------------
    # INSTALLED PACKAGES
    # -------------------

    def add_installed(self, pkg: str):
        if pkg not in self.installed_packages:
            self.installed_packages.append(pkg)

    def is_installed(self, pkg: str) -> bool:
        return pkg in self.installed_packages

    # -------------------
    # CLARIFICATIONS
    # -------------------

    def add_clarification(self, question: str):
        self.clarifications.append(question)

    def get_clarifications(self) -> List[str]:
        return self.clarifications

    # -------------------
    # RESET CONTEXT
    # -------------------

    def reset(self):
        """Reset context (new session)"""
        self.detected_gpu = None
        self.previous_intents = []
        self.installed_packages = []
        self.clarifications = []
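A sketch of how SessionContext might carry state between turns (illustrative; the GPU string and call order are made up):

from intent.context import SessionContext
from intent.detector import IntentDetector

ctx = SessionContext()
ctx.set_gpu("NVIDIA RTX 4090")  # e.g. recorded from an earlier hardware probe

ctx.add_intents(IntentDetector().detect("install pytorch"))
ctx.add_installed("pytorch")

assert ctx.is_installed("pytorch")
assert ctx.get_gpu() == "NVIDIA RTX 4090"

ctx.reset()  # new session wipes everything
assert ctx.get_previous_intents() == []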
49 changes: 49 additions & 0 deletions src/intent/detector.py
@@ -0,0 +1,49 @@
# detector.py

from dataclasses import dataclass
from typing import List, Optional, ClassVar

@dataclass
class Intent:
    action: str
    target: str
    details: Optional[dict] = None

class IntentDetector:
    """
    Extracts high-level installation intents from natural language requests.
    """

    COMMON_PACKAGES: ClassVar[dict[str, List[str]]] = {
        "cuda": ["cuda", "nvidia toolkit"],
        "pytorch": ["pytorch", "torch"],
        "tensorflow": ["tensorflow", "tf"],
        "jupyter": ["jupyter", "jupyterlab", "notebook"],
        "cudnn": ["cudnn"],
        "gpu": ["gpu", "graphics card", "rtx", "nvidia"]
    }

    def detect(self, text: str) -> List[Intent]:
        text = text.lower()
        intents = []

        # 1. Rule-based keyword detection (skip GPU to avoid duplicate install intent)
        for pkg, keywords in self.COMMON_PACKAGES.items():
            if pkg == "gpu":
                continue  # GPU handled separately below
            if any(k in text for k in keywords):
                intents.append(Intent(action="install", target=pkg))

        # 2. Look for verify steps
        if "verify" in text or "check" in text:
            intents.append(Intent(action="verify", target="installation"))

        # 3. GPU configure intent (use all GPU synonyms)
        gpu_keywords = self.COMMON_PACKAGES.get("gpu", ["gpu"])
        if any(k in text for k in gpu_keywords) and not any(
            i.action == "configure" and i.target == "gpu"
            for i in intents
        ):
            intents.append(Intent(action="configure", target="gpu"))

        return intents
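To make the detection rules concrete, here is what detect() returns for a compound request (dataclass reprs shown as comments; ordering follows COMMON_PACKAGES insertion order, then the verify and GPU rules):

from intent.detector import IntentDetector

for intent in IntentDetector().detect("install cuda and cudnn, then verify the gpu setup"):
    print(intent)
# Intent(action='install', target='cuda', details=None)
# Intent(action='install', target='cudnn', details=None)
# Intent(action='verify', target='installation', details=None)
# Intent(action='configure', target='gpu', details=None)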