Skip to content

Conversation

@david-cortes-intel
Copy link
Contributor

Description

KNN algorithms on sklearn support data with missing values, but sklearnex makes a check for 'all finite'. This PR makes it fall back to sklearn when receiving data that has missing values.

Disclaimer: I'm not 100% sure about whether oneDAL supports KNN with missing values or not.


Checklist:

Completeness and readability

  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

@david-cortes-intel
Copy link
Contributor Author

/intelci: run

@david-cortes-intel
Copy link
Contributor Author

/azp run Nightly

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@david-cortes-intel
Copy link
Contributor Author

/intelci: run

@david-cortes-intel
Copy link
Contributor Author

/azp run Nightly

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 80.42% <ø> (-0.01%) ⬇️
github 82.02% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sklearnex/neighbors/common.py 92.44% <ø> (-0.05%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@icfaust
Copy link
Contributor

icfaust commented Dec 4, 2025

scikit-learn/scikit-learn#25319 @david-cortes-intel . Can you detail the missing value use case with an example? As of now we support these metrics: https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/sklearnex/neighbors/common.py#L221 which does not include nan_euclidean which would be the use case of this missing value support.

@david-cortes-intel
Copy link
Contributor Author

scikit-learn/scikit-learn#25319 @david-cortes-intel . Can you detail the missing value use case with an example? As of now we support these metrics: https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/sklearnex/neighbors/common.py#L221 which does not include nan_euclidean which would be the use case of this missing value support.

It still fails this test:
https://github.com/scikit-learn/scikit-learn/blob/4a10d0ed8d85e6ed24a647bd28a65c0c64b101ef/sklearn/neighbors/tests/test_neighbors.py#L2356

@icfaust
Copy link
Contributor

icfaust commented Dec 4, 2025

@david-cortes-intel then I think there is a bug in the metrics check. I think that should be the avenue taken to fix the issue.

@icfaust
Copy link
Contributor

icfaust commented Dec 4, 2025

Just wanted to note that I know that its a bit convoluted and annoying on the neighbors side of things here, I just want to make sure that if we can solve it without a separate finite check, then we should do that.

@david-cortes-intel
Copy link
Contributor Author

@david-cortes-intel then I think there is a bug in the metrics check. I think that should be the avenue taken to fix the issue.

Switched it to use the metric checks instead.

@david-cortes-intel
Copy link
Contributor Author

/intelci: run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sklearn-patch sklearn patching

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants