Off-by-one-Error in `train_two_layer` implementation

Hi,

I encountered an assert issue when training `RMI` on very small (100 items) datasets for testing purposes:

```
thread '<unnamed>' panicked at 'start index was 100 but end index was 100', [...]/RMI/rmi_lib/src/train/two_layer.rs:27:5
```

Upon closer investigation, I think I have found an off-by-one error in the `train_two_layer` function implementation. I did not take a look at the context beyond the function, therefore take what I am about to say with a grain of salt:

---

[Link to source file](https://github.com/learnedsystems/RMI/blob/master/rmi_lib/src/train/two_layer.rs)

1. The value of `split_idx` is calculated here:

https://github.com/learnedsystems/RMI/blob/5fdff45d0929beaccf6bc56f8f4c0d82baf10304/rmi_lib/src/train/two_layer.rs#L132

2. `split_idx` should be in the interval `[0, md_container.len())`

https://github.com/learnedsystems/RMI/blob/5fdff45d0929beaccf6bc56f8f4c0d82baf10304/rmi_lib/src/train/two_layer.rs#L139

**Now lets look at the case where `split_idx == md_container.len() - 1`, which is valid per [2.]:**

3. The else branch is taken, since `split_idx < md_container.len()`

https://github.com/learnedsystems/RMI/blob/5fdff45d0929beaccf6bc56f8f4c0d82baf10304/rmi_lib/src/train/two_layer.rs#L147

4. `split_idx + 1` (`== md_container.len()`) is passed to `build_models_from` as `start_idx`

```rust
|| build_models_from(&md_container, &top_model, layer2_model,
                                   split_idx + 1, md_container.len(),
                                   split_idx_target,
                                   second_half_models)
```

```rust
fn build_models_from<T: TrainingKey>(data: &RMITrainingData<T>,
                                    top_model: &Box<dyn Model>,
                                    model_type: &str,
                                    start_idx: usize, end_idx: usize,
                                    first_model_idx: usize,
                                    num_models: usize) -> Vec<Box<dyn Model>>
```

[Link 4.1](https://github.com/learnedsystems/RMI/blob/5fdff45d0929beaccf6bc56f8f4c0d82baf10304/rmi_lib/src/train/two_layer.rs#L166:L169)

[Link 4.2](https://github.com/learnedsystems/RMI/blob/master/rmi_lib/src/train/two_layer.rs#L20:L25)

5. Assert fails, since `md_container.len() > md_container.len()` is false

```rust    
assert!(end_idx > start_idx,
        "start index was {} but end index was {}",
        start_idx, end_idx
);
```

[Link 5](https://github.com/learnedsystems/RMI/blob/master/rmi_lib/src/train/two_layer.rs#L27:L29)

---

An obvious fix would be to change the condition in [3.] to `split_idx >= md_container.len() - 1`, however I am not entirely certain whether that leads to issues in other contexts. I guess a similar issue would happen if `similar_idx == 0`, only for [the first call](https://github.com/learnedsystems/RMI/blob/5fdff45d0929beaccf6bc56f8f4c0d82baf10304/rmi_lib/src/train/two_layer.rs#L162). I changed the condition in my local version and re-ran the tests - it seems to work just fine:

```
Running test cache_fix_osm
Test cache_fix_osm finished.
Running test cache_fix_wiki
Test cache_fix_wiki finished.
Running test max_size_wiki
Test max_size_wiki finished.
Running test radix_model_wiki
Test radix_model_wiki finished.
Running test simple_model_osm
Test simple_model_osm finished.
Running test simple_model_wiki
Test simple_model_wiki finished.
============== TEST RESULTS ===============
python3 report.py
PASS cache_fix_osm
PASS cache_fix_wiki
PASS max_size_wiki
PASS radix_model_wiki
PASS simple_model_osm
PASS simple_model_wiki
```

I can open a pull request with that fix if you would like.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Off-by-one-Error in `train_two_layer` implementation #8

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Off-by-one-Error in train_two_layer implementation #8

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Off-by-one-Error in `train_two_layer` implementation #8