Commit 6cf4919

Add documentation for README
1 parent 46eb210 commit 6cf4919

1 file changed: +109 -3 lines changed

README.md

Lines changed: 109 additions & 3 deletions
@@ -1,9 +1,115 @@
11
# difflib-parser
22

3-
Parser for Python's `difflib`. Built on top of https://github.com/yebrahim/difflibparser/blob/master/difflibparser.py
3+
Parser for Python's `difflib` output.
44

5-
Key changes made to the above library:
5+
Built on top of <https://github.com/yebrahim/difflibparser/blob/master/difflibparser.py>
6+
7+
Key changes from the above library:
68

79
1. Using the generator pattern instead of the iterator pattern when iterating over diffs
8-
2. Using more `@dataclass` over generic dictionaries to enforce strict typing
10+
2. Using `@dataclass` over generic dictionaries to enforce strict typing
911
3. Using type annotations for strict typing
12+
13+
## Getting started
14+
15+
```sh
16+
pip install difflib-parser
17+
```
18+
19+
```py
20+
from difflib_parser import difflib_parser
21+
22+
parser = difflib_parser.DiffParser(["hello world"], ["hello world!"])
23+
for diff in parser.iter_diffs():
24+
    print(diff)
25+
```
26+
27+
### `Diff` structure
28+
29+
```py
30+
class DiffCode(Enum):
31+
    SAME = 0
32+
    RIGHT_ONLY = 1
33+
    LEFT_ONLY = 2
34+
    CHANGED = 3
35+
36+
37+
@dataclass
38+
class Diff:
39+
    code: DiffCode
40+
    line: str
41+
    left_changes: List[int] | None = None
42+
    right_changes: List[int] | None = None
43+
    newline: str | None = None
44+
```
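
As a rough illustration of how these objects might be consumed, the sketch below tallies diffs by their `code`. It redefines `DiffCode` and `Diff` locally so that it runs on its own (using `Optional[...]` rather than `|` so it also works on older Python versions), and `summarize` is a hypothetical helper, not part of the library. Only the `code` and `line` fields are used, since the meaning of the other fields is not spelled out here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class DiffCode(Enum):
    SAME = 0
    RIGHT_ONLY = 1
    LEFT_ONLY = 2
    CHANGED = 3


@dataclass
class Diff:
    # Local copy of the structure shown above, for illustration only.
    code: DiffCode
    line: str
    left_changes: Optional[List[int]] = None
    right_changes: Optional[List[int]] = None
    newline: Optional[str] = None


def summarize(diffs: Iterable[Diff]) -> Counter:
    """Hypothetical helper: count how many diffs carry each DiffCode."""
    return Counter(diff.code for diff in diffs)


# Hand-built diffs; in practice these would come from iterating the parser.
sample = [
    Diff(DiffCode.SAME, "unchanged line"),
    Diff(DiffCode.LEFT_ONLY, "removed line"),
    Diff(DiffCode.SAME, "another unchanged line"),
]
for code, count in summarize(sample).items():
    print(code.name, count)
```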
45+
46+
## What is `difflib`?
47+
48+
The output of `difflib.ndiff` looks something like this:
49+
50+
```python
51+
>>> import difflib
52+
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["hola world"]))))
53+
- hello world
54+
?  ^ ^^
55+
56+
+ hola world
57+
?  ^ ^
58+
```
59+
60+
The specifics of diff interpretation can be found in the [documentation](https://docs.python.org/3/library/difflib.html).
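
For quick reference, every line that `difflib.ndiff` yields starts with a two-character prefix. The sketch below pairs each output line with the meaning of its prefix as described in the standard library documentation; `PREFIX_MEANING` and `describe` are illustrative names, not part of `difflib` or of this library.

```python
import difflib
from typing import List, Tuple

# Two-character prefixes used by difflib.ndiff / difflib.Differ output.
PREFIX_MEANING = {
    "- ": "line unique to the first sequence",
    "+ ": "line unique to the second sequence",
    "  ": "line common to both sequences",
    "? ": "hint line, not present in either input",
}


def describe(left: List[str], right: List[str]) -> List[Tuple[str, str]]:
    """Pair each ndiff line's prefix meaning with the rest of the line."""
    return [
        (PREFIX_MEANING[line[:2]], line[2:])
        for line in difflib.ndiff(left, right)
    ]


for meaning, text in describe(["hello world"], ["hola world"]):
    print(f"{meaning:40} | {text!r}")
```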
61+
62+
## Parsing `difflib`
63+
64+
We are interested in four types of changes:
65+
66+
1. No change
67+
2. A new line is added
68+
3. An existing line is removed
69+
4. An existing line is edited
70+
71+
Because the last two cases both operate on existing lines, both begin with a `- ` line. The prefix alone cannot tell them apart, so we have to inspect the lines that follow.
72+
73+
If an existing line is removed, it will not have any follow-up lines.
74+
75+
If an existing line is edited, it will have several follow-up lines that provide details on the values that have been changed.
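
This lookahead can be sketched as follows. `is_edited` is a hypothetical helper, not part of `difflib-parser`; it assumes `lines` is the list produced by `difflib.ndiff` and treats a `- ` line as edited when a `? ` hint line appears directly after it or directly after the `+ ` line that follows it.

```python
import difflib
from typing import List


def is_edited(lines: List[str], i: int) -> bool:
    """Return True if the "- " line at index i is the old side of an edit."""
    if not lines[i].startswith("- "):
        raise ValueError("expected a line starting with '- '")
    following = lines[i + 1 : i + 3]
    # Pattern "-", "?", ...: the hint line comes right after the old line.
    if following and following[0].startswith("? "):
        return True
    # Pattern "-", "+", "?": the hint line comes after the new line.
    return (
        len(following) == 2
        and following[0].startswith("+ ")
        and following[1].startswith("? ")
    )


edited = list(difflib.ndiff(["hello world"], ["hello world!"]))
removed = list(difflib.ndiff(["hello world"], []))
print(is_edited(edited, 0), is_edited(removed, 0))
```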
76+
77+
From these follow-up lines, we can further classify the changes made to a line:
78+
79+
1. Only additions made (e.g. `"Hello world"` -> `"Hello world!"`)
80+
2. Only removals made (e.g. `"Hello world"` -> `"Hllo world"`)
81+
3. Both additions and removals made (e.g. `"Hello world"` -> `"Hola world!"`)
82+
83+
Each case has its own sequence of follow-up lines:
84+
85+
1. `-`, `+`, `?`
86+
87+
```python
88+
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["hello world!"]))))
89+
- hello world
90+
+ hello world!
91+
?            +
92+
```
93+
94+
2. `-`, `?`, `+`
95+
96+
```python
97+
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["hllo world"]))))
98+
- hello world
99+
?  -
100+
101+
+ hllo world
102+
```
103+
104+
3. `-`, `?`, `+`, `?`
105+
106+
```python
107+
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["helo world!"]))))
108+
- hello world
109+
?    -
110+
111+
+ helo world!
112+
?           +
113+
```
114+
115+
As such, the parser handles each of these as a separate pattern.
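
To make the three patterns concrete, here is a minimal sketch that diffs a single pair of lines and looks up the resulting prefix sequence. It is not the library's actual implementation; `PATTERNS`, `edit_pattern`, and `classify_edit` are names invented for this example.

```python
import difflib
from typing import Tuple

# Prefix sequences that ndiff emits for an edited line, as listed above.
PATTERNS = {
    ("-", "+", "?"): "additions only",
    ("-", "?", "+"): "removals only",
    ("-", "?", "+", "?"): "additions and removals",
}


def edit_pattern(old: str, new: str) -> Tuple[str, ...]:
    """Return the first character of each ndiff line for one pair of lines."""
    return tuple(line[0] for line in difflib.ndiff([old], [new]))


def classify_edit(old: str, new: str) -> str:
    """Map the prefix sequence to one of the three edit patterns."""
    return PATTERNS.get(edit_pattern(old, new), "not an in-line edit")


print(classify_edit("hello world", "hello world!"))
print(classify_edit("hello world", "hllo world"))
print(classify_edit("hello world", "helo world!"))
```

Pairs that `ndiff` does not consider similar enough (for example `"a"` and `"b"`) produce only `-` and `+` lines, so they fall through to the default.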
