Port MASTG-TEST-0004: App Exposing Sensitive Data to Embedded Libraries #3485
A couple of things are going on here. First of all, just to clarify scope: the weaknesses we should be targeting are:

## Test

The original test has several parts.

### Part 1

This isn't good, and parts of it actually belong somewhere else. Your new test addresses the "identification of potentially sensitive data that may have been passed to embedded third-party libraries used by the application." TODO:

### Part 2

This seems to be related to MASWE-0109: Lack of Anonymization or Pseudonymisation Measures. TODO:

### Part 3

This is MASTG-TEST-0206: Undeclared PII in Network Traffic Capture and the new suggested test that uses MASTG-TECH-0119 ("Method 1" in your current test).

## Demo

The proposed demo is a bit misleading. When analyzing the issue statically, you won't find the sensitive user data (e.g., email, name, username) in the results. We would need to make this more realistic.

TODO: update demo title to "Uses of Firebase Analytics APIs on Potential PII with Semgrep"

## New Demo

We need a dynamic demo and test: we hook all those APIs and find out which ones are actually used and what they contain. We can then correlate that with the hooks to the network APIs and the traffic capture. For the data we detect using these tests, the final questions are:
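The hook-and-correlate idea above can be sketched roughly as follows. This is a minimal illustration, not the proposed test: it assumes the hook logs (e.g., from hooks on the analytics APIs) and the decoded traffic payloads have already been collected, and all names and data below are made up.

```python
# Rough sketch of the correlation step: check which values captured from
# hooked analytics SDK calls also show up in the captured network traffic.
# Input formats here are assumptions; real hook output and traffic
# captures would need parsing first.

def correlate(hooked_calls, traffic_payloads):
    """Return (event, key, value) triples whose value also appears on the wire."""
    findings = []
    for event, params in hooked_calls:
        for key, value in params.items():
            if any(value in payload for payload in traffic_payloads):
                findings.append((event, key, value))
    return findings

# Made-up example data: values logged by hooks on the analytics APIs ...
hooked = [
    ("sign_up", {"username": "alice84", "plan": "free"}),
    ("screen_view", {"screen": "Settings"}),
]
# ... and decoded payloads from the traffic capture.
traffic = ["POST /collect body=...username=alice84...", "GET /config"]

print(correlate(hooked, traffic))  # flags the username that appears in traffic
```

Whatever such a correlation flags would then feed the final questions above for the detected data.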
## Putting it all together

Now we have a pretty solid test strategy:

Static:

Dynamic:

Network:

## Required Actions for This PR

## Separate Follow-Up Issues (not part of this PR)

Create issues for:
Adjust demo addressing test number 1
> profiles: [P]
> ---
>
> ## Overview
to be moved in a new issue
Adjusted the scope of this PR + see point 4 below.

1. A new test based on Method 2, point 1: "Use of Third-Party Tracking & Analytics SDKs."
2. A new test based on Method 1 using MASTG-TECH-0119 (network API hooking).
3. A new test for anonymization/pseudonymisation (MASWE-0109).
4. A new dynamic test for discovering actual sensitive data passed to SDKs.
5. A new network-analysis test for verifying transmission of sensitive data.
> ## Steps
>
> 1. Identify common SDK APIs (methods) the SDK uses as entry points to collect data by researching the library's documentation online or its codebase. For example, if the library is `com.example.analytics` and it has a method `trackEvent(String eventName, Map<String, String> properties)` used to accept data, then the method to search for would be `com.example.analytics.trackEvent`.
I'd use a real example, e.g. from Firebase. Also this sounds like it should be a prerequisite and not a step.
We do not have a Prerequisites section. Move to the overview, then?
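To illustrate the step under discussion, a minimal search over decompiled sources could flag candidate references. This is a sketch using the hypothetical `com.example.analytics.trackEvent` entry point from the step text, not part of the actual test:

```python
import re

# Hypothetical SDK entry points to search for; a real list would come from
# the SDK's documentation or codebase (e.g., Firebase, per the review).
SDK_METHODS = ["com.example.analytics.trackEvent"]

def find_sdk_references(sources):
    """Scan {filename: source_text} for references to known SDK methods.

    Heuristic: matches either the fully-qualified reference or a bare call
    to the method name, so it can over-report and needs manual review.
    """
    hits = []
    for method in SDK_METHODS:
        name = method.rsplit(".", 1)[1]
        pattern = re.compile(rf"\b{re.escape(method)}\b|\b{re.escape(name)}\s*\(")
        for filename, text in sources.items():
            for lineno, line in enumerate(text.splitlines(), 1):
                if pattern.search(line):
                    hits.append((filename, lineno, method))
    return hits

# Made-up decompiled sources for demonstration.
sources = {
    "MainActivity.java": 'import com.example.analytics.*;\n'
                         'analytics.trackEvent("login", props);\n',
    "Util.java": "int x = 1;\n",
}
print(find_sdk_references(sources))
```

In the actual demo this matching is done with a Semgrep rule rather than an ad-hoc script; the sketch only shows the matching idea behind the step.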
> ## Evaluation
>
> The test case fails if you can find the use of these SDK methods in the app code, indicating that the app is sharing data with the third-party SDK. If no such references are found, the test case passes.
I'd say: if the methods are found and the data they handle is sensitive. Since this is static analysis, it'd require additional reverse engineering and ideally running the dynamic version of it.
Maybe we should start using techniques in the evaluation (in this case reverse eng.). I'm seeing a pattern, see the next test.
I believe this suggestion goes down a consolidation path, and FYI it's contrary to the initial direction:

Keep the PR focused strictly on detecting potential sensitive-data-handling SDK API usage via static analysis. What is suggested is mixing this (02te1) with 02te3, or?

The current format is:

- 02te0: detect analytics SDKs in the app (via SBOM)
- 02te1: detect references to the SDKs (found in 02te0)
- 02te3: validate that references found in 02te1 are actually used

Essentially, where one test finishes, the next takes over.

I could see 02te1 and 02te3 merged; however, I cannot see 02te3 living alongside 02te1. 🤔
> title: References to SDK APIs Known to Handle Sensitive Data
> id: MASTG-TEST-02te1
> type: [static]
> weakness: MASWE-0112
Suggested change:

> weakness: MASWE-0112
> prerequisites:
>   - identify-sensitive-data
Hmm, I don't think so, because this test evaluates references, not validating the existence of sensitive data. 02te3 does that (and has it). Why do you think this is needed?
> ## Overview
>
> This test verifies whether an app is sending sensitive data to an embedded SDK (third-party library) via its APIs (methods).
We're about to unify static and dynamic as much as possible. This is a perfect example of the test purpose being exactly the same and the way of testing (with its pros/cons) being different. I'd keep the same description for these 2 tests for now.
Suggested change:

> This test verifies whether an app uses SDK (third-party library) APIs known to handle sensitive data.
> ) {
>     Column(modifier = Modifier.padding(16.dp)) {
>         // Normal visible selection UI: list of radio buttons for blood types
>         val bloodTypes = listOf("A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-")
Very nice but I'd avoid adding anything to the UI unless the test is about anything involving the UI.
Both Demos have this on purpose: To give the reader a strong understanding of the difference between static and dynamic content. If I had hardcoded a sensitive value here, then why not use Semgrep instead?
This extra UI conveys the point of why we actually use a dynamic test here.
> onValueChange = { input = it },
> modifier = Modifier
>     .fillMaxWidth(),
> label = { Text("Enter something to send to Firebase Analytics") },
Also nice, but I'd also avoid modifying the UI if it's not strictly needed. The simpler the UI the better. See same suggestion for the other demo.
Let's discuss it in the other comment. For here: this input puts emphasis on potential sensitive data, as explained in the TEST. The user might write their name, password, or... not. Therefore, I find it critical to convey that "potential" sensitive data message.
Maybe you can just use the code from 00de3 for both demos.
let's agree on UI topics first :)
Co-authored-by: Carlos Holguera <perezholguera@gmail.com>
This PR closes #2942
Description
Port MASTG-TEST-0004: Sensitive Data Leaked via Embedded Libraries (android)
As I could not create a generic Semgrep rule for all kinds of libraries, the demo uses Firebase Analytics as an example.
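For reference, a rule in this spirit (a sketch under assumptions, not the exact rule shipped with this demo; the rule id and message are made up) could match calls to the Firebase Analytics `logEvent(String, Bundle)` API:

```yaml
rules:
  - id: firebase-analytics-logevent   # hypothetical rule id
    languages: [java, kotlin]
    severity: INFO
    message: >
      Call to Firebase Analytics logEvent; review whether the event
      parameters may contain sensitive data.
    pattern: $ANALYTICS.logEvent($NAME, $PARAMS)
```

A library-agnostic version would need one pattern per SDK entry point, which is why the demo limits itself to Firebase Analytics.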
TODOs before merging: