This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Commit d48eaee

Geolocation and user language extraction analysis: issue #37
1 parent 1d278a4 commit d48eaee

2 files changed: +785 -0 lines changed
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
## Analysis:

My analysis checks which websites in this dataset, and what percentage of them, track the user's location and language preferences in order to serve content customized to those preferences.

## Dataset used: Sample 10 percent
- [sample 10 percent](https://public-data.telemetry.mozilla.org/bigcrawl/sample_10percent.parquet.tar.bz2) - 3.7GB download / 7.4GB on disk
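
The download is a bzip2-compressed tar archive, so it has to be unpacked before the parquet files can be read. A minimal sketch of that step, using only the Python standard library, follows; the helper name `extract_sample` and the destination directory are hypothetical and not part of the original analysis.

```python
import tarfile
from pathlib import Path


def extract_sample(archive_path, dest_dir):
    """Unpack a .tar.bz2 archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names
```

Calling `extract_sample("sample_10percent.parquet.tar.bz2", "data")` would unpack the archive into a `data` directory (about 7.4GB on disk, per the figure above). The extracted files can then be loaded with any parquet-capable dataframe library.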

## Inference:

Out of a total of __11292867__ websites / locations in this dataset, __72304__ (0.64%) were found to check the user's preferred language, usually the language of the browser UI. Their locations and scripts can be found in the `language_pref_df` dataframe.

Out of the same __11292867__ websites / locations, __2414__ (0.02%) were found to check the user's location using the Geolocation API. Their locations and scripts can be found in the `geolocation_df` dataframe.
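
The filtering behind these two counts can be sketched as below. This is an illustration under stated assumptions, not the code from the commit: it assumes each row records the accessed JavaScript property in a `symbol` field, with names such as `window.navigator.language`, and uses plain dictionaries with made-up toy rows instead of the real dataframes. The last two lines re-derive the reported percentages from the stated counts.

```python
# Assumed symbol names; the real dataset may spell them differently.
LANGUAGE_SYMBOLS = {"window.navigator.language", "window.navigator.languages"}
GEOLOCATION_PREFIX = "window.navigator.geolocation"


def split_rows(rows):
    """Return (language_rows, geolocation_rows) from an iterable of row dicts."""
    language_rows, geolocation_rows = [], []
    for row in rows:
        symbol = row.get("symbol", "")
        if symbol in LANGUAGE_SYMBOLS:
            language_rows.append(row)
        elif symbol.startswith(GEOLOCATION_PREFIX):
            geolocation_rows.append(row)
    return language_rows, geolocation_rows


def share_percent(matches, total):
    """Percentage of `total` represented by `matches`, rounded to 2 decimals."""
    return round(100 * matches / total, 2)


# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"location": "https://a.example/", "symbol": "window.navigator.language"},
    {"location": "https://b.example/",
     "symbol": "window.navigator.geolocation.getCurrentPosition"},
    {"location": "https://c.example/", "symbol": "window.document.cookie"},
    {"location": "https://d.example/", "symbol": "window.navigator.languages"},
]

language_rows, geolocation_rows = split_rows(rows)
print(share_percent(len(language_rows), len(rows)))     # 50.0
print(share_percent(len(geolocation_rows), len(rows)))  # 25.0

# Re-deriving the reported percentages from the stated counts.
print(share_percent(72304, 11292867))  # 0.64
print(share_percent(2414, 11292867))   # 0.02
```

With a pandas dataframe, the language filter would likely reduce to a boolean mask such as `df[df["symbol"].isin(LANGUAGE_SYMBOLS)]`, again assuming the column is named `symbol`.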

Running the analysis on the full dataset may yield more accurate estimates.
