Commit 76f9e6c

Merge pull request #3 from sumitsisodiya/master
added selenium for scraping just a basic understanding BeautifulSoup.py
2 parents f75f1f7 + 92b8d79 commit 76f9e6c

1 file changed: +19 -1 lines changed

Web Scraping with BeautifulSoup.py

Lines changed: 19 additions & 1 deletion
@@ -5,20 +5,38 @@
 #pip3 install requests
 #pip3 install bs4
 
+#run in the browser also what are you doing with the help of chrome driver
 
 # ## Basic fundamentals of web scraping
 
 # import these two modules bs4 for selecting HTML tags easily
 from bs4 import BeautifulSoup
 # requests module is easy to operate some people use urllib but I prefer this one because it is easy to use.
 import requests
+from selenium import webdriver
 
 # I put here my own blog url ,you can change it.
 url="https://getpython.wordpress.com/"
-
+BASE_URL = "https://getpython.wordpress.com/"
 #Requests module use to data from given url
 source=requests.get(url)
 
+
+def get_chrome_web_driver(options):
+    return webdriver.Chrome("./chromedriver", chrome_options=options)
+
+
+def get_web_driver_options():
+    return webdriver.ChromeOptions()
+
+
+def set_ignore_certificate_error(options):
+    options.add_argument('--ignore-certificate-errors')
+
+
+def set_browser_as_incognito(options):
+    options.add_argument('--incognito')
+
 # BeautifulSoup is used for getting HTML structure from requests response.(craete your soup)
 soup=BeautifulSoup(source.text,'html')
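
The commit defines the four Selenium helpers but never calls them, so here is a minimal sketch of how they might be wired together. It is not part of the commit. The names `configure_options` and `fetch_page_source` and the `driver_path` parameter are hypothetical, introduced only for illustration. The sketch also assumes Selenium 4, where the driver path is passed through a `Service` object and the options through the `options=` keyword, rather than the positional path and `chrome_options=` used in the diff.

```python
def configure_options(options, ignore_certificate_errors=True, incognito=True):
    """Apply the same Chrome flags the commit's helpers set, in one place.

    `options` only needs an `add_argument` method, so a real
    `webdriver.ChromeOptions()` instance works here.
    """
    if ignore_certificate_errors:
        options.add_argument("--ignore-certificate-errors")
    if incognito:
        options.add_argument("--incognito")
    return options


def fetch_page_source(url, driver_path="./chromedriver"):
    """Open `url` in Chrome and return the rendered HTML (hypothetical helper)."""
    # Imported lazily so configure_options can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = configure_options(webdriver.ChromeOptions())
    # Assumption: Selenium 4 style construction (Service + options keyword).
    driver = webdriver.Chrome(
        service=Service(executable_path=driver_path), options=options
    )
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

With a matching chromedriver binary at `./chromedriver`, the HTML returned by `fetch_page_source(BASE_URL)` could be passed to `BeautifulSoup(html, "html.parser")` in the same way the script already parses `source.text`.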
