Day 13: Web Scraping using Python (Corona data)
Hello guys,
Previously we learned about libraries and OOP concepts in Python.
Today we will learn about web scraping.
What is Web Scraping?
Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites and saving it to a local file on your computer or to a database in table (spreadsheet) format.
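As a minimal sketch of the "saved to a local file in table format" part, assuming the counter values have already been scraped (they are copied from the Output later in this post) and that the three counters mean total cases, deaths and recovered (those labels are my assumption), Python's built-in csv module can turn them into a small table:

```python
import csv
import io

# Labels are assumed; values are copied from this post's sample output.
rows = [
    ("Cases", "1,361,598"),
    ("Deaths", "76,315"),
    ("Recovered", "293,654"),
]


def to_csv(rows):
    """Return CSV text with a header row; thousands separators are removed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["metric", "count"])
    for label, value in rows:
        writer.writerow([label, int(value.replace(",", ""))])
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)

# To save it as a local file instead:
# with open("corona.csv", "w", newline="") as f:
#     f.write(csv_text)
```

The resulting file opens directly in a spreadsheet program.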
Why Web Scraping?
Web scraping is used to collect large amounts of information from websites. But why would someone need to collect so much data from websites? To answer that, let's look at the applications of web scraping:
- Price Comparison: Services such as ParseHub use web scraping to collect data from online shopping websites and use it to compare the prices of products.
- Email address gathering: Many companies that use email as a marketing medium use web scraping to collect email IDs and then send bulk emails.
- Social Media Scraping: Web scraping is used to collect data from Social Media websites such as Twitter to find out what’s trending.
- Research and Development: Web scraping is used to collect a large set of data (Statistics, General Information, Temperature, etc.) from websites, which are analyzed and used to carry out Surveys or for R&D.
- Job listings: Details regarding job openings and interviews are collected from different websites and listed in one place so that they are easily accessible to the user.
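There are two types of websites:
1. Static websites: the server sends ready-made HTML, so the data is already in the page source.
2. Dynamic websites: some of the content is generated in the browser by JavaScript after the page loads.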
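To see why this distinction matters, here is a small sketch, assuming bs4 is installed. The HTML string is made up: the "static" paragraph is part of the HTML the server sends, while the "dynamic" div would only be filled in by a script running in a browser. Since requests downloads HTML without running JavaScript, Beautiful Soup sees only what the server sent:

```python
from bs4 import BeautifulSoup

# Hypothetical page: one piece of text is in the HTML itself, the other
# would only be inserted by JavaScript running in a browser.
html = """
<html><body>
  <p id="static">Served directly in the HTML</p>
  <div id="dynamic"></div>
  <script>
    document.getElementById("dynamic").textContent = "Added by JavaScript";
  </script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
static_text = soup.find(id="static").get_text()
dynamic_text = soup.find(id="dynamic").get_text()

print(repr(static_text))   # 'Served directly in the HTML'
print(repr(dynamic_text))  # '' because the script never ran
```

This is why requests plus Beautiful Soup is the go-to pair for static pages.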
To scrape static websites we use two libraries:
1. beautifulsoup4 (Beautiful Soup)
2. requests
How to use?
Install the libraries:
pip install beautifulsoup4
pip install requests
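Note that the pip package is called beautifulsoup4 but you import it as bs4. A small sketch for checking that both installs worked, using the standard library's importlib.metadata (Python 3.8+); the helper name installed_versions is made up for this example:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each pip distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


versions = installed_versions(["beautifulsoup4", "requests"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```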
Let's have some fun
We will scrape this website: https://www.worldometers.info/coronavirus/
It gives frequently updated statistics about the coronavirus (COVID-19) pandemic.
Code:
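import requests as rq
from bs4 import BeautifulSoup
# Download the page (an HTTP GET request)
data = rq.get("https://www.worldometers.info/coronavirus/")
# Parse the HTML text with Python's built-in parser
soup = BeautifulSoup(data.text, 'html.parser')
# Collect every <div class="maincounter-number"> on the page
counts = soup.find_all("div", class_="maincounter-number")
print(counts)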
Output:
[<div class="maincounter-number">
<span style="color:#aaa">1,361,598 </span>
</div>, <div class="maincounter-number">
<span>76,315</span>
</div>, <div class="maincounter-number" style="color:#8ACA2B ">
<span>293,654</span>
</div>]
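The script above can be split into two small functions, sketched below under a few assumptions: the page still marks its counters with the maincounter-number class (websites change their HTML, so check this first), a 10-second timeout is reasonable, and the User-Agent string is a placeholder I made up. Separating download from parsing also lets you test the parser on saved HTML without hitting the site:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.worldometers.info/coronavirus/"


def fetch_html(url, timeout=10):
    """Download a page and return its HTML; raise HTTPError on 4xx/5xx status codes."""
    # Hypothetical User-Agent string; some sites reject the default one requests sends.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; corona-tutorial-scraper)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_counters(html):
    """Return the stripped text of every <div class="maincounter-number">."""
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", class_="maincounter-number")
    return [div.get_text(strip=True) for div in divs]


# Offline check with HTML trimmed from the Output above.
sample = """
<div class="maincounter-number"><span style="color:#aaa">1,361,598 </span></div>
<div class="maincounter-number"><span>76,315</span></div>
"""
print(parse_counters(sample))  # ['1,361,598', '76,315']

# Live run (needs internet access; the site's HTML may have changed):
# print(parse_counters(fetch_html(URL)))
```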
So what's going on here?
1. import requests as rq
The requests module allows you to send HTTP requests using Python. An HTTP request returns a Response object with all the response data (content, encoding, status, etc.).
2. from bs4 import BeautifulSoup
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.
3. data = rq.get("https://www.worldometers.info/coronavirus/")
The get() method sends a GET request to the specified URL and returns a Response object, which we store in data.
4. If we print(data), it shows <Response [200]>. 200 is the HTTP status code for a successful request.
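To inspect a Response object without depending on an outside website, the sketch below starts a throwaway local web server with Python's built-in http.server (the page content and the DemoHandler name are made up for the demo), fetches it with requests.get(), and prints the fields mentioned above:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always answers with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body><p>Hello, scraper!</p></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo output quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/"
data = requests.get(url, timeout=5)

print(data)                          # <Response [200]>
print(data.status_code)              # 200
print(data.headers["Content-Type"])  # text/html; charset=utf-8
print(data.encoding)                 # utf-8
print(data.text)                     # the HTML as a str

server.shutdown()
server.server_close()
```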
5. soup = BeautifulSoup(data.text, 'html.parser')
Beautiful Soup gives us a BeautifulSoup object, which represents the document as a nested data structure. data.text is the page's HTML as a string, and 'html.parser' tells Beautiful Soup to use Python's built-in HTML parser.
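A small sketch of what "nested data structure" means in practice, assuming bs4 is installed; the HTML string is one counter copied from the Output above:

```python
from bs4 import BeautifulSoup

html = '<div class="maincounter-number"><span style="color:#aaa">1,361,598 </span></div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.div    # first <div> in the document
span = div.span   # the <span> nested inside that div

print(span.name)               # span
print(span["style"])           # color:#aaa
print(span.parent.name)        # div
print(div["class"])            # ['maincounter-number'] (class is multi-valued)
print(repr(str(span.string)))  # '1,361,598 ' (note the trailing space)
```

Each tag gives access to its children (div.span), its parent (span.parent) and its attributes (span["style"]), which is what lets you walk down to the data you want.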
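6. counts = soup.find_all("div", class_="maincounter-number")
find_all(tag, attributes): here we ask for every div tag whose class attribute is "maincounter-number". The keyword is written class_ (with an underscore) because class is a reserved word in Python. find_all() returns a list of the matching Tag objects (the whole div elements), not a list of attributes.
Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it's the same as calling find_all() on that object. These two lines of code are equivalent:
soup.find_all("div", class_="maincounter-number")
soup("div", class_="maincounter-number")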
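The sketch below, assuming bs4 is installed and using a made-up HTML snippet, checks that calling the soup like a function gives the same result as find_all(), and also shows find(), which returns only the first match (or None when nothing matches):

```python
from bs4 import BeautifulSoup

html = """
<div class="maincounter-number"><span>1,361,598</span></div>
<div class="maincounter-number"><span>76,315</span></div>
<div class="other"><span>not a counter</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

long_form = soup.find_all("div", class_="maincounter-number")
shortcut = soup("div", class_="maincounter-number")
first = soup.find("div", class_="maincounter-number")
missing = soup.find("div", class_="does-not-exist")

print(len(long_form), len(shortcut))  # 2 2
print(long_form == shortcut)          # True
print(first.get_text())               # 1,361,598
print(missing)                        # None
```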
7. print(counts)
Here we print the list of matching div tags, which is the HTML shown in the Output above.
Now, if we want the actual numbers instead of the surrounding HTML div tags:
for count in counts:
    print(count.get_text(strip=True))
Output:
1,362,045
76,328
293,655
get_text(): returns the text inside the HTML tags. Passing strip=True removes the whitespace and newlines around each piece of text.
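The scraped counts are still strings with thousands separators. A minimal sketch of cleaning one up, assuming bs4 is installed; the HTML is a copy of one counter from the first Output, with its original line breaks:

```python
from bs4 import BeautifulSoup

html = """
<div class="maincounter-number">
<span style="color:#aaa">1,361,598 </span>
</div>
"""
div = BeautifulSoup(html, "html.parser").find("div", class_="maincounter-number")

raw = div.get_text()              # keeps the newlines and the trailing space
clean = div.get_text(strip=True)  # strips whitespace from each text piece
number = int(clean.replace(",", ""))

print(repr(raw))
print(repr(clean))  # '1,361,598'
print(number + 1)   # 1361599
```

Once the value is an int you can do arithmetic with it or store it in a spreadsheet column.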
There are other useful methods too, such as get('href'), which reads the URL out of a link (a) tag.
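A short sketch of reading links with Tag.get(), assuming bs4 is installed; the HTML and URLs are placeholders. get() returns None when the attribute is missing, whereas a["href"] would raise a KeyError:

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li><a href="https://example.com/page1">Page 1</a></li>
  <li><a href="/relative/page2">Page 2</a></li>
  <li><a>No link here</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.find_all("a"):
    href = a.get("href")  # None for the <a> without an href
    if href is not None:
        links.append((a.get_text(strip=True), href))

print(links)
# [('Page 1', 'https://example.com/page1'), ('Page 2', '/relative/page2')]
```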
Quite fun, and it looks like magic 😎
Task:
Scrape other such websites.
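Many sites publish their numbers in an HTML table, so a reusable helper is handy for this task. Below is a hedged, generic sketch (the table_to_rows name and the sample HTML are made up) that turns the first table in a page into a list of rows; real tables often need extra cleanup for merged cells, footnotes or hidden rows:

```python
from bs4 import BeautifulSoup


def table_to_rows(html):
    """Return the first <table> in html as a list of rows (lists of cell text)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header and data cells, in document order.
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


# Placeholder HTML standing in for a downloaded page.
sample = """
<table>
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>Country A</td><td>1,234</td></tr>
  <tr><td>Country B</td><td>567</td></tr>
</table>
"""
print(table_to_rows(sample))
# [['Country', 'Cases'], ['Country A', '1,234'], ['Country B', '567']]
```

The rows it returns can be passed straight to csv.writer's writerows() to save them as a spreadsheet.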
❤❤Quarantine python group link ❤❤
8805271377 WhatsApp
Follow here ❤
@mr._mephisto_ Instagram
There will be no restrictions; just feel free to learn.
Share this post and take one more step toward sharing knowledge with others.
Believe in yourself 🤟 you are awesome.
Be safe, Be happy😁
Take care of yourself and your family
Of course watch movies and series🤟😉
And follow the government's guidelines.