Day 13 : Web Scraping using Python (Corona data)

Hello guys,
Previously we learned about libraries and OOP concepts of Python.

Today we will learn about web scraping.

What is Web Scraping?


Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites, where the data is extracted and saved to a local file on your computer or to a database in table (spreadsheet) format.

Why Web Scraping?

Web scraping is used to collect large amounts of information from websites. But why would someone need to collect so much data from websites? To answer that, let’s look at the applications of web scraping:
  • Price Comparison: Services such as ParseHub use web scraping to collect data from online shopping websites and use it to compare the prices of products.
  • Email address gathering: Many companies that use email as a medium for marketing use web scraping to collect email IDs and then send bulk emails.
  • Social Media Scraping: Web scraping is used to collect data from Social Media websites such as Twitter to find out what’s trending.
  • Research and Development: Web scraping is used to collect a large set of data (Statistics, General Information, Temperature, etc.) from websites, which are analyzed and used to carry out Surveys or for R&D.
  • Job listings: Details regarding job openings and interviews are collected from different websites and then listed in one place so that they are easily accessible to the user.

There are two types of websites:
1. Static websites (the content is already present in the HTML the server sends)
2. Dynamic websites (the content is rendered by JavaScript in the browser)

To scrape static websites we use two libraries:
1. beautifulsoup4
2. requests

How to use?
Install the libraries:

pip install beautifulsoup4

pip install requests

Let's have some fun

We will scrape this website: https://www.worldometers.info/coronavirus/


This website gives live coronavirus (COVID-19) statistics.


Code:
import requests as rq
from bs4 import BeautifulSoup


# Download the page
data = rq.get("https://www.worldometers.info/coronavirus/")

# Parse the HTML and grab every <div> with class "maincounter-number"
soup = BeautifulSoup(data.text, 'html.parser')
counts = soup.find_all("div", class_="maincounter-number")
print(counts)


Output:

[<div class="maincounter-number">
<span style="color:#aaa">1,361,598 </span>
</div>, <div class="maincounter-number">
<span>76,315</span>
</div>, <div class="maincounter-number" style="color:#8ACA2B ">
<span>293,654</span>
</div>]


So what's going on here?

1. import requests as rq 
The requests module allows you to send HTTP requests using Python.
The HTTP request returns a Response object with all the response data (content, encoding, status, etc.).
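
For instance, here is a minimal sketch of what a Response object exposes (example.com is just a stand-in URL for illustration):

import requests as rq

response = rq.get("https://example.com")   # send a GET request
print(response.status_code)                # HTTP status code, e.g. 200
print(response.encoding)                   # e.g. 'UTF-8'
print(response.text[:100])                 # first 100 characters of the HTML content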

2. from bs4 import BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.
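
As a tiny self-contained illustration (the HTML snippet below is made up just for this example):

from bs4 import BeautifulSoup

html = "<html><body><p class='msg'>Hello, scraping!</p></body></html>"
demo_soup = BeautifulSoup(html, "html.parser")   # parse the snippet
print(demo_soup.p)                               # <p class="msg">Hello, scraping!</p>
print(demo_soup.p.get_text())                    # Hello, scraping!
print(demo_soup.p["class"])                      # the class attribute -> ['msg']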



3. data = rq.get("https://www.worldometers.info/coronavirus/")

The get method sends a GET request to the specified URL and returns the response, which we store in data.


4. If we print(data), it will show <Response [200]>, which means the request succeeded.
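
If you want to be a bit defensive, you can also check the status before parsing (an optional sketch using our data object):

print(data)              # <Response [200]>
print(data.status_code)  # 200
data.raise_for_status()  # raises an exception for 4xx/5xx responses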


5. soup = BeautifulSoup(data.text, 'html.parser')

Beautiful Soup gives us a BeautifulSoup object, which represents the document as a nested data structure.

Basically, it is the parsed HTML page that we can search and navigate.
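
Because it is a nested structure, you can navigate it like a tree, for example (the title tag here is just an assumption about the page, shown for illustration):

print(soup.title)             # the page's <title> tag
print(soup.title.get_text())  # just the title text
first_div = soup.find("div")  # the first <div> in the document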

6. counts = soup.find_all("div", class_="maincounter-number")

Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object. These two lines of code are equivalent (see the example below).

find_all(tag, attr) : here we specify the div tag and the class attribute with value "maincounter-number".

It returns a list of the matching tags (not just their attributes).
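
For example, these two lines return the same list (the second form is the shortcut of calling the object directly):

counts = soup.find_all("div", class_="maincounter-number")
counts = soup("div", class_="maincounter-number")   # same result via the shortcut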


7. print(counts)

Here we print the list of matching tags.

Now, if we want the actual counts rather than the surrounding HTML div tags, we can write:

for count in counts:
    print(count.get_text())


1,362,045 


76,328


293,655



get_text() : returns the text inside the HTML tags.

There are other useful methods too, for example get("href") to read a tag's attribute such as a link's URL.
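
For example, a rough sketch that pulls every link URL out of the page (the <a> tags here are a general assumption about the page, not something our counter code selected):

for link in soup.find_all("a"):   # every anchor tag on the page
    print(link.get("href"))       # the value of its href attribute, i.e. the link URL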


Quite fun, and it looks like magic😎

Task :
Scrape other such websites




❤❤Quarantine python group link ❤❤

8805271377 WhatsApp

Follow here ❤

@mr._mephisto_ Instagram 

There will be no restrictions, just feel free to learn. 

Share and take one more step toward sharing knowledge with others. 

Believe in yourself 🤟 you are awesome. 

Be safe, Be happy😁
Take care of yourself and your family 
Of course watch movies and series🤟😉 

And follow the government guidelines


