Day 13 : Web Scraping using Python (Corona data)

Hello guys,
Previously we learned about libraries and OOP concepts of Python.

Today we will learn about web scraping.

What is Web Scraping?


Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites, where the data is extracted and saved to a local file on your computer or to a database in table (spreadsheet) format.

Why Web Scraping?

Web scraping is used to collect large amounts of information from websites. But why would someone need to collect so much data from websites? To answer that, let’s look at the applications of web scraping:
  • Price Comparison: Services such as ParseHub use web scraping to collect data from online shopping websites and use it to compare the prices of products.
  • Email address gathering: Many companies that use email as a medium for marketing use web scraping to collect email IDs and then send bulk emails.
  • Social Media Scraping: Web scraping is used to collect data from Social Media websites such as Twitter to find out what’s trending.
  • Research and Development: Web scraping is used to collect a large set of data (Statistics, General Information, Temperature, etc.) from websites, which are analyzed and used to carry out Surveys or for R&D.
  • Job listings: Details regarding job openings and interviews are collected from different websites and then listed in one place so that they are easily accessible to the user.

There are two types of websites:
1. Static websites (the content is already present in the HTML the server sends)
2. Dynamic websites (the content is rendered by JavaScript in the browser)

To scrape static websites we use two libraries:
1. beautifulsoup4
2. requests

How to use?
Install the libraries:

pip install beautifulsoup4

pip install requests

Let's have some fun

We will scrape this website: https://www.worldometers.info/coronavirus/


This website gives live coronavirus (COVID-19) statistics.


Code:
import requests as rq
from bs4 import BeautifulSoup


# Download the page
data = rq.get("https://www.worldometers.info/coronavirus/")

# Parse the HTML and grab every <div> with class "maincounter-number"
soup = BeautifulSoup(data.text, 'html.parser')
counts = soup.find_all("div", class_="maincounter-number")
print(counts)


Output:

[<div class="maincounter-number">
<span style="color:#aaa">1,361,598 </span>
</div>, <div class="maincounter-number">
<span>76,315</span>
</div>, <div class="maincounter-number" style="color:#8ACA2B ">
<span>293,654</span>
</div>]


So what's going on here?

1. import requests as rq 
The requests module allows you to send HTTP requests using Python.
The HTTP request returns a Response object with all the response data (content, encoding, status, etc.).
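
For instance, here is a minimal sketch of what a Response object exposes (example.com is just a stand-in URL for illustration):

import requests as rq

response = rq.get("https://example.com")   # send a GET request
print(response.status_code)                # HTTP status code, e.g. 200
print(response.encoding)                   # e.g. 'UTF-8'
print(response.text[:100])                 # first 100 characters of the HTML content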

2. from bs4 import BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.
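
As a tiny self-contained illustration (the HTML snippet below is made up just for this example):

from bs4 import BeautifulSoup

html = "<html><body><p class='msg'>Hello, scraping!</p></body></html>"
demo_soup = BeautifulSoup(html, "html.parser")   # parse the snippet
print(demo_soup.p)                               # <p class="msg">Hello, scraping!</p>
print(demo_soup.p.get_text())                    # Hello, scraping!
print(demo_soup.p["class"])                      # the class attribute -> ['msg']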



3. data = rq.get("https://www.worldometers.info/coronavirus/")

The get method sends a GET request to the specified URL and returns the response, which we store in data.


4. If we print(data), it will show <Response [200]>, which means the request succeeded.
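
If you want to be a bit defensive, you can also check the status before parsing (an optional sketch using our data object):

print(data)              # <Response [200]>
print(data.status_code)  # 200
data.raise_for_status()  # raises an exception for 4xx/5xx responses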


5. soup = BeautifulSoup(data.text, 'html.parser')

Beautiful Soup gives us a BeautifulSoup object, which represents the document as a nested data structure.

Basically, it is the parsed HTML page that we can search and navigate.
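
Because it is a nested structure, you can navigate it like a tree, for example (the title tag here is just an assumption about the page, shown for illustration):

print(soup.title)             # the page's <title> tag
print(soup.title.get_text())  # just the title text
first_div = soup.find("div")  # the first <div> in the document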

6. counts = soup.find_all("div", class_="maincounter-number")

Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object. These two lines of code are equivalent (see the example below).

find_all(tag, attr) : here we specify the div tag and the class attribute with value "maincounter-number".

It returns a list of the matching tags (not just their attributes).
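
For example, these two lines return the same list (the second form is the shortcut of calling the object directly):

counts = soup.find_all("div", class_="maincounter-number")
counts = soup("div", class_="maincounter-number")   # same result via the shortcut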


7. print(counts)

Here we print the list of matching tags.

Now, if we want the actual counts rather than the surrounding HTML div tags, we can write:

for count in counts:
    print(count.get_text())


1,362,045 


76,328


293,655



get_text() : returns the text inside the HTML tags.

There are other useful methods too, for example get("href") to read a tag's attribute such as a link's URL.
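
For example, a rough sketch that pulls every link URL out of the page (the <a> tags here are a general assumption about the page, not something our counter code selected):

for link in soup.find_all("a"):   # every anchor tag on the page
    print(link.get("href"))       # the value of its href attribute, i.e. the link URL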


Quite fun, and it looks like magic😎

Task :
Scrape other such websites




❤❤Quarantine python group link ❤❤

8805271377 WhatsApp

Follow here ❤

@mr._mephisto_ Instagram 

There will be no restrictions, just feel free to learn. 

Share and take one more step toward sharing knowledge with others. 

Believe in yourself 🤟 you are awesome. 

Be safe, Be happy😁
Take care of yourself and your family 
Of course watch movies and series🤟😉 

And follow the government guidelines


