
How to Build Your First Web Scraper Using Python

[Screenshot: the generated books.csv file with columns Title and Price]

Python has become the go-to language for web scraping due to its powerful libraries, clean syntax, and strong community support.


In this tutorial, you’ll learn how to build a simple yet effective web scraper in Python that extracts data from a real website, no prior experience required!

We’ll use:

  • requests to fetch HTML content
  • BeautifulSoup to parse and extract data
  • pandas to organize and export the results

Let’s get started!

🔧 Step 1: Set Up Your Environment

Before writing any code, make sure your system has the necessary tools installed.

Requirements:

  • Python 3.x installed on your system
  • pip (Python’s package manager, bundled with modern Python versions)
  • A terminal or command prompt

📦 Install Required Libraries:

Open your terminal and run:

pip install requests beautifulsoup4 pandas

📌 Tip: You can create a virtual environment first to keep dependencies isolated:

python -m venv env
source env/bin/activate   # On Windows: env\Scripts\activate
pip install requests beautifulsoup4 pandas
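
To confirm everything installed correctly, you can print each library’s version from a Python shell (a quick sanity check):

import requests, bs4, pandas

print('requests:', requests.__version__)
print('beautifulsoup4:', bs4.__version__)
print('pandas:', pandas.__version__)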

🌐 Step 2: Send an HTTP Request

Use the requests library to send an HTTP GET request to the target URL and retrieve the page’s HTML content.

import requests

url = 'https://books.toscrape.com/'
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    html_content = response.text
else:
    print(f"Failed to retrieve page. Status code: {response.status_code}")
    raise SystemExit(1)  # Stop here; the later steps need the HTML

📌 Tip: Always check the status code to avoid errors. 200 means success!
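
If you prefer exceptions over manual checks, requests can raise one for you; response.raise_for_status() throws requests.exceptions.HTTPError for any 4xx or 5xx response:

import requests

response = requests.get('https://books.toscrape.com/')
response.raise_for_status()  # Raises HTTPError on 4xx/5xx status codes
html_content = response.text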

🧾 Step 3: Parse the HTML Content

Now that we have the HTML, let’s parse it using BeautifulSoup.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')
print(soup.prettify())  # Optional: view nicely formatted HTML

With BeautifulSoup, we can now search for specific elements like <h1>, <div class="price">, or <a href="#">.
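
For example, here are a few common lookups (illustrative only; the exact elements vary from page to page):

# First <h1> heading on the page, if present
heading = soup.find('h1')
if heading:
    print(heading.text.strip())

# The href attribute of the first five links
for link in soup.find_all('a', limit=5):
    print(link.get('href'))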

🧹 Step 4: Extract Data from the Page

Let’s extract all book titles and prices from the page.

books = soup.find_all('article', class_='product_pod')

for book in books:
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text
    print(f"{title} – {price}")

📌 This loop finds each book item (<article class="product_pod">) and extracts the title and price.
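
If you prefer CSS selectors, BeautifulSoup’s select() and select_one() express the same query (a sketch, equivalent to the find_all() loop above):

# Same extraction using CSS selectors instead of find_all()
for book in soup.select('article.product_pod'):
    title = book.select_one('h3 a')['title']
    price = book.select_one('p.price_color').text
    print(f"{title} – {price}")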

📊 Step 5: Store the Data

To save the scraped data, we’ll use pandas to create a DataFrame and export it as a CSV file.

import pandas as pd

data = []

for book in books:
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text
    data.append({'Title': title, 'Price': price})

df = pd.DataFrame(data)
df.to_csv('books.csv', index=False)
print("Data saved to books.csv")

You’ll now find a books.csv file in your working directory containing all the scraped data.
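
To verify the export, you can read the file back with pandas (a quick sanity check, assuming books.csv is in your working directory):

import pandas as pd

df = pd.read_csv('books.csv')
print(df.head())        # Preview the first five rows
print(len(df), 'rows')  # Confirm the row count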

🔄 Bonus: Scrape Multiple Pages

Most websites span multiple pages. Let’s modify our script to scrape all pages.

base_url = 'https://books.toscrape.com/catalogue/page-{}.html'
page_number = 1
all_books = []

while True:
    url = base_url.format(page_number)
    response = requests.get(url)

    if response.status_code != 200:
        break  # Stop when there are no more pages

    soup = BeautifulSoup(response.text, 'html.parser')
    books = soup.find_all('article', class_='product_pod')

    if not books:
        break  # No more books found

    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text
        all_books.append({'Title': title, 'Price': price})

    page_number += 1

df = pd.DataFrame(all_books)
df.to_csv('all_books.csv', index=False)
print("All data saved to all_books.csv")

📌 Tip: Be respectful by adding a delay between requests. Place the sleep at the end of each loop iteration, just before the next page is fetched:

import time
time.sleep(2)  # Wait 2 seconds before next request

⚠️ Important Notes on Ethics and Best Practices

Even though you’re building your own scraper, always follow ethical practices:

  • Respect robots.txt
  • Avoid excessive requests
  • Identify your bot with a custom User-Agent (see the sketch after this list)
  • Don’t scrape sensitive or private data
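
A minimal sketch of sending a custom User-Agent with requests (the header value is an illustrative placeholder; substitute your own project name and contact details):

import requests

# Hypothetical identifier; replace with your own bot name and contact info
headers = {'User-Agent': 'MyFirstScraper/1.0 (contact: you@example.com)'}

response = requests.get('https://books.toscrape.com/', headers=headers)
print(response.status_code)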

🎉 Congratulations! You Just Built a Web Scraper

You’ve successfully created a working web scraper in Python that:

  • Fetches HTML content
  • Parses and extracts relevant data
  • Stores it in a structured format (CSV)
  • Handles pagination

This foundation can be extended to scrape product listings, job boards, news articles, and more!

Related Articles:

Part 1: Web Scraping! The Ultimate Guide for Data Extraction

Part 2: Web Scraping! Legal Aspects and Ethical Guidelines

Part 3: Web Scraping! Different Tools and Technologies

Part 4: How to Build Your First Web Scraper Using Python

Part 5: Web Scraping Advanced Techniques in Python

Part 6: Real-World Applications of Web Scraping

Najeeb Alam

Technical writer specializing in developer content, blogging, and online journalism. I have been working in this field for the last 20 years.
