Personal Project • Dec 2024

Scraping Craigslist to Find A New Home

Role

Developer & Automator

Timeline

Dec 2024

Team

Only me!

Skills

Python

Automation

Twilio

Web Scraping

Overview

Automating apartment hunting during the pandemic

During the pandemic, finding a place to rent was near impossible. Listings were snapped up within minutes or never met our criteria. Craigslist remained a valuable resource, but it lacked any notification system for newly posted listings.

The Problem

I was stuck checking Craigslist several times a day. This led me to think: why am I doing this manually when I know Python? I set out to automate the process: finding the element names for the vital information we wanted, setting a filter for our criteria, checking whether a listing had images (no images often signals a low-effort scam), and sending a notification with all of the important details.

Python, please save me.

The Setup

Libraries and dependencies

Taking a look at our imports:

time & datetime

Manage the script's run schedule and timestamp its log output.

pandas

Handle data storage and manipulation.

requests

Fetch the Craigslist pages over HTTP.

bs4 (BeautifulSoup)

Parse the HTML content and extract the required data.

twilio.rest

Send SMS notifications to the phone number of my choice.

Initially, I used a Telegram bot, but we later switched to Airtable at my partner's request (she was on an Airtable kick at the time). The most recent version uses Twilio, which makes the process easier for a friend who is the current beneficiary of this script.

import time
from datetime import datetime
import pandas as pd
import requests
from bs4 import BeautifulSoup
from twilio.rest import Client

Twilio Configuration

Setting up SMS notifications

First, we configure Twilio and set up a function to send SMS messages. This allows the script to notify me immediately when a new listing that meets the criteria is found.

ACCOUNT_SID = 'MY_TWILIO_SID'
AUTH_TOKEN = 'MY_TWILIO_AUTH_TOKEN'
FROM_PHONE_NUMBER = 'MY_TWILIO_NUMBER'
TO_PHONE_NUMBER = 'MY_RECIPIENTS_NUMBER'

client = Client(ACCOUNT_SID, AUTH_TOKEN)

def send_sms(message):
    client.messages.create(
        body=message,
        from_=FROM_PHONE_NUMBER,
        to=TO_PHONE_NUMBER
    )
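One hardening tweak I'd suggest (not part of the original script): read the credentials from environment variables instead of hardcoding them, so the file can be shared safely. Twilio's docs conventionally use TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN; the two phone-number variable names below are ones I made up.

import os

from twilio.rest import Client

# Export these in your shell first, e.g.:
#   export TWILIO_ACCOUNT_SID=ACxxxxxxxx
#   export TWILIO_AUTH_TOKEN=your_token
ACCOUNT_SID = os.environ['TWILIO_ACCOUNT_SID']
AUTH_TOKEN = os.environ['TWILIO_AUTH_TOKEN']
FROM_PHONE_NUMBER = os.environ['TWILIO_FROM_NUMBER']  # hypothetical name
TO_PHONE_NUMBER = os.environ['TWILIO_TO_NUMBER']      # hypothetical name

client = Client(ACCOUNT_SID, AUTH_TOKEN)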

Data Extraction

Extracting listing information

Next, the script extracts relevant information from Craigslist's main listing page and each individual listing's page.

def extract_main_page_details(soup):
    # Each search result on the index page is an <li> with this class
    listings = soup.find_all('li', class_='cl-static-search-result')
    data = []
    for listing in listings:
        item = {}
        title = listing.find('div', class_='title')
        item['title'] = title.text.strip() if title else None
        price = listing.find('div', class_='price')
        item['price'] = price.text.strip() if price else None
        location = listing.find('div', class_='location')
        item['location'] = location.text.strip() if location else None
        link = listing.find('a', href=True)
        item['link'] = link['href'] if link else None
        data.append(item)
    return data

def extract_listing_details(url, location):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    item = {}
    item['link'] = url
    title = soup.find('title')
    item['title'] = title.text.strip() if title else None
    price = soup.find('span', class_='price')
    item['price'] = price.text.strip() if price else None
    item['bd&bth'] = None
    attrgroup = soup.find('div', class_='mapAndAttrs')
    if attrgroup:
        attrs = attrgroup.find_all('span', class_='attr')
        for attr in attrs:
            # Collect bedroom/bathroom attributes into a single string
            if 'br' in attr.text.lower() or 'ba' in attr.text.lower():
                item['bd&bth'] = (item['bd&bth'] + ' ' + attr.text.strip()) if item['bd&bth'] else attr.text.strip()
    date_posted = soup.find('time', {'class': 'date timeago'})
    item['date_posted'] = date_posted['datetime'] if date_posted else None
    item['location'] = location
    return item
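The Problem section mentions skipping image-less listings and filtering on our criteria, which isn't shown above. Here's a minimal sketch of how that could slot in after extract_listing_details; the 'gallery' selector is an assumption about Craigslist's markup at the time, and the price cap is a hypothetical parameter.

def passes_filters(soup, item, max_price=3000):
    # Assumption: listing photos live inside an element with class 'gallery';
    # inspect the page and adjust the selector if the markup differs
    has_images = soup.find('div', class_='gallery') is not None
    if not has_images:
        return False  # no photos often signals a low-effort scam
    # Hypothetical criteria check: parse '$1,500' into an int and cap it
    if item['price']:
        price = int(item['price'].replace('$', '').replace(',', ''))
        if price > max_price:
            return False
    return True

Since extract_listing_details already has both the soup and the item in hand, it could call passes_filters before returning and drop anything that fails.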

Data Storage

Storing and checking listings

Since the data volume isn't enormous, I store the listings in a CSV file. This function reads the CSV and, if a listing with the same title already exists, skips it, adding only new listings. This helps eliminate duplicates, a common issue on Craigslist.

def check_and_add_to_csv(item, csv_file):
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        # First run: start with an empty frame using the expected columns
        df = pd.DataFrame(columns=['title', 'price', 'location', 'link', 'bd&bth', 'date_posted'])
    if item['title'] in df['title'].values:
        print(f"Record with title '{item['title']}' already exists in CSV, skipping.")
        return False
    else:
        new_df = pd.DataFrame([item])
        df = pd.concat([df, new_df], ignore_index=True)
        df.to_csv(csv_file, index=False)
        print(f"New record with title '{item['title']}' added to CSV.")
        return True
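One design note on the duplicate check: keying on the title lets a repost with a tweaked title slip through, while the listing URL tends to stay stable. A small variation (my suggestion, not the original behavior) that check_and_add_to_csv could call instead:

def already_seen(item, df):
    # Dedupe on the listing URL rather than the title, since reposts
    # often tweak the title but keep the same URL
    return item['link'] in df['link'].values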

Main Loop

Running the automation

Finally, the main loop runs every 10 minutes, checking for new listings and sending SMS notifications for any new ones found. The script pauses for 30 seconds between notifications to avoid overwhelming the recipient with multiple texts at once.

while True:
    # Note: the %-m / %-d / %-I codes are POSIX-only; use %#m etc. on Windows
    curr_time = datetime.now().strftime('%-m/%-d/%y %-I:%M%p')
    print(f"{curr_time}: Checking for new listings...")
    main_page_url = "craigslist_url"  # placeholder: your Craigslist search URL
    response = requests.get(main_page_url)
    main_page_soup = BeautifulSoup(response.content, 'html.parser')
    listings = extract_main_page_details(main_page_soup)
    detailed_listings = []
    for listing in listings:
        url = listing['link']
        location = listing['location']
        details = extract_listing_details(url, location)
        if check_and_add_to_csv(details, 'craigslist_listings.csv'):
            detailed_listings.append(details)
    if detailed_listings:
        for listing in detailed_listings:
            message = f"New Listing:\nTitle: {listing['title']}\nPrice: {listing['price']}\nLocation: {listing['location']}\nBR & Ba: {listing['bd&bth']}\nLink: {listing['link']}\nDate Posted: {listing['date_posted']}"
            send_sms(message)
            print(f"Notification sent: {message}")
            print("Waiting 30 seconds so we don't blow up their phone lol")
            time.sleep(30)
    print(f"{curr_time}: Sleeping for 10 minutes")
    time.sleep(600)

Future Work

Potential enhancements

While this script is effective for small-scale use, there are a few enhancements I would consider for future iterations:

Error Handling

Implementing additional try-except blocks around network requests and file operations would better handle failures like network outages or file permission issues. The primary issue I've encountered so far has been my system running out of memory, which prevented new listings from being written to the CSV.
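As a sketch of what that could look like, here's the fetch wrapped in a retry loop; the retry count and backoff are arbitrary choices, not tuned values.

def fetch_page(url, retries=3, backoff=60):
    # Retry transient network failures instead of crashing the main loop
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            print(f"Request failed ({e}), attempt {attempt + 1} of {retries}")
            time.sleep(backoff)
    return None  # caller should skip this cycle and try again later

The main loop and extract_listing_details would then call fetch_page instead of requests.get and simply skip the cycle when it returns None.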

Database Storage

As the dataset grows or if I need to perform more complex queries, transitioning from a CSV file to a more robust solution like a PostgreSQL database would be beneficial. This would improve scalability and data management.
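A minimal sketch of that migration using psycopg2, assuming a local database named 'listings' already exists; the table mirrors the CSV columns, and the listing URL serves as the primary key so the database itself rejects duplicates instead of the script checking manually.

import psycopg2

conn = psycopg2.connect(dbname='listings', user='me', host='localhost')

def add_listing(item):
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS listings (
                link        TEXT PRIMARY KEY,
                title       TEXT,
                price       TEXT,
                location    TEXT,
                bd_bth      TEXT,
                date_posted TEXT
            )
        """)
        # ON CONFLICT DO NOTHING replaces the manual CSV duplicate check
        cur.execute(
            "INSERT INTO listings VALUES (%s, %s, %s, %s, %s, %s)"
            " ON CONFLICT (link) DO NOTHING",
            (item['link'], item['title'], item['price'],
             item['location'], item['bd&bth'], item['date_posted']),
        )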

Thanks for reading along! If you're struggling to find a place, please use my code and adjust it to your situation.
