Personal Project • Dec 2024
Scraping Craigslist to Find A New Home

Role: Developer & Automator
Timeline: Dec 2024
Team: Only me!
Skills: Python, Automation, Twilio, Web Scraping
Overview
Automating apartment hunting during the pandemic
During the pandemic, finding a place to rent felt nearly impossible. Properties were snapped up within minutes or never met our criteria. Craigslist remained a valuable resource, but it had no way to notify you when new listings were posted.
The Problem
I was stuck checking Craigslist several times a day. This led me to think: why am I doing this manually when I know Python? I set out to automate the process: finding the element names for the vital information we wanted, setting a filter for our criteria, checking whether a listing had images (listings without images are often low-effort scams), and sending a notification with all of the important details.
Python, please save me.
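As an aside, that image check doesn't appear in the code excerpts below, but a minimal sketch might look like the following. The 'gallery' class name is an assumption about Craigslist's markup, not something taken from the script, so verify it against a live listing before relying on it.

def has_images(soup):
    # Heuristic: listings without a photo gallery are often low-effort scams.
    # 'gallery' is an assumed class name; inspect a live listing to confirm.
    gallery = soup.find('div', class_='gallery')
    return gallery is not None and gallery.find('img') is not None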
The Setup
Libraries and dependencies
Taking a look at our imports:
time & datetime
Used for managing the script's schedule and logging.
pandas
Utilized for handling data storage and manipulation.
requests
To handle the HTTP requests for fetching Craigslist pages.
bs4 (BeautifulSoup)
To parse the HTML content and extract the required data.
twilio.rest
Twilio library to send SMS notifications to the phone number of my choice.
Initially, I used a Telegram bot, but we later switched to Airtable at my partner's request (she was on an Airtable kick at the time). The most recent version uses Twilio, which simplifies things for a friend who is the script's current beneficiary.
import time
from datetime import datetime
import pandas as pd
import requests
from bs4 import BeautifulSoup
from twilio.rest import Client
Twilio Configuration
Setting up SMS notifications
First, we configure Twilio and set up a function to send SMS messages. This allows the script to notify me immediately when a new listing that meets the criteria is found.
ACCOUNT_SID = 'MY_TWILIO_SID'
AUTH_TOKEN = 'MY_TWILIO_AUTH_TOKEN'
FROM_PHONE_NUMBER = 'MY_TWILIO_NUMBER'
TO_PHONE_NUMBER = 'MY_RECIPIENTS_NUMBER'

client = Client(ACCOUNT_SID, AUTH_TOKEN)

def send_sms(message):
    client.messages.create(
        body=message,
        from_=FROM_PHONE_NUMBER,
        to=TO_PHONE_NUMBER
    )
Data Extraction
Extracting listing information
Next, the script extracts relevant information from Craigslist's main listing page and each individual listing's page.
def extract_main_page_details(soup):
    # Each search result on the main page is an <li> with this class.
    listings = soup.find_all('li', class_='cl-static-search-result')
    data = []
    for listing in listings:
        item = {}
        title = listing.find('div', class_='title')
        item['title'] = title.text.strip() if title else None
        price = listing.find('div', class_='price')
        item['price'] = price.text.strip() if price else None
        location = listing.find('div', class_='location')
        item['location'] = location.text.strip() if location else None
        link = listing.find('a', href=True)
        item['link'] = link['href'] if link else None
        data.append(item)
    return data

def extract_listing_details(url, location):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    item = {}
    item['link'] = url
    title = soup.find('title')
    item['title'] = title.text.strip() if title else None
    price = soup.find('span', class_='price')
    item['price'] = price.text.strip() if price else None
    item['bd&bth'] = None
    attrgroup = soup.find('div', class_='mapAndAttrs')
    if attrgroup:
        attrs = attrgroup.find_all('span', class_='attr')
        for attr in attrs:
            # Collect bedroom/bathroom attributes (e.g. "2br" and "1ba")
            # into a single string.
            if 'br' in attr.text.lower() or 'ba' in attr.text.lower():
                item['bd&bth'] = item['bd&bth'] + ' ' + attr.text.strip() if item['bd&bth'] else attr.text.strip()
    date_posted = soup.find('time', {'class': 'date timeago'})
    item['date_posted'] = date_posted['datetime'] if date_posted else None
    item['location'] = location
    return item
Data Storage
Storing and checking listings
Since the data volume isn't enormous, I store the listings in a CSV file. This function checks the CSV: if a listing with the same title already exists, it is skipped, so only genuinely new listings are added. This helps eliminate duplicate listings, a common issue on Craigslist.
def check_and_add_to_csv(item, csv_file):
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        df = pd.DataFrame(columns=['title', 'price', 'location', 'link', 'bd&bth', 'date_posted'])
    if item['title'] in df['title'].values:
        print(f"Record with title '{item['title']}' already exists in CSV, skipping.")
        return False
    else:
        new_df = pd.DataFrame([item])
        df = pd.concat([df, new_df], ignore_index=True)
        df.to_csv(csv_file, index=False)
        print(f"New record with title '{item['title']}' added to CSV.")
        return True
Main Loop
Running the automation
Finally, the main loop runs every 10 minutes, checking for new listings and sending an SMS notification for each one found. The script pauses for 30 seconds between notifications so the recipient isn't hit with a burst of texts at once.
while True:
    # Note: the '-' flag (no zero-padding) in strftime works on Linux/macOS
    # but not on Windows.
    curr_time = datetime.now().strftime('%-m/%-d/%y %-I:%M%p')
    print(f"{curr_time}: Checking for new listings...")
    main_page_url = "craigslist_url"
    response = requests.get(main_page_url)
    main_page_soup = BeautifulSoup(response.content, 'html.parser')
    listings = extract_main_page_details(main_page_soup)
    detailed_listings = []
    for listing in listings:
        url = listing['link']
        location = listing['location']
        details = extract_listing_details(url, location)
        if check_and_add_to_csv(details, 'craigslist_listings.csv'):
            detailed_listings.append(details)
    if detailed_listings:
        for listing in detailed_listings:
            message = f"New Listing:\nTitle: {listing['title']}\nPrice: {listing['price']}\nLocation: {listing['location']}\nBR & Ba: {listing['bd&bth']}\nLink: {listing['link']}\nDate Posted: {listing['date_posted']}"
            send_sms(message)
            print(f"Notification sent: {message}")
            print("Waiting 30 seconds so we don't blow up their phone lol")
            time.sleep(30)
    print(f"{curr_time}: Sleeping for 10 minutes")
    time.sleep(600)
Future Work
Potential enhancements
While this script is effective for small-scale use, there are a few enhancements I would consider for future iterations:
Error Handling
Implementing additional try-except blocks around network requests and file operations would better handle exceptions like network outages or file permission issues. The primary issue I've encountered so far was my system running out of memory, which prevented new listings from being written to the CSV.
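For example, a small retry wrapper around the page fetches might look like the sketch below. The function name and retry parameters are my own invention, not part of the original script:

def fetch_page(url, retries=3, backoff=30):
    # Retry transient network failures instead of letting the main loop crash.
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            print(f"Attempt {attempt}/{retries} failed for {url}: {e}")
            time.sleep(backoff)
    return None  # callers should skip this cycle when all retries fail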
Database Storage
As the dataset grows or if I need to perform more complex queries, transitioning from a CSV file to a more robust solution like a PostgreSQL database would be beneficial. This would improve scalability and data management.
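As a rough sketch, the CSV check could become a single insert with deduplication pushed into the database. This assumes psycopg2 and a placeholder listings table with a unique title column, none of which exist in the current script:

import psycopg2

# Assumed one-time schema:
# CREATE TABLE listings (
#     title TEXT PRIMARY KEY, price TEXT, location TEXT,
#     link TEXT, bd_bth TEXT, date_posted TEXT);

def check_and_add_to_db(item, conn):
    # ON CONFLICT DO NOTHING lets Postgres handle deduplication by title.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO listings (title, price, location, link, bd_bth, date_posted) "
            "VALUES (%s, %s, %s, %s, %s, %s) ON CONFLICT (title) DO NOTHING",
            (item['title'], item['price'], item['location'],
             item['link'], item['bd&bth'], item['date_posted']),
        )
        inserted = cur.rowcount == 1  # 1 if inserted, 0 if it already existed
    conn.commit()
    return inserted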
Thanks for reading along! If you're struggling to find a place, please use my code and adjust it to your situation.