Scraping Craigslist to Find A New Home
During the pandemic, finding a place to rent was nearly impossible. At the time, my partner and I set Zillow alerts and checked individual property management companies' websites daily. However, it felt like a futile effort: properties were either snapped up within minutes or never met our criteria. Craigslist, a popular online platform for classifieds, has long been a go-to for finding rental listings. Although it has a reputation for scams, outdated listings, and even murderers (oh my!), it remains a valuable resource because of the volume and diversity of posts. My main gripe with Craigslist was the lack of any notification when new listings got posted, which left me checking the site several times a day. This led me to think: why am I doing this manually when I know Python? I set out to automate the process: finding the element names for the vital information we wanted, setting a filter for our criteria, checking whether a listing had images (no images often means a low-effort scam), and sending a notification with all of the important details. Python, please save me.
Taking a look at our imports:
- time and datetime: used for managing the script's schedule and logging.
- pandas: handles data storage and manipulation.
- requests: handles the HTTP requests for fetching Craigslist pages.
- BeautifulSoup (bs4): parses the HTML content and extracts the required data.
- twilio.rest: the Twilio client for sending SMS notifications to the phone number of my choice.

Initially, I used a Telegram bot, but we later switched to Airtable per my partner's request (she was on an Airtable kick at the time). My most recent version uses Twilio to make this process easier for a friend who is the current beneficiary of this script.
Here's the code to import the libraries:
import time
from datetime import datetime

import pandas as pd
import requests
from bs4 import BeautifulSoup
from twilio.rest import Client
Twilio Configuration and Setup
First, we configure Twilio and set up a function to send SMS messages. This allows the script to notify me immediately when a new listing that meets the criteria is found.
ACCOUNT_SID = 'MY_TWILIO_SID'
AUTH_TOKEN = 'MY_TWILIO_AUTH_TOKEN'
FROM_PHONE_NUMBER = 'MY_TWILIO_NUMBER'
TO_PHONE_NUMBER = 'MY_RECIPIENTS_NUMBER'

client = Client(ACCOUNT_SID, AUTH_TOKEN)

def send_sms(message):
    client.messages.create(
        body=message,
        from_=FROM_PHONE_NUMBER,
        to=TO_PHONE_NUMBER
    )
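Before wiring this into the scraper, it's worth confirming the credentials work by sending yourself a quick test message (the text here is arbitrary):

send_sms("Test message: the Craigslist scraper is up and running")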
Next, the script extracts relevant information from Craigslist’s main listing page and each individual listing’s page.
def extract_main_page_details(soup):
    listings = soup.find_all('li', class_='cl-static-search-result')
    data = []
    for listing in listings:
        item = {}
        title = listing.find('div', class_='title')
        item['title'] = title.text.strip() if title else None
        price = listing.find('div', class_='price')
        item['price'] = price.text.strip() if price else None
        location = listing.find('div', class_='location')
        item['location'] = location.text.strip() if location else None
        link = listing.find('a', href=True)
        item['link'] = link['href'] if link else None
        data.append(item)
    return data


def extract_listing_details(url, location):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    item = {}
    item['link'] = url
    title = soup.find('title')
    item['title'] = title.text.strip() if title else None
    price = soup.find('span', class_='price')
    item['price'] = price.text.strip() if price else None
    item['bd&bth'] = None
    attrgroup = soup.find('div', class_='mapAndAttrs')
    if attrgroup:
        attrs = attrgroup.find_all('span', class_='attr')
        for attr in attrs:
            if 'br' in attr.text.lower() or 'ba' in attr.text.lower():
                item['bd&bth'] = item['bd&bth'] + ' ' + attr.text.strip() if item['bd&bth'] else attr.text.strip()
    date_posted = soup.find('time', {'class': 'date timeago'})
    item['date_posted'] = date_posted['datetime'] if date_posted else None
    item['location'] = location
    return item
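One thing the functions above don't show is the image check mentioned in the intro. A minimal sketch of how I'd approach it, assuming it's enough to count img tags on the listing page (Craigslist's exact gallery markup may differ, so treat the selector as an assumption):

def has_images(soup):
    # Heuristic: listings without any photos are often low-effort posts or scams.
    # Counting <img> tags is an assumption; a stricter check could target the gallery markup.
    return len(soup.find_all('img')) > 0

Inside extract_listing_details you could then set item['has_images'] = has_images(soup) and filter on it later.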
Storing and Checking Listings
Since the data volume isn't enormous, I store the listings in a CSV file. This function reads the CSV and, if a listing with the same title already exists, skips it; only new listings are added. This helps weed out duplicate listings, a common issue on Craigslist.
def check_and_add_to_csv(item, csv_file):
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        df = pd.DataFrame(columns=['title', 'price', 'location', 'link', 'bd&bth', 'date_posted'])
    if item['title'] in df['title'].values:
        print(f"Record with title '{item['title']}' already exists in CSV, skipping.")
        return False
    else:
        new_df = pd.DataFrame([item])
        df = pd.concat([df, new_df], ignore_index=True)
        df.to_csv(csv_file, index=False)
        print(f"New record with title '{item['title']}' added to CSV.")
        return True
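For reference, the function expects a dict keyed the same way as the CSV columns. A hypothetical call (the listing below is made up, purely for illustration) looks like this:

sample = {
    'title': '2br apartment near the park',
    'price': '$1,800',
    'location': 'capitol hill',
    'link': 'https://example.org/some-listing',
    'bd&bth': '2br 1ba',
    'date_posted': '2021-05-01T12:00:00-0700',
}
check_and_add_to_csv(sample, 'craigslist_listings.csv')  # True on first call, False on repeats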
Main Loop
Finally, the main loop runs every 10 minutes, checking the search page and sending an SMS notification for each new listing it finds. The script pauses for 30 seconds between notifications to avoid overwhelming the recipient with several texts at once.
while True:
    curr_time = datetime.now().strftime('%-m/%-d/%y %-I:%M%p')
    print(f"{curr_time}: Checking for new listings...")
    main_page_url = "craigslist_url"
    response = requests.get(main_page_url)
    main_page_soup = BeautifulSoup(response.content, 'html.parser')
    listings = extract_main_page_details(main_page_soup)

    detailed_listings = []
    for listing in listings:
        url = listing['link']
        location = listing['location']
        details = extract_listing_details(url, location)
        if check_and_add_to_csv(details, 'craigslist_listings.csv'):
            detailed_listings.append(details)

    if detailed_listings:
        for listing in detailed_listings:
            message = (
                f"New Listing:\n"
                f"Title: {listing['title']}\n"
                f"Price: {listing['price']}\n"
                f"Location: {listing['location']}\n"
                f"BR & Ba: {listing['bd&bth']}\n"
                f"Link: {listing['link']}\n"
                f"Date Posted: {listing['date_posted']}"
            )
            send_sms(message)
            print(f"Notification sent: {message}")
            print("Waiting 30 seconds so we don't blow up their phone lol")
            time.sleep(30)

    print(f"{curr_time}: Sleeping for 10 minutes")
    time.sleep(600)
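The loop above notifies on every new listing; the criteria filter mentioned in the intro isn't shown here. A minimal sketch, assuming a simple max-price cutoff (the threshold is a placeholder), which could be checked right before calling check_and_add_to_csv:

MAX_PRICE = 2500  # placeholder threshold, adjust to your budget

def meets_criteria(item):
    # Skip listings with no price or a price above the cutoff.
    if not item['price']:
        return False
    try:
        price = int(item['price'].replace('$', '').replace(',', ''))
    except ValueError:
        return False
    return price <= MAX_PRICE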
Potential Enhancements
While this script is effective for small-scale use, there are a few enhancements I would consider for future iterations:
- Error Handling: Adding try-except blocks around the network requests and file operations would better handle failures like network outages or file permission issues (a rough sketch follows this list). The main issue I've run into so far has been my system running out of memory, which prevented new listings from being written to the CSV. Since I typically only run the script for a few days at a time before my system (and consequently, the script) gets restarted, this hasn't been a frequent problem, but it's worth addressing.
- Database Storage: As the dataset grows, or if I need to perform more complex queries, moving from a CSV file to a more robust solution like a PostgreSQL database would improve scalability and data management (see the second sketch below).
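For the error-handling point, here's a rough sketch of the kind of wrapper I have in mind: it retries transient network failures instead of letting one bad request kill the loop (the retry count and delay are arbitrary):

def fetch_page(url, retries=3, delay=10):
    # Retry transient network errors a few times before giving up.
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            print(f"Request failed ({exc}), attempt {attempt + 1} of {retries}")
            time.sleep(delay)
    return None  # caller should skip this cycle if all retries fail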
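And for the database idea, a sketch of what the CSV check might look like against PostgreSQL using psycopg2 (the listings table and its columns are hypothetical):

import psycopg2

def add_listing_to_db(item, conn):
    # Mirrors check_and_add_to_csv: insert only if the title hasn't been seen before.
    with conn.cursor() as cur:
        cur.execute("SELECT 1 FROM listings WHERE title = %s", (item['title'],))
        if cur.fetchone():
            return False
        cur.execute(
            "INSERT INTO listings (title, price, location, link, bd_bth, date_posted) "
            "VALUES (%s, %s, %s, %s, %s, %s)",
            (item['title'], item['price'], item['location'],
             item['link'], item['bd&bth'], item['date_posted']),
        )
    conn.commit()
    return True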
Thanks for reading along!
If you're struggling to find a place, please use my code and adjust it to your situation. If you have any questions, feel free to shoot me a message.