import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
url = "https://...com/"
# If there is no such folder, the script will create one automatically
folder_location = r'C:\Users\jing\Dropbox\Harper\homeSchool\Achieve3000'
if not os.path.exists(folder_location):
    os.mkdir(folder_location)

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

for link in soup.select("a[href$='.pdf']"):
    # Name the PDF files using the last portion of each link, which is unique in this case
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url, link['href'])).content)
Reference:
https://stackoverflow.com/questions/54616638/download-all-pdf-files-from-a-website-using-python
I only made minor changes to the code posted there.
Hi,
I am using soup.select("a[href$='.pdf']") to download PDFs from https://www.reddit.com/r/SecurityAnalysis/comments/kvq6tj/q4_2020_letters_reports/ but it is not returning any links. Could you please help? Thanks in advance.
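A likely cause, as a guess: the new reddit.com front end renders most of the page with JavaScript, so the HTML that requests.get receives may contain few or no matching <a> tags, and Reddit also tends to reject requests that use the default python-requests User-Agent. The sketch below is a minimal way to check both points against the server-rendered old.reddit.com page; the old.reddit.com URL and the custom User-Agent string are assumptions for debugging, not part of the original script.

import requests
from bs4 import BeautifulSoup

# Debugging sketch (assumptions: old.reddit.com serves static HTML, and a
# custom User-Agent avoids Reddit's block on the default requests client)
url = "https://old.reddit.com/r/SecurityAnalysis/comments/kvq6tj/q4_2020_letters_reports/"
headers = {"User-Agent": "Mozilla/5.0 (pdf-link-checker)"}

response = requests.get(url, headers=headers)
print(response.status_code)  # anything other than 200 means the page was not served

soup = BeautifulSoup(response.text, "html.parser")

# href$='.pdf' only matches links that literally end in ".pdf";
# PDFs reached through query strings or redirect URLs will not match
for link in soup.select("a[href$='.pdf']"):
    print(link["href"])

If the status code is 200 but the loop prints nothing, the PDFs in that thread are probably linked through URLs that do not end in .pdf, and inspecting the page source (or loosening the selector) would be the next step.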