Web Scraping with Python
15 Dec 2017
The Internet holds an abundance of information and knowledge. Extracting that knowledge in an organised, scalable and systematic way for data analysis and other research purposes is known as web scraping.
Web scraping automatically extracts data from a particular website and presents it in the form the user wants. We can scrape many things from the web, such as stock updates or the price of a product on Amazon. In this post I’ll scrape news updates from my college’s website.
I’ll use Python and BeautifulSoup for this post because both are beginner friendly.
- On Mac and Linux, Python is usually preinstalled.
- Windows users can install Python from the official website.
To install the BeautifulSoup library, simply enter the following command in a terminal.
pip install beautifulsoup4
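To check that the installation worked, a quick sanity test like the one below should run without errors. This is only a small sketch: it imports the package, prints its version, and parses a one-line HTML string that I made up for the test.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version so we know the import succeeded.
print("BeautifulSoup version:", bs4.__version__)

# Parse a tiny made-up HTML string with Python's built-in parser.
soup = BeautifulSoup("<p>Hello</p>", "html.parser")
greeting = soup.p.get_text()
print(greeting)
```

If the import fails with ModuleNotFoundError, pip most likely installed the package for a different Python interpreter than the one you are running.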
Finding the Target in the HTML
All the news and updates of IIT Bhubaneswar are listed on this webpage. Visit the webpage and press Ctrl + Shift + I to inspect its source code. Now click the mouse-pointer button at the top left of the inspector panel and hover over the list of news updates. The inspector panel shows that all the news and updates on the page are inside an unordered list with the class rectlist.
So we have to scrape all the list items in that unordered list.
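To make the target concrete, here is a small sketch that runs on a hand-written HTML snippet instead of the real page. Only the class name rectlist comes from the inspector; the surrounding markup and the headlines are made up for illustration. It uses a CSS selector to pick the list items of that one list and ignore every other list on the page.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the real page. Only the class name "rectlist"
# is taken from the inspector; everything else here is made up.
SAMPLE_HTML = """
<div>
  <ul class="menu"><li>Home</li></ul>
  <ul class="rectlist">
    <li><a href="#">Sample headline one</a></li>
    <li><a href="#">Sample headline two</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "ul.rectlist > li" matches <li> elements that are direct children of
# <ul class="rectlist">, so the menu list is skipped.
items = soup.select("ul.rectlist > li")
headlines = [li.get_text(strip=True) for li in items]
print(headlines)
```

The menu entry is left out because its list does not carry the rectlist class.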
Writing the script for scraping
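A minimal sketch of such a script could look like the following. It is written under a few assumptions: the address of the news page is passed on the command line, the file name scrape_news.py and the function names parse_news and scrape_news are just illustrative, and the page is fetched with urllib from the standard library (the requests library would work just as well).

```python
import sys
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def parse_news(html):
    """Return the text of every list item inside <ul class="rectlist">."""
    soup = BeautifulSoup(html, "html.parser")
    news_list = soup.find("ul", class_="rectlist")
    if news_list is None:
        # The list was not found: wrong page, or the layout has changed.
        return []
    # Join the text pieces inside each item with a space and trim whitespace.
    return [li.get_text(" ", strip=True) for li in news_list.find_all("li")]


def scrape_news(url):
    """Download the page at `url` and return its news items as strings."""
    # Some servers reject requests that carry no User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        html = response.read()
    return parse_news(html)


if __name__ == "__main__":
    # The URL is an argument so that nothing about the site is hard-coded.
    if len(sys.argv) == 2 and sys.argv[1].startswith("http"):
        for item in scrape_news(sys.argv[1]):
            print(item)
    else:
        print("usage: python scrape_news.py NEWS_PAGE_URL")
```

Keeping the parsing in its own function means it can be tried on a saved copy of the page without touching the network.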
After running this script, we’ll get the required result in the terminal.
You can learn more about the BeautifulSoup library by visiting its official documentation.
Update: I recently came across another very good resource for web scraping (https://likegeeks.com/python-web-scraping). I think you may like it too.
Feel free to leave any doubts, criticism or compliments in the comments below. If you liked this post, please share it with your hacker friends. Don’t forget to bookmark my blog and keep visiting. Bonne journée!