Web Scraping with Golang
September 4, 2018, by kavi

Building a Web Scraper
As I mentioned in the introduction, we'll be building a simple web scraper in Go. Note that I didn't say web crawler: our scraper will only go one level deep (maybe I'll cover crawling in another post).

A simple concurrent scraper
Our scraper will download a list of web pages we give it and check that each one returns a 200 HTTP status code (meaning the server returned the page successfully).

We can use the goquery package for web page scraping in Go: download the HTML, then extract the useful information from it with goquery.
Web scraping is a technique for parsing the HTML output of a website. Most online bots rely on the same technique to extract the information they need from a particular website or page.
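We could use an XML parser to parse an HTML page and get the required information, but real-world HTML is often not well-formed XML. jQuery-style CSS selectors are a much more convenient way to query an HTML document, so in this tutorial we will use goquery, a library that brings a jQuery-like syntax to Go, to parse the HTML document.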
Project Setup and dependencies
As mentioned above, we will use goquery as our parser. Install it with the following command:
go get github.com/PuerkitoBio/goquery
Golang Web Scraping Tutorial
Create a file named webscraper.go and open it in your favorite text editor.
Web scraper code to get posts from a website
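```go
package main

import (
	// import standard libraries
	"fmt"
	"log"
	"net/http"

	// import third-party packages
	"github.com/PuerkitoBio/goquery"
)

func main() {
	// Request the HTML page.
	res, err := http.Get("http://code2succeed.com")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", res.Status)
	}

	// Load the HTML document into goquery.
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Use the CSS selector found with the browser inspector.
	doc.Find("#main article .entry-title").Each(func(index int, item *goquery.Selection) {
		linkTag := item.Find("a")
		title := linkTag.Text()
		link, _ := linkTag.Attr("href") // empty string if there is no href
		fmt.Printf("Post #%d: %s - %s\n", index, title, link)
	})
}
```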
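The href value read by the scraper may be relative (for example /about/ rather than a full URL). A small standard-library helper can resolve it against the page URL before it is printed. This is a sketch, not part of the original tutorial; absoluteURL is a hypothetical name, and it relies on net/url's URL.ResolveReference.

```go
package main

import (
	"fmt"
	"net/url"
)

// absoluteURL resolves href (absolute or relative) against the URL of the
// page it was found on, using net/url's RFC 3986 reference resolution.
func absoluteURL(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	for _, href := range []string{"/about/", "post-1", "https://example.com/x"} {
		abs, err := absoluteURL("http://code2succeed.com/blog/", href)
		if err != nil {
			fmt.Println("bad URL:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

Inside the scraper's Each callback, one option is to pass the requested page URL and the extracted link through such a helper before calling fmt.Printf.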