Go Web Scraping Quick Start Guide: Implement the power of Go to scrape and crawl data from the web

Language: English
Format: Ebook
Category: Non-fiction

Learn how Go-specific language features simplify building web scrapers, along with common pitfalls and best practices for web scraping.

Key Features

• Use Go libraries like Goquery and Colly to scrape the web

• Learn common pitfalls and best practices for effective scraping and crawling

• Learn how to scrape using the Go concurrency model

Book Description

Web scraping is the process of extracting information from the web using tools that scrape and crawl. Go is emerging as the language of choice for scraping, supported by a variety of libraries. This book quickly explains how to scrape data from various websites using Go libraries such as Colly and Goquery.

The book starts with an introduction to the use cases for building a web scraper and the main features of the Go programming language, along with setting up a Go environment. It then moves on to HTTP requests and responses and how Go handles them. You will also learn about basic web scraping etiquette.

You will learn how to navigate a website using a breadth-first and then a depth-first search, and how to find and follow links. You will also learn how to track history in order to avoid loops and how to protect your web scraper using proxies.

Finally, the book covers the Go concurrency model and how to run scrapers in parallel, along with large-scale distributed web scraping.

What you will learn

• Implement Cache-Control to avoid unnecessary network calls

• Coordinate concurrent scrapers

• Design a custom, larger-scale scraping system

• Scrape basic HTML pages with Colly and JavaScript pages with chromedp

• Discover how to search using the "strings" and "regexp" packages

• Set up a Go development environment

• Retrieve information from an HTML document

• Protect your web scraper from being blocked by using proxies

• Control web browsers to scrape JavaScript sites

Who this book is for

Data scientists and web developers with a basic knowledge of Golang who want to collect web data and analyze it for effective reporting and visualization.

© 2019 Packt Publishing (Ebook): 9781789612943

Release date

Ebook: January 30, 2019
