
Proxies

In web scraping and automated data extraction, proxies help preserve anonymity, avoid detection, and bypass restrictions set by websites. A proxy server acts as an intermediary between the user's device and the target website, masking the user's actual IP address and routing requests through different IPs. This matters most when scraping large volumes of data, because websites often impose rate limits or block repeated requests from the same IP address. With a proxy pool (a collection of multiple proxies), users can distribute their web requests across various IPs, reducing the risk of being flagged or blocked. Proper proxy management, including rotation and failover, keeps scraping agents reliable and data extraction uninterrupted.

A proxy (or proxy server) is an intermediary between a user's device and the internet. Requests are routed through the proxy, so the target website sees the proxy's IP address instead of the user's actual IP address. The agent can use a single proxy or a list of proxies, and the proxies can be rotated across page loads according to the user's requirements.
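To make the idea of routing requests through an intermediary concrete, here is a minimal sketch in Python using only the standard library. It is not how Sequentum Cloud is implemented internally; the helper name `build_proxied_opener`, the proxy address, and the credentials are all hypothetical and exist only for illustration.

```python
import urllib.request


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address (203.0.113.0/24 is reserved for documentation).
opener = build_proxied_opener("http://user:secret@203.0.113.10:8080")

# A call such as opener.open("https://example.com") would now be routed
# through the proxy, so the target site sees the proxy's IP address.
```

The sketch only builds the opener and sends no request, so it can be run without a working proxy.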

Note: Sequentum Data Center is the free proxy pool provided to each organization. A second proxy pool, “Sequentum Residential”, is paid and billed per GB of usage. Prices depend on the subscription tier.

image-20241020-113913.png

Proxy type 

This setting specifies the proxy type used for scraping. Sequentum Cloud supports the 'Proxy Pool' type, which lets agents draw on a pool of proxies to improve anonymity and efficiency.

  • Proxy pool: Sequentum Cloud allows the configuration of proxy pools, which are collections of proxies purchased or rented from a provider. A proxy pool can combine multiple proxy provider pools into one large pool for efficient management. When this option is selected, two new textboxes, “Proxy Rotation Interval” and “Proxy Pool”, along with a “New Proxy Pool” button, appear on the user interface as shown below:

image-20241014-153541.png

Proxy rotation interval: This property specifies how often the proxy is rotated, measured in page loads. The agent switches to another proxy from the pool after the number of page loads specified by the user, which helps prevent website blocking. The most effective value depends on the target website and how aggressively it blocks repeated requests. The default is 0, which means the proxies are not rotated.

image-20241014-153433.png
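The rotation interval can be pictured as a page-load counter. The following Python sketch is an illustration under stated assumptions, not Sequentum Cloud's actual implementation: the class name `IntervalRotator` and its method are hypothetical, and the round-robin order is an assumption, since the order used by automatic rotation is not specified here.

```python
from itertools import cycle


class IntervalRotator:
    """Hand out a proxy per page load, switching every `interval` loads."""

    def __init__(self, proxies, interval=0):
        if not proxies:
            raise ValueError("proxy pool is empty")
        self._pool = cycle(proxies)      # assumed round-robin order
        self._current = next(self._pool)
        self._interval = interval
        self._loads = 0                  # page loads served by the current proxy

    def proxy_for_next_page(self):
        # interval == 0 mirrors the documented default: never rotate.
        if self._interval > 0 and self._loads >= self._interval:
            self._current = next(self._pool)
            self._loads = 0
        self._loads += 1
        return self._current


rotator = IntervalRotator(["A1", "A2", "A3"], interval=2)
print([rotator.proxy_for_next_page() for _ in range(6)])
# prints ['A1', 'A1', 'A2', 'A2', 'A3', 'A3']
```

With `interval=0`, every call returns the first proxy, which corresponds to the default of no rotation.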

Rotating Proxies in Design Time

The Rotate Proxy functionality in Sequentum Cloud is an advanced feature designed to enhance the performance and reliability of web scraping agents by preventing blocking issues and optimizing network requests. When scraping large amounts of data or accessing websites that implement rate-limiting and IP-based restrictions, using proxies becomes crucial to avoid detection and maintain smooth agent operation.

In Sequentum Cloud, proxies allow the agent to disguise its real IP address by routing web requests through different proxy servers. The Rotate Proxy feature allows users to dynamically switch between multiple proxy IPs during manual execution, offering flexibility and improved success rates when accessing sites that monitor and restrict repeated requests from the same IP address.

Here’s how the Rotate Proxy functionality works and its benefits:

  1. Using Proxies to Avoid Blocking:
    Websites often implement security measures to block or limit access from IP addresses that make too many requests in a short time, which can cause the scraping agent to fail. By using a pool of proxies, Sequentum Cloud agents can distribute requests across different IPs, reducing the chances of being flagged as suspicious. This ensures smooth and uninterrupted scraping, even when extracting data from sites with strict anti-scraping measures.

  2. Proxy Rotation:
    If a user has a proxy pool with multiple proxies (e.g., A1, A2, A3, A4, A5), the Rotate Proxy feature allows the agent to rotate between these proxies either manually or automatically. For example, if the current proxy is A1 and the user clicks the Rotate Proxy button, the agent will switch to a random proxy from the pool, such as A3 or A5. This rotation happens on each click, selecting a new random proxy from the pool, which helps in balancing the load and reducing the likelihood of detection.

  3. Manual Rotation for Flexibility:
    The Rotate Proxy button can be used at any time during manual browsing or agent execution. If the user suspects that the current proxy is being blocked or restricted, they can manually click the button to immediately change the proxy IP and continue browsing or scraping with a fresh identity. This helps the agent recover quickly from potential roadblocks or timeouts without having to restart the entire session.

  4. Randomized Proxy Selection:
    The rotation of proxies is randomized, meaning that each time the button is clicked, a new IP address from the pool is selected at random, preventing patterns or predictable behavior. This randomness helps make the scraping agent’s behavior less detectable by anti-scraping algorithms, further minimizing the risk of being blocked.

  5. Automatic Proxy Cycling:
    In addition to manual rotation, Sequentum Cloud allows users to configure automatic proxy cycling. This feature automatically rotates proxies after a specified number of requests or time intervals, ensuring that the agent never overuses a single proxy and evenly distributes requests across the entire proxy pool.

  6. Improved Success Rates for Data Extraction:
    By rotating proxies, Sequentum Cloud agents can bypass IP-based restrictions such as CAPTCHA challenges, rate-limiting, or outright bans. This leads to higher success rates for data extraction, especially when scraping large volumes of data from sites that actively monitor traffic patterns.

  7. Dynamic Proxy Management:
    Users can manage their proxy pool within the Sequentum Cloud interface, adding or removing proxies as needed. The Rotate Proxy feature works seamlessly with any number of proxies in the pool, whether it’s a small set of 5 proxies or a larger pool of dozens. This dynamic management gives users the flexibility to scale their proxy usage based on the demands of their project.

  8. Use Cases for Proxy Rotation:

    • Geo-restricted content: When scraping content that is restricted by geographical location, proxies can be used to route requests through different regions. Rotating proxies ensures that the agent can access content from multiple locations without being blocked.

    • High-frequency scraping: For projects that involve making frequent requests to the same website, proxy rotation helps avoid triggering rate-limits or IP bans by spreading requests across multiple IP addresses.

    • Avoiding bans on competitive sites: Some websites actively monitor IP addresses to prevent competitors from scraping their content. By rotating proxies, the agent can stay under the radar and gather data without triggering defenses.

  9. Proxy Failover Support:
    In cases where a particular proxy is down or unreachable, the Rotate Proxy feature can be used to switch to another functioning proxy without interrupting the scraping workflow. This failover mechanism ensures that the agent continues to operate smoothly even if one or more proxies in the pool become unavailable.

  10. Seamless Integration in the Status Bar:
    The Rotate Proxy button is conveniently integrated into the Sequentum Cloud status bar, making it easy for users to switch proxies with a single click. This user-friendly interface provides real-time control over the proxy rotation, offering a quick solution to any network-related issues during browsing or agent execution.
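The random rotation and failover behavior described in the list above can be sketched in Python as follows. This is a hedged illustration, not the product's code: `ProxyPool`, `rotate`, and `fetch_with_failover` are hypothetical names, the `fetch` callable stands in for whatever loads a page, and excluding the current proxy from the random pick is an assumption.

```python
import random


class ProxyPool:
    """A pool that can switch to a randomly chosen different proxy."""

    def __init__(self, proxies, rng=None):
        if not proxies:
            raise ValueError("proxy pool is empty")
        self._proxies = list(proxies)
        self._rng = rng or random.Random()
        self.current = self._rng.choice(self._proxies)

    def rotate(self, exclude=()):
        """Pick a random proxy other than the current one and those in `exclude`."""
        candidates = [
            p for p in self._proxies if p != self.current and p not in exclude
        ]
        if not candidates:
            raise RuntimeError("no alternative proxy available")
        self.current = self._rng.choice(candidates)
        return self.current


def fetch_with_failover(pool, fetch, max_attempts=3):
    """Call fetch(proxy); on a connection error, rotate and try again."""
    failed = set()
    for _ in range(max_attempts):
        try:
            return fetch(pool.current)
        except ConnectionError:
            # Remember the unreachable proxy so it is not picked again.
            failed.add(pool.current)
            pool.rotate(exclude=failed)
    raise RuntimeError("all attempts failed")
```

Calling `pool.rotate()` corresponds to one click of the Rotate Proxy button, while `fetch_with_failover` shows how the same operation could recover from a proxy that is down without restarting the session.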

Proxy pool: In this textbox, the user can either enter the proxy pool name manually or select a proxy pool from the dropdown menu. The dropdown lists the pools created by the user in the Sequentum Cloud Control Center, accessible via Organizations > Proxy Pools, as shown below:

image-20241010-090436.png

Sequentum Cloud Control Center Proxy Pools

                   

image-20241010-101203.png

Sequentum Cloud Editor Proxy Pools 

New Proxy Pool: Use this button to create a new proxy pool. Clicking it opens a new window with the options listed below:

image-20241010-101345.png
  • Back to proxies: This option navigates the user back to the previous page, which is the Proxy Settings.

image-20241010-100859.png

Before clicking on Back to proxies

image-20241014-153200.png

After clicking on Back to proxies

  • Proxy pool name: The user must enter a desired name for the Proxy Pool, which will be displayed in the UI within the Proxy Pool List dropdown menu and will be used by the user at runtime.

image-20241010-100712.png

  • Description: The user must enter a description, such as the country of the proxies in the pool or the agent for which the pool was created.

image-20241010-100547.png
  • Proxies: The user must enter a list of valid proxies, including the associated usernames and passwords.

image-20241010-100414.png
  • Add Proxy Pool: This button creates the proxy pool, which then appears in the UI under the Proxy Pool List dropdown menu.

image-20241010-100212.png

Before clicking on Add Proxy Pool 

image-20241010-095944.png

After clicking on Add Proxy Pool 

  • Cancel: This option cancels the process of creating a new proxy pool and returns the user to the previous page, i.e., Proxy Settings.

image-20241010-095754.png
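Before pasting entries into the Proxies field described above, it can help to check them locally. The Python sketch below assumes one proxy per line in the form `scheme://username:password@host:port`. That format is an assumption made for illustration and is not confirmed here, so consult the Sequentum Cloud interface for the format it actually expects. The function name `parse_proxy_list` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_proxy_list(text):
    """Parse one proxy per line; raise ValueError on an incomplete entry."""
    proxies = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = urlsplit(line)
        if not (parts.scheme and parts.hostname and parts.port):
            raise ValueError(
                f"line {line_no}: expected scheme://user:pass@host:port"
            )
        if not (parts.username and parts.password):
            raise ValueError(f"line {line_no}: missing username or password")
        proxies.append(
            {
                "host": parts.hostname,
                "port": parts.port,
                "username": parts.username,
                "password": parts.password,
            }
        )
    return proxies


# Hypothetical entries (203.0.113.0/24 is reserved for documentation).
sample = """
http://alice:s3cret@203.0.113.10:8080
http://bob:hunter2@203.0.113.11:3128
"""
print(len(parse_proxy_list(sample)))  # prints 2
```

An entry without credentials, such as `http://203.0.113.10:8080`, raises a `ValueError` that names the offending line.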

                           
