Good news everyone! Page Replica is Available as a Web App!

If you want to avoid the hassle of setting up your own pre-rendering tool, check out Page Replica. Manage and re-render your pages effortlessly!

Key Features

Page Replica is free to use for up to 5,000 requests per month.
Unlimited sites
API access

Need Assistance?

If you have any questions or need support, we're here to help! Join our GitHub Discussion to get in touch with us.

Page Replica free tool

"Page Replica" is a web scraping and caching tool built with Node.js, Express, and Puppeteer. It prerenders the pages of web apps (React, Angular, Vue, ...) so that they can be served via Nginx for SEO or other purposes.

The tool lets you scrape individual web pages or entire sitemaps through an API, optionally removing JavaScript and caching the resulting HTML.

It also ships with a sample Nginx configuration that handles regular user traffic and search engine bot traffic separately.

Installation

Clone the Repository:

```
git clone https://github.com/html5-ninja/page-replica.git
cd page-replica
```

Install Dependencies:
```
npm install
```
Settings:

index.js

```
const CONFIG = {
  baseUrl: "https://example.com",
  removeJS: true,
  addBaseURL: true,
  cacheFolder: "path_to_cache_folder",
};
```
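The README does not spell out what `removeJS` and `addBaseURL` do to the scraped HTML. The sketch below shows one plausible reading of the option names: stripping `<script>` elements and inserting a `<base>` tag that points at `baseUrl`. The helpers `stripScripts` and `addBaseTag` are hypothetical and are not taken from index.js, whose actual behavior may differ.

```javascript
// Hypothetical helpers illustrating what removeJS / addBaseURL might do.
// They are assumptions for illustration, not code from index.js.

// Remove every <script>...</script> element (inline or external) from an HTML string.
function stripScripts(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "");
}

// Insert <base href="..."> right after the first opening <head> tag, if present,
// so that relative links in the cached copy resolve against the original site.
function addBaseTag(html, baseUrl) {
  return html.replace(/<head(\s[^>]*)?>/i, (match) => `${match}<base href="${baseUrl}">`);
}

const rendered =
  '<html><head><title>t</title><script src="/app.js"></script></head>' +
  '<body><div id="root">Hello</div><script>boot()</script></body></html>';

const cached = addBaseTag(stripScripts(rendered), "https://example.com");
console.log(cached);
```

A regular expression is enough for a sketch, but a real implementation would more likely edit the DOM inside Puppeteer before serializing the page.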

api.js: set the port for your API

Start the API:
```
npm start
```

Usage

When you scrape a page or a sitemap, a copy of each prerendered page is stored in the cache folder.

Scraping Individual Pages

To scrape a single page, make a GET request to /page with the url query parameter:

```
curl "http://localhost:8080/page?url=https://example.com"
```
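The same request can be made from a script. The sketch below assumes the API is listening on `http://localhost:8080` as in the curl example and that the global `fetch` of Node.js 18 or later is available. The helper names `buildScrapeUrl` and `scrapePage` are hypothetical. Going through `URL.searchParams` encodes the target URL, which matters once it carries its own query string.

```javascript
// Build the request URL for a Page Replica endpoint ("/page" or "/sitemap").
// searchParams.set() percent-encodes the target, so "?" and "&" inside it
// are not mistaken for parameters of the API request itself.
function buildScrapeUrl(apiBase, endpoint, targetUrl) {
  const requestUrl = new URL(endpoint, apiBase);
  requestUrl.searchParams.set("url", targetUrl);
  return requestUrl.toString();
}

// Ask the API to scrape one page. The response body format is not documented
// in the README, so this sketch only checks and returns the HTTP status.
async function scrapePage(apiBase, targetUrl) {
  const response = await fetch(buildScrapeUrl(apiBase, "/page", targetUrl));
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  return response.status;
}

const exampleRequest = buildScrapeUrl(
  "http://localhost:8080",
  "/page",
  "https://example.com/products?id=1&lang=en"
);
console.log(exampleRequest);

// With the API running:
// scrapePage("http://localhost:8080", "https://example.com").then(console.log);
```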

Scraping Sitemaps

To scrape pages from a sitemap, make a GET request to /sitemap with the url query parameter:

```
curl "http://localhost:8080/sitemap?url=https://example.com/sitemap.xml"
```
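A sitemap is essentially a list of `<loc>` entries, and each entry corresponds to one page to prerender. How Page Replica reads the sitemap internally is not described here, so the sketch below only illustrates the general idea with a hypothetical helper, `extractSitemapUrls`, that pulls the page URLs out of a sitemap document.

```javascript
// Hypothetical helper: collect the URLs listed in <loc> elements of a sitemap.
// This is an illustration of the idea, not the parsing code used by Page Replica.
function extractSitemapUrls(xml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  for (const match of xml.matchAll(locPattern)) {
    // Sitemaps are XML, so "&" in a URL is written as "&amp;".
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  return urls;
}

const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
  <url><loc>https://example.com/search?q=a&amp;page=2</loc></url>
</urlset>`;

const pageUrls = extractSitemapUrls(sampleSitemap);
console.log(pageUrls);
```

Each extracted URL could then be passed to the /page endpoint one at a time, which is useful when only part of a sitemap needs re-rendering.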

Serve the Cached Pages to Bots with Nginx (My Recipe)

In this case, the cached pages are served using Nginx. You can adapt this configuration to your needs and your server.

The sample Nginx configuration in nginx_config_sample/example.com.conf splits traffic in two: regular users are routed to the main application server, while search engine bots are sent to a dedicated server block that delivers the cached HTML.

Please review the nginx_config_sample/example.com.conf file to gain an understanding of its functionality.
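The routing decision itself is simple to state in code. The sketch below assumes that bots are recognized by their User-Agent header, which is the usual approach but is not confirmed by this README. The pattern list and the `pickUpstream` helper are hypothetical, and the sample configuration file remains the reference for what Nginx actually does.

```javascript
// Hypothetical, deliberately short list of crawler User-Agent fragments.
// The real Nginx sample may match a different set.
const BOT_PATTERN = /googlebot|bingbot|duckduckbot|yandex|baiduspider/i;

// Decide which backend should answer a request:
// "cache" for the prerendered HTML, "app" for the live web app.
function pickUpstream(userAgent) {
  return BOT_PATTERN.test(userAgent || "") ? "cache" : "app";
}

const botAgent =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const browserAgent =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36";

console.log(pickUpstream(botAgent));
console.log(pickUpstream(browserAgent));
```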

Contribution

We welcome contributions! If you have ideas for new features or server/cloud configurations that could enhance this tool, feel free to:

Open an issue to discuss your ideas.
Fork the repository and make your changes.
Submit a pull request with a clear description of your changes.

Feature Requests and Suggestions

If you have any feature requests or suggestions for server/cloud configurations beyond Nginx, please open an issue to start a discussion.

Folder Structure

nginx_config_sample: Presents a sample Nginx configuration for redirecting bot traffic to the cached content server.
api.js: An Express application responsible for handling web scraping requests.
index.js: The core web scraping logic employing Puppeteer.
package.json: Node.js project configuration.
