Apify Platform
Apify is a platform built to serve large-scale and high-performance web scraping and automation needs. It provides easy access to compute instances (Actors), convenient request and result storages, proxies, scheduling, webhooks and more, accessible through a web interface or an API.
While we think that the Apify platform is super cool, and it's definitely worth signing up for a free account, Crawlee is and will always be open source, runnable locally or on any cloud infrastructure.
We do not test Crawlee in other cloud environments such as Lambda or on specific architectures such as Raspberry Pi. We strive to make it work, but there are no guarantees.
Logging into the Apify platform from Crawlee
To access your Apify account from Crawlee, you must provide credentials - your API token. You can do that either by using the Apify CLI or with environment variables.
Once you provide credentials to your scraper, you will be able to use all the Apify platform features, such as calling actors, saving to cloud storages, using Apify proxies, setting up webhooks and so on.
Log in with CLI
Apify CLI allows you to log in to your Apify account on your computer. If you then run your scraper using the CLI, your credentials will automatically be added.
npm install -g apify-cli
apify login -t YOUR_API_TOKEN
Log in with environment variables
Alternatively, you can always provide credentials to your scraper by setting the APIFY_TOKEN environment variable to your API token.
There's also the APIFY_PROXY_PASSWORD environment variable. Actor automatically infers it from your token, but setting it explicitly can be useful when you need to access proxies from a different account than the one your token represents.
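As a minimal sketch of how a script might check these variables before starting, the helper below reads APIFY_TOKEN and the optional APIFY_PROXY_PASSWORD from the environment. The function name resolveApifyCredentials and the error message are hypothetical and are not part of Crawlee or the Apify SDK; the SDK reads these variables on its own, so this is only useful if you want to fail early with a clear message.

```javascript
// Hypothetical helper (not part of Crawlee or the Apify SDK): reads the two
// environment variables described above and fails early if the token is missing.
function resolveApifyCredentials(env = process.env) {
    const token = env.APIFY_TOKEN;
    if (!token) {
        throw new Error('APIFY_TOKEN is not set; export it before running the scraper.');
    }
    // The proxy password is optional. When it is absent it stays undefined here,
    // because it is normally inferred from the token.
    const proxyPassword = env.APIFY_PROXY_PASSWORD || undefined;
    return { token, proxyPassword };
}
```

Calling resolveApifyCredentials() with no arguments uses process.env; passing a plain object instead makes the helper easy to exercise without touching the real environment.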
Log in with Configuration
Another option is to use the Configuration instance and set your API token there.
import { Actor } from 'apify';
const sdk = new Actor({ token: 'your_api_token' });
What is an actor
When you deploy your script to the Apify platform, it becomes an actor. An actor is a serverless microservice that accepts an input and produces an output. It can run for a few seconds, hours or even infinitely. An actor can perform anything from a simple action such as filling out a web form or sending an email, to complex operations such as crawling an entire website and removing duplicates from a large dataset.
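To make the "accepts an input and produces an output" shape concrete, here is a plain JavaScript sketch of the deduplication example mentioned above. It deliberately does not use the Apify SDK: the function name dedupeByUrl and the input shape (an items array of objects with a url field) are assumptions made for illustration. On the platform, the input and output would go through the SDK's storages rather than function arguments.

```javascript
// Hypothetical stand-in for an actor's core logic: one input object in,
// one array of output items out. Not an Apify SDK API.
function dedupeByUrl(input) {
    const seen = new Set();
    const output = [];
    for (const item of input.items) {
        // Keep only the first item seen for each URL.
        if (seen.has(item.url)) continue;
        seen.add(item.url);
        output.push(item);
    }
    return output;
}
```

Keeping the logic in a pure function like this means it can be run and tested locally before it is wrapped in an actor.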
Actors can be shared in the Apify Store so that other people can use them. But don't worry, if you share your actor in the store and somebody uses it, it runs under their account, not yours.
Running an actor locally
First, let's create a boilerplate for the new actor. You can use the Apify CLI and run:
apify create my-hello-world
The CLI will prompt you to select a project boilerplate template - let's pick "Hello world". The tool will create a directory called my-hello-world with Node.js project files. You can run the actor as follows:
cd my-hello-world
apify run