We have the foundation of a scraping tool and want to keep improving it.
Part of it is a Chrome extension for scraping. You will have access to the code already written.
The first step is to enrich this extension with features that we will define together with the person we select.
In a second phase, or in parallel, set up a distributed Scrapy architecture running in serverless mode on AWS, with the following features:
- Proxy or IP rotation
- Random browser selection
- Choice of data storage backend: MySQL, PostgreSQL, Oracle, Elasticsearch, S3, etc.
- Task scheduling
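To make the first two features concrete, here is a minimal sketch of how proxy rotation and random browser selection could look as a class following Scrapy's downloader middleware interface (`process_request`). It assumes "browser random" means a random User-Agent header per request, which is only one possible reading. The class name, proxy URLs, and user agent strings are hypothetical placeholders, not part of our existing code.

```python
import itertools
import random

# Hypothetical values for illustration only; real proxies and user agents
# would come from configuration.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) ExampleBrowser/2.0",
]


class RotatingIdentityMiddleware:
    """Assign one proxy (round-robin) and one random user agent per request."""

    def __init__(self, proxies=PROXIES, user_agents=USER_AGENTS, rng=None):
        self._proxies = itertools.cycle(proxies)  # round-robin rotation
        self._user_agents = list(user_agents)
        self._rng = rng or random.Random()

    def process_request(self, request, spider=None):
        # Scrapy's built-in proxy handling reads the proxy from request.meta["proxy"].
        request.meta["proxy"] = next(self._proxies)
        request.headers["User-Agent"] = self._rng.choice(self._user_agents)
        return None  # None tells Scrapy to keep processing the request
```

In a Scrapy project, a class like this would typically be enabled through the `DOWNLOADER_MIDDLEWARES` setting. A managed proxy service could replace the static list.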
The Chrome extension and Scrapy must communicate through an API, which still has to be set up.
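As a starting point for discussion, the sketch below shows one possible shape for that API: a handler in the style of an AWS Lambda function behind API Gateway (proxy integration), which receives a job request from the extension, validates it, and returns a job id. The field names (`url`, `storage`, `schedule`), the default storage, and the response layout are assumptions for illustration. The actual contract is to be agreed on.

```python
import json
import uuid

# Storage backends listed in this brief; the lowercase keys are an assumed convention.
ALLOWED_STORAGE = {"mysql", "postgresql", "oracle", "elasticsearch", "s3"}


def _response(status, body):
    """Build a response in the API Gateway proxy-integration format."""
    return {"statusCode": status, "body": json.dumps(body)}


def handler(event, context=None):
    """Validate a scraping job request and acknowledge it with a job id."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})
    if not isinstance(payload, dict):
        return _response(400, {"error": "body must be a JSON object"})

    url = payload.get("url")
    storage = payload.get("storage", "s3")  # assumed default
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        return _response(400, {"error": "url must be an http(s) URL"})
    if not isinstance(storage, str) or storage not in ALLOWED_STORAGE:
        return _response(400, {"error": "unsupported storage backend"})

    job = {
        "job_id": str(uuid.uuid4()),
        "url": url,
        "storage": storage,
        "schedule": payload.get("schedule"),  # optional, format to be defined
    }
    # A real deployment would enqueue the job for the scraper here (omitted).
    return _response(202, job)
```

The extension would send a POST request with a JSON body such as `{"url": "https://example.com", "storage": "postgresql"}` and keep the returned `job_id` to follow the job later.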
You may also propose any solution equivalent to Scrapy.
In a third phase, build a mobile application (Android and iPhone) version of the Chrome extension, so that data collection tasks can also be launched from a mobile device.
In a fourth phase, implement a job tracking solution with error and success notifications, along with a dashboard that lets us track usage time and cost per user.
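To illustrate what the dashboard could compute, here is a minimal sketch that aggregates job records into usage time, failure count, and cost per user. The record fields and the flat per-second rate are assumptions. Real costs would depend on the AWS services actually used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """One finished scraping job (hypothetical fields)."""
    user: str
    duration_s: float
    succeeded: bool


def summarize(records, cost_per_second):
    """Return per-user job count, failures, total seconds, and estimated cost."""
    totals = defaultdict(lambda: {"jobs": 0, "failed": 0, "seconds": 0.0})
    for rec in records:
        row = totals[rec.user]
        row["jobs"] += 1
        row["failed"] += 0 if rec.succeeded else 1
        row["seconds"] += rec.duration_s
    return {
        user: {**row, "cost": row["seconds"] * cost_per_second}
        for user, row in totals.items()
    }
```

The `failed` counter is the kind of value that could also trigger the error notifications mentioned above.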
This solution is for internal use: we need data to improve our AI product.
We are a young company working to improve the business processes of salespeople. We want to make life easier for companies by offering services that are simple and fast to use.
At the moment we are 4 associates and 2 employees.
You do not have to take on all of our requests. You can handle just one part.
The idea is to coordinate with the various stakeholders to complete this project.
I remain available for any further questions.