Amazon Web Services
We use AWS for cloud computing; it gives us the flexibility we need on various fronts, combined with scalability. All of our production services run on EC2 instances. We're excited about how Amazon is helping scale innovation in general by increasingly building up from the bottom of the stack. Some of their newer offerings have not panned out for us (we had issues with their Elastic Load Balancers a while back, and their search service lacked some flexibility and had a latency profile that didn't work well for us), but we've been convinced for a while that AWS is the most solid platform to be on. We currently use S3, ELB, CloudFormation, and Elastic File System (EFS), among other services.
This is for cluster configuration, machine, and role management, keeping our production system sane as it scales. It’s an area where we’ve iterated: at one point we had home-grown Go scripts for deployment, and then we migrated to Docker. We’re excited that Google has finally brought some of its special sauce in this arena to the market.
RethinkDB
We currently use RethinkDB because it's an open-source, scalable JSON database built from the ground up for the realtime web. It inverts the traditional database architecture by exposing a push-based access model: instead of polling for changes, we can tell RethinkDB to continuously push updated query results to applications in realtime. This push architecture greatly reduces the time and effort needed to build scalable realtime apps. It also works well with Bleve, our current search engine.
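RethinkDB exposes this push model through changefeeds. The sketch below is not the RethinkDB driver API; it is a minimal in-memory Python illustration with hypothetical names (`LiveTable`, `changes`, `upsert`). A subscriber registers a query once, and every write re-evaluates that query and pushes the new result only when it differs from the last one, so the application never polls.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Query = Callable[[Row], bool]
Callback = Callable[[List[Row]], None]


class LiveTable:
    """Hypothetical in-memory table that pushes query results to subscribers on write."""

    def __init__(self) -> None:
        self._rows: Dict[Any, Row] = {}
        # Each feed is [query, callback, last_result_pushed].
        self._feeds: List[list] = []

    def _results(self, query: Query) -> List[Row]:
        # Re-run the query against the current rows, copying to avoid aliasing.
        return [dict(row) for row in self._rows.values() if query(row)]

    def changes(self, query: Query, callback: Callback) -> None:
        # Register a feed and push the initial result set right away.
        result = self._results(query)
        self._feeds.append([query, callback, result])
        callback(result)

    def upsert(self, row: Row) -> None:
        # Insert or replace by primary key "id".
        self._rows[row["id"]] = dict(row)
        # Push to each subscriber only if its query result actually changed.
        for feed in self._feeds:
            query, callback, last = feed
            result = self._results(query)
            if result != last:
                feed[2] = result
                callback(result)
```

For example, `table.changes(lambda r: r["status"] == "approved", handler)` would call `handler` once with the current approved rows and again only when that set changes.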
Redis (Amazon ElastiCache)
A massive distributed cache, predominantly of individual object properties and query results, so we can serve most of our data without hitting the DB.
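Serving reads without hitting the DB is commonly done with the cache-aside pattern. The text above does not say which pattern we use, so treat this as an assumption: a minimal Python sketch in which a dict with per-key expiry stands in for Redis and `load_from_db` is a hypothetical loader function. A real deployment would use a Redis client and let Redis handle expiry.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads: try the cache first, fall back to the DB on a miss."""

    def __init__(
        self,
        load_from_db: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db  # hypothetical DB loader
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, value); a plain dict standing in for Redis.
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.db_hits = 0  # counts how often we had to go to the DB

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh cache hit: no DB access
        value = self._load(key)  # miss or expired entry: load and repopulate
        self.db_hits += 1
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Drop the entry after a write so the next read reloads fresh data.
        self._store.pop(key, None)
```

Invalidating on write (rather than updating the cached value in place) keeps the sketch simple at the cost of one extra DB read after each change.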
Kafka
At the heart of our system must be a powerful pub/sub solution, and Kafka is the latest technology we’ve adopted here. It must handle huge numbers of subscriptions and notifications so that when object properties change, we can notify all interested parties quickly. We process MySQL binlogs into a Kinesis stream, which we then read from, and we're in the process of moving from Kinesis to Kafka. In the meantime, our NetX stack uses a very basic home-grown pub/sub written in Go.
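Our home-grown pub/sub is written in Go and is not shown here. As an illustration of the same basic idea only, here is a minimal in-process Python sketch: interested parties subscribe to a topic (for example, a hypothetical per-object key such as `asset:42`), and publishing a change fans it out to every current subscriber. It has no persistence, partitioning, or delivery guarantees, which are the kinds of features a system like Kafka provides.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[str, Any], None]


class PubSub:
    """Minimal in-process publish/subscribe broker keyed by topic."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> Callable[[], None]:
        self._subs[topic].append(handler)

        def unsubscribe() -> None:
            # Remove one registration of this handler; a no-op if none remain.
            if handler in self._subs[topic]:
                self._subs[topic].remove(handler)

        return unsubscribe

    def publish(self, topic: str, message: Any) -> int:
        # Iterate over a snapshot so handlers can unsubscribe during delivery.
        handlers = list(self._subs.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)  # number of subscribers notified
```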
Bleve
We switched from Solr and now power our search with Bleve. So far it's been flexible and generally much faster.
We record lots of raw data and metrics about NetX activity to help us understand how the product is working. We funnel all kinds of logs here, from mission-critical change data that affects product functionality to event logs that feed into reports on our key company metrics. In the past we’ve used Scribe and Flume.
Datadog / InfluxDB
For collecting, graphing, and alerting on operational stats about production systems. We’ve tried Grafana in the past and are moving towards Datadog.
Useful Technical Information
Each module/service connects to the main repository via API calls. The main repository exposes its calls to the frontend app through an HTTP API.
Each repository module can use its own technology and communicates with other services via repository calls.
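The hub-style communication described above, where modules talk to each other only through the repository, can be sketched as follows. This is a hypothetical in-process Python illustration (`Repository`, `register`, and `call` are invented names, not our actual API): services register with the repository, and JSON strings in and out stand in for the HTTP boundary, which is what lets each module use its own technology.

```python
import json
from typing import Any, Callable, Dict

# A service handler receives a decoded JSON payload and returns a JSON-serializable result.
ServiceHandler = Callable[[Dict[str, Any]], Any]


class Repository:
    """Hypothetical hub: services register here and reach each other only through it."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceHandler] = {}

    def register(self, name: str, handler: ServiceHandler) -> None:
        self._services[name] = handler

    def call(self, service: str, request_body: str) -> str:
        # JSON strings in and out mimic an HTTP API boundary, so a service
        # could be implemented in any language behind a real endpoint.
        handler = self._services.get(service)
        if handler is None:
            return json.dumps({"error": f"unknown service {service!r}"})
        result = handler(json.loads(request_body))
        return json.dumps({"result": result})
```

For example, a hypothetical metadata service would reach a thumbnail service with `repo.call("thumbnail", json.dumps({"asset_id": 7}))` rather than importing it directly, so either side can be rewritten or moved into its own container without the other noticing.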
The repository and services are currently wrapped in a single Docker container. We are now splitting them into at least three containers: Repository, File service, and Services.