How to set up Amazon Cloudfront (learning by experimentation)
I have some experience with Akamai's WAA (Web Application Accelerator) service, which I've been using in my professional capacity for a few years now, and I've been curious about how Cloudfront compares with it. Until a few weeks ago, Cloudfront lacked a key feature which I think is critical for it to win over traditional CDN customers. "Custom origin" is an amazing new feature which I finally got to test last night, and here are my notes for those who are curious as well.
The test application I tried to convert was my news aggregator portal http://www.scalebig.com/. The application consists of a front page that changes a few times a day, a collection of old pages archived in a subdirectory, and other page elements such as headers, footers, images, and style sheets.
- While Amazon Cloudfront does have a presence on the AWS management console, the console only supports S3 buckets as origins.
- Since my application didn't have any components that require server-side processing, I tried to put the whole website in an S3 bucket and use S3 as the origin.
- When I initially set it up, I ended up with multiple URLs which I had to understand:
  - S3 URL: This is the unique URL of your S3 bucket. All requests to this URL go to Amazon's S3 server cluster, and if your objects are marked as public, anyone can get them. The object could be a movie, an image, or even an HTML file.
  - Cloudfront URL: This is the unique Cloudfront URL which maps to your S3 resource through the Cloudfront network. For all practical purposes it's the same as the first one, except that requests go through the CDN service.
  - Your own domain name: This is the actual URL which end users will see, and it is a CNAME to the Cloudfront URL.
- So in my case, I configured the DNS entry for www.scalebig.com to point to the DNS entry the Cloudfront service created for me (dbnqedizktbfa.cloudfront.net).
- The first thing that broke: I had forgotten that this is just an S3 bucket, so it can't handle things like "server-parsed HTML" to dynamically append headers/footers. I also realized that I couldn't control cache policies, set up expiry, etc. But the worst problem was that going to "http://www.scalebig.com/" threw an error. It was expecting a file name, so http://www.scalebig.com/index.html would have worked.
- In short, I realized that my idea of using S3 as a web server was full of holes.
- When I started digging for options to enable "custom origin", I realized that those options do not exist on the AWS management console! I was directed to third-party applications instead (most of them commercial products, except two).
- I finally created the Cloudfront configuration using Cloudberry S3 Explorer PRO, which allowed me to point Cloudfront to a custom domain name (instead of an S3 resource).
- In my case my server was running on EC2 with a public reserved IP. I'm not yet using AWS ELB (Elastic Load Balancing).
- Once I got that working, which literally worked out of the box, the next challenge was to get the cache controls and expiries working. If they are set incorrectly, users may not get the latest content. I set up the policies using ".htaccess". Below I've attached part of the .htaccess I have for the /index.html page, which is updated many times a day. There is a similar .htaccess file for the rest of the website which sets a much longer expiry.
- Finally, I realized that I might have to invalidate parts of the cache at times (for example because of a bug). Cloudberry and the AWS management console didn't have any option available, but apparently "boto" has some APIs which can work with the Amazon Cloudfront APIs to do this.
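The invalidation step from the last bullet could look roughly like this in Python. It is only a minimal sketch, and it assumes boto3 (the current successor to the boto library named above) rather than the original boto. The helper names `build_invalidation_batch` and `invalidate` are made up for illustration, and the distribution ID and paths you pass in are your own.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for a Cloudfront invalidation."""
    # Cloudfront expects absolute paths, so prefix a slash where it is missing.
    items = [p if p.startswith("/") else "/" + p for p in paths]
    if not items:
        raise ValueError("at least one path is required")
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique per request; a timestamp is one simple choice.
        "CallerReference": caller_reference or str(time.time()),
    }


def invalidate(distribution_id, paths):
    """Send the invalidation request (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


if __name__ == "__main__":
    # Prints the request body only; nothing is sent to AWS.
    print(build_invalidation_batch(["index.html", "/archive/old.html"], "demo-1"))
```

Keeping the batch construction separate from the network call makes it easy to inspect exactly which paths would be invalidated before anything is sent.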
The .htaccess fragment for /index.html:
# turn on the module for this directory
ExpiresActive On
# set default
ExpiresDefault "access plus 1 hours"
ExpiresByType image/jpg "access plus 1 hours"
ExpiresByType image/gif "access plus 1 hours"
ExpiresByType image/jpeg "access plus 1 hours"
ExpiresByType image/png "access plus 1 hours"
ExpiresByType text/css "access plus 1 hours"
ExpiresByType application/x-shockwave-flash "access plus 1 hours"
Header set Cache-Control "max-age=3600"
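To confirm that the headers set by the .htaccess above actually reach clients through Cloudfront, a quick check along these lines might help. This is a sketch under my own assumptions, using only the Python standard library; `max_age_seconds` and `fetch_cache_headers` are hypothetical helper names, not part of any AWS tooling.

```python
import re
import urllib.request


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control or "", re.IGNORECASE)
    return int(match.group(1)) if match else None


def fetch_cache_headers(url):
    """Issue a HEAD request and return the caching-related response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return {
            "Cache-Control": response.headers.get("Cache-Control"),
            "Expires": response.headers.get("Expires"),
        }
```

For example, calling `fetch_cache_headers("http://www.scalebig.com/index.html")` and passing the returned `Cache-Control` value to `max_age_seconds` should give 3600 if the rules above are in effect.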
Here is how I would summarize the current state of Amazon Cloudfront.
- It's definitely ready for static websites which don't have any server-side code.
- Cloudfront only accepts GET and HEAD requests
- Cloudfront ignores cookies, so the server can't set any. (Browser-based cookie management will still work, which could be used to keep in-browser session data.)
- While Cloudfront can write access logs to an S3 bucket of your choice, I'd recommend using something like Google Analytics for log analysis.
- I'd recommend buying one of the commercial third-party products if you want to use Custom Origin, and reading more about the protocols/APIs before you fully trust a production service to Cloudfront.
- I wish Cloudfront would start supporting something like ESI, which could effectively make an S3 bucket a full-fledged web server without the need to keep an EC2 instance running all the time.
- Overall, Cloudfront has a very long way to go in the number of features before it can be treated as a competitor to Akamai's current range of services.
- And if you look at Akamai's current worldwide presence, Cloudfront is just a tiny blip. [ Cloudfront edge locations ]
- But I suspect that Cloudfront's continuous evolution is being watched by many, and the next set of features could change the balance.
I'm planning to leave http://www.scalebig.com/ on Cloudfront for some time to learn a little more about its operational issues. If you have been using Cloudfront, please feel free to leave comments about which important features you think are still missing.