The complete announcement is here, but these are the changes for the Java SDK. The two big changes I liked are that there is now an "Always On" feature, and that the "Tasks" feature has graduated out of beta/testing.
- The Always On feature allows applications to pay to keep three instances of their application always running, which can significantly reduce application latency.
- Developers can now enable Warmup Requests. By specifying a handler in an app's appengine-web.xml, App Engine will attempt to send a Warmup Request to initialize new instances before a user interacts with them. This can reduce the latency an end-user sees when your application spins up a new instance.
- The Channel API is now available for all users.
- Task Queue has been officially released, and is no longer an experimental feature. The API import paths that use ‘labs’ have been deprecated. Task queue storage will count towards an application’s overall storage quota, and will thus be charged for.
- The deadline for Task Queue and Cron requests has been raised to 10 minutes. Datastore and API deadlines within those requests remain unchanged.
- For the Task Queue, developers can specify task retry parameters in their queue.xml.
- Metadata Queries on the datastore for datastore kinds, namespaces, and entity properties are available.
- URL Fetch allowed response size has been increased, up to 32 MB. Request
size is still limited to 1 MB.
- The Admin Console Blacklist page lists the top visitors rejected by the blacklist.
- The automatic image thumbnailing service supports arbitrary crop sizes up to 1600px.
- Overall average instance latency in the Admin Console is now an average weighted by QPS per instance.
- Added a low-level AsyncDatastoreService for making calls to the datastore asynchronously.
- Added a getBodyAsBytes() method to QueueStateInfo.TaskStateInfo, which returns the body of the task state as a pure byte string.
- The whitelist has been updated to include all classes from javax.xml.soap.
- Fixed an issue sending email to multiple recipients. http://code.google.com/p/googleappengine/issues/detail?id=1623
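For the Warmup Requests change above, opting in is a small addition to appengine-web.xml. This is a sketch based on the release notes; the application ID is a placeholder, and you should verify the element name against your SDK version's documentation:

```xml
<!-- appengine-web.xml: opt in to warmup requests (element name as I
     understand it from the release notes; "myapp" is a placeholder) -->
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <application>myapp</application>
  <version>1</version>
  <warmup-requests-enabled>true</warmup-requests-enabled>
</appengine-web-app>
```

With this enabled, App Engine sends a request to a well-known warmup path on each new instance before routing user traffic to it, so a servlet mapped there can preload caches and JITs ahead of the first real request.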
Any blog that promotes the concept of cloud infrastructure would be doing it an injustice if it didn't also reference implementations where it failed horribly. Here is an excellent post by Carlos Ble where he lists out all the problems he faced on Google App Engine (Python). He lists 13 different limitations, most of which are very well known facts, and then lists some more frustrating reasons why he had to dump the solution and look for an alternative.
The tone of voice is understandable, and while it might look like App Engine bashing, I see it as a great story which others can learn from.
For us, GAE has been a failure like Wave or Buzz were but this time, we have paid it with our money. I’ve been too stubborn just because this great company was behind the platform but I’ve learned an important lesson: good companies make mistakes too. I didn’t do enough spikes before developing actual features. I should have performed more proofs of concept before investing so much money. I was blind.
Cloud is not for everyone or for all problems. While some of these technologies take away your growing pains, they assume you are OK with some of their limitations. If you were surprised by these limitations only after you were neck deep in code, then you didn't do your homework.
Here are the issues he pointed out. I haven't used Google App Engine lately, but my understanding is that the App Engine team has solved, or is on the path to solving (or at least reducing the pain of), some of these issues.
- Requires Python 2.5
- Can't use HTTPS
- 30 seconds to run
- URL fetch gets only 5 seconds
- Can't use Python libraries compiled in C
- No "LIKE" operator in the datastore
- Can't join tables
- "Too many indexes"
- Only 1000 records at a time returned
- Datastore and memcache can fail at times
- Max memcache size is 1MB
Yesterday Google formally announced Google Storage to a few (5000?) of us at Google I/O. Here is the gist of this as I see it from the various discussions/talks I attended.
To begin with, I have to point out that there is almost nothing new in what Google has proposed to provide. Amazon has been doing this for years with S3. The key difference is that if you are a Google customer you won't have to look elsewhere for storage services like this one.
Let's get the technical details out of the way:
- It tries to implement a strong consistency model (the C and A of CAP: consistent and available), which means the data you store is automatically replicated in a consistent way across multiple datacenters.
- Currently it replicates to multiple locations within the US. In the future they do plan to replicate across continents.
- Currently there are no controls over how or where replication happens. They plan to learn from usage during the beta period and develop controls over time.
- There are two basic building blocks for objects
- Buckets – containers
All objects are stored in a flat container. However, the tools understand "/" and "*" (wildcards) and do the right thing when used correctly
- Objects – the objects/files inside those containers
- Implements RESTful APIs (GET/PUT/POST/DELETE/HEAD/etc)
- All resources are identified by a URI
- No theoretical size limit on buckets or objects. However, a 100 GB limit per account will be imposed during the beta phase.
- It is, of course, built on Google's very well tested, scalable, highly available infrastructure
- It provides multiple, flexible authentication and sharing models
- Does support standard public/private key based auth
- Will also have integration with some kind of groups feature, which will allow objects to be shared with, or controlled by, multiple identities.
- ACLs can be applied to both Buckets and Objects
- Control who can list objects
- Who can create/delete objects
- Who can read/write into the bucket
- Who can read
- Who can read/write
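To illustrate the RESTful surface, fetching an object comes down to an authenticated HTTP GET against the object's URI. This is only a sketch: the bucket and object names are made up, and the host and authorization scheme are my recollection from the talk, so check the official docs before relying on them:

```
GET /gs2010/photos/vacation.jpg HTTP/1.1
Host: commondatastorage.googleapis.com
Date: Thu, 20 May 2010 21:00:00 GMT
Authorization: GOOG1 <access-key>:<hmac-signature>
```

PUT/POST/DELETE/HEAD follow the same shape, which is why generic HTTP tooling works against it with only a signing step added.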
- There were two tools mentioned during the talk
- GS manager looks like a web application which allows an admin to manage this service
- GS util is more like the shell tools AWS provides for S3.
- As I mentioned before, GS util accepts wildcards
- So something like this is possible
- gsutil cp gs://gs2010/* /home/rkt/gs2010
- The service was created with "data liberation" as one of the goals. As shown by the previous command, it takes just one line to transfer all of your data out.
- A resume feature (if the connection breaks during a big upload) is not available yet, but that's on the roadmap.
- The groups feature was discussed a lot, but it's not ready in the current release
- Versioning is not available. It wasn't clear whether it's on the roadmap or how long before it's implemented.
A few other notes:
- It's not clear how this plays with the "storage service" Google currently provides for Gmail/Docs storage. From what I heard, this is not related to that storage service at all, and there are no plans to integrate the two.
- The service is free during the beta period to all developers who get access to it, but when it's released it will follow a pricing model similar to others in the industry. The pricing model is already published on their website
- The speakers and the product managers didn't comment on whether storage access from Google App Engine would be charged (or at what rate)
- They do provide MD5 signatures as a way of verifying that an object on the client is the same as the object on the server, but MD5 is not used for storing the files themselves. (So the MD5 collision issue shouldn't be a problem.)
- The US Navy is already using this service, with about 80 TB of data on Google Storage, and from what I heard they looked pretty happy talking about it.
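The MD5 verification mentioned above is straightforward to do client-side with the standard JDK; this is a minimal sketch (my own illustration, not Google's tooling), computing the hex digest of a local copy so it can be compared with the server-reported one:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Compute the lowercase hex MD5 digest of some bytes (e.g. a local
    // copy of an object) for comparison against the server's digest.
    public static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            // BigInteger(1, ...) treats the digest as unsigned; %032x pads
            // leading zeros so the digest is always 32 hex characters.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is available on every standard JVM", e);
        }
    }
}
```

If the two hex strings match, the upload or download round-tripped intact; no cryptographic strength is needed for this integrity check, which is why MD5 is fine here.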
I suspect this product will be in beta for a while before they release it out in the open.
Lots of interesting updates today.
But first I would like to mention the fantastic work the cloud computing group at UCSB is doing to make the App Engine framework more open. They have done significant work making AppScale "work" with different kinds of data sources, including HBase, Cassandra, Voldemort, MongoDB, Hypertable, MySQL and MemcacheDB. AppScale is actively looking for folks interested in working with them to make it stable and production ready.
- GAE 1.3.1 released: I think the biggest news in this release is that the 1000-row limit has now been removed. You still have to deal with the 30-second processing limit per HTTP request, but at least the row limit is gone. They have also introduced automatic, transparent datastore API retries for most operations. This should dramatically increase the reliability of datastore queries, and it reduces the amount of work developers have to do to build this auto-retry logic themselves.
- ElasticSearch is a Lucene-based indexing product which seems to do what Solr used to do, with the exception that it can scale across multiple servers. Very interesting product; I'm going to try it out soon.
- MemcacheDB: A distributed key-value store which is designed to be persistent. It speaks the memcached protocol, but it's actually a datastore (using Berkeley DB) rather than a cache.
- Nasuni seems to have come up with NAS software which uses cloud storage as the persistent datastore. It has capability to cache data locally for faster access to frequently accessed data.
- The folks at Flickr have two interesting posts you should glance over. "Using, Abusing and Scaling MySQL at Flickr" seems to be the first in a series of posts about how Flickr scales using MySQL. The next one in the series is "Ticket Servers: Distributed Unique Primary Keys on the Cheap"
- Finally, a fireside chat by Mike Schroepfer, VP of Engineering, about scaling Facebook.
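The transparent retries mentioned in the 1.3.1 note above can be pictured as a small wrapper around each datastore call. The sketch below is my own illustration, not the SDK's code; the real implementation presumably only retries errors it knows to be transient, while this version naively retries any runtime exception:

```java
import java.util.function.Supplier;

public class DatastoreRetry {
    // Illustration of transparent retry: run the operation up to
    // maxAttempts times, rethrowing the last failure if none succeed.
    public static <T> T withRetries(Supplier<T> op, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // real code would back off and filter transient errors
            }
        }
        throw last;
    }
}
```

The point of baking this into the SDK is exactly that developers no longer have to hand-roll loops like this around every query.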
If you don't like EC2, you have the option to move your app to a new vendor. But if you don't like GAE (Google App Engine), there aren't any solutions which can replace it easily.
AppScale might change that.
AppScale is an open-source implementation of the Google AppEngine (GAE) cloud computing interface from the RACELab at UC Santa Barbara. AppScale enables execution of GAE applications on virtualized cluster systems. In particular, AppScale enables users to execute GAE applications using their own clusters with greater scalability and reliability than the GAE SDK provides. Moreover, AppScale executes automatically and transparently over cloud infrastructures such as the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Eucalyptus, the open-source implementation of the AWS interfaces.
The list of supported infrastructures is very impressive. However, the key, in my personal opinion, will be stability and compatibility with current GAE APIs.
Learn more about AppScale:
- AppScale Home page
- Google Code page
- Google Group for AppScale
- Demo at Bay Area GAE Developers meeting: at the Googleplex (Feb 10, 2010)
Last week I spent a few hours building a search engine testing tool called "BlackboxSET". The purpose of the tool was to let users see search results from three different search providers and vote for the best set of results without knowing the source of each set. The hope was that the search engine which puts the best results at the top of the page would stand out. What we found was interesting: though Google's search scores aren't significantly better than Yahoo's or Bing's, it is the current leader on BlackboxSET.
But this post is about what it took me to build BlackboxSET on GAE, which, as you can see, is a relatively simple application. The entire app was built in a few hours of late-night hacking, and I decided to use Google's App Engine infrastructure to learn a little more about GAE.
- Ability to randomly show results from the three search engines
- Persist data collected after the user votes
- Report the results using a simple pie chart in real time if possible
- Each time the user does a new search, a random sequence is generated on the server which represents the order in which the user will see the results in the browser.
- When the user clicks the 'Vote' button, the browser makes a call to the server to log the result and to retrieve the source of each set of search results.
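The random sequence in the first step is just a shuffle of the three providers. A minimal sketch (my own illustration, not the app's actual code; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ResultOrder {
    // Generate a random presentation order for the three providers. The
    // server keeps this order so that a later vote can be mapped back to
    // the engine that actually produced the winning column.
    public static List<String> randomOrder() {
        List<String> engines = new ArrayList<>(Arrays.asList("google", "yahoo", "bing"));
        Collections.shuffle(engines);
        return engines;
    }
}
```

The whole blind-test design hinges on this order living only on the server between the search and the vote, which is exactly where the session-persistence trouble described below comes from.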
Decisions and observations made while trying to build this on GAE
- Java was the obvious choice, since I didn't know Python.
- And since I hadn't played with encrypted cookies, the decision was made to persist the randomized order in the session object, which looked pretty straightforward.
- Since user sessions are relatively short, and since session objects in GAE/Java are persisted to memcache automatically, I decided not to interact with memcache directly. This particular feature of GAE/Java is not documented clearly, and from what I've heard from Google engineers it's something they don't openly recommend relying on. But it works, and I have used it in the past without any problems.
- When the voting results from the browser are sent to the server, the server logs them without any processing in a simple table in the datastore. The plan was to keep sufficient information in these event logs so that if the app does get hacked/gamed, the additional information will help us filter out events which should be rejected. Unfortunately, it also means that to extract anything interesting from this data, one has to spend a lot of computational resources parsing it.
- The Google Chart API was used for graphing. This was a no-brainer. But because GAE limits the number of rows per datastore query to 1000, I had to limit the chart API to look at only the last 1000 results. GAE now provides a "Task" feature which I think could be used for offline processing, but I haven't used it yet.
Problems I ran into – I had designed the app to resist gaming, but was not adequately prepared for some of the other challenging problems related to horizontal scalability.
- The first problem was that processing 1000 rows of voting logs to generate the graph for each person was taking up to 10 to 15 seconds on GAE infrastructure. The options I had were either to reduce the log sample size requested from the datastore (something smaller than 1000), or to cache the results for a period of time so that not all users were impacted by the problem. I went with the second option.
- The second problem was sort of a showstopper. Some folks were reporting inaccurate search results… in some cases there were duplicates, with the same set of search results shown in two out of three columns. This was bad. Even weirder was the fact that it never happened when I was running the app on my desktop inside the GAE sandbox. Also mysterious was that the problems didn't show up until load started picking up (thanks to a few folks who tweeted it out).
- The root cause of these issues comes down to how I assumed session objects are persisted and replicated in GAE/Java. I assumed that when I persist an object in the app's session object, it is synchronously replicated to memcache.
- I also assumed that if multiple instances of the app were brought up by GAE under heavy load, it would try to do some kind of sticky load balancing. Sticky load balancing is an expensive affair, so in hindsight I should have expected this problem. However, I didn't know that GAE infrastructure would start load balancing across multiple instances at even 2 requests per second, which seems too low.
- Since the randomization data cannot be stored in a cookie (without encrypting it), I had to store it on the server. And from the point when the user is presented with a set of search results to the point when the user votes on it, it would be nice to keep the user on the same app instance. Since GAE was switching users between instances (load balancing based on load), I had to find a more reliable way to persist the randomization information.
- The solution I implemented was two-fold. First, I reduced the number of interactions between the browser and the backend server from 4 to 2 HTTP requests. This effectively reduced the probability of users switching app instances during the most critical part of the app's operation. The second change was to stop using the session object and instead use memcache directly, making the randomization data persist a little more reliably.
- In hindsight, I think encrypted cookies would have been a better approach for this particular application. They completely side-step the requirement of keeping session information on the server.
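For what it's worth, a signed (rather than fully encrypted) cookie would be enough for this app, since the randomization order only needs to be tamper-evident, not secret. This is my own sketch using HMAC-SHA1 from the standard JDK, not code from BlackboxSET; the payload format is made up for illustration:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignedCookie {
    // Tamper-evident cookie value: payload plus an HMAC-SHA1 signature.
    // The browser can read the value but cannot alter it undetected,
    // so no server-side session state is needed at all.
    public static String sign(String payload, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            byte[] sig = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            // %040x pads the 20-byte SHA1 output to 40 hex characters.
            return payload + "|" + String.format("%040x", new BigInteger(1, sig));
        } catch (Exception e) {
            throw new IllegalStateException("HmacSHA1 unavailable", e);
        }
    }

    // Recompute the signature over the payload and compare with the cookie.
    public static boolean verify(String cookie, byte[] key) {
        int sep = cookie.lastIndexOf('|');
        return sep >= 0 && sign(cookie.substring(0, sep), key).equals(cookie);
    }
}
```

With this, the vote handler simply verifies the cookie it gets back, and the whole instance-affinity problem disappears because no instance holds any state.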
I'm sure this is not the end of all the problems. If there is an update I'll definitely post it here. If any readers are curious about anything specific, please let me know and I'll be happy to share my experiences.