UW Information Technology

December 3, 2015

__cloud content__ FAQ

Frequently Asked Questions about Cloud Computing

Q: What is the cloud?

A: Cloud is an umbrella term for separating what you do on computers (compute!) from the physical computer hardware, and this separation has many implications. Cloud computing is sometimes described as a utility like electricity: A subscriber does not manage power lines (computer hardware) but pays for electricity (computing) on an as-used basis. While this is true, the metaphor is overly simple. Electricity is electricity and water is water, but what you can do with information is complex and limitless, so public clouds like those offered by Amazon, Microsoft, and Google are all rapidly expanding their services to explore that potential. By providing these services (much more than just processors and storage) the cloud vendors make a win-win bet: They get revenue, and their subscribers recover time and save money compared with the traditional model, in which they purchase and maintain their own computer systems and recreate those services.

Q: Wait a moment... aren’t there other versions of this FAQ on the web?

A: Of course!  There are dozens of ‘What is the Cloud?’ FAQs.  Search ‘cloud computing FAQ’ to find them; for example here and here and here. Public cloud vendors are keen to teach you about their respective environments also. Here is a starting point for Amazon Web Services and here is one for Microsoft Azure. Here is one for Google Cloud.

Q: Why is cloud computing a big deal?

A: We provide a research-oriented answer: Cloud computing has emerged as an aggregation of ideas, practices, and services, many inspired by the world of business, with different levels of sophistication. Research computing and data science can be mapped to cloud computing but the process takes some work: ‘What is the best fit to what I’m trying to get done?’ It’s a big deal when good matches enable tremendous advances for the Researcher, particularly when the cloud provides a cost-effective means of addressing the huge amounts of data we generate. Consider the list of data actions and attributes from a typical research data flow: “Concerning my research data I must… [imagine, plan, acquire, clean, store, track provenance, curate, build metadata, enable others to discover, analyze, synthesize, reduce, enable reproducibility for, attribute, publish, share, collaborate, query, visualize, archive, provide trans-disciplinary access, find storage capacity, anticipate future expansibility, plan a life-cycle, determine longevity, reach insight].” These things all require attention, but the purpose of the data resides in only the last two words. Hence the lament that researchers spend too much time managing data rather than exploring it. When the cloud enables us to tip that balance it becomes a big deal.

Q: What is the public cloud?

A: The public cloud is cloud computing offered by companies to consumers under the utility model: Pay for what you use. It includes computing infrastructure, platforms and services offered with support by companies like Amazon Web Services (AWS), Microsoft (Azure), Google (Google Cloud) and others.

Q: What is the cloud computing transition / cloud migration?

A: It means moving some of one’s computing process to the cloud environment. This department advocates doing so only if there is a clear path to benefit for the Researcher (or Teacher / Student / IT Professional / etc).

Q: Is the cloud expensive?

A: There is no simple answer, as this varies case by case. The fastest way to an answer is to do a back-of-the-envelope estimate using one of the pricing calculators provided by the major public cloud vendors.
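
As a rough illustration of such a back-of-the-envelope estimate, the Python sketch below adds up three common cost components: compute hours, storage, and data transfer out. The function name and every rate in it are placeholders invented for this example and are not real vendor prices; substitute the figures from the vendor's pricing calculator for your own case.

```python
def estimate_monthly_cost(compute_hours, hourly_rate,
                          storage_gb, storage_rate_per_gb,
                          egress_gb, egress_rate_per_gb):
    """Return rough monthly cost components in USD (hypothetical helper)."""
    compute = compute_hours * hourly_rate
    storage = storage_gb * storage_rate_per_gb
    egress = egress_gb * egress_rate_per_gb
    return {
        "compute": compute,
        "storage": storage,
        "egress": egress,
        "total": compute + storage + egress,
    }


# All rates below are made-up placeholders, NOT actual vendor pricing.
estimate = estimate_monthly_cost(
    compute_hours=200, hourly_rate=0.10,        # 200 machine-hours per month
    storage_gb=1000, storage_rate_per_gb=0.03,  # 1 TB kept for the month
    egress_gb=50, egress_rate_per_gb=0.09,      # 50 GB downloaded
)
for item, usd in estimate.items():
    print(f"{item:>8}: ${usd:,.2f}")
```

Even a crude breakdown like this shows which component dominates, which is usually the first thing worth knowing before refining the numbers.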

Q: Is the cloud unsafe?

A: No; but as with any utility the responsibility to operate safely rests on the Researcher.

Q: Is the cloud time consuming?

A: No.

Q: What is elasticity?

A: Elasticity refers to the ability of a cloud service to provide resources on demand and then de-allocate those resources when a computing task is completed. This is the primary argument for cloud cost-effectiveness, since cloud usage can be directly coupled to demand, in contrast to dedicated on-premises computers that may be inadequate for heavy loads and/or may fall idle for periods of time.
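
To make the argument concrete, here is a small Python sketch comparing two ways of paying for the same workload: machines sized for the peak and kept running all day, versus machines allocated hour by hour to match demand. The demand profile and the hourly rate are invented for illustration, and the sketch assumes (unrealistically) that both options cost the same per machine-hour; it is meant only to show how idle time enters the comparison.

```python
def fixed_cost(demand, rate):
    """Cost when enough machines for the peak hour run during every hour."""
    return max(demand) * len(demand) * rate


def elastic_cost(demand, rate):
    """Cost when machines are allocated each hour to match that hour's demand."""
    return sum(demand) * rate


def utilization(demand):
    """Fraction of peak-sized capacity that the workload actually uses."""
    return sum(demand) / (max(demand) * len(demand))


# Hypothetical day: 1 machine needed for 20 hours, 40 machines for a 4-hour burst.
demand = [1] * 20 + [40] * 4
rate = 0.10  # placeholder USD per machine-hour, not a real price

print(f"fixed:       ${fixed_cost(demand, rate):.2f}")
print(f"elastic:     ${elastic_cost(demand, rate):.2f}")
print(f"utilization: {utilization(demand):.1%}")
```

The burstier the demand, the lower the utilization of peak-sized hardware and the larger the gap between the two figures; a steady workload narrows the gap.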

Q: Why does cloud computing jargon look so business oriented; and does this make it irrelevant to research?

A: A lot of cloud computing development and inter-provider competition is market driven, and this has left a very deep business-oriented imprint on the cloud vocabulary. Research into large processing tasks has also driven the development of the cloud. In brief, cloud computing is highly relevant to research but does involve climbing a learning curve. Some research teams have phenomenal computing expertise, and the good news is that early cloud adopters are also building learning resources for those who follow.

Q: What is ‘whack’? Is ‘whack’ bad?

A: I have no idea. I think ‘whack’ means ‘bad’.

Q: How do I get started trying out and using the cloud?

A: Go to YouTube and type in the search bar ‘cloud computing’ plus something you are interested in. I typed in ‘cloud computing machine learning’ and found a sequence of no fewer than 32 videos on the topic. Surfing YouTube content has the advantage that people are motivated to communicate clearly when they have to do it standing in front of a classroom.

Q: Do you have a bias in favor of any particular cloud provider?

A: This department has one strong bias and that is for the Researcher. Our aim is to help research groups explore cloud computing options and find something that works well, costs as little as possible, and gets everyone to work as quickly as possible. We have relationships with some of the cloud vendors and to a person they have all been incredibly helpful and generous with their time.

Q: Does UW have its own cloud resources?

A: UW has many compute resources which taken together comprise a private cloud. Mapping the cloud computing (public cloud) landscape into the UW IT infrastructure is beyond the scope of this FAQ.

Q: Does UW have a supercomputer?

A: Absolutely: UW operates the Hyak supercomputer cluster, an excellent means of achieving powerful computing capabilities.

FAQ For Amazon Web Services (AWS)
Q: How can I learn about the XYZ service available at AWS?
A: In addition to searching the AWS website and the broader web, we always recommend searching YouTube. This will often produce a great video showing the implementation of XYZ in practice, with lots of remarks from the speaker on key details to be aware of.
Q: Is there a good YouTube video on Amazon Machine Learning?
A: Try this one.
Q: What is the AWS equivalent of an on-premises WTS (Windows Terminal Server) instance?
Background: Matt Weatherford has implemented WTS for about 1500 total clients (students, faculty, and staff) at the College of Arts and Sciences, with 3 main servers and 10 backing SIM cluster servers to handle power users. His servers reboot once per week, and his SIM cluster once per month. He might have as many as 50-100 people logged on at once, each seeing their own stuff.
Detail: There is a backing file system called H-Drive that persists User content through reboots. It is distinct from but similar to the UW IT-provided U-Drive.
Consideration: Google Drive does not work with this system via the Google Drive client, because syncing (say) 100 users x 1 TB of stuff would clobber the server. OneDrive is a problem as it is accessible only through the Microsoft client, and this won’t install on a terminal server.
Detail: Matt has gone to considerable lengths over the years to install lots and lots of software on the servers, so he needs 3 node-locked licenses for everything from MATLAB to you name it. This has involved some ‘unorthodox software installs’, and it is this panoply of software that gives the system its value to the customers. So: many users who “keep their stuff”, the Google Drive sync problem avoided, and lots of software licenses, some of which only run on Windows Server.
Detail: Matt Weatherford’s system uses an app called ExpanDrive that makes a cloud storage environment look like a locally mounted disk on a Windows machine, which is important for software that can only read from and write to a locally mounted disk. It works against Google Drive and perhaps Box and Dropbox but not against MS OneDrive, which only allows access by the Microsoft client and not by third parties. So ExpanDrive gets big points for providing a “don’t even notice it” view into cloud storage.
To quote Matt: “I want a high performance windows terminal server (which installs on bare metal) in the cloud with software licensing (Windows Server OS) included and with the ability to transfer heavy-duty users to other machines so as not to bog other people down.” He mentions the VIDA CITRIX VDI (Virtual Desktop Infrastructure) as a failed attempt to build this at UW. I do not know the history there.
Q: What is the AWS solution for a Precision Medicine study involving UW-external participants where “temporary accounts” inside the Net ID system are not feasible?
Imagine a mobile app study (for example) where medical information is going to come flying in from many sources, and these sources are constantly shifting. We want a non-UW person to have their own authentication to a non-UW server with a lot of data capacity and guarantees around the security of this data (HIPAA, etc.). Furthermore we want a mechanism whereby data can be brought across into UW-authenticated CI; ‘over the Net ID firewall’ is my best way of characterizing it. This is the core task of a group at the School of Medicine; they are working up a major NIH project that has certain similarities to Apple ResearchKit. This is a large initiative.
Q: What would be required hypothetically for AWS to serve as a back-end to U-Drive?
Q: Let’s build the AWS Table of Acts, implications, status:
FISMA    https://en.wikipedia.org/wiki/Federal_Information_Security_Management_Act_of_2002 (cf Justin Prosser, Tech Team at UW School of Medicine)
Q: How do I rapidly implement a Django instance including a simple API call: Pass it an integer, receive back that number + 3.
Q: Basic EC: Consider a function F that generates N cosine values against a passed argument x: cos(x), cos(x+1), … cos(x + N – 1).
      Set up a task manager that wants to get results for M x N values, seeding the tasks with x = 0, x = N, x = 2N, … x = (M – 1)N.
      Collect the results in a sensible elastic way
      Benchmark the entire process: How much time went to what?
      How was the diaspora managed?
           Suppose Q is a number << M. How do I decide whether to allocate Q cores versus 2Q cores versus 3Q cores versus … versus M cores?
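One possible starting point for the Basic EC exercise, run on a single machine, is sketched below in Python. It is a local stand-in rather than a cloud implementation: the names `cosine_block` and `run_tasks` are invented for this example, the M seeds are taken as x = 0, N, …, (M – 1)N so that exactly M x N values are produced, and a thread pool plays the role of the pool of cloud workers. Because of CPython's global interpreter lock, threads will not speed up this CPU-bound work; passing `ProcessPoolExecutor` as `executor_cls` is the closer local analogue of separate cores.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def cosine_block(x, n):
    """F: return [cos(x), cos(x + 1), ..., cos(x + n - 1)]."""
    return [math.cos(x + k) for k in range(n)]


def run_tasks(m, n, workers, executor_cls=ThreadPoolExecutor):
    """Run m tasks seeded at x = 0, n, ..., (m - 1) * n on a worker pool.

    Returns (values, timings): values is the flat list of m * n results in
    seed order; timings records wall-clock seconds for each phase.
    """
    seeds = [i * n for i in range(m)]
    t0 = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in the order of the seeds, so no re-sorting
        # is needed when collecting.
        blocks = list(pool.map(cosine_block, seeds, [n] * m))
    t1 = time.perf_counter()
    values = [v for block in blocks for v in block]
    t2 = time.perf_counter()
    return values, {"dispatch_and_compute": t1 - t0, "collect": t2 - t1}


# Small demonstration: same workload, different pool sizes.
for workers in (1, 2, 4):
    values, timings = run_tasks(m=8, n=1000, workers=workers)
    print(workers, len(values), timings)
```

Repeating the run at Q, 2Q, 3Q, … workers and watching where the wall-clock time stops improving is one simple, empirical way to approach the core-count question; on a real cloud deployment the timings would also need to include instance start-up and data transfer.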
Q: EC Part 2: Create something like the Basic EC that relies on a large data push to each core in order to do its work. Same basic questions but now with the I/O overhead also benchmarked.
Background: Consider this article. This chap seems to have done some benchmarking on AWS and the other two; is it clear that he did this in the smartest way possible on AWS?
Q: Migrate this to Top Ten: Describe the UW AWS umbrella agreement.
A: UW has created an agreement with Internet2, fulfilled by DLT, for AWS. Therefore a UW community member with access to the UW IT Service Catalog can pay for AWS with a budget number. To get started, see the UW IT Service Catalog: https://www.washington.edu/itconnect/service/amazon-web-services/
UW considered signing an enterprise agreement with Amazon but realized it was nothing more than a purchasing agreement that failed to provide actual privacy/confidentiality/security compliance terms. The net result would have been to oblige UW IT to act as a billing service.  Instead UW uses the Internet2 NET+ service which in turn uses a third-party reseller (DLT) to handle billing. This provides some discounting aggregated across multiple participating schools as well as a BAA (which needs to be expanded).
FAQ For Microsoft Azure
Q: Does Microsoft Azure stack up to Amazon Web Services?
A: Absolutely; as does the Google Cloud. Each has its merits and the objective is to open doors to any and all options for UW researchers and community members. We are unbiased cloud enthusiasts.
Q: Does Microsoft provide startup cloud credits for research?
A: Yes. Search on Azure For Research to find the proposal page. A one-page proposal is all that is required, and proposals are evaluated on a quarterly basis. In addition, you can secure a personal Azure account with a small amount of credit (USD 100 or so) to get started immediately.
Q: How do I find training materials for using Azure once I have an account?