What is our primary use case?
The tool fits when you want to go serverless, the compute involved is not too high, or the work is a PoC, and you want a microservices architecture on a pay-as-you-go model. Once you know what your workload looks like, you may be able to cut costs by moving to an on-premises model with a stable system for computing.
What is most valuable?
The tool's models and ML libraries are rich, which is one of its strengths. The other is how the tool interacts with other components of AWS, like Lambda and S3. It's seamless if you are using AWS architecture.
I was working with Amazon SageMaker Ground Truth. The serverless feature is important and worth it because you don't have to spin up an EC2 instance or a server every time. For us, the main attraction is that it's serverless. You pay only for the compute that you use, and then there is the ecosystem: whenever you use something in AWS, the ecosystem is very rich.
What needs improvement?
For any cloud provider, the cost has to be substantially reduced, especially in the case of Amazon SageMaker, which is extremely expensive for huge workloads. In EC2, you have spot instances that cut costs tremendously, but you don't have that in Amazon SageMaker. You pay for the local usage. I would also like to see better integration with GPUs. GPUs are very expensive on AWS or any cloud provider. NVIDIA has introduced options with Databricks for GPUs, so it would be interesting to see how Amazon SageMaker can parallelize GPU usage and how seamlessly that works. I haven't used it to scale multiple GPUs automatically for model training. In short, the first concern is cost, and the second is how effectively GPUs are integrated into the workload for training machine learning models.
For how long have I used the solution?
I have six to seven years of experience with Amazon SageMaker.
What do I think about the stability of the solution?
I am involved until the PoC phase. After production, a different team handles the tool and we move on to other projects, so I don't continuously maintain production workloads. From my standpoint, I rate the solution's stability a nine out of ten.
What do I think about the scalability of the solution?
The tool is extremely scalable, but if you use it for the entire ML life cycle, it's very expensive. For a production project, I would keep deployment and inferencing in SageMaker. At some point, we would have to move the other parts to something less costly.
My clients are mostly mid-level to enterprise businesses. I have worked with huge clients, but our own clients may not be that huge. When we worked as a product engineering partner to AWS for two years, I dealt with huge clients such as Nissan, Sony, and Samsung, but that work was very specific. I was working with Amazon SageMaker Ground Truth, and even though it was ML, it was a niche kind of thing: we were doing labeling support and training data for ML models, partly using ML for the labeling. Sometimes we used Amazon SageMaker as well. If we are talking about a complete ML journey with a client, then I would say that I have dealt with small to mid-scale customers.
How are customer service and support?
The technical support of the tool was good. I rate the technical support a ten out of ten.
Which solution did I use previously and why did I switch?
My company takes a tool-agnostic approach and utilizes a variety of AI and ML tools across the ecosystem, focusing on Amazon, Microsoft, Databricks, and Snowflake.
How was the initial setup?
As for the initial setup, I am in the Amazon ecosystem. Whether you find the tool good depends on whether you are command-line-driven or user-interface-driven. If you are user-interface-driven, I think Azure is good. I am okay with a command-line and shell kind of interface, and I am more used to AWS than other cloud providers, so it's good for me. For a programmer or a technical person, it's easy. For a business person, I think Azure is a little easier than AWS.
The tool is deployed on a private cloud. Sometimes, we host it and just give the client access to the apps: we host the model, and the clients use it but don't have access to the infrastructure. We mostly used AWS, along with some other cloud providers.
The tool's deployment was fast, and it took a week.
For the deployment, we follow almost the same steps in SageMaker as we do on-premises. The difference is that SageMaker offers parallelism, the computing cost is less, and we don't have to spin up a server. We first put the data in an S3 bucket, and then we do the EDA, the training, and the grid optimization. We mostly start with a pilot model for the PoC. When I say that deployment takes one week, that is just for having it up and running and showing some results with the model. It's not for the complete production model, which is a different cycle.
For the rest of the steps, like getting and cleaning the data, we don't do a lot in SageMaker. We do it outside, with EC2 instances or with PySpark and Databricks, because that is a little easier compared to the SageMaker libraries. You can do it in SageMaker, but it's expensive. We do all the data transformation, and then we do the modeling. Most of the time, 100 percent of the deployment is done in SageMaker, but the rest is a mix of technologies. There is an inference pipeline.
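To make the grid optimization step concrete, here is a minimal sketch in plain Python of what a grid search does: it tries every combination of hyperparameters and keeps the best one. It is only an illustration and does not use the SageMaker SDK. The function names, the hyperparameter names, and the toy scoring function are all hypothetical stand-ins for a real training job.

```python
from itertools import product


def grid_search(train_and_score, param_grid):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_train_and_score(learning_rate, max_depth):
    """Hypothetical stand-in for a training job; peaks at (0.1, 5)."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (max_depth - 5) ** 2


# Hypothetical hyperparameter grid, chosen only for illustration.
grid = {"learning_rate": [0.01, 0.1, 0.5], "max_depth": [3, 5, 7]}
best_params, best_score = grid_search(toy_train_and_score, grid)
print(best_params)
```

In a real project, each call to the scoring function would be a full training run, which is why running the combinations in parallel matters for cost and time.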
What's my experience with pricing, setup cost, and licensing?
The tool offers a pay-as-you-go pricing model. The cost depends on the instance that you use.
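As a rough sketch of how a pay-as-you-go bill scales with the instance and the hours used, the arithmetic looks like this. The hourly rates and hours below are hypothetical placeholders, not actual AWS prices, and the function name is my own.

```python
def estimate_cost(hourly_rate_usd, hours, instance_count=1):
    """Pay-as-you-go estimate: hourly rate x hours used x number of instances."""
    if hourly_rate_usd < 0 or hours < 0 or instance_count < 1:
        raise ValueError("rate and hours must be non-negative; need at least 1 instance")
    return hourly_rate_usd * hours * instance_count


# Hypothetical numbers: a one-week PoC on one small instance versus
# a month of round-the-clock production on four larger instances.
poc_cost = estimate_cost(hourly_rate_usd=1.0, hours=40)
production_cost = estimate_cost(hourly_rate_usd=4.0, hours=720, instance_count=4)
print(poc_cost, production_cost)
```

The gap between the two numbers is the reason a PoC is cheap under this model while a large, always-on workload gets expensive quickly.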
What other advice do I have?
I used Amazon SageMaker as a customer from the client side, though I also worked as a contractor with Amazon for two years. I work for Persistent, and the engagement with our client ended last year. Now, we are partners as well: we are strategic partners for Amazon, Microsoft, Google, Databricks, Snowflake, and most of the ecosystems.
Say you want to do a PoC, for instance a data annotation project. You want to see whether it works without investing a lot. For those typical PoC situations, I would say Amazon SageMaker is good, particularly if you are using a microservices architecture and want to go serverless. That is the first use case. The second use case is that you want to go serverless and plug and play a lot of components rather than having bulk compute like EC2. You would rather have an Amazon setup that is serverless.
We use it for tuning, but it's just like any other tool, except for the fact that it's serverless. It doesn't significantly boost anything; it's just a choice. We tune either on-premises or on the cloud, with Azure, AWS, and so on, and you can also use PySpark or other cloud technologies. Tuning in SageMaker is the same as tuning any model in any environment. It is a good tool that works well with its own components and other components, but we don't get a huge boost just because it's AWS.
The serverless feature and the complete lifecycle that can be handled inside SageMaker are important. It covers everything from training the model to deploying it and sometimes using it for data pipelines. However, we generally don't use it for pipelining and data transformations because that is expensive inside SageMaker. We do use it for model training, although sometimes we train outside. We also use Amazon SageMaker JumpStart, which is pretty handy because you don't have to train the model from scratch. You can use it right out of the box, especially for LLM settings. There are models inside SageMaker that make it a little faster, both from a computing perspective and from a bandwidth and deployment perspective, so you don't have to spend a lot of time training before deployment. Amazon SageMaker JumpStart is definitely valuable, along with the whole ML lifecycle.
I would definitely recommend it to others who want to do a quick PoC (proof of concept) workload. If it's a very huge production workload, then I might want to consider other options. But for anything that is a PoC kind of thing, I would recommend it.
Speaking about AI, I can say that it's quick to set up and get running. I can't be very specific, because we have worked on a lot of projects, many of them document processing. One recent project I can remember was digitizing documents using manual annotation and automated ML models. There, we didn't use SageMaker itself; we used Amazon SageMaker Ground Truth, which is under the SageMaker umbrella. It is pretty handy because it sits well in the environment. As for how it accelerated the process, it was easy to set up, and we got going in less than a week.
I rate the tool an eight out of ten.
Which deployment model are you using for this solution?
Private Cloud