Thursday, July 12, 2012

Controlling OSGi Services in Cloud Ecosystems

In my previous post I looked at the basics of setting up an OSGi Cloud Ecosystem. With that as a basis, I'm going to look at an issue that was brought up during the second OSGi Cloud Workshop, held this year at EclipseCon in Washington, where this list of ideas was produced. One of the topics on that list concerns the new ways in which services can fail in a cloud scenario. For example, a service invocation might fail because the service consumer's credit card has expired, or because the number of invocations allocated to a certain client has been used up.

In this post I'll be looking at how to address this in the OSGi Cloud Ecosystem architecture that I've been writing about.

First let's look at the scenario where a cloud service grants a maximum number of invocations to a certain consumer.

An invocation policy for a remote OSGi service


From the consumer side, I'm simply invoking my remote demo TestService as usual:
  TestService ts = ... from the Service Registry ...
  String result = ts.doit();
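(As an aside, one way to obtain the service on the consumer side is via a ServiceTracker. The following is just a sketch of that lookup; the bundleContext variable and the 5 second timeout are my own choices, not from the demo code.)

  // Sketch: look up the (possibly remote) TestService in the local
  // Service Registry. Remote Services imports it transparently.
  ServiceTracker tracker = new ServiceTracker(bundleContext,
      TestService.class.getName(), null);
  tracker.open();
  TestService ts = (TestService) tracker.waitForService(5000);
  if (ts != null) {
    String result = ts.doit();
  }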

On the Service Provider side we need to add additional control. As this control (normally) only applies to remote consumers from another framework in the cloud, I've extended the OSGi Remote Services implementation to provide extra control points. I did this on a branch of the Apache CXF-DOSGi project, which is the Remote Services implementation I'm using. I came up with the idea of a RemoteServiceFactory, conceptually a bit similar to a normal OSGi ServiceFactory: where the normal OSGi ServiceFactory can provide a new service instance for each client bundle, a RemoteServiceFactory can provide a new service instance (and exert other control) for each remote client. I'm currently distinguishing clients by IP address. I'm not completely sure whether this covers all the cases; maybe some sort of security context would also make sense here.
My initial version of the RemoteServiceFactory interface looks like this:
  public interface RemoteServiceFactory {
    public Object getService(String clientIP, ServiceReference reference);
    public void ungetService(String clientIP, ServiceReference reference, Object service);
  }

Now we can put the additional controls in place. I can limit service invocations to 3 per IP address:
public class TestServiceRSF implements RemoteServiceFactory, ... {
  ConcurrentMap<String, AtomicInteger> invocationCount = new ConcurrentHashMap<>();

  @Override
  public Object getService(String clientIP, ServiceReference reference) {
    AtomicInteger count = getCount(clientIP);
    int amount = count.incrementAndGet();
    if (amount > 3)
      throw new InvocationsExhaustedException("Maximum invocations reached for: " + clientIP);

    return new TestServiceImpl(); // or reuse an existing one
  }

  private AtomicInteger getCount(String ipAddr) {
    AtomicInteger newCnt = new AtomicInteger();
    AtomicInteger oldCnt = invocationCount.putIfAbsent(ipAddr, newCnt);
    return oldCnt == null ? newCnt : oldCnt;
  }
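(The interface also requires ungetService(); in this example there is nothing to release, so a minimal implementation, sketched here rather than taken from the demo code, can be empty. Note that the invocation count is deliberately not decremented, otherwise the quota would reset.)

  @Override
  public void ungetService(String clientIP, ServiceReference reference, Object service) {
    // Nothing to clean up here; the invocation count is intentionally
    // kept, so the quota stays enforced across invocations.
  }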

A RemoteServiceFactory allows me to add other policies as well. For example, it can prevent concurrent invocations from a single consumer (see here for an example), select the provided functionality based on the client, or even charge the customer per invocation.
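The linked example isn't reproduced here, but to give an idea, a concurrency-limiting policy could be sketched along these lines (this assumes the modified Remote Services layer calls ungetService when an invocation completes; the LongRunningServiceRSF and LongRunningServiceImpl names are illustrative):

  // Sketch: reject concurrent invocations from the same client IP.
  public class LongRunningServiceRSF implements RemoteServiceFactory {
    private final ConcurrentMap<String, Semaphore> inFlight = new ConcurrentHashMap<>();

    @Override
    public Object getService(String clientIP, ServiceReference reference) {
      inFlight.putIfAbsent(clientIP, new Semaphore(1));
      if (!inFlight.get(clientIP).tryAcquire())
        throw new IllegalStateException("Concurrent invocation not allowed for: " + clientIP);
      return new LongRunningServiceImpl();
    }

    @Override
    public void ungetService(String clientIP, ServiceReference reference, Object service) {
      // The invocation has finished, allow this client to invoke again
      inFlight.get(clientIP).release();
    }
  }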

To register the RemoteServiceFactory in the system I'm currently adding it as a service registration property:
public class Activator implements BundleActivator {
  // ...
  public void start(BundleContext context) throws Exception {
    TestService ts = new TestServiceImpl();
    Dictionary<String, Object> tsProps = new Hashtable<>();
    tsProps.put("service.exported.interfaces", "*");
    tsProps.put("service.exported.configs", "org.coderthoughts.configtype.cloud");
    RemoteServiceFactory tsControl = new TestServiceRSF(context);
    tsProps.put("org.coderthoughts.remote.service.factory", tsControl);
    tsReg = context.registerService(TestService.class.getName(), ts, tsProps);
   ...
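(To sketch how this property is consumed: conceptually, the server side in my CXF-DOSGi branch does something like the following around each incoming invocation. This is a simplified illustration, not the actual branch code.)

  // Conceptual server-side dispatch around an incoming remote call.
  Object o = serviceReference.getProperty("org.coderthoughts.remote.service.factory");
  if (o instanceof RemoteServiceFactory) {
    RemoteServiceFactory rsf = (RemoteServiceFactory) o;
    Object svc = rsf.getService(clientIP, serviceReference); // may throw a policy exception
    try {
      // ... invoke the requested method on svc ...
    } finally {
      rsf.ungetService(clientIP, serviceReference, svc);
    }
  }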

More control for the client

Being able to control the service as described above is nice, but seeing an exception in the client when trying to invoke the service isn't great. It would be better if the client could prevent such a situation by asking the framework whether a service it hosts will accept invocations. For this I added service variables to the OSGiFramework interface. You can ask a framework in the ecosystem for metadata regarding the services it provides:
  public interface OSGiFramework {
    String getServiceVariable(long serviceID, String name);
  }

I implemented this idea using the OSGi Monitor Admin service (chapter 119 of the Compendium Specification).

Service Variables are accessed via OSGi Monitor Admin

From my client servlet I'm checking the service status before calling the service:
  fw.getServiceVariable(serviceID, OSGiFramework.SV_STATUS);
(Note: given a service reference, you can find out whether it's remote by checking for the service.imported property; you can find the hosting framework instance by matching the endpoint.framework.uuid properties; and you can get the service ID of the service in that framework by looking up the endpoint.service.id of the remote service - see here for an example.)
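Put into code, that lookup could be sketched as follows (the property names are from the Remote Services specification and this ecosystem; the shape of the helper method is my own):

  // Sketch: given an imported service reference, find its hosting
  // OSGiFramework in the ecosystem and ask for the service status.
  String getRemoteStatus(BundleContext context, ServiceReference ref) throws Exception {
    if (ref.getProperty("service.imported") == null)
      return null; // not a remote service

    String fwUUID = (String) ref.getProperty("endpoint.framework.uuid");
    long remoteID = Long.parseLong(String.valueOf(ref.getProperty("endpoint.service.id")));

    ServiceReference[] fwRefs = context.getServiceReferences(OSGiFramework.class.getName(), null);
    if (fwRefs != null) {
      for (ServiceReference fwRef : fwRefs) {
        if (fwUUID.equals(fwRef.getProperty("endpoint.framework.uuid"))) {
          OSGiFramework fw = (OSGiFramework) context.getService(fwRef);
          return fw.getServiceVariable(remoteID, OSGiFramework.SV_STATUS);
        }
      }
    }
    return null;
  }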
So I can ask the OSGiFramework whether I can invoke the service, and it can respond with various return codes. As a starting point for possible return values I took the following list, largely inspired by the HTTP response codes:
  SERVICE_STATUS_OK // HTTP 200
  SERVICE_STATUS_UNAUTHORIZED // HTTP 401
  SERVICE_STATUS_PAYMENT_NEEDED // HTTP 402
  SERVICE_STATUS_FORBIDDEN // HTTP 403
  SERVICE_STATUS_NOT_FOUND // HTTP 404
  SERVICE_STATUS_QUOTA_EXCEEDED // HTTP 413
  SERVICE_STATUS_SERVER_ERROR // HTTP 500
  SERVICE_STATUS_TEMPORARY_UNAVAILABLE // HTTP 503

It might also be worth adding some response codes to indicate that things are still OK, but won't be for much longer. I was thinking of these responses:
  SERVICE_STATUS_OK_QUOTA_ALMOST_EXCEEDED
  SERVICE_STATUS_OK_PAYMENT_INFO_NEEDED_SOON
These need more thought, but I think they can provide an interesting mechanism for preventing outages.

Under the hood I'm using the OSGi Monitor Admin Specification. I took an implementation from KnowHowLab (thanks guys!). It gives a nice Whiteboard pattern-based approach to providing metadata via a Monitorable service. As the RemoteServiceFactory is the place where I'm implementing the policies for my TestService, it provides me a natural place to publish the metadata too.

When the client calls OSGiFramework.getServiceVariable(id, SV_STATUS) the OSGiFramework service implementation in turn finds the matching Monitorable, which provides the status information. The Monitorable for my TestService is implemented by its RemoteServiceFactory:
public class TestServiceRSF implements ..., Monitorable {
  ConcurrentMap<String, AtomicInteger> invocationCount = new ConcurrentHashMap<>(); 

  // ...

  @Override
  public StatusVariable getStatusVariable(String var) throws IllegalArgumentException {
    String ip = getIPAddress(var);
    AtomicInteger count = invocationCount.get(ip);

    String status;
    if (count == null) {
      status = OSGiFramework.SERVICE_STATUS_NOT_FOUND;
    } else {
      if (count.get() < 3)
        status = OSGiFramework.SERVICE_STATUS_OK;
      else
        status = OSGiFramework.SERVICE_STATUS_QUOTA_EXCEEDED;
    }
    return new StatusVariable(var, StatusVariable.CM_SI, status);
  }

  private String getIPAddress(String var) {
    // The client IP is the suffix of the status variable name
    if (!var.startsWith(OSGiFramework.SERVICE_STATUS_PREFIX))
      throw new IllegalArgumentException("Not a valid status variable: " + var);

    return var.substring(OSGiFramework.SERVICE_STATUS_PREFIX.length());
  }
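(For Monitor Admin to find this Monitorable it has to be registered as a service with a PID, following the whiteboard approach. A minimal sketch of that registration, for example in the Activator shown earlier; the PID value here is illustrative:)

  // Register the RemoteServiceFactory as a Monitorable so that
  // Monitor Admin can find it. Monitorables are identified by PID.
  Dictionary<String, Object> monProps = new Hashtable<>();
  monProps.put(Constants.SERVICE_PID, "TestServiceMonitor"); // illustrative PID
  context.registerService(Monitorable.class.getName(), tsControl, monProps);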

Using Monitor Admin through the OSGi Service Registry gives me a nice, loosely coupled mechanism for providing remote service metadata. It fits nicely with the RemoteServiceFactory approach but can also be implemented in other ways.

When I use my web servlet again I can see it all in action.

In the Web UI you can see the two OSGi Frameworks in this cloud ecosystem: the local one (which also hosts the Web UI) and another one in a different cloud VM.
The servlet hosting the web page invokes the TestService 5 times. In this case only a remote instance is available, and after 3 invocations it reports that the invocation quota has been used up.

The Web UI servlet also invokes another remote service (the LongRunningService) twice, concurrently. You can see the policy that prevents concurrent invocations in action: only one invocation succeeds (it waits for a while and then returns 42), while the other reports an error and does not invoke the actual service.

The demo simply displays the service status and the return value from the remote service, but given this information I can do some interesting things.
  • I can make the OSGi service consumer aware of the service status so it can avoid services that are not OK to invoke. Standard OSGi service mechanics allow me to switch to another service without much ado. 
  • I can go even further and add a mechanism in the OSGi framework that automatically hides services if they are not OK to invoke. I wrote a blog post a while ago on how that can be done: Altering OSGi Service Lookups with Service Registry Hooks - the standard OSGi Service Registry Hooks allow you to do things like this, as sketched below.
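To illustrate that second point, such a hiding mechanism could be built on a FindHook roughly like this (a sketch only, not the code from that post; it reuses the getRemoteStatus helper from the earlier sketch):

  // Sketch: hide imported services whose status is not OK so that
  // consumers never see them in service lookups.
  public class StatusFindHook implements FindHook {
    private final BundleContext context;

    public StatusFindHook(BundleContext context) {
      this.context = context;
    }

    @Override
    public void find(BundleContext consumer, String name, String filter,
        boolean allServices, Collection<ServiceReference<?>> references) {
      for (Iterator<ServiceReference<?>> it = references.iterator(); it.hasNext(); ) {
        ServiceReference<?> ref = it.next();
        try {
          String status = getRemoteStatus(context, ref);
          if (status != null && !OSGiFramework.SERVICE_STATUS_OK.equals(status))
            it.remove();
        } catch (Exception e) {
          // if the status can't be determined, leave the service visible
        }
      }
    }
  }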

Do it yourself!

I have updated osgi-cloud-infra (and its source) with the changes to the OSGiFramework service. The bundles in this project are also updated to contain the Monitor Admin service implementation and the changes to CXF-DOSGi that I made on my branch to support the RemoteServiceFactory.

Additionally, I updated the demo bundles in osgi-cloud-disco-demo to implement the RemoteServiceFactory, add Service Variables, and update the web UI as described above.

There are no changes to the discovery server component.

Instructions to run it all are identical to what was described in the previous blog post - just follow the steps from 'try it out' in that post and you'll see it in action.
Note that it's possible that this code will be updated in the future. I've tagged this version as 0.2 in git.