Attempt to get HTML from a Visualforce page results in “BLOB is not a valid UTF-8 string”

I’m running into a problem on a specific customer’s org when attempting to get the HTML from a custom object’s standard ‘new’ page layout.

String objPrefix = My_Object__c.SObjectType.getDescribe().getKeyPrefix();
PageReference page = new PageReference('/' + objPrefix + '/e?nooverride=1');
Blob b = page.getContent();
String html = b.toString();

Using the above code results in “BLOB is not a valid UTF-8 string” when calling the toString() method.

This code is running in a managed package and has worked successfully in many other orgs.

I know that the Blob.toString() method is only supported for UTF-8 encoded content, but shouldn’t all Visualforce pages be encoded as UTF-8?

Answer

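Just as an experiment, you can try demoting the content to ASCII before it gets to the page. EncodingUtil.urlEncode and EncodingUtil.urlDecode take a String rather than a Blob, so apply this to the String you are about to output (called value here as a placeholder):

value = EncodingUtil.urlEncode(value, 'ASCII');
value = EncodingUtil.urlDecode(value, 'ASCII');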

Otherwise, something elsewhere in your code may be trying to stuff binary data into a String. This can happen in a few cases when trying to output:

  • Document.Body
  • Attachment.Body
  • StaticResource.Body
  • etc

Where’s the rest of your class? Nothing trying to dump unescaped outputText into a page with funny contentType? 🙂

Worth noting that NA0 (ssl.salesforce.com) is a special pod that does ISO-8859-1 only. It is worth raising a case with Salesforce if this is inconsistent with the behaviour in other pods.

You can determine the character set for your organization by making a describeGlobal() call through the API and inspecting the encoding value on the result (for example UTF-8 or ISO-8859-1).
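
If the org does turn out to serve ISO-8859-1, one possible workaround is to decode the Blob yourself instead of relying on Blob.toString(). The sketch below is not from the original answer. The class name BlobDecoder and method toText are hypothetical, and it assumes the content is in a single-byte ISO-8859-1 encoding whenever UTF-8 decoding fails. It hex-encodes the bytes, rewrites them as %XX escapes, and lets EncodingUtil.urlDecode interpret them in that charset.

```apex
public class BlobDecoder {
    // Hypothetical helper: try UTF-8 first, then fall back to ISO-8859-1.
    public static String toText(Blob b) {
        try {
            // Succeeds only when the bytes form valid UTF-8.
            return b.toString();
        } catch (StringException e) {
            // Assumption: the content is ISO-8859-1 (one byte per character).
            // Two hex digits represent one byte.
            String hex = EncodingUtil.convertToHex(b);
            Integer byteCount = hex.length() / 2;
            List<String> bytes = new List<String>();
            for (Integer i = 0; i < byteCount; i++) {
                bytes.add(hex.mid(i * 2, 2));
            }
            // Build "%3C%68%74..." and decode it in the assumed charset.
            return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), 'ISO-8859-1');
        }
    }
}
```

In the question's code this would replace the last line with String html = BlobDecoder.toText(page.getContent());. The hex string and the list of escapes each take several times the size of the Blob, so very large pages could run into heap or CPU limits.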

Attribution
Source: Link, Question Author: cseaton, Answer Author: Matt and Neil
