DISCUSSION: Flash + Google Indexing

Today I am reviewing the current methods of exposing Flash content for Google indexing, along with my thoughts on where this should be going in the future:

Current Trends:

  • SWFObject style: a div filled with HTML content, which is replaced with the Flash content if Flash Player is present.

    • First, there is the potential for the HTML content to be seen briefly before the Flash content loads on slower machines.
    • Second, there is a question over whether Google respects such code, and a risk that Googlebot views such methods as hiding content.

  • Dual HTML/Flash content sites, using Flash only for animations, videos and other media.

    • Not suitable for all Flash-based sites, especially those that are full-fledged applications or RIAs
    • Once again limits the web to age-old, over-extended, cumbersome HTML
    • Loses out on the full power of the Flash environment

  • Limited content extraction by Googlebot via Adobe’s swf2html SDK

  • No Flash (obviously not an acceptable option for RIA developers like myself)

Are there other means being implemented? Please chime in….


Where should we be going? Sadly, this is a ship stranded in the doldrums of a Sargasso Sea, with little that we mere sailors are able to do to free her.

The solution requires joint work on the part of Adobe and Google, and that is not within my power to make happen. I do have a thought about how to do it, but since I am not involved in the nitty-gritty inner workings of Flash itself, these thoughts are not much more than whimsical. I’ll share them anyway.

Flash recently moved to a heavily event-listener-based model. I believe this should be leveraged for Flash/Flex apps. My thought is to have a listener, call it echobot, that can be defined as active or not.

If echobot is active, it gives a spider a hook to listen on. Obviously, Adobe would have to work very closely with Google so that Googlebot could be modified to listen. The echobot listener would essentially receive a simple collection of the current content, most likely in an XML structure. It would deliver text content along with basic formatting (i.e., font size, weight, etc.), which would allow Google to weigh the content a bit. It would also receive tabular info from data grids, including row content and global info (i.e., number of records, number of fields, etc.). Perhaps it could also provide the means to define “alt” values for graphics, animations, effects, etc.
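To make the idea a little more concrete, here is a rough sketch of what such a content snapshot might look like. It is written in Python purely for illustration, standing in for whatever Flash Player would do internally. Every name in it (build_snapshot, and the snapshot, text, datagrid, row, cell, and media elements with their attributes) is made up for this example; no such format exists in Flash Player or at Google.

```python
import xml.etree.ElementTree as ET


def build_snapshot(texts, grid, media):
    """Build a hypothetical echobot snapshot of the current on-screen content.

    texts: list of (content, font_size, font_weight) tuples
    grid:  dict with "fields" (column names) and "rows" (list of row lists)
    media: list of (kind, alt_text) tuples for graphics/animations/effects
    """
    root = ET.Element("snapshot")

    # Text content plus basic formatting, so an indexer can weigh it.
    for content, size, weight in texts:
        node = ET.SubElement(root, "text", size=str(size), weight=weight)
        node.text = content

    # Tabular data: global info as attributes, row content as children.
    grid_node = ET.SubElement(
        root,
        "datagrid",
        records=str(len(grid["rows"])),
        fields=str(len(grid["fields"])),
    )
    for row in grid["rows"]:
        row_node = ET.SubElement(grid_node, "row")
        for name, value in zip(grid["fields"], row):
            cell = ET.SubElement(row_node, "cell", field=name)
            cell.text = str(value)

    # "alt" values for non-text content.
    for kind, alt in media:
        ET.SubElement(root, "media", kind=kind, alt=alt)

    return ET.tostring(root, encoding="unicode")


snapshot_xml = build_snapshot(
    texts=[("Welcome to the store", 24, "bold"), ("Browse our catalog.", 12, "normal")],
    grid={"fields": ["title", "price"], "rows": [["Widget", 9.99], ["Gadget", 19.5]]},
    media=[("animation", "Spinning product logo")],
)
print(snapshot_xml)
```

The point of the sketch is only that the payload can stay small and flat: text with a couple of formatting hints, grids with their counts, and alt text for everything else.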

Of course, there is another necessary aspect: as is often brought up, the issue of tracking location within a Flash application. This would tie in to a URL/state definition, so that a spider could go through various states much as it loads pages. The echobot listener would then deliver to the bot the content of the given state (or, in a traditional Flash app, of the current stopped frame).
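As a rough illustration of the state idea, the sketch below treats an application as a map of named states, each with its own content and links to other states, and walks it the way a spider walks pages. It is plain Python, not anything Flash or Googlebot actually does. The state names, the "#/..." URL fragments, APP_STATES, and crawl_states are all assumptions invented for the example.

```python
from collections import deque

# Hypothetical application: each state has a URL fragment, some content,
# and links to the states that can be reached from it.
APP_STATES = {
    "#/home": {"content": "Welcome", "links": ["#/catalog", "#/about"]},
    "#/catalog": {"content": "Widgets and gadgets", "links": ["#/catalog/widget", "#/home"]},
    "#/catalog/widget": {"content": "Widget details", "links": ["#/catalog"]},
    "#/about": {"content": "About us", "links": ["#/home"]},
}


def crawl_states(states, start):
    """Visit every state reachable from `start`, breadth-first, once each."""
    index = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in index or state not in states:
            continue  # already indexed, or a link to an unknown state
        # In the proposal, this is where echobot would hand over the
        # content of the state the application is currently in.
        index[state] = states[state]["content"]
        queue.extend(states[state]["links"])
    return index


index = crawl_states(APP_STATES, "#/home")
for url, content in index.items():
    print(url, "->", content)
```

Each state ends up indexed under its own URL, which is the same property that makes bookmarking work.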

Now, clearly this would require some logistical changes, both internally to Flash and to Google’s bot. Adobe would need to devise the internals and then put forth the workings as an API for Google (or any other indexer) to utilize.

Of course, there is a performance hit: you’re listening and delivering said content at all these event points. However, first, the echobot listener could be disabled entirely. Second, even if enabled, it should not be active unless passed a signal by the spider saying “I’m here, I want to listen”. Only then should the listener be active.
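The two conditions can be sketched in a few lines. This is illustrative Python only, and EchobotListener, attach_spider, and on_content_event are names invented here, not part of any Flash or Google API. The listener takes a function that builds the snapshot rather than the snapshot itself, so that when nobody is listening, nothing gets built.

```python
class EchobotListener:
    """Hypothetical listener: delivers content only if enabled AND a spider asked."""

    def __init__(self, enabled=True):
        self.enabled = enabled        # developer switch; False turns echobot off entirely
        self.spider_attached = False  # becomes True only after a spider signals
        self.delivered = []           # snapshots handed to the spider

    def attach_spider(self):
        """The spider's "I'm here, I want to listen" signal. Ignored when disabled."""
        if self.enabled:
            self.spider_attached = True
        return self.spider_attached

    def on_content_event(self, make_snapshot):
        """Called at each event point; `make_snapshot` builds the content lazily."""
        if not (self.enabled and self.spider_attached):
            return False  # cheap path: the snapshot is never built
        self.delivered.append(make_snapshot())
        return True


builds = []  # records each time a snapshot is actually built


def make_snapshot():
    builds.append(1)
    return "<snapshot>Welcome</snapshot>"


listener = EchobotListener(enabled=True)
listener.on_content_event(make_snapshot)  # no spider yet: nothing is built
listener.attach_spider()
listener.on_content_event(make_snapshot)  # spider listening: built and delivered
print(len(builds), listener.delivered)
```

For ordinary visitors the cost is then a single boolean check per event point.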

This would enable Flash sites to be indexed, even down to sub-states. It would also encourage much more use of internal states for purposes of URL bookmarking, etc.

7 Responses to “DISCUSSION: Flash + Google Indexing”

  1. Josh, March 17, 2008 at 5:33 pm

    Personally, I don’t want to see search engines try to understand the SWF format any more than they do now. Being a complex binary format, SWFs are a poor format for crawling. Heck, many SWFs don’t even have data that should be considered searchable. I want to see SWF content authors find better ways to give the data to search engines in ways those search engines already understand or can understand with much more useful enhancements (beyond just the Flash use-case).

    The best solution that I’ve seen so far is the one that Ted Patrick used for Flex Directory. He uses XSL to transform a simple, semantic XML/HTML document by replacing it with SWF embed code. The SWF then reloads the page without the XSL transform, and it displays the data as part of an RIA.

    Ted’s solution is a good first step. I’d like to see it brought to a finished product’s level of quality, though. If JavaScript is disabled, the embed code fails. I’d like to see decent error handling such that if the content cannot be embedded, the raw data will be formatted nicely as a regular webpage (probably with CSS too). Basically, have it fail gracefully such that the user may not even notice that the preferred Flash Player view didn’t work.

    On the search engine front, I think Google/Yahoo!/others should have better handling of pure XML and other non-HTML text data formats. Ted found that he had to change his page’s content from XML to HTML because the search engines would completely ignore the page’s content if they didn’t understand the tags being used. The web is moving beyond HTML. We have other structured data formats like RSS, Atom, JSON, and others that could provide all sorts of new information to search engines. Even better, these formats could be very useful in the crawling of RIAs created with technologies like Flash, Silverlight, and others.

  2. Andrew Westberg, March 17, 2008 at 7:22 pm

    Interesting thoughts. It would have to be embedded at the Flash Player layer, though, as putting it any higher in the stack would have people hacking the content to change the way Google sees it.

  3. Jim, March 17, 2008 at 8:48 pm

    One way is to place the textual data that is to be supplied to the SWF in the HTML page that contains the SWF, and use XSL to transform it for the browser. This is easier to show by example than it is to describe.

    To see an example, go to http://directory.onflex.org/ and you’ll see a Flex UI. Now, view the HTML source in the browser, and you’ll see an XML page that does *not* embed any SWF. Instead, it holds all the data. This is what the search engines will see, and as you can see it is highly indexable. This page also supplies the data for the SWF, so it performs double duty.

    Now look at the XSL for the above page at http://directory.onflex.org/template003.xsl and you’ll see that the XML is transformed into a simple HTML page that uses JavaScript to embed the SWF. This supplies what the user sees in his browser – the Flex-built SWF. The SWF, in turn, loads its data from the above XML.

    So, really, it isn’t any more work than standard Flash, it’s just a slightly different twist: Rather than using a plain-Jane HTML file to embed the SWF and a separate XML file to supply the data to the SWF, you let the XML file double as a search-engine-friendly page and use an XSL file to transform it into HTML that embeds the SWF.

  4. John Dowdell, March 17, 2008 at 11:06 pm

    On which search terms do you think people might search for your services, and on which of those might you be able to place within the first page of Google results?

  5. thesaj, March 18, 2008 at 8:47 am

    Josh/Jim – Thanks for the link to Ted’s directory. That is definitely one of the cleaner ways I’ve seen a transition implemented.

    Josh – My idea was not to have search engines try to understand the SWF format any deeper, but rather to simply give them a channel to listen to, and perhaps to provide a method that would optionally allow for differentiation of states.

    Andrew – Most definitely would have to be internal to Flash Player.

    John – We have numerous clients, some have some pretty high relevancy on the web. Just pulling one off the top of my head, Home Media Magazine ranks 4th for the following keywords: “Blu-Ray format war over”.

    So there is validity to our desire to improve indexing results.

  6. Brian, April 10, 2008 at 1:46 am

    For Google Analytics to work on a Flash/Flex site, you already have to manually notify the tracker that a pageview has occurred. It’d be nice if GoogleBot could at least tie into that somehow.

  7. devdave, April 27, 2008 at 10:33 am

    I agree with Josh – I don’t want bots trawling through every string in my code; a lot of the strings will be misleading for search engine purposes.

    Using an XML document as a content provider and XSL is ingenious – however, do you reckon there is a chance that at some point Google might classify this as grey hat, since the bot is indexing something which doesn’t really appear on the page?

    For the majority of my clients we usually end up grabbing the data from some kind of CMS (either XML or a database) and creating a simple HTML microsite which either redirects to the Flash (also a little dangerous from a blacklisting perspective) or has links to the real site.

    I’m sure I heard a rumour that Adobe had some kind of big and cunning plan for improving this in the next generation of Flash Player, but hey, they’ve been saying that for a while 🙂


March 2008

The Saj... "Dark Lord of the SWF"