Today I am reviewing the methods of exposing Flash content for Google indexing, along with my thoughts on where this should be going in the future.
Current Trends:
- SWFObject style: a division filled with HTML content that is replaced with the Flash movie if the Flash player exists.
  - The HTML content may be seen briefly before the Flash content is loaded on slower machines.
  - It is an open question whether Google respects such code, and what the risk is of Googlebot treating such methods as content hiding.
- Dual HTML/Flash content sites, using Flash only for animations, videos and other media.
  - Not suitable for all Flash-based sites, especially those that are closer to full-fledged applications or RIAs.
  - Once again limits the web to age-old, over-extended, cumbersome HTML.
  - Loses out on the full power of the Flash environment.
- Limited content extraction by Googlebot via Adobe’s swf2html SDK.
- No Flash (obviously not an acceptable option for RIA developers like myself)
Are there other means being implemented? Please chime in….
***
Where should we be going? Sadly, this is a ship stranded in the doldrums of a Sargasso Sea, with little that we mere sailors can do to free her.
The solution requires joint work on the part of Adobe and Google, and such is not within my power to make happen. I do have a thought about how to do it, but since I am not involved in the nitty-gritty inner workings of Flash itself, these thoughts are not much more than whimsical. I’ll share them anyway.
Flash recently moved to a very event-listener-based model. I believe this should be leveraged for Flash/Flex apps. My thought is to have a listener, call it echobot, that can be defined as active or not.
If echobot is active, it gives a spider a hook to listen on. Obviously, Adobe would have to work very closely with Google so that Google’s spider could be modified to listen. The echobot listener would essentially receive a simple collection of the current content, most likely as an XML structure. It would deliver text content along with basic formatting (i.e., font size, weight, etc.), which would allow Google to weigh the content a bit. It would also receive tabular info from data grids, including row content and global content (i.e., number of records, number of fields, etc.). Perhaps there would also be a means to define “alt” values for graphics, animations, effects, etc.
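To make the idea a bit more concrete, here is a minimal sketch of what such a content snapshot might look like. It is written in Python purely for illustration, since no such Flash API exists; the element names (`echobot`, `text`, `grid`, `row`, `cell`, `graphic`) and the `build_snapshot` function are my own assumptions, not anything Adobe or Google has defined.

```python
import xml.etree.ElementTree as ET


def build_snapshot(texts, grid=None, images=()):
    """Serialize the currently visible content into one XML document.

    All element and attribute names here are hypothetical.
    """
    root = ET.Element("echobot")

    # Text content, with basic formatting the indexer could use for weighting.
    for item in texts:
        node = ET.SubElement(
            root, "text", size=str(item["size"]), weight=item["weight"]
        )
        node.text = item["value"]

    # Tabular content: global info as attributes, row content as children.
    if grid is not None:
        fields = grid["fields"]
        rows = grid["rows"]
        grid_node = ET.SubElement(
            root, "grid", records=str(len(rows)), fields=str(len(fields))
        )
        for row in rows:
            row_node = ET.SubElement(grid_node, "row")
            for name, value in zip(fields, row):
                ET.SubElement(row_node, "cell", field=name).text = str(value)

    # "alt" values for graphics, animations and effects.
    for image in images:
        ET.SubElement(root, "graphic", alt=image["alt"])

    return ET.tostring(root, encoding="unicode")


snapshot = build_snapshot(
    texts=[{"value": "Spring Catalog", "size": 24, "weight": "bold"}],
    grid={"fields": ["sku", "price"], "rows": [["A1", 9.5], ["B2", 12.0]]},
    images=[{"alt": "Rotating product banner"}],
)
print(snapshot)
```

A real format would need agreement between Adobe and the indexers; the point is only that the payload can stay small and flat.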
Of course, there is another necessary aspect: the often-raised issue of tracking location within a Flash application. This would tie in to a URL/state definition, so that a spider could go through the various states much as it loads pages. The echobot listener would then deliver to the bot the content of the given state (or, in a traditional Flash app, the currently stopped frame).
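As a rough sketch of the state idea, the following Python models an application whose states are addressable by URL fragment, and a spider that walks them the way it would follow page links. The `StatefulApp` class, the `crawl` function and the fragment names are all hypothetical, invented only to illustrate the concept.

```python
class StatefulApp:
    """Hypothetical app whose states are addressable by URL fragment."""

    def __init__(self, states, start):
        # states maps a fragment to (content, fragments reachable from it).
        self.states = states
        self.current = start

    def go_to(self, fragment):
        if fragment not in self.states:
            raise KeyError(f"unknown state: {fragment}")
        self.current = fragment

    def echo(self):
        """Return the content and outgoing links of the current state."""
        return self.states[self.current]


def crawl(app, start):
    """Visit every reachable state once, breadth-first."""
    seen, queue, index = set(), [start], {}
    while queue:
        fragment = queue.pop(0)
        if fragment in seen:
            continue
        seen.add(fragment)
        app.go_to(fragment)
        content, links = app.echo()
        index[fragment] = content
        queue.extend(links)
    return index


app = StatefulApp(
    {
        "#/home": ("Welcome", ["#/products", "#/contact"]),
        "#/products": ("Product list", ["#/home", "#/products/42"]),
        "#/products/42": ("Widget details", ["#/products"]),
        "#/contact": ("Contact form", []),
    },
    start="#/home",
)
index = crawl(app, "#/home")
print(sorted(index))
```

The same fragments that let the spider reach a sub-state would also serve as bookmarkable URLs for human visitors.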
Now, clearly this would require some logistical changes, both internally to Flash and to Google’s bot. Adobe would need to devise the internals and then expose them as an API for Google (or any other indexer) to use.
Of course, there is a performance hit: you’re listening and delivering said content at all these event points. However, first, the echobot listener could be disabled outright. Second, even if enabled, it should not be active unless the spider passes it a signal saying “I’m here, I want to listen”. Only then should the listener be active.
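The two safeguards can be sketched as a simple gate. Again, this is Python for illustration only, and `EchobotListener`, `handshake` and `on_content_changed` are names I made up. The snapshot is passed as a callable so that the expensive work of building it happens only when a spider is actually listening.

```python
class EchobotListener:
    """Hypothetical listener: delivers content only to a spider that asked."""

    def __init__(self, enabled=True):
        self.enabled = enabled        # developer's on/off switch
        self.spider_present = False   # set only by a successful handshake
        self.sink = None              # where snapshots get delivered

    def handshake(self, sink):
        """The spider says "I'm here, I want to listen"."""
        if not self.enabled:
            return False
        self.sink = sink
        self.spider_present = True
        return True

    def on_content_changed(self, build_content):
        """Called at each event point; build_content takes no arguments."""
        if not (self.enabled and self.spider_present):
            return False  # nothing built, nothing sent
        self.sink(build_content())
        return True


calls = []
received = []


def expensive_snapshot():
    calls.append(1)  # count how often the snapshot is actually built
    return "<echobot/>"


listener = EchobotListener(enabled=True)
listener.on_content_changed(expensive_snapshot)  # ignored: no spider yet
listener.handshake(received.append)
listener.on_content_changed(expensive_snapshot)  # built and delivered
print(len(calls), received)
```

For ordinary visitors the cost is then a single boolean check per event.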
This would enable Flash sites to be indexed, even down to sub-states. It would also encourage much more use of internal states for purposes such as URL bookmarking.



