Allowing other search engines to index the Oracle B2C Service application
Answer ID 1669   |   Revised 01/06/2022

How can I modify the robots.txt file to allow search engines to index our site?

Environment:

Site Indexing
Spiders, Robots, Search Engine Site Indexing

Resolution:

For Oracle B2C Service sites, a robots.txt file is installed on each interface. The robots.txt file prevents random spider crawls of an Oracle B2C Service site. This file can be viewed at:

http(s)://<interface_name>.custhelp.com/robots.txt or
http(s)://vhostname/robots.txt

The default disallow file used on sites contains the following:

User-agent: *
Disallow: /

The default allow file contains the following:

User-agent: Googlebot # Google   # ADDED BY HMS
Disallow:                        # ADDED BY HMS

User-agent: MSNBot    # MSN      # ADDED BY HMS
Disallow:                        # ADDED BY HMS
Crawl-delay: 0.2  # ADDED BY HMS

User-agent: Slurp     # Yahoo!   # ADDED BY HMS
Disallow:                        # ADDED BY HMS
Crawl-delay: 0.2  # ADDED BY HMS

User-agent: TEOMA     # Ask.com  # ADDED BY HMS
Disallow:                        # ADDED BY HMS

User-agent: bingbot   # Bing     # ADDED BY HMS
Disallow:                        # ADDED BY HMS

User-agent: *                    # ADDED BY HMS
Disallow: /                      # ADDED BY HMS

 

The instructions to update the robots.txt file can be found in Answer ID 12254: Sitemap and robots.txt.

Note: When configuring your changes, please follow these formatting requirements.

  • There are default allow and default block robots.txt files. You can only ADD entries.
  • Every entry added must have the flag #CUSTOM at the end of the line.
  • Since the default uses disallow, it is usually easier to add disallow entries to get the desired end result (see the sketch after this list).
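
As a minimal sketch of these formatting requirements (the user-agent name and path below are purely illustrative placeholders, not values from the default files), added lines would be flagged like this:

User-agent: ExampleBot    #CUSTOM
Disallow: /app/error/     #CUSTOM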

There are some standard rules for configuring robots.txt files. For information about building, modifying, and maintaining robots.txt files, refer to http://www.robotstxt.org.  NOTE: Robots.txt files cannot be altered for community sites.

Most search engines code their spiders to look for these files and obey them. However, not all of them do, especially email harvesters, so the existence of a robots.txt file does not guarantee that a spider will not crawl your site.

 

Allowing your Site to be Indexed:
Customers can modify the robots.txt file to allow their site to be indexed by search engines such as Google, Yahoo, and Bing.

Many of these search engines have various robot agents that index sites. Some are standard (free), while others require content submission and/or are paid services.

Each of the major search engines has a section on its web site devoted to configuration parameters for its robots. There are also exclusion parameters that let some parts of a site be indexed while others are excluded.
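
For example, a sketch of a file that lets a site be indexed overall while excluding a couple of areas might look like the following (the paths are purely illustrative; any path not listed under Disallow remains open to indexing):

User-agent: *
Disallow: /ci/
Disallow: /app/error/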

Every legitimate robot should have online content devoted to its proper configuration so that you can determine how your robots.txt file should be updated. A few common examples follow (though there are several other search engines):

Examples for Google, Yahoo and Bing:


Prior to making changes:
It is important that you review the appropriate information regarding robot configuration for that engine. Then, determine the content of your robots.txt file according to what you want indexed. With this feature enabled, common search engines' web bots are added to the robots.txt file.

It is your responsibility as a customer to define the content of the robots.txt file if you wish to change it. In addition, if you have multiple interfaces, be sure to select which interfaces' robots.txt files are to be modified, since each interface has its own robots.txt file.

-  Note that search engines only read the entries until they find the entry that applies to them. So if the first entry in your robots.txt file is the default, it will allow every search engine to index your site and the entries that follow will be ignored, regardless of what they are.

-  Please also note that our system will sometimes add entries below what you have specified in your robots.txt file, but as long as the first entry is not changed, this will not affect indexing.
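
Putting these notes together, a sketch of a robots.txt aimed at allowing Google, Yahoo, and Bing while keeping all other crawlers blocked might look like the following. The bot names are taken from the default allow file shown earlier, and the wildcard group is placed last so that, per the ordering note above, it does not take precedence over the bot-specific groups; remember the #CUSTOM flag requirement if you add these entries yourself.

User-agent: Googlebot   # Google
Disallow:

User-agent: Slurp       # Yahoo!
Disallow:

User-agent: bingbot     # Bing
Disallow:

User-agent: *
Disallow: /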


Sitemaps: 
This feature enables search engines to easily catalog your site. For more information regarding sitemaps, refer to Answer ID 2553: Using a sitemap with our Oracle B2C Service application.
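
Search engines can also discover a sitemap directly from robots.txt through the standard Sitemap directive. A minimal sketch, assuming the sitemap is published at the root of the interface (confirm the actual sitemap URL for your site against Answer ID 2553):

Sitemap: https://<interface_name>.custhelp.com/sitemap.xml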

Additional information is available in the online documentation for the version your site is currently running. To access Oracle B2C Service manuals and online documentation, refer to the Documentation for Oracle B2C Service Products.

If you have questions about what generates a session and how you can avoid inaccurate session billing on your site, review the Demystifying Session Usage (PDF) document. Some simple customization and configuration steps can increase billable sessions. For more information, refer to Session Usage Information.

 

 

 

 
