TWI858021B

TWI858021B - Audio-based interactive system, interactive method, and interactive device

Info

Publication number: TWI858021B
Application number: TW109105728A
Authority: TW
Inventors: 陳軒德; 陳凱逸; 廖翌涵; 桂左易; 許維志
Original assignee: Compal Electronics, Inc. (仁寶電腦工業股份有限公司)
Priority date: 2019-02-21
Filing date: 2020-02-21
Publication date: 2024-10-11
Also published as: TW202034153A

Abstract

An audio-based interactive system, an interactive method, and an interactive device are provided. The interactive method includes: generating first sensed data by one of at least one sensor of the interactive device and a microphone of a smart speaker, wherein the at least one sensor includes at least one of a switch, a G-sensor, a color sensor, and a pressure sensor; communicatively connecting the interactive device to a cloud server via the smart speaker; and, in response to receiving the first sensed data, playing, by the cloud server, first audio data corresponding to the first sensed data via the smart speaker and a first image corresponding to the first sensed data via an image output interface.

Description

Audio-based interactive system, interactive method and interactive device

The present invention relates to a system, a method, and an electronic device, and in particular to an audio-based interactive system, interactive method, and interactive device.

With the development of Internet technology, more and more e-commerce companies and Internet service providers have begun to offer voice assistant services. A voice assistant is usually composed of a microphone and a speaker and can be communicatively connected to the Internet. Voice assistants can be controlled by cloud servers to interact with users. For example, a voice assistant can answer users' questions or provide users with information obtained from the Internet.

However, current voice assistants can only hold simple conversations with users and cannot interact with them further. Therefore, providing a way to interact with voice assistants in more diverse ways is one of the goals pursued by those skilled in the art.

The present invention provides an audio-based interactive system, an interactive method, and an interactive device that allow users to interact with a smart speaker through an interactive device equipped with a variety of sensors.

An audio-based interactive system of the present invention includes a cloud server, a smart speaker, and an interactive device. The cloud server stores first audio data and a first image. The smart speaker includes a microphone and is communicatively connected to the cloud server. The interactive device includes at least one sensor and an image output interface and is communicatively connected to the smart speaker, wherein the at least one sensor includes at least one of a button, an accelerometer, a color sensor, and a pressure sensor, and one of the at least one sensor and the microphone generates first sensing data; the interactive device is communicatively connected to the cloud server through the smart speaker; and, in response to receiving the first sensing data, the cloud server plays the first audio data corresponding to the first sensing data through the smart speaker and plays the first image corresponding to the first sensing data through the image output interface.

In an embodiment of the present invention, the cloud server further stores reference data, and the cloud server determines whether to play the first audio data and the first image according to the result of matching the first sensing data against the reference data.

In an embodiment of the present invention, the cloud server plays second audio data through the smart speaker and decides to play the first audio data and the first image in response to receiving the first sensing data within a preset time after the second audio data is played.

In an embodiment of the present invention, the at least one sensor of the interactive device includes a button, and the first sensing data includes a pulse signal. The image output interface is disposed on the surface of the button. The cloud server plays an image through the image output interface and, in response to receiving the pulse signal while the image is being played, plays the first audio data through the smart speaker and the first image through the image output interface.

In an embodiment of the present invention, the at least one sensor further includes a second microphone.

In an embodiment of the present invention, the at least one sensor of the interactive device includes an accelerometer, and the first sensing data includes a tilt value, wherein the cloud server plays the first audio data through the smart speaker and the first image through the image output interface in response to the tilt value being less than a threshold.

In an embodiment of the present invention, the first image corresponds to the tilt value.

In an embodiment of the present invention, the first sensing data further includes audio data, wherein the cloud server plays the first audio data through the smart speaker and the first image through the image output interface in response to the audio data matching reference data stored in the cloud server and the tilt value being less than the threshold.
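The text does not say how the tilt value is derived or how audio is "matched". A minimal sketch, assuming the tilt is the angle between the device's z axis and gravity taken from a static accelerometer reading, and assuming the audio has already been transcribed to text and is matched by case-insensitive string comparison; `tilt_degrees` and `should_play` are hypothetical names.

```python
import math


def tilt_degrees(ax: float, ay: float, az: float) -> float:
    """Angle between the device z axis and gravity, in degrees.

    Assumes the device is roughly at rest, so the reading is dominated by gravity.
    """
    g = math.sqrt(ax * ax + ay * ay + az * az)
    if g == 0.0:
        raise ValueError("zero acceleration vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, az / g))
    return math.degrees(math.acos(cos_theta))


def should_play(transcript: str, reference: str, tilt: float, threshold_deg: float) -> bool:
    """Trigger only when the spoken answer matches AND the device is held nearly level."""
    matched = transcript.strip().lower() == reference.strip().lower()
    return matched and tilt < threshold_deg
```

Requiring both conditions mirrors the embodiment above: a correct answer spoken while the device is tilted past the threshold does not trigger playback.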

In an embodiment of the present invention, the cloud server plays a countdown image through the image output interface and, in response to receiving the audio data before the countdown ends, plays the first audio data through the smart speaker and the first image through the image output interface.

In an embodiment of the present invention, the first sensing data includes audio data, and the cloud server further stores a speech recognition model and user information. The cloud server identifies the user corresponding to the audio data by using the speech recognition model to generate a recognition result, and updates the user information according to the recognition result.

In an embodiment of the present invention, the at least one sensor of the interactive device includes an accelerometer, and the first sensing data includes an acceleration value, wherein the cloud server plays the first audio data through the smart speaker and the first image through the image output interface in response to the acceleration value being greater than a threshold.

In an embodiment of the present invention, the at least one sensor includes a color sensor, and the first sensing data includes a color value.

In an embodiment of the present invention, the at least one sensor includes a pressure sensor, and the first sensing data includes a pressure value.

An audio-based interactive method of the present invention is applicable to an interactive system including a cloud server, a smart speaker, and an interactive device. The interactive method includes: generating first sensing data by one of at least one sensor of the interactive device and a microphone of the smart speaker, wherein the at least one sensor includes at least one of a button, an accelerometer, a color sensor, and a pressure sensor; communicatively connecting the interactive device to the cloud server through the smart speaker; and, in response to receiving the first sensing data, playing, by the cloud server, first audio data corresponding to the first sensing data through the smart speaker and a first image corresponding to the first sensing data through an image output interface of the interactive device.

An interactive device of the present invention is adapted to interact with a cloud server and a smart speaker, and includes a transceiver, at least one sensor, an image output interface, and a processor. The transceiver is communicatively connected to the smart speaker and, through the smart speaker, to the cloud server. The at least one sensor generates first sensing data. The processor is coupled to the transceiver, the at least one sensor, and the image output interface. The processor transmits the first sensing data through the transceiver, receives a first image corresponding to the first sensing data through the transceiver, and plays the first image through the image output interface, wherein the first sensing data instructs the cloud server to configure the smart speaker to play first audio data corresponding to the first sensing data.

Based on the above, the interactive system of the present invention provides users with an interactive device having multiple types of sensors. Users can interact with the smart speaker through the interactive device and thereby obtain a more diverse entertainment experience.

To make the content of the present invention easier to understand, the following embodiments are given as examples by which the present invention can actually be implemented. In addition, wherever possible, elements, components, or steps with the same reference numerals in the drawings and embodiments represent the same or similar parts.

To provide users with a more diverse entertainment experience, the present invention proposes an audio-based interactive system. FIG. 1 is a schematic diagram of an audio-based interactive system 10 according to an embodiment of the present invention. The interactive system 10 may include a cloud server 100, a smart speaker 200, and an interactive device 300, wherein the cloud server 100 may be located remotely, and the smart speaker 200 and the interactive device 300 may be located locally. The smart speaker 200 may be communicatively connected to both the cloud server 100 and the interactive device 300, so the cloud server 100 and the interactive device 300 may exchange signals with each other through the smart speaker 200.
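The topology described above makes the smart speaker the only node linked to both ends, so every message between the cloud server and the interactive device passes through it. A minimal sketch of that relay role, assuming a hypothetical `Message` record with a `dest` field; the actual transport and message format are not specified in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    source: str   # "cloud" or "device" (hypothetical labels)
    dest: str     # "cloud" or "device"
    payload: dict


@dataclass
class SmartSpeakerRelay:
    """Hypothetical relay: queues each message toward the side it is addressed to."""
    to_cloud: List[Message] = field(default_factory=list)
    to_device: List[Message] = field(default_factory=list)

    def forward(self, msg: Message) -> None:
        if msg.dest == "cloud":
            self.to_cloud.append(msg)
        elif msg.dest == "device":
            self.to_device.append(msg)
        else:
            raise ValueError(f"unknown destination: {msg.dest}")
```

For example, sensing data from the device is queued toward the cloud, and a display command from the cloud is queued toward the device.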

FIG. 2 is a schematic diagram of the cloud server 100 according to an embodiment of the present invention. The cloud server 100 may include a processor 110, a storage medium 120, and a transceiver 130.

The processor 110 is, for example, a central processing unit (CPU) or another programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field-programmable gate array (FPGA), or other similar component, or a combination of the above. The processor 110 may be coupled to the storage medium 120 and the transceiver 130, and may access and execute the modules and applications stored in the storage medium 120.

The storage medium 120 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), similar component, or a combination of the above, and is used to store modules and applications executable by the processor 110. In this embodiment, the storage medium 120 may store various audio data, as well as the reference data and program code used for interaction between the smart speaker 200 and the interactive device 300.

The transceiver 130 transmits and receives signals in a wireless or wired manner. The transceiver 130 may also perform operations such as low-noise amplification, impedance matching, mixing, up- or down-conversion, filtering, and amplification. The cloud server 100 may communicate with the smart speaker 200 or the interactive device 300 through the transceiver 130.

FIG. 3 is a schematic diagram of the smart speaker 200 according to an embodiment of the present invention. The smart speaker 200 may be used to play received audio data. The smart speaker 200 may include a processor 210, a storage medium 220, and a transceiver 230. In an embodiment, the smart speaker 200 may further include a microphone 240.

The processor 210 is, for example, a central processing unit or another programmable general-purpose or special-purpose micro control unit, microprocessor, digital signal processor, programmable controller, application-specific integrated circuit, graphics processing unit, image signal processor, image processing unit, arithmetic logic unit, complex programmable logic device, field-programmable gate array, or other similar component, or a combination of the above. The processor 210 may be coupled to the storage medium 220, the transceiver 230, and the microphone 240, and may access and execute the modules and applications stored in the storage medium 220.

The storage medium 220 is, for example, any type of fixed or removable random access memory, read-only memory, flash memory, hard disk drive, solid state drive, similar component, or a combination of the above, and is used to store modules and applications executable by the processor 210.

The transceiver 230 transmits and receives signals in a wireless or wired manner. The transceiver 230 may also perform operations such as low-noise amplification, impedance matching, mixing, up- or down-conversion, filtering, and amplification. The smart speaker 200 may communicate with the cloud server 100 or the interactive device 300 through the transceiver 230.

The microphone 240 may be used to receive sound. For example, the microphone 240 may convert a user's voice into corresponding audio data, which may then be uploaded to the cloud server 100.

FIG. 4 is a schematic diagram of the interactive device 300 according to an embodiment of the present invention. A user may perform audio-based interaction with the smart speaker 200 through the interactive device 300. The interactive device 300 is, for example, a handheld device. The interactive device 300 may include a processor 310, a storage medium 320, a transceiver 330, and a sensor 340. In an embodiment, the interactive device 300 may further include an image output interface 350. It is worth noting that the type and number of sensors 340 may be adjusted according to user needs, and the present invention is not limited thereto. For example, the interactive device 300 may include two or more types of sensors 340.

The processor 310 is, for example, a central processing unit or another programmable general-purpose or special-purpose micro control unit, microprocessor, digital signal processor, programmable controller, application-specific integrated circuit, graphics processing unit, image signal processor, image processing unit, arithmetic logic unit, complex programmable logic device, field-programmable gate array, or other similar component, or a combination of the above. The processor 310 may be coupled to the storage medium 320, the transceiver 330, the sensor 340, and the image output interface 350, and may access and execute the modules and applications stored in the storage medium 320.

The storage medium 320 is, for example, any type of fixed or removable random access memory, read-only memory, flash memory, hard disk drive, solid state drive, similar component, or a combination of the above, and is used to store modules and applications executable by the processor 310.

The transceiver 330 transmits and receives signals in a wireless or wired manner. The transceiver 330 may also perform operations such as low-noise amplification, impedance matching, mixing, up- or down-conversion, filtering, and amplification. The interactive device 300 may communicate with the cloud server 100 or the smart speaker 200 through the transceiver 330.

The sensor 340 is, for example, one or a combination of the following: a microphone, a button, an accelerometer, a gyroscope (angular velocity sensor), a magnetometer, a pressure sensor, or a color sensor, but the present invention is not limited thereto.

The image output interface 350 is, for example, a liquid-crystal display (LCD), a light-emitting diode (LED) display, an LED array, a vacuum fluorescent display (VFD), a plasma display panel (PDP), an organic light-emitting diode (OLED) display, or a field-emission display (FED), but the present invention is not limited thereto. The cloud server 100 may send a command to the interactive device 300 through the smart speaker 200 to instruct the image output interface 350 to play a corresponding image, where the image comes, for example, from the cloud server 100 or is pre-stored in the storage medium 320.
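Because the displayed image may either arrive from the cloud server or already be stored on the device, the device needs a rule for choosing between them. A minimal sketch of that choice, assuming a hypothetical command format `{"image_id": ..., "image_data": ...}` and assuming data pushed by the cloud takes precedence over a pre-stored copy; neither the format nor the precedence is specified in the text.

```python
from typing import Dict, Optional


def resolve_image(command: dict, local_store: Dict[str, bytes]) -> bytes:
    """Pick the frame to show on the image output interface.

    Hypothetical command format: {"image_id": str, "image_data": optional bytes}.
    Image data pushed by the cloud server wins; otherwise fall back to a
    pre-stored image looked up by ID in the device's storage.
    """
    data: Optional[bytes] = command.get("image_data")
    if data is not None:
        return data
    image_id = command["image_id"]
    if image_id not in local_store:
        raise KeyError(f"image {image_id!r} not pre-stored on device")
    return local_store[image_id]
```

Sending only an ID for pre-stored images keeps the traffic relayed by the smart speaker small.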

The smart speaker 200 may generate first sensing data through the microphone 240, or receive from the interactive device 300 first sensing data generated by the sensor 340. After obtaining the first sensing data, the smart speaker 200 may transmit it to the cloud server 100. In response to receiving the first sensing data, the cloud server 100 may play first audio data corresponding to the first sensing data through the smart speaker 200. The first sensing data may include audio data, an acceleration value, a tilt value, a pressure value, or a color value, but the present invention is not limited thereto.
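Since the first sensing data can be of several kinds, the cloud side has to route each kind to the logic that picks the audio to play. A minimal sketch, assuming a hypothetical `SensingData` record tagged with its kind and a table of handlers that return an audio clip ID; the fallback clip name is also hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class SensingData:
    kind: str                          # e.g. "audio", "acceleration", "tilt", "pressure", "color"
    value: Union[float, bytes, tuple]  # raw reading; its type depends on the kind


def dispatch(data: SensingData, handlers: Dict[str, Callable[[SensingData], str]]) -> str:
    """Return the ID of the audio clip the cloud server should play via the smart speaker."""
    handler = handlers.get(data.kind)
    if handler is None:
        return "clip_unsupported"  # hypothetical fallback clip
    return handler(data)
```

New sensor types can then be supported by registering another handler, without changing the relay or transport code.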

In an embodiment, the sensor 340 of the interactive device 300 may include a microphone. The smart speaker 200 may receive sound from the user through the microphone 240 to generate corresponding audio data, or receive from the interactive device 300 audio data generated by the microphone of the sensor 340. The smart speaker 200 may transmit the audio data to the cloud server 100, and the cloud server 100 may play corresponding audio data through the smart speaker 200 according to the received audio data.

For example, the cloud server 100 may play audio data to the user through the smart speaker 200 to ask the user a question. The user may answer the question through the microphone 240 or the sensor 340. The cloud server 100 may determine whether the user's answer matches reference data pre-stored in the storage medium 120 and, based on the matching result, decide whether to play audio data representing "correct" or "wrong" through the smart speaker 200. The cloud server 100 may also limit the user's answer time. For example, after asking the user a question, the cloud server 100 may play audio data representing "correct" through the smart speaker 200 in response to receiving a correct answer within a preset time, and play audio data representing "wrong" in response to not receiving a correct answer within the preset time. During the preset time, the cloud server 100 may play audio data through the smart speaker 200 to count down.
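The quiz round above combines two checks: the answer must match the reference data, and it must arrive within the preset time. A minimal sketch, assuming the answer has already been transcribed to text, the reference data is a set of accepted answers compared case-insensitively, and an answer arriving exactly at the limit still counts; `judge_answer` and `countdown_prompts` are hypothetical names.

```python
from typing import List, Optional, Set


def judge_answer(answer: Optional[str], elapsed_s: Optional[float],
                 accepted: Set[str], time_limit_s: float) -> str:
    """Return "correct" or "wrong" for one quiz round.

    answer and elapsed_s are None when nothing was received before the deadline.
    """
    if answer is None or elapsed_s is None or elapsed_s > time_limit_s:
        return "wrong"
    normalized = answer.strip().lower()
    return "correct" if normalized in {a.lower() for a in accepted} else "wrong"


def countdown_prompts(time_limit_s: int) -> List[int]:
    """Seconds to announce through the smart speaker during the preset time, e.g. 3, 2, 1."""
    return list(range(time_limit_s, 0, -1))
```

A late answer and a missing answer are treated the same way, which matches the "not received within the preset time" branch described above.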

The above embodiments allow the user to converse with the smart speaker 200 through the microphone of the interactive device 300 or the microphone 240 of the smart speaker 200. In this way, interactive activities such as quiz games, song-guessing games, or raising virtual pets can be realized.

In an embodiment, the smart speaker 200 may receive sound from the user through the microphone 240 to generate corresponding audio data, or may receive from the interactive device 300 audio data generated by the microphone of the sensor 340. The smart speaker 200 may then amplify the power of the audio data through its internal amplifier circuit to generate amplified audio data, and play the amplified audio data. In this way, the user may use the smart speaker 200 as a loudspeaker.
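The text describes amplification by an analog amplifier circuit. As a digital analogue only (a substitution, not the circuit the text describes), a gain stage on 16-bit PCM samples can be sketched as below, assuming signed 16-bit samples; clipping to the int16 range prevents loud input from wrapping around into noise.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def amplify_pcm16(samples: List[int], gain: float) -> List[int]:
    """Scale signed 16-bit PCM samples by `gain`, clipping to the int16 range."""
    return [max(INT16_MIN, min(INT16_MAX, int(round(s * gain)))) for s in samples]
```

For example, doubling `[100, -200, 20000]` yields `[200, -400, 32767]`, since the last sample saturates.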

In an embodiment, the storage medium 120 of the cloud server 100 may store a speech recognition model and user information for one or more users of the interactive system 10. After receiving audio data from the smart speaker 200, the cloud server 100 may use the speech recognition model to identify the user corresponding to the audio data and generate a recognition result. The cloud server 100 may then update that user's user information according to the recognition result. For example, after the cloud server 100 plays a question through the smart speaker 200, the smart speaker 200 may receive audio data of person A's answer and forward it to the cloud server 100. The cloud server 100 determines through the speech recognition model that the received audio data corresponds to person A. Then, in response to the audio data representing the correct answer, the cloud server 100 may update person A's user information, for example, by increasing person A's score.
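The text does not describe the speech recognition model. One common way to identify a speaker, shown here as an assumed substitute, is to compare a voice embedding against enrolled voiceprints by cosine similarity and accept the best match above a threshold. The sketch assumes embeddings are already computed; `identify`, `update_score`, and the 0.8 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(embedding: List[float], enrolled: Dict[str, List[float]],
             min_sim: float = 0.8) -> Optional[str]:
    """Return the enrolled user with the most similar voiceprint, or None if all fall below min_sim."""
    best_user, best_sim = None, min_sim
    for user, ref in enrolled.items():
        sim = cosine(embedding, ref)
        if sim >= best_sim:
            best_user, best_sim = user, sim
    return best_user


def update_score(scores: Dict[str, int], user: Optional[str], correct: bool, points: int = 1) -> None:
    """Add points to the identified user's record when the answer is correct."""
    if user is not None and correct:
        scores[user] = scores.get(user, 0) + points
```

Returning None for an unrecognized voice keeps a bystander's answer from being credited to an enrolled player.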

In an embodiment, the sensor 340 of the interactive device 300 may include an accelerometer, which may be used to detect the acceleration value of the interactive device 300. The smart speaker 200 may transmit the acceleration value received from the interactive device 300 to the cloud server 100. In response to the acceleration value being greater than a threshold, the cloud server 100 may play audio data corresponding to the acceleration value through the smart speaker 200.
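The text compares "the acceleration value" with a threshold without saying how a three-axis reading becomes a single value. A minimal sketch, assuming the value is the magnitude of the raw vector in m/s² with gravity not removed, so the threshold must sit well above about 9.81 to ignore a device at rest; `motion_clip` and the clip ID are hypothetical.

```python
import math
from typing import Optional


def acceleration_magnitude(ax: float, ay: float, az: float) -> float:
    """Euclidean norm of a raw three-axis accelerometer reading (gravity included)."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def motion_clip(ax: float, ay: float, az: float, threshold: float,
                clip_id: str = "whoosh") -> Optional[str]:
    """Return the clip to play when the magnitude exceeds the threshold, else None."""
    return clip_id if acceleration_magnitude(ax, ay, az) > threshold else None
```

For the ball and sword examples, a threshold of, say, 15 m/s² would ignore the device lying still but react to a throw or swing.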

For example, the interactive device 300 may be a ball. When the user throws the interactive device 300, the sensor 340 in the interactive device 300 may sense an acceleration value. When the acceleration value is greater than the threshold, the cloud server 100 may play audio data related to the movement of the ball through the smart speaker 200.

As another example, the interactive device 300 may be a sword-shaped toy. When the user swings the interactive device 300, the sensor 340 in the interactive device 300 may sense an acceleration value. When the acceleration value is greater than the threshold, the cloud server 100 may play audio data related to the movement of the sword-shaped toy through the smart speaker 200.

In an embodiment, the sensor 340 of the interactive device 300 may include a color sensor, which may be used to detect the color of an object. For example, the interactive device 300 may be a color-detecting pen. Suppose the user has a board on which color blocks of several different colors are printed. The cloud server 100 may instruct the user, through the smart speaker 200, to point the interactive device 300 at a color block of a particular color on the board. After the user points the interactive device 300 at a specific color block, the interactive device 300 may generate a color value corresponding to that block and transmit it to the cloud server 100 through the smart speaker 200. The cloud server 100 may determine from the color value whether the user's operation is correct, and play audio data representing "correct" or "wrong" through the smart speaker 200.
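A raw color reading rarely equals the printed color exactly, so the cloud server needs some tolerance when judging the color-pen game. A minimal sketch, assuming the color value is an (R, G, B) tuple and the reading is classified by nearest palette entry under squared Euclidean distance in RGB space; the palette and function names are hypothetical.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]


def nearest_color(reading: RGB, palette: Dict[str, RGB]) -> str:
    """Name of the palette entry closest to the sensor reading (squared RGB distance)."""
    return min(palette, key=lambda name: sum((r - p) ** 2 for r, p in zip(reading, palette[name])))


def check_color(reading: RGB, target: str, palette: Dict[str, RGB]) -> str:
    """Return "correct" if the pen is pointed at the requested color block, else "wrong"."""
    return "correct" if nearest_color(reading, palette) == target else "wrong"
```

RGB distance is a simple choice; a perceptual color space would separate similar hues better but is beyond this sketch.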

In an embodiment, the sensor 340 of the interactive device 300 may include a pressure sensor, which may be used to detect a pressure value applied to the interactive device 300. The smart speaker 200 may transmit the pressure value received from the interactive device 300 to the cloud server 100. In response to receiving the pressure value, the cloud server 100 plays audio data corresponding to the pressure value through the smart speaker 200.

For example, the interactive device 300 may be a keyboard instrument. When the user presses a key on the interactive device 300, the sensor 340 may sense a pressure value. The smart speaker 200 may forward the pressure value to the cloud server 100, and in response to receiving it, the cloud server 100 may play audio data corresponding to the key and/or the pressure value through the smart speaker 200.
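For the keyboard example, the audio depends on which key was pressed and how hard. A minimal sketch, assuming (hypothetically) that keys map to MIDI-style note numbers starting from middle C (60) and that pressure is scaled linearly to a MIDI-style velocity of 1 to 127, saturating at an assumed `max_pressure`; the units and ranges are not given in the text.

```python
def key_event(key_index: int, pressure: float, max_pressure: float = 10.0,
              base_note: int = 60) -> dict:
    """Map a key press to a MIDI-style note number and velocity (1 to 127)."""
    if pressure <= 0:
        raise ValueError("pressure must be positive for a key press")
    ratio = min(pressure / max_pressure, 1.0)   # saturate at max_pressure
    velocity = max(1, round(ratio * 127))       # never send velocity 0 for a real press
    return {"note": base_note + key_index, "velocity": velocity}
```

The cloud server could then pick or scale a sampled note so that harder presses play louder, covering the "key and/or pressure value" case.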

As another example, the interactive device 300 may be a percussion instrument. When the user strikes the interactive device 300 (e.g., its drumhead), the sensor 340 may sense a pressure value. The smart speaker 200 may forward the pressure value to the cloud server 100, and in response to receiving it, the cloud server 100 may play audio data corresponding to the pressure value through the smart speaker 200.

In an embodiment, the sensor 340 of the interactive device 300 may include a button. When the user presses the sensor 340, the sensor 340 may generate a pulse signal. The interactive device 300 may transmit the pulse signal to the cloud server 100 through the smart speaker 200. In response to receiving the pulse signal, the cloud server 100 may play audio data corresponding to the pulse signal through the smart speaker 200.

FIG. 5 is a schematic diagram of one form of the interactive device 300 according to an embodiment of the present invention. In this embodiment, the interactive device 300 may include the sensor 340 and an image output interface 350. The sensor 340 may be, for example, a button. When the user presses the sensor 340, the sensor 340 may generate a pulse signal, and the interactive device 300 may forward the pulse signal to the cloud server 100 through the smart speaker 200. The image output interface 350, for example an LED array, may be disposed on the surface of the sensor 340. Under the control of the processor 310 or the cloud server 100, the image output interface 350 may play images received from the cloud server 100 or pre-stored in the storage medium 320.

FIG. 6 is a schematic diagram of images output through the image output interface 350 according to an embodiment of the present invention. As shown in FIG. 6, the image output interface 350 may output, for example, animal patterns (e.g., a cat or a dog), geometric shapes (e.g., a heart, a circle, or a square), numbers (e.g., 22), or emoticons (e.g., a smiling face or a crying face), but the present invention is not limited thereto.

In one embodiment, the cloud server 100 may control, through the smart speaker 200, the image output interface 350 to play a countdown image. For example, the cloud server 100 may play audio data through the smart speaker 200 to ask the user a question, and control the image output interface 350 to play a countdown image. If the cloud server 100 receives, through the smart speaker 200, audio data corresponding to the correct answer before the countdown ends, the cloud server 100 may determine that the user's answer is correct and accordingly play audio data corresponding to "correct" through the smart speaker 200. Conversely, if the cloud server 100 does not receive audio data corresponding to the correct answer before the countdown ends, the cloud server 100 may determine that the user's answer is wrong and accordingly play audio data corresponding to "wrong" through the smart speaker 200.
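The countdown rule above reduces to two checks: did an answer arrive before the countdown ended, and does it match the correct answer. A minimal sketch follows, assuming the speech has already been recognized into text and that case-insensitive comparison is acceptable; `judge_answer` and its parameters are hypothetical names.

```python
from typing import Optional


def judge_answer(answer: Optional[str], answered_at: Optional[float],
                 correct_answer: str, countdown_s: float) -> str:
    """Return "correct" only if the right answer arrives before the countdown ends.

    answered_at is seconds since the question was asked (None = no answer).
    Case-insensitive matching of recognized text is an assumption.
    """
    if answer is None or answered_at is None or answered_at >= countdown_s:
        return "wrong"  # no answer, or the answer came after the countdown ended
    if answer.strip().lower() == correct_answer.strip().lower():
        return "correct"
    return "wrong"
```

With a 10-second countdown, `judge_answer("Cat", 3.0, "cat", 10.0)` is judged correct, while the same answer at 12 seconds is judged wrong.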

In one embodiment, in response to receiving sensed data (or audio data) from the smart speaker 200 while an image is being played, the cloud server 100 may play audio data corresponding to the sensed data through the smart speaker 200. For example, the cloud server 100 may play audio data through the smart speaker 200 instructing the user to press the sensor 340 (for example, a button) while the image output interface 350 is outputting a specific image. Then, in response to receiving, during the period in which the image output interface 350 displays the specific image, the pulse signal generated by the user pressing the sensor 340, the cloud server 100 may play audio data through the smart speaker 200 indicating that the user's response is correct.

In one embodiment, in response to receiving sensed data (or audio data) from the smart speaker 200 while an image is being played, the cloud server 100 may instruct, through the smart speaker 200, the image output interface 350 to play an image corresponding to the sensed data. For example, the cloud server 100 may play audio data through the smart speaker 200 instructing the user to press the sensor 340 (for example, a button) while the image output interface 350 is outputting a specific image. Then, in response to receiving the pulse signal generated by the user pressing the sensor 340 during the period in which the image output interface 350 outputs the specific image, the cloud server 100 may determine that the user pressed the sensor 340 at the correct time. Accordingly, the cloud server 100 may control the image output interface 350 through the smart speaker 200 to display an image corresponding to "correct", for example a smiley face. In addition, the cloud server 100 may play audio data corresponding to "correct" through the smart speaker 200.
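The timing check in these two embodiments can be sketched as a test of whether the press timestamp falls inside the interval during which the specific image is shown, followed by choosing both an audio and an image reply. This is a hypothetical sketch: the half-open interval, the clip names and the `crying_face` image for a wrong press are assumptions (the text only specifies the smiley face for a correct press).

```python
from typing import Dict, Optional


def feedback_for_press(press_t: Optional[float], window_start: float,
                       window_end: float) -> Dict[str, str]:
    """Pick the audio and image replies for a button press.

    The specific image is assumed to be shown on [window_start, window_end);
    the clip and image names are hypothetical placeholders.
    """
    in_window = press_t is not None and window_start <= press_t < window_end
    if in_window:
        return {"audio": "correct.wav", "image": "smiley"}
    return {"audio": "wrong.wav", "image": "crying_face"}
```

A press at 2.5 s against a window of 2.0 to 3.0 s yields the smiley face; a press exactly at 3.0 s, or no press at all, does not.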

FIG. 7 is a flowchart of a method for executing a quiz game according to an embodiment of the present invention. The method may be implemented by the interactive system 10 shown in FIG. 1, in which the interactive device 300 is, for example, the interactive device 300 shown in FIG. 5.

In step S701, the cloud server 100 may detect audio data corresponding to a call from the user. In this embodiment, any audio data obtained by the cloud server 100 is generated by, for example, the microphone 240 of the smart speaker 200 or a microphone of the interactive device 300. If the audio data comes from a microphone of the interactive device 300, the sensor 340 of the interactive device 300 shown in FIG. 5 may further include a microphone in addition to the button.

In step S702, the cloud server 100 may play audio data through the smart speaker 200 to respond to the user's call. In addition, the cloud server 100 may instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an image responding to the user's call.

In step S703, the cloud server 100 may start an interactive process, for example a quiz game, in response to the user's call.

In step S704, according to a script pre-stored in the storage medium 120, the cloud server 100 may play audio data through the smart speaker 200 to explain the game to the user and ask the user to select a difficulty level. In addition, the cloud server 100 may instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an image for the user to select the difficulty level.

In step S705, the cloud server 100 may determine the difficulty level selected by the user based on audio data from the smart speaker 200.

In step S706, the cloud server 100 may play audio data through the smart speaker 200 to confirm with the user whether to start the quiz game.

In step S707, the cloud server 100 may determine, based on audio data from the smart speaker 200, that the user agrees to start the quiz game.

In step S708, the cloud server 100 may start playing countdown audio data through the smart speaker 200, and may further play a countdown image through the image output interface 350 of the interactive device 300.

In step S709, after the countdown ends, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to ask the user a question or give an instruction. For example, the cloud server 100 may instruct the user, through the smart speaker 200, to press the sensor 340 on the interactive device 300 when the image output interface 350 of the interactive device 300 displays a specific animal.

In step S710, the cloud server 100 may control, through the smart speaker 200, the image output interface 350 of the interactive device 300 to randomly display a number of different images, including, for example, images of various animals, numbers, or symbols. The cloud server 100 may further play a number of different pieces of audio data at random through the smart speaker 200, including, for example, the sounds of various animals.

In step S711, the cloud server 100 may receive, from the smart speaker 200, the pulse signal generated by the sensor 340, and determine from the pulse signal whether the user pressed the sensor 340 at the correct time. For example, if the user presses the sensor 340 while the image output interface 350 displays the image of the specific animal designated by the cloud server 100, the cloud server 100 may determine that the user's response is correct. If the user presses the sensor 340 while the image output interface 350 displays an image of another animal not designated by the cloud server 100, or does not press the sensor 340 within a preset time, the cloud server 100 may determine that the user's response is wrong.
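Steps S710 and S711 together amount to showing a random sequence of images and then classifying one button press against that sequence. The sketch below assumes fixed-length display slots, a seeded `random.Random` for reproducibility, and that the designated image is inserted at least once; `make_schedule`, `judge_press` and the slot layout are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Slot = Tuple[float, float, str]  # (start s, end s, image name) shown on the LED array


def make_schedule(images: List[str], target: str, slots: int,
                  slot_s: float, rng: random.Random) -> List[Slot]:
    """Randomly order images for S710, guaranteeing the target appears at least once."""
    names = [rng.choice(images) for _ in range(slots - 1)] + [target]
    rng.shuffle(names)
    return [(i * slot_s, (i + 1) * slot_s, n) for i, n in enumerate(names)]


def judge_press(schedule: List[Slot], target: str,
                press_t: Optional[float], timeout_s: float) -> str:
    """Apply the S711 rule to a single button press (None = no press)."""
    if press_t is None or press_t >= timeout_s:
        return "wrong"  # not pressed within the preset time
    for start, end, name in schedule:
        if start <= press_t < end:
            return "correct" if name == target else "wrong"
    return "wrong"      # pressed while no image was being shown
```

For a fixed schedule of cat, dog, cat in one-second slots with "dog" as the target, a press at 1.5 s is correct, a press at 0.5 s is wrong, and no press at all is wrong.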

In step S712, in response to determining that the user's response is correct, the cloud server 100 may play audio data corresponding to "correct" through the smart speaker 200, and may also instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an image corresponding to "correct".

In step S713, the cloud server 100 may receive, through the smart speaker 200, audio data requesting a change to the difficulty level of the quiz game.

In step S714, the cloud server 100 may play audio data through the smart speaker 200 asking the user which difficulty level to select.

In step S715, the cloud server 100 may receive, through the smart speaker 200, audio data selecting the difficulty level.

In step S716, the cloud server 100 may play audio data through the smart speaker 200 to confirm with the user whether to start the quiz game.

In step S717, the cloud server 100 may determine, based on audio data from the smart speaker 200, that the user agrees to start the quiz game.

In step S718, the cloud server 100 may play countdown audio data through the smart speaker 200, and may also instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to play a countdown image.

In step S719, after the countdown ends, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to ask the user a question or give an instruction.

In step S720, the cloud server 100 may control, through the smart speaker 200, the image output interface 350 of the interactive device 300 to randomly display a number of different images, including, for example, images of various animals, numbers, or symbols. The cloud server 100 may further play a number of different pieces of audio data at random through the smart speaker 200, including, for example, the sounds of various animals.

In step S721, the cloud server 100 may receive, from the smart speaker 200, the pulse signal generated by the sensor 340, and determine from the pulse signal whether the user pressed the sensor 340 at the correct time. For example, if the user presses the sensor 340 while the image output interface 350 displays the image of the specific animal designated by the cloud server 100, the cloud server 100 may determine that the user's response is correct. If the user presses the sensor 340 while the image output interface 350 displays an image of another animal not designated by the cloud server 100, or does not press the sensor 340 within a preset time, the cloud server 100 may determine that the user's response is wrong.

In step S722, in response to determining that the user's response is correct, the cloud server 100 may play audio data corresponding to "correct" through the smart speaker 200, and may also instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an image corresponding to "correct".

In step S723, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to ask the user a question or give an instruction.

In step S724, in response to receiving from the smart speaker 200 audio data corresponding to a termination command (for example, audio generated by the microphone 240 of the smart speaker 200), the cloud server 100 may determine that the user has instructed the interactive system 10 to terminate the ongoing quiz game.

In step S725, in response to determining that the user's response is wrong, the cloud server 100 may play audio data corresponding to "wrong" through the smart speaker 200, and may also instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an image corresponding to "wrong".

In step S726, the cloud server 100 may play audio data through the smart speaker 200 asking whether the user is ready for a penalty game.

In step S727, the cloud server 100 may determine, based on audio data from the smart speaker 200, that the user agrees to start the penalty game.

In step S728, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to ask the user which type of penalty game to play.

In step S729, the cloud server 100 may determine the penalty game selected by the user based on audio data from the smart speaker 200.

In step S730, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to instruct the user to perform the penalty game. For example, the cloud server 100 may instruct the user, through the smart speaker 200, to do push-ups.

In step S731, the interactive system 10 may end the quiz game. Before ending the quiz game, the cloud server 100 may play audio data through the smart speaker 200, and may also instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to play an image, to notify the user that the quiz game is about to end.

FIG. 8 is a schematic diagram of another form of the interactive device 300 according to an embodiment of the present invention. In this embodiment, the interactive device 300 may include the sensor 340 and the image output interface 350. The sensor 340 may be, for example, an accelerometer. When the interactive device 300 is tilted, the sensor 340 may detect the tilt of the interactive device 300 and generate a corresponding tilt value. The image output interface 350 may be, for example, an LED array. Under the control of the processor 310 or the cloud server 100, the image output interface 350 may play images received from the cloud server 100 or pre-stored in the storage medium 320, and the displayed images may be associated with the tilt value.
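The specification does not say how the accelerometer reading becomes a "tilt value". One common choice, assumed here purely for illustration, is the angle between the device's z axis and the measured gravity vector, computed from a single three-axis sample.

```python
import math


def tilt_degrees(ax: float, ay: float, az: float) -> float:
    """Angle between the device's z axis and gravity, from one accelerometer sample.

    Assumes the device reads roughly (0, 0, 1 g) when held level, so a return
    value of 0 means perfectly balanced. Units of ax/ay/az cancel out.
    """
    horizontal = math.hypot(ax, ay)  # gravity component in the x-y plane
    return math.degrees(math.atan2(horizontal, az))
```

A level device gives 0 degrees, equal horizontal and vertical components give about 45 degrees, and a device standing on its edge gives about 90 degrees. Using `atan2` keeps the result well defined even when `az` is zero.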

In one embodiment, the interactive device 300 may transmit the tilt value to the cloud server 100 through the smart speaker 200. Based on the tilt value, the cloud server 100 may send a command to the interactive device 300 through the smart speaker 200, instructing the image output interface 350 to play an image corresponding to the tilt value. In another embodiment, the processor 310 of the interactive device 300 may itself send a control command to the image output interface 350 based on the tilt value, instructing the image output interface 350 to play the image corresponding to the tilt value.

In one embodiment, the cloud server 100 may ask the user a question through the smart speaker 200, and the user answers the question while keeping the interactive device 300 balanced. The cloud server 100 may receive, through the smart speaker 200, audio data corresponding to the user's answer as well as the tilt value generated by the sensor 340 of the interactive device 300. If the cloud server 100 determines that the user kept the interactive device 300 balanced, so that the tilt value generated by the sensor 340 is less than a threshold, and that the user's answer matches the reference data pre-stored in the storage medium 120 of the cloud server 100, the cloud server 100 may play audio data corresponding to "correct" through the smart speaker 200. Conversely, if the cloud server 100 determines that the user failed to keep the interactive device 300 balanced, so that the tilt value generated by the sensor 340 is greater than or equal to the threshold, or that the user's answer does not match the reference data in the storage medium 120, the cloud server 100 may play audio data corresponding to "wrong" through the smart speaker 200.
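The decision in this embodiment is a conjunction: the answer counts as correct only if the tilt stays strictly below the threshold and the answer matches the reference data. In the sketch below, the text normalization used for matching is an assumption, and `judge_balanced_answer` is a hypothetical name.

```python
def judge_balanced_answer(answer: str, reference: str,
                          tilt: float, threshold: float) -> str:
    """Correct only if the device stayed balanced AND the answer matches.

    A tilt equal to the threshold already counts as losing balance, matching
    the "greater than or equal to" condition in the text.
    """
    balanced = tilt < threshold
    matches = answer.strip().lower() == reference.strip().lower()  # assumed normalization
    return "correct" if balanced and matches else "wrong"
```

With a threshold of 10, a matching answer at a tilt of 5 is correct, while the same answer at a tilt of exactly 10 is wrong.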

In one embodiment, based on the tilt value, the cloud server 100 may instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an object corresponding to the tilt value, for example an image of a ball. The smaller the tilt value, the closer to the center of the image output interface 350 the cloud server 100 may instruct the image output interface 350 to display the object; the larger the tilt value, the farther from the center the object is displayed.

FIG. 9 is a schematic diagram of images corresponding to tilt values output through the image output interface 350 according to an embodiment of the present invention. As shown in FIG. 9, images 361 and 362 correspond to smaller tilt values, and images 363 and 364 correspond to larger tilt values. Image 362 is closer to the center of the image output interface 350 than image 361, indicating that the tilt value corresponding to image 362 is smaller than that corresponding to image 361. The cloud server 100 may instruct the image output interface 350 to change the appearance of the displayed image in response to the tilt value being greater than the threshold. For example, the cloud server 100 may instruct the image output interface 350 to display images 361 and 362, which correspond to tilt values less than or equal to the threshold, in a lighter color, and to display images 363 and 364, which correspond to tilt values greater than the threshold, in a darker color.
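The ball display of FIG. 9 can be sketched as two small functions: one mapping a two-axis tilt to a pixel on the LED array (zero tilt at the center, larger tilt farther out) and one choosing a lighter or darker color around the threshold. The 9 x 9 grid (chosen so there is a true center pixel), the linear scaling, the `max_tilt` clamp and the color labels are all assumptions made for illustration.

```python
from typing import Tuple


def ball_pixel(tilt_x: float, tilt_y: float, max_tilt: float,
               size: int = 9) -> Tuple[int, int]:
    """Map a two-axis tilt (degrees) to a (column, row) on a size x size LED grid.

    Zero tilt lands on the center pixel; larger tilt pushes the ball toward
    the edge in the direction of the tilt. Grid size and linear scaling are
    assumptions.
    """
    center = (size - 1) / 2

    def axis(t: float) -> int:
        t = max(-max_tilt, min(max_tilt, t))  # clamp to the displayable range
        return round(center + t / max_tilt * center)

    return axis(tilt_x), axis(tilt_y)


def ball_color(tilt: float, threshold: float) -> str:
    """Lighter color at or below the threshold, darker color above it."""
    return "light" if tilt <= threshold else "dark"
```

With `max_tilt=30`, a level device puts the ball at `(4, 4)`, a 15-degree tilt along x moves it to `(6, 4)`, and anything beyond 30 degrees pins it to the edge.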

FIG. 10 is a flowchart of another method for executing a quiz game according to an embodiment of the present invention. The method may be implemented by the interactive system 10 shown in FIG. 1, in which the interactive device 300 is, for example, the interactive device 300 shown in FIG. 8.

In step S1001, the cloud server 100 may detect audio data corresponding to a call from the user. In this embodiment, any audio data obtained by the cloud server 100 is generated by, for example, the microphone 240 of the smart speaker 200 or a microphone of the interactive device 300. If the audio data comes from a microphone of the interactive device 300, the sensor 340 of the interactive device 300 shown in FIG. 8 may further include a microphone in addition to the accelerometer.

In step S1002, the cloud server 100 may play audio data through the smart speaker 200 to respond to the user's call. In addition, the cloud server 100 may instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an image responding to the user's call.

In step S1003, the cloud server 100 may start an interactive process, for example a quiz game, in response to the user's call.

In step S1004, according to the script pre-stored in the storage medium 120, the cloud server 100 may play audio data through the smart speaker 200 to explain the game to the user and ask the user to select a difficulty level. In addition, the cloud server 100 may instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to display an image for the user to select the difficulty level.

In step S1005, the cloud server 100 may determine the difficulty level selected by the user based on audio data from the smart speaker 200.

In step S1006, the cloud server 100 may play audio data through the smart speaker 200 to ask the user how long the game should last.

In step S1007, the cloud server 100 may determine the game duration selected by the user based on audio data from the smart speaker 200.

In step S1008, the cloud server 100 may play audio data through the smart speaker 200 to ask the user whether the game rules should be explained.

In step S1009, the cloud server 100 may determine, based on audio data from the smart speaker 200, whether to explain the game rules to the user.

In step S1010, in response to determining that the game rules should be explained, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to explain the game rules to the user.

In step S1011, the cloud server 100 may determine, based on audio data from the smart speaker 200, that the user agrees to start the quiz game.

In step S1012, the cloud server 100 may start playing countdown audio data through the smart speaker 200, and may further play a countdown image through the image output interface 350 of the interactive device 300.

In step S1013, after the countdown ends, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to ask the user a question.

In step S1014, the cloud server 100 may receive, from the smart speaker 200, audio data corresponding to the user's answer.

In step S1015, the cloud server 100 may determine whether the user's answer matches the reference data pre-stored in the storage medium 120 of the cloud server 100 and, in response to the answer matching the reference data, record the user's number of correct answers in that user's user data in the storage medium 120. The cloud server 100 may then play audio data through the smart speaker 200 instructing the user to pass the interactive device 300 to the next user, who will answer the next question.

In step S1016, the cloud server 100 may receive, from the smart speaker 200, audio data corresponding to the user's answer.

In step S1017, the cloud server 100 may determine whether the user's answer matches the reference data pre-stored in the storage medium 120 of the cloud server 100 and, in response to the answer not matching the reference data, record the user's number of wrong answers in that user's user data in the storage medium 120. The cloud server 100 may then play audio data through the smart speaker 200 instructing the same user to continue with the next question.
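The bookkeeping in steps S1015 and S1017 (and again in S1019 to S1022) can be sketched as a per-user tally plus a turn pointer: a correct answer passes the device to the next user, a wrong answer keeps it with the same user. The sketch is hypothetical; the round-robin player order and the in-memory dictionary standing in for the user data in storage medium 120 are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


class PassTheDeviceGame:
    """Track per-user correct/wrong counts and whose turn it is.

    A correct answer passes the device to the next user; a wrong answer keeps
    it with the same user. Player order and the in-memory dict (instead of
    storage medium 120) are assumptions.
    """

    def __init__(self, players: List[str]) -> None:
        self.players = players
        self.turn = 0
        self.records: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {"correct": 0, "wrong": 0})

    @property
    def current(self) -> str:
        return self.players[self.turn]

    def record(self, answer_matches: bool) -> str:
        """Log the result for the current user and return who answers next."""
        key = "correct" if answer_matches else "wrong"
        self.records[self.current][key] += 1
        if answer_matches:
            self.turn = (self.turn + 1) % len(self.players)  # pass the device on
        return self.current
```

With players A and B, a correct answer from A hands the device to B, a wrong answer from B keeps it with B, and B's next correct answer returns it to A.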

在步驟S1018，雲端伺服器100可根據預存於儲存媒體120的腳本而通過智慧音箱200來播放音訊資料以向使用者提出問題。接著，雲端伺服器100可自智慧音箱200接收對應於使用者的回答的音訊資料，並且判斷使用者的回答與預存於雲端伺服器100的儲存媒體120的參考資料是否匹配。若使用者的回答與參考資料匹配，則進入步驟S1019。若使用者的回答與參考資料不匹配，則進入步驟S1020。In step S1018, the cloud server 100 may play audio data through the smart speaker 200 according to the script pre-stored in the storage medium 120 to ask the user a question. Then, the cloud server 100 may receive audio data corresponding to the user's answer from the smart speaker 200, and determine whether the user's answer matches the reference data pre-stored in the storage medium 120 of the cloud server 100. If the user's answer matches the reference data, then proceed to step S1019. If the user's answer does not match the reference data, then proceed to step S1020.

在步驟S1019，雲端伺服器100可響應於使用者的回答與參考資料匹配而在儲存媒體120中的所述使用者的使用者資料中記錄所述使用者的回答正確的次數。In step S1019, the cloud server 100 may record the number of correct answers of the user in the user data of the user in the storage medium 120 in response to the user's answer matching the reference data.

在步驟S1020，雲端伺服器100響應於使用者的回答與參考資料不匹配而在儲存媒體120中的所述使用者的使用者資料中記錄所述使用者的回答錯誤的次數。In step S1020 , the cloud server 100 records the number of incorrect answers of the user in the user data of the user in the storage medium 120 in response to the user's answer not matching the reference data.

在步驟S1021，雲端伺服器100可通過智慧音箱200播放代表「正確」的音訊資料。另一方面，雲端伺服器100可通過智慧音箱200指示互動裝置300的影像輸出介面350播放代表「正確」的影像。而後，雲端伺服器100可通過智慧音箱200播放音訊資料，以提示使用者將互動裝置300傳遞給下一位使用者。此外，雲端伺服器100可通過智慧音箱200指示互動裝置300的影像輸出介面350播放影像，以提示使用者將互動裝置300傳遞給下一位使用者。In step S1021, the cloud server 100 may play audio data representing "correct" through the smart speaker 200. On the other hand, the cloud server 100 may instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image representing "correct". Then, the cloud server 100 may play audio data through the smart speaker 200 to prompt the user to pass the interactive device 300 to the next user. In addition, the cloud server 100 may instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image to prompt the user to pass the interactive device 300 to the next user.

In step S1022, the cloud server 100 may play audio data representing "incorrect" through the smart speaker 200 and may instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to play an image representing "incorrect". The cloud server 100 may then play audio data through the smart speaker 200, and instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image, prompting the user not to pass the interactive device 300 to the next user but to continue with the next question.
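Steps S1021 and S1022 differ only in which cues are played and in who holds the device afterwards. The sketch below captures that turn-passing rule, assuming players sit in a fixed circular order; the `Feedback` type, the `feedback` function, and the audio/image file names are all hypothetical placeholders.

```python
from typing import NamedTuple


class Feedback(NamedTuple):
    audio: list[str]   # clips the smart speaker plays, in order
    images: list[str]  # images the image output interface shows, in order
    next_holder: int   # index of the player who should hold the device next


def feedback(is_correct: bool, holder: int, num_players: int) -> Feedback:
    if is_correct:
        # Step S1021: confirm the answer, then prompt a hand-off to the next player.
        return Feedback(["correct.wav", "pass_device.wav"],
                        ["correct.png", "pass_device.png"],
                        (holder + 1) % num_players)
    # Step S1022: flag the error; the same player keeps the device for the next question.
    return Feedback(["incorrect.wav", "keep_device.wav"],
                    ["incorrect.png", "keep_device.png"],
                    holder)


print(feedback(True, 2, 3).next_holder)   # 0 (wraps around to the first player)
print(feedback(False, 1, 3).next_holder)  # 1 (device stays with the same player)
```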

In step S1023, the cloud server 100 may play audio data through the smart speaker 200, according to the script pre-stored in the storage medium 120, to ask the user a question. The cloud server 100 may then receive, from the smart speaker 200, audio data corresponding to the user's answer and determine whether the answer matches the reference data pre-stored in the storage medium 120 of the cloud server 100. If the answer matches the reference data, the process proceeds to step S1024; otherwise, it proceeds to step S1025.

In step S1024, in response to the user's answer matching the reference data, the cloud server 100 may record the user's number of correct answers in that user's user data in the storage medium 120.

In step S1025, in response to the user's answer not matching the reference data, the cloud server 100 records the user's number of incorrect answers in that user's user data in the storage medium 120.

In step S1026, the cloud server 100 may play audio data representing "correct" through the smart speaker 200 and may instruct, through the smart speaker 200, the image output interface 350 of the interactive device 300 to play an image representing "correct". The cloud server 100 may then play audio data through the smart speaker 200, and instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image, prompting the user to pass the interactive device 300 to the next user.

In step S1027, the cloud server 100 may determine whether the user's number of incorrect answers is greater than a threshold and, if so, determine that the user has lost the quiz game. The cloud server 100 may then play audio data (e.g., an explosion sound effect) through the smart speaker 200, and instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image (e.g., a crying face), to notify the user that he or she has lost the quiz game.
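The elimination rule in step S1027 is a strict comparison, so a player whose wrong-answer count merely equals the threshold stays in the game. A minimal sketch, with `eliminated` as a hypothetical helper name and per-user counts assumed to come from the stored user data:

```python
def eliminated(incorrect_counts: dict[str, int], threshold: int) -> list[str]:
    """Step S1027: a player loses once the wrong-answer count strictly exceeds the threshold."""
    return sorted(user for user, count in incorrect_counts.items() if count > threshold)


print(eliminated({"a": 3, "b": 2, "c": 5}, threshold=2))  # ['a', 'c'] ('b' is exactly at the threshold)
```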

In step S1028, the cloud server 100 may determine whether the user has kept the interactive device 300 balanced, based on the sensed value received from the smart speaker 200 (i.e., the tilt value generated by the sensor 340). Specifically, the cloud server 100 may determine whether the tilt value is less than a threshold. If it is, the process proceeds to step S1030; if the tilt value is greater than or equal to the threshold, the process proceeds to step S1029.
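The patent does not define how the tilt value is derived from the accelerometer. A common approach, assumed here, is to take the angle between the device's z-axis and gravity from a static three-axis reading, tilt = atan2(sqrt(ax² + ay²), az). The 30° threshold and the function names are hypothetical.

```python
import math


def tilt_degrees(ax: float, ay: float, az: float) -> float:
    """Angle between the device's z-axis and gravity, from a static 3-axis accelerometer reading.

    Only meaningful while the device is roughly at rest, so gravity dominates the reading.
    """
    return math.degrees(math.atan2(math.hypot(ax, ay), az))


def balance_step(tilt: float, threshold_deg: float = 30.0) -> str:
    # Step S1028: tilt below the threshold keeps the game going (S1030); otherwise the player loses (S1029).
    return "S1030" if tilt < threshold_deg else "S1029"


print(round(tilt_degrees(0.0, 0.0, 9.81), 1))  # 0.0 (lying flat)
print(round(tilt_degrees(9.81, 0.0, 0.0), 1))  # 90.0 (on its side)
```

Using atan2 rather than acos avoids dividing by the measured magnitude, which is never exactly 1 g on a real sensor.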

In step S1029, in response to the tilt value being greater than or equal to the threshold, the cloud server 100 may determine that the user has lost the quiz game. It may then play audio data (e.g., an explosion sound effect) through the smart speaker 200, and instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image (e.g., a crying face), to notify the user that the quiz game has been lost.

In step S1030, the cloud server 100 may play audio data through the smart speaker 200, according to the script pre-stored in the storage medium 120, to ask the user a question.

In step S1031, in response to the game time having expired, the cloud server 100 may decide to end the quiz game. The cloud server 100 may also determine, based on the user information stored in the storage medium 120, which user lost the quiz game. It may then play audio data through the smart speaker 200, and instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image, to announce that the quiz game has ended or which user lost.
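Step S1031 needs a game timer and a rule for naming the loser, neither of which the patent specifies. The sketch assumes a monotonic-clock deadline and a hypothetical rule that the player or players with the most wrong answers lose; `GameClock` and `losers` are illustrative names. The clock source is injectable so the timer can be tested without waiting.

```python
import time
from typing import Callable


class GameClock:
    """Tracks the quiz time limit with a monotonic clock, unaffected by wall-clock changes."""

    def __init__(self, duration_s: float, now: Callable[[], float] = time.monotonic):
        self._now = now
        self._deadline = now() + duration_s

    def expired(self) -> bool:
        return self._now() >= self._deadline


def losers(incorrect_counts: dict[str, int]) -> list[str]:
    """Hypothetical rule: the player(s) with the most wrong answers lose; ties are all reported."""
    if not incorrect_counts:
        return []
    worst = max(incorrect_counts.values())
    return sorted(user for user, count in incorrect_counts.items() if count == worst)


print(losers({"a": 1, "b": 3, "c": 3}))  # ['b', 'c']
```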

In step S1032, the interactive system 10 may end the quiz game. Before doing so, the cloud server 100 may play audio data through the smart speaker 200, and instruct the image output interface 350 of the interactive device 300 through the smart speaker 200 to play an image, to notify the user that the quiz game is about to end.

FIG. 11 is a flow chart of an audio-based interactive method according to an embodiment of the present invention; the method may be implemented by the interactive system 10 shown in FIG. 1. In step S1101, one of the at least one sensor 340 of the interactive device 300 and the microphone 240 of the smart speaker 200 generates first sensing data, wherein the at least one sensor 340 includes at least one of a button, an accelerometer, a color sensor, and a pressure sensor. In step S1102, the interactive device 300 is communicatively connected to the cloud server 100 through the smart speaker 200. In step S1103, in response to receiving the first sensing data, the cloud server 100 plays first audio data corresponding to the first sensing data through the smart speaker 200 and plays a first image corresponding to the first sensing data through the image output interface 350 of the interactive device 300.
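The data path of steps S1101 to S1103 can be simulated in-process. In this sketch the interactive device reaches the cloud server only through the smart speaker, which plays the returned audio and forwards the returned image to the device. The class names, the `SensingData` message shape, and the lookup table are hypothetical; no real network protocol is implied.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SensingData:
    source: str    # e.g. "button", "accelerometer", "color", "pressure", "microphone"
    value: object  # raw reading: pulse, tilt value, color value, pressure value, audio...


class CloudServer:
    def __init__(self, content: dict[str, tuple[str, str]]):
        self.content = content  # sensing-data source -> (audio clip, image)

    def handle(self, data: SensingData) -> Optional[tuple[str, str]]:
        # Step S1103: select the audio and image corresponding to the sensing data.
        return self.content.get(data.source)


@dataclass
class InteractiveDevice:
    shown: list[str] = field(default_factory=list)

    def display(self, image: str) -> None:
        self.shown.append(image)  # stands in for the image output interface


@dataclass
class SmartSpeaker:
    server: CloudServer
    device: InteractiveDevice
    played: list[str] = field(default_factory=list)

    def relay(self, data: SensingData) -> None:
        # Step S1102: sensing data reaches the cloud server via the speaker.
        reply = self.server.handle(data)
        if reply is None:
            return  # nothing configured for this input
        audio, image = reply
        self.played.append(audio)   # speaker plays the first audio data
        self.device.display(image)  # first image is forwarded to the device


server = CloudServer({"button": ("ding.wav", "star.png")})
device = InteractiveDevice()
speaker = SmartSpeaker(server, device)
speaker.relay(SensingData("button", 1))  # Step S1101: a button press produces sensing data
print(speaker.played, device.shown)      # ['ding.wav'] ['star.png']
```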

In summary, the interactive system of the present invention provides users with an interactive device having multiple types of sensors. Users can play a quiz game with the smart speaker through the interactive device or through the microphone of the smart speaker. The interactive device may include a button: the cloud server may determine whether the user presses the button in accordance with a voice command issued by the smart speaker or an image displayed by the interactive device, and control the smart speaker to play the corresponding audio and image based on the result. Multiple users can compete by being the first to press the button at the right moment, as shown in FIG. 12. The interactive device may include an accelerometer: the cloud server may determine from the accelerometer's sensing data whether the interactive device is tilted or being swung, and control the smart speaker to play the corresponding audio and image accordingly. For example, multiple users can pass the interactive device around while answering questions played by the smart speaker; if a user answers incorrectly or tilts the interactive device too far, the cloud server may determine that the user has lost the game, as shown in FIG. 13. The interactive device may also include a color sensor or a pressure sensor, in which case the cloud server may control the smart speaker to play audio corresponding to the color value or pressure value measured by the interactive device. Users can thus interact with the smart speaker through the interactive device and enjoy a more varied entertainment experience.
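For the first-to-press competition of FIG. 12 (and the preset response window that follows a prompt), a winner can be chosen from timestamped button pulses. The sketch assumes each pulse is timestamped when the cloud server receives it, and it applies a hypothetical rule that presses before the prompt (false starts) or after the window are ignored; `first_valid_press` is an illustrative name.

```python
from typing import Optional


def first_valid_press(prompt_time: float, window_s: float,
                      presses: list[tuple[str, float]]) -> Optional[str]:
    """Return the player whose press came first inside [prompt_time, prompt_time + window_s].

    Presses before the prompt (false starts) or after the window are ignored.
    """
    valid = [(t, user) for user, t in presses
             if prompt_time <= t <= prompt_time + window_s]
    if not valid:
        return None  # nobody answered in time
    return min(valid)[1]  # earliest timestamp wins; exact ties fall back to name order


print(first_valid_press(10.0, 3.0, [("a", 9.5), ("b", 11.2), ("c", 10.8)]))  # c
```

Timestamping on the server keeps the comparison fair across devices only if network delay is similar for every player; device-side timestamps with synchronized clocks would be the alternative.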

10: Interactive system; 100: Cloud server; 110, 210, 310: Processor; 120, 220, 320: Storage medium; 130, 230, 330: Transceiver; 200: Smart speaker; 240: Microphone; 300: Interactive device; 340: Sensor; 350: Image output interface; 361, 362, 363, 364: Image; S701 to S731, S1001 to S1032, S1101 to S1103: Steps

FIG. 1 is a schematic diagram of an audio-based interactive system according to an embodiment of the present invention. FIG. 2 is a schematic diagram of a cloud server according to an embodiment of the present invention. FIG. 3 is a schematic diagram of a smart speaker according to an embodiment of the present invention. FIG. 4 is a schematic diagram of an interactive device according to an embodiment of the present invention. FIG. 5 is a schematic diagram of one form of the interactive device according to an embodiment of the present invention. FIG. 6 is a schematic diagram of outputting images through the image output interface according to an embodiment of the present invention. FIG. 7 is a flow chart of a method for running a quiz game according to an embodiment of the present invention. FIG. 8 is a schematic diagram of another form of the interactive device according to an embodiment of the present invention. FIG. 9 is a schematic diagram of outputting, through the image output interface, an image corresponding to a tilt value according to an embodiment of the present invention. FIG. 10 is a flow chart of another method for running a quiz game according to an embodiment of the present invention. FIG. 11 is a flow chart of an audio-based interactive method according to an embodiment of the present invention. FIG. 12 is a schematic diagram of playing a quiz game with the interactive device according to an embodiment of the present invention. FIG. 13 is a schematic diagram of playing a quiz game with the interactive device according to another embodiment of the present invention.


Claims

An audio-based interactive system, comprising: a cloud server storing first audio data and a first image; a smart speaker including a microphone and communicatively connected to the cloud server; and an interactive device including at least one sensor and an image output interface and communicatively connected to the smart speaker, wherein the at least one sensor includes an accelerometer, wherein one of the at least one sensor and the microphone generates first sensing data, wherein the first sensing data includes a tilt value; the interactive device is communicatively connected to the cloud server through the smart speaker; and in response to receiving the first sensing data and the tilt value being less than a first threshold, the cloud server plays the first audio data corresponding to the first sensing data through the smart speaker and plays the first image corresponding to the first sensing data through the image output interface.

An interactive system as described in claim 1, wherein the cloud server further stores reference data and determines whether to play the first audio data and the first image based on a result of matching the first sensing data against the reference data.

An interactive system as described in claim 1, wherein the cloud server plays second audio data through the smart speaker, and decides to play the first audio data and the first image in response to receiving the first sensing data within a preset time after playing the second audio data.

An interactive system as described in claim 1, wherein the at least one sensor of the interactive device further includes a button and the first sensing data further includes a pulse signal, wherein the image output interface is disposed on a surface of the button, and wherein the cloud server plays an image through the image output interface and, in response to receiving the pulse signal while the image is being played, plays the first audio data through the smart speaker and plays the first image through the image output interface.

An interactive system as described in claim 1, wherein the at least one sensor further includes a second microphone.

An interactive system as described in claim 1, wherein the first image corresponds to the tilt value.

An interactive system as described in claim 1, wherein the first sensing data further includes audio data, wherein the cloud server plays the first audio data through the smart speaker and plays the first image through the image output interface in response to the audio data matching the reference data stored in the cloud server and the tilt value being less than the first threshold.

An interactive system as described in claim 7, wherein the cloud server plays a countdown image through the image output interface and, in response to receiving the audio data before the countdown ends, plays the first audio data through the smart speaker and the first image through the image output interface.

An interactive system as described in claim 1, wherein the first sensing data further includes audio data, wherein the cloud server further stores a speech recognition model and user information, and the speech recognition model is used to recognize the user corresponding to the audio data to generate a recognition result, and the user information is updated according to the recognition result.

An interactive system as described in claim 1, wherein the first sensing data further includes an acceleration value, wherein the cloud server plays the first audio data through the smart speaker and displays the first image through the image output interface in response to the acceleration value being greater than a second threshold.

An interactive system as described in claim 1, wherein the at least one sensor further includes a color sensor, and the first sensing data further includes a color value.

An interactive system as described in claim 1, wherein the at least one sensor further includes a pressure sensor, and the first sensing data further includes a pressure value.

An audio-based interaction method, applicable to an interactive system including a cloud server, a smart speaker, and an interactive device, the interaction method comprising: generating first sensing data by one of at least one sensor of the interactive device and a microphone of the smart speaker, wherein the at least one sensor includes an accelerometer and the first sensing data includes a tilt value; communicatively connecting the interactive device to the cloud server through the smart speaker; and in response to receiving the first sensing data and the tilt value being less than a first threshold, playing, by the cloud server, first audio data corresponding to the first sensing data through the smart speaker and a first image corresponding to the first sensing data through an image output interface of the interactive device.

The interactive method as described in claim 13, wherein the cloud server further stores reference data and determines whether to play the first audio data and the first image based on a result of matching the first sensing data against the reference data.

The interactive method as described in claim 13, wherein the cloud server plays second audio data through the smart speaker, and decides to play the first audio data and the first image in response to receiving the first sensing data within a preset time after playing the second audio data.

The interactive method as described in claim 13, wherein the at least one sensor of the interactive device further includes a button and the first sensing data further includes a pulse signal, wherein the image output interface is disposed on a surface of the button, and wherein the cloud server plays an image through the image output interface and, in response to receiving the pulse signal while the image is being played, plays the first audio data through the smart speaker and plays the first image through the image output interface.

An interactive method as described in claim 13, wherein the at least one sensor further includes a second microphone.

An interactive method as described in claim 13, wherein the first image corresponds to the tilt value.

The interactive method as described in claim 13, wherein the first sensing data further includes audio data, wherein the cloud server plays the first audio data through the smart speaker and plays the first image through the image output interface in response to the audio data matching the reference data stored in the cloud server and the tilt value being less than the first threshold.

The interactive method as described in claim 19, wherein the cloud server plays a countdown image through the image output interface and, in response to receiving the audio data before the countdown ends, plays the first audio data through the smart speaker and the first image through the image output interface.

The interactive method as described in claim 13, wherein the first sensing data further includes audio data, and wherein the cloud server further stores a speech recognition model and user information, identifies the user corresponding to the audio data with the speech recognition model to generate a recognition result, and updates the user information according to the recognition result.

The interactive method as described in claim 13, wherein the first sensing data further includes an acceleration value, and wherein the cloud server plays the first audio data through the smart speaker and the first image through the image output interface in response to the acceleration value being greater than a second threshold.

The interactive method as described in claim 13, wherein the at least one sensor further includes a color sensor, and the first sensing data further includes a color value.

An interactive method as described in claim 14, wherein the at least one sensor further includes a pressure sensor, and the first sensing data further includes a pressure value.

An interactive device adapted to interact with a cloud server and a smart speaker, the interactive device comprising: a transceiver communicatively connected to the smart speaker and communicatively connected to the cloud server through the smart speaker; at least one sensor generating first sensing data, wherein the at least one sensor includes an accelerometer and the first sensing data includes a tilt value; an image output interface; and a processor coupled to the transceiver, the at least one sensor, and the image output interface, wherein the processor transmits the first sensing data through the transceiver, receives through the transceiver a first image corresponding to the first sensing data in response to the tilt value being greater than a first threshold, and plays the first image through the image output interface, wherein the first sensing data is used to instruct the cloud server to configure the smart speaker to play first audio data corresponding to the first sensing data.